
Data Science on the Google Cloud Platform: Implementing End-to-End Real-Time Data Pipelines: From Ingest to Machine Learning, 2nd edition [Paperback]

4.20/5 (10 ratings by Goodreads)
  • Format: Paperback / softback, 446 pages
  • Publication date: 30-Apr-2022
  • Publisher: O'Reilly Media
  • ISBN-10: 1098118952
  • ISBN-13: 9781098118952
  • Paperback
  • Price: €73.03*
  • * This is the final price, i.e., no additional discounts apply
  • List price: €85.92
  • Save 15%
Learn how easy it is to apply sophisticated statistical and machine learning methods to real-world problems when you build using Google Cloud Platform (GCP). This hands-on guide shows data engineers and data scientists how to implement an end-to-end data pipeline with cloud native tools on GCP.

Throughout this updated second edition, you'll work through a sample business decision by employing a variety of data science approaches. Follow along by building a data pipeline in your own project on GCP, and discover how to solve data science problems in a transformative and more collaborative way.

You'll learn how to:

  • Employ best practices in building highly scalable data and ML pipelines on Google Cloud
  • Automate and schedule data ingest using Cloud Run
  • Create and populate a dashboard in Data Studio
  • Build a real-time analytics pipeline using Pub/Sub, Dataflow, and BigQuery (see the sketch after this list)
  • Conduct interactive data exploration with BigQuery
  • Create a Bayesian model with Spark on Cloud Dataproc
  • Forecast time series and do anomaly detection with BigQuery ML
  • Aggregate within time windows with Dataflow
  • Train explainable machine learning models with Vertex AI
  • Operationalize ML with Vertex AI Pipelines
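
For a flavor of the streaming ingest the book covers, here is a minimal sketch of publishing a simulated flight event to Cloud Pub/Sub with the google-cloud-pubsub client library. The project ID, topic name, and event payload are illustrative assumptions, not the book's own code.

    import json
    from google.cloud import pubsub_v1

    # Hypothetical project and topic names; replace with your own.
    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path("my-project", "flight-events")

    # A simulated flight event; Pub/Sub message bodies must be bytes.
    event = {"flight": "AA1234", "dep_delay": 12.0,
             "timestamp": "2022-04-30T10:15:00Z"}
    future = publisher.publish(topic_path, json.dumps(event).encode("utf-8"))
    print("Published message ID:", future.result())

A Dataflow pipeline subscribed to such a topic could then window and aggregate these events before writing them to BigQuery, which is the pattern the streaming chapters develop.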
Table of contents:

Preface
1. Making Better Decisions Based on Data
  • Many Similar Decisions
  • The Role of Data Scientists
    • Scrappy Environment
    • Full Stack Cloud Data Scientists
    • Collaboration
  • Best Practices
    • Simple to Complex Solutions
    • Cloud Computing
    • Serverless
  • A Probabilistic Decision
    • Probabilistic Approach
    • Probability Density Function
    • Cumulative Distribution Function
  • Choices Made
    • Choosing Cloud
    • Not a Reference Book
    • Getting Started with the Code
  • Agile Architecture for Data Science on Google Cloud
    • What Is Agile Architecture?
    • No-Code, Low-Code
    • Use Managed Services
  • Summary
  • Suggested Resources
2. Ingesting Data into the Cloud
  • Airline On-Time Performance Data
    • Knowability
    • Causality
    • Training-Serving Skew
    • Downloading Data
    • Hub-and-Spoke Architecture
    • Dataset Fields
  • Separation of Compute and Storage
    • Scaling Up
    • Scaling Out with Sharded Data
    • Scaling Out with Data-in-Place
  • Ingesting Data
    • Reverse Engineering a Web Form
    • Dataset Download
    • Exploration and Cleanup
    • Uploading Data to Google Cloud Storage
  • Loading Data into Google BigQuery
    • Advantages of a Serverless Columnar Database
    • Staging on Cloud Storage
    • Access Control
    • Ingesting CSV Files
    • Partitioning
  • Scheduling Monthly Downloads
    • Ingesting in Python
    • Cloud Run
    • Securing Cloud Run
    • Deploying and Invoking Cloud Run
    • Scheduling Cloud Run
  • Summary
  • Code Break
  • Suggested Resources
3. Creating Compelling Dashboards
  • Explain Your Model with Dashboards
    • Why Build a Dashboard First?
    • Accuracy, Honesty, and Good Design
  • Loading Data into Cloud SQL
    • Create a Google Cloud SQL Instance
    • Create Table of Data
    • Interacting with the Database
  • Querying Using BigQuery
    • Schema Exploration
    • Using Preview
    • Using Table Explorer
    • Creating BigQuery View
  • Building Our First Model
    • Contingency Table
    • Threshold Optimization
  • Building a Dashboard
    • Getting Started with Data Studio
    • Creating Charts
    • Adding End-User Controls
    • Showing Proportions with a Pie Chart
    • Explaining a Contingency Table
  • Modern Business Intelligence
    • Digitization
    • Natural Language Queries
    • Connected Sheets
  • Summary
  • Suggested Resources
4. Streaming Data: Publication and Ingest with Pub/Sub and Dataflow
  • Designing the Event Feed
    • Transformations Needed
    • Architecture
    • Getting Airport Information
    • Sharing Data
  • Time Correction
    • Apache Beam/Cloud Dataflow
    • Parsing Airports Data
    • Adding Time Zone Information
    • Converting Times to UTC
    • Correcting Dates
    • Creating Events
    • Reading and Writing to the Cloud
    • Running the Pipeline in the Cloud
  • Publishing an Event Stream to Cloud Pub/Sub
    • Speed-Up Factor
    • Get Records to Publish
    • How Many Topics?
    • Iterating Through Records
    • Building a Batch of Events
    • Publishing a Batch of Events
  • Real-Time Stream Processing
    • Streaming in Dataflow
    • Windowing a Pipeline
    • Streaming Aggregation
    • Using Event Timestamps
    • Executing the Stream Processing
    • Analyzing Streaming Data in BigQuery
  • Real-Time Dashboard
  • Summary
  • Suggested Resources
5. Interactive Data Exploration with Vertex AI Workbench
  • Exploratory Data Analysis
    • Exploration with SQL
    • Reading a Query Explanation
  • Exploratory Data Analysis in Vertex AI Workbench
    • Jupyter Notebooks
    • Creating a Notebook
    • Jupyter Commands
    • Installing Packages
    • Jupyter Magic for Google Cloud
  • Exploring Arrival Delays
    • Basic Statistics
    • Plotting Distributions
    • Quality Control
    • Arrival Delay Conditioned on Departure Delay
  • Evaluating the Model
    • Random Shuffling
    • Splitting by Date
    • Training and Testing
  • Summary
  • Suggested Resources
6. Bayesian Classifier with Apache Spark on Cloud Dataproc
  • MapReduce and the Hadoop Ecosystem
    • How MapReduce Works
    • Apache Hadoop
  • Google Cloud Dataproc
    • Need for Higher-Level Tools
    • Jobs, Not Clusters
    • Preinstalling Software
  • Quantization Using Spark SQL
    • JupyterLab on Cloud Dataproc
    • Independence Check Using BigQuery
    • Spark SQL in JupyterLab
    • Histogram Equalization
  • Bayesian Classification
    • Bayes in Each Bin
    • Evaluating the Model
    • Dynamically Resizing Clusters
    • Comparing to Single Threshold Model
  • Orchestration
    • Submitting a Spark Job
    • Workflow Template
    • Cloud Composer
    • Autoscaling
    • Serverless Spark
  • Summary
  • Suggested Resources
7. Logistic Regression Using Spark ML
  • Logistic Regression
    • How Logistic Regression Works
    • Spark ML Library
    • Getting Started with Spark Machine Learning
  • Spark Logistic Regression
    • Creating a Training Dataset
    • Training the Model
    • Predicting Using the Model
    • Evaluating a Model
  • Feature Engineering
    • Experimental Framework
    • Feature Selection
    • Feature Transformations
    • Feature Creation
    • Categorical Variables
    • Repeatable, Real Time
  • Summary
  • Suggested Resources
8. Machine Learning with BigQuery ML
  • Logistic Regression
    • Presplit Data
    • Interrogating the Model
    • Evaluating the Model
    • Scale and Simplicity
  • Nonlinear Machine Learning
    • XGBoost
    • Hyperparameter Tuning
    • Vertex AI AutoML Tables
  • Time Window Features
    • Taxi-Out Time
    • Compounding Delays
    • Causality
  • Time Features
    • Departure Hour
    • Transform Clause
    • Categorical Variable
    • Feature Cross
  • Summary
  • Suggested Resources
9. Machine Learning with TensorFlow in Vertex AI
  • Toward More Complex Models
    • Preparing BigQuery Data for TensorFlow
    • Reading Data into TensorFlow
  • Training and Evaluation in Keras
    • Model Function
    • Features
    • Inputs
    • Training the Keras Model
    • Saving and Exporting
    • Deep Neural Network
  • Wide-and-Deep Model in Keras
    • Representing Air Traffic Corridors
    • Bucketing
    • Feature Crossing
    • Wide-and-Deep Classifier
  • Deploying a Trained TensorFlow Model to Vertex AI
    • Concepts
    • Uploading Model
    • Creating Endpoint
    • Deploying Model to Endpoint
    • Invoking the Deployed Model
  • Summary
  • Suggested Resources
10. Getting Ready for MLOps with Vertex AI
  • Developing and Deploying Using Python
    • Writing model.py
    • Writing the Training Pipeline
    • Predefined Split
    • AutoML
  • Hyperparameter Tuning
    • Parameterize Model
    • Shorten Training Run
    • Metrics During Training
    • Hyperparameter Tuning Pipeline
    • Best Trial to Completion
  • Explaining the Model
    • Configuring Explanations Metadata
    • Creating and Deploying Model
    • Obtaining Explanations
  • Summary
  • Suggested Resources
11. Time-Windowed Features for Real-Time Machine Learning
  • Time Averages
    • Apache Beam and Cloud Dataflow
    • Reading and Writing
    • Time Windowing
  • Machine Learning Training
    • Machine Learning Dataset
    • Training the Model
  • Streaming Predictions
    • Reuse Transforms
    • Input and Output
    • Invoking Model
    • Reusing Endpoint
    • Batching Predictions
  • Streaming Pipeline
    • Writing to BigQuery
    • Executing Streaming Pipeline
    • Late and Out-of-Order Records
    • Possible Streaming Sinks
  • Summary
  • Suggested Resources
12. The Full Dataset
  • Four Years of Data
    • Creating Dataset
    • Training Model
    • Evaluation
  • Summary
  • Suggested Resources
Conclusion
Appendix: Considerations for Sensitive Data Within Machine Learning Datasets
Index
Valliappa (Lak) Lakshmanan is the director of analytics and AI solutions at Google Cloud, where he leads a team building cross-industry solutions to business problems. His mission is to democratize machine learning so that it can be done by anyone, anywhere. Lak is the author or coauthor of Practical Machine Learning for Computer Vision, Machine Learning Design Patterns, Data Governance: The Definitive Guide, Google BigQuery: The Definitive Guide, and Data Science on the Google Cloud Platform.