Preface | xi
1 Making Better Decisions Based on Data | 1 | (28)
  | 4 | (1)
  The Role of Data Scientists | 5 | (5)
  | 7 | (1)
  Full Stack Cloud Data Scientists | 8 | (1)
  | 9 | (1)
  | 10 | (3)
  Simple to Complex Solutions | 11 | (1)
  | 11 | (1)
  | 12 | (1)
  | 13 | (5)
  | 15 | (1)
  Probability Density Function | 16 | (1)
  Cumulative Distribution Function | 17 | (1)
  | 18 | (4)
  | 19 | (1)
  | 19 | (1)
  Getting Started with the Code | 20 | (2)
  Agile Architecture for Data Science on Google Cloud | 22 | (3)
  What Is Agile Architecture? | 23 | (1)
  | 23 | (1)
  | 24 | (1)
  | 25 | (1)
  | 26 | (3)
2 Ingesting Data into the Cloud | 29 | (52)
  Airline On-Time Performance Data | 30 | (7)
  | 31 | (1)
  | 31 | (1)
  | 32 | (1)
  | 33 | (1)
  Hub-and-Spoke Architecture | 34 | (1)
  | 35 | (2)
  Separation of Compute and Storage | 37 | (9)
  | 39 | (2)
  Scaling Out with Sharded Data | 41 | (2)
  Scaling Out with Data-in-Place | 43 | (3)
  | 46 | (9)
  Reverse Engineering a Web Form | 46 | (2)
  | 48 | (2)
  | 50 | (1)
  Uploading Data to Google Cloud Storage | 51 | (4)
  Loading Data into Google BigQuery | 55 | (8)
  Advantages of a Serverless Columnar Database | 55 | (2)
  | 57 | (1)
  | 57 | (4)
  | 61 | (1)
  | 62 | (1)
  Scheduling Monthly Downloads | 63 | (13)
  | 65 | (6)
  | 71 | (1)
  | 72 | (2)
  Deploying and Invoking Cloud Run | 74 | (1)
  | 75 | (1)
  | 76 | (1)
  | 77 | (1)
  | 78 | (3)
3 Creating Compelling Dashboards | 81 | (44)
  Explain Your Model with Dashboards | 83 | (5)
  Why Build a Dashboard First? | 84 | (2)
  Accuracy, Honesty, and Good Design | 86 | (2)
  Loading Data into Cloud SQL | 88 | (8)
  Create a Google Cloud SQL Instance | 89 | (3)
  | 92 | (3)
  Interacting with the Database | 95 | (1)
  | 96 | (5)
  | 96 | (1)
  | 97 | (2)
  | 99 | (1)
  | 100 | (1)
  | 101 | (5)
  | 101 | (2)
  | 103 | (3)
  | 106 | (13)
  Getting Started with Data Studio | 107 | (2)
  | 109 | (1)
  | 110 | (2)
  Showing Proportions with a Pie Chart | 112 | (5)
  Explaining a Contingency Table | 117 | (2)
  Modern Business Intelligence | 119 | (4)
  | 119 | (1)
  | 120 | (2)
  | 122 | (1)
  | 123 | (1)
  | 123 | (2)
4 Streaming Data: Publication and Ingest with Pub/Sub and Dataflow | 125 | (48)
  | 126 | (7)
  | 127 | (1)
  | 128 | (1)
  Getting Airport Information | 129 | (3)
  | 132 | (1)
  | 133 | (20)
  Apache Beam/Cloud Dataflow | 135 | (1)
  | 136 | (3)
  Adding Time Zone Information | 139 | (2)
  | 141 | (3)
  | 144 | (2)
  | 146 | (2)
  Reading and Writing to the Cloud | 148 | (2)
  Running the Pipeline in the Cloud | 150 | (3)
  Publishing an Event Stream to Cloud Pub/Sub | 153 | (7)
  | 154 | (1)
  | 155 | (1)
  | 156 | (1)
  Iterating Through Records | 157 | (1)
  Building a Batch of Events | 158 | (1)
  Publishing a Batch of Events | 159 | (1)
  Real-Time Stream Processing | 160 | (9)
  | 160 | (2)
  | 162 | (1)
  | 162 | (3)
  | 165 | (1)
  Executing the Stream Processing | 166 | (2)
  Analyzing Streaming Data in BigQuery | 168 | (1)
  | 169 | (1)
  | 170 | (1)
  | 171 | (2)
5 Interactive Data Exploration with Vertex AI Workbench | 173 | (38)
  Exploratory Data Analysis | 174 | (10)
  | 177 | (2)
  Reading a Query Explanation | 179 | (5)
  Exploratory Data Analysis in Vertex AI Workbench | 184 | (6)
  | 185 | (1)
  | 186 | (2)
  | 188 | (1)
  | 188 | (1)
  Jupyter Magic for Google Cloud | 189 | (1)
  | 190 | (14)
  | 191 | (1)
  | 191 | (3)
  | 194 | (5)
  Arrival Delay Conditioned on Departure Delay | 199 | (5)
  | 204 | (6)
  | 204 | (1)
  | 205 | (1)
  | 206 | (4)
  | 210 | (1)
  | 210 | (1)
6 Bayesian Classifier with Apache Spark on Cloud Dataproc | 211 | (34)
  MapReduce and the Hadoop Ecosystem | 211 | (3)
  | 212 | (2)
  | 214 | (1)
  | 214 | (7)
  Need for Higher-Level Tools | 216 | (1)
  | 217 | (2)
  | 219 | (2)
  Quantization Using Spark SQL | 221 | (10)
  JupyterLab on Cloud Dataproc | 222 | (1)
  Independence Check Using BigQuery | 223 | (2)
  | 225 | (2)
  | 227 | (4)
  | 231 | (7)
  | 231 | (2)
  | 233 | (1)
  Dynamically Resizing Clusters | 234 | (1)
  Comparing to Single Threshold Model | 235 | (3)
  | 238 | (4)
  | 238 | (1)
  | 238 | (1)
  | 239 | (1)
  | 240 | (1)
  | 241 | (1)
  | 242 | (1)
  | 243 | (2)
7 Logistic Regression Using Spark ML | 245 | (38)
  | 246 | (5)
  How Logistic Regression Works | 246 | (3)
  | 249 | (1)
  Getting Started with Spark Machine Learning | 250 | (1)
  Spark Logistic Regression | 251 | (12)
  Creating a Training Dataset | 252 | (4)
  | 256 | (3)
  Predicting Using the Model | 259 | (1)
  | 260 | (3)
  | 263 | (18)
  | 263 | (4)
  | 267 | (4)
  | 271 | (3)
  | 274 | (4)
  | 278 | (2)
  | 280 | (1)
  | 281 | (1)
  | 282 | (1)
8 Machine Learning with BigQuery ML | 283 | (26)
  | 283 | (7)
  | 285 | (1)
  | 286 | (1)
  | 287 | (2)
  | 289 | (1)
  Nonlinear Machine Learning | 290 | (6)
  | 290 | (2)
  | 292 | (2)
  | 294 | (2)
  | 296 | (4)
  | 296 | (2)
  | 298 | (1)
  | 299 | (1)
  | 300 | (5)
  | 300 | (2)
  | 302 | (1)
  | 303 | (1)
  | 303 | (2)
  | 305 | (1)
  | 306 | (3)
9 Machine Learning with TensorFlow in Vertex AI | 309 | (26)
  Toward More Complex Models | 310 | (7)
  Preparing BigQuery Data for TensorFlow | 314 | (1)
  Reading Data into TensorFlow | 315 | (2)
  Training and Evaluation in Keras | 317 | (6)
  | 317 | (1)
  | 318 | (2)
  | 320 | (1)
  | 320 | (2)
  | 322 | (1)
  | 322 | (1)
  Wide-and-Deep Model in Keras | 323 | (4)
  Representing Air Traffic Corridors | 323 | (1)
  | 324 | (1)
  | 325 | (1)
  | 326 | (1)
  Deploying a Trained TensorFlow Model to Vertex AI | 327 | (5)
  | 328 | (1)
  | 328 | (2)
  | 330 | (1)
  Deploying Model to Endpoint | 330 | (1)
  Invoking the Deployed Model | 331 | (1)
  | 332 | (1)
  | 333 | (2)
10 Getting Ready for MLOps with Vertex AI | 335 | (22)
  Developing and Deploying Using Python | 336 | (7)
  | 337 | (1)
  Writing the Training Pipeline | 338 | (2)
  | 340 | (1)
  | 341 | (2)
  | 343 | (7)
  | 344 | (1)
  | 345 | (2)
  | 347 | (1)
  Hyperparameter Tuning Pipeline | 347 | (2)
  | 349 | (1)
  | 350 | (4)
  Configuring Explanations Metadata | 350 | (2)
  Creating and Deploying Model | 352 | (1)
  | 352 | (2)
  | 354 | (1)
  | 355 | (2)
11 Time-Windowed Features for Real-Time Machine Learning | 357 | (46)
  | 357 | (10)
  Apache Beam and Cloud Dataflow | 358 | (2)
  | 360 | (2)
  | 362 | (5)
  Machine Learning Training | 367 | (9)
  | 367 | (6)
  | 373 | (3)
  | 376 | (9)
  | 377 | (2)
  | 379 | (1)
  | 380 | (1)
  | 381 | (3)
  | 384 | (1)
  | 385 | (15)
  | 385 | (1)
  Executing Streaming Pipeline | 386 | (1)
  Late and Out-of-Order Records | 387 | (6)
  | 393 | (7)
  | 400 | (1)
  | 401 | (2)
12 The Full Dataset | 403 | (16)
  | 403 | (14)
  | 404 | (5)
  | 409 | (2)
  | 411 | (6)
  | 417 | (1)
  | 417 | (2)
Conclusion | 419 | (4)
Considerations for Sensitive Data Within Machine Learning Datasets | 423 | (8)
Index | 431