Preface xiii
|
1 What Is Google BigQuery? 1 (22)
Data Processing Architectures 1 (6)
Relational Database Management System 2 (1)
3 (2)
BigQuery: A Serverless, Distributed SQL Engine 5 (2)
7 (5)
Deriving Insights Across Datasets 7 (1)
8 (2)
10 (2)
12 (1)
12 (3)
What Makes BigQuery Possible? 15 (6)
Separation of Compute and Storage 15 (1)
Storage and Networking Infrastructure 16 (2)
18 (1)
Integration with Google Cloud Platform 19 (1)
20 (1)
21 (2)
|
23 (30)
24 (7)
Retrieving Rows by Using SELECT 25 (2)
Aliasing Column Names with AS 27 (1)
28 (1)
SELECT *, EXCEPT, REPLACE 29 (1)
30 (1)
31 (1)
31 (4)
Computing Aggregates by Using GROUP BY 31 (1)
Counting Records by Using COUNT 32 (1)
Filtering Grouped Items by Using HAVING 33 (1)
Finding Unique Values by Using DISTINCT 33 (2)
A Brief Primer on Arrays and Structs 35 (7)
Creating Arrays by Using ARRAY_AGG 36 (3)
39 (1)
39 (1)
40 (1)
41 (1)
42 (6)
42 (3)
45 (1)
46 (1)
47 (1)
48 (3)
Query History and Caching 49 (1)
50 (1)
Views Versus Shared Queries 51 (1)
51 (2)
|
3 Data Types, Functions, and Operators 53 (26)
Numeric Types and Functions 54 (5)
55 (1)
Standard-Compliant Floating-Point Division 55 (1)
56 (1)
56 (1)
Precise Decimal Calculations with NUMERIC 57 (2)
59 (6)
59 (1)
60 (1)
Cleaner NULL-Handling with COALESCE 61 (1)
62 (2)
Using COUNTIF to Avoid Casting Booleans 64 (1)
65 (6)
66 (1)
67 (1)
String Manipulation Functions 68 (1)
68 (1)
69 (1)
Summary of String Functions 70 (1)
71 (4)
Parsing and Formatting Timestamps 71 (2)
Extracting Calendar Parts 73 (1)
Arithmetic with Timestamps 74 (1)
74 (1)
Working with GIS Functions 75 (1)
76 (3)
|
4 Loading Data into BigQuery 79 (56)
79 (15)
Loading from a Local Source 80 (7)
87 (3)
90 (1)
Data Management (DDL and DML) 90 (2)
92 (2)
Federated Queries and External Data Sources 94 (25)
How to Use Federated Queries 95 (3)
When to Use Federated Queries and External Data Sources 98 (7)
Interactive Exploration and Querying of Data in Google Sheets 105 (9)
SQL Queries on Data in Cloud Bigtable 114 (5)
119 (12)
119 (6)
Exporting Stackdriver Logs 125 (2)
Using Cloud Dataflow to Read/Write from BigQuery 127 (4)
131 (3)
132 (2)
134 (1)
|
5 Developing with BigQuery 135 (50)
Developing Programmatically 135 (24)
Accessing BigQuery via the REST API 135 (7)
Google Cloud Client Library 142 (17)
Accessing BigQuery from Data Science Tools 159 (17)
Notebooks on Google Cloud Platform 159 (5)
Working with BigQuery, pandas, and Jupyter 164 (5)
Working with BigQuery from R 169 (1)
170 (3)
173 (1)
Incorporating BigQuery Data into Google Slides (in G Suite) 174 (2)
Bash Scripting with BigQuery 176 (6)
Creating Datasets and Tables 177 (2)
179 (2)
181 (1)
182 (3)
|
6 Architecture of BigQuery 185 (44)
185 (5)
185 (5)
190 (1)
190 (21)
192 (5)
197 (14)
211 (15)
211 (6)
217 (9)
226 (3)
|
7 Optimizing Performance and Cost 229 (64)
Principles of Performance 229 (3)
Key Drivers of Performance 230 (1)
230 (2)
Measuring and Troubleshooting 232 (12)
Measuring Query Speed Using REST API 233 (1)
Measuring Query Speed Using BigQuery Workload Tester 234 (2)
Troubleshooting Workloads Using Stackdriver 236 (2)
Reading Query Plan Information 238 (6)
244 (24)
245 (5)
Caching the Results of Previous Queries 250 (3)
Performing Efficient Joins 253 (9)
Avoiding Overwhelming a Worker 262 (3)
Using Approximate Aggregation Functions 265 (3)
Optimizing How Data Is Stored and Accessed 268 (20)
Minimizing Network Overhead 268 (4)
Choosing an Efficient Storage Format 272 (9)
Partitioning Tables to Reduce Scan Size 281 (3)
Clustering Tables Based on High-Cardinality Keys 284 (4)
Time-Insensitive Use Cases 288 (2)
288 (1)
289 (1)
290 (3)
290 (3)
|
293 (60)
293 (14)
294 (5)
SQL User-Defined Functions 299 (4)
303 (4)
307 (23)
308 (8)
316 (6)
322 (3)
Data Definition Language and Data Manipulation Language 325 (5)
330 (9)
330 (2)
332 (7)
339 (13)
BigQuery Geographic Information Systems 339 (7)
Useful Statistical Functions 346 (2)
348 (4)
352 (1)
|
9 Machine Learning in BigQuery 353 (62)
What Is Machine Learning? 353 (6)
Formulating a Machine Learning Problem 354 (2)
Types of Machine Learning Problems 356 (3)
Building a Regression Model 359 (17)
359 (1)
Exploring the Dataset to Find Features 360 (4)
Creating a Training Dataset 364 (1)
Training and Evaluating the Model 365 (2)
Predicting with the Model 367 (2)
369 (2)
More-Complex Regression Models 371 (5)
Building a Classification Model 376 (5)
377 (1)
378 (1)
379 (1)
380 (1)
381 (4)
382 (1)
383 (1)
384 (1)
385 (5)
385 (1)
Clustering Bicycle Stations 386 (1)
387 (1)
Understanding the Clusters 388 (2)
390 (1)
390 (13)
391 (1)
392 (2)
394 (2)
Incorporating User and Movie Information 396 (7)
Custom Machine Learning Models on GCP 403 (10)
403 (5)
408 (1)
409 (4)
413 (2)
|
10 Administering and Securing BigQuery 415 (34)
415 (2)
Identity and Access Management 417 (4)
417 (1)
418 (3)
421 (1)
421 (8)
421 (1)
422 (1)
Restoring Deleted Records and Tables 422 (1)
Continuous Integration/Continuous Deployment 423 (2)
425 (3)
Dashboards, Monitoring, and Audit Logging 428 (1)
Availability, Disaster Recovery, and Encryption 429 (6)
Zones, Regions, and Multiregions 430 (1)
BigQuery and Failure Handling 430 (3)
Durability, Backups, and Disaster Recovery 433 (1)
434 (1)
435 (13)
435 (2)
Restricting Access to Subsets of Data 437 (4)
Removing All Transactions Related to a Single Individual 441 (3)
444 (2)
446 (1)
Data Exfiltration Protection 447 (1)
448 (1)

Index 449