
E-book: Google BigQuery: The Definitive Guide: Data Warehousing, Analytics, and Machine Learning at Scale

4.22/5 (125 ratings by Goodreads)
  • Format: 522 pages
  • Publication date: 23-Oct-2019
  • Publisher: O'Reilly Media
  • Language: English
  • ISBN-13: 9781492044437
  • Format: PDF+DRM
  • Price: 46,20 €*
  • * This is the final price, i.e., no additional discounts are applied
  • This e-book is for personal use only. E-books cannot be returned, and money paid for purchased e-books is not refunded.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital Rights Management (DRM)
    The publisher has supplied this book in encrypted form, which means that you need to install free software to unlock and read it. To read this e-book, you must create an Adobe ID. The e-book can be read and downloaded on up to 6 devices (by a single user with the same Adobe ID).

    Required software
    To read this e-book on a mobile device (phone or tablet), you will need to install this free app: PocketBook Reader (iOS / Android)

    To download and read this e-book on a PC or Mac, you need Adobe Digital Editions (a free app designed specifically for e-books; it is not the same as Adobe Reader, which you may already have on your computer).

    You cannot read this e-book on an Amazon Kindle.

Work with petabyte-scale datasets while building a collaborative, agile workplace in the process. This practical book is the canonical reference to Google BigQuery, the query engine that lets you conduct interactive analysis of large datasets. BigQuery enables enterprises to efficiently store, query, ingest, and learn from their data in a convenient framework. With this book, you'll examine how to analyze data at scale to derive insights from large datasets efficiently.

Valliappa Lakshmanan, tech lead for Google Cloud Platform, and Jordan Tigani, engineering director for the BigQuery team, provide best practices for modern data warehousing within an autoscaled, serverless public cloud. Whether you want to explore parts of BigQuery you're not familiar with or prefer to focus on specific tasks, this reference is indispensable.

Preface
1 What Is Google BigQuery?
  Data Processing Architectures
    Relational Database Management System
    MapReduce Framework
    BigQuery: A Serverless, Distributed SQL Engine
  Working with BigQuery
    Deriving Insights Across Datasets
    ETL, EL, and ELT
    Powerful Analytics
    Simplicity of Management
  How BigQuery Came About
  What Makes BigQuery Possible?
    Separation of Compute and Storage
    Storage and Networking Infrastructure
    Managed Storage
    Integration with Google Cloud Platform
    Security and Compliance
  Summary
2 Query Essentials
  Simple Queries
    Retrieving Rows by Using SELECT
    Aliasing Column Names with AS
    Filtering with WHERE
    SELECT *, EXCEPT, REPLACE
    Subqueries with WITH
    Sorting with ORDER BY
  Aggregates
    Computing Aggregates by Using GROUP BY
    Counting Records by Using COUNT
    Filtering Grouped Items by Using HAVING
    Finding Unique Values by Using DISTINCT
  A Brief Primer on Arrays and Structs
    Creating Arrays by Using ARRAY_AGG
    Array of STRUCT
    TUPLE
    Working with Arrays
    UNNEST an Array
  Joining Tables
    The JOIN Explained
    INNER JOIN
    CROSS JOIN
    OUTER JOIN
  Saving and Sharing
    Query History and Caching
    Saved Queries
    Views Versus Shared Queries
  Summary
3 Data Types, Functions, and Operators
  Numeric Types and Functions
    Mathematical Functions
    Standard-Compliant Floating-Point Division
    SAFE Functions
    Comparisons
    Precise Decimal Calculations with NUMERIC
  Working with BOOL
    Logical Operations
    Conditional Expressions
    Cleaner NULL-Handling with COALESCE
    Casting and Coercion
    Using COUNTIF to Avoid Casting Booleans
  String Functions
    Internationalization
    Printing and Parsing
    String Manipulation Functions
    Transformation Functions
    Regular Expressions
    Summary of String Functions
  Working with TIMESTAMP
    Parsing and Formatting Timestamps
    Extracting Calendar Parts
    Arithmetic with Timestamps
    Date, Time, and DateTime
  Working with GIS Functions
  Summary
4 Loading Data into BigQuery
  The Basics
    Loading from a Local Source
    Specifying a Schema
    Copying into a New Table
    Data Management (DDL and DML)
    Loading Data Efficiently
  Federated Queries and External Data Sources
    How to Use Federated Queries
    When to Use Federated Queries and External Data Sources
    Interactive Exploration and Querying of Data in Google Sheets
    SQL Queries on Data in Cloud Bigtable
  Transfers and Exports
    Data Transfer Service
    Exporting Stackdriver Logs
    Using Cloud Dataflow to Read/Write from BigQuery
  Moving On-Premises Data
    Data Migration Methods
  Summary
5 Developing with BigQuery
  Developing Programmatically
    Accessing BigQuery via the REST API
    Google Cloud Client Library
  Accessing BigQuery from Data Science Tools
    Notebooks on Google Cloud Platform
    Working with BigQuery, pandas, and Jupyter
    Working with BigQuery from R
    Cloud Dataflow
    JDBC/ODBC drivers
    Incorporating BigQuery Data into Google Slides (in G Suite)
  Bash Scripting with BigQuery
    Creating Datasets and Tables
    Executing Queries
    BigQuery Objects
  Summary
6 Architecture of BigQuery
  High-Level Architecture
    Life of a Query Request
    BigQuery Upgrades
  Query Engine (Dremel)
    Dremel Architecture
    Query Execution
  Storage
    Storage Data
    Metadata
  Summary
7 Optimizing Performance and Cost
  Principles of Performance
    Key Drivers of Performance
    Controlling Cost
  Measuring and Troubleshooting
    Measuring Query Speed Using REST API
    Measuring Query Speed Using BigQuery Workload Tester
    Troubleshooting Workloads Using Stackdriver
    Reading Query Plan Information
  Increasing Query Speed
    Minimizing I/O
    Caching the Results of Previous Queries
    Performing Efficient Joins
    Avoiding Overwhelming a Worker
    Using Approximate Aggregation Functions
  Optimizing How Data Is Stored and Accessed
    Minimizing Network Overhead
    Choosing an Efficient Storage Format
    Partitioning Tables to Reduce Scan Size
    Clustering Tables Based on High-Cardinality Keys
  Time-Insensitive Use Cases
    Batch Queries
    File Loads
  Summary
    Checklist
8 Advanced Queries
  Reusable Queries
    Parameterized Queries
    SQL User-Defined Functions
    Reusing Parts of Queries
  Advanced SQL
    Working with Arrays
    Window Functions
    Table Metadata
    Data Definition Language and Data Manipulation Language
  Beyond SQL
    JavaScript UDFs
    Scripting
  Advanced Functions
    BigQuery Geographic Information Systems
    Useful Statistical Functions
    Hash Algorithms
  Summary
9 Machine Learning in BigQuery
  What Is Machine Learning?
    Formulating a Machine Learning Problem
    Types of Machine Learning Problems
  Building a Regression Model
    Choose the Label
    Exploring the Dataset to Find Features
    Creating a Training Dataset
    Training and Evaluating the Model
    Predicting with the Model
    Examining Model Weights
    More-Complex Regression Models
  Building a Classification Model
    Training
    Evaluation
    Prediction
    Choosing the Threshold
  Customizing BigQuery ML
    Controlling Data Split
    Balancing Classes
    Regularization
  k-Means Clustering
    What's Being Clustered?
    Clustering Bicycle Stations
    Carrying Out Clustering
    Understanding the Clusters
    Data-Driven Decisions
  Recommender Systems
    The MovieLens Dataset
    Matrix Factorization
    Making Recommendations
    Incorporating User and Movie Information
  Custom Machine Learning Models on GCP
    Hyperparameter Tuning
    AutoML
    Support for TensorFlow
  Summary
10 Administering and Securing BigQuery
  Infrastructure Security
  Identity and Access Management
    Identity
    Role
    Resource
  Administering BigQuery
    Job Management
    Authorizing Users
    Restoring Deleted Records and Tables
    Continuous Integration/Continuous Deployment
    Cost/Billing Exports
    Dashboards, Monitoring, and Audit Logging
  Availability, Disaster Recovery, and Encryption
    Zones, Regions, and Multiregions
    BigQuery and Failure Handling
    Durability, Backups, and Disaster Recovery
    Privacy and Encryption
  Regulatory Compliance
    Data Locality
    Restricting Access to Subsets of Data
    Removing All Transactions Related to a Single Individual
    Data Loss Prevention
    CMEK
    Data Exfiltration Protection
  Summary
Index
Valliappa (Lak) Lakshmanan is a Tech Lead for Big Data and Machine Learning Professional Services on Google Cloud Platform. His mission is to democratize machine learning so that it can be done by anyone anywhere using Google's amazing infrastructure (i.e., without deep knowledge of statistics or programming or ownership of lots of hardware).

Jordan Tigani is engineering director for the core BigQuery team. He was one of the founding engineers on BigQuery and helped grow it into one of the most successful products in Google's Cloud Platform. He wrote the first book on BigQuery and has spoken widely on the subject. Jordan has twenty years of software development experience, ranging from Microsoft Research to machine learning startups.