Preface  xii
Acknowledgments  xiv
About this book  xv
About the authors  xviii
|
|
1 What is deep learning?  1
1.1 Artificial intelligence, machine learning, and deep learning  2
    Artificial intelligence  2
    Machine learning  3
    Learning rules and representations from data  4
    The "deep" in "deep learning"  7
    Understanding how deep learning works, in three figures  8
    What deep learning has achieved so far  10
    Don't believe the short-term hype  11
    The promise of AI  12
1.2 Before deep learning: A brief history of machine learning  13
    Probabilistic modeling  13
    Early neural networks  13
    Kernel methods  14
    Decision trees, random forests, and gradient-boosting machines  15
    Back to neural networks  16
    What makes deep learning different?  17
    The modern machine learning landscape  17
1.3 Why deep learning? Why now?  20
    Hardware  20
    Data  21
    Algorithms  22
    A new wave of investment  22
    The democratization of deep learning  23
    Will it last?  24
|
2 The mathematical building blocks of neural networks  26
2.1 A first look at a neural network  27
2.2 Data representations for neural networks  31
    Scalars (rank 0 tensors)  31
    Vectors (rank 1 tensors)  31
    Matrices (rank 2 tensors)  32
    Rank 3 and higher-rank tensors  32
    Key attributes  33
    Manipulating tensors in R  34
    The notion of data batches  35
    Real-world examples of data tensors  35
    Vector data  35
    Time-series data or sequence data  36
    Image data  36
    Video data  37
2.3 The gears of neural networks: Tensor operations  37
    Element-wise operations  38
    Broadcasting  40
    Tensor products  41
    Tensor reshaping  43
    Geometric interpretation of tensor operations  44
    A geometric interpretation of deep learning  47
2.4 The engine of neural networks: Gradient-based optimization  48
    What's a derivative?  49
    Derivative of a tensor operation: The gradient  50
    Stochastic gradient descent  51
    Chaining derivatives: The backpropagation algorithm  54
2.5 Looking back at our first example  59
    Reimplementing our first example from scratch in TensorFlow  61
    Running one training step  63
    The full training loop  65
    Evaluating the model  66
|
3 Introduction to Keras and TensorFlow  68
3.1 What's TensorFlow?  69
3.2 What's Keras?  69
3.3 Keras and TensorFlow: A brief history  71
3.4 Python and R interfaces: A brief history  71
3.5 Setting up a deep learning workspace  72
    Installing Keras and TensorFlow  73
3.6 First steps with TensorFlow  74
    74
75
    Tensor shape and reshaping  77
    78
    79
    80
    Constant tensors and variables  81
    Tensor operations: Doing math in TensorFlow  82
    A second look at the GradientTape API  83
    An end-to-end example: A linear classifier in pure TensorFlow  84
3.8 Anatomy of a neural network: Understanding core Keras APIs  89
    Layers: The building blocks of deep learning  89
    From layers to models  94
    The "compile" step: Configuring the learning process  95
    Picking a loss function  98
    Understanding the fit() method  99
    Monitoring loss and metrics on validation data  99
    Inference: Using a model after training  101
|
4 Getting started with neural networks: Classification and regression  103
4.1 Classifying movie reviews: A binary classification example  105
    The IMDB dataset  105
    Preparing the data  107
    Building your model  108
    Validating your approach  110
    Using a trained model to generate predictions on new data  113
    Further experiments  113
    Wrapping up  113
4.2 Classifying newswires: A multiclass classification example  114
    The Reuters dataset  114
    Preparing the data  116
    Building your model  116
    Validating your approach  117
    Generating predictions on new data  119
    A different way to handle the labels and the loss  120
    The importance of having sufficiently large intermediate layers  120
    Further experiments  121
    Wrapping up  121
4.3 Predicting house prices: A regression example  122
    The Boston housing price dataset  122
    Preparing the data  123
    Building your model  123
    Validating your approach using K-fold validation  124
    Generating predictions on new data  128
    Wrapping up  128
|
5 Fundamentals of machine learning  130
5.1 Generalization: The goal of machine learning  130
    Underfitting and overfitting  131
    The nature of generalization in deep learning  136
5.2 Evaluating machine learning models  142
    Training, validation, and test sets  142
    Beating a common-sense baseline  145
    Things to keep in mind about model evaluation  146
5.3 Improving model fit  146
    Tuning key gradient descent parameters  147
    Leveraging better architecture priors  149
    Increasing model capacity  150
5.4 Improving generalization  152
    Dataset curation  152
    Feature engineering  153
    Using early stopping  154
    Regularizing your model  155
|
6 The universal workflow of machine learning  166
6.1 Define the task  168
    Frame the problem  168
    Collect a dataset  169
    Understand your data  173
    Choose a measure of success  173
6.2 Develop a model  174
    Prepare the data  174
    Choose an evaluation protocol  175
    Beat a baseline  176
    Scale up: Develop a model that overfits  177
    Regularize and tune your model  177
6.3 Deploy the model  178
    Explain your work to stakeholders and set expectations  178
    Ship an inference model  179
    Monitor your model in the wild  182
    Maintain your model  183
|
7 Working with Keras: A deep dive  185
7.1 A spectrum of workflows  186
7.2 Different ways to build Keras models  186
    The Sequential model  187
    The Functional API  189
    Subclassing the Model class  196
    Mixing and matching different components  199
    Remember: Use the right tool for the job  200
7.3 Using built-in training and evaluation loops  201
    Writing your own metrics  202
    Using callbacks  204
    Writing your own callbacks  205
    Monitoring and visualization with TensorBoard  208
7.4 Writing your own training and evaluation loops  210
    Training vs. inference  210
    Low-level usage of metrics  211
    A complete training and evaluation loop  212
    Make it fast with tf_function()  215
    Leveraging fit() with a custom training loop  216
|
8 Introduction to deep learning for computer vision  220
8.1 Introduction to convnets  221
    The convolution operation  223
    The max-pooling operation  228
8.2 Training a convnet from scratch on a small dataset  230
    The relevance of deep learning for small data problems  230
    Downloading the data  231
    Building the model  234
    Data preprocessing  235
    Using data augmentation  241
8.3 Leveraging a pretrained model  245
    Feature extraction with a pretrained model  246
    Fine-tuning a pretrained model  254
|
9 Advanced deep learning for computer vision  258
9.1 Three essential computer vision tasks  259
9.2 An image segmentation example  260
9.3 Modern convnet architecture patterns  269
    Modularity, hierarchy, and reuse  269
    Residual connections  272
    Batch normalization  275
    Depthwise separable convolutions  278
    Putting it together: A mini Xception-like model  280
9.4 Interpreting what convnets learn  282
    Visualizing intermediate activations  283
    Visualizing convnet filters  289
    Visualizing heatmaps of class activation  294
|
10 Deep learning for time series  301
10.1 Different kinds of time-series tasks  301
10.2 A temperature-forecasting example  302
    Preparing the data  306
    A common-sense, non-machine-learning baseline  310
    Let's try a basic machine learning model  311
    Let's try a 1D convolutional model  314
    A first recurrent baseline  316
10.3 Understanding recurrent neural networks  317
    A recurrent layer in Keras  320
10.4 Advanced use of recurrent neural networks  324
    Using recurrent dropout to fight overfitting  324
    Stacking recurrent layers  327
    Using bidirectional RNNs  329
    Going even further  332
|
11 Deep learning for text  334
11.1 Natural language processing: The bird's-eye view  334
11.2 Preparing text data  336
    Text standardization  337
    Text splitting (tokenization)  338
    Vocabulary indexing  339
    Using layer_text_vectorization  340
11.3 Two approaches for representing groups of words: Sets and sequences  344
    Preparing the IMDB movie reviews data  345
    Processing words as a set: The bag-of-words approach  347
    Processing words as a sequence: The sequence model approach  355
11.4 The Transformer architecture  366
    Understanding self-attention  366
    Multi-head attention  371
    The Transformer encoder  372
    When to use sequence models over bag-of-words models  381
11.5 Beyond text classification: Sequence-to-sequence learning  382
    A machine translation example  383
    Sequence-to-sequence learning with RNNs  387
    Sequence-to-sequence learning with Transformer  392
|
12 Generative deep learning  399
12.1 Text generation  401
    A brief history of generative deep learning for sequence generation  401
    How do you generate sequence data?  402
    The importance of the sampling strategy  402
    Implementing text generation with Keras  404
    A text-generation callback with variable-temperature sampling  408
    Wrapping up  413
12.2 DeepDream  414
    Implementing DeepDream in Keras  415
    Wrapping up  421
12.3 Neural style transfer  422
    The content loss  423
    The style loss  424
    Neural style transfer in Keras  424
    Wrapping up  431
12.4 Generating images with variational autoencoders  432
    Sampling from latent spaces of images  432
    Concept vectors for image editing  433
    Variational autoencoders  434
    Implementing a VAE with Keras  436
    Wrapping up  442
12.5 Introduction to generative adversarial networks  442
    A schematic GAN implementation  443
    A bag of tricks  444
    Getting our hands on the CelebA dataset  445
    The discriminator  447
    The generator  447
    The adversarial network  448
    Wrapping up  452
|
13 Best practices for the real world  454
13.1 Getting the most out of your models  455
    Hyperparameter optimization  455
    Model ensembling  462
13.2 Scaling up model training  464
    Speeding up training on GPU with mixed precision  465
    Multi-GPU training  467
    TPU training  471
|
|
14 Conclusions  473
14.1 Key concepts in review  474
    Various approaches to AI  474
    What makes deep learning special within the field of machine learning  474
    How to think about deep learning  475
    Key enabling technologies  476
    The universal machine learning workflow  477
    Key network architectures  478
    The space of possibilities  482
14.2 The limitations of deep learning  484
    The risk of anthropomorphizing machine learning models  485
    Automatons vs. intelligent agents  487
    Local generalization vs. extreme generalization  488
    The purpose of intelligence  490
    Climbing the spectrum of generalization  491
14.3 Setting the course toward greater generality in AI  492
    On the importance of setting the right objective: The shortcut rule  492
    494
14.4 Implementing intelligence: The missing ingredients  495
    Intelligence as sensitivity to abstract analogies  496
    The two poles of abstraction  497
    The two poles of abstraction  500
    The missing half of the picture  500
14.5 The future of deep learning  501
    Models as programs  502
    Machine learning vs. program synthesis  503
    Blending together deep learning and program synthesis  503
    Lifelong learning and modular subroutine reuse  505
    The long-term vision  506
14.6 Staying up to date in a fast-moving field  507
    Practice on real-world problems using Kaggle  508
    Read about the latest developments on arXiv  508
    Explore the Keras ecosystem  508
Final words  509

Appendix Python primer for R users  511

Index  535