Preface | xiii
Notation | xvii

1 Importing, Summarizing, and Visualizing Data | 1 (18)
| 1 (2)
1.2 Structuring Features According to Type | 3 (3)
| 6 (1)
| 7 (1)
| 8 (7)
1.5.1 Plotting Qualitative Variables | 9 (1)
1.5.2 Plotting Quantitative Variables | 9 (3)
1.5.3 Data Visualization in a Bivariate Setting | 12 (3)
| 15 (4)

| 19 (48)
| 19 (1)
2.2 Supervised and Unsupervised Learning | 20 (3)
2.3 Training and Test Loss | 23 (8)
2.4 Tradeoffs in Statistical Learning | 31 (4)
| 35 (5)
| 35 (2)
| 37 (3)
| 40 (4)
2.7 Multivariate Normal Models | 44 (2)
| 46 (1)
| 47 (11)
| 58 (9)

| 67 (54)
| 67 (1)
| 68 (17)
3.2.1 Generating Random Numbers | 68 (1)
3.2.2 Simulating Random Variables | 69 (5)
3.2.3 Simulating Random Vectors and Processes | 74 (2)
| 76 (2)
3.2.5 Markov Chain Monte Carlo | 78 (7)
3.3 Monte Carlo Estimation | 85 (11)
| 85 (3)
| 88 (4)
| 92 (4)
3.4 Monte Carlo for Optimization | 96 (17)
3.4.1 Simulated Annealing | 96 (4)
3.4.2 Cross-Entropy Method | 100 (3)
3.4.3 Splitting for Optimization | 103 (2)
| 105 (8)
| 113 (8)

| 121 (46)
| 121 (1)
4.2 Risk and Loss in Unsupervised Learning | 122 (6)
4.3 Expectation-Maximization (EM) Algorithm | 128 (3)
4.4 Empirical Distribution and Density Estimation | 131 (4)
4.5 Clustering via Mixture Models | 135 (7)
| 135 (2)
4.5.2 EM Algorithm for Mixture Models | 137 (5)
4.6 Clustering via Vector Quantization | 142 (5)
| 144 (2)
4.6.2 Clustering via Continuous Multiextremal Optimization | 146 (1)
4.7 Hierarchical Clustering | 147 (6)
4.8 Principal Component Analysis (PCA) | 153 (7)
4.8.1 Motivation: Principal Axes of an Ellipsoid | 153 (2)
4.8.2 PCA and Singular Value Decomposition (SVD) | 155 (5)
| 160 (7)

| 167 (48)
| 167 (2)
| 169 (2)
5.3 Analysis via Linear Models | 171 (11)
5.3.1 Parameter Estimation | 171 (1)
5.3.2 Model Selection and Prediction | 172 (1)
5.3.3 Cross-Validation and Predictive Residual Sum of Squares | 173 (2)
5.3.4 In-Sample Risk and Akaike Information Criterion | 175 (2)
5.3.5 Categorical Features | 177 (3)
| 180 (1)
5.3.7 Coefficient of Determination | 181 (1)
5.4 Inference for Normal Linear Models | 182 (6)
5.4.1 Comparing Two Normal Linear Models | 183 (3)
5.4.2 Confidence and Prediction Intervals | 186 (2)
5.5 Nonlinear Regression Models | 188 (3)
5.6 Linear Models in Python | 191 (13)
| 191 (2)
| 193 (2)
5.6.3 Analysis of Variance (ANOVA) | 195 (3)
5.6.4 Confidence and Prediction Intervals | 198 (1)
| 198 (1)
| 199 (5)
5.7 Generalized Linear Models | 204 (3)
| 207 (8)

6 Regularization and Kernel Methods | 215 (36)
| 215 (1)
| 216 (6)
6.3 Reproducing Kernel Hilbert Spaces | 222 (2)
6.4 Construction of Reproducing Kernels | 224 (6)
6.4.1 Reproducing Kernels via Feature Mapping | 224 (1)
6.4.2 Kernels from Characteristic Functions | 225 (2)
6.4.3 Reproducing Kernels Using Orthonormal Features | 227 (2)
6.4.4 Kernels from Kernels | 229 (1)
| 230 (5)
6.6 Smoothing Cubic Splines | 235 (3)
6.7 Gaussian Process Regression | 238 (4)
| 242 (3)
| 245 (6)

| 251 (36)
| 251 (2)
7.2 Classification Metrics | 253 (4)
7.3 Classification via Bayes' Rule | 257 (2)
7.4 Linear and Quadratic Discriminant Analysis | 259 (7)
7.5 Logistic Regression and Softmax Classification | 266 (2)
7.6 k-Nearest Neighbors Classification | 268 (1)
7.7 Support Vector Machine | 269 (8)
7.8 Classification with Scikit-Learn | 277 (2)
| 279 (8)

8 Decision Trees and Ensemble Methods | 287 (36)
| 287 (2)
8.2 Top-Down Construction of Decision Trees | 289 (9)
8.2.1 Regional Prediction Functions | 290 (1)
| 291 (1)
8.2.3 Termination Criterion | 292 (2)
8.2.4 Basic Implementation | 294 (4)
8.3 Additional Considerations | 298 (2)
8.3.1 Binary Versus Non-Binary Trees | 298 (1)
| 298 (1)
8.3.3 Alternative Splitting Rules | 298 (1)
8.3.4 Categorical Variables | 299 (1)
| 299 (1)
8.4 Controlling the Tree Shape | 300 (5)
8.4.1 Cost-Complexity Pruning | 303 (1)
8.4.2 Advantages and Limitations of Decision Trees | 304 (1)
8.5 Bootstrap Aggregation | 305 (4)
| 309 (4)
| 313 (8)
| 321 (2)

| 323 (32)
| 323 (3)
9.2 Feed-Forward Neural Networks | 326 (4)
| 330 (4)
| 334 (6)
| 334 (1)
9.4.2 Levenberg-Marquardt Method | 335 (1)
9.4.3 Limited-Memory BFGS Method | 336 (2)
9.4.4 Adaptive Gradient Methods | 338 (2)
| 340 (9)
9.5.1 Simple Polynomial Regression | 340 (4)
9.5.2 Image Classification | 344 (5)
| 349 (6)

A Linear Algebra and Functional Analysis | 355 (42)
A.1 Vector Spaces, Bases, and Matrices | 355 (5)
| 360 (1)
A.3 Complex Vectors and Matrices | 361 (1)
A.4 Orthogonal Projections | 362 (1)
A.5 Eigenvalues and Eigenvectors | 363 (5)
A.5.1 Left- and Right-Eigenvectors | 364 (4)
A.6 Matrix Decompositions | 368 (16)
A.6.1 (P)LU Decomposition | 368 (2)
| 370 (3)
A.6.3 Cholesky Decomposition | 373 (2)
A.6.4 QR Decomposition and the Gram-Schmidt Procedure | 375 (1)
A.6.5 Singular Value Decomposition | 376 (3)
A.6.6 Solving Structured Matrix Equations | 379 (5)
| 384 (6)
| 390 (7)
A.8.1 Discrete Fourier Transform | 392 (2)
A.8.2 Fast Fourier Transform | 394 (3)

B Multivariate Differentiation and Optimization | 397 (24)
B.1 Multivariate Differentiation | 397 (5)
| 400 (1)
| 400 (2)
| 402 (6)
B.2.1 Convexity and Optimization | 403 (3)
| 406 (1)
| 407 (1)
B.3 Numerical Root-Finding and Minimization | 408 (7)
B.3.1 Newton-Like Methods | 409 (2)
B.3.2 Quasi-Newton Methods | 411 (2)
B.3.3 Normal Approximation Method | 413 (1)
B.3.4 Nonlinear Least Squares | 414 (1)
B.4 Constrained Minimization via Penalty Functions | 415 (6)

C Probability and Statistics | 421 (42)
C.1 Random Experiments and Probability Spaces | 421 (1)
C.2 Random Variables and Probability Distributions | 422 (4)
| 426 (1)
| 427 (1)
C.5 Conditioning and Independence | 428 (3)
C.5.1 Conditional Probability | 428 (1)
| 428 (1)
C.5.3 Expectation and Covariance | 429 (1)
C.5.4 Conditional Density and Conditional Expectation | 430 (1)
C.6 Functions of Random Variables | 431 (3)
C.7 Multivariate Normal Distribution | 434 (5)
C.8 Convergence of Random Variables | 439 (6)
C.9 Law of Large Numbers and Central Limit Theorem | 445 (6)
| 451 (2)
| 453 (1)
| 454 (3)
| 455 (1)
C.12.2 Maximum Likelihood Method | 456 (1)
C.13 Confidence Intervals | 457 (1)
| 458 (5)

| 463 (32)
| 463 (2)
| 465 (1)
| 466 (2)
D.4 Functions and Methods | 468 (1)
| 469 (2)
| 471 (1)
| 472 (1)
| 473 (2)
| 475 (3)
| 478 (5)
D.10.1 Creating and Shaping Arrays | 478 (2)
| 480 (1)
| 480 (2)
| 482 (1)
| 483 (2)
D.11.1 Creating a Basic Plot | 483 (2)
| 485 (5)
D.12.1 Series and DataFrame | 485 (2)
D.12.2 Manipulating Data Frames | 487 (1)
D.12.3 Extracting Information | 488 (2)
| 490 (1)
| 490 (3)
D.13.1 Partitioning the Data | 491 (1)
| 491 (1)
D.13.3 Fitting and Prediction | 492 (1)
| 492 (1)
D.14 System Calls, URL Access, and Speed-Up | 493 (2)

Bibliography | 495 (8)
Index | 503