Part I Describing Datasets

1 First Tools for Looking at Data .... 3
1.1 Datasets .... 3
1.2 What's Happening? Plotting Data .... 4
1.2.1 Bar Charts .... 5
1.2.2 Histograms .... 6
1.2.3 How to Make Histograms .... 6
1.2.4 Conditional Histograms .... 7
1.3 Summarizing 1D Data .... 7
1.3.1 The Mean .... 7
1.3.2 Standard Deviation .... 9
1.3.3 Computing Mean and Standard Deviation Online .... 12
1.3.4 Variance .... 12
1.3.5 The Median .... 13
1.3.6 Interquartile Range .... 14
1.3.7 Using Summaries Sensibly .... 15
1.4 Plots and Summaries .... 16
1.4.1 Some Properties of Histograms .... 16
1.4.2 Standard Coordinates and Normal Data .... 18
1.4.3 Box Plots .... 20
1.5 Whose is Bigger? Investigating Australian Pizzas .... 20
1.6 You Should .... 24
1.6.1 Remember These Definitions .... 24
1.6.2 Remember These Terms .... 25
1.6.3 Remember These Facts .... 25
Problems .... 25

2 Looking at Relationships .... 29
2.1 Plotting 2D Data .... 29
2.1.1 Categorical Data, Counts, and Charts .... 29
2.1.2 Series .... 31
2.1.3 Scatter Plots for Spatial Data .... 33
2.1.4 Exposing Relationships with Scatter Plots .... 34
2.2 Correlation .... 36
2.2.1 The Correlation Coefficient .... 39
2.2.2 Using Correlation to Predict .... 42
2.2.3 Confusion Caused by Correlation .... 44
2.3 Sterile Males in Wild Horse Herds .... 45
2.4 You Should .... 47
2.4.1 Remember These Definitions .... 47
2.4.2 Remember These Terms .... 47
2.4.3 Remember These Facts .... 47
2.4.4 Use These Procedures .... 47
Problems .... 47

Part II Probability

3 Basic Ideas in Probability .... 53
3.1 Experiments, Outcomes and Probability .... 53
3.1.1 Outcomes and Probability .... 53
3.2 Events .... 55
3.2.1 Computing Event Probabilities by Counting Outcomes .... 56
3.2.2 The Probability of Events .... 58
3.2.3 Computing Probabilities by Reasoning About Sets .... 60
3.3 Independence .... 61
3.3.1 Example: Airline Overbooking .... 64
3.4 Conditional Probability .... 66
3.4.1 Evaluating Conditional Probabilities .... 67
3.4.2 Detecting Rare Events Is Hard .... 70
3.4.3 Conditional Probability and Various Forms of Independence .... 71
3.4.4 Warning Example: The Prosecutor's Fallacy .... 72
3.4.5 Warning Example: The Monty Hall Problem .... 73
3.5 Extra Worked Examples .... 75
3.5.1 Outcomes and Probability .... 75
3.5.2 Events .... 76
3.5.3 Independence .... 77
3.5.4 Conditional Probability .... 78
3.6 You Should .... 80
3.6.1 Remember These Definitions .... 80
3.6.2 Remember These Terms .... 80
3.6.3 Remember and Use These Facts .... 80
3.6.4 Remember These Points .... 80
Problems .... 81

4 Random Variables and Expectations .... 87
4.1 Random Variables .... 87
4.1.1 Joint and Conditional Probability for Random Variables .... 89
4.1.2 Just a Little Continuous Probability .... 91
4.2 Expectations and Expected Values .... 93
4.2.1 Expected Values .... 93
4.2.2 Mean, Variance and Covariance .... 95
4.2.3 Expectations and Statistics .... 98
4.3 The Weak Law of Large Numbers .... 99
4.3.1 IID Samples .... 99
4.3.2 Two Inequalities .... 100
4.3.3 Proving the Inequalities .... 100
4.3.4 The Weak Law of Large Numbers .... 102
4.4 Using the Weak Law of Large Numbers .... 103
4.4.1 Should You Accept a Bet? .... 103
4.4.2 Odds, Expectations and Bookmaking: A Cultural Diversion .... 104
4.4.3 Ending a Game Early .... 105
4.4.4 Making a Decision with Decision Trees and Expectations .... 105
4.4.5 Utility .... 106
4.5 You Should .... 108
4.5.1 Remember These Definitions .... 108
4.5.2 Remember These Terms .... 108
4.5.3 Use and Remember These Facts .... 109
4.5.4 Remember These Points .... 109
Problems .... 109

5 Useful Probability Distributions .... 115
5.1 Discrete Distributions .... 115
5.1.1 The Discrete Uniform Distribution .... 115
5.1.2 Bernoulli Random Variables .... 116
5.1.3 The Geometric Distribution .... 116
5.1.4 The Binomial Probability Distribution .... 116
5.1.5 Multinomial Probabilities .... 118
5.1.6 The Poisson Distribution .... 118
5.2 Continuous Distributions .... 120
5.2.1 The Continuous Uniform Distribution .... 120
5.2.2 The Beta Distribution .... 120
5.2.3 The Gamma Distribution .... 121
5.2.4 The Exponential Distribution .... 122
5.3 The Normal Distribution .... 123
5.3.1 The Standard Normal Distribution .... 123
5.3.2 The Normal Distribution .... 124
5.3.3 Properties of the Normal Distribution .... 124
5.4 Approximating Binomials with Large N .... 126
5.4.1 Large N .... 127
5.4.2 Getting Normal .... 128
5.4.3 Using a Normal Approximation to the Binomial Distribution .... 129
5.5 You Should .... 130
5.5.1 Remember These Definitions .... 130
5.5.2 Remember These Terms .... 130
5.5.3 Remember These Facts .... 131
5.5.4 Remember These Points .... 131

Part III Inference

6 Samples and Populations .... 141
6.1 The Sample Mean .... 141
6.1.1 The Sample Mean Is an Estimate of the Population Mean .... 141
6.1.2 The Variance of the Sample Mean .... 142
6.1.3 When The Urn Model Works .... 144
6.1.4 Distributions Are Like Populations .... 145
6.2 Confidence Intervals .... 146
6.2.1 Constructing Confidence Intervals .... 146
6.2.2 Estimating the Variance of the Sample Mean .... 146
6.2.3 The Probability Distribution of the Sample Mean .... 148
6.2.4 Confidence Intervals for Population Means .... 149
6.2.5 Standard Error Estimates from Simulation .... 152
6.3 You Should .... 154
6.3.1 Remember These Definitions .... 154
6.3.2 Remember These Terms .... 154
6.3.3 Remember These Facts .... 154
6.3.4 Use These Procedures .... 154
Problems .... 154

7 The Significance of Evidence .... 159
7.1 Significance .... 160
7.1.1 Evaluating Significance .... 160
7.1.2 P-Values .... 161
7.2 Comparing the Mean of Two Populations .... 165
7.2.1 Assuming Known Population Standard Deviations .... 165
7.2.2 Assuming Same, Unknown Population Standard Deviation .... 167
7.2.3 Assuming Different, Unknown Population Standard Deviation .... 168
7.3 Other Useful Tests of Significance .... 169
7.3.1 F-Tests and Standard Deviations .... 169
7.3.2 χ² Tests of Model Fit .... 171
7.4 P-Value Hacking and Other Dangerous Behavior .... 174
7.5 You Should .... 174
7.5.1 Remember These Definitions .... 174
7.5.2 Remember These Terms .... 175
7.5.3 Remember These Facts .... 175
7.5.4 Use These Procedures .... 175
Problems .... 175

8 Experiments .... 179
8.1 A Simple Experiment: The Effect of a Treatment .... 179
8.1.1 Randomized Balanced Experiments .... 180
8.1.2 Decomposing Error in Predictions .... 180
8.1.3 Estimating the Noise Variance .... 181
8.1.4 The ANOVA Table .... 182
8.1.5 Unbalanced Experiments .... 183
8.1.6 Significant Differences .... 185
8.2 Two Factor Experiments .... 186
8.2.1 Decomposing the Error .... 188
8.2.2 Interaction Between Effects .... 189
8.2.3 The Effects of a Treatment .... 190
8.2.4 Setting Up An ANOVA Table .... 191
8.3 You Should .... 194
8.3.1 Remember These Definitions .... 194
8.3.2 Remember These Terms .... 194
8.3.3 Remember These Facts .... 194
8.3.4 Use These Procedures .... 194
Problems .... 194

9 Inferring Probability Models from Data .... 197
9.1 Estimating Model Parameters with Maximum Likelihood .... 197
9.1.1 The Maximum Likelihood Principle .... 198
9.1.2 Binomial, Geometric and Multinomial Distributions .... 199
9.1.3 Poisson and Normal Distributions .... 201
9.1.4 Confidence Intervals for Model Parameters .... 204
9.1.5 Cautions About Maximum Likelihood .... 206
9.2 Incorporating Priors with Bayesian Inference .... 206
9.2.1 Conjugacy .... 209
9.2.2 MAP Inference .... 210
9.2.3 Cautions About Bayesian Inference .... 211
9.3 Bayesian Inference for Normal Distributions .... 211
9.3.1 Example: Measuring Depth of a Borehole .... 212
9.3.2 Normal Prior and Normal Likelihood Yield Normal Posterior .... 212
9.3.3 Filtering .... 214
9.4 You Should .... 215
9.4.1 Remember These Definitions .... 215
9.4.2 Remember These Terms .... 216
9.4.3 Remember These Facts .... 216
9.4.4 Use These Procedures .... 217
Problems .... 217

Part IV Tools

10 Extracting Important Relationships in High Dimensions .... 225
10.1 Summaries and Simple Plots .... 225
10.1.1 The Mean .... 226
10.1.2 Stem Plots and Scatterplot Matrices .... 226
10.1.3 Covariance .... 227
10.1.4 The Covariance Matrix .... 228
10.2 Using Mean and Covariance to Understand High Dimensional Data .... 231
10.2.1 Mean and Covariance Under Affine Transformations .... 231
10.2.2 Eigenvectors and Diagonalization .... 232
10.2.3 Diagonalizing Covariance by Rotating Blobs .... 233
10.2.4 Approximating Blobs .... 235
10.2.5 Example: Transforming the Height-Weight Blob .... 235
10.3 Principal Components Analysis .... 236
10.3.1 The Low Dimensional Representation .... 236
10.3.2 The Error Caused by Reducing Dimension .... 238
10.3.3 Example: Representing Colors with Principal Components .... 241
10.3.4 Example: Representing Faces with Principal Components .... 242
10.4 Multi-Dimensional Scaling .... 242
10.4.1 Choosing Low D Points Using High D Distances .... 243
10.4.2 Factoring a Dot-Product Matrix .... 245
10.4.3 Example: Mapping with Multidimensional Scaling .... 246
10.5 Example: Understanding Height and Weight .... 247
10.6 You Should .... 250
10.6.1 Remember These Definitions .... 250
10.6.2 Remember These Terms .... 250
10.6.3 Remember These Facts .... 250
10.6.4 Use These Procedures .... 250
Problems .... 250

11 Learning to Classify .... 253
11.1 Classification: The Big Ideas .... 253
11.1.1 The Error Rate, and Other Summaries of Performance .... 254
11.1.2 More Detailed Evaluation .... 254
11.1.3 Overfitting and Cross-Validation .... 255
11.2 Classifying with Nearest Neighbors .... 256
11.2.1 Practical Considerations for Nearest Neighbors .... 256
11.3 Classifying with Naive Bayes .... 257
11.3.1 Cross-Validation to Choose a Model .... 259
11.4 The Support Vector Machine .... 260
11.4.1 The Hinge Loss .... 261
11.4.2 Regularization .... 262
11.4.3 Finding a Classifier with Stochastic Gradient Descent .... 262
11.4.4 Searching for λ .... 264
11.4.5 Example: Training an SVM with Stochastic Gradient Descent .... 266
11.4.6 Multi-Class Classification with SVMs .... 268
11.5 Classifying with Random Forests .... 268
11.5.1 Building a Decision Tree: General Algorithm .... 270
11.5.2 Building a Decision Tree: Choosing a Split .... 270
11.5.3 Forests .... 272
11.6 You Should .... 274
11.6.1 Remember These Definitions .... 274
11.6.2 Remember These Terms .... 274
11.6.3 Remember These Facts .... 275
11.6.4 Use These Procedures .... 275
Problems .... 276

12 Clustering: Models of High Dimensional Data .... 281
12.1 The Curse of Dimension .... 281
12.1.1 Minor Banes of Dimension .... 281
12.1.2 The Curse: Data Isn't Where You Think It Is .... 282
12.2 Clustering Data .... 283
12.2.1 Agglomerative and Divisive Clustering .... 283
12.2.2 Clustering and Distance .... 285
12.3 The K-Means Algorithm and Variants .... 287
12.3.1 How to Choose K .... 288
12.3.2 Soft Assignment .... 290
12.3.3 Efficient Clustering and Hierarchical K Means .... 291
12.3.4 K-Medoids .... 292
12.3.5 Example: Groceries in Portugal .... 292
12.3.6 General Comments on K-Means .... 293
12.4 Describing Repetition with Vector Quantization .... 294
12.4.1 Vector Quantization .... 296
12.4.2 Example: Activity from Accelerometer Data .... 298
12.5 The Multivariate Normal Distribution .... 300
12.5.1 Affine Transformations and Gaussians .... 301
12.5.2 Plotting a 2D Gaussian: Covariance Ellipses .... 301
12.6 You Should .... 302
12.6.1 Remember These Definitions .... 302
12.6.2 Remember These Terms .... 302
12.6.3 Remember These Facts .... 303
12.6.4 Use These Procedures .... 303

13 Regression .... 305
13.1 Regression to Make Predictions .... 305
13.2 Regression to Spot Trends .... 306
13.3 Linear Regression and Least Squares .... 308
13.3.1 Linear Regression .... 308
13.3.2 Choosing β .... 309
13.3.3 Solving the Least Squares Problem .... 309
13.3.4 Residuals .... 310
13.3.5 R-squared .... 310
13.4 Producing Good Linear Regressions .... 313
13.4.1 Transforming Variables .... 313
13.4.2 Problem Data Points Have Significant Impact .... 314
13.4.3 Functions of One Explanatory Variable .... 317
13.4.4 Regularizing Linear Regressions .... 318
13.5 Exploiting Your Neighbors for Regression .... 321
13.5.1 Using Your Neighbors to Predict More than a Number .... 323
13.6 You Should .... 323
13.6.1 Remember These Definitions .... 323
13.6.2 Remember These Terms .... 324
13.6.3 Remember These Facts .... 324
13.6.4 Remember These Procedures .... 324

14 Markov Chains and Hidden Markov Models .... 331
14.1 Markov Chains .... 331
14.1.1 Transition Probability Matrices .... 333
14.1.2 Stationary Distributions .... 335
14.1.3 Example: Markov Chain Models of Text .... 336
14.2 Estimating Properties of Markov Chains .... 338
14.2.1 Simulation .... 338
14.2.2 Simulation Results as Random Variables .... 339
14.2.3 Simulating Markov Chains .... 341
14.3 Example: Ranking the Web by Simulating a Markov Chain .... 342
14.4 Hidden Markov Models and Dynamic Programming .... 344
14.4.1 Hidden Markov Models .... 344
14.4.2 Picturing Inference with a Trellis .... 344
14.4.3 Dynamic Programming for HMMs: Formalities .... 346
14.4.4 Example: Simple Communication Errors .... 348
14.5 You Should .... 349
14.5.1 Remember These Definitions .... 349
14.5.2 Remember These Terms .... 349
14.5.3 Remember These Facts .... 350
Problems .... 350

Part V Mathematical Bits and Pieces

15 .... 355
15.1 Useful Material About Matrices .... 355
15.1.1 The Singular Value Decomposition .... 356
15.1.2 Approximating A Symmetric Matrix .... 356
15.2 Some Special Functions .... 358
15.3 Splitting a Node in a Decision Tree .... 359
15.3.1 Accounting for Information with Entropy .... 359
15.3.2 Choosing a Split with Information Gain .... 360

Index .... 363