Acknowledgments ... xv

Chapter 1  Across Statistical Modeling and Machine Learning on a Shoestring ... 3
1.2 What Will This Book Cover? ... 4
1.3 Organization Of This Book ... 6
1.4 Why Are There Mathematical Calculations In The Book? ... 8
1.5 What Won't This Book Cover? ... 11
Exercises ... 12
References And Further Reading ... 14
|
Chapter 2  Statistical Modeling ... 15
2.1 What Is Statistical Modeling? ... 15
2.2 Probability Distributions Are The Models ... 18
2.3 Axioms Of Probability And Their Consequences: "Rules Of Probability" ... 23
2.4 Hypothesis Testing: What You Probably Already Know About Statistics ... 26
2.5 Tests With Fewer Assumptions ... 30
2.5.1 Wilcoxon Rank-Sum Test, Also Known As the Mann-Whitney U Test (or Simply the WMW Test) ... 30
2.5.2 Kolmogorov-Smirnov Test (KS-Test) ... 31
2.6 Central Limit Theorem ... 33
2.7 Exact Tests And Gene Set Enrichment Analysis ... 33
|
2.9 Some Popular Distributions ... 38
2.9.1 The Uniform Distribution ... 38
2.9.2 The Gaussian Distribution ... 39
|
2.9.3 The Exponential Distribution ... 39
2.9.4 The Chi-Squared Distribution ... 39
2.9.5 The Poisson Distribution ... 39
2.9.6 The Bernoulli Distribution ... 40
2.9.7 The Binomial Distribution ... 40
Exercises ... 40
|
References And Further Reading ... 41
|
Chapter 3  Multiple Testing ... 43
3.1 The Bonferroni Correction And Gene Set Enrichment Analysis ... 43
3.2 Multiple Testing In Differential Expression Analysis ... 46
3.4 eQTLs: A Very Difficult Multiple-Testing Problem ... 49
Exercises ... 51
References And Further Reading ... 52
|
Chapter 4  Parameter Estimation and Multivariate Statistics ... 53
4.1 Fitting A Model To Data: Objective Functions And Parameter Estimation ... 53
4.2 Maximum Likelihood Estimation ... 54
4.3 Likelihood For Gaussian Data ... 55
4.4 How To Maximize The Likelihood Analytically ... 56
4.5 Other Objective Functions ... 60
4.6 Multivariate Statistics ... 64
4.7 MLEs For Multivariate Distributions ... 69
4.8 Hypothesis Testing Revisited: The Problems With High Dimensions ... 77
4.9 Example Of LRT For The Multinomial: GC Content In Genomes ... 80
Exercises ... 83
References And Further Reading ... 83

SECTION II  Clustering

Chapter 5  Distance-Based Clustering ... 87
|
5.1 Multivariate Distances For Clustering ... 87
5.2 Agglomerative Clustering ... 91
5.3 Clustering DNA And Protein Sequences ... 95
5.4 Is The Clustering Right? ... 98
5.5 K-Means Clustering ... 100
5.6 So What Is Learning Anyway? ... 106
5.7 Choosing The Number Of Clusters For K-Means ... 107
5.8 K-Medoids And Exemplar-Based Clustering ... 109
5.9 Graph-Based Clustering: "Distances" Versus "Interactions" Or "Connections" ... 110
5.10 Clustering As Dimensionality Reduction ... 113
Exercises ... 113
References And Further Reading ... 115
|
Chapter 6  Mixture Models and Hidden Variables for Clustering and Beyond ... 117
6.1 The Gaussian Mixture Model ... 118
6.2 E-M Updates For The Mixture Of Gaussians ... 123
6.3 Deriving The E-M Algorithm For The Mixture Of Gaussians ... 127
6.4 Gaussian Mixtures In Practice And The Curse Of Dimensionality ... 131
6.5 Choosing The Number Of Clusters Using The AIC ... 131
6.6 Applications Of Mixture Models In Bioinformatics ... 133
Exercises ... 141
References And Further Reading ... 142

SECTION III  Regression

Chapter 7  Univariate Regression ... 145
|
7.1 Simple Linear Regression As A Probabilistic Model ... 145
7.2 Deriving The MLEs For Linear Regression ... 146
7.3 Hypothesis Testing In Linear Regression ... 149
7.4 Least Squares Interpretation Of Linear Regression ... 154
7.5 Application Of Linear Regression To eQTLs ... 155
7.6 From Hypothesis Testing To Statistical Modeling: Predicting Protein Level Based On mRNA Level ... 157
7.7 Regression Is Not Just "Linear": Polynomial And Local Regressions ... 161
7.8 Generalized Linear Models ... 165
Exercises ... 167
References And Further Reading ... 167
|
Chapter 8  Multiple Regression ... 169
8.1 Predicting Y Using Multiple Xs ... 169
8.2 Hypothesis Testing In Multiple Dimensions: Partial Correlations ... 171
8.3 Example Of A High-Dimensional Multiple Regression: Regressing Gene Expression Levels On Transcription Factor Binding Sites ... 174
8.4 AIC And Feature Selection And Overfitting In Multiple Regression ... 179
Exercises ... 182
References And Further Reading ... 183
|
Chapter 9  Regularization in Multiple Regression and Beyond ... 185
9.1 Regularization And Penalized Likelihood ... 186
9.2 Differences Between The Effects Of L1 And L2 Penalties On Correlated Features ... 189
9.3 Regularization Beyond Sparsity: Encouraging Your Own Model Structure ... 190
9.4 Penalized Likelihood As Maximum A Posteriori (MAP) Estimation ... 192
9.5 Choosing Prior Distributions For Parameters: Heavy Tails If You Can ... 193
Exercises ... 197
References And Further Reading ... 199
|
SECTION IV  Classification

Chapter 10  Linear Classification ... 203
|
10.1 Classification Boundaries And Linear Classification ... 205
10.2 Probabilistic Classification Models ... 206
10.3 Logistic Regression ... 208
|
10.4 Linear Discriminant Analysis (LDA) And The Log Likelihood Ratio ... 210
10.5 Generative And Discriminative Models For Classification ... 214
10.6 Naive Bayes: Generative MAP Classification ... 215
10.7 Training Naive Bayes Classifiers ... 221
10.8 Naive Bayes And Data Integration ... 222
Exercises ... 223
References And Further Reading ... 223
|
Chapter 11  Nonlinear Classification ... 225
11.1 Two Approaches To Choose Nonlinear Boundaries: Data-Guided And Multiple Simple Units ... 226
11.2 Distance-Based Classification With k-Nearest Neighbors ... 228
11.3 SVMs For Nonlinear Classification ... 230
11.5 Random Forests And Ensemble Classifiers: The Wisdom Of The Crowd ... 236
11.6 Multiclass Classification ... 237
Exercises ... 238
References And Further Reading ... 239
|
Chapter 12  Evaluating Classifiers ... 241
12.1 Classification Performance Statistics In The Ideal Classification Setup ... 241
12.2 Measures Of Classification Performance ... 242
12.3 ROC Curves And Precision-Recall Plots ... 245
12.4 Evaluating Classifiers When You Don't Have Enough Data ... 248
12.5 Leave-One-Out Cross-Validation ... 251
12.6 Better Classification Methods Versus Better Features ... 253
Exercises ... 254
References And Further Reading ... 255

Index ... 257