
Statistical Modeling and Machine Learning for Molecular Biology [Hardback]

Alan M. Moses (University of Toronto, Canada)
Format: Hardback
Price: 249,78 €
Delivery: 3-4 weeks if the book is in stock at the publisher's warehouse; if the publisher needs to print a new run, delivery may take longer.
Molecular biologists are performing increasingly large and complicated experiments, but often have little background in data analysis. This book is devoted to teaching the statistical and computational techniques molecular biologists need to analyze their data. It explains the big-picture concepts in data analysis using a wide variety of real-world molecular-biology examples, including eQTLs, ortholog identification, motif finding, inference of population structure, and protein fold prediction, among many others. The book takes a pragmatic approach, focusing on techniques that are based on elegant mathematics yet are the simplest to explain to scientists with little background in computers and statistics.

Acknowledgments  xv

Section I: Overview

Chapter 1 Across Statistical Modeling and Machine Learning on a Shoestring  3(12)
1.1 About This Book  3(1)
1.2 What Will This Book Cover?  4(2)
1.2.1 Clustering  4(1)
1.2.2 Regression  5(1)
1.2.3 Classification  6(1)
1.3 Organization Of This Book  6(2)
1.4 Why Are There Mathematical Calculations In The Book?  8(3)
1.5 What Won't This Book Cover?  11(1)
1.6 Why Is This A Book?  12(2)
References And Further Reading  14(1)

Chapter 2 Statistical Modeling  15(28)
2.1 What Is Statistical Modeling?  15(3)
2.2 Probability Distributions Are The Models  18(5)
2.3 Axioms Of Probability And Their Consequences: "Rules Of Probability"  23(3)
2.4 Hypothesis Testing: What You Probably Already Know About Statistics  26(4)
2.5 Tests With Fewer Assumptions  30(3)
2.5.1 Wilcoxon Rank-Sum Test, Also Known As the Mann-Whitney U Test (or Simply the WMW Test)  30(1)
2.5.2 Kolmogorov-Smirnov Test (KS-Test)  31(2)
2.6 Central Limit Theorem  33(1)
2.7 Exact Tests And Gene Set Enrichment Analysis  33(3)
2.8 Permutation Tests  36(2)
2.9 Some Popular Distributions  38(2)
2.9.1 The Uniform Distribution  38(1)
2.9.2 The T-Distribution  39(1)
2.9.3 The Exponential Distribution  39(1)
2.9.4 The Chi-Squared Distribution  39(1)
2.9.5 The Poisson Distribution  39(1)
2.9.6 The Bernoulli Distribution  40(1)
2.9.7 The Binomial Distribution  40(1)
Exercises  40(1)
References And Further Reading  41(2)

Chapter 3 Multiple Testing  43(10)
3.1 The Bonferroni Correction And Gene Set Enrichment Analysis  43(3)
3.2 Multiple Testing In Differential Expression Analysis  46(2)
3.3 False Discovery Rate  48(1)
3.4 eQTLs: A Very Difficult Multiple-Testing Problem  49(2)
Exercises  51(1)
References And Further Reading  52(1)

Chapter 4 Parameter Estimation And Multivariate Statistics  53(34)
4.1 Fitting A Model To Data: Objective Functions And Parameter Estimation  53(1)
4.2 Maximum Likelihood Estimation  54(1)
4.3 Likelihood For Gaussian Data  55(1)
4.4 How To Maximize The Likelihood Analytically  56(4)
4.5 Other Objective Functions  60(4)
4.6 Multivariate Statistics  64(5)
4.7 MLEs For Multivariate Distributions  69(8)
4.8 Hypothesis Testing Revisited: The Problems With High Dimensions  77(3)
4.9 Example Of LRT For The Multinomial: GC Content In Genomes  80(3)
Exercises  83(1)
References And Further Reading  83(4)

Section II: Clustering

Chapter 5 Distance-Based Clustering  87(30)
5.1 Multivariate Distances For Clustering  87(4)
5.2 Agglomerative Clustering  91(4)
5.3 Clustering DNA And Protein Sequences  95(3)
5.4 Is The Clustering Right?  98(2)
5.5 K-Means Clustering  100(6)
5.6 So What Is Learning Anyway?  106(1)
5.7 Choosing The Number Of Clusters For K-Means  107(2)
5.8 K-Medoids And Exemplar-Based Clustering  109(1)
5.9 Graph-Based Clustering: "Distances" Versus "Interactions" Or "Connections"  110(3)
5.10 Clustering As Dimensionality Reduction  113(1)
Exercises  113(2)
References And Further Reading  115(2)

Chapter 6 Mixture Models And Hidden Variables For Clustering And Beyond  117(28)
6.1 The Gaussian Mixture Model  118(5)
6.2 E-M Updates For The Mixture Of Gaussians  123(4)
6.3 Deriving The E-M Algorithm For The Mixture Of Gaussians  127(4)
6.4 Gaussian Mixtures In Practice And The Curse Of Dimensionality  131(1)
6.5 Choosing The Number Of Clusters Using The AIC  131(2)
6.6 Applications Of Mixture Models In Bioinformatics  133(8)
Exercises  141(1)
References And Further Reading  142(3)

Section III: Regression

Chapter 7 Univariate Regression  145(24)
7.1 Simple Linear Regression As A Probabilistic Model  145(1)
7.2 Deriving The MLEs For Linear Regression  146(3)
7.3 Hypothesis Testing In Linear Regression  149(5)
7.4 Least Squares Interpretation Of Linear Regression  154(1)
7.5 Application Of Linear Regression To eQTLs  155(2)
7.6 From Hypothesis Testing To Statistical Modeling: Predicting Protein Level Based On mRNA Level  157(4)
7.7 Regression Is Not Just "Linear": Polynomial And Local Regressions  161(4)
7.8 Generalized Linear Models  165(2)
Exercises  167(1)
References And Further Reading  167(2)

Chapter 8 Multiple Regression  169(16)
8.1 Predicting Y Using Multiple Xs  169(2)
8.2 Hypothesis Testing In Multiple Dimensions: Partial Correlations  171(3)
8.3 Example Of A High-Dimensional Multiple Regression: Regressing Gene Expression Levels On Transcription Factor Binding Sites  174(5)
8.4 AIC And Feature Selection And Overfitting In Multiple Regression  179(3)
Exercises  182(1)
References And Further Reading  183(2)

Chapter 9 Regularization In Multiple Regression And Beyond  185(18)
9.1 Regularization And Penalized Likelihood  186(3)
9.2 Differences Between The Effects Of L1 And L2 Penalties On Correlated Features  189(1)
9.3 Regularization Beyond Sparsity: Encouraging Your Own Model Structure  190(2)
9.4 Penalized Likelihood As Maximum A Posteriori (MAP) Estimation  192(1)
9.5 Choosing Prior Distributions For Parameters: Heavy-Tails If You Can  193(4)
Exercises  197(2)
References And Further Reading  199(4)

Section IV: Classification

Chapter 10 Linear Classification  203(22)
10.1 Classification Boundaries And Linear Classification  205(1)
10.2 Probabilistic Classification Models  206(2)
10.3 Logistic Regression  208(2)
10.4 Linear Discriminant Analysis (LDA) And The Log Likelihood Ratio  210(4)
10.5 Generative And Discriminative Models For Classification  214(1)
10.6 Naive Bayes: Generative MAP Classification  215(6)
10.7 Training Naive Bayes Classifiers  221(1)
10.8 Naive Bayes And Data Integration  222(1)
Exercises  223(1)
References And Further Reading  223(2)

Chapter 11 Nonlinear Classification  225(16)
11.1 Two Approaches To Choose Nonlinear Boundaries: Data-Guided And Multiple Simple Units  226(2)
11.2 Distance-Based Classification With k-Nearest Neighbors  228(2)
11.3 SVMs For Nonlinear Classification  230(4)
11.4 Decision Trees  234(2)
11.5 Random Forests And Ensemble Classifiers: The Wisdom Of The Crowd  236(1)
11.6 Multiclass Classification  237(1)
Exercises  238(1)
References And Further Reading  239(2)

Chapter 12 Evaluating Classifiers  241(16)
12.1 Classification Performance Statistics In The Ideal Classification Setup  241(1)
12.2 Measures Of Classification Performance  242(3)
12.3 ROC Curves And Precision-Recall Plots  245(3)
12.4 Evaluating Classifiers When You Don't Have Enough Data  248(3)
12.5 Leave-One-Out Cross-Validation  251(2)
12.6 Better Classification Methods Versus Better Features  253(1)
Exercises  254(1)
References And Further Reading  255(2)

Index  257

Alan M. Moses is currently Associate Professor and Canada Research Chair in Computational Biology in the Departments of Cell & Systems Biology and Computer Science at the University of Toronto. His research touches on many of the major areas in computational biology, including DNA and protein sequence analysis, phylogenetic models, population genetics, expression profiles, regulatory network simulations, and image analysis.