Preface xiii
Symbol Description xv
|
1 (16)
1.1 Heredity, Genetics, and Genomics 1 (1)
1.2 Principles of Population Genomics 2 (10)
2 (1)
3 (6)
9 (2)
1.2.4 Drift and Selection 11 (1)
1.3 R Packages and Conventions 12 (4)
1.4 Required Knowledge and Other Readings 16 (1)
|
17 (30)
2.1 Samples and Sampling Designs 17 (5)
2.1.1 How Much DNA in a Sample? 17 (1)
18 (1)
19 (3)
2.2 Low-Throughput Technologies 22 (5)
2.2.1 Genotypes From Phenotypes 22 (1)
2.2.2 DNA Cleavage Methods 23 (1)
2.2.3 Repeat Length Polymorphism 24 (1)
2.2.4 Sanger and Shotgun Sequencing 24 (1)
2.2.5 DNA Methylation and Bisulfite Sequencing 25 (2)
2.3 High-Throughput Technologies 27 (4)
27 (1)
2.3.2 High-Throughput Sequencing 27 (1)
2.3.3 Restriction Site Associated DNA 28 (1)
29 (1)
29 (1)
2.3.6 Sequencing of Pooled Individuals 30 (1)
2.3.7 Designing a Study With HTS 30 (1)
2.3.8 The Future of DNA Sequencing 30 (1)
31 (4)
31 (1)
2.4.2 Archiving and Compression 32 (3)
2.5 Bioinformatics and Genomics 35 (6)
2.5.1 Processing Sanger Sequencing Data With sangerseqR 36 (1)
2.5.2 Read Mapping With Rsubread 36 (3)
2.5.3 Managing Read Alignments With Rsamtools 39 (2)
2.6 Simulation of High-Throughput Sequencing Data 41 (3)
44 (3)
|
47 (26)
3.1 What is an R Data Object? 47 (2)
3.2 Data Classes for Genomic Data 49 (9)
3.2.1 The Class "loci" (pegas) 49 (1)
3.2.2 The Class "genind" (adegenet) 50 (1)
3.2.3 The Classes "SNPbin" and "genlight" (adegenet) 51 (1)
3.2.4 The Class "SnpMatrix" (snpStats) 52 (1)
3.2.5 The Class "DNAbin" (ape) 53 (2)
3.2.6 The Classes "XString" and "XStringSet" (Biostrings) 55 (1)
3.2.7 The Package SNPRelate 56 (2)
3.3 Data Input and Output 58 (10)
58 (1)
3.3.2 Reading Spreadsheet Files 59 (1)
60 (5)
3.3.4 Reading PED and BED Files 65 (1)
3.3.5 Reading Sequence Files 66 (1)
3.3.6 Reading Annotation Files 67 (1)
67 (1)
68 (1)
3.5 Managing Files and Projects 69 (2)
71 (2)
|
73 (20)
4.1 Basic Data Manipulation in R 73 (5)
4.1.1 Subsetting, Replacement, and Deletion 73 (1)
4.1.2 Commonly Used Functions 74 (2)
4.1.3 Recycling and Coercion 76 (1)
77 (1)
78 (2)
80 (1)
81 (10)
4.4.1 Mitochondrial Genomes of the Asiatic Golden Cat 82 (1)
4.4.2 Complete Genomes of the Fruit Fly 83 (1)
84 (1)
4.4.4 Influenza H1N1 Virus Sequences 85 (2)
4.4.5 Jaguar Microsatellites 87 (1)
4.4.6 Bacterial Whole Genome Sequences 87 (1)
4.4.7 Metabarcoding of Fish Communities 88 (3)
91 (2)
|
5 Data Exploration and Summaries 93 (64)
5.1 Genotype and Allele Frequencies 93 (5)
95 (1)
96 (2)
5.2 Haplotype and Nucleotide Diversity 98 (5)
5.2.1 The Class "haplotype" 98 (3)
5.2.2 Haplotype and Nucleotide Diversity From DNA Sequences 101 (2)
5.3 Genetic and Genomic Distances 103 (4)
5.3.1 Theoretical Background 103 (1)
103 (2)
5.3.3 Distances From DNA Sequences 105 (1)
5.3.4 Distances From Allele Sharing 105 (1)
5.3.5 Distances From Microsatellites 106 (1)
107 (3)
110 (4)
110 (2)
5.5.2 Summaries With Genomic Positions 112 (1)
113 (1)
114 (11)
5.6.1 Matrix Decomposition 115 (1)
5.6.1.1 Eigendecomposition 115 (2)
5.6.1.2 Singular Value Decomposition 117 (1)
5.6.1.3 Power Method and Random Matrices 118 (1)
5.6.2 Principal Component Analysis 118 (1)
119 (2)
121 (2)
123 (1)
5.6.3 Multidimensional Scaling 124 (1)
125 (29)
5.7.1 Mitochondrial Genomes of the Asiatic Golden Cat 125 (2)
5.7.2 Complete Genomes of the Fruit Fly 127 (7)
134 (4)
5.7.4 Influenza H1N1 Virus Sequences 138 (4)
5.7.5 Jaguar Microsatellites 142 (7)
5.7.6 Bacterial Whole Genome Sequences 149 (3)
5.7.7 Metabarcoding of Fish Communities 152 (2)
154 (3)
|
6 Linkage Disequilibrium and Haplotype Structure 157 (28)
6.1 Why Is Linkage Disequilibrium Important? 157 (2)
6.2 Linkage Disequilibrium: Two Loci 159 (4)
159 (1)
6.2.1.1 Theoretical Background 159 (1)
6.2.1.2 Implementation in pegas 160 (2)
162 (1)
163 (9)
6.3.1 Haplotypes From Unphased Genotypes 163 (1)
6.3.1.1 The Expectation-Maximization Algorithm 164 (1)
6.3.1.2 Implementation in haplo.stats 164 (3)
6.3.2 Locus-Specific Imputation 167 (1)
6.3.3 Maps of Linkage Disequilibrium 168 (1)
6.3.3.1 Phased Genotypes With pegas 168 (2)
170 (1)
171 (1)
172 (8)
6.4.1 Complete Genomes of the Fruit Fly 172 (4)
176 (1)
6.4.3 Jaguar Microsatellites 177 (3)
180 (5)
|
7 Population Genetic Structure 185 (56)
7.1 Hardy-Weinberg Equilibrium 185 (2)
187 (9)
7.2.1 Theoretical Background 187 (2)
7.2.2 Implementations in pegas and in mmod 189 (4)
7.2.3 Implementations in snpStats and in SNPRelate 193 (3)
196 (6)
7.3.1 Minimum Spanning Trees and Networks 197 (2)
7.3.2 Statistical Parsimony 199 (1)
200 (1)
201 (1)
202 (12)
7.4.1 Principles of Discriminant Analysis 202 (1)
7.4.2 Discriminant Analysis of Principal Components 203 (4)
207 (1)
7.4.4 Maximum Likelihood Methods 207 (3)
7.4.5 Bayesian Clustering 210 (4)
214 (8)
214 (3)
7.5.2 Principal Component Analysis of Coancestry 217 (1)
7.5.3 A Second Look at F-Statistics 218 (4)
222 (17)
7.6.1 Mitochondrial Genomes of the Asiatic Golden Cat 222 (3)
7.6.2 Complete Genomes of the Fruit Fly 225 (9)
7.6.3 Influenza H1N1 Virus Sequences 234 (3)
7.6.4 Jaguar Microsatellites 237 (2)
239 (2)
|
241 (24)
8.1 Geographical Data in R 241 (2)
8.1.1 Packages and Classes 242 (1)
8.1.2 Calculating Geographical Distances 242 (1)
8.2 A Third Look at F-Statistics 243 (7)
8.2.1 Hierarchical Components of Genetic Diversity 243 (3)
8.2.2 Analysis of Molecular Variance 246 (4)
8.3 Moran's I and Spatial Autocorrelation 250 (1)
8.4 Spatial Principal Component Analysis 251 (4)
8.5 Finding Boundaries Between Populations 255 (4)
8.5.1 Spatial Ancestry (tess3r) 255 (2)
8.5.2 Bayesian Methods (Geneland) 257 (2)
259 (4)
8.6.1 Complete Genomes of the Fruit Fly 259 (1)
260 (3)
263 (2)
|
9 Past Demographic Events 265 (44)
265 (10)
9.1.1 The Standard Coalescent 265 (3)
9.1.2 The Sequential Markovian Coalescent 268 (1)
9.1.3 Simulation of Coalescent Data 269 (6)
275 (3)
275 (1)
275 (1)
276 (1)
276 (1)
277 (1)
9.3 Coalescent-Based Inference 278 (6)
9.3.1 Maximum Likelihood Methods 278 (2)
9.3.2 Analysis of Markov Chain Monte Carlo Outputs 280 (2)
282 (1)
282 (2)
9.4 Heterochronous Samples 284 (2)
9.5 Site Frequency Spectrum Methods 286 (6)
9.5.1 The Stairway Method 288 (1)
289 (1)
289 (3)
9.6 Whole-Genome Methods (psmcr) 292 (1)
293 (13)
9.7.1 Mitochondrial Genomes of the Asiatic Golden Cat 293 (5)
9.7.2 Complete Genomes of the Fruit Fly 298 (4)
9.7.3 Influenza H1N1 Virus Sequences 302 (2)
9.7.4 Bacterial Whole Genome Sequences 304 (2)
306 (3)
|
309 (32)
309 (4)
309 (1)
10.1.2 Selection in Protein-Coding Sequences 310 (3)
313 (11)
10.2.1 A Fourth Look at F-Statistics 313 (1)
10.2.2 Association Studies (LEA) 314 (1)
10.2.3 Principal Component Analysis (pcadapt) 314 (1)
10.2.4 Scans for Selection With Extended Haplotypes 315 (5)
320 (4)
10.3 Time-Series of Allele Frequencies 324 (2)
326 (12)
10.4.1 Mitochondrial Genomes of the Asiatic Golden Cat 326 (1)
10.4.2 Complete Genomes of the Fruit Fly 327 (8)
10.4.3 Influenza H1N1 Virus Sequences 335 (3)
338 (3)
A Installing R Packages 341 (4)
B Compressing Large Sequence Files 345 (4)
C Sampling of Alleles in a Population 349 (2)
D Glossary 351 (2)
Bibliography 353 (20)
Index 373