1 A Model Selection Tale .... 1
1.2 Elements of the history of words and ideas .... 1
1.3 Modeling in astronomy .... 2
1.4 Triangulation in geodesy .... 5
1.5 The measurement of meridian arcs .... 7
1.6 A model selection tale .... 10
1.8 Expeditions for choosing a good model .... 17
1.9 The control of errors .... 18
2 Model's Introduction .... 21
2.1.1 Empirical risk minimization .... 23
2.1.2 The model choice paradigm .... 26
2.1.3 Model selection via penalization .... 27
2.2 Selection of linear Gaussian models .... 30
2.2.1 Examples of Gaussian frameworks .... 31
2.2.2 Some model selection problems .... 33
2.2.3 The least squares procedure .... 35
2.3 Selecting linear models .... 35
2.3.1 Mallows' heuristics .... 37
2.3.2 Schwarz's heuristics .... 37
2.3.3 A first model selection theorem for linear models .... 38
2.4 Adaptive estimation in the minimax sense .... 43
2.4.1 Minimax lower bounds .... 45
2.4.2 Adaptive properties of penalized estimators for Gaussian sequences .... 54
2.4.3 Adaptation with respect to ellipsoids .... 55
2.4.4 Adaptation with respect to arbitrary lp-bodies .... 56
2.5.1 Functional analysis: from function spaces to sequence spaces .... 61
3 Non Linear Gaussian Model Selection .... 71
3.2 Selecting ellipsoids and l2 regularization .... 76
3.2.1 Adaptation over Besov ellipsoids .... 77
3.2.2 A first penalization strategy .... 79
3.3.2 Selecting l1 balls and the Lasso .... 86
3.4.1 Concentration inequalities .... 87
3.4.2 Information inequalities .... 96
4 Bayesian Model Choice .... 101
4.1 The Bayesian paradigm .... 101
4.1.1 The posterior distribution .... 101
4.1.3 Conjugate prior distributions .... 104
4.1.4 Noninformative priors .... 105
4.1.5 Bayesian credible sets .... 106
4.2 Bayesian discrimination between models .... 107
4.2.1 The model index as a parameter .... 107
4.2.3 The ban on improper priors .... 110
4.2.4 The Bayesian Information Criterion .... 112
4.2.5 Bayesian Model Averaging .... 113
4.3 The case of linear regression models .... 113
4.3.2 Zellner's G prior distribution .... 114
4.3.4 Calculation of evidences and Bayes factors .... 117
5 Some Computational Aspects of Bayesian Model Choice .... 121
5.1 Some Monte Carlo strategies to approximate the evidence .... 121
5.1.1 The basic Monte Carlo solution .... 123
5.1.2 Usual importance sampling approximations .... 124
5.1.3 The harmonic mean approximation .... 126
5.2 The bridge sampling methodology to compare embedded models .... 127
5.3 A Monte Carlo Markov chain method for variable selection .... 130
5.3.2 A stochastic search for the most likely model .... 133
6 Randomization and Aggregation for Predictive Modeling with Classification Data .... 135
6.2 Randomness, bless our data! .... 136
6.2.1 A probabilistic view of classification data .... 136
6.2.2 Let the data go: error estimation and model validation .... 140
6.3 Power to the masses: aggregation principles .... 142
6.3.1 Voting and averaging in binary classification .... 142
6.3.2 A lazy way to multi-class classification .... 143
6.3.3 Agreement and averaging in the context of scoring .... 144
6.3.4 From bipartite ranking to K-partite ranking .... 148
6.4 Time for doers: popular aggregation meta-algorithms .... 150
6.4.3 Forests for bipartite ranking and scoring .... 154
6.5 Time for thinkers: theory of aggregated rules .... 157
6.5.1 Aggregation of classification rules .... 157
6.5.2 Consistency of forests .... 158
6.5.3 From bipartite consistency to K-partite consistency .... 160
7 Mixture Models .... 165
7.1 Mixture models as a many-purpose tool .... 165
7.1.1 Starting from applications .... 165
7.1.2 The mixture model answer .... 168
7.1.3 Classical mixture models .... 170
7.2.2 Maximum likelihood and variants .... 176
7.2.3 Theoretical difficulties related to the likelihood .... 179
7.2.4 Estimation algorithms .... 180
7.3 Model selection in density estimation .... 186
7.3.1 Need to select a model .... 186
7.3.2 Frequentist approach and deviance .... 189
7.3.3 Bayesian approach and integrated likelihood .... 194
7.4 Model selection in (semi-)supervised classification .... 200
7.4.1 Need to select a model .... 200
7.4.2 Error rate-based criteria .... 203
7.4.3 A predictive deviance criterion .... 205
7.5 Model selection in clustering .... 208
7.5.1 Need to select a model .... 208
7.5.2 Partition-based criteria .... 209
7.5.3 The Integrated Completed Likelihood criterion .... 211
7.6 Experiments on real data sets .... 217
7.6.1 BIC: extra-solar planets .... 218
7.6.2 AICcond/BIC/AIC/BEC/ecv: benchmark data sets .... 219
7.6.3 AICcond/ecvV: textile data set .... 221
7.6.4 BIC: social comparison theory .... 222
7.6.5 NEC: marketing data .... 224
7.6.6 ICL: prostate cancer data .... 225
7.6.7 BIC: density estimation in the steel industry .... 228
7.6.8 BIC: partitioning communes of Wallonia .... 229
7.6.9 ICLbic/BIC: acoustic emission control .... 231
7.6.10 ICLbic/ICL/BIC/ILbayes: a seabird data set .... 232
7.7 Future methodological challenges .... 234
8 Calibration of Penalties .... 237
8.1 The concept of minimal penalty .... 238
8.1.1 A small number of models .... 239
8.1.2 A large number of models .... 242
8.2 Data-driven penalties .... 243
8.2.1 From theory to practice .... 243
8.2.2 The slope heuristics .... 244
9 High Dimensional Clustering .... 247
9.2 HD clustering: Curse or blessing? .... 250
9.2.1 HD density estimation: Curse .... 250
9.2.2 HD clustering: A mix of curse and blessing .... 252
9.2.3 Intermediate conclusion .... 254
9.3.1 Gaussian mixture of factor analysers .... 256
9.3.2 HD Gaussian mixture models .... 257
9.3.4 Intermediate conclusion .... 262
9.4.1 Parsimonious mixture models .... 263
9.4.2 Variable selection through regularization .... 266
9.4.3 Variable role modelling .... 270
9.4.5 Intermediate conclusion .... 281
9.5 Future methodological challenges .... 282
10 Clustering of Co-Expressed Genes .... 283
Marie-Laure Martin-Magniette
10.2 Model-based clustering .... 284
10.3 Clustering of microarray data .... 286
10.3.2 Gaussian mixture models .... 287
10.4 Clustering of RNA-seq data .... 296
11 Forecasting the French National Electricity Consumption: From Sparse Models to Aggregated Forecasts .... 313
11.1 Functional regression models .... 315
11.2 Data mining using sparse approximation of the intra-day load curves .... 317
11.2.1 Choice of a generic dictionary .... 318
11.2.2 Mining and clustering .... 319
11.2.3 Patterns of consumption .... 320
11.3 Sparse modeling with adaptive dictionaries .... 320
11.5 Performances & Software .... 323
11.6 Conclusion and perspectives .... 324
Bibliography .... 327
Index .... 353