Preface  xiii
Software support  xv
Acknowledgements  xvii
|
1 Role of probability theory in science  1
  1.1 Scientific inference  1
  1.2 Inference requires a probability theory  2
    1.2.1 The two rules for manipulating probabilities  4
  1.3 Usual form of Bayes' theorem  5
    1.3.1 Discrete hypothesis space  5
    1.3.2 Continuous hypothesis space  6
    1.3.3 Bayes' theorem - model of the learning process  7
    1.3.4 Example of the use of Bayes' theorem  8
  1.4 Probability and frequency  10
    1.4.1 Example: incorporating frequency information  11
  1.5 Marginalization  12
  1.6 The two basic problems in statistical inference  15
  1.7 Advantages of the Bayesian approach  16
  1.8 Problems  17
|
2 Probability theory as extended logic  21
  2.1 Overview  21
  2.2 Fundamentals of logic  21
    2.2.1 Logical propositions  21
    2.2.2 Compound propositions  22
    2.2.3 Truth tables and Boolean algebra  22
    2.2.4 Deductive inference  24
    2.2.5 Inductive or plausible inference  25
  2.3 Brief history  25
  2.4 An adequate set of operations  26
    2.4.1 Examination of a logic function  27
  2.5 Operations for plausible inference  29
    2.5.1 The desiderata of Bayesian probability theory  30
    2.5.2 Development of the product rule  30
    2.5.3 Development of sum rule  34
    2.5.4 Qualitative properties of product and sum rules  36
  2.6 Uniqueness of the product and sum rules  37
  2.7 Summary  39
  2.8 Problems  39
|
3 The how-to of Bayesian inference  41
  3.1 Overview  41
  3.2 Basics  41
  3.3 Parameter estimation  43
  3.4 Nuisance parameters  45
  3.5 Model comparison and Occam's razor  45
  3.6 Sample spectral line problem  50
    3.6.1 Background information  50
  3.7 Odds ratio  52
    3.7.1 Choice of prior p(T|M1, I)  53
    3.7.2 Calculation of p(D|M1, T, I)  55
    3.7.3 Calculation of p(D|M2, I)  58
    3.7.4 Odds, uniform prior  58
    3.7.5 Odds, Jeffreys prior  58
  3.8 Parameter estimation problem  59
    3.8.1 Sensitivity of odds to Tmax  59
  3.9 Lessons  61
  3.10 Ignorance priors  63
  3.11 Systematic errors  65
    3.11.1 Systematic error example  66
  3.12 Problems  69
|
4 Assigning probabilities  72
  4.1 Introduction  72
  4.2 Binomial distribution  72
    4.2.1 Bernoulli's law of large numbers  75
    4.2.2 The gambler's coin problem  75
    4.2.3 Bayesian analysis of an opinion poll  77
  4.3 Multinomial distribution  79
  4.4 Can you really answer that question?  80
  4.5 Logical versus causal connections  82
  4.6 Exchangeable distributions  83
  4.7 Poisson distribution  85
    4.7.1 Bayesian and frequentist comparison  87
  4.8 Constructing likelihood functions  89
    4.8.1 Deterministic model  90
    4.8.2 Probabilistic model  91
  4.9 Summary  93
  4.10 Problems  94
|
5 Frequentist statistical inference  96
  5.1 Overview  96
  5.2 The concept of a random variable  96
  5.3 Sampling theory  97
  5.4 Probability distributions  98
  5.5 Descriptive properties of distributions  100
    5.5.1 Relative line shape measures for distributions  101
    5.5.2 Standard random variable  102
    5.5.3 Other measures of central tendency and dispersion  103
    5.5.4 Median baseline subtraction  104
  5.6 Moment generating functions  105
  5.7 Some discrete probability distributions  107
    5.7.1 Binomial distribution  107
    5.7.2 The Poisson distribution  109
    5.7.3 Negative binomial distribution  112
  5.8 Continuous probability distributions  113
    5.8.1 Normal distribution  113
    5.8.2 Uniform distribution  116
    5.8.3 Gamma distribution  116
    5.8.4 Beta distribution  117
    5.8.5 Negative exponential distribution  118
  5.9 Central Limit Theorem  119
  5.10 Bayesian demonstration of the Central Limit Theorem  120
  5.11 Distribution of the sample mean  124
    5.11.1 Signal averaging example  125
  5.12 Transformation of a random variable  125
  5.13 Random and pseudo-random numbers  127
    5.13.1 Pseudo-random number generators  131
    5.13.2 Tests for randomness  132
  5.14 Summary  136
  5.15 Problems  137
|
6 What is a statistic?  139
  6.1 Introduction  139
  6.2 The χ² distribution  141
  6.3 Sample variance S²  143
  6.4 The Student's t distribution  147
  6.5 F distribution (F-test)  150
  6.6 Confidence intervals  152
    6.6.1  152
    6.6.2 Confidence intervals for μ, unknown variance  156
    6.6.3 Confidence intervals: difference of two means  158
    6.6.4 Confidence intervals for σ²  159
    6.6.5 Confidence intervals: ratio of two variances  159
  6.7 Summary  160
  6.8 Problems  161
|
7 Frequentist hypothesis testing  162
  7.1 Overview  162
  7.2 Basic idea  162
    7.2.1 Hypothesis testing with the χ² statistic  163
    7.2.2 Hypothesis test on the difference of two means  167
    7.2.3 One-sided and two-sided hypothesis tests  170
  7.3 Are two distributions the same?  172
    7.3.1 Pearson χ² goodness-of-fit test  173
    7.3.2 Comparison of two binned data sets  177
  7.4 Problem with frequentist hypothesis testing  177
    7.4.1 Bayesian resolution to optional stopping problem  179
  7.5 Problems  181
|
8 Maximum entropy probabilities  184
  8.1 Overview  184
  8.2 The maximum entropy principle  185
  8.3 Shannon's theorem  186
  8.4 Alternative justification of MaxEnt  187
  8.5 Generalizing MaxEnt  190
    8.5.1 Incorporating a prior  190
    8.5.2 Continuous probability distributions  191
  8.6 How to apply the MaxEnt principle  191
    8.6.1 Lagrange multipliers of variational calculus  191
  8.7 MaxEnt distributions  192
    8.7.1  192
    8.7.2 Uniform distribution  194
    8.7.3 Exponential distribution  195
    8.7.4 Normal and truncated Gaussian distributions  197
    8.7.5 Multivariate Gaussian distribution  202
  8.8 MaxEnt image reconstruction  203
    8.8.1 The kangaroo justification  203
    8.8.2 MaxEnt for uncertain constraints  206
  8.9 Pixon multiresolution image reconstruction  208
  8.10 Problems  211
|
9 Bayesian inference with Gaussian errors  212
  9.1 Overview  212
  9.2 Bayesian estimate of a mean  212
    9.2.1 Mean: known noise σ  213
    9.2.2 Mean: known noise, unequal σ  217
    9.2.3 Mean: unknown noise σ  218
    9.2.4 Bayesian estimate of σ  224
  9.3 Is the signal variable?  227
  9.4 Comparison of two independent samples  228
    9.4.1 Do the samples differ?  230
    9.4.2 How do the samples differ?  233
    9.4.3  233
    9.4.4 The difference in means  236
    9.4.5 Ratio of the standard deviations  237
    9.4.6 Effect of the prior ranges  239
  9.5 Summary  240
  9.6 Problems  241
|
10 Linear model fitting (Gaussian errors)  243
  10.1 Overview  243
  10.2 Parameter estimation  244
    10.2.1 Most probable amplitudes  249
    10.2.2 More powerful matrix formulation  253
  10.3 Regression analysis  256
  10.4 The posterior is a Gaussian  257
    10.4.1 Joint credible regions  260
  10.5 Model parameter errors  264
    10.5.1 Marginalization and the covariance matrix  264
    10.5.2 Correlation coefficient  268
    10.5.3 More on model parameter errors  272
  10.6 Correlated data errors  273
  10.7 Model comparison with Gaussian posteriors  275
  10.8 Frequentist testing and errors  279
    10.8.1 Other model comparison methods  281
  10.9 Summary  283
  10.10 Problems  284
|
11 Nonlinear model fitting  287
  11.1 Introduction  287
  11.2 Asymptotic normal approximation  288
  11.3 Laplacian approximations  291
    11.3.1  291
    11.3.2 Marginal parameter posteriors  293
  11.4 Finding the most probable parameters  294
    11.4.1 Simulated annealing  296
    11.4.2 Genetic algorithm  297
  11.5 Iterative linearization  298
    11.5.1 Levenberg-Marquardt method  300
    11.5.2 Marquardt's recipe  301
  11.6  302
    11.6.1  304
    11.6.2 Marginal and projected distributions  306
  11.7 Errors in both coordinates  307
  11.8 Summary  309
  11.9 Problems  309
|
12 Markov chain Monte Carlo  312
  12.1 Overview  312
  12.2 Metropolis-Hastings algorithm  313
  12.3 Why does Metropolis-Hastings work?  319
  12.4 Simulated tempering  321
  12.5 Parallel tempering  321
  12.6  322
  12.7  326
  12.8 Towards an automated MCMC  330
  12.9 Extrasolar planet example  331
    12.9.1 Model probabilities  335
    12.9.2  337
  12.10 MCMC robust summary statistic  342
  12.11 Summary  346
  12.12 Problems  349
|
13 Bayesian revolution in spectral analysis  352
  13.1 Overview  352
  13.2 New insights on the periodogram  352
    13.2.1 How to compute p(f|D, I)  356
  13.3 Strong prior signal model  358
  13.4 No specific prior signal model  360
    13.4.1 X-ray astronomy example  362
    13.4.2 Radio astronomy example  363
  13.5 Generalized Lomb-Scargle periodogram  365
    13.5.1 Relationship to Lomb-Scargle periodogram  367
    13.5.2  367
  13.6 Non-uniform sampling  370
  13.7 Problems  373
|
14 Bayesian inference with Poisson sampling  376
  14.1 Overview  376
  14.2 Infer a Poisson rate  377
    14.2.1 Summary of posterior  378
  14.3 Signal + known background  379
  14.4 Analysis of ON/OFF measurements  380
    14.4.1 Estimating the source rate  381
    14.4.2 Source detection question  384
  14.5 Time-varying Poisson rate  386
  14.6 Problems  388
|
Appendix A Singular value decomposition  389
|
Appendix B Discrete Fourier Transforms  392
  B.1 Overview  392
  B.2 Orthogonal and orthonormal functions  392
  B.3 Fourier series and integral transform  394
    B.3.1 Fourier series  395
    B.3.2 Fourier integral transform  396
  B.4 Convolution and correlation  398
    B.4.1 Convolution theorem  399
    B.4.2 Correlation theorem  400
    B.4.3 Importance of convolution in science  401
  B.5  403
  B.6 Nyquist sampling theorem  404
    B.6.1  406
  B.7 Discrete Fourier Transform  407
    B.7.1 Graphical development  407
    B.7.2 Mathematical development of the DFT  409
    B.7.3  410
  B.8  411
    B.8.1 DFT as an approximate Fourier transform  411
    B.8.2 Inverse discrete Fourier transform  413
  B.9 The Fast Fourier Transform  415
  B.10 Discrete convolution and correlation  417
    B.10.1 Deconvolving a noisy signal  418
    B.10.2 Deconvolution with an optimal Wiener filter  420
    B.10.3 Treatment of end effects by zero padding  421
  B.11 Accurate amplitudes by zero padding  422
  B.12 Power-spectrum estimation  424
    B.12.1 Parseval's theorem and power spectral density  424
    B.12.2 Periodogram power-spectrum estimation  425
    B.12.3 Correlation spectrum estimation  426
  B.13 Discrete power spectral density estimation  428
    B.13.1 Discrete form of Parseval's theorem  428
    B.13.2 One-sided discrete power spectral density  429
    B.13.3 Variance of periodogram estimate  429
    B.13.4 Yule's stochastic spectrum estimation model  431
    B.13.5 Reduction of periodogram variance  431
  B.14 Problems  432
|
Appendix C Difference in two samples  434
  C.1 Introduction  434
  C.2 Probabilities of the four hypotheses  434
    C.2.1 Evaluation of p(C, S|D1, D2, I)  434
    C.2.2 Evaluation of p(C, S̄|D1, D2, I)  436
    C.2.3 Evaluation of p(C̄, S|D1, D2, I)  438
    C.2.4 Evaluation of p(C̄, S̄|D1, D2, I)  439
  C.3 The difference in the means  439
    C.3.1 The two-sample problem  440
    C.3.2 The Behrens-Fisher problem  441
  C.4 The ratio of the standard deviations  442
    C.4.1 Estimating the ratio, given the means are the same  442
    C.4.2 Estimating the ratio, given the means are different  443
|
Appendix D Poisson ON/OFF details  445
  D.1 Derivation of p(s|N_on, I)  445
    D.1.1  446
    D.1.2  447
  D.2 Derivation of the Bayes factor B_{s+b,b}  448
|
Appendix E Multivariate Gaussian from maximum entropy  450
References  455
Index  461