E-book: Flexible Imputation of Missing Data, Second Edition [Taylor & Francis e-book]

Stef van Buuren (TNO Quality of Life, Leiden, The Netherlands)

Format: 444 pages, 76 Illustrations, color
Series: Chapman & Hall/CRC Interdisciplinary Statistics
Pub. Date: 30-Sep-2021
Publisher: Chapman & Hall/CRC
ISBN-13: 9780429492259

Other books in subject:

Probability & statistics

Taylor & Francis e-book
Price: 133,87 €*
* this price gives unlimited concurrent access for unlimited time
Regular price: 191,24 €
Save 30%

Companion website: https://www.taylorfrancis.com/books/9780429492259

This is the second edition of a popular book on multiple imputation, focused on explaining the application of methods through detailed worked examples using the MICE package as developed by the author. This new edition incorporates the recent developments in this fast-moving field.

Missing data pose challenges to real-life data analysis. Simple ad-hoc fixes, like deletion or mean imputation, only work under highly restrictive conditions, which are often not met in practice. Multiple imputation replaces each missing value by multiple plausible values. The variability between these replacements reflects our ignorance of the true (but missing) value. Each of the completed data sets is then analyzed by standard methods, and the results are pooled to obtain unbiased estimates with correct confidence intervals. Multiple imputation is a general approach that also inspires novel solutions to old problems by reformulating the task at hand as a missing-data problem.
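The impute, analyze, pool cycle described above can be sketched in a few lines of Python. This is an illustration only and is not taken from the book, whose worked examples use the R package MICE. The function names (`impute_once`, `pool`, `mean_with_mi`), the toy data, and the choice of a simple bootstrap-style draw from the observed values are all assumptions made for this sketch. The pooling step follows Rubin's rules: the pooled estimate is the average of the per-data-set estimates, and the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance.

```python
import random
import statistics


def impute_once(values, rng):
    """Return one completed copy of `values`, filling the None entries.

    Illustrative method (an assumption of this sketch): resample the
    observed values with replacement, then draw each missing value from
    that resample, so repeated calls give different plausible values.
    """
    observed = [v for v in values if v is not None]
    donors = rng.choices(observed, k=len(observed))
    return [v if v is not None else rng.choice(donors) for v in values]


def pool(estimates, variances):
    """Pool m estimates and their variances with Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    u_bar = statistics.fmean(variances)   # within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance
    total = u_bar + (1 + 1 / m) * b       # total variance of q_bar
    return q_bar, total


def mean_with_mi(values, m=20, seed=1):
    """Estimate the mean of `values` from m completed data sets."""
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(m):
        completed = impute_once(values, rng)
        # Standard complete-data analysis: sample mean and its variance.
        estimates.append(statistics.fmean(completed))
        variances.append(statistics.variance(completed) / len(completed))
    return pool(estimates, variances)


# Hypothetical toy data; None marks a missing value.
data = [4.1, None, 5.3, 6.0, None, 4.8, 5.5, None, 5.1, 4.6]
estimate, total_variance = mean_with_mi(data)
print(f"pooled mean = {estimate:.3f}, standard error = {total_variance ** 0.5:.3f}")
```

The between-imputation term is what separates this from single imputation: it widens the standard error to reflect that the missing values are not actually known.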

This class-tested book avoids mathematical and technical details as much as possible: formulas are accompanied by verbal statements that explain the formula in accessible terms. The book sharpens the reader’s intuition on how to think about missing data, and provides all the tools needed to execute a well-grounded quantitative analysis in the presence of missing data.

Foreword (Donald B. Rubin)
Preface to second edition
Preface to first edition
About the author
List of symbols
List of algorithms

I Basics

1 Introduction
1.1 The problem of missing data
1.1.1 Current practice
1.1.2 Changing perspective on missing data
1.2 Concepts of MCAR, MAR and MNAR
1.3 Ad-hoc solutions
1.3.1 Listwise deletion
1.3.2 Pairwise deletion
1.3.3 Mean imputation
1.3.4 Regression imputation
1.3.5 Stochastic regression imputation
1.3.6 LOCF and BOCF
1.3.7 Indicator method
1.3.8 Summary
1.4 Multiple imputation in a nutshell
1.4.1 Procedure
1.4.2 Reasons to use multiple imputation
1.4.3 Example of multiple imputation
1.5 Goal of the book
1.6 What the book does not cover
1.6.1 Prevention
1.6.2 Weighting procedures
1.6.3 Likelihood-based approaches
1.7 Structure of the book
1.8 Exercises

2 Multiple imputation
2.1 Historic overview
2.1.1 Imputation
2.1.2 Multiple imputation
2.1.3 The expanding literature on multiple imputation
2.2 Concepts in incomplete data
2.2.1 Incomplete-data perspective
2.2.2 Causes of missing data
2.2.3 Notation
2.2.4 MCAR, MAR, and MNAR again
2.2.5 Ignorable and nonignorable
2.2.6 Implications of ignorability
2.3 Why and when multiple imputation works
2.3.1 Goal of multiple imputation
2.3.2 Three sources of variation
2.3.3 Proper imputation
2.3.4 Scope of the imputation model
2.3.5 Variance ratios
2.3.6 Degrees of freedom
2.3.7 Numerical example
2.4 Statistical intervals and tests
2.4.1 Scalar or multi-parameter inference?
2.4.2 Scalar inference
2.4.3 Numerical example
2.5 How to evaluate imputation methods
2.5.1 Simulation designs and performance measures
2.5.2 Evaluation criteria
2.5.3 Example
2.6 Imputation is not prediction
2.7 When not to use multiple imputation
2.8 How many imputations?
2.9 Exercises

3 Univariate missing data
3.1 How to generate multiple imputations
3.1.1 Predict method
3.1.2 Predict + noise method
3.1.3 Predict + noise + parameter uncertainty
3.1.4 A second predictor
3.1.5 Drawing from the observed data
3.1.6 Conclusion
3.2 Imputation under the normal linear normal
3.2.1 Overview
3.2.2 Algorithms
3.2.3 Performance
3.2.4 Generating MAR missing data
3.2.5 MAR missing data generation in multivariate data
3.2.6 Conclusion
3.3 Imputation under non-normal distributions
3.3.1 Overview
3.3.2 Imputation from the t-distribution
3.4 Predictive mean matching
3.4.1 Overview
3.4.2 Computational details
3.4.3 Number of donors
3.4.4 Pitfalls
3.4.5 Conclusion
3.5 Classification and regression trees
3.5.1 Overview
3.6 Categorical data
3.6.1 Generalized linear model
3.6.2 Perfect prediction
3.6.3 Evaluation
3.7 Other data types
3.7.1 Count data
3.7.2 Semi-continuous data
3.7.3 Censored, truncated and rounded data
3.8 Nonignorable missing data
3.8.1 Overview
3.8.2 Selection model
3.8.3 Pattern-mixture model
3.8.4 Converting selection and pattern-mixture models
3.8.5 Sensitivity analysis
3.8.6 Role of sensitivity analysis
3.8.7 Recent developments
3.9 Exercises

4 Multivariate missing data
4.1 Missing data pattern
4.1.1 Overview
4.1.2 Summary statistics
4.1.3 Influx and outflux
4.2 Issues in multivariate imputation
4.3 Monotone data imputation
4.3.1 Overview
4.3.2 Algorithm
4.4 Joint modeling
4.4.1 Overview
4.4.2 Continuous data
4.4.3 Categorical data
4.5 Fully conditional specification
4.5.1 Overview
4.5.2 The MICE algorithm
4.5.3 Compatibility
4.5.4 Congeniality or compatibility?
4.5.5 Model-based and data-based imputation
4.5.6 Number of iterations
4.5.7 Example of slow convergence
4.5.8 Performance
4.6 FCS and JM
4.6.1 Relations between FCS and JM
4.6.2 Comparisons
4.6.3 Illustration
4.7 MICE extensions
4.7.1 Skipping imputations and overimputation
4.7.2 Blocks of variables, hybrid imputation
4.7.3 Blocks of units, monotone blocks
4.7.4 Tile imputation
4.8 Conclusion
4.9 Exercises

5 Analysis of imputed data
5.1 Workflow
5.1.1 Recommended workflows
5.1.2 Not recommended workflow: Averaging the data
5.1.3 Not recommended workflow: Stack imputed data
5.1.4 Repeated analyses
5.2 Parameter pooling
5.2.1 Scalar inference of normal quantities
5.2.2 Scalar inference of non-normal quantities
5.3 Multi-parameter inference
5.3.1 D1 Multivariate Wald test
5.3.2 D2 Combining test statistics
5.3.3 D3 Likelihood ratio test
5.3.4 D1, D2 or D3?
5.4 Stepwise model selection
5.4.1 Variable selection techniques
5.4.2 Computation
5.4.3 Model optimism
5.5 Parallel computation
5.6 Conclusion
5.7 Exercises

II Advanced techniques

6 Imputation in practice
6.1 Overview of modeling choices
6.2 Ignorable or nonignorable?
6.3 Model form and predictors
6.3.1 Model form
6.3.2 Predictors
6.4 Derived variables
6.4.1 Ratio of two variables
6.4.2 Interaction terms
6.4.3 Quadratic relations
6.4.4 Compositional data
6.4.5 Sum scores
6.4.6 Conditional imputation
6.5 Algorithmic options
6.5.1 Visit sequence
6.5.2 Convergence
6.6 Diagnostics
6.6.1 Model fit versus distributional discrepancy
6.6.2 Diagnostic graphs
6.7 Conclusion
6.8 Exercises

7 Multilevel multiple imputation
7.1 Introduction
7.2 Notation for multilevel models
7.3 Missing values in multilevel data
7.3.1 Practical issues in multilevel imputation
7.3.2 Ad-hoc solutions for multilevel data
7.3.3 Likelihood solutions
7.4 Multilevel imputation by joint modeling
7.5 Multilevel imputation by fully conditional specification
7.5.1 Add cluster means of predictors
7.5.2 Model cluster heterogeneity
7.6 Continuous outcome
7.6.1 General principle
7.6.2 Methods
7.6.3 Example
7.7 Discrete outcome
7.7.1 Methods
7.7.2 Example
7.8 Imputation of level-2 variable
7.9 Comparative work
7.10 Guidelines and advice
7.10.1 Intercept-only model, missing outcomes
7.10.2 Random intercepts, missing level-1 predictor
7.10.3 Random intercepts, contextual model
7.10.4 Random intercepts, missing level-2 predictor
7.10.5 Random intercepts, interactions
7.10.6 Random slopes, missing outcomes and predictors
7.10.7 Random slopes, interactions
7.10.8 Recipes
7.11 Future research

8 Individual causal effects
8.1 Need for individual causal effects
8.2 Problem of causal inference
8.3 Framework
8.4 Generating imputations by FCS
8.4.1 Naive FCS
8.4.2 FCS with a prior for p
8.4.3 Extensions
8.5 Bibliographic notes

III Case studies

9 Measurement issues
9.1 Too many columns
9.1.1 Scientific question
9.1.2 Leiden 85+ Cohort
9.1.3 Data exploration
9.1.4 Outflux
9.1.5 Finding problems: loggedEvents
9.1.6 Quick predictor selection: quickpred
9.1.7 Generating the imputations
9.1.8 A further improvement: Survival as predictor variable
9.1.9 Some guidance
9.2 Sensitivity analysis
9.2.1 Causes and consequences of missing data
9.2.2 Scenarios
9.2.3 Generating imputations under the δ-adjustment
9.2.4 Complete-data model
9.2.5 Conclusion
9.3 Correct prevalence estimates from self-reported data
9.3.1 Description of the problem
9.3.2 Don't count on predictions
9.3.3 The main idea
9.3.4 Data
9.3.5 Application
9.3.6 Conclusion
9.4 Enhancing comparability
9.4.1 Description of the problem
9.4.2 Full dependence: Simple equating
9.4.3 Independence: Imputation without a bridge study
9.4.4 Fully dependent or independent?
9.4.5 Imputation using a bridge study
9.4.6 Interpretation
9.4.7 Conclusion
9.5 Exercises

10 Selection issues
10.1 Correcting for selective drop-out
10.1.1 POPS study: 19 years follow-up
10.1.2 Characterization of the drop-out
10.1.3 Imputation model
10.1.4 A solution "that does not look good"
10.1.5 Results
10.1.6 Conclusion
10.2 Correcting for nonresponse
10.2.1 Fifth Dutch Growth Study
10.2.2 Nonresponse
10.2.3 Comparison to known population totals
10.2.4 Augmenting the sample
10.2.5 Imputation model
10.2.6 Influence of nonresponse on final height
10.2.7 Discussion
10.3 Exercises

11 Longitudinal data
11.1 Long and wide format
11.2 SE Fireworks Disaster Study
11.2.1 Intention to treat
11.2.2 Imputation model
11.2.3 Inspecting imputations
11.2.4 Complete-data model
11.2.5 Results from the complete-data model
11.3 Time raster imputation
11.3.1 Change score
11.3.2 Scientific question: Critical periods
11.3.3 Broken stick model
11.3.4 Terneuzen Birth Cohort
11.3.5 Shrinkage and the change score
11.3.6 Imputation
11.3.7 Complete-data model
11.4 Conclusion
11.5 Exercises

IV Extensions

12 Conclusion
12.1 Some dangers, some do's and some don'ts
12.1.1 Some dangers
12.1.2 Some do's
12.1.3 Some don'ts
12.2 Reporting
12.2.1 Reporting guidelines
12.2.2 Template
12.3 Other applications
12.3.1 Synthetic datasets for data protection
12.3.2 Analysis of coarsened data
12.3.3 File matching of multiple datasets
12.3.4 Planned missing data for efficient designs
12.3.5 Adjusting for verification bias
12.4 Future developments
12.4.1 Derived variables
12.4.2 Algorithms for blocks and batches
12.4.3 Nested imputation
12.4.4 Better trials with dynamic treatment regimes
12.4.5 Distribution-free pooling rules
12.4.6 Improved diagnostic techniques
12.4.7 Building block in modular statistics
12.5 Exercises

References
Author index
Subject index

Stef van Buuren is a statistical consultant at the Netherlands Organisation for Applied Scientific Research TNO in Leiden with a broad knowledge of quantitative issues in public health. Since 2015, Van Buuren has been the world's first Professor of Missing Data at the department of Methodology & Statistics, FSS, University of Utrecht. He is the originator of various new statistical tools.

Permanent link: https://www.kriso.lv/db/9780429492259_pe.html
