| Preface | vii | |
| 1 Introducing Synthetic Data Generation | 1 | (22) |
| | 1 | (3) |
| | 2 | (1) |
| Synthesis Without Real Data | 2 | (1) |
| | 3 | (1) |
| The Benefits of Synthetic Data | 4 | (4) |
| | 4 | (1) |
| Enabling Better Analytics | 5 | (1) |
| Synthetic Data as a Proxy | 6 | (1) |
| Learning to Trust Synthetic Data | 6 | (2) |
| Synthetic Data Case Studies | 8 | (13) |
| Manufacturing and Distribution | 9 | (2) |
| | 11 | (6) |
| | 17 | (2) |
| | 19 | (2) |
| | 21 | (2) |
| 2 Implementing Data Synthesis | 23 | (26) |
| | 24 | (1) |
| | 24 | (1) |
| Trade-Offs in Selecting PETs to Enable Data Access | 25 | (14) |
| | 28 | (1) |
| | 29 | (4) |
| | 33 | (3) |
| Examples of Applying the Decision Framework | 36 | (3) |
| | 39 | (3) |
| | 39 | (2) |
| | 41 | (1) |
| The Data Synthesis Pipeline | 42 | (5) |
| Synthesis Program Management | 47 | (1) |
| | 48 | (1) |
| 3 Getting Started: Distribution Fitting | 49 | (20) |
| | 50 | (1) |
| | 50 | (10) |
| Fitting Distributions to Real Data | 60 | (2) |
| Generating Synthetic Data from a Distribution | 62 | (5) |
| Measuring How Well Synthetic Data Fits a Distribution | 62 | (1) |
| | 63 | (4) |
| | 67 | (1) |
| | 67 | (2) |
| 4 Evaluating Synthetic Data Utility | 69 | (26) |
| Synthetic Data Utility Framework: Replication of Analysis | 71 | (3) |
| Synthetic Data Utility Framework: Utility Metrics | 74 | (18) |
| Comparing Univariate Distributions | 75 | (4) |
| Comparing Bivariate Statistics | 79 | (4) |
| Comparing Multivariate Prediction Models | 83 | (4) |
| | 87 | (5) |
| | 92 | (3) |
| 5 Methods for Synthesizing Data | 95 | (20) |
| Generating Synthetic Data from Theory | 95 | (4) |
| Sampling from a Multivariate Normal Distribution | 96 | (1) |
| Inducing Correlations with Specified Marginal Distributions | 97 | (1) |
| Copulas with Known Marginal Distributions | 98 | (1) |
| Generating Realistic Synthetic Data | 99 | (4) |
| Fitting Real Data to Known Distributions | 101 | (1) |
| Using Machine Learning to Fit the Distributions | 102 | (1) |
| | 103 | (3) |
| | 106 | (1) |
| | 107 | (1) |
| | 108 | (4) |
| | 112 | (3) |
| 6 Identity Disclosure in Synthetic Data | 115 | (22) |
| | 116 | (7) |
| | 116 | (1) |
| | 117 | (1) |
| | 117 | (2) |
| | 119 | (1) |
| Meaningful Identity Disclosure | 120 | (1) |
| Defining Information Gain | 121 | (1) |
| | 121 | (1) |
| | 122 | (1) |
| How Privacy Law Impacts the Creation and Use of Synthetic Data | 123 | (12) |
| | 125 | (4) |
| | 129 | (1) |
| | 130 | (3) |
| Article 29 Working Party Opinion | 133 | (2) |
| | 135 | (2) |
| 7 Practical Data Synthesis | 137 | (10) |
| | 137 | (5) |
| For Every Pre-Processing Step There Is a Post-Processing Step | 138 | (1) |
| | 138 | (1) |
| | 138 | (1) |
| Not All Fields Have to Be Synthesized | 139 | (1) |
| | 140 | (1) |
| | 141 | (1) |
| | 141 | (1) |
| Missing Data and Other Data Characteristics | 141 | (1) |
| | 142 | (1) |
| Organizing Data Synthesis | 142 | (4) |
| | 142 | (1) |
| | 143 | (1) |
| Synthesizing Cohorts Versus Full Datasets | 143 | (1) |
| | 144 | (1) |
| Privacy Assurance as Certification | 144 | (1) |
| Performing Validation Studies to Get Buy-In | 144 | (1) |
| | 145 | (1) |
| | 145 | (1) |
| | 146 | (1) |
| Index | 147 | |