Practical Synthetic Data Generation: Balancing Privacy and the Broad Availability of Data [Paperback]

4.00/5 (17 ratings by Goodreads)

Khaled El Emam, Lucy Mosquera, Richard Hoptroff

Format: Paperback / softback, 175 pages, height x width: 232 x 178 mm
Publication date: 02-Jun-2020
Publisher: O'Reilly Media
ISBN-10: 1492072745
ISBN-13: 9781492072744

Other books on this topic:

Databases

Paperback
Price: €60.87*
* This is the final price, i.e., no additional discounts apply.
Standard price: €71.61
Save 15%
Delivery takes 3-4 weeks if the book is in stock at the publisher's warehouse. If the publisher needs to print a new run, delivery may be delayed.
Delivery time: 4-6 weeks

Permanent link: https://www.kriso.lv/db/9781492072744.html

Building and testing machine learning models requires access to large and diverse data. But where can you find usable datasets without running into privacy issues? This practical book introduces techniques for generating synthetic data (fake data generated from real data) so you can perform secondary analysis to do research, understand customer behaviors, develop new products, or generate new revenue.

Data scientists will learn how synthetic data generation provides a way to make such data broadly available for secondary purposes while addressing many privacy concerns. Analysts will learn the principles and steps for generating synthetic data from real datasets. And business leaders will see how synthetic data can help accelerate time to a product or solution.

This book describes:

Steps for generating synthetic data using multivariate normal distributions
Methods for distribution fitting covering different goodness-of-fit metrics
How to replicate the simple structure of original data
An approach for modeling data structure to consider complex relationships
Multiple approaches and metrics you can use to assess data utility
How analysis performed on real data can be replicated with synthetic data
Privacy implications of synthetic data and methods to assess identity disclosure
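
The first topic in the list, generating synthetic data from a multivariate normal distribution, can be illustrated with a short sketch. It is not the book's code: it assumes purely numeric columns and uses only the Python standard library. It estimates a mean vector and covariance matrix from the real rows, factors the covariance with a Cholesky decomposition, and maps independent standard normal draws through that factor, so the synthetic rows have approximately the same means and covariances as the real ones. The function names (`fit_mvn`, `cholesky`, `sample_mvn`) and the example data are hypothetical.

```python
import math
import random


def fit_mvn(rows):
    """Estimate the mean vector and sample covariance matrix of numeric rows."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [
        [sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in rows) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    return mean, cov


def cholesky(cov):
    """Return lower-triangular L with L * L^T == cov (cov must be positive definite)."""
    d = len(cov)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def sample_mvn(mean, cov, n_samples, seed=None):
    """Draw synthetic rows x = mean + L z, where z ~ N(0, I) and L L^T = cov."""
    rng = random.Random(seed)
    L = cholesky(cov)
    d = len(mean)
    rows = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        rows.append([mean[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(d)])
    return rows


if __name__ == "__main__":
    # Hypothetical stand-in for "real" data: 1,000 rows from a known distribution.
    real = sample_mvn([50.0, 120.0], [[25.0, 15.0], [15.0, 100.0]], 1000, seed=0)
    mean, cov = fit_mvn(real)                    # fit the model to the real rows
    synthetic = sample_mvn(mean, cov, 1000, seed=1)  # generate synthetic rows
    print("fitted mean:   ", [round(m, 2) for m in mean])
    print("synthetic mean:", [round(m, 2) for m in fit_mvn(synthetic)[0]])
```

A multivariate normal fit reproduces means, variances, and linear correlations only; skewed, bounded, or categorical columns are not captured by this sketch.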

Preface

1 Introducing Synthetic Data Generation
  Defining Synthetic Data
  Synthesis from Real Data
  Synthesis Without Real Data
  Synthesis and Utility
  The Benefits of Synthetic Data
  Efficient Access to Data
  Enabling Better Analytics
  Synthetic Data as a Proxy
  Learning to Trust Synthetic Data
  Synthetic Data Case Studies
  Manufacturing and Distribution
  Healthcare
  Financial Services
  Transportation
  Summary

2 Implementing Data Synthesis
  When to Synthesize
  Identifiability Spectrum
  Trade-Offs in Selecting PETs to Enable Data Access
  Decision Criteria
  PETs Considered
  Decision Framework
  Examples of Applying the Decision Framework
  Data Synthesis Projects
  Data Synthesis Steps
  Data Preparation
  The Data Synthesis Pipeline
  Synthesis Program Management
  Summary

3 Getting Started: Distribution Fitting
  Framing Data
  How Data Is Distributed
  Fitting Distributions to Real Data
  Generating Synthetic Data from a Distribution
  Measuring How Well Synthetic Data Fits a Distribution
  The Overfitting Dilemma
  A Little Light Weeding
  Summary

4 Evaluating Synthetic Data Utility
  Synthetic Data Utility Framework: Replication of Analysis
  Synthetic Data Utility Framework: Utility Metrics
  Comparing Univariate Distributions
  Comparing Bivariate Statistics
  Comparing Multivariate Prediction Models
  Distinguishability
  Summary

5 Methods for Synthesizing Data
  Generating Synthetic Data from Theory
  Sampling from a Multivariate Normal Distribution
  Inducing Correlations with Specified Marginal Distributions
  Copulas with Known Marginal Distributions
  Generating Realistic Synthetic Data
  Fitting Real Data to Known Distributions
  Using Machine Learning to Fit the Distributions
  Hybrid Synthetic Data
  Machine Learning Methods
  Deep Learning Methods
  Synthesizing Sequences
  Summary

6 Identity Disclosure in Synthetic Data
  Types of Disclosure
  Identity Disclosure
  Learning Something New
  Attribute Disclosure
  Inferential Disclosure
  Meaningful Identity Disclosure
  Defining Information Gain
  Bringing It All Together
  Unique Matches
  How Privacy Law Impacts the Creation and Use of Synthetic Data
  Issues Under the GDPR
  Issues Under the CCPA
  Issues Under HIPAA
  Article 29 Working Party Opinion
  Summary

7 Practical Data Synthesis
  Managing Data Complexity
  For Every Pre-Processing Step There Is a Post-Processing Step
  Field Types
  The Need for Rules
  Not All Fields Have to Be Synthesized
  Synthesizing Dates
  Synthesizing Geography
  Lookup Fields and Tables
  Missing Data and Other Data Characteristics
  Partial Synthesis
  Organizing Data Synthesis
  Computing Capacity
  A Toolbox of Techniques
  Synthesizing Cohorts Versus Full Datasets
  Continuous Data Feeds
  Privacy Assurance as Certification
  Performing Validation Studies to Get Buy-In
  Motivated Intruder Tests
  Who Owns Synthetic Data?
  Conclusions

Index

Dr. Khaled El Emam is a senior scientist at the Children's Hospital of Eastern Ontario (CHEO) Research Institute and Director of the multidisciplinary Electronic Health Information Laboratory.

Lucy Mosquera has a bachelor's degree in Biology and Mathematics from Queen's University and is a graduate student in the Department of Statistics at the University of British Columbia. During her time at Queen's, she provided data management support on a dozen clinical trials and observational studies run through Kingston General Hospital's Clinical Evaluation Research Unit. She has also worked on clinical trial data sharing methods based on homomorphic encryption and secret sharing protocols. At Replica Analytics, she is responsible for developing statistical and machine learning models for data generation, integrating subject-area expertise in clinical trial data into synthetic data generation methods, and statistically assessing the company's synthetic data generation.

Dr. Richard Hoptroff is a long-time technology inventor, investor, and entrepreneur.
