
E-book: Data Science for Mathematicians [Taylor & Francis e-book]

Edited by Nathan Carter
  • Taylor & Francis e-book
  • Price: 231,23 €*
  • * This price gives unlimited concurrent access for unlimited time
  • Standard price: 330,33 €
  • Save 30%

Mathematicians have skills that, if deepened in the right ways, would enable them to use data to answer questions important to them and others, and to report those answers in compelling ways. Data science combines parts of mathematics, statistics, and computer science. Gaining such power, and the ability to teach it, has reinvigorated the careers of many mathematicians. This handbook will help mathematicians better understand the opportunities presented by data science, a fast-growing field, as it applies to curricula, research, and careers. Contributors from both academia and industry present their views on these opportunities and how to take advantage of them.

Foreword xv
1 Introduction
1(12)
Nathan Carter
1.1 Who should read this book?
1(2)
1.2 What is data science?
3(3)
1.3 Is data science new?
6(2)
1.4 What can I expect from this book?
8(2)
1.5 What will this book expect from me?
10(3)
2 Programming with Data
13(30)
Sean Raleigh
2.1 Introduction
14(1)
2.2 The computing environment
14(9)
2.2.1 Hardware
14(1)
2.2.2 The command line
15(1)
2.2.3 Programming languages
16(1)
2.2.4 Integrated development environments (IDEs)
17(1)
2.2.5 Notebooks
18(4)
2.2.6 Version control
22(1)
2.3 Best practices
23(7)
2.3.1 Write readable code
23(3)
2.3.2 Don't repeat yourself
26(1)
2.3.3 Set seeds for random processes
27(1)
2.3.4 Profile, benchmark, and optimize judiciously
27(1)
2.3.5 Test your code
28(1)
2.3.6 Don't rely on black boxes
29(1)
2.4 Data-centric coding
30(11)
2.4.1 Obtaining data
30(1)
2.4.1.1 Files
30(1)
2.4.1.2 The web
31(1)
2.4.1.3 Databases
31(2)
2.4.1.4 Other sources and concerns
33(1)
2.4.2 Data structures
34(1)
2.4.3 Cleaning data
35(1)
2.4.3.1 Missing data
36(1)
2.4.3.2 Data values
37(1)
2.4.3.3 Outliers
38(1)
2.4.3.4 Other issues
38(2)
2.4.4 Exploratory data analysis (EDA)
40(1)
2.5 Getting help
41(1)
2.6 Conclusion
41(2)
3 Linear Algebra
43(56)
Jeffery Leader
3.1 Data and matrices
44(7)
3.1.1 Data, vectors, and matrices
44(2)
3.1.2 Term-by-document matrices
46(1)
3.1.3 Matrix storage and manipulation issues
47(4)
3.2 Matrix decompositions
51(27)
3.2.1 Matrix decompositions and data science
51(1)
3.2.2 The LU decomposition
51(1)
3.2.2.1 Gaussian elimination
51(2)
3.2.2.2 The matrices L and U
53(2)
3.2.2.3 Permuting rows
55(1)
3.2.2.4 Computational notes
56(2)
3.2.3 The Cholesky decomposition
58(2)
3.2.4 Least-squares curve-fitting
60(3)
3.2.5 Recommender systems and the QR decomposition
63(1)
3.2.5.1 A motivating example
63(2)
3.2.5.2 The QR decomposition
65(5)
3.2.5.3 Applications of the QR decomposition
70(1)
3.2.6 The singular value decomposition
71(3)
3.2.6.1 SVD in our recommender system
74(3)
3.2.6.2 Further reading on the SVD
77(1)
3.3 Eigenvalues and eigenvectors
78(14)
3.3.1 Eigenproblems
78(4)
3.3.2 Finding eigenvalues
82(2)
3.3.3 The power method
84(2)
3.3.4 PageRank
86(6)
3.4 Numerical computing
92(3)
3.4.1 Floating point computing
92(1)
3.4.2 Floating point arithmetic
92(2)
3.4.3 Further reading
94(1)
3.5 Projects
95(4)
3.5.1 Creating a database
95(1)
3.5.2 The QR decomposition and query-matching
96(1)
3.5.3 The SVD and latent semantic indexing
96(1)
3.5.4 Searching a web
96(3)
4 Basic Statistics
99(86)
David White
4.1 Introduction
100(3)
4.2 Exploratory data analysis and visualizations
103(8)
4.2.1 Descriptive statistics
106(3)
4.2.2 Sampling and bias
109(2)
4.3 Modeling
111(13)
4.3.1 Linear regression
112(4)
4.3.2 Polynomial regression
116(1)
4.3.3 Group-wise models and clustering
117(1)
4.3.4 Probability models
118(4)
4.3.5 Maximum likelihood estimation
122(2)
4.4 Confidence intervals
124(9)
4.4.1 The sampling distribution
125(2)
4.4.2 Confidence intervals from the sampling distribution
127(3)
4.4.3 Bootstrap resampling
130(3)
4.5 Inference
133(12)
4.5.1 Hypothesis testing
133(1)
4.5.1.1 First example
133(3)
4.5.1.2 General strategy for hypothesis testing
136(1)
4.5.1.3 Inference to compare two populations
137(1)
4.5.1.4 Other types of hypothesis tests
138(1)
4.5.2 Randomization-based inference
139(3)
4.5.3 Type I and Type II error
142(1)
4.5.4 Power and effect size
142(1)
4.5.5 The trouble with p-hacking
143(1)
4.5.6 Bias and scope of inference
144(1)
4.6 Advanced regression
145(14)
4.6.1 Transformations
145(1)
4.6.2 Outliers and high leverage points
146(2)
4.6.3 Multiple regression, interaction
148(4)
4.6.4 What to do when the regression assumptions fail
152(3)
4.6.5 Indicator variables and ANOVA
155(4)
4.7 The linear algebra approach to statistics
159(14)
4.7.1 The general linear model
160(5)
4.7.2 Ridge regression and penalized regression
165(1)
4.7.3 Logistic regression
166(5)
4.7.4 The generalized linear model
171(1)
4.7.5 Categorical data analysis
172(1)
4.8 Causality
173(4)
4.8.1 Experimental design
173(3)
4.8.2 Quasi-experiments
176(1)
4.9 Bayesian statistics
177(3)
4.9.1 Bayes' formula
177(1)
4.9.2 Prior and posterior distributions
178(2)
4.10 A word on curricula
180(2)
4.10.1 Data wrangling
180(1)
4.10.2 Cleaning data
181(1)
4.11 Conclusion
182(1)
4.12 Sample projects
182(3)
5 Clustering
185(54)
Amy S. Wagaman
5.1 Introduction
186(2)
5.1.1 What is clustering?
186(1)
5.1.2 Example applications
186(1)
5.1.3 Clustering observations
187(1)
5.2 Visualization
188(1)
5.3 Distances
189(4)
5.4 Partitioning and the k-means algorithm
193(11)
5.4.1 The k-means algorithm
193(2)
5.4.2 Issues with k-means
195(2)
5.4.3 Example with wine data
197(3)
5.4.4 Validation
200(4)
5.4.5 Other partitioning algorithms
204(1)
5.5 Hierarchical clustering
204(7)
5.5.1 Linkages
205(1)
5.5.2 Algorithm
206(1)
5.5.3 Hierarchical simple example
207(1)
5.5.4 Dendrograms and wine example
208(3)
5.5.5 Other hierarchical algorithms
211(1)
5.6 Case study
211(6)
5.6.1 k-means results
212(2)
5.6.2 Hierarchical results
214(1)
5.6.3 Case study conclusions
215(2)
5.7 Model-based methods
217(7)
5.7.1 Model development
217(1)
5.7.2 Model estimation
218(2)
5.7.3 Mclust and model selection
220(1)
5.7.4 Example with wine data
220(1)
5.7.5 Model-based versus k-means
221(3)
5.8 Density-based methods
224(4)
5.8.1 Example with iris data
226(2)
5.9 Dealing with network data
228(4)
5.9.1 Network clustering example
229(3)
5.10 Challenges
232(2)
5.10.1 Feature selection
232(1)
5.10.2 Hierarchical clusters
233(1)
5.10.3 Overlapping clusters, or fuzzy clustering
234(1)
5.11 Exercises
234(5)
6 Operations Research
239(52)
Alice Paul
Susan Martonosi
6.1 History and background
241(3)
6.1.1 How does OR connect to data science?
241(1)
6.1.2 The OR process
242(1)
6.1.3 Balance between efficiency and complexity
243(1)
6.2 Optimization
244(16)
6.2.1 Complexity-tractability trade-off
246(1)
6.2.2 Linear optimization
247(2)
6.2.2.1 Duality and optimality conditions
249(3)
6.2.2.2 Extension to integer programming
252(1)
6.2.3 Convex optimization
252(4)
6.2.3.1 Duality and optimality conditions
256(2)
6.2.4 Non-convex optimization
258(2)
6.3 Simulation
260(13)
6.3.1 Probability principles of simulation
261(1)
6.3.2 Generating random variables
262(1)
6.3.2.1 Simulation from a known distribution
262(5)
6.3.2.2 Simulation from an empirical distribution: bootstrapping
267(1)
6.3.2.3 Markov Chain Monte Carlo (MCMC) methods
267(2)
6.3.3 Simulation techniques for statistical and machine learning model assessment
269(1)
6.3.3.1 Bootstrapping confidence intervals
269(1)
6.3.3.2 Cross-validation
270(1)
6.3.4 Simulation techniques for prescriptive analytics
271(1)
6.3.4.1 Discrete-event simulation
272(1)
6.3.4.2 Agent-based modeling
272(1)
6.3.4.3 Using these tools for prescriptive analytics
273(1)
6.4 Stochastic optimization
273(4)
6.4.1 Dynamic programming formulation
274(1)
6.4.2 Solution techniques
275(2)
6.5 Putting the methods to use: prescriptive analytics
277(3)
6.5.1 Bike-sharing systems
277(1)
6.5.2 A customer choice model for online retail
278(1)
6.5.3 HIV treatment and prevention
279(1)
6.6 Tools
280(3)
6.6.1 Optimization solvers
281(1)
6.6.2 Simulation software and packages
282(1)
6.6.3 Stochastic optimization software and packages
283(1)
6.7 Looking to the future
283(2)
6.8 Projects
285(6)
6.8.1 The vehicle routing problem
285(1)
6.8.2 The unit commitment problem for power systems
286(3)
6.8.3 Modeling project
289(1)
6.8.4 Data project
289(2)
7 Dimensionality Reduction
291(48)
Sofya Chepushtanova
Elin Farnell
Eric Kehoe
Michael Kirby
Henry Kvinge
7.1 Introduction
292(2)
7.2 The geometry of data and dimension
294(4)
7.3 Principal Component Analysis
298(6)
7.3.1 Derivation and properties
298(2)
7.3.2 Connection to SVD
300(1)
7.3.3 How PCA is used for dimension estimation and data reduction
300(1)
7.3.4 Topological dimension
301(2)
7.3.5 Multidimensional scaling
303(1)
7.4 Good projections
304(2)
7.5 Non-integer dimensions
306(6)
7.5.1 Background on dynamical systems
307(1)
7.5.2 Fractal dimension
308(1)
7.5.3 The correlation dimension
309(2)
7.5.4 Correlation dimension of the Lorenz attractor
311(1)
7.6 Dimension reduction on the Grassmannian
312(6)
7.7 Dimensionality reduction in the presence of symmetry
318(3)
7.8 Category theory applied to data visualization
321(5)
7.9 Other methods
326(7)
7.9.1 Nonlinear Principal Component Analysis
326(4)
7.9.2 Whitney's reduction network
330(1)
7.9.3 The generalized singular value decomposition
331(1)
7.9.4 False nearest neighbors
332(1)
7.9.5 Additional methods
332(1)
7.10 Interesting theorems on dimension
333(3)
7.10.1 Whitney's theorem
333(1)
7.10.2 Takens' theorem
333(1)
7.10.3 Nash embedding theorems
334(1)
7.10.4 Johnson-Lindenstrauss lemma
335(1)
7.11 Conclusions
336(3)
7.11.1 Summary and method of application
336(1)
7.11.2 Suggested exercises
336(3)
8 Machine Learning
339(70)
Mahesh Agarwal
Nathan Carter
David Oury
8.1 Introduction
340(2)
8.1.1 Core concepts of supervised learning
341(1)
8.1.2 Types of supervised learning
342(1)
8.2 Training dataset and test dataset
342(4)
8.2.1 Constraints
342(2)
8.2.2 Methods for data separation
344(2)
8.3 Machine learning workflow
346(14)
8.3.1 Step 1: obtaining the initial dataset
348(2)
8.3.2 Step 2: preprocessing
350(1)
8.3.2.1 Missing values and outliers
351(1)
8.3.2.2 Feature engineering
352(1)
8.3.3 Step 3: creating training and test datasets
353(1)
8.3.4 Step 4: model creation
354(1)
8.3.4.1 Scaling and normalization
354(1)
8.3.4.2 Feature selection
355(2)
8.3.5 Step 5: prediction and evaluation
357(1)
8.3.6 Iterative model building
358(2)
8.4 Implementing the ML workflow
360(4)
8.4.1 Using scikit-learn
360(3)
8.4.2 Transformer objects
363(1)
8.5 Gradient descent
364(6)
8.5.1 Loss functions
364(1)
8.5.2 A powerful optimization tool
365(1)
8.5.3 Application to regression
366(1)
8.5.4 Support for regularization
367(3)
8.6 Logistic regression
370(7)
8.6.1 Logistic regression framework
371(1)
8.6.2 Parameter estimation for logistic regression
371(2)
8.6.3 Evaluating the performance of a classifier
373(4)
8.7 Naive Bayes classifier
377(5)
8.7.1 Using Bayes' rule
377(2)
8.7.1.1 Estimating the probabilities
379(1)
8.7.1.2 Laplace smoothing
379(1)
8.7.2 Health care example
380(2)
8.8 Support vector machines
382(10)
8.8.1 Linear SVMs in the case of linear separability
383(3)
8.8.2 Linear SVMs without linear separability
386(3)
8.8.3 Nonlinear SVMs
389(3)
8.9 Decision trees
392(10)
8.9.1 Classification trees
395(3)
8.9.2 Regression decision trees
398(1)
8.9.3 Pruning
399(3)
8.10 Ensemble methods
402(4)
8.10.1 Bagging
403(1)
8.10.2 Random forests
403(1)
8.10.3 Boosting
404(2)
8.11 Next steps
406(3)
9 Deep Learning
409(32)
Samuel S. Watson
9.1 Introduction
410(3)
9.1.1 Overview
410(1)
9.1.2 History of neural networks
411(2)
9.2 Multilayer perceptrons
413(5)
9.2.1 Backpropagation
414(3)
9.2.2 Neurons
417(1)
9.2.3 Neural networks for classification
417(1)
9.3 Training techniques
418(4)
9.3.1 Initialization
419(1)
9.3.2 Optimization algorithms
419(2)
9.3.3 Dropout
421(1)
9.3.4 Batch normalization
421(1)
9.3.5 Weight regularization
421(1)
9.3.6 Early stopping
422(1)
9.4 Convolutional neural networks
422(7)
9.4.1 Convnet layers
423(1)
9.4.2 Convolutional architectures for ImageNet
424(5)
9.5 Recurrent neural networks
429(2)
9.5.1 LSTM cells
430(1)
9.6 Transformers
431(4)
9.6.1 Overview
431(1)
9.6.2 Attention layers
432(2)
9.6.3 Self-attention layers
434(1)
9.6.4 Word order
434(1)
9.6.5 Using transformers
434(1)
9.7 Deep learning frameworks
435(5)
9.7.1 Hardware acceleration
435(1)
9.7.2 History of deep learning frameworks
436(2)
9.7.3 TensorFlow with Keras
438(2)
9.8 Open questions
440(1)
9.9 Exercises and solutions
440(1)
10 Topological Data Analysis
441(34)
Henry Adams
Johnathan Bush
Joshua Mirth
10.1 Introduction
441(2)
10.2 Example applications
443(3)
10.2.1 Image processing
443(1)
10.2.2 Molecule configurations
443(2)
10.2.3 Agent-based modeling
445(1)
10.2.4 Dynamical systems
445(1)
10.3 Topology
446(1)
10.4 Simplicial complexes
447(2)
10.5 Homology
449(8)
10.5.1 Simplicial homology
450(1)
10.5.2 Homology definitions
451(1)
10.5.3 Homology example
452(1)
10.5.4 Homology computation using linear algebra
453(4)
10.6 Persistent homology
457(6)
10.7 Sublevelset persistence
463(1)
10.8 Software and exercises
464(3)
10.9 References
467(1)
10.10 Appendix: stability of persistent homology
467(8)
10.10.1 Distances between datasets
468(3)
10.10.2 Bottleneck distance and visualization
471(2)
10.10.3 Stability results
473(2)
Bibliography 475(40)
Index 515
Nathan Carter is a professor at Bentley University.