
Essential Math for AI: Next-Level Mathematics for Efficient and Successful AI Systems [Paperback]

3.81/5 (32 ratings by Goodreads)
  • Format: Paperback / softback, 602 pages, height x width: 233x178 mm
  • Publication date: 17-Jan-2023
  • Publisher: O'Reilly Media
  • ISBN-10: 1098107632
  • ISBN-13: 9781098107635
  • Paperback
  • Price: 73,03 €*
  • * This is the final price, i.e., no additional discounts apply
  • List price: 85,92 €
  • Save 15%
  • Delivery time is 3-4 weeks if the book is in stock at the publisher's warehouse. If the publisher needs to print a new run, delivery may be delayed.

Companies are scrambling to integrate AI into their systems and operations. But to build truly successful solutions, you need a firm grasp of the underlying mathematics. This accessible guide walks you through the math necessary to thrive in the AI field, focusing on real-world applications rather than dense academic theory.

Engineers, data scientists, and students alike will examine mathematical topics critical for AI--including regression, neural networks, optimization, backpropagation, convolution, Markov chains, and more--through popular applications such as computer vision, natural language processing, and automated systems. Supplementary Jupyter notebooks illustrate the examples with Python code and visualizations. Whether you're just beginning your career or have years of experience, this book gives you the foundation necessary to dive deeper in the field.

  • Understand the underlying mathematics powering AI systems, including generative adversarial networks, random graphs, large random matrices, mathematical logic, optimal control, and more
  • Learn how to adapt mathematical methods to different applications from completely different fields
  • Gain the mathematical fluency to interpret and explain how AI systems arrive at their decisions
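In the spirit of the book's supplementary notebooks, here is a minimal gradient descent sketch for fitting a line to simulated data. This example is illustrative only and is not taken from the book; the data, learning rate, and iteration count are arbitrary choices for the demonstration.

```python
# Minimal sketch: gradient descent for simple linear regression.
# Illustrative example, not from the book's notebooks.
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: y = 3x + 2 plus a little noise
x = rng.uniform(0, 1, 100)
y = 3 * x + 2 + rng.normal(0, 0.1, 100)

# Training function f(x) = w*x + b; loss = mean squared error
w, b = 0.0, 0.0
eta = 0.5  # learning rate hyperparameter
for _ in range(1000):
    y_hat = w * x + b
    grad_w = 2 * np.mean((y_hat - y) * x)  # dL/dw
    grad_b = 2 * np.mean(y_hat - y)        # dL/db
    w -= eta * grad_w  # update rule: w_new = w - eta * grad
    b -= eta * grad_b

print(w, b)  # recovered slope and intercept, close to 3 and 2
```

The same loop structure (compute predictions, compute the gradient of the loss, step against the gradient) scales up to the neural network optimization the book covers in chapter 4.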

Preface
1 Why Learn the Mathematics of AI?
  What Is AI?
  Why Is AI So Popular Now?
  What Is AI Able to Do?
  An AI Agent's Specific Tasks
  What Are AI's Limitations?
  What Happens When AI Systems Fail?
  Where Is AI Headed?
  Who Are the Current Main Contributors to the AI Field?
  What Math Is Typically Involved in AI?
  Summary and Looking Ahead
2 Data, Data, Data
  Data for AI
  Real Data Versus Simulated Data
  Mathematical Models: Linear Versus Nonlinear
  An Example of Real Data
  An Example of Simulated Data
  Mathematical Models: Simulations and AI
  Where Do We Get Our Data From?
  The Vocabulary of Data Distributions, Probability, and Statistics
  Random Variables
  Probability Distributions
  Marginal Probabilities
  The Uniform and the Normal Distributions
  Conditional Probabilities and Bayes' Theorem
  Conditional Probabilities and Joint Distributions
  Prior Distribution, Posterior Distribution, and Likelihood Function
  Mixtures of Distributions
  Sums and Products of Random Variables
  Using Graphs to Represent Joint Probability Distributions
  Expectation, Mean, Variance, and Uncertainty
  Covariance and Correlation
  Markov Process
  Normalizing, Scaling, and/or Standardizing a Random Variable or Data Set
  Common Examples
  Continuous Distributions Versus Discrete Distributions (Density Versus Mass)
  The Power of the Joint Probability Density Function
  Distribution of Data: The Uniform Distribution
  Distribution of Data: The Bell-Shaped Normal (Gaussian) Distribution
  Distribution of Data: Other Important and Commonly Used Distributions
  The Various Uses of the Word "Distribution"
  A/B Testing
  Summary and Looking Ahead
3 Fitting Functions to Data
  Traditional and Very Useful Machine Learning Models
  Numerical Solutions Versus Analytical Solutions
  Regression: Predict a Numerical Value
  Training Function
  Loss Function
  Optimization
  Logistic Regression: Classify into Two Classes
  Training Function
  Loss Function
  Optimization
  Softmax Regression: Classify into Multiple Classes
  Training Function
  Loss Function
  Optimization
  Incorporating These Models into the Last Layer of a Neural Network
  Other Popular Machine Learning Techniques and Ensembles of Techniques
  Support Vector Machines
  Decision Trees
  Random Forests
  K-means Clustering
  Performance Measures for Classification Models
  Summary and Looking Ahead
4 Optimization for Neural Networks
  The Brain Cortex and Artificial Neural Networks
  Training Function: Fully Connected, or Dense, Feed Forward Neural Networks
  A Neural Network Is a Computational Graph Representation of the Training Function
  Linearly Combine, Add Bias, Then Activate
  Common Activation Functions
  Universal Function Approximation
  Approximation Theory for Deep Learning
  Loss Functions
  Optimization
  Mathematics and the Mysterious Success of Neural Networks
  Gradient Descent: ω_{i+1} = ω_i − η∇L(ω_i)
  Explaining the Role of the Learning Rate Hyperparameter η
  Convex Versus Nonconvex Landscapes
  Stochastic Gradient Descent
  Initializing the Weights ω_0 for the Optimization Process
  Regularization Techniques
  Dropout
  Early Stopping
  Batch Normalization of Each Layer
  Control the Size of the Weights by Penalizing Their Norm
  Penalizing the l2 Norm Versus Penalizing the l1 Norm
  Explaining the Role of the Regularization Hyperparameter α
  Hyperparameter Examples That Appear in Machine Learning
  Chain Rule and Backpropagation: Calculating ∇L(ω_i)
  Backpropagation Is Not Too Different from How Our Brain Learns
  Why Is It Better to Backpropagate?
  Backpropagation in Detail
  Assessing the Significance of the Input Data Features
  Summary and Looking Ahead
5 Convolutional Neural Networks and Computer Vision
  Convolution and Cross-Correlation
  Translation Invariance and Translation Equivariance
  Convolution in Usual Space Is a Product in Frequency Space
  Convolution from a Systems Design Perspective
  Convolution and Impulse Response for Linear and Translation Invariant Systems
  Convolution and One-Dimensional Discrete Signals
  Convolution and Two-Dimensional Discrete Signals
  Filtering Images
  Feature Maps
  Linear Algebra Notation
  The One-Dimensional Case: Multiplication by a Toeplitz Matrix
  The Two-Dimensional Case: Multiplication by a Doubly Block Circulant Matrix
  Pooling
  A Convolutional Neural Network for Image Classification
  Summary and Looking Ahead
6 Singular Value Decomposition: Image Processing, Natural Language Processing, and Social Media
  Matrix Factorization
  Diagonal Matrices
  Matrices as Linear Transformations Acting on Space
  Action of A on the Right Singular Vectors
  Action of A on the Standard Unit Vectors and the Unit Square Determined by Them
  Action of A on the Unit Circle
  Breaking Down the Circle-to-Ellipse Transformation According to the Singular Value Decomposition
  Rotation and Reflection Matrices
  Action of A on a General Vector x
  Three Ways to Multiply Matrices
  The Big Picture
  The Condition Number and Computational Stability
  The Ingredients of the Singular Value Decomposition
  Singular Value Decomposition Versus the Eigenvalue Decomposition
  Computation of the Singular Value Decomposition
  Computing an Eigenvector Numerically
  The Pseudoinverse
  Applying the Singular Value Decomposition to Images
  Principal Component Analysis and Dimension Reduction
  Principal Component Analysis and Clustering
  A Social Media Application
  Latent Semantic Analysis
  Randomized Singular Value Decomposition
  Summary and Looking Ahead
7 Natural Language and Finance AI: Vectorization and Time Series
  Natural Language AI
  Preparing Natural Language Data for Machine Processing
  Statistical Models and the log Function
  Zipf's Law for Term Counts
  Various Vector Representations for Natural Language Documents
  Term Frequency Vector Representation of a Document or Bag of Words
  Term Frequency-Inverse Document Frequency Vector Representation of a Document
  Topic Vector Representation of a Document Determined by Latent Semantic Analysis
  Topic Vector Representation of a Document Determined by Latent Dirichlet Allocation
  Topic Vector Representation of a Document Determined by Latent Discriminant Analysis
  Meaning Vector Representations of Words and of Documents Determined by Neural Network Embeddings
  Cosine Similarity
  Natural Language Processing Applications
  Sentiment Analysis
  Spam Filter
  Search and Information Retrieval
  Machine Translation
  Image Captioning
  Chatbots
  Other Applications
  Transformers and Attention Models
  The Transformer Architecture
  The Attention Mechanism
  Transformers Are Far from Perfect
  Convolutional Neural Networks for Time Series Data
  Recurrent Neural Networks for Time Series Data
  How Do Recurrent Neural Networks Work?
  Gated Recurrent Units and Long Short-Term Memory Units
  An Example of Natural Language Data
  Finance AI
  Summary and Looking Ahead
8 Probabilistic Generative Models
  What Are Generative Models Useful For?
  The Typical Mathematics of Generative Models
  Shifting Our Brain from Deterministic Thinking to Probabilistic Thinking
  Maximum Likelihood Estimation
  Explicit and Implicit Density Models
  Explicit Density-Tractable: Fully Visible Belief Networks
  Example: Generating Images via PixelCNN and Machine Audio via WaveNet
  Explicit Density-Tractable: Change of Variables Nonlinear Independent Component Analysis
  Explicit Density-Intractable: Variational Autoencoders Approximation via Variational Methods
  Explicit Density-Intractable: Boltzmann Machine Approximation via Markov Chain
  Implicit Density-Markov Chain: Generative Stochastic Network
  Implicit Density-Direct: Generative Adversarial Networks
  How Do Generative Adversarial Networks Work?
  Example: Machine Learning and Generative Networks for High Energy Physics
  Other Generative Models
  Naive Bayes Classification Model
  Gaussian Mixture Model
  The Evolution of Generative Models
  Hopfield Nets
  Boltzmann Machine
  Restricted Boltzmann Machine (Explicit Density and Intractable)
  The Original Autoencoder
  Probabilistic Language Modeling
  Summary and Looking Ahead
9 Graph Models
  Graphs: Nodes, Edges, and Features for Each
  Example: PageRank Algorithm
  Inverting Matrices Using Graphs
  Cayley Graphs of Groups: Pure Algebra and Parallel Computing
  Message Passing Within a Graph
  The Limitless Applications of Graphs
  Brain Networks
  Spread of Disease
  Spread of Information
  Detecting and Tracking Fake News Propagation
  Web-Scale Recommendation Systems
  Fighting Cancer
  Biochemical Graphs
  Molecular Graph Generation for Drug and Protein Structure Discovery
  Citation Networks
  Social Media Networks and Social Influence Prediction
  Sociological Structures
  Bayesian Networks
  Traffic Forecasting
  Logistics and Operations Research
  Language Models
  Graph Structure of the Web
  Automatically Analyzing Computer Programs
  Data Structures in Computer Science
  Load Balancing in Distributed Networks
  Artificial Neural Networks
  Random Walks on Graphs
  Node Representation Learning
  Tasks for Graph Neural Networks
  Node Classification
  Graph Classification
  Clustering and Community Detection
  Graph Generation
  Influence Maximization
  Link Prediction
  Dynamic Graph Models
  Bayesian Networks
  A Bayesian Network Represents a Compactified Conditional Probability Table
  Making Predictions Using a Bayesian Network
  Bayesian Networks Are Belief Networks, Not Causal Networks
  Keep This in Mind About Bayesian Networks
  Chains, Forks, and Colliders
  Given a Data Set, How Do We Set Up a Bayesian Network for the Involved Variables?
  Graph Diagrams for Probabilistic Causal Modeling
  A Brief History of Graph Theory
  Main Considerations in Graph Theory
  Spanning Trees and Shortest Spanning Trees
  Cut Sets and Cut Vertices
  Planarity
  Graphs as Vector Spaces
  Realizability
  Coloring and Matching
  Enumeration
  Algorithms and Computational Aspects of Graphs
  Summary and Looking Ahead
10 Operations Research
  No Free Lunch
  Complexity Analysis and O() Notation
  Optimization: The Heart of Operations Research
  Thinking About Optimization
  Optimization: Finite Dimensions, Unconstrained
  Optimization: Finite Dimensions, Constrained, Lagrange Multipliers
  Optimization: Infinite Dimensions, Calculus of Variations
  Optimization on Networks
  Traveling Salesman Problem
  Minimum Spanning Tree
  Shortest Path
  Max-Flow Min-Cut
  Max-Flow Min-Cost
  The Critical Path Method for Project Design
  The n-Queens Problem
  Linear Optimization
  The General Form and the Standard Form
  Visualizing a Linear Optimization Problem in Two Dimensions
  Convex to Linear
  The Geometry of Linear Optimization
  The Simplex Method
  Transportation and Assignment Problems
  Duality, Lagrange Relaxation, Shadow Prices, Max-Min, Min-Max, and All That
  Sensitivity
  Game Theory and Multiagents
  Queuing
  Inventory
  Machine Learning for Operations Research
  Hamilton-Jacobi-Bellman Equation
  Operations Research for AI
  Summary and Looking Ahead
11 Probability
  Where Did Probability Appear in This Book?
  What More Do We Need to Know That Is Essential for AI?
  Causal Modeling and the Do Calculus
  An Alternative: The Do Calculus
  Paradoxes and Diagram Interpretations
  Monty Hall Problem
  Berkson's Paradox
  Simpson's Paradox
  Large Random Matrices
  Examples of Random Vectors and Random Matrices
  Main Considerations in Random Matrix Theory
  Random Matrix Ensembles
  Eigenvalue Density of the Sum of Two Large Random Matrices
  Essential Math for Large Random Matrices
  Stochastic Processes
  Bernoulli Process
  Poisson Process
  Random Walk
  Wiener Process or Brownian Motion
  Martingale
  Levy Process
  Branching Process
  Markov Chain
  Ito's Lemma
  Markov Decision Processes and Reinforcement Learning
  Examples of Reinforcement Learning
  Reinforcement Learning as a Markov Decision Process
  Reinforcement Learning in the Context of Optimal Control and Nonlinear Dynamics
  Python Library for Reinforcement Learning
  Theoretical and Rigorous Grounds
  Which Events Have a Probability?
  Can We Talk About a Wider Range of Random Variables?
  A Probability Triple (Sample Space, Sigma Algebra, Probability Measure)
  Where Is the Difficulty?
  Random Variable, Expectation, and Integration
  Distribution of a Random Variable and the Change of Variable Theorem
  Next Steps in Rigorous Probability Theory
  The Universality Theorem for Neural Networks
  Summary and Looking Ahead
12 Mathematical Logic
  Various Logic Frameworks
  Propositional Logic
  From Few Axioms to a Whole Theory
  Codifying Logic Within an Agent
  How Do Deterministic and Probabilistic Machine Learning Fit In?
  First-Order Logic
  Relationships Between For All and There Exist
  Probabilistic Logic
  Fuzzy Logic
  Temporal Logic
  Comparison with Human Natural Language
  Machines and Complex Mathematical Reasoning
  Summary and Looking Ahead
13 Artificial Intelligence and Partial Differential Equations
  What Is a Partial Differential Equation?
  Modeling with Differential Equations
  Models at Different Scales
  The Parameters of a PDE
  Changing One Thing in a PDE Can Be a Big Deal
  Can AI Step In?
  Numerical Solutions Are Very Valuable
  Continuous Functions Versus Discrete Functions
  PDE Themes from My Ph.D. Thesis
  Discretization and the Curse of Dimensionality
  Finite Differences
  Finite Elements
  Variational or Energy Methods
  Monte Carlo Methods
  Some Statistical Mechanics: The Wonderful Master Equation
  Solutions as Expectations of Underlying Random Processes
  Transforming the PDE
  Fourier Transform
  Laplace Transform
  Solution Operators
  Example Using the Heat Equation
  Example Using the Poisson Equation
  Fixed Point Iteration
  AI for PDEs
  Deep Learning to Learn Physical Parameter Values
  Deep Learning to Learn Meshes
  Deep Learning to Approximate Solution Operators of PDEs
  Numerical Solutions of High-Dimensional Differential Equations
  Simulating Natural Phenomena Directly from Data
  Hamilton-Jacobi-Bellman PDE for Dynamic Programming
  PDEs for AI?
  Other Considerations in Partial Differential Equations
  Summary and Looking Ahead
14 Artificial Intelligence, Ethics, Mathematics, Law, and Policy
  Good AI
  Policy Matters
  What Could Go Wrong?
  From Math to Weapons
  Chemical Warfare Agents
  AI and Politics
  Unintended Outcomes of Generative Models
  How to Fix It?
  Addressing Underrepresentation in Training Data
  Addressing Bias in Word Vectors
  Addressing Privacy
  Addressing Fairness
  Injecting Morality into AI
  Democratization and Accessibility of AI to Nonexperts
  Prioritizing High Quality Data
  Distinguishing Bias from Discrimination
  The Hype
  Final Thoughts
Index
Hala Nelson is an Associate Professor of Mathematics at James Madison University. She has a Ph.D. in mathematics from the Courant Institute of Mathematical Sciences at New York University. Prior to James Madison University, she was a postdoctoral assistant professor at the University of Michigan, Ann Arbor. She specializes in mathematical modeling and consults for emergency and infrastructure services in the public sector. She likes to translate complex ideas into simple and practical terms. To her, most mathematical concepts are painless and relatable, unless the person presenting them either does not understand them very well or is trying to show off. Other facts: Hala Nelson grew up in Lebanon during its brutal civil war. She lost her hair at a very young age in a missile explosion. This event, and many that followed, shaped her interests in human behavior, the nature of intelligence, and AI. Her dad taught her math, at home and in French, until she graduated high school. Her favorite quote from her dad about math is, "It is the one clean science".