Preface ix

0 Initialization 1
0.1 An Effective Theory Approach 2
0.2 The Theoretical Minimum 3

1 Pretraining 11
1.1 Gaussian Integrals 12
1.2 Probability, Correlation and Statistics, and All That 21
1.3 Nearly-Gaussian Distributions 26

2 Neural Networks 37
2.1 Function Approximation 37
2.2 Activation Functions 43
2.3 Ensembles 47

3 Effective Theory of Deep Linear Networks at Initialization 53
3.1 Deep Linear Networks 54
3.2 Criticality 56
3.3 Fluctuations 59
3.4 Chaos 65

4 RG Flow of Preactivations 71
4.1 First Layer: Good-Old Gaussian 73
4.2 Second Layer: Genesis of Non-Gaussianity 79
4.3 Deeper Layers: Accumulation of Non-Gaussianity 90
4.4 Marginalization Rules 96
4.5 Subleading Corrections 100
4.6 RG Flow and RG Flow 103

5 Effective Theory of Preactivations at Initialization 109
5.1 Criticality Analysis of the Kernel 110
5.2 Criticality for Scale-Invariant Activations 123
5.3 Universality Beyond Scale-Invariant Activations 125
5.3.1 General Strategy 126
5.3.2 No Criticality: Sigmoid, Softplus, Nonlinear Monomials, etc. 128
5.3.3 K* = 0 Universality Class: tanh, sin, etc. 130
5.3.4 Half-Stable Universality Classes: SWISH, etc. and GELU, etc. 135
5.4 Fluctuations 137
5.4.1 Fluctuations for the Scale-Invariant Universality Class 139
5.4.2 Fluctuations for the K* = 0 Universality Class 141
5.5 Finite-Angle Analysis for the Scale-Invariant Universality Class 146

6 Bayesian Learning 153
6.1 Bayesian Probability 154
6.2 Bayesian Inference and Neural Networks 156
6.2.1 Bayesian Model Fitting 157
6.2.2 Bayesian Model Comparison 165
6.3 Bayesian Inference at Infinite Width 169
6.3.1 The Evidence for Criticality 169
6.3.2 Let's Not Wire Together 173
6.3.3 Absence of Representation Learning 178
6.4 Bayesian Inference at Finite Width 179
6.4.1 Hebbian Learning, Inc. 179
6.4.2 Let's Wire Together 182
6.4.3 Presence of Representation Learning 186

7 Gradient-Based Learning 191
7.1 Supervised Learning 192
7.2 Gradient Descent and Function Approximation 194

8 RG Flow of the Neural Tangent Kernel 199
8.0 Forward Equation for the NTK 200
8.1 First Layer: Deterministic NTK 206
8.2 Second Layer: Fluctuating NTK 207
8.3 Deeper Layers: Accumulation of NTK Fluctuations 211
8.3.0 Interlude: Interlayer Correlations 211
8.3.1 NTK Mean 215
8.3.2 NTK-Preactivation Cross Correlations 216
8.3.3 NTK Variance 221

9 Effective Theory of the NTK at Initialization 227
9.1 Criticality Analysis of the NTK 228
9.2 Scale-Invariant Universality Class 233
9.3 K* = 0 Universality Class 236
9.4 Criticality, Exploding and Vanishing Problems, and None of That 241

10 Kernel Learning 247
10.1 A Small Step 248
10.1.1 No Wiring 250
10.1.2 No Representation Learning 250
10.2 A Giant Leap 252
10.2.1 Newton's Method 253
10.2.2 Algorithm Independence 257
10.2.3 Aside: Cross-Entropy Loss 259
10.2.4 Kernel Prediction 261
10.3 Generalization 264
10.3.1 Bias-Variance Tradeoff and Criticality 267
10.3.2 Interpolation and Extrapolation 277
10.4 Linear Models and Kernel Methods 282
10.4.1 Linear Models 282
10.4.2 Kernel Methods 284
10.4.3 Infinite-Width Networks as Linear Models 287

11 Representation Learning 291
11.1 Differential of the Neural Tangent Kernel 293
11.2 RG Flow of the dNTK 296
11.2.0 Forward Equation for the dNTK 297
11.2.1 First Layer: Zero dNTK 299
11.2.2 Second Layer: Nonzero dNTK 300
11.2.3 Deeper Layers: Growing dNTK 301
11.3 Effective Theory of the dNTK at Initialization 310
11.3.1 Scale-Invariant Universality Class 312
11.3.2 K* = 0 Universality Class 314
11.4 Nonlinear Models and Nearly-Kernel Methods 317
11.4.1 Nonlinear Models 318
11.4.2 Nearly-Kernel Methods 324
11.4.3 Finite-Width Networks as Nonlinear Models 330

∞ The End of Training 335
∞.1 Two More Differentials 337
∞.2 Training at Finite Width 347
∞.2.1 A Small Step Following a Giant Leap 351
∞.2.2 Many Many Steps of Gradient Descent 358
∞.2.3 Prediction at Finite Width 373
∞.3 RG Flow of the ddNTKs: The Full Expressions 384

ε Epilogue: Model Complexity from the Macroscopic Perspective 389

A Information in Deep Learning 399
A.1 Entropy and Mutual Information 400
A.2 Information at Infinite Width: Criticality 409
A.3 Information at Finite Width: Optimal Aspect Ratio 411

B Residual Learning 425
B.1 Residual Multilayer Perceptrons 428
B.2 Residual Infinite Width: Criticality Analysis 429
B.3 Residual Finite Width: Optimal Aspect Ratio 431
B.4 Residual Building Blocks 436

References 439
Index 445