E-book: Bandit Algorithms

4.57/5 (20 ratings by Goodreads)
Tor Lattimore (University of Alberta), Csaba Szepesvári (University of Alberta)
  • Format: PDF+DRM
  • Publication date: 16-Jul-2020
  • Publisher: Cambridge University Press
  • Language: eng
  • ISBN-13: 9781108687492
  • Price: 49.96 €*
  • * This is the final price, i.e. no additional discounts apply
  • This e-book is intended for personal use only. E-books cannot be returned, and payments for purchased e-books are not refunded.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital Rights Management (DRM)
    The publisher has supplied this book in encrypted form, which means that you need to install free software in order to unlock and read it. To read this e-book you need to create an Adobe ID. More information here. The e-book can be read and downloaded on up to 6 devices (by one user with the same Adobe ID).

    Required software
    To read this e-book on a mobile device (phone or tablet), you will need to install this free app: PocketBook Reader (iOS / Android)

    To download and read this e-book on a PC or Mac, you will need Adobe Digital Editions (this is a free app designed specifically for e-books. It is not the same as Adobe Reader, which you may already have on your computer).

    You cannot read this e-book on an Amazon Kindle.

"Decision-making in the face of uncertainty is a significant challenge in machine learning, and the multi-armed bandit model is a commonly used framework to address it. This comprehensive and rigorous introduction to the multi-armed bandit problem examines all the major settings, including stochastic, adversarial, and Bayesian frameworks. A focus on both mathematical intuition and carefully worked proofs makes this an excellent reference for established researchers and a helpful resource for graduate students in computer science, engineering, statistics, applied mathematics and economics. Linear bandits receive special attention as one of the most useful models in applications, while other chapters are dedicated to combinatorial bandits, ranking, non-stationary problems, Thompson sampling and pure exploration. The book ends with a peek into the world beyond bandits with an introduction to partial monitoring and learning in Markov decision processes"--

Reviews

'This year marks the 68th anniversary of 'multi-armed bandits' introduced by Herbert Robbins in 1952, and the 35th anniversary of his 1985 paper with me that advanced multi-armed bandit theory in new directions via the concept of 'regret' and a sharp asymptotic lower bound for the regret. This vibrant subject has attracted important multidisciplinary developments and applications. Bandit Algorithms gives it a comprehensive and up-to-date treatment, and meets the need for such books in instruction and research in the subject, as in a new course on contextual bandits and recommendation technology that I am developing at Stanford.' Tze L. Lai, Stanford University

'This is a timely book on the theory of multi-armed bandits, covering a very broad range of basic and advanced topics. The rigorous treatment combined with intuition makes it an ideal resource for anyone interested in the mathematical and algorithmic foundations of a fascinating and rapidly growing field of research.' Nicolò Cesa-Bianchi, University of Milan

'The field of bandit algorithms, in its modern form, and driven by prominent new applications, has been taking off in multiple directions. The book by Lattimore and Szepesvári is a timely contribution that will become a standard reference on the subject. The book offers a thorough exposition of an enormous amount of material, neatly organized in digestible pieces. It is mathematically rigorous, but also pleasant to read, rich in intuition and historical notes, and without superfluous details. Highly recommended.' John Tsitsiklis, Massachusetts Institute of Technology

Additional information

A comprehensive and rigorous introduction for graduate students and researchers, with applications in sequential decision-making problems.
Preface xiii
Notation xv
Part I Bandits, Probability and Concentration
1(72)
1 Introduction
3(9)
1.1 The Language of Bandits
4(3)
1.2 Applications
7(3)
1.3 Notes
10(1)
1.4 Bibliographic Remarks
10(2)
2 Foundations of Probability
12(24)
2.1 Probability Spaces and Random Elements
12(7)
2.2 σ-Algebras and Knowledge
19(1)
2.3 Conditional Probabilities
20(1)
2.4 Independence
21(2)
2.5 Integration and Expectation
23(2)
2.6 Conditional Expectation
25(4)
2.7 Notes
29(3)
2.8 Bibliographic Remarks
32(1)
2.9 Exercises
33(3)
3 Stochastic Processes and Markov Chains
36(9)
3.1 Stochastic Processes
37(1)
3.2 Markov Chains
37(2)
3.3 Martingales and Stopping Times
39(2)
3.4 Notes
41(1)
3.5 Bibliographic Remarks
42(1)
3.6 Exercises
43(2)
4 Stochastic Bandits
45(15)
4.1 Core Assumptions
45(1)
4.2 The Learning Objective
45(1)
4.3 Knowledge and Environment Classes
46(2)
4.4 The Regret
48(2)
4.5 Decomposing the Regret
50(1)
4.6 The Canonical Bandit Model
51(2)
4.7 The Canonical Bandit Model for Uncountable Action Sets
53(1)
4.8 Notes
54(1)
4.9 Bibliographical Remarks
55(1)
4.10 Exercises
56(4)
5 Concentration of Measure
60(13)
5.1 Tail Probabilities
60(1)
5.2 The Inequalities of Markov and Chebyshev
61(1)
5.3 The Cramer-Chernoff Method and Subgaussian Random Variables
62(2)
5.4 Notes
64(2)
5.5 Bibliographical Remarks
66(1)
5.6 Exercises
66(7)
Part II Stochastic Bandits with Finitely Many Arms
73(50)
6 The Explore-Then-Commit Algorithm
75(9)
6.1 Algorithm and Regret Analysis
75(3)
6.2 Notes
78(1)
6.3 Bibliographical Remarks
78(1)
6.4 Exercises
79(5)
7 The Upper Confidence Bound Algorithm
84(13)
7.1 The Optimism Principle
84(7)
7.2 Notes
91(1)
7.3 Bibliographical Remarks
92(1)
7.4 Exercises
92(5)
8 The Upper Confidence Bound Algorithm: Asymptotic Optimality
97(6)
8.1 Asymptotically Optimal UCB
97(3)
8.2 Notes
100(1)
8.3 Bibliographic Remarks
100(1)
8.4 Exercises
101(2)
9 The Upper Confidence Bound Algorithm: Minimax Optimality
103(9)
9.1 The MOSS Algorithm
103(3)
9.2 Two Problems
106(2)
9.3 Notes
108(1)
9.4 Bibliographic Remarks
108(1)
9.5 Exercises
109(3)
10 The Upper Confidence Bound Algorithm: Bernoulli Noise
112(11)
10.1 Concentration for Sums of Bernoulli Random Variables
112(3)
10.2 The KL-UCB Algorithm
115(3)
10.3 Notes
118(1)
10.4 Bibliographic Remarks
119(1)
10.5 Exercises
119(4)
Part III Adversarial Bandits with Finitely Many Arms
123(30)
11 The Exp3 Algorithm
127(15)
11.1 Adversarial Bandit Environments
127(2)
11.2 Importance-Weighted Estimators
129(2)
11.3 The Exp3 Algorithm
131(1)
11.4 Regret Analysis
131(4)
11.5 Notes
135(2)
11.6 Bibliographic Remarks
137(1)
11.7 Exercises
138(4)
12 The Exp3-IX Algorithm
142(11)
12.1 The Exp3-IX Algorithm
142(2)
12.2 Regret Analysis
144(4)
12.3 Notes
148(1)
12.4 Bibliographic Remarks
149(1)
12.5 Exercises
149(4)
Part IV Lower Bounds for Bandits with Finitely Many Arms
153(38)
13 Lower Bounds: Basic Ideas
155(5)
13.1 Main Ideas Underlying Minimax Lower Bounds
155(3)
13.2 Notes
158(1)
13.3 Bibliographic Remarks
158(1)
13.4 Exercises
159(1)
14 Foundations of Information Theory
160(10)
14.1 Entropy and Optimal Coding
160(2)
14.2 Relative Entropy
162(3)
14.3 Notes
165(2)
14.4 Bibliographic Remarks
167(1)
14.5 Exercises
167(3)
15 Minimax Lower Bounds
170(7)
15.1 Relative Entropy Between Bandits
170(1)
15.2 Minimax Lower Bounds
171(2)
15.3 Notes
173(1)
15.4 Bibliographic Remarks
174(1)
15.5 Exercises
174(3)
16 Instance-Dependent Lower Bounds
177(8)
16.1 Asymptotic Bounds
177(3)
16.2 Finite-Time Bounds
180(1)
16.3 Notes
181(1)
16.4 Bibliographic Remarks
181(1)
16.5 Exercises
181(4)
17 High-Probability Lower Bounds
185(6)
17.1 Stochastic Bandits
186(2)
17.2 Adversarial Bandits
188(2)
17.3 Notes
190(1)
17.4 Bibliographic Remarks
190(1)
17.5 Exercises
190(1)
Part V Contextual and Linear Bandits
191(74)
18 Contextual Bandits
193(12)
18.1 Contextual Bandits: One Bandit per Context
193(2)
18.2 Bandits with Expert Advice
195(2)
18.3 Exp4
197(1)
18.4 Regret Analysis
198(2)
18.5 Notes
200(1)
18.6 Bibliographic Remarks
201(1)
18.7 Exercises
202(3)
19 Stochastic Linear Bandits
205(14)
19.1 Stochastic Contextual Bandits
205(2)
19.2 Stochastic Linear Bandits
207(2)
19.3 Regret Analysis
209(2)
19.4 Notes
211(2)
19.5 Bibliographic Remarks
213(1)
19.6 Exercises
214(5)
20 Confidence Bounds for Least Squares Estimators
219(12)
20.1 Martingales and the Method of Mixtures
221(4)
20.2 Notes
225(1)
20.3 Bibliographic Remarks
226(1)
20.4 Exercises
227(4)
21 Optimal Design for Least Squares Estimators
231(5)
21.1 The Kiefer-Wolfowitz Theorem
231(2)
21.2 Notes
233(2)
21.3 Bibliographic Remarks
235(1)
21.4 Exercises
235(1)
22 Stochastic Linear Bandits with Finitely Many Arms
236(4)
22.1 Notes
237(1)
22.2 Bibliographic Remarks
238(1)
22.3 Exercises
238(2)
23 Stochastic Linear Bandits with Sparsity
240(10)
23.1 Sparse Linear Stochastic Bandits
240(1)
23.2 Elimination on the Hypercube
241(3)
23.3 Online to Confidence Set Conversion
244(4)
23.4 Sparse Online Linear Prediction
248(1)
23.5 Notes
248(1)
23.6 Bibliographical Remarks
249(1)
23.7 Exercises
249(1)
24 Minimax Lower Bounds for Stochastic Linear Bandits
250(8)
24.1 Hypercube
250(2)
24.2 Unit Ball
252(1)
24.3 Sparse Parameter Vectors
253(2)
24.4 Misspecified Models
255(1)
24.5 Notes
256(1)
24.6 Bibliographic Remarks
257(1)
24.7 Exercises
257(1)
25 Asymptotic Lower Bounds for Stochastic Linear Bandits
258(7)
25.1 An Asymptotic Lower Bound for Fixed Action Sets
258(4)
25.2 Clouds Looming for Optimism
262(1)
25.3 Notes
263(1)
25.4 Bibliographic Remarks
264(1)
25.5 Exercises
264(1)
Part VI Adversarial Linear Bandits
265(48)
26 Foundations of Convex Analysis
267(11)
26.1 Convex Sets and Functions
267(2)
26.2 Jensen's Inequality
269(1)
26.3 Bregman Divergence
269(2)
26.4 Legendre Functions
271(2)
26.5 Optimisation
273(1)
26.6 Projections
274(1)
26.7 Notes
274(1)
26.8 Bibliographic Remarks
275(1)
26.9 Exercises
275(3)
27 Exp3 for Adversarial Linear Bandits
278(8)
27.1 Exponential Weights for Linear Bandits
278(2)
27.2 Regret Analysis
280(1)
27.3 Continuous Exponential Weights
281(2)
27.4 Notes
283(1)
27.5 Bibliographic Remarks
283(1)
27.6 Exercises
284(2)
28 Follow-the-regularised-Leader and Mirror Descent
286(20)
28.1 Online Linear Optimisation
286(4)
28.2 Regret Analysis
290(4)
28.3 Application to Linear Bandits
294(1)
28.4 Linear Bandits on the Unit Ball
295(3)
28.5 Notes
298(3)
28.6 Bibliographic Remarks
301(1)
28.7 Exercises
301(5)
29 The Relation between Adversarial and Stochastic Linear Bandits
306(7)
29.1 Unified View
306(1)
29.2 Reducing Stochastic Linear Bandits to Adversarial Linear Bandits
307(1)
29.3 Stochastic Linear Bandits with Parameter Noise
308(1)
29.4 Contextual Linear Bandits
309(1)
29.5 Notes
310(1)
29.6 Bibliographic Remarks
311(1)
29.7 Exercises
311(2)
Part VII Other Topics
313(108)
30 Combinatorial Bandits
317(14)
30.1 Notation and Assumptions
317(1)
30.2 Applications
318(1)
30.3 Bandit Feedback
319(1)
30.4 Semi-bandit Feedback and Mirror Descent
320(1)
30.5 Follow-the-Perturbed-Leader
321(5)
30.6 Notes
326(1)
30.7 Bibliographic Remarks
327(1)
30.8 Exercises
328(3)
31 Non-stationary Bandits
331(9)
31.1 Adversarial Bandits
331(3)
31.2 Stochastic Bandits
334(2)
31.3 Notes
336(1)
31.4 Bibliographic Remarks
337(1)
31.5 Exercises
338(2)
32 Ranking
340(13)
32.1 Click Models
341(2)
32.2 Policy
343(2)
32.3 Regret Analysis
345(4)
32.4 Notes
349(2)
32.5 Bibliographic Remarks
351(1)
32.6 Exercises
351(2)
33 Pure Exploration
353(16)
33.1 Simple Regret
353(2)
33.2 Best-Arm Identification with a Fixed Confidence
355(6)
33.3 Best-Arm Identification with a Budget
361(1)
33.4 Notes
362(2)
33.5 Bibliographical Remarks
364(1)
33.6 Exercises
365(4)
34 Foundations of Bayesian Learning
369(17)
34.1 Statistical Decision Theory and Bayesian Learning
369(1)
34.2 Bayesian Learning and the Posterior Distribution
370(4)
34.3 Conjugate Pairs, Conjugate Priors and the Exponential Family
374(3)
34.4 The Bayesian Bandit Environment
377(1)
34.5 Posterior Distributions in Bandits
378(1)
34.6 Bayesian Regret
379(1)
34.7 Notes
380(2)
34.8 Bibliographic Remarks
382(1)
34.9 Exercises
382(4)
35 Bayesian Bandits
386(18)
35.1 Bayesian Optimal Regret for k-Armed Stochastic Bandits
386(1)
35.2 Optimal Stopping
387(2)
35.3 One-Armed Bandits
389(4)
35.4 Gittins Index
393(6)
35.5 Computing the Gittins Index
399(1)
35.6 Notes
399(2)
35.7 Bibliographical Remarks
401(1)
35.8 Exercises
402(2)
36 Thompson Sampling
404(17)
36.1 Finite-Armed Bandits
404(2)
36.2 Frequentist Analysis
406(3)
36.3 Linear Bandits
409(2)
36.4 Information Theoretic Analysis
411(3)
36.5 Notes
414(2)
36.6 Bibliographic Remarks
416(1)
36.7 Exercises
417(4)
Part VIII Beyond Bandits
421(63)
37 Partial Monitoring
423(29)
37.1 Finite Adversarial Partial Monitoring Problems
424(2)
37.2 The Structure of Partial Monitoring
426(4)
37.3 Classification of Finite Adversarial Partial Monitoring
430(1)
37.4 Lower Bounds
430(5)
37.5 Policy and Upper Bounds
435(4)
37.6 Proof of Theorem 37.16
439(1)
37.7 Proof of Theorem 37.17
440(4)
37.8 Proof of the Classification Theorem
444(1)
37.9 Notes
445(2)
37.10 Bibliographical Remarks
447(2)
37.11 Exercises
449(3)
38 Markov Decision Processes
452(32)
38.1 Problem Set-Up
452(3)
38.2 Optimal Policies and the Bellman Optimality Equation
455(3)
38.3 Finding an Optimal Policy
458(4)
38.4 Learning in Markov Decision Processes
462(1)
38.5 Upper Confidence Bounds for Reinforcement Learning
463(2)
38.6 Proof of Upper Bound
465(3)
38.7 Proof of Lower Bound
468(3)
38.8 Notes
471(2)
38.9 Bibliographical Remarks
473(2)
38.10 Exercises
475(9)
Bibliography 484(29)
Index 513
Tor Lattimore is a research scientist at DeepMind. His research is focused on decision making in the face of uncertainty, including bandit algorithms and reinforcement learning. Before joining DeepMind he was an assistant professor at Indiana University and a postdoctoral fellow at the University of Alberta.

Csaba Szepesvári is a Professor in the Department of Computing Science at the University of Alberta and a Principal Investigator of the Alberta Machine Intelligence Institute. He also leads the 'Foundations' team at DeepMind. He has co-authored a book on nonlinear approximate adaptive controllers and authored a book on reinforcement learning, in addition to publishing over 200 journal and conference papers. He is an action editor of the Journal of Machine Learning Research.