
E-book: Reinforcement Learning

4.00/5 (19 ratings by Goodreads)
  • Format: 408 pages
  • Publication date: 06-Nov-2020
  • Publisher: O'Reilly Media
  • Language: eng
  • ISBN-13: 9781492072362
  • Format: PDF+DRM
  • Price: €46.20*
  • * This is the final price, i.e., no additional discounts apply
  • This e-book is intended for personal use only. E-books cannot be returned, and no refunds are given for purchased e-books.

DRM restrictions

  • Copying (copy/paste): not allowed

  • Printing: not allowed

  • Usage:

    Digital Rights Management (DRM)
    The publisher has supplied this book in encrypted form, which means you need to install free software in order to unlock and read it. To read this e-book you must create an Adobe ID. More information here. The e-book can be read and downloaded on up to 6 devices (by a single user with the same Adobe ID).

    Required software
    To read this e-book on a mobile device (phone or tablet), you will need to install this free app: PocketBook Reader (iOS / Android).

    To download and read this e-book on a PC or Mac, you will need Adobe Digital Editions (this is a free app developed specifically for e-books; it is not the same as Adobe Reader, which you may already have on your computer).

    You cannot read this e-book on an Amazon Kindle.

Reinforcement learning (RL) will deliver one of the biggest breakthroughs in AI over the next decade, enabling algorithms to learn from their environment to achieve arbitrary goals. This exciting development avoids constraints found in traditional machine learning (ML) algorithms. This practical book shows data science and AI professionals how to learn by reinforcement and enable a machine to learn by itself.

Author Phil Winder of Winder Research covers everything from basic building blocks to state-of-the-art practices. You'll explore the current state of RL, focus on industrial applications, learn numerous algorithms, and benefit from dedicated chapters on deploying RL solutions to production. This is no cookbook; it doesn't shy away from math and expects familiarity with ML.

  • Learn what RL is and how the algorithms help solve problems
  • Become grounded in RL fundamentals including Markov decision processes, dynamic programming, and temporal difference learning
  • Dive deep into a range of value and policy gradient methods
  • Apply advanced RL solutions such as meta learning, hierarchical learning, multi-agent, and imitation learning
  • Understand cutting-edge deep RL algorithms including Rainbow, PPO, TD3, SAC, and more
  • Get practical examples through the accompanying website
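
As a small taste of the temporal-difference fundamentals listed above, the following sketch runs tabular Q-learning with ε-greedy exploration on a toy chain environment. It is a minimal illustration only: the environment, constants, and code below are hypothetical and are not taken from the book or its accompanying website.

    # A minimal sketch (not from the book) of tabular Q-learning with
    # epsilon-greedy exploration on a hypothetical 5-state chain.
    import random

    N_STATES = 5            # states 0..4; state 4 is terminal
    ACTIONS = [-1, +1]      # step left or right along the chain
    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

    # Q-table: Q[state][action_index], initialised to zero
    Q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]

    def step(state, action):
        """Move along the chain; reward 1.0 only on reaching the terminal state."""
        next_state = min(max(state + action, 0), N_STATES - 1)
        done = next_state == N_STATES - 1
        reward = 1.0 if done else 0.0
        return next_state, reward, done

    for episode in range(500):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action selection
            if random.random() < EPSILON:
                a = random.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: Q[state][i])
            next_state, reward, done = step(state, ACTIONS[a])
            # Temporal-difference (Q-learning) update
            target = reward + (0.0 if done else GAMMA * max(Q[next_state]))
            Q[state][a] += ALPHA * (target - Q[state][a])
            state = next_state

    print(Q)  # learned values should favour moving right toward the terminal state

After training, the greedy policy implied by the Q-table simply moves right toward the rewarding terminal state, which is the expected behaviour for this toy problem; the book develops the same ideas far more rigorously.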
Preface xv
1 Why Reinforcement Learning? 1(24)
Why Now? 2(1)
Machine Learning 3(1)
Reinforcement Learning 4(4)
When Should You Use RL? 5(2)
RL Applications 7(1)
Taxonomy of RL Approaches 8(4)
Model-Free or Model-Based 8(1)
How Agents Use and Update Their Strategy 9(1)
Discrete or Continuous Actions 10(1)
Optimization Methods 11(1)
Policy Evaluation and Improvement 11(1)
Fundamental Concepts in Reinforcement Learning 12(6)
The First RL Algorithm 12(3)
Is RL the Same as ML? 15(1)
Reward and Feedback 16(2)
Reinforcement Learning as a Discipline 18(2)
Summary 20(1)
Further Reading 20(5)
2 Markov Decision Processes, Dynamic Programming, and Monte Carlo Methods 25(34)
Multi-Arm Bandit Testing 25(10)
Reward Engineering 26(1)
Policy Evaluation: The Value Function 26(3)
Policy Improvement: Choosing the Best Action 29(2)
Simulating the Environment 31(1)
Running the Experiment 31(2)
Improving the ε-greedy Algorithm 33(2)
Markov Decision Processes 35(7)
Inventory Control 36(4)
Inventory Control Simulation 40(2)
Policies and Value Functions 42(8)
Discounted Rewards 42(1)
Predicting Rewards with the State-Value Function 43(4)
Predicting Rewards with the Action-Value Function 47(1)
Optimal Policies 48(2)
Monte Carlo Policy Generation 50(2)
Value Iteration with Dynamic Programming 52(5)
Implementing Value Iteration 54(2)
Results of Value Iteration 56(1)
Summary 57(1)
Further Reading 57(2)
3 Temporal-Difference Learning, Q-Learning, and n-Step Algorithms 59(28)
Formulation of Temporal-Difference Learning 60(10)
Q-Learning 62(2)
SARSA 64(1)
Q-Learning Versus SARSA 65(3)
Case Study: Automatically Scaling Application Containers to Reduce Cost 68(2)
Industrial Example: Real-Time Bidding in Advertising 70(4)
Defining the MDP 70(1)
Results of the Real-Time Bidding Environments 71(2)
Further Improvements 73(1)
Extensions to Q-Learning 74(2)
Double Q-Learning 74(1)
Delayed Q-Learning 74(1)
Comparing Standard, Double, and Delayed Q-learning 75(1)
Opposition Learning 75(1)
n-Step Algorithms 76(4)
n-Step Algorithms on Grid Environments 79(1)
Eligibility Traces 80(3)
Extensions to Eligibility Traces 83(2)
Watkins's Q(λ) 83(1)
Fuzzy Wipes in Watkins's Q(λ) 84(1)
Speedy Q-Learning 84(1)
Accumulating Versus Replacing Eligibility Traces 84(1)
Summary 85(1)
Further Reading 85(2)
4 Deep Q-Networks 87(28)
Deep Learning Architectures 88(4)
Fundamentals 88(1)
Common Neural Network Architectures 89(1)
Deep Learning Frameworks 90(1)
Deep Reinforcement Learning 91(1)
Deep Q-Learning 92(7)
Experience Replay 92(1)
Q-Network Clones 92(1)
Neural Network Architecture 93(1)
Implementing DQN 93(1)
Example: DQN on the CartPole Environment 94(4)
Case Study: Reducing Energy Usage in Buildings 98(1)
Rainbow DQN 99(4)
Distributional RL 100(2)
Prioritized Experience Replay 102(1)
Noisy Nets 102(1)
Dueling Networks 102(1)
Example: Rainbow DQN on Atari Games 103(4)
Results 104(2)
Discussion 106(1)
Other DQN Improvements 107(4)
Improving Exploration 108(1)
Improving Rewards 109(1)
Learning from Offline Data 109(2)
Summary 111(1)
Further Reading 112(3)
5 Policy Gradient Methods 115(30)
Benefits of Learning a Policy Directly 115(1)
How to Calculate the Gradient of a Policy 116(1)
Policy Gradient Theorem 117(2)
Policy Functions 119(3)
Linear Policies 120(2)
Arbitrary Policies 122(1)
Basic Implementations 122(14)
Monte Carlo (REINFORCE) 122(2)
REINFORCE with Baseline 124(3)
Gradient Variance Reduction 127(2)
n-Step Actor-Critic and Advantage Actor-Critic (A2C) 129(5)
Eligibility Traces Actor-Critic 134(1)
A Comparison of Basic Policy Gradient Algorithms 135(1)
Industrial Example: Automatically Purchasing Products for Customers 136(6)
The Environment: Gym-Shopping-Cart 137(1)
Expectations 137(1)
Results from the Shopping Cart Environment 138(4)
Summary 142(1)
Further Reading 143(2)
6 Beyond Policy Gradients 145(46)
Off-Policy Algorithms 145(7)
Importance Sampling 146(2)
Behavior and Target Policies 148(1)
Off-Policy Q-Learning 149(1)
Gradient Temporal-Difference Learning 149(1)
Greedy-GQ 150(1)
Off-Policy Actor-Critics 151(1)
Deterministic Policy Gradients 152(11)
Deterministic Policy Gradients 152(2)
Deep Deterministic Policy Gradients 154(4)
Twin Delayed DDPG 158(3)
Case Study: Recommendations Using Reviews 161(2)
Improvements to DPG 163(1)
Trust Region Methods 163(11)
Kullback-Leibler Divergence 165(2)
Natural Policy Gradients and Trust Region Policy Optimization 167(2)
Proximal Policy Optimization 169(5)
Example: Using Servos for a Real-Life Reacher 174(7)
Experiment Setup 175(1)
RL Algorithm Implementation 175(2)
Increasing the Complexity of the Algorithm 177(1)
Hyperparameter Tuning in a Simulation 178(2)
Resulting Policies 180(1)
Other Policy Gradient Algorithms 181(3)
Retrace(λ) 182(1)
Actor-Critic with Experience Replay (ACER) 182(1)
Actor-Critic Using Kronecker-Factored Trust Regions (ACKTR) 183(1)
Emphatic Methods 183(1)
Extensions to Policy Gradient Algorithms 184(1)
Quantile Regression in Policy Gradient Algorithms 184(1)
Summary 184(2)
Which Algorithm Should I Use? 185(1)
A Note on Asynchronous Methods 185(1)
Further Reading 186(5)
7 Learning All Possible Policies with Entropy Methods 191(24)
What Is Entropy? 191(1)
Maximum Entropy Reinforcement Learning 192(1)
Soft Actor-Critic 193(3)
SAC Implementation Details and Discrete Action Spaces 194(1)
Automatically Adjusting Temperature 194(1)
Case Study: Automated Traffic Management to Reduce Queuing 195(1)
Extensions to Maximum Entropy Methods 196(2)
Other Measures of Entropy (and Ensembles) 196(1)
Optimistic Exploration Using the Upper Bound of Double Q-Learning 196(1)
Tinkering with Experience Replay 197(1)
Soft Policy Gradient 197(1)
Soft Q-Learning (and Derivatives) 197(1)
Path Consistency Learning 198(1)
Performance Comparison: SAC Versus PPO 198(2)
How Does Entropy Encourage Exploration? 200(5)
How Does the Temperature Parameter Alter Exploration? 203(2)
Industrial Example: Learning to Drive with a Remote Control Car 205(6)
Description of the Problem 205(1)
Minimizing Training Time 205(3)
Dramatic Actions 208(1)
Hyperparameter Search 209(1)
Final Policy 209(1)
Further Improvements 210(1)
Summary 211(4)
Equivalence Between Policy Gradients and Soft Q-Learning 211(1)
What Does This Mean For the Future? 212(1)
What Does This Mean Now? 212(3)
8 Improving How an Agent Learns 215(36)
Rethinking the MDP 216(4)
Partially Observable Markov Decision Process 216(2)
Case Study: Using POMDPs in Autonomous Vehicles 218(1)
Contextual Markov Decision Processes 219(1)
MDPs with Changing Actions 219(1)
Regularized MDPs 220(1)
Hierarchical Reinforcement Learning 220(5)
Naive HRL 221(1)
High-Low Hierarchies with Intrinsic Rewards (HIRO) 222(1)
Learning Skills and Unsupervised RL 223(1)
Using Skills in HRL 224(1)
HRL Conclusions 225(1)
Multi-Agent Reinforcement Learning 225(10)
MARL Frameworks 226(2)
Centralized or Decentralized 228(1)
Single-Agent Algorithms 229(1)
Case Study: Using Single-Agent Decentralized Learning in UAVs 230(1)
Centralized Learning, Decentralized Execution 231(1)
Decentralized Learning 232(1)
Other Combinations 233(1)
Challenges of MARL 234(1)
MARL Conclusions 235(1)
Expert Guidance 235(5)
Behavior Cloning 236(1)
Imitation RL 236(1)
Inverse RL 237(1)
Curriculum Learning 238(2)
Other Paradigms 240(1)
Meta-Learning 240(1)
Transfer Learning 240(1)
Summary 241(1)
Further Reading 242(9)
9 Practical Reinforcement Learning 251(46)
The RL Project Life Cycle 251(5)
Life Cycle Definition 253(3)
Problem Definition: What Is an RL Project? 256(8)
RL Problems Are Sequential 256(1)
RL Problems Are Strategic 257(1)
Low-Level RL Indicators 258(2)
Types of Learning 260(4)
RL Engineering and Refinement 264(25)
Process 264(1)
Environment Engineering 265(3)
State Engineering or State Representation Learning 268(2)
Policy Engineering 270(5)
Mapping Policies to Action Spaces 275(4)
Exploration 279(6)
Reward Engineering 285(4)
Summary 289(1)
Further Reading 290(7)
10 Operational Reinforcement Learning 297(44)
Implementation 298(19)
Frameworks 298(3)
Scaling RL 301(8)
Evaluation 309(8)
Deployment 317(16)
Goals 317(4)
Architecture 321(2)
Ancillary Tooling 323(5)
Safety, Security, and Ethics 328(5)
Summary 333(1)
Further Reading 334(7)
11 Conclusions and the Future 341(18)
Tips and Tricks 341(4)
Framing the Problem 341(1)
Your Data 342(1)
Training 343(1)
Evaluation 344(1)
Deployment 345(1)
Debugging 345(3)
${ALGORITHM_NAME} Can't Solve ${ENVIRONMENT}! 347(1)
Monitoring for Debugging 348(1)
The Future of Reinforcement Learning 348(7)
RL Market Opportunities 349(1)
Future RL and Research Directions 350(5)
Concluding Remarks 355(2)
Next Steps 356(1)
Now It's Your Turn 356(1)
Further Reading 357(2)
A The Gradient of a Logistic Policy for Two Actions 359(4)
B The Gradient of a Softmax Policy 363(2)
Glossary 365(6)
Acronyms and Common Terms 365(3)
Symbols and Notation 368(3)
Index 371
Dr. Phil Winder is a multidisciplinary Software Engineer and Data Scientist. As the CEO of Winder Research, a Cloud-Native Data Science consultancy based in the UK, he helps startups and enterprises utilise Data Science. Through a combination of consulting and development, these businesses are able to grow and scale by improving their products and platforms.

For the past 5 years, Phil has taught thousands of engineers about Data Science through his range of Data Science training courses, delivered at conferences, in public and private settings, and on the online Safari learning platform. In these courses Phil focuses on the practicalities of using Data Science in industry, covering a wide range of topics from cleaning data all the way through to deep reinforcement learning.