
E-book: Reinforcement Learning

4.00/5 (19 ratings by Goodreads)
  • Format: 408 pages
  • Publication date: 06-Nov-2020
  • Publisher: O'Reilly Media
  • Language: eng
  • ISBN-13: 9781492072362
  • Format: PDF+DRM
  • Price: €46.20*
  • * This is the final price, i.e., no additional discounts apply
  • This e-book is intended for personal use only. E-books cannot be returned, and no refunds are given for purchased e-books.

DRM restrictions

  • Copying (copy/paste): not allowed

  • Printing: not allowed

  • Usage:

    Digital Rights Management (DRM)
    The publisher has supplied this book in encrypted form, which means you need to install free software in order to unlock and read it. To read this e-book you must create an Adobe ID. More information here. The e-book can be read and downloaded on up to 6 devices (by a single user with the same Adobe ID).

    Required software
    To read this e-book on a mobile device (phone or tablet), you will need to install this free app: PocketBook Reader (iOS / Android).

    To download and read this e-book on a PC or Mac, you will need Adobe Digital Editions (this is a free app developed specifically for e-books; it is not the same as Adobe Reader, which you may already have on your computer).

    You cannot read this e-book on an Amazon Kindle.

Reinforcement learning (RL) will deliver one of the biggest breakthroughs in AI over the next decade, enabling algorithms to learn from their environment to achieve arbitrary goals. This exciting development avoids constraints found in traditional machine learning (ML) algorithms. This practical book shows data science and AI professionals how to learn by reinforcement and enable a machine to learn by itself.

Author Phil Winder of Winder Research covers everything from basic building blocks to state-of-the-art practices. You'll explore the current state of RL, focus on industrial applications, learn numerous algorithms, and benefit from dedicated chapters on deploying RL solutions to production. This is no cookbook; it doesn't shy away from math and expects familiarity with ML.

  • Learn what RL is and how the algorithms help solve problems
  • Become grounded in RL fundamentals including Markov decision processes, dynamic programming, and temporal difference learning
  • Dive deep into a range of value and policy gradient methods
  • Apply advanced RL solutions such as meta learning, hierarchical learning, multi-agent, and imitation learning
  • Understand cutting-edge deep RL algorithms including Rainbow, PPO, TD3, SAC, and more
  • Get practical examples through the accompanying website
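
As a small taste of the temporal-difference fundamentals listed above, the following sketch runs tabular Q-learning with ε-greedy exploration on a toy chain environment. It is a minimal illustration only: the environment, constants, and code below are hypothetical and are not taken from the book or its accompanying website.

    # A minimal sketch (not from the book) of tabular Q-learning with
    # epsilon-greedy exploration on a hypothetical 5-state chain.
    import random

    N_STATES = 5            # states 0..4; state 4 is terminal
    ACTIONS = [-1, +1]      # step left or right along the chain
    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

    # Q-table: Q[state][action_index], initialised to zero
    Q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]

    def step(state, action):
        """Move along the chain; reward 1.0 only on reaching the terminal state."""
        next_state = min(max(state + action, 0), N_STATES - 1)
        done = next_state == N_STATES - 1
        reward = 1.0 if done else 0.0
        return next_state, reward, done

    for episode in range(500):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action selection
            if random.random() < EPSILON:
                a = random.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: Q[state][i])
            next_state, reward, done = step(state, ACTIONS[a])
            # Temporal-difference (Q-learning) update
            target = reward + (0.0 if done else GAMMA * max(Q[next_state]))
            Q[state][a] += ALPHA * (target - Q[state][a])
            state = next_state

    print(Q)  # learned values should favour moving right toward the terminal state

After training, the greedy policy implied by the Q-table simply moves right toward the rewarding terminal state, which is the expected behaviour for this toy problem; the book develops the same ideas far more rigorously.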
Preface xv
1 Why Reinforcement Learning? 1(24)
Why Now? 2(1)
Machine Learning 3(1)
Reinforcement Learning 4(4)
When Should You Use RL? 5(2)
RL Applications 7(1)
Taxonomy of RL Approaches 8(4)
Model-Free or Model-Based 8(1)
How Agents Use and Update Their Strategy 9(1)
Discrete or Continuous Actions 10(1)
Optimization Methods 11(1)
Policy Evaluation and Improvement 11(1)
Fundamental Concepts in Reinforcement Learning 12(6)
The First RL Algorithm 12(3)
Is RL the Same as ML? 15(1)
Reward and Feedback 16(2)
Reinforcement Learning as a Discipline 18(2)
Summary 20(1)
Further Reading 20(5)
2 Markov Decision Processes, Dynamic Programming, and Monte Carlo Methods 25(34)
Multi-Arm Bandit Testing 25(10)
Reward Engineering 26(1)
Policy Evaluation: The Value Function 26(3)
Policy Improvement: Choosing the Best Action 29(2)
Simulating the Environment 31(1)
Running the Experiment 31(2)
Improving the ε-greedy Algorithm 33(2)
Markov Decision Processes 35(7)
Inventory Control 36(4)
Inventory Control Simulation 40(2)
Policies and Value Functions 42(8)
Discounted Rewards 42(1)
Predicting Rewards with the State-Value Function 43(4)
Predicting Rewards with the Action-Value Function 47(1)
Optimal Policies 48(2)
Monte Carlo Policy Generation 50(2)
Value Iteration with Dynamic Programming 52(5)
Implementing Value Iteration 54(2)
Results of Value Iteration 56(1)
Summary 57(1)
Further Reading 57(2)
3 Temporal-Difference Learning, Q-Learning, and n-Step Algorithms 59(28)
Formulation of Temporal-Difference Learning 60(10)
Q-Learning 62(2)
SARSA 64(1)
Q-Learning Versus SARSA 65(3)
Case Study: Automatically Scaling Application Containers to Reduce Cost 68(2)
Industrial Example: Real-Time Bidding in Advertising 70(4)
Defining the MDP 70(1)
Results of the Real-Time Bidding Environments 71(2)
Further Improvements 73(1)
Extensions to Q-Learning 74(2)
Double Q-Learning 74(1)
Delayed Q-Learning 74(1)
Comparing Standard, Double, and Delayed Q-learning 75(1)
Opposition Learning 75(1)
n-Step Algorithms 76(4)
n-Step Algorithms on Grid Environments 79(1)
Eligibility Traces 80(3)
Extensions to Eligibility Traces 83(2)
Watkins's Q(λ) 83(1)
Fuzzy Wipes in Watkins's Q(λ) 84(1)
Speedy Q-Learning 84(1)
Accumulating Versus Replacing Eligibility Traces 84(1)
Summary 85(1)
Further Reading 85(2)
4 Deep Q-Networks 87(28)
Deep Learning Architectures 88(4)
Fundamentals 88(1)
Common Neural Network Architectures 89(1)
Deep Learning Frameworks 90(1)
Deep Reinforcement Learning 91(1)
Deep Q-Learning 92(7)
Experience Replay 92(1)
Q-Network Clones 92(1)
Neural Network Architecture 93(1)
Implementing DQN 93(1)
Example: DQN on the CartPole Environment 94(4)
Case Study: Reducing Energy Usage in Buildings 98(1)
Rainbow DQN 99(4)
Distributional RL 100(2)
Prioritized Experience Replay 102(1)
Noisy Nets 102(1)
Dueling Networks 102(1)
Example: Rainbow DQN on Atari Games 103(4)
Results 104(2)
Discussion 106(1)
Other DQN Improvements 107(4)
Improving Exploration 108(1)
Improving Rewards 109(1)
Learning from Offline Data 109(2)
Summary 111(1)
Further Reading 112(3)
5 Policy Gradient Methods 115(30)
Benefits of Learning a Policy Directly 115(1)
How to Calculate the Gradient of a Policy 116(1)
Policy Gradient Theorem 117(2)
Policy Functions 119(3)
Linear Policies 120(2)
Arbitrary Policies 122(1)
Basic Implementations 122(14)
Monte Carlo (REINFORCE) 122(2)
REINFORCE with Baseline 124(3)
Gradient Variance Reduction 127(2)
n-Step Actor-Critic and Advantage Actor-Critic (A2C) 129(5)
Eligibility Traces Actor-Critic 134(1)
A Comparison of Basic Policy Gradient Algorithms 135(1)
Industrial Example: Automatically Purchasing Products for Customers 136(6)
The Environment: Gym-Shopping-Cart 137(1)
Expectations 137(1)
Results from the Shopping Cart Environment 138(4)
Summary 142(1)
Further Reading 143(2)
6 Beyond Policy Gradients 145(46)
Off-Policy Algorithms 145(7)
Importance Sampling 146(2)
Behavior and Target Policies 148(1)
Off-Policy Q-Learning 149(1)
Gradient Temporal-Difference Learning 149(1)
Greedy-GQ 150(1)
Off-Policy Actor-Critics 151(1)
Deterministic Policy Gradients 152(11)
Deterministic Policy Gradients 152(2)
Deep Deterministic Policy Gradients 154(4)
Twin Delayed DDPG 158(3)
Case Study: Recommendations Using Reviews 161(2)
Improvements to DPG 163(1)
Trust Region Methods 163(11)
Kullback-Leibler Divergence 165(2)
Natural Policy Gradients and Trust Region Policy Optimization 167(2)
Proximal Policy Optimization 169(5)
Example: Using Servos for a Real-Life Reacher 174(7)
Experiment Setup 175(1)
RL Algorithm Implementation 175(2)
Increasing the Complexity of the Algorithm 177(1)
Hyperparameter Tuning in a Simulation 178(2)
Resulting Policies 180(1)
Other Policy Gradient Algorithms 181(3)
Retrace(λ) 182(1)
Actor-Critic with Experience Replay (ACER) 182(1)
Actor-Critic Using Kronecker-Factored Trust Regions (ACKTR) 183(1)
Emphatic Methods 183(1)
Extensions to Policy Gradient Algorithms 184(1)
Quantile Regression in Policy Gradient Algorithms 184(1)
Summary 184(2)
Which Algorithm Should I Use? 185(1)
A Note on Asynchronous Methods 185(1)
Further Reading 186(5)
7 Learning All Possible Policies with Entropy Methods 191(24)
What Is Entropy? 191(1)
Maximum Entropy Reinforcement Learning 192(1)
Soft Actor-Critic 193(3)
SAC Implementation Details and Discrete Action Spaces 194(1)
Automatically Adjusting Temperature 194(1)
Case Study: Automated Traffic Management to Reduce Queuing 195(1)
Extensions to Maximum Entropy Methods 196(2)
Other Measures of Entropy (and Ensembles) 196(1)
Optimistic Exploration Using the Upper Bound of Double Q-Learning 196(1)
Tinkering with Experience Replay 197(1)
Soft Policy Gradient 197(1)
Soft Q-Learning (and Derivatives) 197(1)
Path Consistency Learning 198(1)
Performance Comparison: SAC Versus PPO 198(2)
How Does Entropy Encourage Exploration? 200(5)
How Does the Temperature Parameter Alter Exploration? 203(2)
Industrial Example: Learning to Drive with a Remote Control Car 205(6)
Description of the Problem 205(1)
Minimizing Training Time 205(3)
Dramatic Actions 208(1)
Hyperparameter Search 209(1)
Final Policy 209(1)
Further Improvements 210(1)
Summary 211(4)
Equivalence Between Policy Gradients and Soft Q-Learning 211(1)
What Does This Mean For the Future? 212(1)
What Does This Mean Now? 212(3)
8 Improving How an Agent Learns 215(36)
Rethinking the MDP 216(4)
Partially Observable Markov Decision Process 216(2)
Case Study: Using POMDPs in Autonomous Vehicles 218(1)
Contextual Markov Decision Processes 219(1)
MDPs with Changing Actions 219(1)
Regularized MDPs 220(1)
Hierarchical Reinforcement Learning 220(5)
Naive HRL 221(1)
High-Low Hierarchies with Intrinsic Rewards (HIRO) 222(1)
Learning Skills and Unsupervised RL 223(1)
Using Skills in HRL 224(1)
HRL Conclusions 225(1)
Multi-Agent Reinforcement Learning 225(10)
MARL Frameworks 226(2)
Centralized or Decentralized 228(1)
Single-Agent Algorithms 229(1)
Case Study: Using Single-Agent Decentralized Learning in UAVs 230(1)
Centralized Learning, Decentralized Execution 231(1)
Decentralized Learning 232(1)
Other Combinations 233(1)
Challenges of MARL 234(1)
MARL Conclusions 235(1)
Expert Guidance 235(5)
Behavior Cloning 236(1)
Imitation RL 236(1)
Inverse RL 237(1)
Curriculum Learning 238(2)
Other Paradigms 240(1)
Meta-Learning 240(1)
Transfer Learning 240(1)
Summary 241(1)
Further Reading 242(9)
9 Practical Reinforcement Learning 251(46)
The RL Project Life Cycle 251(5)
Life Cycle Definition 253(3)
Problem Definition: What Is an RL Project? 256(8)
RL Problems Are Sequential 256(1)
RL Problems Are Strategic 257(1)
Low-Level RL Indicators 258(2)
Types of Learning 260(4)
RL Engineering and Refinement 264(25)
Process 264(1)
Environment Engineering 265(3)
State Engineering or State Representation Learning 268(2)
Policy Engineering 270(5)
Mapping Policies to Action Spaces 275(4)
Exploration 279(6)
Reward Engineering 285(4)
Summary 289(1)
Further Reading 290(7)
10 Operational Reinforcement Learning 297(44)
Implementation 298(19)
Frameworks 298(3)
Scaling RL 301(8)
Evaluation 309(8)
Deployment 317(16)
Goals 317(4)
Architecture 321(2)
Ancillary Tooling 323(5)
Safety, Security, and Ethics 328(5)
Summary 333(1)
Further Reading 334(7)
11 Conclusions and the Future 341(18)
Tips and Tricks 341(4)
Framing the Problem 341(1)
Your Data 342(1)
Training 343(1)
Evaluation 344(1)
Deployment 345(1)
Debugging 345(3)
${ALGORITHM_NAME} Can't Solve ${ENVIRONMENT}! 347(1)
Monitoring for Debugging 348(1)
The Future of Reinforcement Learning 348(7)
RL Market Opportunities 349(1)
Future RL and Research Directions 350(5)
Concluding Remarks 355(2)
Next Steps 356(1)
Now It's Your Turn 356(1)
Further Reading 357(2)
A The Gradient of a Logistic Policy for Two Actions 359(4)
B The Gradient of a Softmax Policy 363(2)
Glossary 365(6)
Acronyms and Common Terms 365(3)
Symbols and Notation 368(3)
Index 371
Dr. Phil Winder is a multidisciplinary Software Engineer and Data Scientist. As the CEO of Winder Research, a Cloud-Native Data Science consultancy based in the UK, he helps startups and enterprises utilise Data Science. Through a combination of consulting and development, these businesses are able to grow and scale by improving their products and platforms.

For the past 5 years, Phil has taught thousands of engineers about Data Science through his range of Data Science training courses, delivered at conferences, in public and private settings, and on the online Safari learning platform. In these courses Phil focuses on the practicalities of using Data Science in industry, covering a wide range of topics from cleaning data all the way through to deep reinforcement learning.