E-book: Simulation-Based Optimization: Parametric Optimization Techniques and Reinforcement Learning

  • Format: PDF+DRM
  • Price: 118,37 €*
  • * This is the final price, i.e., no additional discounts apply.
  • This e-book is intended for personal use only. E-books cannot be returned, and no refunds are given for purchased e-books.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital Rights Management (DRM)
    The publisher has supplied this book in encrypted form, which means that you need to install free software in order to unlock and read it. To read this e-book you must create an Adobe ID. More information is available here. The e-book can be read and downloaded on up to 6 devices (by one user with the same Adobe ID).

    Required software
    To read this e-book on a mobile device (phone or tablet), you will need to install this free app: PocketBook Reader (iOS / Android)

    To download and read this e-book on a PC or Mac, you will need Adobe Digital Editions (a free app designed specifically for e-books; it is not the same as Adobe Reader, which you may already have on your computer).

    You cannot read this e-book on an Amazon Kindle.

Simulation-Based Optimization: Parametric Optimization Techniques and Reinforcement Learning introduces the evolving area of static and dynamic simulation-based optimization. Covered in detail are model-free optimization techniques designed specifically for discrete-event stochastic systems that can be simulated but whose analytical models are difficult to obtain in closed mathematical form.

Key features of this revised and improved Second Edition include:

· Extensive coverage, via step-by-step recipes, of powerful new algorithms for static simulation optimization, including simultaneous perturbation, backtracking adaptive search, and nested partitions, in addition to traditional methods such as response surfaces, Nelder-Mead search, and meta-heuristics (simulated annealing, tabu search, and genetic algorithms); a minimal simultaneous-perturbation sketch follows this list

· Detailed coverage of the Bellman equation framework for Markov Decision Processes (MDPs), along with dynamic programming (value and policy iteration) for discounted, average, and total reward performance metrics

· An in-depth consideration of dynamic simulation optimization via temporal differences and Reinforcement Learning: Q-Learning, SARSA, and R-SMART algorithms, and policy search via API, Q-P-Learning, actor-critics, and learning automata; a minimal Q-Learning sketch also follows this list

· A special examination of neural-network-based function approximation for Reinforcement Learning, semi-Markov decision processes (SMDPs), finite-horizon problems, two time scales, case studies for industrial tasks, computer codes (placed online), and convergence proofs via Banach fixed-point theory and ordinary differential equations
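
The simultaneous-perturbation recipe mentioned in the first bullet can be previewed with a short sketch. The Python snippet below is only an illustrative example under assumptions of our own: the quadratic noisy_response function, the step-size and perturbation schedules, and the starting point are all hypothetical and are not taken from the book.

import numpy as np

def noisy_response(x, rng):
    # Hypothetical stand-in for one noisy simulation replication at design point x.
    return float(np.sum((x - 2.0) ** 2) + rng.normal(scale=0.1))

def spsa_step(x, step, perturb, rng):
    # Perturb every coordinate at once with a random +/-1 vector, then estimate
    # the gradient from only two simulation runs, regardless of the dimension of x.
    delta = rng.choice([-1.0, 1.0], size=x.shape)
    y_plus = noisy_response(x + perturb * delta, rng)
    y_minus = noisy_response(x - perturb * delta, rng)
    grad_est = (y_plus - y_minus) / (2.0 * perturb * delta)
    return x - step * grad_est  # steepest-descent move along the estimated gradient

rng = np.random.default_rng(0)
x = np.array([5.0, -3.0])
for k in range(1, 201):
    # Decaying step size and perturbation width, as in typical SPSA-style schedules.
    x = spsa_step(x, step=0.5 / k, perturb=0.1 / k ** 0.25, rng=rng)
print("approximate minimizer:", x)  # should approach [2, 2] for this toy objective

The appeal of the simultaneous-perturbation idea is visible even in this toy setting: the gradient estimate costs two simulation runs per iteration, no matter how many decision variables there are.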
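
Similarly, the Q-Learning coverage can be previewed in tabular form. The Python snippet below is again a hedged illustration: the two-state MDP data, exploration rate, and step-size rule are invented for the demonstration and do not come from the book. Each update nudges a Q-factor toward the sampled Bellman target r + γ max_a' Q(s', a'), i.e., a Robbins-Monro form of Q-factor value iteration.

import numpy as np

# Hypothetical two-state, two-action MDP used only for this demonstration.
n_states, n_actions, gamma = 2, 2, 0.9
P = np.array([[[0.7, 0.3], [0.4, 0.6]],    # P[s, a] = distribution over next states
              [[0.2, 0.8], [0.9, 0.1]]])
R = np.array([[6.0, 10.0],                 # R[s, a] = expected immediate reward
              [-5.0, 12.0]])

rng = np.random.default_rng(0)
Q = np.zeros((n_states, n_actions))
state = 0
for k in range(1, 50001):
    # Epsilon-greedy action selection while simulating the system.
    if rng.random() < 0.1:
        action = int(rng.integers(n_actions))
    else:
        action = int(np.argmax(Q[state]))
    next_state = int(rng.choice(n_states, p=P[state, action]))
    reward = R[state, action]
    # Robbins-Monro step toward the sampled Bellman target.
    alpha = 100.0 / (100.0 + k)
    Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
    state = next_state

print("learned Q-factors:\n", Q)
print("greedy policy:", np.argmax(Q, axis=1))

The greedy policy read off the learned Q-factors then approximates the optimal policy of this (hypothetical) MDP, without ever solving the transition model analytically.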

Themed around three areas covered in separate sets of chapters, namely Static Simulation Optimization, Reinforcement Learning, and Convergence Analysis, this book is written for researchers and students in the fields of engineering (industrial, systems, electrical, and computer), operations research, computer science, and applied mathematics.

Table of Contents

Preface
Acknowledgments
List of Figures
List of Tables

1 Background
  1 Motivation
    1.1 Main Branches
    1.2 Difficulties with Classical Optimization
    1.3 Recent Advances in Simulation Optimization
  2 Goals and Limitations
    2.1 Goals
    2.2 Limitations
  3 Notation
    3.1 Some Basic Conventions
    3.2 Vector Notation
    3.3 Notation for Matrices
    3.4 Notation for n-tuples
    3.5 Notation for Sets
    3.6 Notation for Sequences
    3.7 Notation for Transformations
    3.8 Max, Min, and Arg Max
  4 Organization

2 Simulation Basics
  1 Chapter Overview
  2 Introduction
  3 Models
  4 Simulation Modeling
    4.1 Random Number Generation
    4.2 Event Generation
    4.3 Independence of Samples Collected
  5 Concluding Remarks

3 Simulation-Based Optimization: An Overview
  1 Chapter Overview
  2 Parametric Optimization
  3 Control Optimization
  4 Concluding Remarks

4 Parametric Optimization: Response Surfaces and Neural Networks
  1 Chapter Overview
  2 RSM: An Overview
  3 RSM: Details
    3.1 Sampling
    3.2 Function Fitting
    3.3 How Good Is the Guessed Metamodel?
    3.4 Optimization with a Metamodel
  4 Neuro-Response Surface Methods
    4.1 Linear Neural Networks
    4.2 Non-linear Neural Networks
  5 Concluding Remarks

5 Parametric Optimization: Stochastic Gradients and Adaptive Search
  1 Chapter Overview
  2 Continuous Optimization
    2.1 Steepest Descent
    2.2 Non-derivative Methods
  3 Discrete Optimization
    3.1 Ranking and Selection
    3.2 Meta-heuristics
    3.3 Stochastic Adaptive Search
  4 Concluding Remarks

6 Control Optimization with Stochastic Dynamic Programming
  1 Chapter Overview
  2 Stochastic Processes
  3 Markov, Semi-Markov, and Decision Processes
    3.1 Markov Chains
    3.2 Semi-Markov Processes
    3.3 Markov Decision Problems
  4 Average Reward MDPs and DP
    4.1 Bellman Policy Equation
    4.2 Policy Iteration
    4.3 Value Iteration and Its Variants
  5 Discounted Reward MDPs and DP
    5.1 Discounted Reward
    5.2 Discounted Reward MDPs
    5.3 Bellman Policy Equation
    5.4 Policy Iteration
    5.5 Value Iteration
    5.6 Gauss-Seidel Value Iteration
  6 Bellman Equation Revisited
  7 Semi-Markov Decision Problems
    7.1 Natural and Decision-Making Processes
    7.2 Average Reward SMDPs
    7.3 Discounted Reward SMDPs
  8 Modified Policy Iteration
    8.1 Steps for Discounted Reward MDPs
    8.2 Steps for Average Reward MDPs
  9 The MDP and Mathematical Programming
  10 Finite Horizon MDPs
  11 Conclusions

7 Control Optimization with Reinforcement Learning
  1 Chapter Overview
  2 The Twin Curses of DP
    2.1 Breaking the Curses
    2.2 MLE and Small MDPs
  3 Reinforcement Learning: Fundamentals
    3.1 Q-Factors
    3.2 Q-Factor Value Iteration
    3.3 Robbins-Monro Algorithm
    3.4 Robbins-Monro and Q-Factors
    3.5 Asynchronous Updating and Step Sizes
  4 MDPs
    4.1 Discounted Reward
    4.2 Average Reward
    4.3 R-SMART and Other Algorithms
  5 SMDPs
    5.1 Discounted Reward
    5.2 Average Reward
  6 Model-Building Algorithms
    6.1 RTDP
    6.2 Model-Building Q-Learning
    6.3 Indirect Model-Building
  7 Finite Horizon Problems
  8 Function Approximation
    8.1 State Aggregation
    8.2 Function Fitting
  9 Conclusions

8 Control Optimization with Stochastic Search
  1 Chapter Overview
  2 The MCAT Framework
    2.1 Step-by-Step Details of an MCAT Algorithm
    2.2 An Illustrative 3-State Example
    2.3 Multiple Actions
  3 Actor Critics
    3.1 Discounted Reward MDPs
    3.2 Average Reward MDPs
    3.3 Average Reward SMDPs
  4 Concluding Remarks

9 Convergence: Background Material
  1 Chapter Overview
  2 Vectors and Vector Spaces
  3 Norms
    3.1 Properties of Norms
  4 Normed Vector Spaces
  5 Functions and Mappings
    5.1 Domain and Range
    5.2 The Notation for Transformations
  6 Mathematical Induction
  7 Sequences
    7.1 Convergent Sequences
    7.2 Increasing and Decreasing Sequences
    7.3 Boundedness
    7.4 Limit Theorems and Squeeze Theorem
  8 Sequences in R^n
  9 Cauchy Sequences in R^n
  10 Contraction Mappings in R^n
  11 Stochastic Approximation
    11.1 Convergence with Probability 1
    11.2 Ordinary Differential Equations
    11.3 Stochastic Approximation and ODEs

10 Convergence Analysis of Parametric Optimization Methods
  1 Chapter Overview
  2 Preliminaries
    2.1 Continuous Functions
    2.2 Partial Derivatives
    2.3 A Continuously Differentiable Function
    2.4 Stationary Points and Local and Global Optima
    2.5 Taylor's Theorem
  3 Steepest Descent
  4 Finite Differences Perturbation Estimates
  5 Simultaneous Perturbation
    5.1 Stochastic Gradient
    5.2 ODE Approach
    5.3 Spall's Conditions
  6 Stochastic Adaptive Search
    6.1 Pure Random Search
    6.2 Learning Automata Search Technique
    6.3 Backtracking Adaptive Search
    6.4 Simulated Annealing
    6.5 Modified Stochastic Ruler
  7 Concluding Remarks

11 Convergence Analysis of Control Optimization Methods
  1 Chapter Overview
  2 Dynamic Programming: Background
    2.1 Special Notation
    2.2 Monotonicity of T, T_μ, L, and L_μ
    2.3 Key Results for Average and Discounted MDPs
  3 Discounted Reward DP: MDPs
    3.1 Bellman Equation for Discounted Reward
    3.2 Policy Iteration
    3.3 Value Iteration
  4 Average Reward DP: MDPs
    4.1 Bellman Equation for Average Reward
    4.2 Policy Iteration
    4.3 Value Iteration
  5 DP: SMDPs
  6 Asynchronous Stochastic Approximation
    6.1 Asynchronous Convergence
    6.2 Two-Time-Scale Convergence
  7 Reinforcement Learning: Convergence Background
    7.1 Discounted Reward MDPs
    7.2 Average Reward MDPs
  8 Reinforcement Learning for MDPs: Convergence
    8.1 Q-Learning: Discounted Reward MDPs
    8.2 Relative Q-Learning: Average Reward MDPs
    8.3 CAP-I: Discounted Reward MDPs
    8.4 Q-P-Learning: Discounted Reward MDPs
  9 Reinforcement Learning for SMDPs: Convergence
    9.1 Q-Learning: Discounted Reward SMDPs
    9.2 Average Reward SMDPs
  10 Reinforcement Learning for Finite Horizon: Convergence
  11 Conclusions

12 Case Studies
  1 Chapter Overview
  2 Airline Revenue Management
  3 Preventive Maintenance
  4 Production Line Buffer Optimization
  5 Other Case Studies
  6 Conclusions

Appendix
Bibliography
Index

Abhijit Gosavi is a leading international authority on reinforcement learning, stochastic dynamic programming, and simulation-based optimization. The first edition of his Springer book Simulation-Based Optimization, published in 2003, was the first text to appear on that topic. He is regularly invited to speak at major national and international conferences on operations research, reinforcement learning, adaptive/approximate dynamic programming, and systems engineering.

He has published more than fifty journal and conference articles, many of which have appeared in leading scholarly journals such as Management Science, Automatica, INFORMS Journal on Computing, Machine Learning, Journal of Retailing, Systems and Control Letters, and the European Journal of Operational Research. He has also authored numerous book chapters on simulation-based optimization and operations research. His research has been funded by the National Science Foundation, the Department of Defense, the Missouri Department of Transportation, the University of Missouri Research Board, and industry. He has consulted extensively for the U.S. Department of Veterans Affairs and the mass media as a statistical/simulation analyst. He has received teaching awards from the Institute of Industrial Engineers.

He currently serves as an Associate Professor of Engineering Management and Systems Engineering at Missouri University of Science and Technology in Rolla, MO. He holds a master's degree in Mechanical Engineering from the Indian Institute of Technology and a Ph.D. in Industrial Engineering from the University of South Florida. He is a member of INFORMS, IIE, and ASEE.