Preface  vii
Acknowledgments  xi
1.2 Difficulties with Classical Optimization  3
1.3 Recent Advances in Simulation Optimization  3
3.1 Some Basic Conventions  6
3.3 Notation for Matrices  8
3.4 Notation for n-tuples  8
3.6 Notation for Sequences  9
3.7 Notation for Transformations  9
3.8 Max, Min, and Arg Max  10
4.1 Random Number Generation  16
4.3 Independence of Samples Collected  25
3 Simulation-Based Optimization: An Overview  29
2 Parametric Optimization  29
4 Parametric Optimization: Response Surfaces and Neural Networks  37
3.3 How Good Is the Guessed Metamodel?  46
3.4 Optimization with a Metamodel  46
4 Neuro-Response Surface Methods  47
4.1 Linear Neural Networks  48
4.2 Non-linear Neural Networks  53
5 Parametric Optimization: Stochastic Gradients and Adaptive Search  71
2 Continuous Optimization  72
2.2 Non-derivative Methods  83
3.1 Ranking and Selection  86
3.3 Stochastic Adaptive Search  98
6 Control Optimization with Stochastic Dynamic Programming  123
3 Markov, Semi-Markov, and Decision Processes  125
3.2 Semi-Markov Processes  135
3.3 Markov Decision Problems  137
4 Average Reward MDPs and DP  150
4.1 Bellman Policy Equation  151
4.3 Value Iteration and Its Variants  154
5 Discounted Reward MDPs and DP  159
5.2 Discounted Reward MDPs  161
5.3 Bellman Policy Equation  162
5.6 Gauss-Seidel Value Iteration  166
6 Bellman Equation Revisited  167
7 Semi-Markov Decision Problems  169
7.1 Natural and Decision-Making Processes  171
7.3 Discounted Reward SMDPs  180
8 Modified Policy Iteration  184
8.1 Steps for Discounted Reward MDPs  185
8.2 Steps for Average Reward MDPs  186
9 The MDP and Mathematical Programming  187
7 Control Optimization with Reinforcement Learning  197
3 Reinforcement Learning: Fundamentals  203
3.2 Q-Factor Value Iteration  206
3.3 Robbins-Monro Algorithm  207
3.4 Robbins-Monro and Q-Factors  208
3.5 Asynchronous Updating and Step Sizes  209
4.3 R-SMART and Other Algorithms  233
6 Model-Building Algorithms  244
6.2 Model-Building Q-Learning  246
6.3 Indirect Model-Building  247
7 Finite Horizon Problems  248
8 Control Optimization with Stochastic Search  269
2.1 Step-by-Step Details of an MCAT Algorithm  272
2.2 An Illustrative 3-State Example  274
3.1 Discounted Reward MDPs  277
9 Convergence: Background Material  281
2 Vectors and Vector Spaces  282
5.2 The Notation for Transformations  287
7.2 Increasing and Decreasing Sequences  293
7.4 Limit Theorems and Squeeze Theorem  299
10 Contraction Mappings in R^n  303
11 Stochastic Approximation  309
11.1 Convergence with Probability 1  309
11.2 Ordinary Differential Equations  310
11.3 Stochastic Approximation and ODEs  313
10 Convergence Analysis of Parametric Optimization Methods  319
2.3 A Continuously Differentiable Function  320
2.4 Stationary Points and Local and Global Optima  321
4 Finite Differences Perturbation Estimates  327
5 Simultaneous Perturbation  328
6 Stochastic Adaptive Search  336
6.2 Learning Automata Search Technique  340
6.3 Backtracking Adaptive Search  341
6.5 Modified Stochastic Ruler  349
11 Convergence Analysis of Control Optimization Methods  351
2 Dynamic Programming: Background  351
2.2 Monotonicity of T, T_μ, L, and L_μ  354
2.3 Key Results for Average and Discounted MDPs  355
3 Discounted Reward DP: MDPs  358
3.1 Bellman Equation for Discounted Reward  358
4 Average Reward DP: MDPs  371
4.1 Bellman Equation for Average Reward  371
6 Asynchronous Stochastic Approximation  390
6.1 Asynchronous Convergence  390
6.2 Two-Time-Scale Convergence  396
7 Reinforcement Learning: Convergence Background  400
7.1 Discounted Reward MDPs  401
8 Reinforcement Learning for MDPs: Convergence  404
8.1 Q-Learning: Discounted Reward MDPs  404
8.2 Relative Q-Learning: Average Reward MDPs  415
8.3 CAP-I: Discounted Reward MDPs  417
8.4 Q-P-Learning: Discounted Reward MDPs  422
9 Reinforcement Learning for SMDPs: Convergence  424
9.1 Q-Learning: Discounted Reward SMDPs  424
10 Reinforcement Learning for Finite Horizon: Convergence  447
2 Airline Revenue Management  451
4 Production Line Buffer Optimization  463
Appendix  473
Bibliography  477
Index  504