|
|
xv | |
|
|
xvii | |
Foreword |
|
xix | |
Preface |
|
xxi | |
The Author Bio |
|
xxxiii | |
Contributors |
|
xxxv | |
|
1 Introduction to Learning in Games |
|
|
1 | (52) |
|
1.1 Basic Elements of Games |
|
|
5 | (17) |
|
1.1.1 Basic Components of One-Shot Game |
|
|
5 | (4) |
|
1.1.4 State-Dependent One-Shot Game |
|
|
9 | (1) |
|
1.1.4.1 Perfectly-Known State One-Shot Games |
|
|
9 | (1) |
|
1.1.4.2 One-Shot Games with Partially-Known State |
|
|
10 | (1) |
|
1.1.4.3 State Component is Unknown |
|
|
10 | (1) |
|
1.1.4.4 Only the State Space Is Known |
|
|
11 | (1) |
|
1.1.5 Perfectly Known State Dynamic Game |
|
|
11 | (1) |
|
1.1.6 Unknown State Dynamic Games |
|
|
12 | (8) |
|
1.1.7 State-Dependent Equilibrium |
|
|
20 | (1) |
|
1.1.8 Random Matrix Games |
|
|
21 | (1) |
|
1.1.9 Dynamic Robust Game |
|
|
21 | (1) |
|
1.2 Robust Games in Networks |
|
|
22 | (5) |
|
|
27 | (2) |
|
1.4 Basics of Robust Cooperative Games |
|
|
29 | (4) |
|
|
29 | (1) |
|
1.4.0.4 Cooperative Solution Concepts |
|
|
30 | (3) |
|
1.5 Distributed Strategic Learning |
|
|
33 | (8) |
|
|
39 | (1) |
|
|
39 | (1) |
|
1.5.2.1 How to Select an Efficient Outcome? |
|
|
40 | (1) |
|
1.5.2.2 How to Select a Stable Outcome? |
|
|
40 | (1) |
|
1.6 Distributed Strategic Learning in "Wireless Networks |
|
|
41 | (12) |
|
|
41 | (1) |
|
|
42 | (1) |
|
|
42 | (5) |
|
|
47 | (1) |
|
|
48 | (1) |
|
|
49 | (4) |
|
|
53 | (84) |
|
|
53 | (1) |
|
2.2 Strategy Learning under Perfect Action Monitoring |
|
|
53 | (43) |
|
2.2.1 Fictitious Play-Based Algorithms |
|
|
54 | (12) |
|
2.2.2 Best Response-Based Learning Algorithms |
|
|
66 | (6) |
|
2.2.5 Better Reply-Based Learning Algorithms |
|
|
72 | (5) |
|
2.2.6 Fixed Point Iterations |
|
|
77 | (3) |
|
|
80 | (6) |
|
2.2.8 Learning Bargaining Solutions |
|
|
86 | (5) |
|
2.2.9 Learning and Conjectural Variations |
|
|
91 | (3) |
|
2.2.10 Bayesian Learning in Games |
|
|
94 | (2) |
|
2.2.11 Non-Bayesian Learning |
|
|
96 | (1) |
|
2.3 Fully Distributed Strategy-Learning |
|
|
96 | (34) |
|
2.3.1 Learning by Experimentation |
|
|
98 | (3) |
|
2.3.2 Reinforcement Learning |
|
|
101 | (10) |
|
2.3.3 Learning Correlated Equilibria |
|
|
111 | (3) |
|
2.3.4 Boltzmann-Gibbs Learning Algorithms |
|
|
114 | (4) |
|
2.3.5 Hybrid Learning Scheme |
|
|
118 | (3) |
|
2.3.6 Fast Convergence of Evolutionary Dynamics |
|
|
121 | (1) |
|
2.3.7 Convergence in Finite Number of Steps |
|
|
122 | (1) |
|
2.3.8 Convergence Time of Boltzmann-Gibbs Learning |
|
|
123 | (4) |
|
2.3.9 Learning Satisfactory Solutions |
|
|
127 | (3) |
|
2.4 Stochastic Approximations |
|
|
130 | (1) |
|
|
131 | (1) |
|
2.6 Discussions and Open Issues |
|
|
132 | (5) |
|
3 Payoff Learning and Dynamics |
|
|
137 | (12) |
|
|
137 | (3) |
|
3.2 Learning Equilibrium Payoffs |
|
|
140 | (4) |
|
|
144 | (1) |
|
3.4 Routing Games with Parallel Links |
|
|
144 | (2) |
|
3.5 Numerical Values of Payoffs Are Not Observed |
|
|
146 | (3) |
|
|
149 | (58) |
|
|
149 | (3) |
|
|
152 | (10) |
|
4.2.1 Description of the Dynamic Game |
|
|
153 | (2) |
|
4.2.2 Combined Payoff and Strategy Learning |
|
|
155 | (7) |
|
|
162 | (4) |
|
4.3.1 Convergence of the Payoff Reinforcement Learning |
|
|
163 | (1) |
|
|
163 | (2) |
|
4.3.3 From Imitative Boltzmann-Gibbs CODIPAS-RL to Replicator Dynamics |
|
|
165 | (1) |
|
4.4 Hybrid and Combined Dynamics |
|
|
166 | (14) |
|
4.4.1 From Boltzmann-Gibbs-Based CODIPAS-RL to Composed Dynamics |
|
|
166 | (1) |
|
4.4.2 From Heterogeneous Learning to Novel Game Dynamics |
|
|
167 | (4) |
|
4.4.3 Aggregative Robust Games in Wireless Networks |
|
|
171 | (1) |
|
4.4.3.2 Power Allocation as Aggregative Robust Games |
|
|
172 | (6) |
|
4.4.4 Wireless MIMO Systems |
|
|
178 | (1) |
|
4.4.4.1 Learning the Outage Probability |
|
|
179 | (1) |
|
4.4.4.2 Learning the Ergodic Capacity |
|
|
180 | (1) |
|
4.5 Learning in Games with Continuous Action Spaces |
|
|
180 | (3) |
|
4.5.1 Stable Robust Games |
|
|
181 | (2) |
|
4.5.2 Stochastic-Gradient-Like CODIPAS |
|
|
183 | (1) |
|
4.6 CODIPAS for Stable Games with Continuous Action Spaces |
|
|
183 | (3) |
|
4.6.1 Algorithm to Solve Variational Inequality |
|
|
184 | (1) |
|
4.6.2 Convergence to Variational Inequality Solution |
|
|
185 | (1) |
|
4.7 CODIPAS-RL via Extremum-Seeking |
|
|
186 | (2) |
|
4.8 Designer and Users in an Hierarchical System |
|
|
188 | (3) |
|
4.9 From Fictitious Play with Inertia to CODIPAS-RL |
|
|
191 | (1) |
|
4.10 CODIPAS-RL with Random Number of Active Players |
|
|
192 | (5) |
|
4.11 CODIPAS for Multi-Armed Bandit Problems |
|
|
197 | (3) |
|
4.12 CODIPAS and Evolutionary Game Dynamics |
|
|
200 | (2) |
|
4.12.1 Discrete-Time Evolutionary Game Dynamics |
|
|
201 | (1) |
|
4.12.4 CODIPAS-Based Evolutionary Game Dynamics |
|
|
201 | (1) |
|
4.13 Fastest Learning Algorithms |
|
|
202 | (5) |
|
5 Learning under Delayed Measurement |
|
|
207 | (24) |
|
|
207 | (1) |
|
5.2 Learning under Delayed Imperfect Payoffs |
|
|
208 | (4) |
|
5.2.1 CODIPAS-RL under Delayed Measurement |
|
|
209 | (3) |
|
5.3 Reacting to the Interference |
|
|
212 | (19) |
|
|
214 | (2) |
|
|
216 | (1) |
|
|
216 | (1) |
|
|
216 | (2) |
|
5.3.3 MIMO Interference Channel |
|
|
218 | (4) |
|
5.3.3.1 One-Shot MIMO Game |
|
|
222 | (3) |
|
|
225 | (2) |
|
5.3.4.5 Without Perfect CSI |
|
|
227 | (4) |
|
6 Learning in Constrained Robust Games |
|
|
231 | (24) |
|
|
231 | (1) |
|
6.2 Constrained One-Shot Games |
|
|
231 | (2) |
|
6.2.1 Orthogonal Constraints |
|
|
231 | (1) |
|
6.2.2 Coupled Constraints |
|
|
232 | (1) |
|
6.3 Quality of Experience |
|
|
233 | (1) |
|
6.4 Relevance in QoE and QoS satisfaction |
|
|
234 | (1) |
|
6.5 Satisfaction Levels as Benchmarks |
|
|
235 | (1) |
|
6.6 Satisfactory Solution |
|
|
236 | (1) |
|
6.7 Efficient Satisfactory Solution |
|
|
237 | (1) |
|
6.8 Learning a Satisfactory Solution |
|
|
237 | (2) |
|
6.8.3 Minkowski-Sum of Feasible Sets |
|
|
239 | (1) |
|
6.9 From Nash Equilibrium to Satisfactory Solution |
|
|
239 | (1) |
|
6.10 Mixed and Near-Satisfactory Solution |
|
|
240 | (2) |
|
6.11 CODIPAS with Dynamic Satisfaction Level |
|
|
242 | (1) |
|
|
243 | (10) |
|
6.12.1 Random Matrix Games Overview |
|
|
244 | (1) |
|
6.12.2 Zero-Sum Random Matrix Games |
|
|
245 | (2) |
|
6.12.4 NonZero Sum Random Matrix Games |
|
|
247 | (1) |
|
6.12.5.1 Relevance in Networking and Communication |
|
|
248 | (2) |
|
6.12.7 Evolutionary Random Matrix Games |
|
|
250 | (1) |
|
6.12.8 Learning in Random Matrix Games |
|
|
250 | (1) |
|
6.12.9 Mean-Variance Response |
|
|
251 | (2) |
|
6.12.11 Satisfactory Solution |
|
|
253 | (1) |
|
6.13 Mean-Variance Response and Demand Satisfaction |
|
|
253 | (2) |
|
7 Learning under Random Updates |
|
|
255 | (62) |
|
|
255 | (3) |
|
7.2 Description of the Random Update Model |
|
|
258 | (5) |
|
7.2.1 Description of the Dynamic Robust Game |
|
|
260 | (3) |
|
7.3 Fully Distributed Learning |
|
|
263 | (17) |
|
7.3.1 Distributed Strategy-Reinforcement Learning |
|
|
263 | (6) |
|
7.3.2 Random Number of Interacting Players |
|
|
269 | (4) |
|
7.3.3 CODIPAS-RL for Random Updates |
|
|
273 | (1) |
|
7.3.4 Learning Schemes Leading to Multi-Type Replicator Dynamics |
|
|
274 | (2) |
|
7.3.5 Heterogeneous Learning with Random Updates |
|
|
276 | (3) |
|
7.3.6 Constant Step-Size Random Updates |
|
|
279 | (1) |
|
7.3.7 Revision Protocols with Random Updates |
|
|
279 | (1) |
|
7.4 Dynamic Routing Games with Random Traffic |
|
|
280 | (2) |
|
|
282 | (10) |
|
7.5.1 Learning in Stochastic Games |
|
|
282 | (1) |
|
7.5.2.1 Nonconvergence of Fictitious Play |
|
|
283 | (1) |
|
7.5.2.3 Q-learning in Zero-Sum Stochastic Games |
|
|
284 | (2) |
|
7.5.3 Connection to Differential Dynamic Programming |
|
|
286 | (1) |
|
7.5.4 Learning in Robust Population Games |
|
|
286 | (1) |
|
7.5.4.1 Connection with Mean Field Game Dynamics |
|
|
286 | (5) |
|
7.5.5 Simulation of Population Games |
|
|
291 | (1) |
|
7.6 Mobility-Based Learning in Cognitive Radio Networks |
|
|
292 | (16) |
|
7.6.1 Proposed Cognitive Network Model |
|
|
295 | (1) |
|
7.6.2 Cognitive Radio Network Model |
|
|
296 | (1) |
|
7.6.2.1 Mobility of Users |
|
|
296 | (2) |
|
|
298 | (2) |
|
7.6.4 Virtual Received Power |
|
|
300 | (1) |
|
|
300 | (1) |
|
|
301 | (2) |
|
7.6.8 Performance of a Generic User |
|
|
303 | (1) |
|
7.6.8.1 Access Probability |
|
|
303 | (2) |
|
7.6.8.3 Coverage Probability |
|
|
305 | (3) |
|
7.7 Hybrid Strategic Learning |
|
|
308 | (4) |
|
7.7.1 Learning in a Simple Dynamic Game |
|
|
309 | (1) |
|
7.7.1.1 Learning Patterns |
|
|
309 | (1) |
|
7.7.1.2 Description of CODIPAS Patterns |
|
|
310 | (1) |
|
7.7.1.3 Asymptotics of Pure Learning Schemes |
|
|
311 | (1) |
|
7.7.1.4 Asymptotics of Hybrid Learning Schemes |
|
|
312 | (1) |
|
|
312 | (2) |
|
7.8.1 What is Wrong in Learning in Games? |
|
|
312 | (2) |
|
7.8.2 Learning the Action Space |
|
|
314 | (1) |
|
|
314 | (3) |
|
8 Fully Distributed Learning for Global Optima |
|
|
317 | (44) |
|
|
317 | (1) |
|
8.2 Resource Selection Games |
|
|
317 | (1) |
|
8.3 Frequency Selection Games |
|
|
318 | (17) |
|
8.3.1 Convergence to One of the Global Optima |
|
|
319 | (3) |
|
8.3.2 Symmetric Configuration and Evolutionarily Stable State |
|
|
322 | (1) |
|
8.3.3 Accelerating the Convergence Time |
|
|
323 | (1) |
|
8.3.4 Weighted Multiplicative imitative CODIPAS-RL |
|
|
324 | (5) |
|
8.3.5 Three Players and Two Frequencies |
|
|
329 | (1) |
|
|
329 | (1) |
|
8.3.5.2 Noisy Observation |
|
|
329 | (2) |
|
8.3.6 Similar Learning Rate |
|
|
331 | (1) |
|
|
332 | (1) |
|
8.3.8 Three Players and Three Frequencies |
|
|
332 | (1) |
|
8.3.9 Arbitrary Number of Users |
|
|
332 | (1) |
|
8.3.9.1 Global Optimization |
|
|
333 | (1) |
|
8.3.9.2 Equilibrium Analysis |
|
|
334 | (1) |
|
|
334 | (1) |
|
8.4 User-Centric Network Selection |
|
|
335 | (10) |
|
8.4.1 Architecture for 4G User-Centric Paradigm |
|
|
337 | (5) |
|
8.4.2 OPNET Simulation Setup |
|
|
342 | (2) |
|
|
344 | (1) |
|
8.5 Markov Chain Adjustment |
|
|
345 | (3) |
|
8.5.1 Transitions of the Markov Chains |
|
|
346 | (1) |
|
8.5.2 Selection of Efficient Outcomes |
|
|
347 | (1) |
|
8.6 Pareto Optimal Solutions |
|
|
348 | (13) |
|
8.6.1 Regular Perturbed Markov Process |
|
|
350 | (1) |
|
8.6.2 Stochastic Potential |
|
|
350 | (11) |
|
9 Learning in Risk-Sensitive Games |
|
|
361 | (56) |
|
|
361 | (7) |
|
|
363 | (2) |
|
9.1.2 Risk-Sensitive Strategic Learning |
|
|
365 | (1) |
|
9.1.3 Single State, Risk-Sensitive Game |
|
|
366 | (1) |
|
9.1.4 Risk-Sensitive Robust Games |
|
|
366 | (1) |
|
9.1.5 Risk-Sensitive Criterion in Wireless Networks |
|
|
367 | (1) |
|
9.2 Risk-Sensitive in Dynamic Environment |
|
|
368 | (8) |
|
9.2.1 Description of the Risk-Sensitive Dynamic Environment |
|
|
368 | (1) |
|
9.2.2 Description of the Risk-Sensitive Dynamic Game |
|
|
369 | (4) |
|
9.2.2.8 Two-by-Two Risk-Sensitive Games |
|
|
373 | (2) |
|
|
375 | (1) |
|
|
375 | (1) |
|
9.3 Risk-sensitive CODIPAS |
|
|
376 | (17) |
|
9.3.1 Learning the Risk-Sensitive Payoff |
|
|
376 | (2) |
|
9.3.2 Risk-Sensitive CODIPAS Patterns |
|
|
378 | (1) |
|
9.3.2.1 Bush-Mosteller based RS-CODIPAS |
|
|
378 | (1) |
|
9.3.2.2 Boltzmann-Gibbs-Based RS-CODIPAS |
|
|
378 | (1) |
|
9.3.2.3 Imitative BG CODIPAS |
|
|
379 | (1) |
|
9.3.2.4 Multiplicative Weighted Imitative CODIPAS |
|
|
379 | (1) |
|
9.3.2.5 Weakened Fictitious Play-Based CODIAPS |
|
|
380 | (1) |
|
9.3.2.6 Risk-Sensitive Payoff Learning |
|
|
380 | (1) |
|
9.3.3 Risk-Sensitive Pure Learning Schemes |
|
|
381 | (2) |
|
9.3.4 Risk-sensitive Hybrid Learning Scheme |
|
|
383 | (1) |
|
9.3.5 Convergence Results |
|
|
384 | (2) |
|
9.3.5.2 Convergence to Equilibria |
|
|
386 | (1) |
|
|
387 | (1) |
|
9.3.5.8 Explicit Solutions |
|
|
388 | (2) |
|
9.3.5.9 Composed Dynamics |
|
|
390 | (1) |
|
9.3.5.11 Non-Convergence to Unstable Rest Points |
|
|
391 | (1) |
|
9.3.5.13 Dulac Criterion for Convergence |
|
|
391 | (2) |
|
9.4 Risk-Sensitivity in Networking and Communications |
|
|
393 | (12) |
|
9.5 Risk-Sensitive Mean Field Learning |
|
|
405 | (4) |
|
|
409 | (4) |
|
9.6.1 Risk-Sensitive Correlated Equilibria |
|
|
409 | (1) |
|
9.6.2 Other Risk-Sensitive Formulations |
|
|
410 | (1) |
|
9.6.3 From Risk-Sensitive to Maximin Robust Games |
|
|
410 | (2) |
|
9.6.4 Mean-Variance Approach |
|
|
412 | (1) |
|
|
413 | (4) |
|
|
413 | (2) |
|
|
415 | (2) |
|
|
417 | (26) |
|
A.1 Basics of Dynamical Systems |
|
|
417 | (6) |
|
A.2 Basics of Stochastic Approximations |
|
|
423 | (15) |
|
A.3 Differential Inclusion |
|
|
438 | (4) |
|
|
442 | (1) |
Bibliography |
|
443 | (16) |
Index |
|
459 | |