Foreword | v
About the Editors | vii
About the Contributors | ix
1 Scalability of Multiagent Reinforcement Learning | 1 | (18)
| 1 | (2)
1.2 Coordinating Q-Learning | 3 | (1)
1.3 Negotiation-based MARL | 4 | (4)
1.4 Accelerating MARL by Equilibrium Transfer | 8 | (4)
1.5 MARL Using Knowledge Transfer | 12 | (8)
1.5.1 Value Function Transfer | 12 | (1)
1.5.2 Selected Value Function Transfer | 12 | (3)
1.5.2.1 Evaluation of local environmental dynamics | 13 | (1)
1.5.2.2 The SVFT algorithm | 14 | (1)
1.5.3 Model Transfer-based Game Abstraction | 15 | (4)
2 Centralization or Decentralization? A Compromising Solution Toward Coordination in Multiagent Systems | 19 | (24)
| 20 | (2)
| 22 | (1)
2.3 The Proposed Hierarchical Learning Framework | 23 | (7)
2.3.1 The Principle of the Learning Framework | 23 | (2)
2.3.2 Generation of Supervision Policies | 25 | (2)
2.3.3 Adaption of Local Learning Behaviors | 27 | (1)
2.3.4 Price of Anarchy and Monarchy | 28 | (2)
2.4 Experiments and Results | 30 | (9)
| 39 | (2)
| 41 | (1)
| 42 | (1)
3 Making Efficient Reputation-Aware Decisions in Multiagent Systems | 43 | (22)
| 44 | (1)
| 45 | (8)
3.2.1 Problem Formulation | 45 | (4)
3.2.2 An Efficient Distributed Decision-Making Approach | 49 | (4)
| 53 | (10)
3.3.1 Theoretical Analysis | 53 | (3)
| 56 | (12)
3.3.2.1 Experiment settings | 56 | (3)
3.3.2.2 Simulation results | 59 | (4)
| 63 | (1)
| 64 | (1)
4 Decision-Theoretic Planning in Partially Observable Environments | 65 | (26)
| 66 | (2)
4.2 Partially Observable Markov Decision Processes | 68 | (5)
| 68 | (2)
| 70 | (3)
| 71 | (1)
| 71 | (1)
4.2.2.3 Policies and value functions | 72 | (1)
4.3 Approaches to Offline Planning | 73 | (10)
4.3.1 Exact Value Iteration | 74 | (2)
4.3.2 Approximate Value Iteration | 76 | (2)
4.3.2.1 Methods for initializing the value function | 77 | (1)
4.3.3 Point-based Value Iteration Methods | 78 | (5)
4.4 Approaches to Online Planning | 83 | (4)
| 84 | (1)
| 84 | (1)
4.4.3 Monte Carlo Tree Search | 85 | (2)
4.5 Covering-Number-based Planning Theories | 87 | (2)
| 87 | (1)
4.5.2 Complexity of Approximate Planning | 88 | (1)
| 89 | (2)
5 Multiagent Reinforcement Learning Algorithms Based on Gradient Ascent Policy | 91 | (16)
| 92 | (1)
5.2 Gradient Ascent Algorithms | 93 | (11)
5.2.1 The Original Gradient Ascent Algorithm: Infinitesimal Gradient Ascent (IGA) | 94 | (2)
5.2.2 Algorithms Improving the Convergence Properties of IGA | 96 | (4)
5.2.3 Algorithms Improving Social Welfare of IGA | 100 | (4)
5.3 Dynamics of GA-MARL Algorithms | 104 | (2)
| 106 | (1)
6 Task Allocation in Multiagent Systems: A Survey of Some Interesting Aspects | 107 | (42)
| 108 | (3)
6.2 Taxonomy Study on Task Allocation | 111 | (5)
6.2.1 Taxonomy Study for MultiRobot Task Allocation | 112 | (4)
6.2.2 Taxonomy Study of Other Subfields | 116 | (1)
6.3 Allocating Constrained or Complex Tasks | 116 | (6)
6.3.1 Allocating Tasks with Constraints | 117 | (2)
6.3.2 Allocating Complex Tasks | 119 | (3)
6.4 Task Allocation for Rational Agents | 122 | (7)
6.4.1 Task Allocation via VCG-based Mechanisms | 124 | (2)
6.4.2 Task Allocation Based on Optimal Mechanisms | 126 | (1)
6.4.3 Task Allocation Based on Online Mechanisms | 127 | (2)
6.5 Task Allocation for Networked Systems | 129 | (5)
6.5.1 Task Allocation in Social Networks | 129 | (3)
6.5.2 Task Allocation in Wireless Sensor Networks | 132 | (2)
6.5.3 Other Researches for Networked Task Allocation | 134 | (1)
6.6 Distributed Task Allocation | 134 | (8)
6.6.1 The Contract Net Protocol | 135 | (2)
6.6.2 Market-based Distributed Task Allocation | 137 | (2)
6.6.3 Distributed Task Allocation via Coalition Formation | 139 | (2)
6.6.4 Centralized and Distributed Model: Trade-offs | 141 | (1)
6.7 Dynamic Task Allocation | 142 | (4)
6.7.1 Allocating Dynamic Tasks | 142 | (3)
6.7.2 Task Allocation for Dynamic Agents | 145 | (1)
| 146 | (3)
7 Automated Negotiation: An Efficient Approach to Interaction Among Agents | 149 | (30)
| 150 | (1)
7.2 Automated Negotiation | 150 | (11)
| 152 | (2)
7.2.1.1 Single-issue versus multiissue negotiations | 152 | (1)
7.2.1.2 Bilateral versus multilateral negotiations | 152 | (1)
7.2.1.3 Sequential versus concurrent negotiations | 153 | (1)
7.2.1.4 Complete versus incomplete information | 153 | (1)
7.2.2 Negotiation Protocol | 154 | (1)
7.2.2.1 Simultaneous offers | 154 | (1)
7.2.2.2 Alternating offers | 155 | (1)
7.2.3 Negotiation Approaches | 155 | (6)
7.2.3.1 Heuristic approaches | 156 | (2)
7.2.3.2 Game theoretic approaches | 158 | (2)
| 160 | (1)
7.3 Characters of Complex Practical Negotiation | 161 | (3)
7.3.1 Zero Prior Opponent Knowledge | 161 | (1)
7.3.2 Continuous-Time Constraints | 161 | (1)
7.3.3 Discounting Effect and Reservation Value | 162 | (2)
| 164 | (12)
7.4.1 Agents Based on Regression Techniques | 164 | (5)
7.4.2 Agents Based on Transfer Learning | 169 | (7)
| 176 | (1)
| 177 | (2)
8 Norm Emergence in Multiagent Systems | 179 | (28)
| 180 | (2)
8.2 Norm Emergence Approaches | 182 | (16)
8.2.1 Top-Down Approaches | 183 | (4)
8.2.2 Bottom-Up Approaches | 187 | (5)
8.2.3 Hierarchical Approaches | 192 | (6)
8.3 The Influence of Fixed-Strategy Agents on Norm Emergence | 198 | (8)
8.3.1 Introduction of Fixed-Strategy Agents | 199 | (2)
8.3.2 The Influence of Fixed-Strategy Agents on Norm Adoption | 201 | (3)
8.3.3 The Influence of the Placement Heuristics of Fixed-Strategy Agents | 204 | (1)
8.3.4 The Influence of Late Intervention of Fixed-Strategy Agents | 205 | (1)
| 206 | (1)
9 Diffusion Convergence in the Collective Interactions of Large-scale Multiagent Systems | 207 | (22)
| 208 | (1)
9.2 Diffusion Convergence of Collective Behaviors in MAS | 209 | (1)
9.2.1 Collective Behaviors in MAS | 209 | (1)
9.2.2 Diffusion Convergence | 209 | (1)
9.3 Structured Diffusion Convergence versus Non-structured Diffusion Convergence | 210 | (7)
9.3.1 Structured Diffusion Convergence | 211 | (3)
9.3.2 Non-structured Diffusion Convergence | 214 | (3)
9.3.3 The Comparison and Analysis of the Two Diffusion Mechanisms | 217 | (1)
9.4 Homogeneous Diffusion Convergence versus Heterogeneous Diffusion Convergence | 217 | (5)
9.4.1 Homogeneous Diffusion Convergence | 217 | (2)
9.4.2 Heterogeneous Diffusion Convergence | 219 | (3)
9.4.2.1 Agent's heterogeneity | 220 | (1)
9.4.2.2 Interaction's heterogeneity | 221 | (1)
9.4.3 The Comparison and Analysis of the Two Diffusion Mechanisms | 222 | (1)
9.5 Neighboring Diffusion Convergence versus Global Diffusion Convergence | 222 | (5)
9.5.1 Neighboring Diffusion Convergence | 222 | (2)
9.5.2 Global Diffusion Convergence | 224 | (3)
9.5.3 The Comparison and Analysis of the Two Diffusion Mechanisms | 227 | (1)
| 227 | (2)
10 Incorporating Inference into Online Planning in Multiagent Settings | 229 | (35)
| 229 | (4)
10.2 Individual Decision Making Frameworks | 233 | (13)
10.2.1 Interactive Dynamic Influence Diagrams | 234 | (8)
| 242 | (4)
10.3 Online Planning with Limited Model Space | 246 | (6)
| 246 | (3)
10.3.2 Most Probable Model Selection | 249 | (3)
10.4 Savings and PAC Bound | 252 | (3)
10.5 Experimental Results | 255 | (6)
| 261 | (2)
| 263 | (1)
Acknowledgments | 264 | (1)
Bibliography | 265 | (30)
Index | 295