C. Guestrin, M. G. Lagoudakis. (2002, July). Coordinated reinforcement learning. [Online]. Available: http://www.cs.berkeley.edu/~russell/classes/cs294/f05/papers/guestrin+al-2002.pdf
We present several new algorithms for multiagent reinforcement learning. A common feature of these algorithms is a parameterized, structured representation of a policy or value function. This structure is leveraged in an approach we call coordinated reinforcement learning, by which agents coordinate both their action selection activities and their parameter updates. Within the limits of our parametric representations, the agents will determine a jointly optimal action without explicitly considering every possible action in their exponentially large joint action space. Our methods differ from many previous reinforcement learning approaches to multiagent coordination in that structured communication and coordination between agents appears at the core of both the learning algorithm and the execution architecture. Our experimental results, comparing our approach to other RL methods, illustrate both the quality of the policies obtained and the additional benefits of coordination.
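The abstract does not spell out how a jointly optimal action can be found without enumerating the joint action space, so the following is only an illustrative sketch under stated assumptions. It assumes the joint value function decomposes into a sum of pairwise local terms between neighbouring agents arranged in a chain, and it maximizes that sum by eliminating one agent at a time (max-sum dynamic programming). The names `best_joint_action_chain`, `brute_force_value`, and `q_pairs` are hypothetical and are not taken from the paper.

```python
import itertools
import random


def best_joint_action_chain(q_pairs, n_actions):
    """Maximize sum_i q_pairs[i][a_i][a_{i+1}] over all joint actions.

    Assumption (illustrative only): agents form a chain, and agent i shares
    one local payoff table q_pairs[i] with agent i + 1.
    Returns (best_value, joint_action).
    """
    n_agents = len(q_pairs) + 1
    # msg[a]: best total payoff of the agents eliminated so far,
    # given that the next agent in the chain plays action a.
    msg = [0.0] * n_actions
    # best_response[i][b]: agent i's best action if agent i + 1 plays b.
    best_response = []
    for i in range(n_agents - 1):
        new_msg, choice = [], []
        for b in range(n_actions):
            vals = [msg[a] + q_pairs[i][a][b] for a in range(n_actions)]
            a_star = max(range(n_actions), key=vals.__getitem__)
            new_msg.append(vals[a_star])
            choice.append(a_star)
        msg = new_msg
        best_response.append(choice)
    # The last agent picks its action, then choices propagate backwards.
    last = max(range(n_actions), key=msg.__getitem__)
    joint = [last]
    for choice in reversed(best_response):
        joint.append(choice[joint[-1]])
    joint.reverse()
    return msg[last], joint


def brute_force_value(q_pairs, n_actions):
    """Reference answer by enumerating every joint action (exponential)."""
    n_agents = len(q_pairs) + 1
    return max(
        sum(q_pairs[i][j[i]][j[i + 1]] for i in range(n_agents - 1))
        for j in itertools.product(range(n_actions), repeat=n_agents)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    # Six agents, three actions each, random local payoffs (made-up data).
    q = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
         for _ in range(5)]
    value, joint = best_joint_action_chain(q, 3)
    print(joint, value, brute_force_value(q, 3))
```

For n agents with k actions each, the elimination pass in this sketch evaluates on the order of n·k² local entries, whereas the brute-force check enumerates all kⁿ joint actions. A chain is the simplest case; richer coordination structures would need a general elimination order, which this sketch does not cover.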