Institutional Repository [SANDBOX]
Technical University of Crete

Value function approximation in zero–sum Markov games

Lagoudakis, Michael; Parr, R.


URI: http://purl.tuc.gr/dl/dias/2F95F669-B215-44BD-90AF-6176BD490AA9
Year: 2002
Type of Item: Conference Full Paper
Bibliographic Citation: M.G. Lagoudakis and R. Parr. (2002, Aug.). Value function approximation in zero–sum Markov games. [Online]. Available: http://arxiv.org/ftp/arxiv/papers/1301/1301.0580.pdf

Summary

This paper investigates value function approximation in the context of zero-sum Markov games, which can be viewed as a generalization of the Markov decision process (MDP) framework to the two-agent case. We generalize error bounds from MDPs to Markov games and describe generalizations of reinforcement learning algorithms to Markov games. We present a generalization of the optimal stopping problem to a two-player, simultaneous-move Markov game. For this special problem, we provide stronger bounds and can guarantee convergence for LSTD and temporal difference learning with linear value function approximation. We demonstrate the viability of value function approximation for Markov games by using the least-squares policy iteration (LSPI) algorithm to learn good policies for a soccer domain and a flow control problem.
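For context on the quantity being approximated (a sketch under the standard zero-sum Markov game formulation, not part of the original record): with maximizer action set A, minimizer action set O, reward R, transition model P, and discount factor \gamma, the state value satisfies the minimax analogue of the Bellman equation,

    V(s) = \max_{\pi \in \Delta(A)} \min_{o \in O} \sum_{a \in A} \pi(a) \left[ R(s, a, o) + \gamma \sum_{s'} P(s' \mid s, a, o) \, V(s') \right],

and linear value function approximation, as used by LSTD and LSPI, replaces V(s) with w^\top \phi(s) for a fixed feature map \phi and learned weights w. The symbols A, O, R, P, \gamma, \phi, and w follow the standard formulation and are not taken verbatim from the record above.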
