Institutional Repository [SANDBOX]
Technical University of Crete

Value function approximation in zero–sum Markov games

Lagoudakis, Michael; Parr, R.


URI: http://purl.tuc.gr/dl/dias/2F95F669-B215-44BD-90AF-6176BD490AA9
Year: 2002
Type of Item: Conference Full Paper
Bibliographic Citation: M.G. Lagoudakis and R. Parr. (2002, Aug.). Value function approximation in zero–sum Markov games. [Online]. Available: http://arxiv.org/ftp/arxiv/papers/1301/1301.0580.pdf

Summary

This paper investigates value function approximation in the context of zero-sum Markov games, which can be viewed as a generalization of the Markov decision process (MDP) framework to the two-agent case. We generalize error bounds from MDPs to Markov games and describe generalizations of reinforcement learning algorithms to Markov games. We present a generalization of the optimal stopping problem to a two-player, simultaneous-move Markov game. For this special problem, we provide stronger bounds and can guarantee convergence for LSTD and temporal difference learning with linear value function approximation. We demonstrate the viability of value function approximation for Markov games by using the least-squares policy iteration (LSPI) algorithm to learn good policies for a soccer domain and a flow control problem.
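For context on the quantity being approximated (a sketch under the standard zero-sum Markov game formulation, not part of the original record): with maximizer action set A, minimizer action set O, reward R, transition model P, and discount factor \gamma, the state value satisfies the minimax analogue of the Bellman equation,

    V(s) = \max_{\pi \in \Delta(A)} \min_{o \in O} \sum_{a \in A} \pi(a) \left[ R(s, a, o) + \gamma \sum_{s'} P(s' \mid s, a, o) \, V(s') \right],

and linear value function approximation, as used by LSTD and LSPI, replaces V(s) with w^\top \phi(s) for a fixed feature map \phi and learned weights w. The symbols A, O, R, P, \gamma, \phi, and w follow the standard formulation and are not taken verbatim from the record above.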
