Value function approximation in zero–sum Markov games

Lagoudakis Michael, Parr R.

Full Record

URI:

http://purl.tuc.gr/dl/dias/2F95F669-B215-44BD-90AF-6176BD490AA9

Year

2002

Type

Full Conference Paper

License

Details

Bibliographic Citation

M.G. Lagoudakis and R. Parr. (2002, Aug.). Value function approximation in zero–sum Markov games. [Online]. Available: http://arxiv.org/ftp/arxiv/papers/1301/1301.0580.pdf

Appears in Collections

Conference Publications in the Community School of Electrical and Computer Engineering

Abstract

This paper investigates value function approximation in the context of zero-sum Markov games, which can be viewed as a generalization of the Markov decision process (MDP) framework to the two-agent case. We generalize error bounds from MDPs to Markov games and describe generalizations of reinforcement learning algorithms to Markov games. We present a generalization of the optimal stopping problem to a two-player simultaneous move Markov game. For this special problem, we provide stronger bounds and can guarantee convergence for LSTD and temporal difference learning with linear value function approximation. We demonstrate the viability of value function approximation for Markov games by using the least squares policy iteration (LSPI) algorithm to learn good policies for a soccer domain and a flow control problem.
