The work titled Value function approximation in zero–sum Markov games by the author(s) Lagoudakis Michael, Parr, R. is made available under the Creative Commons Attribution 4.0 International license.
Bibliographic Reference
M.G. Lagoudakis and R. Parr. (2002, Aug.). Value function approximation in zero–sum Markov games. [Online]. Available: http://arxiv.org/ftp/arxiv/papers/1301/1301.0580.pdf
This paper investigates value function approximation in the context of zero-sum Markov games, which can be viewed as a generalization of the Markov decision process (MDP) framework to the two-agent case. We generalize error bounds from MDPs to Markov games and describe generalizations of reinforcement learning algorithms to Markov games. We present a generalization of the optimal stopping problem to a two-player simultaneous move Markov game. For this special problem, we provide stronger bounds and can guarantee convergence for LSTD and temporal difference learning with linear value function approximation. We demonstrate the viability of value function approximation for Markov games by using the least-squares policy iteration (LSPI) algorithm to learn good policies for a soccer domain and a flow control problem.
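To illustrate the setting the abstract describes: in a zero-sum Markov game, the value of a state is the minimax value of the matrix game whose payoffs are the Q-values Q(s, a, o) for the agent's action a and the opponent's action o, and that stage game can be solved with a small linear program. The sketch below (my own illustration, not code from the paper) shows this stage-game solve in Python with SciPy; the payoff matrix used at the end is hypothetical.

import numpy as np
from scipy.optimize import linprog

def solve_matrix_game(Q):
    """Return the maximin value and mixed strategy for the row player of the
    zero-sum matrix game with payoff matrix Q (rows = agent actions,
    columns = opponent actions)."""
    n_rows, n_cols = Q.shape
    # Decision variables: [v, p_1, ..., p_n]; maximize v, i.e. minimize -v.
    c = np.zeros(n_rows + 1)
    c[0] = -1.0
    # For every opponent action o:  v - sum_a p_a * Q[a, o] <= 0.
    A_ub = np.hstack([np.ones((n_cols, 1)), -Q.T])
    b_ub = np.zeros(n_cols)
    # The mixed strategy must be a probability distribution.
    A_eq = np.hstack([np.zeros((1, 1)), np.ones((1, n_rows))])
    b_eq = np.array([1.0])
    bounds = [(None, None)] + [(0.0, 1.0)] * n_rows
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[0], res.x[1:]

# Hypothetical 2x2 payoff matrix, e.g. the Q-values at a single state.
Q = np.array([[ 1.0, -1.0],
              [-0.5,  0.5]])
value, policy = solve_matrix_game(Q)
print(value, policy)

In the algorithms the paper generalizes, this linear program replaces the simple max over actions used in MDP value iteration and policy iteration; with linear value function approximation, the Q-values fed into it come from the learned weights rather than a table.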