The work titled On the locality of action domination in sequential decision making by Emmanuel Rachelson and Michail G. Lagoudakis is licensed under a Creative Commons Attribution 4.0 International license.
Bibliographic Citation
E. Rachelson and Michail G. Lagoudakis. (2010, Jan.). On the locality of action domination in sequential decision making. Presented at 11th International Symposium on Artificial Intelligence and Mathematics (ISAIM). [Online]. Available: http://www.researchgate.net/profile/Emmanuel_Rachelson/publication/221186156_On_the_locality_of_action_domination_in_sequential_decision_making/links/0fcfd5051c4eaad94f000000.pdf
Abstract
In the field of sequential decision making and reinforcement learning, it has been observed that good policies for most problems exhibit a significant amount of structure. In practice, this implies that when a learning agent discovers that an action is better than any other in a given state, this action actually happens to also dominate in a certain neighbourhood around that state. This paper presents new results proving that this notion of locality in action domination can be linked to the smoothness of the environment's underlying stochastic model. Namely, we link the Lipschitz continuity of a Markov Decision Process to the Lipschitz continuity of its policies' value functions and introduce the key concept of influence radius to describe the neighbourhood of states where the dominating action is guaranteed to be constant. These ideas are directly exploited in the proposed Localized Policy Iteration (LPI) algorithm, which is an active learning version of Rollout-based Policy Iteration. Preliminary results on the Inverted Pendulum domain demonstrate the viability and the potential of the proposed approach.
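The influence-radius idea from the abstract can be illustrated with a minimal sketch: if each action's value function is Lipschitz continuous with constant L, then a standard bound guarantees the best action stays dominant within a radius proportional to its value gap over the runner-up. The function below is an assumption-laden illustration, not the paper's actual derivation; the constant and the Q-values are hypothetical.

```python
def influence_radius(q_values, lipschitz_const):
    """Radius around a state within which the dominating action is
    guaranteed constant, assuming every Q(., a) is Lipschitz with the
    given constant. Uses the bound gap / (2 * L): within that radius,
    neither Q-value can drift far enough to change the ranking."""
    ranked = sorted(q_values.values(), reverse=True)
    gap = ranked[0] - ranked[1]  # domination margin of the best action
    return gap / (2.0 * lipschitz_const)

# Hypothetical example: "left" dominates "right" by 0.6 with L = 1.5,
# giving an influence radius of 0.6 / (2 * 1.5) = 0.2.
r = influence_radius({"left": 2.1, "right": 1.5}, 1.5)
```

A larger domination gap or a smoother (smaller-L) model yields a larger radius, which is what lets an active learner such as LPI skip sampling states whose dominating action is already determined.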