Institutional Repository [SANDBOX]
Technical University of Crete

On the locality of action domination in sequential decision making

Rachelson, Emmanuel; Lagoudakis, Michael


URI: http://purl.tuc.gr/dl/dias/E0292307-A486-42F6-A1D4-8BF6498753E2
Year: 2010
Type: Full conference paper
Bibliographic Citation: E. Rachelson and M. G. Lagoudakis. (2010, Jan.). On the locality of action domination in sequential decision making. Presented at 11th International Symposium on Artificial Intelligence and Mathematics (ISAIM). [Online]. Available: http://www.researchgate.net/profile/Emmanuel_Rachelson/publication/221186156_On_the_locality_of_action_domination_in_sequential_decision_making/links/0fcfd5051c4eaad94f000000.pdf

Abstract

In the field of sequential decision making and reinforcement learning, it has been observed that good policies for most problems exhibit a significant amount of structure. In practice, this implies that when a learning agent discovers an action is better than any other in a given state, this action actually happens to also dominate in a certain neighbourhood around that state. This paper presents new results proving that this notion of locality in action domination can be linked to the smoothness of the environment’s underlying stochastic model. Namely, we link the Lipschitz continuity of a Markov Decision Process to the Lipschitz continuity of its policies’ value functions and introduce the key concept of influence radius to describe the neighbourhood of states where the dominating action is guaranteed to be constant. These ideas are directly exploited in the proposed Localized Policy Iteration (LPI) algorithm, which is an active learning version of Rollout-based Policy Iteration. Preliminary results on the Inverted Pendulum domain demonstrate the viability and the potential of the proposed approach.
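
To illustrate how Lipschitz continuity can yield a neighbourhood of constant action domination, the following is a minimal sketch of the standard argument. All notation here is an assumption introduced for illustration and does not come from the abstract: $Q$ is a state-action value function, $L_Q$ its Lipschitz constant in the state variable, $d$ a metric on the state space, $\Delta(s)$ the domination gap, and $\rho(s)$ the resulting radius. The paper's exact definitions and constants may differ.

```latex
% Assumption: Q is L_Q-Lipschitz in the state, for every action a:
%   |Q(s,a) - Q(s',a)| \le L_Q \, d(s,s').
% Let a^* dominate in state s with gap
\[
  \Delta(s) = Q(s,a^*) - \max_{a \neq a^*} Q(s,a) > 0 .
\]
% For any state s' and any action a \neq a^*, the Lipschitz bound gives
\[
  Q(s',a^*) \ge Q(s,a^*) - L_Q \, d(s,s'),
  \qquad
  Q(s',a) \le Q(s,a) + L_Q \, d(s,s'),
\]
% and subtracting the two inequalities yields
\[
  Q(s',a^*) - Q(s',a) \ge \Delta(s) - 2 L_Q \, d(s,s') .
\]
% Hence a^* still dominates in every s' within the radius
\[
  d(s,s') < \rho(s) = \frac{\Delta(s)}{2 L_Q} .
\]
```

Under these assumptions, a larger gap or a smoother value function (smaller $L_Q$) gives a larger neighbourhood in which the dominating action is unchanged.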
