Institutional Repository [SANDBOX]
Technical University of Crete

Model–free least–squares policy iteration

Lagoudakis Michael, Parr, R.

Simple Record

URI	http://purl.tuc.gr/dl/dias/CDADBEEF-15F4-44B5-89B2-295FEC71FDAE	-
Identifier	http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.22.4345&rep=rep1&type=pdf	-
Language	en	-
Size	8 pages	en
Title	Model–free least–squares policy iteration	en
Creator	Lagoudakis Michael	en
Creator	Λαγουδακης Μιχαηλ	el
Creator	Parr, R.	en
Abstract	We propose a new approach to reinforcement learning which combines least squares function approximation with policy iteration. Our method is model-free and completely off-policy. We are motivated by the least squares temporal difference learning algorithm (LSTD), which is known for its efficient use of sample experiences compared to pure temporal difference algorithms. LSTD is ideal for prediction problems, however it heretofore has not had a straightforward application to control problems. Moreover, approximations learned by LSTD are strongly influenced by the visitation distribution over states. Our new algorithm, Least-Squares Policy Iteration (LSPI) addresses these issues. The result is an off-policy method which can use (or reuse) data collected from any source. We test LSPI on several problems, including a bicycle simulator in which it learns to guide the bicycle to a goal efficiently by merely observing a relatively small number of completely random trials.	en
Type	Πλήρης Δημοσίευση σε Συνέδριο	el
Type	Conference Full Paper	en
License	http://creativecommons.org/licenses/by/4.0/	en
Date	2015-11-14	-
Publication Date	2001	-
Subject Category	Artificial Intelligence	en
Bibliographic Citation	M. G. Lagoudakis and R. Parr. (2001, Dec.). Model–free least–squares policy iteration. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.22.4345&rep=rep1&type=pdf	en

Copyright © DIAS 2013