URI | http://purl.tuc.gr/dl/dias/CDADBEEF-15F4-44B5-89B2-295FEC71FDAE | - |
Identifier | http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.22.4345&rep=rep1&type=pdf | - |
Language | en | - |
Size | 8 pages | en |
Title | Model-free least-squares policy iteration | en |
Creator | Lagoudakis Michael | en |
Creator | Λαγουδακης Μιχαηλ | el |
Creator | Parr, R. | en |
Abstract | We propose a new approach to reinforcement learning which combines least squares function approximation with policy iteration. Our method is model-free and completely off-policy. We are motivated by the least squares temporal difference learning algorithm (LSTD), which is known for its efficient use of sample experiences compared to pure temporal difference algorithms. LSTD is ideal for prediction problems; however, it heretofore has not had a straightforward application to control problems. Moreover, approximations learned by LSTD are strongly influenced by the visitation distribution over states. Our new algorithm, Least-Squares Policy Iteration (LSPI), addresses these issues. The result is an off-policy method which can use (or reuse) data collected from any source. We test LSPI on several problems, including a bicycle simulator in which it learns to guide the bicycle to a goal efficiently by merely observing a relatively small number of completely random trials. | en |
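The abstract describes LSPI as alternating an LSTD-style evaluation of the current policy's Q-function with greedy policy improvement, using one fixed batch of samples. The following is a minimal sketch of that idea, not code from the paper: the toy chain MDP, the one-hot (state, action) features, and all names (phi, lstdq, greedy_policy) are illustrative assumptions.

    # Minimal sketch of LSPI with linear Q-function approximation (assumed toy setup).
    import numpy as np

    N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.9

    def phi(s, a):
        """One-hot feature vector over (state, action) pairs (assumed feature map)."""
        f = np.zeros(N_STATES * N_ACTIONS)
        f[s * N_ACTIONS + a] = 1.0
        return f

    def lstdq(samples, policy, k=N_STATES * N_ACTIONS):
        """LSTD-Q style evaluation: solve A w = b from the batch, off-policy."""
        A = np.eye(k) * 1e-6          # small ridge term so A is invertible
        b = np.zeros(k)
        for s, a, r, s_next in samples:
            f, f_next = phi(s, a), phi(s_next, policy[s_next])
            A += np.outer(f, f - GAMMA * f_next)
            b += f * r
        return np.linalg.solve(A, b)

    def greedy_policy(w):
        """Greedy improvement w.r.t. the approximate Q(s, a) = w . phi(s, a)."""
        return np.array([np.argmax([w @ phi(s, a) for a in range(N_ACTIONS)])
                         for s in range(N_STATES)])

    # Illustrative batch from completely random behavior on an assumed chain MDP:
    # action 1 moves right, action 0 moves left, entering the last state pays 1.
    rng = np.random.default_rng(0)
    samples = []
    for _ in range(2000):
        s, a = rng.integers(N_STATES), rng.integers(N_ACTIONS)
        s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
        samples.append((s, a, 1.0 if s_next == N_STATES - 1 else 0.0, s_next))

    # Policy iteration: evaluate with LSTD-Q, improve greedily, stop when stable.
    policy = np.zeros(N_STATES, dtype=int)
    for _ in range(20):
        w = lstdq(samples, policy)
        new_policy = greedy_policy(w)
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy

    print("learned policy:", policy)   # expected: always move right on this toy chain

The sketch reuses the same random batch for every evaluation step, which mirrors the abstract's point that LSPI can learn from data collected by any source, including purely random trials.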
Type | Πλήρης Δημοσίευση σε Συνέδριο | el |
Type | Conference Full Paper | en |
License | http://creativecommons.org/licenses/by/4.0/ | en |
Date | 2015-11-14 | - |
Publication Date | 2001 | - |
Subject Category | Artificial Intelligence | en |
Bibliographic Citation | M. G. Lagoudakis and R. Parr. (2001, Dec.). Model-free least-squares policy iteration. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.22.4345&rep=rep1&type=pdf | en |