Institutional Repository [SANDBOX]
Technical University of Crete

Model–free least–squares policy iteration

Lagoudakis, Michael; Parr, R.


URI: http://purl.tuc.gr/dl/dias/CDADBEEF-15F4-44B5-89B2-295FEC71FDAE
Year: 2001
Type of Item: Conference Full Paper
Details
Bibliographic Citation: M. G. Lagoudakis and R. Parr. (2001, Dec.). Model–free least–squares policy iteration. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.22.4345&rep=rep1&type=pdf

Summary

We propose a new approach to reinforcement learning which combines least squares function approximation with policy iteration. Our method is model-free and completely off policy. We are motivated by the least squares temporal difference learning algorithm (LSTD), which is known for its efficient use of sample experiences compared to pure temporal difference algorithms. LSTD is ideal for prediction problems, however it heretofore has not had a straightforward application to control problems. Moreover, approximations learned by LSTD are strongly influenced by the visitation distribution over states. Our new algorithm, Least-Squares Policy Iteration (LSPI), addresses these issues. The result is an off-policy method which can use (or reuse) data collected from any source. We test LSPI on several problems, including a bicycle simulator in which it learns to guide the bicycle to a goal efficiently by merely observing a relatively small number of completely random trials.
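
The summary describes the idea only at a high level, so the following is a minimal illustrative sketch rather than the authors' implementation. It assumes the commonly cited form of the method: a least-squares fit of the state-action value weights from a fixed batch of samples, followed by greedy policy improvement, repeated until the weights stop changing. The four-state chain environment, the one-hot features, the discount factor, and all function names (`solve`, `phi`, `lstdq`, `lspi`, `step`) are assumptions made for illustration and do not come from this record.

```python
# Illustrative sketch of least-squares policy iteration on a toy chain.
# Everything here (environment, features, names) is hypothetical.

N_STATES, N_ACTIONS, GAMMA = 4, 2, 0.9  # assumed toy problem sizes

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def phi(s, a):
    """One-hot feature vector over (state, action) pairs."""
    v = [0.0] * (N_STATES * N_ACTIONS)
    v[s * N_ACTIONS + a] = 1.0
    return v

def q_value(w, s, a):
    """Approximate Q(s, a) as a linear function of the features."""
    return sum(wi * fi for wi, fi in zip(w, phi(s, a)))

def greedy(w, s):
    """Action with the highest approximate value in state s."""
    return max(range(N_ACTIONS), key=lambda a: q_value(w, s, a))

def lstdq(samples, w):
    """Fit weights for the policy that is greedy with respect to w."""
    k = N_STATES * N_ACTIONS
    A = [[0.0] * k for _ in range(k)]
    b = [0.0] * k
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, greedy(w, s_next))
        for i in range(k):
            b[i] += f[i] * r
            for j in range(k):
                A[i][j] += f[i] * (f[j] - GAMMA * f_next[j])
    return solve(A, b)

def lspi(samples, max_iter=20, tol=1e-8):
    """Alternate evaluation and greedy improvement on a fixed batch."""
    w = [0.0] * (N_STATES * N_ACTIONS)
    for _ in range(max_iter):
        w_new = lstdq(samples, w)
        if max(abs(x - y) for x, y in zip(w_new, w)) < tol:
            return w_new
        w = w_new
    return w

def step(s, a):
    """Toy chain: action 0 moves left, action 1 moves right."""
    s_next = max(s - 1, 0) if a == 0 else min(s + 1, N_STATES - 1)
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    return reward, s_next

# The batch is collected without reference to any policy being learned.
samples = []
for s in range(N_STATES):
    for a in range(N_ACTIONS):
        r, s_next = step(s, a)
        samples.append((s, a, r, s_next))

weights = lspi(samples)
policy = [greedy(weights, s) for s in range(N_STATES)]
```

In this toy setting the same batch of samples is reused unchanged in every iteration, which is the sense in which the method is off-policy: the data need not come from the policy being evaluated. On the assumed chain, the resulting `policy` moves right in every state.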
