<efrbr:recordSet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:efrbr="http://vfrbr.info/efrbr/1.1" xmlns:efrbr-work="http://vfrbr.info/efrbr/1.1/work" xmlns:efrbr-expression="http://vfrbr.info/efrbr/1.1/expression" xmlns:efrbr-manifestation="http://vfrbr.info/efrbr/1.1/manifestation" xmlns:efrbr-person="http://vfrbr.info/efrbr/1.1/person" xmlns:efrbr-corporateBody="http://vfrbr.info/efrbr/1.1/corporateBody" xmlns:efrbr-concept="http://vfrbr.info/efrbr/1.1/concept" xmlns:efrbr-structure="http://vfrbr.info/efrbr/1.1/structure" xmlns:efrbr-responsible="http://vfrbr.info/efrbr/1.1/responsible" xmlns:efrbr-subject="http://vfrbr.info/efrbr/1.1/subject" xmlns:efrbr-other="http://vfrbr.info/efrbr/1.1/other" xsi:schemaLocation="http://vfrbr.info/efrbr/1.1 http://vfrbr.info/schemas/1.1/efrbr.xsd"><efrbr:entities><efrbr-work:work identifier="http://purl.tuc.gr/dl/dias/7D85C6DD-512E-4EBF-8560-2C809AE30E19"><efrbr-work:titleOfTheWork>Directed exploration of policy space using support vector classifiers</efrbr-work:titleOfTheWork></efrbr-work:work><efrbr-expression:expression identifier="http://purl.tuc.gr/dl/dias/7D85C6DD-512E-4EBF-8560-2C809AE30E19"><efrbr-expression:titleOfTheExpression>Directed exploration of policy space using support vector classifiers</efrbr-expression:titleOfTheExpression><efrbr-expression:formOfExpression vocabulary="DIAS:TYPES">
            Πλήρης Δημοσίευση σε Συνέδριο
            Conference Full Paper
         </efrbr-expression:formOfExpression><efrbr-expression:dateOfExpression type="issued">2015-11-13</efrbr-expression:dateOfExpression><efrbr-expression:dateOfExpression type="published">2011</efrbr-expression:dateOfExpression><efrbr-expression:languageOfExpression vocabulary="iso639-1">en</efrbr-expression:languageOfExpression><efrbr-expression:summarizationOfContent>Good policies in reinforcement learning problems typically exhibit significant structure. Several recent learning approaches based on the approximate policy iteration scheme suggest the use of classifiers for capturing this structure and representing policies compactly. Nevertheless, the space of possible policies, even under such structured representations, is huge and needs to be explored carefully to avoid wasting the computationally expensive simulations (rollouts) that are needed to probe the improved policy and obtain training samples at various points over the state space. Regarding rollouts as a scarce resource, we propose a method for directed exploration of policy space using support vector classifiers. We use a collection of binary support vector classifiers to represent policies, whereby each of these classifiers corresponds to a single action and captures the parts of the state space where this action dominates the other actions. After an initial training phase with rollouts uniformly distributed over the entire state space, we use the support vectors of the classifiers to identify the critical parts of the state space, namely those containing boundaries between different action choices in the represented policy. The policy is subsequently improved by probing the state space only at points around the support vectors, distributed perpendicularly to the separating border. This directed focus on critical parts of the state space iteratively leads to the gradual refinement and improvement of the underlying policy and delivers excellent control policies in only a few iterations with a conservative use of rollouts.
We demonstrate the proposed approach on three standard reinforcement learning domains: inverted pendulum, mountain car, and acrobot.</efrbr-expression:summarizationOfContent><efrbr-expression:useRestrictionsOnTheExpression type="creative-commons">http://creativecommons.org/licenses/by/4.0/</efrbr-expression:useRestrictionsOnTheExpression><efrbr-expression:note type="page range">112–119</efrbr-expression:note><efrbr-expression:note type="conference name">2011 IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning</efrbr-expression:note><efrbr-expression:note type="proceedings title">Proceedings of the 2011 IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), Paris, France</efrbr-expression:note></efrbr-expression:expression><efrbr-person:person identifier="http://users.isc.tuc.gr/~irexakis"><efrbr-person:nameOfPerson vocabulary="TUC:LDAP">
            Rexakis Ioannis
            Ρεξακης Ιωαννης
         </efrbr-person:nameOfPerson></efrbr-person:person><efrbr-person:person identifier="http://users.isc.tuc.gr/~lagoudakis"><efrbr-person:nameOfPerson vocabulary="TUC:LDAP">
            Lagoudakis Michael
            Λαγουδακης Μιχαηλ
         </efrbr-person:nameOfPerson></efrbr-person:person><efrbr-corporateBody:corporateBody identifier="http://www.ieee.org/index.html"><efrbr-corporateBody:nameOfTheCorporateBody vocabulary="S/R:PUBLISHERS">
            Institute of Electrical and Electronics Engineers
         </efrbr-corporateBody:nameOfTheCorporateBody></efrbr-corporateBody:corporateBody><efrbr-concept:concept identifier="6FE2C064-E272-41E3-A29D-879B0F58EC48"><efrbr-concept:termForTheConcept>
            reinforcement learning
         </efrbr-concept:termForTheConcept></efrbr-concept:concept></efrbr:entities><efrbr:relationships><efrbr-structure:structureRelations><efrbr-structure:realizedThrough sourceEntity="work" targetEntity="expression" sourceURI="http://purl.tuc.gr/dl/dias/7D85C6DD-512E-4EBF-8560-2C809AE30E19" targetURI="http://purl.tuc.gr/dl/dias/7D85C6DD-512E-4EBF-8560-2C809AE30E19"/></efrbr-structure:structureRelations><efrbr-responsible:responsibleRelations><efrbr-responsible:createdBy sourceEntity="work" targetEntity="person" sourceURI="http://purl.tuc.gr/dl/dias/7D85C6DD-512E-4EBF-8560-2C809AE30E19" targetURI="http://users.isc.tuc.gr/~irexakis"/><efrbr-responsible:realizedBy sourceEntity="expression" role="author" targetEntity="person" sourceURI="http://purl.tuc.gr/dl/dias/7D85C6DD-512E-4EBF-8560-2C809AE30E19" targetURI="http://users.isc.tuc.gr/~irexakis"/><efrbr-responsible:realizedBy sourceEntity="expression" role="author" targetEntity="person" sourceURI="http://purl.tuc.gr/dl/dias/7D85C6DD-512E-4EBF-8560-2C809AE30E19" targetURI="http://users.isc.tuc.gr/~lagoudakis"/><efrbr-responsible:realizedBy sourceEntity="expression" role="publisher" targetEntity="corporateBody" sourceURI="http://purl.tuc.gr/dl/dias/7D85C6DD-512E-4EBF-8560-2C809AE30E19" targetURI="http://www.ieee.org/index.html"/></efrbr-responsible:responsibleRelations><efrbr-subject:subjectRelations><efrbr-subject:hasSubject sourceEntity="work" targetEntity="concept" sourceURI="http://purl.tuc.gr/dl/dias/7D85C6DD-512E-4EBF-8560-2C809AE30E19" targetURI="6FE2C064-E272-41E3-A29D-879B0F58EC48"/></efrbr-subject:subjectRelations><efrbr-other:otherRelations/></efrbr:relationships></efrbr:recordSet>