<efrbr:recordSet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:efrbr="http://vfrbr.info/efrbr/1.1" xmlns:efrbr-work="http://vfrbr.info/efrbr/1.1/work" xmlns:efrbr-expression="http://vfrbr.info/efrbr/1.1/expression" xmlns:efrbr-manifestation="http://vfrbr.info/efrbr/1.1/manifestation" xmlns:efrbr-person="http://vfrbr.info/efrbr/1.1/person" xmlns:efrbr-corporateBody="http://vfrbr.info/efrbr/1.1/corporateBody" xmlns:efrbr-concept="http://vfrbr.info/efrbr/1.1/concept" xmlns:efrbr-structure="http://vfrbr.info/efrbr/1.1/structure" xmlns:efrbr-responsible="http://vfrbr.info/efrbr/1.1/responsible" xmlns:efrbr-subject="http://vfrbr.info/efrbr/1.1/subject" xmlns:efrbr-other="http://vfrbr.info/efrbr/1.1/other" xsi:schemaLocation="http://vfrbr.info/efrbr/1.1 http://vfrbr.info/schemas/1.1/efrbr.xsd"><efrbr:entities><efrbr-work:work identifier="http://purl.tuc.gr/dl/dias/2E55B7D4-6FCA-4907-8055-F24FEEF56CC9"><efrbr-work:titleOfTheWork>Classifier-based policy representation</efrbr-work:titleOfTheWork></efrbr-work:work><efrbr-expression:expression identifier="http://purl.tuc.gr/dl/dias/2E55B7D4-6FCA-4907-8055-F24FEEF56CC9"><efrbr-expression:titleOfTheExpression>Classifier-based policy representation</efrbr-expression:titleOfTheExpression><efrbr-expression:formOfExpression vocabulary="DIAS:TYPES">
            Πλήρης Δημοσίευση σε Συνέδριο
            Conference Full Paper
         </efrbr-expression:formOfExpression><efrbr-expression:dateOfExpression type="issued">2015-11-13</efrbr-expression:dateOfExpression><efrbr-expression:dateOfExpression type="published">2008</efrbr-expression:dateOfExpression><efrbr-expression:languageOfExpression vocabulary="iso639-1">en</efrbr-expression:languageOfExpression><efrbr-expression:summarizationOfContent>Motivated by recent proposals that view a reinforcement learning problem as a collection of classification problems, we investigate various aspects of policy representation using classifiers. In particular, we derive optimal policies for two standard reinforcement learning domains (inverted pendulum and mountain car) in both deterministic and stochastic versions, and we examine their internal structure. We then evaluate the representational ability of a variety of classifiers for these policies, using both a multi-class and a binary formulation of the classification problem. Finally, we evaluate the actual performance of the policies learned by the classifiers in the original control problem as a function of the number of training examples provided. 
Our results offer significant insight into making the reinforcement-learning-via-classification technology successfully applicable to hard learning problems.</efrbr-expression:summarizationOfContent><efrbr-expression:useRestrictionsOnTheExpression type="creative-commons">http://creativecommons.org/licenses/by/4.0/</efrbr-expression:useRestrictionsOnTheExpression><efrbr-expression:note type="page range">91–98</efrbr-expression:note><efrbr-expression:note type="conference name">2008 IEEE International Conference on Machine Learning and Applications</efrbr-expression:note><efrbr-expression:note type="proceedings title">Proceedings of the 2008 IEEE International Conference on Machine Learning and Applications (ICMLA), San Diego, CA, USA, December 2008</efrbr-expression:note></efrbr-expression:expression><efrbr-person:person identifier="http://users.isc.tuc.gr/~irexakis"><efrbr-person:nameOfPerson vocabulary="TUC:LDAP">
            Rexakis Ioannis
            Ρεξακης Ιωαννης
         </efrbr-person:nameOfPerson></efrbr-person:person><efrbr-person:person identifier="http://users.isc.tuc.gr/~lagoudakis"><efrbr-person:nameOfPerson vocabulary="TUC:LDAP">
            Lagoudakis Michael
            Λαγουδακης Μιχαηλ
         </efrbr-person:nameOfPerson></efrbr-person:person><efrbr-corporateBody:corporateBody identifier="http://www.ieee.org/index.html"><efrbr-corporateBody:nameOfTheCorporateBody vocabulary="S/R:PUBLISHERS">
            Institute of Electrical and Electronics Engineers
         </efrbr-corporateBody:nameOfTheCorporateBody></efrbr-corporateBody:corporateBody><efrbr-concept:concept identifier="7A8211F7-DF89-44DC-88D8-7220928103AB"><efrbr-concept:termForTheConcept>
            Machine Learning
         </efrbr-concept:termForTheConcept></efrbr-concept:concept></efrbr:entities><efrbr:relationships><efrbr-structure:structureRelations><efrbr-structure:realizedThrough sourceEntity="work" sourceURI="http://purl.tuc.gr/dl/dias/2E55B7D4-6FCA-4907-8055-F24FEEF56CC9" targetEntity="expression" targetURI="http://purl.tuc.gr/dl/dias/2E55B7D4-6FCA-4907-8055-F24FEEF56CC9"/></efrbr-structure:structureRelations><efrbr-responsible:responsibleRelations><efrbr-responsible:createdBy sourceEntity="work" sourceURI="http://purl.tuc.gr/dl/dias/2E55B7D4-6FCA-4907-8055-F24FEEF56CC9" targetEntity="person" targetURI="http://users.isc.tuc.gr/~irexakis"/><efrbr-responsible:realizedBy sourceEntity="expression" sourceURI="http://purl.tuc.gr/dl/dias/2E55B7D4-6FCA-4907-8055-F24FEEF56CC9" targetEntity="person" targetURI="http://users.isc.tuc.gr/~irexakis" role="author"/><efrbr-responsible:realizedBy sourceEntity="expression" sourceURI="http://purl.tuc.gr/dl/dias/2E55B7D4-6FCA-4907-8055-F24FEEF56CC9" targetEntity="person" targetURI="http://users.isc.tuc.gr/~lagoudakis" role="author"/><efrbr-responsible:realizedBy sourceEntity="expression" sourceURI="http://purl.tuc.gr/dl/dias/2E55B7D4-6FCA-4907-8055-F24FEEF56CC9" targetEntity="corporateBody" targetURI="http://www.ieee.org/index.html" role="publisher"/></efrbr-responsible:responsibleRelations><efrbr-subject:subjectRelations><efrbr-subject:hasSubject sourceEntity="work" sourceURI="http://purl.tuc.gr/dl/dias/2E55B7D4-6FCA-4907-8055-F24FEEF56CC9" targetEntity="concept" targetURI="7A8211F7-DF89-44DC-88D8-7220928103AB"/></efrbr-subject:subjectRelations><efrbr-other:otherRelations/></efrbr:relationships></efrbr:recordSet>