<efrbr:recordSet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:efrbr="http://vfrbr.info/efrbr/1.1" xmlns:efrbr-work="http://vfrbr.info/efrbr/1.1/work" xmlns:efrbr-expression="http://vfrbr.info/efrbr/1.1/expression" xmlns:efrbr-manifestation="http://vfrbr.info/efrbr/1.1/manifestation" xmlns:efrbr-person="http://vfrbr.info/efrbr/1.1/person" xmlns:efrbr-corporateBody="http://vfrbr.info/efrbr/1.1/corporateBody" xmlns:efrbr-concept="http://vfrbr.info/efrbr/1.1/concept" xmlns:efrbr-structure="http://vfrbr.info/efrbr/1.1/structure" xmlns:efrbr-responsible="http://vfrbr.info/efrbr/1.1/responsible" xmlns:efrbr-subject="http://vfrbr.info/efrbr/1.1/subject" xmlns:efrbr-other="http://vfrbr.info/efrbr/1.1/other" xsi:schemaLocation="http://vfrbr.info/efrbr/1.1 http://vfrbr.info/schemas/1.1/efrbr.xsd"><efrbr:entities><efrbr-work:work identifier="http://purl.tuc.gr/dl/dias/3FB923DA-4C00-4B8D-B671-06DDC7E38ACF"><efrbr-work:titleOfTheWork>Efficient reinforcement learning in adversarial games</efrbr-work:titleOfTheWork></efrbr-work:work><efrbr-expression:expression identifier="http://purl.tuc.gr/dl/dias/3FB923DA-4C00-4B8D-B671-06DDC7E38ACF"><efrbr-expression:titleOfTheExpression>Efficient reinforcement learning in adversarial games</efrbr-expression:titleOfTheExpression><efrbr-expression:formOfExpression vocabulary="DIAS:TYPES">
            Πλήρης Δημοσίευση σε Συνέδριο
            Conference Full Paper
         </efrbr-expression:formOfExpression><efrbr-expression:dateOfExpression type="issued">2015-11-13</efrbr-expression:dateOfExpression><efrbr-expression:dateOfExpression type="published">2012</efrbr-expression:dateOfExpression><efrbr-expression:languageOfExpression vocabulary="iso639-1">en</efrbr-expression:languageOfExpression><efrbr-expression:summarizationOfContent>The ability to learn is critical for agents designed to compete in a variety of two-player, turn-taking, tactical adversarial games, such as Backgammon, Othello/Reversi, Chess, Hex, etc. The mainstream approach to learning in such games consists of updating a state evaluation function, usually in a Temporal Difference (TD) sense, either under the MiniMax optimality criterion or under optimization against a specific opponent. However, this approach is limited by several factors: (a) updates to the evaluation function are incremental, (b) stored samples from past games cannot be utilized, and (c) the quality of each update depends on the current evaluation function due to bootstrapping. In this paper, we present a learning approach based on the Least-Squares Policy Iteration (LSPI) algorithm that overcomes these limitations by focusing on learning a state-action evaluation function. The key advantage of the proposed approach is that the agent can make batch updates to the evaluation function with any collection of samples, can utilize samples from past games, and can make updates that do not depend on the current evaluation function since there is no bootstrapping.
We demonstrate the efficiency of the LSPI agent over the TD agent in the classical board game of Othello/Reversi.</efrbr-expression:summarizationOfContent><efrbr-expression:useRestrictionsOnTheExpression type="creative-commons">http://creativecommons.org/licenses/by/4.0/</efrbr-expression:useRestrictionsOnTheExpression><efrbr-expression:note type="page range">704 - 711</efrbr-expression:note><efrbr-expression:note type="conference name">2012 IEEE International Conference on Tools with Artificial Intelligence</efrbr-expression:note><efrbr-expression:note type="proceedings title">Proceedings of the 2012 IEEE International Conference on Tools with Artificial Intelligence (ICTAI), Athens, Greece, November 2012</efrbr-expression:note></efrbr-expression:expression><efrbr-person:person identifier="http://users.isc.tuc.gr/~lagoudakis"><efrbr-person:nameOfPerson vocabulary="TUC:LDAP">
            Lagoudakis Michael
            Λαγουδακης Μιχαηλ
         </efrbr-person:nameOfPerson></efrbr-person:person><efrbr-person:person identifier="http://users.isc.tuc.gr/~iskoulakis"><efrbr-person:nameOfPerson vocabulary="TUC:LDAP">
            Skoulakis Ioannis
            Σκουλακης Ιωαννης
         </efrbr-person:nameOfPerson></efrbr-person:person><efrbr-concept:concept identifier="A78D5FC8-0D0D-4D8C-82B0-AB83CA5E805B"><efrbr-concept:termForTheConcept>
            Reinforcement Learning
         </efrbr-concept:termForTheConcept></efrbr-concept:concept></efrbr:entities><efrbr:relationships><efrbr-structure:structureRelations><efrbr-structure:realizedThrough sourceEntity="work" targetEntity="expression" sourceURI="http://purl.tuc.gr/dl/dias/3FB923DA-4C00-4B8D-B671-06DDC7E38ACF" targetURI="http://purl.tuc.gr/dl/dias/3FB923DA-4C00-4B8D-B671-06DDC7E38ACF"/></efrbr-structure:structureRelations><efrbr-responsible:responsibleRelations><efrbr-responsible:createdBy sourceEntity="work" targetEntity="person" sourceURI="http://purl.tuc.gr/dl/dias/3FB923DA-4C00-4B8D-B671-06DDC7E38ACF" targetURI="http://users.isc.tuc.gr/~lagoudakis"/><efrbr-responsible:realizedBy sourceEntity="expression" role="author" targetEntity="person" sourceURI="http://purl.tuc.gr/dl/dias/3FB923DA-4C00-4B8D-B671-06DDC7E38ACF" targetURI="http://users.isc.tuc.gr/~lagoudakis"/><efrbr-responsible:realizedBy sourceEntity="expression" role="author" targetEntity="person" sourceURI="http://purl.tuc.gr/dl/dias/3FB923DA-4C00-4B8D-B671-06DDC7E38ACF" targetURI="http://users.isc.tuc.gr/~iskoulakis"/></efrbr-responsible:responsibleRelations><efrbr-subject:subjectRelations><efrbr-subject:hasSubject sourceEntity="work" targetEntity="concept" sourceURI="http://purl.tuc.gr/dl/dias/3FB923DA-4C00-4B8D-B671-06DDC7E38ACF" targetURI="A78D5FC8-0D0D-4D8C-82B0-AB83CA5E805B"/></efrbr-subject:subjectRelations><efrbr-other:otherRelations/></efrbr:relationships></efrbr:recordSet>