Institutional Repository
Technical University of Crete


Monte Carlo tree search in the strategy game "Settlers of Catan"

Karamalegos Emmanouil

Full Record


URI: http://purl.tuc.gr/dl/dias/1CCC4417-B1AA-4C2E-B33B-0F6BEFDED36E
Year: 2016
Type: Diploma Thesis
Bibliographic Citation: Emmanouil Karamalegos, "Monte Carlo tree search in the strategy game "Settlers of Catan"", Diploma Thesis, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2016. https://doi.org/10.26233/heallink.tuc.66891

Abstract

Classic approaches to game AI require either high-quality domain knowledge or a long time to generate effective AI behavior. Monte Carlo Tree Search (MCTS) is a search method that combines the precision of tree search with the generality of random sampling. The family of MCTS algorithms has achieved promising results with perfect-information games such as Go. In our work, we apply Monte Carlo Tree Search to the non-deterministic game "Settlers of Catan", a multi-player board game (now also played on the web) that necessitates strategic planning and negotiation skills. We implemented an agent which, for the first time, takes into consideration all aspects of the game, using no domain knowledge.

In our work, we experiment with a reinforcement learning method, Value of Perfect Information (VPI), and two bandit methods, namely the Upper Confidence Bound (UCB) and Bayesian Upper Confidence Bound methods. Such methods attempt to strike a balance between exploitation and exploration when constructing the search tree.

For the first time, we implemented an agent that takes into consideration the complete rule set of the game and makes it possible to negotiate trades among all players. Furthermore, we equipped our agent with an alternative initial-placement strategy found in the literature, which is based on the analysis of human behavior in Settlers of Catan games.

In our experiments we compare the performance of our methods against each other and against appropriate benchmarks (e.g., JSettlers agents), and examine the effect that the number of simulations and the simulation depth have on the algorithms' performance. Our results suggest that VPI scores significantly better than bandit-based methods, even when these employ a much higher number of simulations. In addition, the simulation depth needs to be tuned so that the method neither gets lost in deep simulations of improbable scenarios nor terminates too early without providing a proper estimate of the upcoming moves. Finally, our results indicate that our agent's performance improves when the alternative, human-behavior-based initial placement method is used.
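The trade-offs the abstract describes, bandit-style balancing of exploitation against exploration and the choice of simulation depth, can be made concrete with a short sketch. The following is a minimal, generic MCTS loop with UCB1 selection and a depth-limited random rollout. It is not the thesis's implementation: the Game interface (legal_moves, apply, is_terminal, reward) is a hypothetical stand-in for a Settlers of Catan engine, and the constants are illustrative.

import math
import random

class Node:
    """One state in the search tree."""
    def __init__(self, state, parent=None, move=None):
        self.state = state            # hypothetical Game object
        self.parent = parent
        self.move = move              # move that led to this node
        self.children = []
        self.untried = state.legal_moves()
        self.visits = 0
        self.value = 0.0              # accumulated rollout reward

def ucb1(child, parent_visits, c=math.sqrt(2)):
    # Exploitation (mean reward) plus an exploration bonus:
    # UCB1 = mean + c * sqrt(ln(N_parent) / N_child)
    mean = child.value / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)

def select(node):
    # Descend through fully expanded nodes, taking the UCB1-max child.
    while not node.untried and node.children:
        node = max(node.children, key=lambda ch: ucb1(ch, node.visits))
    return node

def expand(node):
    # Add one previously untried move as a new child, if any remain.
    if node.untried:
        move = node.untried.pop()
        child = Node(node.state.apply(move), parent=node, move=move)
        node.children.append(child)
        return child
    return node

def rollout(state, max_depth):
    # Depth-limited random simulation: too deep wastes effort on
    # improbable lines, too shallow yields a poor estimate -- the
    # trade-off the abstract's results discuss.
    depth = 0
    while not state.is_terminal() and depth < max_depth:
        state = state.apply(random.choice(state.legal_moves()))
        depth += 1
    return state.reward()   # value from the root player's perspective

def backpropagate(node, reward):
    # Propagate the rollout result up to the root.
    while node is not None:
        node.visits += 1
        node.value += reward
        node = node.parent

def mcts(root_state, n_simulations=1000, max_depth=40):
    root = Node(root_state)
    for _ in range(n_simulations):
        leaf = expand(select(root))
        reward = rollout(leaf.state, max_depth)
        backpropagate(leaf, reward)
    # Return the most-visited move, a common robust final choice.
    return max(root.children, key=lambda ch: ch.visits).move

In a bandit-based agent the ucb1 criterion drives selection; a VPI-based agent, which the abstract reports performing better, would replace that criterion with a value-of-perfect-information estimate, while n_simulations and max_depth correspond to the two parameters whose effect the experiments examine.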
