Institutional Repository
Technical University of Crete

Reinforcement learning for swing up and balancing of three-dimensional humanoid model

Papadimitriou Panagiotis

URI: http://purl.tuc.gr/dl/dias/FAC034FB-7F9D-4CF7-A0C5-25FE0E6EB331
Year: 2021
Type of Item: Diploma Work
Bibliographic Citation: Panagiotis Papadimitriou, "Reinforcement learning for swing up and balancing of a three-dimensional humanoid model", Diploma Work, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2021. https://doi.org/10.26233/heallink.tuc.88825

Summary

Reinforcement Learning, a subfield of Artificial Intelligence and Machine Learning, has gained a lot of traction in recent years. From trained agents playing video games or chess at expert level to self-driving cars on the streets, many ground-breaking results have been achieved thanks to advances in Reinforcement Learning. Combining Reinforcement Learning with Robotics has the additional advantage that agents trained in simulation could eventually be transferred to real robots and employed in a variety of tasks to aid humans. In this diploma thesis, we construct a three-dimensional humanoid model hanging below a horizontal bar (an acrobat) within a realistic simulation environment, based on a humanoid model originally built for walk-learning experiments. The goal of the agent controlling the humanoid's actions is to swing the model up and eventually balance it on the bar. The challenge in this problem lies in the high-dimensional, continuous state and action spaces: the model has 19 degrees of freedom (joints) and 17 actuators (motors), a setting where conventional learning approaches do not apply. We try two Reinforcement Learning algorithms, Deep Deterministic Policy Gradient (DDPG) and Advantage Actor-Critic (A2C), to train the agent over thousands of trials, and we demonstrate the learning progress. We adopt a simple reward scheme that rewards the agent in proportion to the height reached at any time, but reveals no information about the nature of the problem. Through extensive experimentation with both algorithms and several variations of the model, we found DDPG to be the more efficient algorithm and the better fit for the problem at hand; with some tuning of the problem parameters, it yielded satisfying results. After learning, the resulting agent is able to complete the task in most trials from any starting pose.
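
The summary describes the reward scheme and the DDPG algorithm only in words. As an illustration, the minimal Python sketch below shows a height-proportional reward and one DDPG update step, assuming PyTorch and a replay buffer. The state dimension, network sizes, learning rates, and the height normalization are assumptions of this sketch, not details taken from the thesis; only the 17-dimensional action space comes from the summary.

    import copy
    import torch
    import torch.nn as nn

    obs_dim, act_dim = 45, 17   # act_dim = 17 actuators (from the summary); obs_dim is assumed

    def height_reward(head_z, z_min=0.0, z_max=2.0):
        # Assumption: reward grows linearly with the height of some reference
        # body (e.g. the head); the summary only states "proportional to height".
        return (head_z - z_min) / (z_max - z_min)

    # Deterministic actor pi(s) and Q-value critic Q(s, a).
    actor = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                          nn.Linear(256, act_dim), nn.Tanh())
    critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 256), nn.ReLU(),
                           nn.Linear(256, 1))
    actor_targ, critic_targ = copy.deepcopy(actor), copy.deepcopy(critic)
    actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
    critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
    gamma, tau = 0.99, 0.005    # discount factor and Polyak averaging rate

    def ddpg_update(obs, act, rew, next_obs, done):
        # All arguments are float tensors sampled from a replay buffer;
        # rew and done have shape (batch, 1).
        # 1) Critic: regress Q(s, a) toward the one-step bootstrapped target.
        with torch.no_grad():
            next_q = critic_targ(torch.cat([next_obs, actor_targ(next_obs)], dim=-1))
            target = rew + gamma * (1.0 - done) * next_q
        q = critic(torch.cat([obs, act], dim=-1))
        critic_loss = ((q - target) ** 2).mean()
        critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
        # 2) Actor: follow the deterministic policy gradient, i.e. maximize Q(s, pi(s)).
        actor_loss = -critic(torch.cat([obs, actor(obs)], dim=-1)).mean()
        actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
        # 3) Slowly track the online networks with the target networks.
        with torch.no_grad():
            for p, p_targ in zip(list(actor.parameters()) + list(critic.parameters()),
                                 list(actor_targ.parameters()) + list(critic_targ.parameters())):
                p_targ.mul_(1.0 - tau).add_(tau * p)

During training, actions produced by the actor would be perturbed with exploration noise before being applied to the 17 actuators; that step, like everything above, follows the standard DDPG recipe rather than the thesis code.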
