Institutional Repository [SANDBOX]
Technical University of Crete

A systematic evaluation of the PPO algorithm for deep reinforcement learning in lane-free autonomous driving

Akrivopoulos Grigorios


URI: http://purl.tuc.gr/dl/dias/FD1DD319-715B-4920-A518-2404172ABB59
Year 2024
Type of Item Diploma Work
Bibliographic Citation Grigorios Akrivopoulos, "A systematic evaluation of the PPO algorithm for deep reinforcement learning in lane-free autonomous driving", Diploma Work, School of Production Engineering and Management, Technical University of Crete, Chania, Greece, 2024. https://doi.org/10.26233/heallink.tuc.101775
Summary

Lane-free traffic is a novel paradigm that targets environments comprised entirely of Connected and Automated Vehicles (CAVs), where CAVs do not adhere to traffic lanes but can occupy any lateral position within the road boundaries. This gives rise to many research opportunities and innovative applications. At the same time, the field of Deep Reinforcement Learning (DRL) has gained momentum and continues to advance rapidly, with active lines of research for applications in autonomous driving. Specifically, Proximal Policy Optimization (PPO) is a recently introduced on-policy DRL algorithm and is considered one of the most prominent for modern DRL applications. To date, research on DRL in lane-free traffic has examined algorithms unrelated to PPO, or to on-policy algorithms in general. To this end, we build upon existing work on DRL in single-agent lane-free environments, where a CAV, acting as a learning agent, has the task of learning a lane-free vehicle-movement strategy while navigating a road populated with other CAVs. To apply PPO effectively in this setting, we extend an existing Markov Decision Process formulation of the problem with several new components and systematically evaluate their influence on the agent's learning performance. First, we put forward an image state representation of surrounding traffic that captures the two-dimensional movement of CAVs, and compare it with the existing vector-based state input. Then, we formulate and examine different reward-function terms that are better suited to PPO. Moreover, we develop a blocking environment setting in which the agent's actions are filtered under certain critical conditions. There, instead of the fully unconstrained learning environment, we observe the impact of a practical constraint that better guides the learning process away from the local maxima we commonly encountered in practice.
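For reference, PPO's central update is the clipped surrogate objective. The following minimal NumPy sketch illustrates the generic mechanism only; it is not the thesis's implementation, and the default clipping range `eps=0.2` is simply the commonly used value from the PPO literature.

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate objective of PPO (generic illustration).

    ratio:     pi_theta(a|s) / pi_theta_old(a|s) for each sampled action
    advantage: advantage estimate A(s, a) for each sample
    eps:       clipping range; 0.2 is the commonly used default
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # PPO maximises the element-wise minimum of the two terms, which removes
    # the incentive to push the policy ratio outside [1 - eps, 1 + eps].
    return np.minimum(unclipped, clipped).mean()
```

Because the minimum is taken element-wise, a sample whose ratio has already left the trust interval contributes only the clipped (pessimistic) term, which is what keeps on-policy updates conservative.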
Our experimental evaluation shows the improvement that each of the above-mentioned enhancements for PPO provides in the single-agent lane-free environment. The results demonstrate the agent's capacity to learn strategies that surpass the inferior-quality solutions initially observed under the original formulation, which targeted other methods. Motivated by these results, we believe that the proposed enhancements can serve as groundwork for future endeavours with PPO and other DRL methods in lane-free traffic.
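As an illustration of the action-filtering ("blocking") idea described in the summary, one simple instance is a boundary filter on the agent's lateral action. The sketch below is purely hypothetical: the function name, the road half-width, and the safety margin are illustrative assumptions, not the thesis's actual critical conditions.

```python
def filter_action(lateral_pos, lateral_action, road_half_width=5.0, margin=0.5):
    """Hypothetical safety filter for a lane-free agent's lateral action.

    If the proposed lateral displacement would carry the vehicle past the
    road boundary minus a safety margin, the action is blocked and replaced
    by the largest displacement that keeps the vehicle inside the boundary.
    """
    proposed = lateral_pos + lateral_action
    limit = road_half_width - margin
    if abs(proposed) > limit:
        # Critical condition met: clamp the resulting position to the
        # admissible band and return the correspondingly reduced action.
        clamped = max(-limit, min(limit, proposed))
        return clamped - lateral_pos
    return lateral_action
```

A filter of this kind constrains exploration only near the boundary, which matches the summary's observation that a practical constraint can steer learning away from poor local maxima without otherwise restricting the policy.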
