The work titled "A systematic evaluation of the PPO algorithm for deep reinforcement learning in lane-free autonomous driving" by Akrivopoulos Grigorios is licensed under a Creative Commons Attribution 4.0 International license.
Bibliographic Citation
Grigorios Akrivopoulos, "A systematic evaluation of the PPO algorithm for deep reinforcement learning in lane-free autonomous driving", Diploma Work, School of Production Engineering and Management, Technical University of Crete, Chania, Greece, 2024
https://doi.org/10.26233/heallink.tuc.101775
Lane-free traffic is a novel paradigm targeting environments comprised entirely of Connected and Automated Vehicles (CAVs), where CAVs do not adhere to traffic lanes but may occupy any lateral position within the road boundaries. This gives rise to many research opportunities and innovative applications. At the same time, the field of Deep Reinforcement Learning (DRL) has gained momentum and continues to advance rapidly, with active lines of research in autonomous driving applications. Specifically, Proximal Policy Optimization (PPO) is a recently introduced on-policy DRL algorithm and is considered one of the most prominent for modern DRL applications. To date, research on DRL in lane-free traffic has examined algorithms unrelated to PPO, or to on-policy algorithms in general. To this end, we build upon existing work on DRL in single-agent lane-free environments, where a CAV, acting as the learning agent, has the task of learning a lane-free vehicle movement strategy while navigating a road populated with other CAVs. To apply PPO effectively in this setting, we extend an existing Markov Decision Process formulation of the problem with several new components and systematically evaluate their influence on the agent’s learning performance. First, we put forward an image-based state representation of surrounding traffic that captures the two-dimensional movement of CAVs, and compare it with the existing vector-based state input. Then, we formulate and examine different reward function terms that are better suited to PPO. Moreover, we develop a blocking environment setting in which the agent’s actions are filtered under certain critical conditions. There, instead of the fully unconstrained learning environment, we observe the impact of a practical constraint that better guides the learning process away from the local maxima we commonly encountered in practice.
Our experimental evaluation shows the improvement that each of the above-mentioned enhancements for PPO provides in the single-agent lane-free environment. The results indicate the agent’s capacity to learn strategies that overcome the inferior-quality solutions initially observed under the original formulation, which targeted other methods. Motivated by these results, we believe that the proposed enhancements can serve as groundwork for future endeavours with PPO and other DRL methods in lane-free traffic.