The work titled "Deep Q-Networks with normalized advantage function for autonomous driving in lane-free traffic" by Bakopoulos Leonidas is licensed under a Creative Commons Attribution 4.0 International license.
Bibliographic Citation
Leonidas Bakopoulos, "Deep Q-Networks with normalized advantage function for autonomous driving in lane-free traffic", Diploma Work, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2024
https://doi.org/10.26233/heallink.tuc.98702
In the past decade, Deep Reinforcement Learning (Deep-RL) has evolved into a powerful tool that can outperform both human abilities and traditional algorithms in many domains. Deep-RL differs from classic RL in its ability to handle complex problems in larger, and sometimes continuous, action and state spaces.

At the same time, the vehicular traffic research area is of utmost practical importance. Numerous works have proposed that automated vehicles can optimize traffic flow. Vehicles on the road tend to maintain different desired speeds, leading to various situations requiring overtaking and other appropriate reactions to the behavior of others. In recent years, a novel vehicular traffic paradigm, namely that of lane-free traffic, has emerged as a means of utilizing the full width of a road by automated and (potentially) connected vehicles. In a lane-free environment, vehicles can be positioned anywhere in the two-dimensional state space, which complicates the automated vehicles' decision-making process significantly and makes it entirely different from the traditional lane-based approach. Deep RL is a natural candidate for addressing the challenges posed by this new traffic paradigm.

Against this background, this thesis builds upon recent work by Karalakou et al. [1], which enabled the application of the Deep Deterministic Policy Gradients (DDPG) Deep RL algorithm in the lane-free traffic domain. Our work progressively builds an autonomous agent that combines various algorithmic components, taking as its basis the Normalized Advantage Functions (NAF) deep RL algorithm.
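To make the NAF basis concrete: NAF decomposes the Q-function as Q(s, a) = V(s) + A(s, a), where the advantage is a quadratic in the action, A(s, a) = -1/2 (a - μ(s))ᵀ P(s) (a - μ(s)) with P(s) = L(s)L(s)ᵀ positive semi-definite, so the greedy continuous action is simply μ(s). The sketch below, which assumes the network outputs (V, μ, L) have already been computed for a given state, is illustrative only; the function name `naf_q_value` is our own.

```python
import numpy as np

def naf_q_value(v, mu, L, action):
    """Q(s, a) = V(s) + A(s, a), with the quadratic NAF advantage
    A(s, a) = -0.5 * (a - mu)^T P (a - mu), where P = L L^T is
    positive semi-definite (L is lower-triangular). Because A <= 0
    and A(s, mu) = 0, the greedy action is mu(s) in closed form."""
    P = L @ L.T                      # positive semi-definite precision matrix
    diff = np.asarray(action) - np.asarray(mu)
    advantage = -0.5 * diff @ P @ diff
    return float(v) + advantage
```

This closed-form maximizer is what lets NAF apply Q-learning in continuous action spaces such as lane-free driving, where DDPG instead needs a separate actor network.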
Specifically, we put forward the blending of NAF with Prioritized Experience Replay (PER), Parameter Space Noise for Exploration (PSNE), the well-known Boltzmann exploration method, and a local optimization method for exploration; and we systematically test our approach in the lane-free highway traffic domain, comparing the performance of various combinations of these algorithmic components against that of the aforementioned DDPG approach. Our simulation results showcase the superiority of our approach over DDPG, highlight the strengths of each tested algorithmic variant, and demonstrate that our NAF+PER+PSNE variant (in which PSNE is combined with Boltzmann exploration) is overall the best method for the lane-free traffic scenarios examined.
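Of the exploration components listed above, Boltzmann exploration is the simplest to state: actions are drawn with probability proportional to exp(Q/T), so a high temperature T explores broadly while a low T approaches greedy selection. The following is a minimal, generic sketch of that softmax rule (the function name `boltzmann_probs` is ours, not from the thesis), not the thesis's actual implementation.

```python
import numpy as np

def boltzmann_probs(q_values, temperature):
    """Boltzmann (softmax) action probabilities: p_i ∝ exp(Q_i / T).
    Higher T spreads probability mass (more exploration); lower T
    concentrates it on the highest-Q action (more exploitation)."""
    logits = np.asarray(q_values, dtype=float) / temperature
    logits -= logits.max()           # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum()
```

In practice one would sample an action index from these probabilities (e.g. with `np.random.default_rng().choice(len(p), p=p)`) and anneal the temperature over training.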