Stavros Lyronis, "Vulnerable code generation by Large Language Models", Master Thesis, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2024
https://doi.org/10.26233/heallink.tuc.101189
The vast amount of publicly available data and the improvements in the field of neural networks, combined with the ever-growing computational power of GPUs, led to the creation of a new paradigm of natural language processing models, known as Large Language Models (LLMs). Despite the short period of LLMs' availability, platforms such as ChatGPT and Bard have demonstrated impressive usability and effectiveness in handling text-based tasks, such as passing exams in various domains, writing essays, songs, and poems, orchestrating marketing campaigns, and more. Automated generation of source code based on user queries is one of these tasks, and a large number of developers have already adopted LLMs, in some form, into their working environments. In this thesis we demonstrate the risks related to the trustworthiness of LLMs in terms of source code generation. More specifically, we identify common and well-known vulnerabilities in the source code generated by the most popular LLMs (ChatGPT, Bard, and Copilot). Towards this goal, we deploy a series of interactive, role-playing experiments in which LLMs are asked to generate source code for naive developers who completely trust the model's output. To evaluate the security of the generated content, we explore different scenarios in which we inform the model of our security concerns. Our analysis reveals that, by default, current LLMs disregard security concerns and provide vulnerable source code in most cases. Additionally, we achieve secure code generation only when we specifically ask the LLMs for a secure solution via query manipulation. Finally, we argue that the use of LLMs for critical cybersecurity tasks should be avoided; however, they can be used for educational purposes, or with proper query manipulation that enforces a secure implementation.
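To make the abstract's claim concrete, the following sketch (illustrative only, not drawn from the thesis's experiments) contrasts the kind of well-known vulnerability often found in generated code, SQL injection via string interpolation (CWE-89), with the parameterized alternative that a security-aware prompt would elicit:

```python
import sqlite3

def vulnerable_lookup(conn, username):
    # Builds the query by string interpolation: attacker-controlled input
    # becomes part of the SQL statement itself (CWE-89).
    query = f"SELECT id FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()

def safe_lookup(conn, username):
    # Parameterized query: the driver treats the input strictly as data.
    return conn.execute(
        "SELECT id FROM users WHERE name = ?", (username,)
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

payload = "x' OR '1'='1"  # classic injection payload
print(len(vulnerable_lookup(conn, payload)))  # the WHERE clause is bypassed: 2 rows
print(len(safe_lookup(conn, payload)))        # payload matches no name: 0 rows
```

The vulnerable variant returns every row because the payload rewrites the WHERE clause, while the parameterized variant matches nothing; this is the style of default-insecure output the thesis reports from the tested models.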