Iliodora Seferli, "Interactive story generation via content-based filtering", Diploma Work, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2024
https://doi.org/10.26233/heallink.tuc.100926
In recent times, Artificial Intelligence (AI) has started to expand into many domains, in science as much as in the world of game development. In our thesis, we explore the use of an AI agent in the development of an interactive story generation system through the application of content-based filtering techniques. The primary goal is to design a dynamic storytelling mechanism (colloquially known as a “drama manager”, or DM) capable of predicting and generating narrative paths aligned with user preferences. Leveraging text vectorization techniques, such as Term Frequency-Inverse Document Frequency (TF-IDF) and Class Label Frequency Distance (CLFD), the system is trained on a dataset comprising book summaries and their associated genres. The study evaluates the efficacy of classification methods, such as Logistic Regression, Support Vector Machines (SVM), Multi-Layer Perceptron (MLP), Random Forest, and Naive Bayes, in enhancing the DM’s ability to comprehend and predict narrative elements within specific chapters. The use of clustering methods is also examined to determine whether including the non-labeled chapters provides better results. By examining both approaches, our thesis work identifies the most effective strategy for the DM to categorize and generate content that resonates with users’ tastes. The evaluation of our methods was conducted both in silico and with real users who interacted with the system, choosing paths in the story determined by the DM based on its classification capabilities. To this end, we also provided a graphical interface, where users can read stories and choose their own path of story development. Specifically, experimental results for the classification methods indicate that Logistic Regression is the fastest and most effective method for accurately recognizing each path label.
We tested our system with two datasets of book summaries containing different numbers of documents, examining how the results differ between small and large datasets. Additionally, we show experimentally that the CLFD approach is the better text vectorization method for genres that appear to be multi-labeled. We also collected user evaluations of the system's path recommendations and of how well the framework works for them overall. In summary, our work contributes to the field of interactive storytelling by providing insights into the application of advanced text vectorization and machine learning techniques for narrative generation. It highlights the importance of understanding user preferences and offers a framework for developing intelligent DMs capable of delivering customized story paths.
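As a rough illustration of the content-based filtering idea described above, the following minimal sketch vectorizes text with TF-IDF and ranks candidate story paths by cosine similarity to a profile built from chapters the reader has already chosen. It is not the thesis implementation: it substitutes a simple nearest-profile ranking for the trained classifiers (Logistic Regression, SVM, and so on) and does not cover CLFD. The toy summaries, the function names (`tokenize`, `fit_idf`, `tfidf`, `cosine`, `recommend_path`), and the length-based token filter are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase, split on non-letters, and drop short tokens
    # (a crude stand-in for stop-word removal; an assumption of this sketch).
    cleaned = "".join(c if c.isalpha() else " " for c in text.lower())
    return [w for w in cleaned.split() if len(w) > 3]


def fit_idf(documents):
    # Smoothed inverse document frequency computed over the training corpus.
    n = len(documents)
    df = Counter(term for doc in documents for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0
            for term, count in df.items()}


def tfidf(text, idf):
    # Sparse TF-IDF vector as a dict; terms unseen at fit time are ignored.
    counts = Counter(t for t in tokenize(text) if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


def cosine(u, v):
    # Cosine similarity between two sparse vectors; 0.0 if either is empty.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend_path(liked_chapters, candidate_paths, idf):
    # Build the user profile as the sum of TF-IDF vectors of liked chapters,
    # then return the index of the most similar candidate and all scores.
    profile = {}
    for text in liked_chapters:
        for term, weight in tfidf(text, idf).items():
            profile[term] = profile.get(term, 0.0) + weight
    scores = [cosine(profile, tfidf(p, idf)) for p in candidate_paths]
    best = max(range(len(candidate_paths)), key=scores.__getitem__)
    return best, scores


# Hypothetical toy corpus standing in for the book-summary dataset.
summaries = [
    "A wizard casts a spell to defeat the dragon",
    "The dragon guards a magic sword in the castle",
    "An elf and a wizard seek the enchanted castle",
    "A spaceship crew explores a distant planet",
    "The robot pilots the spaceship through the galaxy",
    "Scientists on a planet build a robot",
]
idf = fit_idf(summaries)

liked = ["The wizard found a magic sword inside the castle"]
candidates = [
    "A dragon attacks the castle and the wizard fights back",
    "The robot steers the spaceship toward a distant planet",
]
best, scores = recommend_path(liked, candidates, idf)
print("recommended path:", candidates[best])
```

In this toy setting the fantasy-flavoured path is ranked first because it shares vocabulary with the chapter the reader liked, while the science-fiction path shares none. A classifier-based drama manager, as evaluated in the thesis, would instead predict a genre label for each candidate chapter and match that label against the user's preferences.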