Institutional Repository [SANDBOX]
Technical University of Crete

Accelerating dictionary-based sentiment analysis with GPGPUs

Theodoraki Emmanouela


URI: http://purl.tuc.gr/dl/dias/A3990ECE-1406-416A-9F4E-C5469A70F59D
Year: 2023
Type of Item: Diploma Work
Bibliographic Citation: Emmanouela Theodoraki, "Accelerating dictionary-based sentiment analysis with GPGPUs", Diploma Work, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2023. https://doi.org/10.26233/heallink.tuc.96456

Summary

Sentiment analysis is a natural language processing (NLP) technique that extracts subjective information, such as opinions and emotions, from textual data. The rapid growth of online social networks and the vast amount of content generated by their users have led the research community to devote significant effort to developing effective analysis techniques in this field. Sentiment analysis also has wide application in areas such as brand intelligence and market research, political campaigns, and spam detection. The goal of this thesis is to develop algorithms and tools that accelerate dictionary-based sentiment analysis using General Purpose Graphics Processing Units (GPGPUs) and other multi-core processors. To achieve this, we design and implement a data-parallel sentiment analysis system that extends previous work on data-parallel pattern matchers based on the Aho-Corasick algorithm and processes thousands of data blobs simultaneously. The system can analyze large feeds of data (e.g., Twitter feeds) and assign the corresponding sentiment scores to the content. We also redesign and implement sentiment analysis techniques found in popular tools, such as VADER, aiming to provide fast and accurate results. We implement the core engine of our system in C/OpenCL, which enables it to execute on a wide variety of devices, and we evaluate the system on a large corpus of Twitter feeds related to the COVID-19 pandemic. We compare our tool against state-of-the-art solutions from the literature, covering both lexicon-based and machine-learning approaches, and find that our proposal can outperform them in computational speed by orders of magnitude while providing the same accuracy.
This work provides a fast and accurate sentiment analysis tool that can execute on commodity systems without modifications, operating either as a stand-alone tool or as a library that can be embedded in other applications, allowing users to obtain sentiment analysis results in near real time.
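
To make the approach in the summary concrete, the following is a minimal, sequential Python sketch of dictionary-based scoring with an Aho-Corasick automaton. It is not the thesis implementation, which is written in C/OpenCL. The function names (`build_automaton`, `score_text`, `score_batch`), the whole-word rule, and the three-entry toy lexicon with its scores are all assumptions made for illustration. The list comprehension in `score_batch` merely stands in for the data-parallel step, where each input would be scanned independently.

```python
from collections import deque


def build_automaton(lexicon):
    """Build an Aho-Corasick automaton from a {term: score} dictionary."""
    goto = [{}]   # goto[state][char] -> next state (trie edges)
    fail = [0]    # fail[state] -> state to fall back to on a mismatch
    out = [[]]    # out[state] -> (term, score) pairs recognised at this state

    # Phase 1: insert every lexicon term into the trie.
    for term, score in lexicon.items():
        state = 0
        for ch in term:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append((term, score))

    # Phase 2: breadth-first pass that fills in the failure links.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit terms that end at the fallback state (proper suffixes).
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def score_text(text, automaton):
    """Sum the scores of all lexicon terms that occur as whole words in text."""
    goto, fail, out = automaton
    text = text.lower()
    state, total = 0, 0.0
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for term, score in out[state]:
            start = i - len(term) + 1
            # Assumed rule: count a hit only when it is not embedded in a
            # longer alphanumeric token (so "bad" does not fire in "badge").
            left_ok = start == 0 or not text[start - 1].isalnum()
            right_ok = i + 1 == len(text) or not text[i + 1].isalnum()
            if left_ok and right_ok:
                total += score
    return total


def score_batch(texts, automaton):
    """Score each text independently; a sequential stand-in for parallel work."""
    return [score_text(t, automaton) for t in texts]


# Toy lexicon with invented scores, for illustration only.
TOY_LEXICON = {"good": 2.0, "great": 3.0, "bad": -2.5}

if __name__ == "__main__":
    automaton = build_automaton(TOY_LEXICON)
    feed = ["The food was great but the service was bad", "A shiny badge"]
    print(score_batch(feed, automaton))
```

Because the automaton is built once and only read during scanning, every input can be matched against it independently, which is the property a data-parallel implementation would exploit.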
