Institutional Repository [SANDBOX]
Technical University of Crete
EN  |  EL

Search

Browse

My Space

An engine for efficient data stream summarization using Kafka and Kafka streams microservices

Kalfakis Georgios-Panagiotis

Full record


URI: http://purl.tuc.gr/dl/dias/B8BFB68A-9186-4131-A568-E0D61948D48F
Year 2024
Type of Item Diploma Work
License
Details
Bibliographic Citation Georgios-Panagiotis Kalfakis, "An engine for efficient data stream summarization using Kafka and Kafka streams microservices", Diploma Work, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2024 https://doi.org/10.26233/heallink.tuc.99743
Appears in Collections

Summary

In this work, we introduce a novel stream summary maintenance paradigm in the form of distributed microservices, namely Synopses as a MicroService, and we implement this paradigm on top of Apache Kafka and Kafka Streams Microservices. SaaMS is designed for real-time stream summarization and analysis over rapid data streams. SaaMS also contains a built-in library with Synopses that is used for producing stream summaries but remains extensible and customizable to new Synopses techniques. In that, (a) it contributes an innovative architecture to gain scalability dynamically based on the necessary computation requirements, (b) maintains a large volume of Synopses, concurrently with high throughput and fault-tolerance, (c) provides an extensible Synopsis library for real-time analysis (d) experimental evaluation provided using real financial data. SaaMS manages large-scale stream processing and analysis because it enables (i) horizontal scalability, i.e., taking advantage of complicated mechanisms that Kafka has for distributing the workload, achieving maximum throughput and minimum latency (ii) vertical scalability, i.e., the ability to scale the computation with the number of processed streams (iii) federated scalability, i.e., data can be processed across multiple distributed environments even in case they are geographically dispersed.

Available Files

Services

Statistics