Michail Marketakis, "An ensemble learning engine with Kafka and Kafka streams microservices", Diploma Work, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2025
https://doi.org/10.26233/heallink.tuc.103499
This thesis introduces ELaaMS (Ensemble Learning as a MicroService), an event-driven microservice architecture built on Apache Kafka and Kafka Streams that delivers robust, real-time ensemble predictions from multiple streaming machine-learning models running concurrently. Within ELaaMS, each learner runs as an independent Kafka Streams application that consumes streaming data, trains incrementally with the Massive Online Analysis (MOA) library, and publishes live predictions. A lightweight ensemble aggregator service fuses these outputs on the fly, applying majority voting for classification tasks and simple averaging for regression tasks. A built-in catalog of streaming machine-learning algorithms makes the system usable out of the box, and the system is readily extensible with new algorithms and methods. The architecture is designed to scale robustly. Horizontal scalability is achieved by leveraging Kafka's ability to distribute the workload across additional instances of the application, which increases overall processing capacity and throughput. Vertical scalability refers to the ability to scale the computation with the number of processed streams.
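The aggregator's fusion rule (majority voting for classification, simple averaging for regression) can be sketched as follows. This is a minimal, Kafka-free illustration under stated assumptions: the predictions for one input record are assumed to have already been collected from every learner into a dict keyed by a hypothetical learner ID, and ties in the vote are assumed to go to the label that appears first. The thesis's actual aggregator, message format, and tie-breaking rule may differ.

```python
from collections import Counter
from statistics import fmean


def majority_vote(labels):
    """Return the most frequent class label.

    Assumption: ties are broken in favour of the label that appears first.
    """
    if not labels:
        raise ValueError("no predictions to fuse")
    counts = Counter(labels)
    best = max(counts.values())
    for label in labels:
        if counts[label] == best:
            return label


def average(values):
    """Return the arithmetic mean of the regression outputs."""
    if not values:
        raise ValueError("no predictions to fuse")
    return fmean(values)


def fuse(task, predictions_by_learner):
    """Fuse per-learner predictions for a single input record.

    predictions_by_learner: dict mapping a (hypothetical) learner ID to its
    prediction; insertion order is preserved, which the tie-break relies on.
    """
    values = list(predictions_by_learner.values())
    if task == "classification":
        return majority_vote(values)
    if task == "regression":
        return average(values)
    raise ValueError(f"unknown task type: {task!r}")


if __name__ == "__main__":
    # Learner IDs below are illustrative placeholders, not names from ELaaMS.
    print(fuse("classification",
               {"hoeffding_tree": "spam", "naive_bayes": "ham", "knn": "spam"}))
    print(fuse("regression", {"learner_a": 2.0, "learner_b": 4.0, "learner_c": 6.0}))
```

Run as a script, this prints `spam` (two of three learners agree) and `4.0` (the mean of 2.0, 4.0 and 6.0). In a deployment, a function like `fuse` would sit inside the aggregator service and be called once all learners' predictions for a record have arrived; how ELaaMS decides that a record's predictions are complete is not specified here.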