The work titled "Real-time video processing in an Apache FLINK environment in the Cloud" by Kastrinakis Dimitrios is licensed under the Creative Commons Attribution 4.0 International license.
Bibliographic Citation
Dimitrios Kastrinakis, "Real-time video processing in an Apache FLINK environment in the Cloud", Diploma Thesis, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2022
https://doi.org/10.26233/heallink.tuc.92728
In this work, we present a distributed stream-processing system built with Apache Flink on top of Kubernetes and fed by the high-speed streaming platform Apache Kafka. The system processes high-resolution raw videos in order to detect camera shot changes. This work has two main goals. First, we want the system to be highly scalable and able to utilize multiple processing nodes efficiently. Second, we want it to sustain a high input throughput of videos and process them as close to real time as possible. Flink works by applying a pipeline of operators that transform a video stream into meaningful data (i.e., shots in our case). These operators can easily be replicated, which allows them to run in parallel across Flink-managed nodes. To allow fully distributed and scalable processing of video files, we propose partitioning each video frame into smaller blocks, which can then be processed separately and in parallel by the distributed operators. The blocks of each frame are first distributed evenly across multiple Kafka topic partitions, and all Flink nodes then read from those partitions in parallel. For shot change detection, a first operator calculates the intensity histogram of each block. A second operator then assembles the histograms of a frame’s blocks into the full histogram of that frame. A third operator receives the full histograms of adjacent frames and calculates their difference. A final operator receives these histogram differences as a stream. If a difference exceeds a predefined threshold, a camera cut (an abrupt shot change) is reported. Otherwise, the operator looks for gradual fades across multiple consecutive frames. We deployed our Flink application on a Flink cluster running on top of a Kubernetes cluster on the Google Cloud Platform, using Flink’s native Kubernetes support and up to 8 Flink nodes.
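The per-block histogram pipeline described above can be illustrated with a minimal, single-process Python sketch. It is not the thesis's Flink implementation: the function names, the 256-bin histogram, and the normalised L1 difference are all assumptions chosen for illustration, since the abstract does not specify the bin count or the difference metric. Fade detection is omitted, and Kafka and Flink are replaced by plain function calls.

```python
from typing import List, Optional, Sequence

# A frame (or block) is a list of rows of 8-bit intensity values (0..255).
Frame = List[List[int]]

BINS = 256  # assumption: one bin per 8-bit intensity level


def split_into_blocks(frame: Frame, block_rows: int, block_cols: int) -> List[Frame]:
    """Partition a frame into rectangular blocks (edge blocks may be smaller)."""
    blocks = []
    for r in range(0, len(frame), block_rows):
        for c in range(0, len(frame[0]), block_cols):
            blocks.append([row[c:c + block_cols] for row in frame[r:r + block_rows]])
    return blocks


def block_histogram(block: Frame) -> List[int]:
    """First operator: intensity histogram of a single block."""
    hist = [0] * BINS
    for row in block:
        for value in row:
            hist[value] += 1
    return hist


def merge_histograms(hists: Sequence[List[int]]) -> List[int]:
    """Second operator: sum the block histograms into the frame histogram."""
    return [sum(bin_counts) for bin_counts in zip(*hists)]


def histogram_difference(a: List[int], b: List[int]) -> float:
    """Third operator: normalised L1 distance in [0, 1].

    Assumes both histograms cover the same number of pixels.
    The metric itself is an assumption, not taken from the thesis.
    """
    total = sum(a)
    return sum(abs(x - y) for x, y in zip(a, b)) / (2 * total)


def detect_cuts(frames: Sequence[Frame], threshold: float,
                block_rows: int, block_cols: int) -> List[int]:
    """Final operator: indices of frames that start a new shot (cuts only)."""
    cuts: List[int] = []
    previous: Optional[List[int]] = None
    for index, frame in enumerate(frames):
        blocks = split_into_blocks(frame, block_rows, block_cols)
        full = merge_histograms([block_histogram(b) for b in blocks])
        if previous is not None and histogram_difference(previous, full) > threshold:
            cuts.append(index)
        previous = full
    return cuts
```

Because histograms are additive, summing the block histograms yields exactly the histogram of the whole frame, which is what makes the per-block work independent and therefore easy to parallelise. In a distributed setting each `block_histogram` call would run on a different worker, with the merge step keyed by frame.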
To determine the scalability of our system, we compare its performance against that of a non-distributed system. The experiments produced excellent speedup results. A significant improvement was observed at all tested video resolutions. The highest speedup, however, was observed with the highest-resolution videos, where the system ran up to 7 times faster than the non-distributed system.