Institutional Repository [SANDBOX]
Technical University of Crete
Efficient forecasting of multiple concurrent cancer simulations with Apache Flink

Katara Sotiria-Maria


URI: http://purl.tuc.gr/dl/dias/7C1B37F2-DF62-4993-983C-015E7159911E
Year: 2021
Type of Item: Diploma Work
Bibliographic Citation: Sotiria-Maria Katara, "Efficient forecasting of multiple concurrent cancer simulations with Apache Flink", Diploma Work, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2021. https://doi.org/10.26233/heallink.tuc.90051

Summary

The rapid growth of computer systems, both fixed and mobile, together with the growing penetration of wireless and wired networks, has resulted in the creation of very large volumes of data on a daily basis. Studying this data allows scientists to identify trends and patterns that can be used for future benefit. A very important field of application for such studies is Bioinformatics, and specifically the prediction of the behaviour of heterogeneous multicellular systems, which makes timely decision making possible. The aim of this diploma thesis is to identify the similar time points of a set of concurrent cancer cell simulations, in order to extract appropriate information that will be used to predict their behaviour. Achieving this goal faces two very important challenges: the high dimensionality of the data, and the time-consuming and memory-costly comparison of all one thousand four hundred simulations over time. Both call for a single algorithm that addresses them together. The Random Hyperplane Projection form of the Locality Sensitive Hashing algorithm can solve both challenges: it reduces the data to a smaller representation while preserving their diversity, and at the same time it groups similar objects into the same groups with high probability, through the use of appropriate hash functions. The scalability of the technique is also very important, so that results can still be produced in a timely manner as the volume of incoming data increases. This, in combination with the need to reduce space complexity, leads to the development of the algorithm in a Synopses Data Engine, which is built on Apache Flink and aims to support a wide variety of synopses and to add new ones at runtime, in a parallel and distributed way, thus providing synopsis-as-a-service functionality.
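As an illustration of the Random Hyperplane Projection idea described above, the following is a minimal single-machine sketch in Python. It is not the thesis implementation, which runs inside a Synopses Data Engine on Apache Flink. The function names (`make_hyperplanes`, `signature`, `bucketize`), the number of bits, and the toy vectors are all assumptions made for the example.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplane normals with Gaussian components."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vector, hyperplanes):
    """Map a vector to a bit tuple: one bit per hyperplane, set by the side it falls on."""
    bits = []
    for plane in hyperplanes:
        dot = sum(p * x for p, x in zip(plane, vector))
        bits.append(1 if dot >= 0 else 0)
    return tuple(bits)


def bucketize(vectors, hyperplanes):
    """Group vector ids by signature; vectors at a small angle tend to share a bucket."""
    buckets = {}
    for key, vec in vectors.items():
        buckets.setdefault(signature(vec, hyperplanes), []).append(key)
    return buckets


# Hypothetical toy data: each entry stands for one time point of one simulation.
planes = make_hyperplanes(dim=4, n_bits=8, seed=42)
points = {
    "sim1_t0": [0.5, -1.0, 2.0, 0.25],
    "sim2_t0": [1.0, -2.0, 4.0, 0.5],   # same direction as sim1_t0
    "sim3_t0": [-0.5, 1.0, -2.0, -0.25],
}
for sig, ids in bucketize(points, planes).items():
    print(sig, ids)
```

Each high-dimensional vector is reduced to `n_bits` bits, and only vectors that land in the same bucket need to be compared in full. More bits give finer buckets at the cost of more missed neighbours, so the bit count is a tuning choice.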
The execution of the algorithm is followed by the development of a mathematical forecasting model, based on multiple linear regression, in order to predict the behaviour of elements of the multicellular system. The performance of the system was tested both locally and in a remote, distributed setting, yielding positive results.
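The multiple linear regression step can be sketched as an ordinary least squares fit. The example below is a minimal illustration that solves the normal equations in plain Python. The features, targets, and function names (`fit_linear_regression`, `predict`) are hypothetical, since the summary does not state which variables the thesis model uses.

```python
def fit_linear_regression(rows, targets):
    """Least squares fit of y = b0 + b1*x1 + ... + bk*xk via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend 1.0 for the intercept term
    n = len(X[0])
    # Build (X^T X) and (X^T y).
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)] for i in range(n)]
    b = [sum(x[i] * y for x, y in zip(X, targets)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    beta = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(A[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (b[i] - tail) / A[i][i]
    return beta


def predict(beta, row):
    """Evaluate the fitted model on one feature row."""
    return beta[0] + sum(c * x for c, x in zip(beta[1:], row))


# Hypothetical data generated from y = 1 + 2*x1 + 3*x2.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
targets = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
beta = fit_linear_regression(rows, targets)
print(beta, predict(beta, (3, 2)))
```

In this reading, the feature rows would come from the similar time points found by the hashing step, and the target would be the quantity to forecast. That pairing is an assumption for illustration only.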
