Institutional Repository [SANDBOX]
Technical University of Crete

Time-series analysis using machine learning methods

Paraskakis Nikolaos


URI: http://purl.tuc.gr/dl/dias/9ACB7095-6536-4C36-BDC0-538E8F77DF88
Year 2023
Type of Item Diploma Work
Bibliographic Citation Nikolaos Paraskakis, "Time-series analysis using machine learning methods", Diploma Work, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2023. https://doi.org/10.26233/heallink.tuc.97531

Summary

This diploma thesis explores the application of machine learning techniques to time-series analysis, focusing on the dataset of yearly sunspot numbers. It begins by presenting fundamental concepts in time-series analysis: stochastic processes, correlation, stationarity, heteroscedasticity, and time-series decomposition methods. The thesis then covers key aspects of time-series forecasting, including dataset splitting, cross-validation, evaluation metrics, and forecasting strategies, with emphasis on both one-step and multi-step forecasting.

A key focus of this research is the examination of non-linear data transformations and their role in improving predictive performance by giving the transformed dataset desirable properties, such as normality and stationarity. The study also investigates three machine learning methods in the context of time-series forecasting: Gaussian Processes (GPs), Gradient Boosting Decision Trees (GBDT), and Long Short-Term Memory (LSTM) neural networks. A comparative analysis of the strengths and weaknesses of each method is presented.

The thesis contains a case study on the analysis and forecasting of the yearly number of sunspots. First, we use GPs, which constitute a probabilistic non-parametric regression framework. We use a constant mean function and a covariance kernel formed as the product of an exponential kernel and a periodic kernel, assuming independent and identically distributed Gaussian noise and a Gaussian likelihood for the data. To satisfy these assumptions, we apply the kappa-logarithmic transformation (Kaniadakis G., 2009), which accounts for the skewness, heteroscedasticity, and non-negativity of the sunspot data. We then train the model on the transformed data and optimize its hyperparameters using maximum likelihood estimation (MLE).
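The transformation and kernel described above can be illustrated with a minimal sketch. It assumes the standard Kaniadakis definition of the kappa-logarithm, ln_κ(x) = (x^κ − x^(−κ)) / (2κ), and one common parameterization of an exponential kernel multiplied by a periodic kernel. The function names, parameter names, and numeric values are hypothetical and are not taken from the thesis. The kappa-logarithm requires strictly positive inputs, and how the thesis treats zero counts is not stated in this summary.

```python
import math


def kappa_log(x: float, kappa: float) -> float:
    """Kaniadakis kappa-logarithm: (x**kappa - x**(-kappa)) / (2 * kappa)."""
    if x <= 0:
        raise ValueError("kappa_log is only defined for x > 0")
    if kappa == 0:
        # The kappa -> 0 limit recovers the natural logarithm.
        return math.log(x)
    return (x ** kappa - x ** (-kappa)) / (2 * kappa)


def kappa_exp(y: float, kappa: float) -> float:
    """Inverse of kappa_log: (sqrt(1 + kappa^2 y^2) + kappa * y) ** (1 / kappa)."""
    if kappa == 0:
        return math.exp(y)
    return (math.sqrt(1 + (kappa * y) ** 2) + kappa * y) ** (1 / kappa)


def exp_periodic_kernel(
    t1: float,
    t2: float,
    variance: float,
    decay_length: float,
    period: float,
    periodic_length: float,
) -> float:
    """Product of an exponential kernel and a periodic kernel.

    Assumed form (not confirmed by the source):
        variance * exp(-|tau| / decay_length)
                 * exp(-2 * sin(pi * tau / period)**2 / periodic_length**2)
    """
    tau = abs(t1 - t2)
    exponential = math.exp(-tau / decay_length)
    periodic = math.exp(
        -2 * math.sin(math.pi * tau / period) ** 2 / periodic_length ** 2
    )
    return variance * exponential * periodic


# Illustrative values only: transform a count, then map it back.
transformed = kappa_log(50.0, kappa=0.3)
recovered = kappa_exp(transformed, kappa=0.3)
```

In a GP workflow of this kind, the model would be fitted to the transformed values and its predictions mapped back to the original scale with the inverse transformation.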
Next, we use LightGBM (Light Gradient Boosting Machine), a gradient-boosting framework of regression trees that is well known for its efficiency and accuracy. Its hyperparameters are tuned using Bayesian optimization with the goal of minimizing the validation loss. Finally, we implement a multi-layer LSTM model for forecasting the yearly number of sunspots and optimize its hyperparameters using grid search, again with the objective of minimizing the validation loss. LSTM is a special form of recurrent neural network (RNN), a deep learning architecture capable of capturing long-term dependencies and complex patterns. It consists of four gates (input, forget, candidate, and output) that control the flow of information.

GP regression excels in interpretability, delivers uncertainty estimates along with point estimates, and can capture complex patterns using different kernels. However, it requires inverting the covariance matrix, which becomes computationally intensive for large datasets. LSTM performs well in capturing long-term dependencies, but it needs large amounts of data, time, and resources for tuning and training, and it suffers from error accumulation in long-term predictions. LightGBM can also capture complex patterns and is more computationally efficient, which makes its training faster.

In conclusion, this thesis provides insights into the performance and characteristics of three powerful machine learning methods, which produce competitive predictions of the yearly number of sunspots. Together, our findings mark a significant stride in the application of advanced machine learning techniques to forecasting and analyzing time-series data across diverse disciplines.
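The multi-step forecasting and error accumulation mentioned in the summary can be illustrated with a minimal sketch of the recursive strategy, in which a one-step model is applied repeatedly and each prediction is fed back as an input. This is one common multi-step strategy and is not necessarily the one used in the thesis. The helper names and the toy one-step model (an exact recurrence for a sinusoid with an illustrative period of 11 steps) are assumptions that stand in for a trained model such as LightGBM or an LSTM.

```python
import math
from typing import Callable, List, Sequence, Tuple


def make_lag_windows(
    series: Sequence[float], n_lags: int
) -> Tuple[List[List[float]], List[float]]:
    """Turn a series into (lag window, next value) pairs for one-step training."""
    inputs, targets = [], []
    for t in range(n_lags, len(series)):
        inputs.append(list(series[t - n_lags:t]))
        targets.append(series[t])
    return inputs, targets


def recursive_forecast(
    history: Sequence[float],
    one_step_model: Callable[[List[float]], float],
    n_lags: int,
    horizon: int,
) -> List[float]:
    """Forecast `horizon` steps by feeding each prediction back as an input."""
    if len(history) < n_lags:
        raise ValueError("history must contain at least n_lags values")
    window = list(history[-n_lags:])
    forecasts = []
    for _ in range(horizon):
        next_value = one_step_model(window)
        forecasts.append(next_value)
        # Slide the window: drop the oldest lag, append the new prediction.
        # Any error in next_value is carried into all later steps.
        window = window[1:] + [next_value]
    return forecasts


# Hypothetical stand-in for a trained one-step model: for a pure sinusoid,
# x[t] = 2 * cos(omega) * x[t-1] - x[t-2] holds exactly.
PERIOD = 11.0
OMEGA = 2 * math.pi / PERIOD


def toy_one_step_model(lags: List[float]) -> float:
    return 2 * math.cos(OMEGA) * lags[-1] - lags[-2]


series = [math.sin(OMEGA * t) for t in range(40)]
lag_inputs, lag_targets = make_lag_windows(series, n_lags=2)
forecast = recursive_forecast(series, toy_one_step_model, n_lags=2, horizon=5)
```

Because the toy model is exact for this synthetic series, its recursive forecasts track the true continuation. With an imperfect learned model, the error made at each step enters the next input window, which is the accumulation effect noted for long-term predictions.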
