Institutional Repository [SANDBOX]
Technical University of Crete

Implementation of a platform for the update, management and analysis of data for the «HelTh» nutrition database

Vlassopoulos Evaggelos-Stylianos


URI: http://purl.tuc.gr/dl/dias/E6D5E201-CA1E-4AED-82D3-528558FFCFF5
Year: 2025
Type of Item: Diploma Work
Bibliographic Citation: Evaggelos-Stylianos Vlassopoulos, "Implementation of a platform for the update, management and analysis of data for the «HelTh» nutrition database", Diploma Work, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2025. https://doi.org/10.26233/heallink.tuc.104081

Summary

Purpose: The study tests whether Natural Language Processing (NLP) and Machine Learning (ML) techniques can accurately predict the nutritional composition of food products (namely total fat, protein, total sugar, sodium and fiber content) using only their ingredient list as input. The approach centers on the development of an AI tool to support food labelling standardization, address public health concerns and raise consumer awareness.

Methodology: DistilBERT embeddings were used to transform the text of a food's ingredient list into a structured numerical representation within a deep-learning-based predictive framework. The experimental dataset was the USDA FoodData Central Branded Food Composition database, which provides a comprehensive representation of the food environment and of the variation in composition. Experimental regression models and Multi-Layer Perceptron (MLP) networks were trained with a variety of loss functions, epoch counts, dataset sizes and batch sizes. The experimental conditions were evaluated using validation loss, Mean Absolute Error (MAE) and R² score. Optimization was carried out using AdamW.

Results: Datasets drawn from a single food category (category-specific) yield models with better predictive accuracy, lower validation loss and better convergence than datasets that mix several food categories (generalized). The SmoothL1Loss function was associated with lower validation and training loss than the other loss functions tested, while AdamW enhanced training stability. The study further shows that more structured datasets, as opposed to unstructured ones, improve prediction accuracy and reduce noise and the risk of overfitting.

Conclusions: The results indicate that NLP-driven models can be proposed as a reliable alternative for estimating a food's nutritional composition from its ingredient list. This supports the use of scalable and cost-effective AI-based alternatives to traditional laboratory-based methods. Future research is needed to refine real-time prediction capabilities, to optimize feature selection techniques and, ultimately, to establish the usability of such techniques in regulatory environments. The study highlights the potential of machine learning and intelligent food composition prediction in the food industry as a tool to increase consumer trust and support high-quality labelling.
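
The evaluation quantities named in the summary (Smooth L1 loss, MAE and R² score) can be illustrated with a minimal pure-Python sketch. This is not the thesis code: the function names and the sample values are hypothetical, and the threshold `beta=1.0` is an assumption chosen to match the default of PyTorch's `SmoothL1Loss`.

```python
from typing import Sequence


def smooth_l1_loss(pred: Sequence[float], target: Sequence[float],
                   beta: float = 1.0) -> float:
    """Mean Smooth L1 loss: quadratic for |error| < beta, linear otherwise.

    beta=1.0 is an assumption, not a value reported in the summary.
    """
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        if diff < beta:
            total += 0.5 * diff * diff / beta  # small errors: squared penalty
        else:
            total += diff - 0.5 * beta         # large errors: linear penalty
    return total / len(pred)


def mean_absolute_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Average absolute difference between predictions and targets."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def r2_score(pred: Sequence[float], target: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(target) / len(target)
    ss_res = sum((t - p) ** 2 for p, t in zip(pred, target))
    ss_tot = sum((t - mean_t) ** 2 for t in target)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical protein values in g per 100 g, for illustration only.
    target = [10.0, 2.0, 5.0, 7.0]
    pred = [9.5, 4.0, 5.0, 7.5]
    print(smooth_l1_loss(pred, target))       # 0.4375
    print(mean_absolute_error(pred, target))  # 0.75
    print(r2_score(pred, target))
```

In this example the single large error (2.0) contributes linearly to the Smooth L1 loss rather than quadratically, which is the property that makes the loss less sensitive to outliers than a squared-error loss.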
