The work titled "Implementation of a platform for the update, management and analysis of data for the «HelTh» nutrition database" by Vlassopoulos Evaggelos-Stylianos is licensed under Creative Commons Attribution 4.0 International.
Bibliographic Citation
Evaggelos-Stylianos Vlassopoulos, "Implementation of a platform for the update, management and analysis of data for the «HelTh» nutrition database", Diploma Work, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2025
https://doi.org/10.26233/heallink.tuc.104081
Purpose: The study aims to test whether Natural Language Processing (NLP) and Machine Learning techniques can accurately predict the nutritional composition of food products, namely total fat, protein, total sugar, sodium and fiber content, using their ingredient list as input. The approach centers on developing an AI tool to support food labelling standardization, address public health concerns and raise consumer awareness.

Methodology: Within a deep-learning-based predictive framework, DistilBERT embeddings were used to transform the text of each food's ingredient list into a structured numerical representation. The experimental dataset was the USDA FoodData Central Branded Food Composition database, which provides a comprehensive representation of the food environment and of the variation in composition. Experimental regression models and Multi-Layer Perceptron (MLP) networks were trained with a range of loss functions, numbers of epochs, dataset sizes and batch sizes. The experimental conditions were evaluated using validation loss, Mean Absolute Error (MAE) and the R² score. Optimization was performed with AdamW.

Results: Findings indicate that models trained on data from a single food category (category-specific datasets) achieve better predictive accuracy, lower validation loss and better convergence than models trained on data spanning multiple food categories (generalized datasets). The SmoothL1Loss function was associated with lower validation and training loss than the other loss functions tested, while AdamW improved training stability. The study further indicates that more structured datasets, as opposed to unstructured ones, improve prediction accuracy and reduce noise and the risk of overfitting.

Conclusions: The results indicate that NLP-driven models can be proposed as a reliable alternative for estimating or predicting a food's nutritional composition from its ingredient list.
This supports the adoption of scalable, cost-effective AI-based alternatives to traditional laboratory-based methods. Future research should address the refinement of real-time prediction capabilities, the optimization of feature selection techniques and, ultimately, the usability of such techniques in regulatory environments. The study highlights the potential of machine learning for intelligent food composition prediction in the food industry, as a tool to increase consumer trust and support high-quality labelling.
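The abstract names Smooth L1 loss for training and MAE and R² for evaluation. The functions below are a minimal, dependency-free sketch of those three quantities for a batch of predicted versus reference nutrient values. They are not the thesis's code. The Smooth L1 form follows the piecewise definition documented for PyTorch's `torch.nn.SmoothL1Loss` with mean reduction. The helper names (`smooth_l1_loss`, `mean_absolute_error`, `r2_score`, `_check`) and the example values are hypothetical.

```python
"""Minimal sketch: Smooth L1 training loss and MAE / R^2 evaluation metrics.

Hypothetical helper names; pure Python so it runs without PyTorch.
"""
from typing import Sequence


def _check(pred: Sequence[float], target: Sequence[float]) -> None:
    # Both sequences must be non-empty and aligned element by element.
    if not pred or len(pred) != len(target):
        raise ValueError("pred and target must be non-empty and of equal length")


def smooth_l1_loss(pred: Sequence[float], target: Sequence[float], beta: float = 1.0) -> float:
    """Mean Smooth L1 loss: quadratic for |error| < beta, linear otherwise
    (same piecewise form as torch.nn.SmoothL1Loss with reduction='mean').
    With beta = 0 it reduces to the mean absolute (L1) error."""
    _check(pred, target)
    total = 0.0
    for p, t in zip(pred, target):
        err = abs(p - t)
        if err < beta:
            total += 0.5 * err * err / beta
        else:
            total += err - 0.5 * beta
    return total / len(pred)


def mean_absolute_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Average absolute difference between predictions and reference values."""
    _check(pred, target)
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def r2_score(pred: Sequence[float], target: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    _check(pred, target)
    mean_t = sum(target) / len(target)
    ss_res = sum((t - p) ** 2 for p, t in zip(pred, target))
    ss_tot = sum((t - mean_t) ** 2 for t in target)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all target values are equal")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical protein values (g per 100 g) for four products: reference vs. model output.
    target = [10.0, 2.5, 0.0, 25.0]
    pred = [9.0, 3.0, 0.5, 22.0]
    print("SmoothL1:", smooth_l1_loss(pred, target))
    print("MAE:", mean_absolute_error(pred, target))
    print("R2:", round(r2_score(pred, target), 4))
```

Smooth L1 grows quadratically for small residuals and only linearly for large ones. As a result, a few products with unusually high nutrient values pull on the loss less than they would under squared error. This offers one possible explanation, stated here as an assumption, for the lower losses reported with SmoothL1Loss.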