Data augmentation methods for Vision Transformers

Georgakilas Christos

URI	http://purl.tuc.gr/dl/dias/E7D90BFF-93A5-4DC0-A9D9-92B95EB1FE51	-
Identifier	https://doi.org/10.26233/heallink.tuc.93681	-
Language	en	-
Size	2.4 megabytes	en
Size	40 pages	en
Title	Data augmentation methods for Vision Transformers	en
Title	Μέθοδοι επαύξησης δεδομένων για νευρωνικά δίκτυα Vision Transformer	el
Creator	Georgakilas Christos	en
Creator	Γεωργακιλας Χριστος	el
Contributor [Thesis Supervisor]	Zervakis Michail	en
Contributor [Thesis Supervisor]	Ζερβακης Μιχαηλ	el
Contributor [Committee Member]	Lagoudakis Michail	en
Contributor [Committee Member]	Λαγουδακης Μιχαηλ	el
Contributor [Committee Member]	Κομοντάκης Νίκος	el
Contributor [Committee Member]	Komodakis Nikos	en
Publisher	Πολυτεχνείο Κρήτης	el
Publisher	Technical University of Crete	en
Academic Unit	Technical University of Crete::School of Electrical and Computer Engineering	en
Academic Unit	Πολυτεχνείο Κρήτης::Σχολή Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών	el
Abstract	The Transformer architecture was introduced in 2017 and has since become the standard for Natural Language Processing tasks, replacing Recurrent Neural Networks. In 2021, the Transformer architecture was first applied with great success to computer vision tasks, showing that a Vision Transformer can, under certain conditions, outperform Convolutional Neural Networks and reach state-of-the-art performance in image recognition. One of the main challenges tackled by subsequent work on Vision Transformers is the architecture's need for very large amounts of data during pre-training in order to achieve state-of-the-art accuracy on the downstream task. Some works address this by altering or adding parts to the original Vision Transformer architecture, while others use Self-Supervised Learning techniques to take advantage of unlabeled data. This thesis explores data augmentation methods for Vision Transformers, with the goal of increasing the model's accuracy and robustness on classification tasks when only limited amounts of data are available. Our augmentation methods build on the architecture's characteristics, such as the self-attention mechanism and the input of discrete tokens. All methods are evaluated on the benchmark classification datasets CIFAR-10 and CIFAR-100 using Supervised Learning and yield strong results: when training with the same model hyperparameters, our best augmentation method improves the baseline's accuracy on CIFAR-10 and CIFAR-100 by 1.98% and 2.71%, respectively.	en
Type	Διπλωματική Εργασία	el
Type	Diploma Work	en
License	http://creativecommons.org/licenses/by/4.0/	en
Date	2022-10-17	-
Date of Publication	2022	-
Subject	Vision Transformers	en
Bibliographic Citation	Christos Georgakilas, "Data augmentation methods for Vision Transformers", Diploma Work, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2022	en
Bibliographic Citation	Χρίστος Γεωργακίλας, "Μέθοδοι επαύξησης δεδομένων για νευρωνικά δίκτυα Vision Transformer", Διπλωματική Εργασία, Σχολή Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών, Πολυτεχνείο Κρήτης, Χανιά, Ελλάς, 2022	el
