dc.contributor.author | Βερνίκος, Γεώργιος | el |
dc.contributor.author | Vernikos, Georgios | en |
dc.date.accessioned | 2020-12-03T07:05:45Z | |
dc.date.available | 2020-12-03T07:05:45Z | |
dc.identifier.uri | https://dspace.lib.ntua.gr/xmlui/handle/123456789/52190 | |
dc.identifier.uri | http://dx.doi.org/10.26240/heal.ntua.19888 | |
dc.description | Εθνικό Μετσόβιο Πολυτεχνείο--Μεταπτυχιακή Εργασία. Διεπιστημονικό-Διατμηματικό Πρόγραμμα Μεταπτυχιακών Σπουδών (Δ.Π.Μ.Σ.) “Επιστήμη Δεδομένων και Μηχανική Μάθηση” | el |
dc.rights | Αναφορά Δημιουργού-Μη Εμπορική Χρήση-Όχι Παράγωγα Έργα 3.0 Ελλάδα | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/gr/ | * |
dc.subject | Adversarial | en |
dc.subject | Fine-tuning | en |
dc.subject | Language models | en |
dc.subject | Transformers | en |
dc.subject | Transfer learning | en |
dc.subject | Αντιπαλικός όρος ποινής | el |
dc.subject | Προσαρμογή | el |
dc.subject | Γλωσσικά μοντέλα | el |
dc.subject | Μεταφορά μάθησης | el |
dc.subject | Εξομαλυντής | el |
dc.title | Adversarial Fine-Tuning of Pretrained Language Models | en |
dc.title | Τεχνικές Μεταφοράς Μάθησης για την Προσαρμογή Προεκπαιδευμένων Γλωσσικών Μοντέλων | el |
dc.contributor.department | Artificial Intelligence and Learning Systems Laboratory (AILS Lab) | en |
heal.type | masterThesis | |
heal.classification | Artificial Intelligence | en |
heal.classification | Machine Learning | en |
heal.classification | Deep Learning | en |
heal.classification | Τεχνητή Νοημοσύνη | el |
heal.classification | Μηχανική Μάθηση | el |
heal.classification | Βαθιά Μάθηση | el |
heal.language | el | |
heal.language | en | |
heal.access | free | |
heal.recordProvider | ntua | el |
heal.publicationDate | 2020-07-07 | |
heal.abstract | In this work, we investigate methods for effectively transferring knowledge from a pretrained model to downstream tasks. Our goal is to improve the performance of pretrained language models on natural language processing tasks. We evaluate our approach on four natural language understanding tasks: sentence acceptability, sentiment analysis, paraphrase detection and textual entailment. Pretrained language models have achieved state-of-the-art results on most natural language processing tasks, a success that lies partly in their pretraining on large unlabeled corpora. Transferring a language model to a downstream task consists of further training (fine-tuning) on the task-specific dataset. We argue that this process can erode the knowledge captured during pretraining, a phenomenon known as catastrophic forgetting. In our work, we identify the emergence of overly domain-specific features during fine-tuning as a form of catastrophic forgetting. We aim to effectively transfer the knowledge gained during the pretraining of language models to the adaptation process. To preserve most of the knowledge captured during pretraining and exploit the capabilities of pretrained language models, we introduce AFTER, short for domain Adversarial Fine-Tuning as an Effective Regularizer. We leverage unlabeled data from a domain different from that of the task-specific dataset and constrain the extent to which the model representations are allowed to differ across domains. This constraint is realized through a classifier that tries to distinguish between domains; the parameters of the pretrained language model are trained adversarially to maximize the loss of this classifier. The added domain adversarial loss has a regularizing effect on the fine-tuning process, encouraging domain-invariant representations and subsequently leading to improved performance on downstream tasks. We experiment with two top-performing pretrained language models (BERT and XLNet), although our approach is equally applicable to any language model. The proposed adversarial fine-tuning method, AFTER, outperforms standard fine-tuning on four natural language understanding tasks using two different pretrained language models. We additionally conduct an ablation study on the effect of the unlabeled corpus's domain of origin and of the similarity between the pretraining domain and the task-specific data. Our adversarial fine-tuning approach requires minimal changes to the fine-tuning process and leverages unlabeled data. | en |
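For illustration only, the following is a minimal sketch of the domain-adversarial fine-tuning idea summarized in the abstract, realized here with a gradient-reversal layer, which is one standard way to train encoder parameters to maximize a domain discriminator's loss. The encoder name, the loss weight lambd, the single-linear-layer heads, and the training-step helper are illustrative assumptions and not necessarily the configuration used in the thesis.

import torch
import torch.nn as nn
from transformers import AutoModel

class GradientReversal(torch.autograd.Function):
    # Identity in the forward pass; negates and scales gradients in the backward pass.
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class AfterModel(nn.Module):
    # Pretrained encoder with a task head and an adversarial domain discriminator (sketch).
    def __init__(self, encoder_name="bert-base-uncased", num_labels=2, lambd=0.1):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.task_head = nn.Linear(hidden, num_labels)
        self.domain_head = nn.Linear(hidden, 2)   # task domain vs. auxiliary domain
        self.lambd = lambd

    def forward(self, input_ids, attention_mask):
        pooled = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state[:, 0]
        task_logits = self.task_head(pooled)
        # The domain head receives reversed gradients, so the encoder is pushed to
        # maximize the domain classifier's loss, i.e. toward domain-invariant features.
        domain_logits = self.domain_head(GradientReversal.apply(pooled, self.lambd))
        return task_logits, domain_logits

# One hypothetical training step: a labeled task batch plus an unlabeled batch
# drawn from a different domain (only domain membership is needed for the latter).
def training_step(model, optimizer, task_batch, task_labels, aux_batch):
    ce = nn.CrossEntropyLoss()
    task_logits, dom_task = model(**task_batch)
    _, dom_aux = model(**aux_batch)
    loss = (ce(task_logits, task_labels)
            + ce(dom_task, torch.zeros(dom_task.size(0), dtype=torch.long))
            + ce(dom_aux, torch.ones(dom_aux.size(0), dtype=torch.long)))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()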
heal.advisorName | Σταφυλοπάτης, Ανδρέας-Γεώργιος | el |
heal.advisorName | Stafylopatis, Andreas-Georgios | en |
heal.committeeMemberName | Στάμου, Γιώργος | el |
heal.committeeMemberName | Σιόλας, Γεώργιος | el |
heal.academicPublisher | Εθνικό Μετσόβιο Πολυτεχνείο. Σχολή Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών | el |
heal.academicPublisherID | ntua | |
heal.numberOfPages | 126 σ. | el |
heal.fullTextAvailability | false |