dc.contributor.author | Αχλάτης, Στέφανος Σταμάτης | el |
dc.contributor.author | Achlatis, Stefanos Stamatis | en |
dc.date.accessioned | 2022-04-12T12:37:51Z | |
dc.date.available | 2022-04-12T12:37:51Z | |
dc.identifier.uri | https://dspace.lib.ntua.gr/xmlui/handle/123456789/55068 | |
dc.identifier.uri | http://dx.doi.org/10.26240/heal.ntua.22766 | |
dc.rights | Attribution 3.0 Greece (CC BY 3.0 GR) | *
dc.rights.uri | http://creativecommons.org/licenses/by/3.0/gr/ | * |
dc.subject | Βαθιά μάθηση | el
dc.subject | BERT | el
dc.subject | Επεξεργασία φυσικής γλώσσας | el |
dc.subject | Δομική περικοπή | el |
dc.subject | Υπόθεση τυχερού δελτίου | el
dc.subject | Deep learning | en |
dc.subject | Natural language processing | en |
dc.subject | BERT | en |
dc.subject | Lottery ticket hypothesis | en |
dc.title | Structured pruning for deep learning language models | en |
heal.type | bachelorThesis | |
heal.classification | Deep Learning | en |
heal.language | en | |
heal.access | free | |
heal.recordProvider | ntua | el |
heal.publicationDate | 2021-11-09 | |
heal.abstract | In this Diploma Thesis, we study the compression of Deep Neural Networks and, more precisely, structured pruning of Natural Language Processing models. Our conclusions fall into two main aspects:
• Studying pruning for pruning's sake: we propose a better implementation of pruning that considers both the pre-trained and the fine-tuned model.
• Studying pruning for a better understanding of the model: we observe that, through fine-tuning, the model forgets prior knowledge, and to overcome this problem we study both the pre-trained and the fine-tuned model for better pruning results.
Many studies have addressed compressing neural networks, applying techniques such as Magnitude Pruning, Structured Pruning, Quantization, Knowledge Distillation, and Neural Architecture Search. Structured Pruning, however, both compresses the network and reveals the fundamental structural components of the model. We therefore focus on Structured Pruning of BERT-based models, the dominant models in Natural Language Processing. Previous studies on Structured Pruning of BERT-based models considered only the final fine-tuned model, even though these models are produced by a fine-tuning process in which weight values are largely predetermined by the original model and are only adjusted for the end task. We therefore argue that pruning strategies should consider both the pre-trained and the final fine-tuned model, and that the head importance score should combine the importance of each head in the pre-trained and in the fine-tuned model. We examine how this idea can be implemented for BERT-base models and, to illustrate the impact of our approach, run experiments on BERT models pre-trained on different corpora and fine-tuned on datasets from different domains. We also study our approach through the Lottery Ticket Hypothesis, finding that initialization pruning masks obtained from both the pre-trained and the fine-tuned model outperform masks obtained from the fine-tuned model alone. Furthermore, we propose an improved application of the Lottery Ticket Hypothesis to structured pruning, which we name "Iterative Structural Pruning". Last but not least, we examine the generalization of our methodology across modalities by applying it to the speech model wav2vec 2.0; this matters because Transformer-based architectures achieve significant results in modalities such as speech and vision. With this thesis, we hope to open new directions for exploring pruning mechanisms, the Lottery Ticket Hypothesis, and fine-tuning techniques. | en
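The combined head-importance idea in the abstract lends itself to a short sketch. The following Python snippet is a minimal, hypothetical illustration, not the thesis implementation: it uses the L2 norm of each head's slice of the attention output projection as a cheap stand-in for the thesis's importance scores, a mixing weight `lam` (an assumption, not a value from the thesis) to combine pre-trained and fine-tuned importance, and Hugging Face transformers' built-in `prune_heads` to remove the lowest-scoring heads. The `bert-base-uncased` checkpoints are placeholders.

```python
import torch
from transformers import BertModel

def head_importance(model: BertModel) -> torch.Tensor:
    """Cheap importance proxy per (layer, head): the L2 norm of that
    head's slice of the attention output projection W_O."""
    cfg = model.config
    n_heads = cfg.num_attention_heads
    head_dim = cfg.hidden_size // n_heads
    scores = torch.zeros(cfg.num_hidden_layers, n_heads)
    for l, layer in enumerate(model.encoder.layer):
        w_o = layer.attention.output.dense.weight  # shape (hidden, hidden)
        for h in range(n_heads):
            scores[l, h] = w_o[:, h * head_dim:(h + 1) * head_dim].norm()
    return scores

pretrained = BertModel.from_pretrained("bert-base-uncased")
# Placeholder: load your own fine-tuned checkpoint here instead.
finetuned = BertModel.from_pretrained("bert-base-uncased")

lam = 0.5  # hypothetical mixing weight between the two models
combined = lam * head_importance(pretrained) + (1 - lam) * head_importance(finetuned)

# Structured pruning: remove the k globally lowest-scoring heads.
k = 12
n_heads = finetuned.config.num_attention_heads
to_prune = {}
for i in combined.flatten().argsort()[:k].tolist():
    layer, head = divmod(i, n_heads)
    to_prune.setdefault(layer, []).append(head)
finetuned.prune_heads(to_prune)
```

A full implementation would typically use data-driven importance scores (e.g., gradient-based, as in prior head-pruning work) rather than the norm-based proxy, which is used here only to keep the sketch self-contained.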
heal.advisorName | Ποταμιάνος, Αλέξανδρος | el |
heal.committeeMemberName | Ποταμιάνος, Αλέξανδρος | el |
heal.committeeMemberName | Τζαφέστας, Κωνσταντίνος | el |
heal.committeeMemberName | Κατσαμάνης, Αθανάσιος | el |
heal.academicPublisher | Εθνικό Μετσόβιο Πολυτεχνείο. Σχολή Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών. Τομέας Σημάτων, Ελέγχου και Ρομποτικής | el |
heal.academicPublisherID | ntua | |
heal.numberOfPages | 109 σ. | el |
heal.fullTextAvailability | false | |