dc.contributor.author |
Μαυρεπής, Φίλιππος
|
el |
dc.contributor.author |
Mavrepis, Philippos
|
en |
dc.date.accessioned |
2023-09-06T09:25:23Z |
|
dc.date.available |
2023-09-06T09:25:23Z |
|
dc.identifier.uri |
https://dspace.lib.ntua.gr/xmlui/handle/123456789/58027 |
|
dc.identifier.uri |
http://dx.doi.org/10.26240/heal.ntua.25724 |
|
dc.rights |
Default License |
|
dc.subject |
Knowledge Distillation |
en |
dc.subject |
Deep Learning |
en |
dc.subject |
Teacher-Student Architectures |
en |
dc.subject |
Απόσταξη Γνώσης |
el |
dc.subject |
Αρχιτεκτονικές Δασκάλου-Μαθητή |
el |
dc.subject |
Βαθιά Μάθηση |
el |
dc.title |
Exploiting feature-based and logit-based knowledge distillation for
improved teacher-student deep neural networks |
en |
dc.contributor.department |
Remote Sensing Lab (RSLab) |
el |
heal.type |
bachelorThesis |
|
heal.classification |
Computer Science |
en |
heal.language |
el |
|
heal.language |
en |
|
heal.access |
campus |
|
heal.recordProvider |
ntua |
el |
heal.publicationDate |
2023-02-26 |
|
heal.abstract |
Knowledge distillation is one of the techniques used to transfer knowledge between two or more networks. Usually, those networks are referred to as student(s) and teacher(s), with the main goal being to increase the metric of performance for the student(s) through exploitation of the knowledge from the teacher network. Previous studies have explored transferring knowledge through feature-based and/or logit-based approaches. Besides that, the utilisation of same and cross-level information between teacher and student networks has also been explored. This diploma thesis examines the combination of state of the art methods in feature- based and logit-based knowledge distillation. The techniques used are, 'Distillation via knowledge Review' also known as 'ReviewKD' and 'Decoupled Knowledge Distillation' or 'DKD'. Those method were merged to create a novel technique named 'ReviewDKD' and test its performance. In addition, we explore the effect of data augmentation techniques such as MixUp and propose a novel way to apply MixUp to teacher and student networks. We apply our method to a variety of teacher-student architectures for the problem of image classification on CIFAR-100. To this end our result show relatively promising results for specific architecture pairs with the student being able to surpass the teacher network at some cases. |
en |
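For context on the logit-based side of the abstract above: the following is a minimal sketch of the classic logit-based distillation loss (temperature-softened KL divergence, as in Hinton et al.'s original formulation) in plain Python. It is illustrative background only, not the thesis's ReviewDKD method; the function and variable names are our own.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T yields a softer distribution."""
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Logit-based distillation loss: KL(teacher || student) on
    temperature-softened distributions, scaled by T^2 so its gradient
    magnitude is comparable to that of the hard-label loss."""
    p = softmax(teacher_logits, T)  # teacher's soft targets
    q = softmax(student_logits, T)  # student's soft predictions
    return T * T * sum(pi * (math.log(pi) - math.log(qi))
                       for pi, qi in zip(p, q))

# Identical logits give zero loss; mismatched logits give a positive penalty.
print(abs(kd_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])) < 1e-9)  # True
print(kd_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]) > 0.0)        # True
```

In practice this term is combined with the standard cross-entropy on ground-truth labels; feature-based methods such as ReviewKD instead match intermediate representations between teacher and student.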
heal.advisorName |
Karantzalos, Konstantinos
|
|
heal.advisorName |
Kakogeorgiou, Ioannis
|
|
heal.committeeMemberName |
Karantzalos, Konstantinos
|
|
heal.committeeMemberName |
Stamou, Giorgos
|
|
heal.committeeMemberName |
Voulodimos, Athanasios
|
|
heal.academicPublisher |
Εθνικό Μετσόβιο Πολυτεχνείο. Σχολή Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών |
el |
heal.academicPublisherID |
ntua |
|
heal.numberOfPages |
62 |
|
heal.fullTextAvailability |
false |
|