Self-Attention Based Generative Adversarial Networks for Unsupervised Video Summarization

Μηναΐδη, Μαρία Νεκταρία; Minaidi, Maria Nektaria

dc.contributor.author	Μηναΐδη, Μαρία Νεκταρία	el
dc.contributor.author	Minaidi, Maria Nektaria	en
dc.date.accessioned	2023-05-17T06:43:21Z
dc.date.available	2023-05-17T06:43:21Z
dc.identifier.uri	https://dspace.lib.ntua.gr/xmlui/handle/123456789/57713
dc.identifier.uri	http://dx.doi.org/10.26240/heal.ntua.25410
dc.description	National Technical University of Athens--Master's Thesis. Interdisciplinary-Interdepartmental Postgraduate Studies Program (Δ.Π.Μ.Σ.)	en
dc.rights	Default License
dc.subject	Βαθιά Μάθηση	el
dc.subject	Αυτόματη Περίληψη Βίντεο	el
dc.subject	Δίκτυα Μακράς και Βραχείας Μνήμης	el
dc.subject	Παραγωγικά Ανταγωνιστικά Δίκτυα	el
dc.subject	Μηχανισμός Προσοχής	el
dc.subject	Deep Learning	en
dc.subject	Automatic Video Summarization	en
dc.subject	Long Short-Term Memory Networks	en
dc.subject	Attention Mechanism	en
dc.subject	Generative Adversarial Networks	en
dc.title	Self-Attention Based Generative Adversarial Networks for Unsupervised Video Summarization	en
dc.title	Παραγωγικά Ανταγωνιστικά Δίκτυα με βάση την Αυτό-Προσοχή για την Μη-Επιβλεπόμενη Περίληψη Βίντεο	el
dc.contributor.department	Speech and Language Processing Group	en
heal.type	masterThesis
heal.classification	Deep Machine Learning	en
heal.classification	Deep Learning	en
heal.language	el
heal.language	en
heal.access	free
heal.recordProvider	ntua	el
heal.publicationDate	2022-11-08
heal.abstract	In this thesis we address video summarization using unsupervised learning and attention networks. The amount of data generated daily is growing at an exponential rate. Given this growth, users increasingly need to select, browse, and consume extensive collections of videos, and to store large amounts of data efficiently. To meet these needs, automatic video summarization, which aims to provide a short visual summary of an original, full-length video, has become an active research topic. With the recent development of neural networks, many video summarization architectures based on deep neural networks have been proposed in recent years. In this work, we treat video summarization as the problem of selecting the most characteristic key-shots (sequences of consecutive frames), and we use deep learning techniques and generative adversarial networks to build a model that efficiently summarizes the input videos. Each frame's visual content is represented as a feature vector. First, motivated by the disadvantages of Long Short-Term Memory networks and the advantages of attention mechanisms, we build our model by extending a simple generative adversarial network with attention mechanisms in different parts of the architecture. Then, through a set of experiments on the resulting models that acts as an ablation study, we determine how much incorporating attention and improving the temporal modeling of the frames contribute to key-shot selection and to the efficiency of our architecture. Finally, we evaluate these models on two popular datasets, which consist of short videos and have been used extensively to train and evaluate video summarization models.
Additionally, drawing on one more database, we create a further dataset of longer videos, on which we also evaluate our models. The generalizability of our model and the use of attention mechanisms prove effective in each case: the results show that using self-attention as the frame selection mechanism outperforms state-of-the-art approaches on SumMe and TVSum.	en
heal.advisorName	Ποταμιάνος, Αλέξανδρος	el
heal.committeeMemberName	Τζαφέστας, Κωνσταντίνος	el
heal.committeeMemberName	Σιόλας, Γεώργιος	el
heal.academicPublisher	National Technical University of Athens. School of Electrical and Computer Engineering	en
heal.academicPublisherID	ntua
heal.numberOfPages	98 p.	en
heal.fullTextAvailability	false