Attention-based story visualization

Tsakas, Nikolaos; Τσάκας, Νικόλαος

dc.contributor.author	Tsakas, Nikolaos	en
dc.contributor.author	Τσάκας, Νικόλαος	el
dc.date.accessioned	2022-09-05T08:27:53Z
dc.date.available	2022-09-05T08:27:53Z
dc.identifier.uri	https://dspace.lib.ntua.gr/xmlui/handle/123456789/55589
dc.identifier.uri	http://dx.doi.org/10.26240/heal.ntua.23287
dc.rights	Attribution 3.0 Greece	*
dc.rights.uri	http://creativecommons.org/licenses/by/3.0/gr/	*
dc.subject	Μηχανική μάθηση	el
dc.subject	Γεννητικά ανταγωνιστικά δίκτυα	el
dc.subject	Οπτικοποίηση ιστορίας	el
dc.subject	Μηχανισμοί προσοχής	el
dc.subject	Τεχνητή νοημοσύνη	el
dc.subject	GANs	en
dc.subject	Transformer	en
dc.subject	Story visualization	en
dc.subject	Attention	en
dc.subject	Machine learning	en
dc.title	Attention-based story visualization	en
dc.contributor.department	Digital Image and Signal Processing Laboratory	en
heal.type	bachelorThesis
heal.classification	Artificial Intelligence	en
heal.language	en
heal.access	free
heal.recordProvider	ntua	el
heal.publicationDate	2022-03-01
heal.abstract	Story Visualization is a novel task: the generation of an image sequence from a short story composed of natural language sentences or other semantic information. The task borrows from Text-to-Image in its pursuit of language-image correspondence, and from Text-to-Video in its aim for consistency across frames. To date, few improvements have been proposed for this challenging task, and viable datasets and evaluation methods remain scarce. The combination of recent advances in sequence transduction (Transformer) and conditional image generation (SAGAN) motivated our approach to Story Visualization, in the hope of contributing towards a model that can capture the nuances of image sequence generation and language-to-vision temporal correspondence. The main objective of this thesis is to investigate improvements on the original StoryGAN and to experiment with different implementations of our architectural proposals. To that end, we: • Examine the effects of using a Transformer encoder in place of the original RNN. • Apply more recent architectural approaches to the image-generating GAN. • Explore the effects of attention mechanisms in the model, both as presented in the SAGAN architecture and through two novel attention mechanisms that we propose for image sequences.	en
heal.advisorName	Στάμου, Γεώργιος	el
heal.committeeMemberName	Στάμου, Γεώργιος	el
heal.committeeMemberName	Βουλόδημος, Αθανάσιος	el
heal.committeeMemberName	Σταφυλοπάτης, Ανδρέας Γεώργιος	el
heal.academicPublisher	National Technical University of Athens. School of Electrical and Computer Engineering	en
heal.academicPublisherID	ntua
heal.numberOfPages	63 pp.	en
heal.fullTextAvailability	false