Attention-based story visualization

dc.contributor.author Tsakas, Nikolaos en
dc.contributor.author Τσάκας, Νικόλαος el
dc.date.accessioned 2022-09-05T08:27:53Z
dc.date.available 2022-09-05T08:27:53Z
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/55589
dc.identifier.uri http://dx.doi.org/10.26240/heal.ntua.23287
dc.rights Attribution 3.0 Greece *
dc.rights.uri http://creativecommons.org/licenses/by/3.0/gr/ *
dc.subject Machine learning el
dc.subject Generative adversarial networks el
dc.subject Story visualization el
dc.subject Attention mechanisms el
dc.subject Artificial intelligence el
dc.subject GANs en
dc.subject Transformer en
dc.subject Story visualization en
dc.subject Attention en
dc.subject Machine learning en
dc.title Attention-based story visualization en
dc.contributor.department Laboratory of Digital Image and Signal Processing el
heal.type bachelorThesis
heal.classification Artificial Intelligence en
heal.language en
heal.access free
heal.recordProvider ntua el
heal.publicationDate 2022-03-01
heal.abstract Story Visualization is a novel task: the generation of an image sequence from a short story composed of natural-language sentences or other semantic information. The task borrows from Text-to-Image in its pursuit of language-image correspondence, and from Text-to-Video in its aim for consistency across frames. Progress on this challenging topic remains limited, and viable datasets and evaluation methods are scarce. The combination of recent advances in sequence transduction (the Transformer) and conditional image generation (SAGAN) motivated our approach to Story Visualization, in the hope of contributing a model that captures the nuances of image-sequence generation and language-to-vision temporal correspondence. The main objective of this thesis is to investigate improvements on the original StoryGAN and to experiment with different implementations of our architectural proposals. To that end we:
• Examine the effects of using a Transformer encoder in place of the original RNN.
• Apply more recent architectural approaches to the image-generating GAN.
• Explore the effects of attention mechanisms in the model, both as presented in the SAGAN architecture and through two novel attention mechanisms proposed for image sequences. en
heal.advisorName Στάμου, Γεώργιος el
heal.committeeMemberName Στάμου, Γεώργιος el
heal.committeeMemberName Βουλόδημος, Αθανάσιος el
heal.committeeMemberName Σταφυλοπάτης, Ανδρέας Γεώργιος el
heal.academicPublisher National Technical University of Athens. School of Electrical and Computer Engineering el
heal.academicPublisherID ntua
heal.numberOfPages 63 p. el
heal.fullTextAvailability false
