HEAL DSpace

Advancing visual word disambiguation: A hybrid approach with large language models, transformers and introduction to novel hybrid ArPa Model

dc.contributor.author Papastavrou, Aristi en
dc.contributor.author Παπασταύρου, Αρίστη el
dc.date.accessioned 2024-04-15T11:55:43Z
dc.date.available 2024-04-15T11:55:43Z
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/59198
dc.identifier.uri http://dx.doi.org/10.26240/heal.ntua.26894
dc.description National Technical University of Athens--Master's Thesis. Interdisciplinary-Interdepartmental Postgraduate Programme "Data Science and Machine Learning" en
dc.rights Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Greece *
dc.rights.uri http://creativecommons.org/licenses/by-nc-nd/3.0/gr/ *
dc.subject Visual Word Sense Disambiguation en
dc.subject Multimodal Learning en
dc.subject Large Language Models en
dc.subject Multimodal Learning en
dc.subject ArPa Model en
dc.subject Visual Sense Disambiguation en
dc.subject Transformers en
dc.subject Graph Neural Networks en
dc.subject Large Language Models en
dc.subject Graph Neural Networks en
dc.subject ArPa Model en
dc.subject Transformers en
dc.title Advancing visual word disambiguation: A hybrid approach with large language models, transformers and introduction to novel hybrid ArPa Model en
heal.type masterThesis
heal.classification Machine Learning en
heal.classification Data Science en
heal.classification Artificial Intelligence en
heal.language en
heal.access free
heal.recordProvider ntua el
heal.publicationDate 2024-03-29
heal.abstract In an era of rapid progress in artificial intelligence, the fusion of visual and textual data is a compelling research frontier. "Advancing Visual Word Disambiguation: A Hybrid Approach with Large Language Models, Transformers, and Introduction to Novel Hybrid ArPa Model" addresses this intersection, focusing on Visual Word Sense Disambiguation (V-WSD). The thesis proposes a hybrid approach that uses large language models and transformers to improve how multimodal information is interpreted and integrated, culminating in a novel model, ArPa. The central goal is to refine how machines understand and contextualize visual and textual cues jointly. To this end, the thesis evaluates state-of-the-art computational models and introduces ArPa, a hybrid framework that combines the language understanding of large language models such as BERT with the visual representations of vision transformers such as the Swin Transformer, further enriched by Graph Neural Network (GNN) integration, with the aim of advancing the state of the art in multimodal understanding. Through experimentation and analysis, the thesis examines the multifaceted challenges of V-WSD. It studies preprocessing techniques, including linguistic context expansion and advanced image processing, and explores a range of model architectures in order to optimize the interplay between textual and visual data and thereby improve model performance across V-WSD tasks. Beyond contributing new insights and methods to artificial intelligence, this work points toward a future in which the seamless integration of language and vision changes how we interact with technology.
By proposing new approaches and introducing the ArPa model, this thesis opens avenues for further research and application in multimodal learning, toward more intelligent, nuanced, and empathetic artificial intelligence systems. en
heal.advisorName Stamou, Giorgos en
heal.committeeMemberName Voulodimos, Athanasios en
heal.committeeMemberName Stafylopatis, Andreas-Georgios en
heal.academicPublisher National Technical University of Athens. School of Electrical and Computer Engineering. Artificial Intelligence and Learning Systems Laboratory en
heal.academicPublisherID ntua
heal.numberOfPages 86 p. en
heal.fullTextAvailability false


Except where otherwise noted, this item's license is described as Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Greece.