Advancing visual word disambiguation: A hybrid approach with large language models, transformers and introduction to novel hybrid ArPa Model

Papastavrou, Aristi; Παπασταύρου, Αρίστη

dc.contributor.author	Papastavrou, Aristi	en
dc.contributor.author	Παπασταύρου, Αρίστη	el
dc.date.accessioned	2024-04-15T11:55:43Z
dc.date.available	2024-04-15T11:55:43Z
dc.identifier.uri	https://dspace.lib.ntua.gr/xmlui/handle/123456789/59198
dc.identifier.uri	http://dx.doi.org/10.26240/heal.ntua.26894
dc.description	National Technical University of Athens--Master's Thesis. Interdisciplinary-Interdepartmental Postgraduate Studies Program "Data Science and Machine Learning"	en
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 Greece	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/gr/	*
dc.subject	Visual Word Sense Disambiguation	en
dc.subject	Multimodal Learning	en
dc.subject	Large Language Models	en
dc.subject	Graph Neural Networks	en
dc.subject	ArPa Model	en
dc.subject	Transformers	en
dc.title	Advancing visual word disambiguation: A hybrid approach with large language models, transformers and introduction to novel hybrid ArPa Model	en
heal.type	masterThesis
heal.classification	Machine Learning	en
heal.classification	Data Science	en
heal.classification	Artificial Intelligence	en
heal.language	en
heal.access	free
heal.recordProvider	ntua	el
heal.publicationDate	2024-03-29
heal.abstract	The fusion of visual and textual data is a compelling frontier in artificial intelligence research. This thesis, "Advancing Visual Word Disambiguation: A Hybrid Approach with Large Language Models, Transformers, and Introduction to Novel Hybrid ArPa Model", studies this intersection with a focus on Visual Word Sense Disambiguation (V-WSD). It proposes a hybrid approach that uses large language models and transformers to improve the interpretation and integration of multimodal information, culminating in a novel model, ArPa. The central aim is to refine how machines interpret and contextualize visual and textual cues together. The thesis evaluates state-of-the-art computational models and introduces ArPa, a hybrid framework that combines the linguistic analysis of a large language model such as BERT with the visual perception of a vision transformer such as Swin Transformer, further enriched by Graph Neural Network (GNN) integration, with the goal of setting new benchmarks in multimodal understanding. Through experimentation and analysis, the thesis examines the multifaceted challenges of V-WSD. It studies preprocessing techniques, including linguistic context expansion and advanced image processing, and explores a range of model architectures in order to optimize the interplay between textual and visual data and thereby improve model performance across V-WSD tasks. Beyond these methodological contributions, the work points toward systems in which language and vision are integrated seamlessly in everyday interaction with technology.
By proposing these approaches and introducing the ArPa model, this thesis opens new avenues for research and application in multimodal learning, toward more intelligent and nuanced artificial intelligence systems.	en
heal.advisorName	Stamou, Giorgos	en
heal.committeeMemberName	Voulodimos, Athanasios	en
heal.committeeMemberName	Stafylopatis, Andreas-Georgios	en
heal.academicPublisher	National Technical University of Athens. School of Electrical and Computer Engineering. Artificial Intelligence and Learning Systems Laboratory	en
heal.academicPublisherID	ntua
heal.numberOfPages	86 p.	en
heal.fullTextAvailability	false