dc.contributor.author | Papastavrou, Aristi | en |
dc.contributor.author | Παπασταύρου, Αρίστη | el |
dc.date.accessioned | 2024-04-15T11:55:43Z | |
dc.date.available | 2024-04-15T11:55:43Z | |
dc.identifier.uri | https://dspace.lib.ntua.gr/xmlui/handle/123456789/59198 | |
dc.identifier.uri | http://dx.doi.org/10.26240/heal.ntua.26894 | |
dc.description | National Technical University of Athens--Master's Thesis. Interdisciplinary-Interdepartmental Postgraduate Studies Programme (Δ.Π.Μ.Σ.) "Data Science and Machine Learning" | en |
dc.rights | Attribution-NonCommercial-NoDerivs 3.0 Greece | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/gr/ | * |
dc.subject | Visual Word Sense Disambiguation | en |
dc.subject | Multimodal Learning | en |
dc.subject | Large Language Models | en |
dc.subject | Πολυτροπική Μάθηση | el |
dc.subject | Μοντέλο ArPa | el |
dc.subject | Αποσαφήνιση Οπτικών Εννοιών | el |
dc.subject | Μετασχηματιστές (Transformers) | el |
dc.subject | Νευρωνικά Δίκτυα Γράφων | el |
dc.subject | Μεγάλα Μοντέλα Γλώσσας | el |
dc.subject | Graph Neural Networks | en |
dc.subject | ArPa Model | en |
dc.subject | Transformers | en |
dc.title | Advancing visual word disambiguation: A hybrid approach with large language models, transformers and introduction to novel hybrid ArPa Model | en |
heal.type | masterThesis | |
heal.classification | Machine Learning | en |
heal.classification | Data Science | en |
heal.classification | Artificial Intelligence | en |
heal.language | en | |
heal.access | free | |
heal.recordProvider | ntua | el |
heal.publicationDate | 2024-03-29 | |
heal.abstract | In the era of rapid advancements in artificial intelligence, the fusion of visual and textual data presents a compelling frontier for exploration. "Advancing Visual Word Disambiguation: A Hybrid Approach with Large Language Models, Transformers, and Introduction to Novel Hybrid ArPa Model" delves into this intriguing intersection, aiming to unravel the complexities of Visual Word Sense Disambiguation (V-WSD). This thesis proposes a hybrid approach that leverages the prowess of large language models and transformers to enhance the interpretability and integration of multimodal information, culminating in the introduction of a novel model, ArPa. At the heart of this exploration is the quest to refine the processes through which machines understand and contextualize visual and textual cues in tandem. By evaluating state-of-the-art computational models and introducing the ArPa model, a hybrid framework that marries the analytical depth of large language models like BERT with the perceptual acuity of visual transformers such as the Swin Transformer, enriched further by Graph Neural Network (GNN) integration, this research seeks to set new benchmarks in multimodal understanding. The thesis embarks on a journey of experimentation and analysis, aiming to shed light on the multifaceted challenges of V-WSD. Through a meticulous examination of preprocessing techniques, including linguistic context expansion and advanced image processing, and exploring a variety of model architectures, we endeavor to optimize the synergy between textual and visual data, thereby enhancing model performance across a spectrum of V-WSD tasks. This work not only contributes novel insights and methodologies to the domain of artificial intelligence but also beckons the scientific community and technology enthusiasts alike towards a future where the seamless integration of language and vision transforms our interaction with technology.
By proposing innovative approaches and unveiling the ArPa model, this thesis opens new avenues for research and application in multimodal learning, promising to enrich our digital landscape with more intelligent, nuanced, and empathetic artificial intelligence systems. | en |
heal.advisorName | Stamou, Giorgos | en |
heal.committeeMemberName | Voulodimos, Athanasios | en |
heal.committeeMemberName | Stafylopatis, Andreas-Georgios | en |
heal.academicPublisher | National Technical University of Athens. School of Electrical and Computer Engineering. Artificial Intelligence and Learning Systems Laboratory | en |
heal.academicPublisherID | ntua | |
heal.numberOfPages | 86 p. | en |
heal.fullTextAvailability | false | |