Cognitive and Cross-Topic Methods for Natural Language Representations

Αθανασίου, Νίκος; Athanasiou, Nikos

dc.contributor.author	Αθανασίου, Νίκος	el
dc.contributor.author	Athanasiou, Nikos	en
dc.date.accessioned	2020-03-30T14:22:57Z
dc.date.available	2020-03-30T14:22:57Z
dc.identifier.uri	https://dspace.lib.ntua.gr/xmlui/handle/123456789/49968
dc.identifier.uri	http://dx.doi.org/10.26240/heal.ntua.17666
dc.rights	Default License
dc.subject	Computational neuroscience	en
dc.subject	Machine learning	en
dc.subject	Multiple word embeddings	en
dc.subject	Natural language representations	en
dc.subject	Topic modelling	en
dc.subject	Υπολογιστική νευροεπιστήμη	el
dc.subject	Μηχανική μάθηση	el
dc.subject	Πολλαπλές διανυσματικές αναπαραστάσεις λέξεων	el
dc.subject	Θεματική μοντελοποίηση	el
dc.subject	Αναπαραστάσεις φυσικής γλώσσας	el
dc.title	Διαθεματικές και γνωστικές μέθοδοι για αναπαραστάσεις φυσικής γλώσσας	el
heal.type	bachelorThesis
heal.secondaryTitle	Cognitive and Cross-Topic Methods for Natural Language Representations	en
heal.classification	Machine Learning, Natural Language	en
heal.language	el
heal.language	en
heal.access	free
heal.recordProvider	ntua	el
heal.publicationDate	2019-07-05
heal.abstract	In this work we investigate natural language representations from two different points of view: cognitive neuroscience and topic modelling. To evaluate each approach, we use multiple datasets and experimental setups that follow the guidelines in the literature. Moreover, we evaluate our work both quantitatively and qualitatively, providing insights and visualizations that make our results interpretable. First, from the angle of cognitive neuroscience, we explore how brain representations can help us improve current corpus-based language representations. Neural activation models proposed in the literature use a set of example words for which fMRI measurements are available in order to find a mapping between word semantics and localized neural activations. Successful mappings let us expand to the full lexicon of concrete nouns, under the assumption that similarity of meaning implies similar neural activation patterns. In this work, we propose a computational model that estimates semantic similarity in the neural activation space, and we investigate its relative performance on various natural language processing tasks. Despite the simplicity of the proposed model and the very small number of example words used to bootstrap it, the neural activation semantic model performs surprisingly well compared to state-of-the-art word embeddings. Specifically, it outperforms the state of the art on semantic similarity estimation between very similar or very dissimilar words, while performing well on other tasks such as entailment and word categorization. These are strong indications that neural activation semantic models can not only shed some light on human cognition but also contribute to computational models for certain tasks. 
In the second part, we investigate how topic modelling can help us produce multi-prototype word embeddings and compare their performance with single-prototype models. In traditional Distributional Semantic Models (DSMs) the multiple senses of a polysemous word are conflated into a single vector space representation. In this work, we propose a DSM that learns multiple distributional representations of a word based on different topics. First, a separate DSM is trained for each topic and then each of the topic-based DSMs is aligned to a common vector space. Our unsupervised mapping approach is motivated by the hypothesis that words preserving their relative distances in different topic semantic sub-spaces constitute robust semantic anchors that define the mappings between them. Aligned cross-topic representations achieve state-of-the-art results for the task of contextual word similarity. Furthermore, evaluation on NLP downstream tasks shows that multiple topic-based embeddings outperform single-prototype models.	en
heal.advisorName	Ποταμιάνος, Αλέξανδρος	el
heal.committeeMemberName	Τζαφέστας, Κωσταντίνος	el
heal.committeeMemberName	Σταφυλοπάτης, Ανδρέας-Γεώργιος	el
heal.academicPublisher	Εθνικό Μετσόβιο Πολυτεχνείο. Σχολή Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών. Τομέας Σημάτων, Ελέγχου και Ρομποτικής	el
heal.academicPublisherID	ntua
heal.numberOfPages	117
heal.fullTextAvailability	true
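
Illustrative note on the abstract's second part. The abstract states that a separate DSM is trained per topic and that the topic-based DSMs are then aligned to a common vector space, using words that preserve their relative distances across topic sub-spaces as anchors. It does not name the alignment algorithm. The sketch below is therefore only one possible reading, assuming (a) anchors are the words whose cosine-similarity profile changes least between two spaces and (b) the mapping is an orthogonal Procrustes transform fitted on those anchors. The function names unit_rows, select_anchors, procrustes_map and align are hypothetical and do not come from the thesis.

```python
import numpy as np


def unit_rows(m):
    """Scale every row of m to unit Euclidean length."""
    return m / np.linalg.norm(m, axis=1, keepdims=True)


def select_anchors(x, y, k):
    """Indices of the k words whose similarity profile differs least.

    x and y hold embeddings of the same vocabulary (same row order)
    in two topic spaces. Assumption: "preserving relative distances"
    is measured by comparing each word's row of cosine similarities.
    """
    sim_x = unit_rows(x) @ unit_rows(x).T
    sim_y = unit_rows(y) @ unit_rows(y).T
    drift = np.linalg.norm(sim_x - sim_y, axis=1)
    return np.argsort(drift)[:k]


def procrustes_map(src, tgt):
    """Orthogonal matrix w minimising ||src @ w - tgt|| (Frobenius)."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


def align(x, y, k):
    """Map topic space x onto topic space y using k anchor words."""
    anchors = select_anchors(x, y, k)
    w = procrustes_map(x[anchors], y[anchors])
    return x @ w, anchors


if __name__ == "__main__":
    # Toy data: space x is a rotated copy of space y.
    rng = np.random.default_rng(0)
    y = rng.normal(size=(50, 8))
    q, _ = np.linalg.qr(rng.normal(size=(8, 8)))
    x = y @ q.T
    aligned, anchors = align(x, y, k=20)
    print("max abs error after alignment:", np.abs(aligned - y).max())
```

An orthogonal map is used here because it leaves distances within the source space unchanged, which fits the anchor hypothesis; whether the thesis uses this or a different mapping is not stated in this record.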