Detection of causality relations in plain text with the use of word embeddings

Μπάστας, Γρηγόριος; Bastas, Grigorios

dc.contributor.author	Μπάστας, Γρηγόριος	en
dc.contributor.author	Bastas, Grigorios	en
dc.date.accessioned	2018-03-05T10:00:16Z
dc.date.available	2018-03-05T10:00:16Z
dc.date.issued	2018-03-05
dc.identifier.uri	https://dspace.lib.ntua.gr/xmlui/handle/123456789/46627
dc.identifier.uri	http://dx.doi.org/10.26240/heal.ntua.15245
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 Greece	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/gr/	*
dc.subject	NLP	en
dc.subject	Language	el
dc.subject	Neural networks	el
dc.subject	Word embeddings	el
dc.subject	Causality	el
dc.title	Detection of causality relations in plain text with the use of word embeddings	en
heal.type	bachelorThesis
heal.classification	Natural language processing	el
heal.language	en
heal.access	free
heal.recordProvider	ntua	el
heal.publicationDate	2017-09-27
heal.abstract	Causality detection is one of the most challenging topics in NLP. In this project we address this open problem with training methods that build vector representations of French words. Although we worked only on causality detection in French, our methodology is general enough to apply in many other cases. The project comprises three major tasks. The first task is the creation of our training data through the automatic extraction of cause-effect tuples from a syntactically annotated French corpus. For this purpose, we collected non-ambiguous lexical units that denote causality relations from the ASFALDA French FrameNet. We then extracted tuples of meaningful sets of words that represent either the cause or the effect of the captured frame, relying on the dependency tree of each sentence and the part-of-speech tag of each word. The second task is the computational processing of the training data extracted in the first task, in order to create causal word embeddings based on cause-effect context similarity. At this stage, the cause-effect tuples are used, in an innovative manner, as the training set for the Word2vec, SVD and NMF models, so as to produce causal embeddings. The third task is the evaluation of our models. We assessed the causal proximity of cause-effect word pairs using the dot product and cosine similarity between the embeddings stored in the input matrix and those stored in the output matrix of our models. For the evaluation, we used the SemEval Task 8 test data (partially translated into French).	en
heal.sponsor	Through a scholarship awarded by the Erasmus+ student exchange programme to carry out the thesis at the IRIT research centre - Université Toulouse III - Paul Sabatier	el
heal.advisorName	Σταφυλοπάτης, Ανδρέας-Γεώργιος	el
heal.committeeMemberName	Σταφυλοπάτης, Ανδρέας-Γεώργιος	el
heal.committeeMemberName	Στάμου, Γεώργιος	el
heal.committeeMemberName	Τσανάκας, Παναγιώτης	el
heal.academicPublisher	National Technical University of Athens. School of Electrical and Computer Engineering. Division of Computer Science	el
heal.academicPublisherID	ntua
heal.numberOfPages	97 p.
heal.fullTextAvailability	true