Chatbot development using machine learning techniques

Τσοκαναρίδου, Μυρτώ; Tsokanaridou, Myrto

dc.contributor.author	Τσοκαναρίδου, Μυρτώ	el
dc.contributor.author	Tsokanaridou, Myrto	en
dc.date.accessioned	2019-10-08T09:29:38Z
dc.date.available	2019-10-08T09:29:38Z
dc.identifier.uri	https://dspace.lib.ntua.gr/xmlui/handle/123456789/49270
dc.identifier.uri	http://dx.doi.org/10.26240/heal.ntua.16968
dc.rights	Default License
dc.subject	Chatbot	en
dc.subject	Neural machine translation	en
dc.subject	Topic-informed chatbot	en
dc.subject	Hierarchical agglomerative clustering	en
dc.subject	Latent Dirichlet allocation	en
dc.title	Chatbot development using machine learning techniques	en
heal.type	bachelorThesis
heal.classification	Machine learning	en
heal.access	free
heal.recordProvider	ntua	el
heal.publicationDate	2019-07-10
heal.abstract	The ability of a computer to hold a dialogue like a human being is one of the greatest challenges, if not the greatest, facing the field of Artificial Intelligence. In this work, bearing in mind how complex and broad this problem is, we attempted to build a chatbot whose operation relies solely on machine learning techniques. The initial neural network architecture used to build the chatbot is the one used to build a neural machine translation system. In translation, the dataset consists of sentence pairs in two different languages, the second sentence being a translation of the first, so that after training the system can produce reasonably satisfactory translations. We, instead, feed the system pairs of sentences in the same language, the second sentence being a reply to the first. A sentence has, if not a single translation, at least a limited number of translations, whereas the possible replies to a sentence are infinite. Therefore, since dialogue lacks the regularity of translation, we do not expect the trained system to respond equally well. To improve its performance, and thanks to the large amount of available data (about 9,000,000 pairs of comments from reddit), we worked on keeping the chatbot on-topic in its conversations. We achieved this by exploiting the information about the thematic sub-forum (subreddit) to which each pair of comments belongs, which we passed to the system using machine learning methods. Specifically, we took two approaches, one with Hierarchical Agglomerative Clustering (HAC) and one with Latent Dirichlet Allocation (LDA); both resulted in faster convergence.	en
heal.advisorName	Στάμου, Γεώργιος	el
heal.committeeMemberName	Στάμου, Γεώργιος	el
heal.committeeMemberName	Σταφυλοπάτης, Ανδρέας-Γεώργιος	el
heal.committeeMemberName	Παπασπύρου, Νικόλαος	el
heal.academicPublisher	National Technical University of Athens. School of Electrical and Computer Engineering. Division of Computer Science	en
heal.academicPublisherID	ntua
heal.numberOfPages	82 pages
heal.fullTextAvailability	true
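The abstract does not say how the subreddit information is passed to the translation-style network. One common option, assumed here purely for illustration and not taken from the thesis, is to group subreddits into topic clusters with hierarchical agglomerative clustering and prepend a topic token to the source side of each (comment, reply) pair before training. The sketch uses only the Python standard library; the bag-of-words vectors, cosine similarity, average linkage, the `<topic_k>` token format, the toy subreddit names and comments, and the target of two clusters are all hypothetical choices.

```python
import math
from collections import Counter

# Hypothetical toy data: a few comments per subreddit (not from the thesis).
COMMENTS = {
    "python": ["my python script throws an error", "use a virtual env for python"],
    "learnprogramming": ["how do i fix this error in my script", "learn python first"],
    "soccer": ["what a goal in the match", "the match was great"],
    "football": ["best goal of the season", "the team won the match"],
}


def bow(texts):
    """Bag-of-words vector (word -> count) over all comments of one subreddit."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[w] for w, v in a.items() if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def hac(vectors, n_clusters):
    """Average-linkage agglomerative clustering: repeatedly merge the two
    most similar clusters until n_clusters remain."""
    clusters = [[name] for name in vectors]

    def linkage(c1, c2):
        # Mean pairwise similarity between the members of two clusters.
        total = sum(cosine(vectors[a], vectors[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        i, j = max(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def tag_pairs(pairs, clusters):
    """Prefix the source side of each (subreddit, comment, reply) triple
    with the hypothetical topic token of its subreddit's cluster."""
    topic_of = {name: k for k, members in enumerate(clusters) for name in members}
    return [(f"<topic_{topic_of[sub]}> {src}", tgt) for sub, src, tgt in pairs]


vectors = {name: bow(texts) for name, texts in COMMENTS.items()}
clusters = hac(vectors, n_clusters=2)
tagged = tag_pairs([("soccer", "did you see the match", "yes what a goal")], clusters)
print(clusters)  # [['python', 'learnprogramming'], ['soccer', 'football']]
print(tagged)    # [('<topic_1> did you see the match', 'yes what a goal')]
```

With average linkage, each step merges the pair of clusters with the highest mean pairwise similarity, so on the toy data the two programming subreddits and the two football subreddits end up in separate clusters. A pipeline over roughly 9,000,000 comment pairs would compute one vector per subreddit once and use an optimized clustering library rather than this naive loop.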