Εφαρμογή Μηχανικής Μάθησης στην Ανάλυση Άποψης Κειμένων στον Θεματικό Τομέα των Τουριστικών Επιχειρήσεων

Γκιόκας, Οδυσσέας; Γκιόκας, Κωνσταντίνος

dc.contributor.author	Γκιόκας, Οδυσσέας	el
dc.contributor.author	Γκιόκας, Κωνσταντίνος	el
dc.date.accessioned	2015-03-24T08:48:36Z
dc.date.available	2015-03-24T08:48:36Z
dc.date.issued	2015-03-24
dc.identifier.uri	https://dspace.lib.ntua.gr/xmlui/handle/123456789/40486
dc.identifier.uri	http://dx.doi.org/10.26240/heal.ntua.8052
dc.rights	Default License
dc.subject	Μηχανική Μάθηση	el
dc.subject	Machine Learning	en
dc.subject	Ανάλυση Άποψης	el
dc.subject	Κρυφό Μοντέλο Markov	el
dc.subject	Sentiment Analysis	en
dc.subject	Aspect Based Sentiment Analysis	en
dc.subject	Naive Bayes	en
dc.subject	Hidden Markov Model	en
dc.subject	Κατηγοριοποίηση βασισμένη στην Άποψη	el
dc.subject	Ανάλυση Άποψης βασισμένη σε χαρακτηριστικά	el
dc.title	Εφαρμογή Μηχανικής Μάθησης στην Ανάλυση Άποψης Κειμένων στον Θεματικό Τομέα των Τουριστικών Επιχειρήσεων	el
dc.accessRights	Gkiokas, Odysseas	en
dc.accessRights	Gkiokas, Konstantinos	en
heal.type	bachelorThesis
heal.classification	Machine learning	el
heal.classificationURI	http://id.loc.gov/authorities/subjects/sh85079324
heal.language	el
heal.access	free
heal.recordProvider	ntua	el
heal.publicationDate	2015-03-06
heal.abstract	Η παρούσα διπλωματική πραγματεύεται το πρόβλημα του Sentiment Analysis δηλαδή την αυτόματη κατηγοριοποίηση ενός κειμένου ως θετικό ή αρνητικό με γνώμονα την άποψη του συγγραφέα πάνω στο θέμα του κειμένου και ιδιαίτερα σε κείμενα που αφορούν κριτικές ξενοδοχείων. Αφού οριστεί το θεωρητικό υπόβαθρο του προβλήματος, επιλέγονται τρεις μέθοδοι (αλγόριθμοι) για να εκπαιδευτεί ένα σύστημα σε αυτήν την αυτόματη κατηγοριοποίηση. Συγκεκριμένα επιλέγονται ο αλγόριθμος Naive Bayes, μία τροποποίηση του Hidden Markov Model που ονομάζεται Lexicalised Hidden Markov Model Integrating Part-of-Speech και Νευρωνικά Δίκτυα. Στον αλγόριθμο Naive Bayes δοκιμάσαμε μερικές παραλλαγές του, με διαφοροποιήσεις κάθε φορά στο ποιες λέξεις συμπεριλαμβάνονται αναφορικά με τη συχνότητά τους και το μήκος τους, αν χρησιμοποιούνταν σκέτες λέξεις ή και n-grams και το πώς γινόταν ο χωρισμός των λέξεων μεταξύ τους. Στην τροποποίηση του Hidden Markov Model επιλέχθηκε ένα σύνολο από χαρακτηριστικά (tags) και εφαρμόστηκε ένα πιο ειδικό πεδίο του Sentiment Analysis, το Aspect Based Sentiment Analysis. Στα Νευρωνικά Δίκτυα εφαρμόστηκαν υλοποιήσεις για one-layer και three-layer perceptrons και έγιναν πειράματα με διαφορετικές τιμές στις παραμέτρους του μοντέλου ώστε να επιτευχθούν τα καλύτερα αποτελέσματα. Για να δοκιμαστούν αυτοί οι αλγόριθμοι χρησιμοποιήθηκαν κυρίως κριτικές ξενοδοχείων οι οποίες αντλήθηκαν από το booking.com και παρατίθενται για αναφορά, αλλά και το ευρέως χρησιμοποιούμενο στο Sentiment Analysis σύνολο δεδομένων από κριτικές ταινιών του imdb.com των Pang και Lee. Ως συμπέρασμα καταλήξαμε ότι για να επιτευχθεί μία πολύ ικανοποιητική απόδοση και ακρίβεια από το μοντέλο σε κριτικές ξενοδοχείων είναι αρκετός ένας απλός αλγόριθμος όπως ο Naive Bayes με την ακρίβεια προβλέψεων να φτάνει μέχρι και 94.5%. Αν απαιτείται ανάλυση υποχαρακτηριστικών του κειμένου τότε μπορεί να χρησιμοποιηθεί το Hidden Markov Model αλλά με χαμηλότερη ακρίβεια προβλέψεων. Τα Νευρωνικά Δίκτυα δείχνουν να μην ξεπερνούν σε ακρίβεια τον αλγόριθμο Naive Bayes παρά τη δυσκολία χρήσης τους.	el
heal.abstract	This dissertation addresses the problem of Sentiment Analysis, that is, the automatic classification of a document as positive or negative according to the writer's opinion on the document's subject, with a particular focus on hotel reviews. After the theoretical background is laid out, three methods (algorithms) are chosen to train a system to perform this classification: Naive Bayes, a variation of the Hidden Markov Model called Lexicalised Hidden Markov Model Integrating Part-of-Speech, and Neural Networks. For Naive Bayes, we tried several variants of the algorithm that differ in which words are included with regard to their frequency and length, in whether only single words or also n-grams are used, and in how the text is split into words. For the Hidden Markov Model variation, we chose a set of features (tags) and applied a more specific field of Sentiment Analysis, Aspect Based Sentiment Analysis. For Neural Networks, we implemented one-layer and three-layer perceptrons and ran experiments with different values of the model parameters to achieve the best possible results. To test these algorithms we mainly used hotel reviews scraped from booking.com, which are provided for reference, and additionally the movie review dataset from imdb.com by Pang and Lee, which is widely used in Sentiment Analysis. We conclude that a simple algorithm such as Naive Bayes is sufficient to achieve very satisfactory performance on hotel reviews, with prediction accuracy reaching up to 94.5%. If aspect-based analysis of the text is required, the Hidden Markov Model can be used, although with lower prediction accuracy. Neural Networks do not appear to exceed the accuracy of Naive Bayes, despite being harder to use.	en
heal.advisorName	Κόλλιας, Στέφανος	el
heal.committeeMemberName	Κόλλιας, Στέφανος	el
heal.committeeMemberName	Σταφυλοπάτης, Ανδρέας-Γεώργιος	el
heal.committeeMemberName	Στάμου, Γεώργιος	el
heal.academicPublisher	Εθνικό Μετσόβιο Πολυτεχνείο. Σχολή Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών. Τομέας Τεχνολογίας Πληροφορικής και Υπολογιστών	el
heal.academicPublisherID	ntua
heal.numberOfPages	92 σ.
heal.fullTextAvailability	true