Extraction and analysis of topics and sentiments in Twitter messages based on the spatial and temporal dimensions

Αναγνωστόπουλος, Γεώργιος; Anagnostopoulos, Georgios

dc.contributor.author	Αναγνωστόπουλος, Γεώργιος	el
dc.contributor.author	Anagnostopoulos, Georgios	en
dc.date.accessioned	2016-04-13T12:50:00Z
dc.date.available	2016-04-13T12:50:00Z
dc.date.issued	2016-04-13
dc.identifier.uri	https://dspace.lib.ntua.gr/xmlui/handle/123456789/42389
dc.identifier.uri	http://dx.doi.org/10.26240/heal.ntua.12033
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 Greece	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/gr/	*
dc.subject	Θεματική ανάλυση	el
dc.subject	Συναισθηματική ανάλυση	el
dc.subject	Αναγνώριση	el
dc.subject	Βάσεις δεδομένων	el
dc.subject	Twitter	en
dc.subject	Data mining	en
dc.subject	Topic	en
dc.subject	Sentiment	en
dc.subject	Clustering	en
dc.title	Εξαγωγή και ανάλυση θεμάτων και συναισθημάτων σε μηνύματα του Twitter με βάση τη χωρική και τη χρονική διάσταση	el
heal.type	bachelorThesis
heal.classification	Data mining	en
heal.language	el
heal.access	free
heal.recordProvider	ntua	el
heal.publicationDate	2015-07-29
heal.abstract	Ο σκοπός της διπλωματικής εργασίας ήταν σε πρώτη φάση η επεξεργασία κειμένων (tweets) από το Twitter για την αναγνώριση του θέματος και του συναισθήματός τους και η συσχέτισή τους με τη γεωγραφική περιοχή από την οποία δημοσιεύτηκαν και το χρόνο (ημερομηνία και ώρα) που έγινε η δημοσίευση. Έπειτα ομαδοποιήθηκαν τα tweets ανά θέμα και ανά περιοχή, και μελετήθηκε η διαφοροποίηση των θεμάτων και των συναισθημάτων των μηνυμάτων στο Twitter καθώς αλλάζει η περιοχή ή/και το χρονικό παράθυρο που εξετάζεται. Φτιάχτηκε τέλος εφαρμογή η οποία παρουσιάζει στο χρήστη τα αποτελέσματα της εξαγωγής θεμάτων και συναισθημάτων από μεγάλο αριθμό tweets και το πώς αυτά διαφοροποιούνται ανά πόλη και ανά ημέρα. Συγκεκριμένα, αφού ελήφθησαν tweets από το Twitter με βάση τη γεωγραφική περιοχή (Λονδίνο, Νέα Υόρκη, Λος Άντζελες) και τις ημέρες τις οποίες δημοσιεύτηκαν (08 – 14 Ιουνίου 2015), έγινε η επεξεργασία του περιεχομένου τους, ομαδοποιήθηκαν ανά θέμα και εξήχθη το συναίσθημα του κάθε tweet. Για την κάθε θεματική ομάδα εξήχθησαν οι πιο σημαντικοί-αντιπροσωπευτικοί όροι και βρέθηκε το πώς μοιάζει με τις υπόλοιπες στην ίδια και σε διαφορετικές πόλεις. Τελικά, κάθε θεματική ομάδα συνδεδεμένη με τα tweets τα οποία περιέχει και με τις υπόλοιπες ομάδες με τις οποίες μοιάζει, εισήχθη σε βάση δεδομένων. Το συναίσθημα για ένα θέμα υπολογίζεται αθροιστικά από τα συναισθήματα του κάθε tweet που σχετίζεται με αυτό. Αναπτύχθηκε εφαρμογή web η οποία επικοινωνεί με τη βάση δεδομένων και παρουσιάζει τα αποτελέσματα της επεξεργασίας στο χρήστη, ο οποίος έχει δυνατότητα να προσδιορίσει το τι θέλει να δει κάθε φορά. Στα αποτελέσματα που παρουσιάζονται από τη web εφαρμογή είναι φανερό το πώς αλλάζουν τα θέματα για τα οποία μιλούν οι χρήστες ανά πόλη και ανά μέρα, και ποια η διαφορά στους όρους του ίδιου θέματος μεταξύ διαφορετικών πόλεων. Ακόμα παρατηρείται το συναίσθημα που μπορεί να διακατέχει τους χρήστες όταν γράφουν («τουιτάρουν») για κάποιο θέμα που τους επηρεάζει.	el
heal.abstract	The purpose of this thesis was, first, to process texts (tweets) from Twitter in order to identify their topic and sentiment and to associate them with the geographical area from which they were posted and the time (date and time) of posting. The tweets were then clustered by topic and by area, and the variation in their topics and sentiments was studied as the area and the time window changed. Finally, an application was built that presents to the user the results of extracting topics and sentiments from a large number of tweets, and how these results differ by city and by day. More precisely, after tweets were collected from Twitter based on their geographical area (London, New York, Los Angeles) and the days on which they were published (June 8 to June 14, 2015), their content was processed, they were clustered by topic, and the sentiment of each individual tweet was determined. For every topic cluster, the most important, representative terms were extracted, and the similarity of each cluster to the others was then calculated across different days and different cities. Each cluster, linked to the tweets assigned to it and to the other clusters to which it is similar, was then inserted into a database. The sentiment of a topic is calculated cumulatively from the sentiment of each related tweet. A web application was developed that communicates with the database and presents the results of the processing to the user, who can choose what specific information to view. The results presented by the web application show how the topics change between cities and days, and how the terms of the same topic differ between cities. They also show the sentiment of users tweeting about a topic that concerns them.	en
heal.advisorName	Βασιλείου, Ιωάννης	el
heal.committeeMemberName	Κοντογιάννης, Κωνσταντίνος	el
heal.committeeMemberName	Σταύρακας, Ιωάννης	el
heal.academicPublisher	Εθνικό Μετσόβιο Πολυτεχνείο. Σχολή Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών. Τομέας Τεχνολογίας Πληροφορικής και Υπολογιστών	el
heal.academicPublisherID	ntua
heal.numberOfPages	68 pp.
heal.fullTextAvailability	true