Twitter bot detection using graph convolutional networks

Κομβόπουλος, Ελευθέριος; Komvopoulos, Eleftherios

dc.contributor.author	Κομβόπουλος, Ελευθέριος	el
dc.contributor.author	Komvopoulos, Eleftherios	en
dc.date.accessioned	2025-01-22T11:51:14Z
dc.date.available	2025-01-22T11:51:14Z
dc.identifier.uri	https://dspace.lib.ntua.gr/xmlui/handle/123456789/60919
dc.identifier.uri	http://dx.doi.org/10.26240/heal.ntua.28615
dc.rights	Αναφορά Δημιουργού-Μη Εμπορική Χρήση-Όχι Παράγωγα Έργα 3.0 Ελλάδα	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/gr/	*
dc.subject	Twitter	en
dc.subject	Bot	en
dc.subject	Detection	en
dc.subject	Natural Language Processing	en
dc.subject	Graph Convolutional Networks	en
dc.subject	Νευρωνικά Δίκτυα	el
dc.subject	Μηχανική Μάθηση	el
dc.subject	Γράφοι	el
dc.title	Twitter bot detection using graph convolutional networks	en
heal.type	bachelorThesis
heal.secondaryTitle	Ανίχνευση bots στο Twitter με χρήση Συνελικτικών Δικτύων Γραφημάτων	el
heal.classification	Machine Learning	en
heal.language	el
heal.language	en
heal.access	free
heal.recordProvider	ntua	el
heal.publicationDate	2024-07-15
heal.abstract	This thesis deals with the development of a machine learning system aimed at detecting fake accounts on the Twitter platform. Twitter is a social networking medium that lets users interact through short posts called tweets. The main advantage of the application is the free expression of opinions and ideas, which can moreover be organized and grouped by topic and participants. In this way, threads and lists of discussions, deep and extensive or not, are created, with features such as likes, mentions, replies and hashtags playing a leading role. Everything we have mentioned as positive capabilities of the Twitter platform has an inherent tendency to turn at any moment into critical disadvantages, not for the medium itself but, of course, for the people who use it. Specifically, the power of influence that such a medium attains could not but be accompanied by issues of security and trustworthiness regarding the news and ideas it broadcasts. Simply put, Twitter has for many years been a medium for the partisan circulation of opinions and ideas, aimed at steering or even misleading large groups of people for malicious purposes. Freedom and independence are indirectly transformed into covert manipulation, and the need to limit the spread of fake news and bot accounts is more than imperative. The thesis first covers the full theoretical background of the models relevant to this problem, as well as the most fundamental concepts of Artificial Intelligence and Machine Learning, and then describes all the categories of pre-existing methods that attempted to provide a solution. 
Finally, in the architecture we implement and propose, we use multimodal data processing methods, which we ultimately combine to arrive at the final predictions. The main components of the system are suitably adapted Graph Convolutional Networks, which propagate information and perform interactions within users' neighborhoods, making the relationships among users fully decisive. Lastly, we compare the performance of our system with earlier results from the scientific community, highlight its strengths, and point out some future improvements that could further boost the accuracy, depth, and robustness of our model.	el
heal.abstract	This thesis develops a bot detection model for Twitter using Graph Convolutional Networks. Twitter is one of the best-known social media platforms, with more than 350 million users worldwide. Although Twitter, like other social media applications, was originally built for communication, it is now used largely for information and advertising. In particular, Twitter today has a heavy influence on the spread of information, ideas, and opinions, which mostly concern socio-political issues and are organized with “hashtags”. As a result, the desire of individuals to strategically promote fake news has led to the rise of numerous “bot” accounts generated by automated software. Detecting these non-genuine accounts is vital, as the need to limit and eliminate them is urgent. The most dangerous aspect of these bot accounts is that they are not static, passive entities; they progressively adapt their behavior by changing their characteristics. These characteristics, known as “user features”, usually include profile information, interactions with other accounts, and tweets. In other words, while detection models attempt to determine the authenticity of Twitter users, bots dynamically evolve their actions and presence in the Twitter community, which complicates detection. A variety of deployed systems are dedicated to user classification and apply multiple techniques in their models. Some focus on user data and statistics, while others process only the tweets or the interaction links among users. Both simple clustering algorithms and machine learning methods have achieved remarkable results, and accuracy improved further once multimodal data processing and Natural Language Processing were introduced. 
However, successfully generalizing across the parameters of the problem remains an open and demanding question. Our approach is based on three essential principles. First, we analyse the user statistics and construct a balanced and realistic community graph. Second, we split the user features into four fundamental categories and process them separately. Finally, we combine the outputs of the four Graph Convolutional Networks before producing the final results. Our system offers heterogeneity in data retrieval and processing, while also emphasizing inclusiveness of diverse data and scalability for future versions. Overall, our model achieves high scores in comparison with other state-of-the-art architectures.	en
heal.advisorName	Ασκούνης, Δημήτριος	el
heal.committeeMemberName	Ψαρράς, Ιωάννης	el
heal.committeeMemberName	Μαρινάκης, Ευάγγελος	el
heal.academicPublisher	Εθνικό Μετσόβιο Πολυτεχνείο. Σχολή Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών. Τομέας Ηλεκτρικών Βιομηχανικών Διατάξεων και Συστημάτων Αποφάσεων. Εργαστήριο Συστημάτων Αποφάσεων και Διοίκησης	el
heal.academicPublisherID	ntua
heal.numberOfPages	116 σ.	el
heal.fullTextAvailability	false