Σημασιολογική συσταδοποίηση αντικειμένων με χρήση οντολογικών περιγραφών

Χατζηθεοδώρου, Μάνος; Chatzitheodorou, Manos

dc.contributor.author	Χατζηθεοδώρου, Μάνος	el
dc.contributor.author	Chatzitheodorou, Manos	en
dc.date.accessioned	2016-04-26T11:05:55Z
dc.date.available	2016-04-26T11:05:55Z
dc.date.issued	2016-04-26
dc.identifier.uri	https://dspace.lib.ntua.gr/xmlui/handle/123456789/42449
dc.identifier.uri	http://dx.doi.org/10.26240/heal.ntua.11450
dc.rights	Αναφορά Δημιουργού-Μη Εμπορική Χρήση-Όχι Παράγωγα Έργα 3.0 Ελλάδα	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/gr/	*
dc.subject	Οντολογία	el
dc.subject	Εξαγωγή χαρακτηριστικών	el
dc.subject	Λημματοποίηση	el
dc.subject	Συσταδοποίηση	el
dc.subject	Αλγόριθμος Κ-μέσων	el
dc.subject	Ontology	en
dc.subject	Feature extraction	en
dc.subject	Lemmatization	en
dc.subject	Clustering	en
dc.subject	K-means algorithm	en
dc.title	Σημασιολογική συσταδοποίηση αντικειμένων με χρήση οντολογικών περιγραφών	el
heal.type	bachelorThesis
heal.classification	Συστήματα και τεχνολογίες γνώσης	el
heal.classification	Cluster analysis	en
heal.classification	Ontologies (Information retrieval)	en
heal.classificationURI	http://lod.nal.usda.gov/127679
heal.classificationURI	http://id.loc.gov/authorities/subjects/sh2005006014
heal.language	el
heal.access	free
heal.recordProvider	ntua	el
heal.publicationDate	2015-07-28
heal.abstract	Ο σκοπός της διπλωματικής εργασίας ήταν η μελέτη των τρόπων με τους οποίους μια οντολογία μπορεί να χρησιμοποιηθεί ως πηγή χαρακτηριστικών για τη συσταδοποίηση ενός συνόλου δεδομένων. Τα δεδομένα αυτά συγκεκριμένα αφορούσαν το σενάριο μιας ταινίας. Η συσταδοποίηση έγινε για τα πλάνα της ταινίας με τη χρήση διαφορετικών τεχνικών, από τις οποίες επιλέχθηκε η αποτελεσματικότερη για την παραγωγή του τελικού αποτελέσματος. Συγκεκριμένα, τα δεδομένα θεωρήθηκαν ότι είναι εν γένει ημιδομημένα, δηλαδή ότι είναι εν μέρει κατηγοριοποιημένα βάσει κάποιας οντολογίας, αλλά ότι μπορεί και να περιέχουν και κάποια επιπλέον, αδόμητη κειμενική πληροφορία που να αποτελεί δυνητική πηγή επιπλέον χαρακτηριστικών. Αρχικά υπολογίστηκε ο κατάλληλος χώρος αναπαράστασης των χαρακτηριστικών που προέρχονται από τις δύο διαφορετικές πηγές. Για την παραγωγή των χαρακτηριστικών που αφορούν την οντολογία έγινε χρήση τεχνικών συλλογιστικής, καθώς κάποια από αυτά ήταν ρητώς δηλωμένα, άλλα όμως αποτελούσαν υπονοούμενη γνώση και δεν ήταν άμεσα διαθέσιμα. Για την παραγωγή των χαρακτηριστικών που βρίσκονταν σε μορφή αδόμητης πληροφορίας, έγινε λημματοποίηση και αφαίρεση αδιάφορων λέξεων στα κειμενικά τμήματα του σεναρίου. Με βάση τον χώρο των παραπάνω χαρακτηριστικών, πραγματοποιήθηκε η συσταδοποίηση με τον αλγόριθμο k-means. Τα αποτελέσματα οργανώθηκαν σε βάση δεδομένων και παρουσιάσθηκαν μέσω ιστοσελίδας. Η μελέτη αυτή μπορεί να επεκταθεί εύκολα και σε άλλα δεδομένα, με μικρές μετατροπές, ανάλογα με τη μορφή αναπαράστασης των δεδομένων και της οντολογίας, καθώς η διαδικασία έχει καταγραφεί αναλυτικά και είναι πλήρως παραμετροποιημένη.	el
heal.abstract	The aim of this thesis was to study the ways in which an ontology can be used as a source of features for clustering a data set. The data set in question was the script of a movie. The shots of the movie were clustered using several different techniques, and the most effective of these was selected to produce the final result. The data were assumed to be, in general, semi-structured: they are partially classified according to an ontology, but they may also contain additional unstructured textual information that constitutes a potential source of further features. First, we determined an appropriate representation space for the features that originate from the two sources. To extract the ontology-based features we used reasoning techniques, since some of them were explicitly stated while others were implicit knowledge and were not directly available. To extract the features contained in the unstructured information, we applied lemmatization and stop-word removal to the textual parts of the script. Based on the resulting feature space, we performed the clustering with the k-means algorithm. The results were organized in a database and presented through a website. The study can easily be extended to other data sets with minor modifications, depending on the representation format of the data and the ontology, since the procedure is documented in detail and fully parameterized.	en
heal.advisorName	Στάμου, Γεώργιος	el
heal.committeeMemberName	Κόλλιας, Στέφανος	el
heal.committeeMemberName	Σταφυλοπάτης, Ανδρέας-Γεώργιος	el
heal.academicPublisher	Εθνικό Μετσόβιο Πολυτεχνείο. Σχολή Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών. Τομέας Τεχνολογίας Πληροφορικής και Υπολογιστών	el
heal.academicPublisherID	ntua
heal.numberOfPages	99 σ.	el
heal.fullTextAvailability	true
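
The pipeline summarized in the abstract (ontology-derived features, lemmatized text features, then k-means) could be sketched roughly as follows. This is an illustrative toy under stated assumptions, not the thesis implementation: the class hierarchy, the lemma table, the stop-word list, the sample shots, and all function names are hypothetical. The superclass walk is only a stand-in for a real reasoner, and the deterministic farthest-first seeding is a choice made here for reproducibility.

```python
# Hypothetical toy ontology: subclass -> superclass.
SUBCLASS_OF = {"Dog": "Animal", "Cat": "Animal", "Car": "Vehicle", "Truck": "Vehicle"}
# Hypothetical stop-word list and lemma lookup (a real system would use an NLP library).
STOP_WORDS = {"the", "a", "in", "on"}
LEMMAS = {"barks": "bark", "runs": "run", "drives": "drive"}

# Hypothetical shots: explicit ontology classes plus free text.
SHOTS = [
    {"classes": {"Dog"}, "text": "The dog barks in the park"},
    {"classes": {"Cat"}, "text": "A cat runs in the park"},
    {"classes": {"Car"}, "text": "The car drives on the street"},
    {"classes": {"Truck"}, "text": "A truck drives on the street"},
]


def infer_classes(explicit):
    """Add the superclasses implied by the hierarchy (stand-in for reasoning)."""
    inferred = set(explicit)
    for cls in explicit:
        while cls in SUBCLASS_OF:
            cls = SUBCLASS_OF[cls]
            inferred.add(cls)
    return inferred


def text_terms(text):
    """Lowercase, drop stop words, and map each token to its lemma."""
    tokens = text.lower().split()
    return {LEMMAS.get(t, t) for t in tokens if t not in STOP_WORDS}


def build_vectors(shots):
    """Join both feature sources into one binary vector per shot."""
    feats = [
        infer_classes(s["classes"]) | {"w:" + t for t in text_terms(s["text"])}
        for s in shots
    ]
    vocab = sorted(set().union(*feats))
    vectors = [[1.0 if f in fs else 0.0 for f in vocab] for fs in feats]
    return vectors, vocab


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=50):
    """Plain k-means with deterministic farthest-first seeding."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every vector.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


vectors, vocab = build_vectors(SHOTS)
labels = kmeans(vectors, 2)
print(labels)
```

In this toy, the inferred superclasses (`Animal`, `Vehicle`) are what let shots with different explicit classes share features, which is the role the abstract assigns to reasoning over implicit knowledge.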