Algorithms for the k-means and k-median problems

Κάβουρας, Λουκάς; Kavouras, Loukas

dc.contributor.author	Κάβουρας, Λουκάς	el
dc.contributor.author	Kavouras, Loukas	en
dc.date.accessioned	2014-10-23T07:36:19Z
dc.date.available	2014-10-23T07:36:19Z
dc.date.issued	2014-10-23
dc.identifier.uri	https://dspace.lib.ntua.gr/xmlui/handle/123456789/39349
dc.identifier.uri	http://dx.doi.org/10.26240/heal.ntua.5192
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 Greece	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/gr/	*
dc.subject	Approximation algorithms	el
dc.subject	Streaming algorithms	el
dc.subject	Data mining	el
dc.subject	Computational geometry	el
dc.subject	Clustering	el
dc.subject	Clustering	en
dc.subject	k-means	en
dc.subject	k-median	en
dc.subject	Approximation algorithms	en
dc.subject	Streaming algorithms	en
dc.title	Algorithms for the k-means and k-median problems	el
dc.title	Algorithms for the k-median problem and the k-means problem	en
heal.type	bachelorThesis
heal.classification	Mathematics	el
heal.access	free
heal.recordProvider	ntua	el
heal.publicationDate	2014-10-06
heal.abstract	Clustering is the process of grouping a set of objects so that objects in the same cluster are more similar to each other than to objects in other clusters. In this thesis, we examine the well-known clustering problems k-means and k-median. We present approximation algorithms for these problems in the offline and streaming models.	el
heal.abstract	Clustering is the task of grouping a set of objects in such a way that objects in the same cluster are more similar to each other than to those in other clusters. It is a main task of data mining, machine learning and computational geometry. In this thesis, we discuss well-known clustering problems, with emphasis on the k-means clustering problem, where one seeks to partition n observations into k clusters so as to minimize the within-cluster sum of squares. We present Lloyd's algorithm for the k-means problem, which was identified as one of the top 10 algorithms in data mining. Although Lloyd's algorithm has exponential running time in the worst case, it usually runs fast in practice. However, the algorithm gives no guarantees, and there are natural examples where it may produce arbitrarily bad clusterings. The k-means++ algorithm addresses this problem by augmenting Lloyd's algorithm with a simple and intuitive seeding technique. A formal proof shows that the k-means++ algorithm is O(log k)-competitive. We also examine the k-means|| algorithm, which is inspired by the k-means++ algorithm and can be effectively parallelized. In the last chapter, we consider cases where the entire input is not available from the beginning. That is, we study algorithms for k-means in the streaming model, where the data is too large to be stored in main memory and must be accessed sequentially. Finally, we study the facility location problem and discuss the online facility location algorithm of Meyerson.	en
heal.advisorName	Φωτάκης, Δημήτριος	el
heal.committeeMemberName	Ζάχος, Ευστάθιος	el
heal.committeeMemberName	Συμβώνης, Αντώνιος	el
heal.academicPublisher	School of Applied Mathematical and Physical Sciences	el
heal.academicPublisherID	ntua
heal.fullTextAvailability	true