Τεχνικές ομαδοποίησης και κοντινότερου γείτονα για οπτική αναζήτηση εικόνων

Καλαντίδης, Ιωάννης; Kalantidis, Yannis

dc.contributor.author	Καλαντίδης, Ιωάννης	el
dc.contributor.author	Kalantidis, Yannis	en
dc.date.accessioned	2014-12-16T10:40:41Z
dc.date.available	2014-12-16T10:40:41Z
dc.date.issued	2014-12-16
dc.identifier.uri	https://dspace.lib.ntua.gr/xmlui/handle/123456789/39939
dc.identifier.uri	http://dx.doi.org/10.26240/heal.ntua.1411
dc.rights	Αναφορά Δημιουργού-Μη Εμπορική Χρήση-Όχι Παράγωγα Έργα 3.0 Ελλάδα	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/gr/	*
dc.subject	ομαδοποίηση	el
dc.subject	αναζήτηση εικόνων	el
dc.subject	αναζήτηση κοντινότερου γείτονα	el
dc.subject	οπτική αναζήτηση	el
dc.subject	αναζήτηση μεγάλης κλίμακας	el
dc.subject	clustering	en
dc.subject	image retrieval	en
dc.subject	computer vision	en
dc.subject	visual search	en
dc.subject	nearest neighbor search	en
dc.title	Τεχνικές ομαδοποίησης και κοντινότερου γείτονα για οπτική αναζήτηση εικόνων	el
dc.title	Clustering and Nearest Neighbor methods for visual search	en
dc.contributor.department	Image Video and Multimedia Lab	el
heal.type	doctoralThesis
heal.classification	Content-based image retrieval	el
heal.classification	Information retrieval	el
heal.classification	Nearest neighbor analysis (Statistics)	el
heal.classificationURI	http://id.loc.gov/authorities/subjects/sh2008009943
heal.classificationURI	http://skos.um.es/unescothes/C01986
heal.classificationURI	http://id.loc.gov/authorities/subjects/sh88000645
heal.language	el
heal.access	free
heal.recordProvider	ntua	el
heal.publicationDate	2014-11-07
heal.abstract	Στην παρούσα εργασία προτείνονται βελτιώσεις στην οπτική αναζήτηση εικόνων, με τεχνικές που βασίζονται κυρίως σε ομαδοποίηση. Η ομαδοποίηση εκτελείται είτε στο χώρο των χαρακτηριστικών είτε στο χώρο των εικόνων, σε πολυδιάστατους διανυσματικούς ή μετρικούς χώρους, αντίστοιχα. Αρχικά προτείνουμε μια νέα, γενικότερη μέθοδο ομαδοποίησης, η οποία συνδυάζει την περιγραφική δύναμη των μοντέλων μείγματος κανονικών κατανομών με τις ιδιότητες που απαιτούνται κατά την κατασκευή μεγάλης κλίμακας οπτικών λεξικών για αναζήτηση εικόνων. Είναι μια παραλλαγή του αλγορίθμου expectation-maximization που μπορεί να συγκλίνει γρήγορα, ενώ παράλληλα μπορεί να εκτιμήσει δυναμικά τον τελικό αριθμό των συνιστωσών. Επιστρατεύουμε τεχνικές προσεγγιστικών κοντινότερων γειτόνων για την επιτάχυνση του E-step του αλγορίθμου EM και εκμεταλλευόμαστε την επαναληπτική του φύση για να κάνουμε την αναζήτηση αυξητική, βελτιώνοντας την ταχύτητα αλλά και την ακρίβεια. Καταλήγουμε να έχουμε απόδοση υψηλότερη από το state of the art της αναζήτησης σε μεγάλες βάσεις εικόνων, ενώ είμαστε ταυτόχρονα το ίδιο γρήγοροι με τις πλέον γρήγορες γνωστές τεχνικές κατασκευής οπτικών λεξικών. Έπειτα, παρουσιάζουμε μια νέα μέθοδο για αναζήτηση κοντινότερου γείτονα, μια μέθοδο που βελτιστοποιεί παραγοντικούς κβαντιστές τοπικά και έτσι μειώνει σημαντικά την παραμόρφωση κατά τον κβαντισμό. Αν συνδυαστεί με τη μέθοδο δεικτοδότησης multi-index, καταφέρνει να ξεπεράσει τα μέχρι τώρα καλύτερα δημοσιευμένα αποτελέσματα στην αναζήτηση κοντινότερου γείτονα σε ένα σύνολο με ένα δισεκατομμύριο πολυδιάστατα σημεία. Παράλληλα απολαμβάνει ταχύτητες αναζήτησης της τάξεως των λίγων millisecond, γεγονός που την καθιστά ανταγωνιστική ως προς το χρόνο ακόμα και σε σχέση με μεθόδους κατακερματισμού (hashing). 
Προτείνουμε επίσης τους χάρτες σκηνών και δείχνουμε ότι μια εκ των προτέρων ομαδοποίηση των εικόνων της συλλογής μπορεί να βελτιώσει την απόδοση της οπτικής αναζήτησης, ενώ παράλληλα ένα κριτήριο παραμόρφωσης μπορεί να εγγυηθεί την ανάκτηση ακόμα και απομονωμένων εικόνων από μη δημοφιλείς τοποθεσίες, όπως σε ένα γενικό σύστημα αναζήτησης εικόνων. Προτείνουμε μια λύση που, παρότι μπορεί να δουλέψει σε συλλογές εκατομμυρίων εικόνων, μπορεί να ανακτήσει ακόμα και τις μη δημοφιλείς εικόνες απαιτώντας μονάχα ένα ποσοστό της αρχικής μνήμης. Παρουσιάζουμε τέλος ένα ολοκληρωμένο σύστημα αναζήτησης εικόνων, το οποίο μπορεί να χρησιμοποιηθεί για αυτόματο γεωγραφικό εντοπισμό καθώς και για αναγνώριση οροσήμων ή σημείων ενδιαφέροντος, όπου αυτό είναι εφικτό. Το VIRaL (Visual Image Retrieval and Localization) παρέχει δημόσια πρόσβαση στις προαναφερθείσες τεχνολογίες μέσω ενός ενοποιημένου γραφικού διαδικτυακού περιβάλλοντος. Η διατριβή καταλήγει με τη συνοπτική περιγραφή μερικών ακόμα δημοσιεύσεων που εστιάζουν σε εφαρμογές της οπτικής αναζήτησης, καθώς και με τα συμπεράσματα της έρευνας.	el
heal.abstract	New applications that exploit the huge data volume in community photo collections are emerging every day, and visual image search is therefore becoming increasingly important. In this thesis we propose clustering- and nearest neighbor-based improvements for visual image search. Clustering is performed either in feature space or in image space, i.e. in high-dimensional vector spaces or metric spaces, respectively. We first introduce a clustering method that combines the flexibility of Gaussian mixtures with the scaling properties needed to construct visual vocabularies for image retrieval. It is a variant of expectation-maximization that can converge rapidly while dynamically estimating the number of components. We employ approximate nearest neighbor search to speed up the E-step and exploit its iterative nature to make search incremental, boosting both speed and precision. We achieve superior performance in large-scale retrieval, while being as fast as the best known approximate k-means algorithm. We then present our locally optimized product quantization scheme, an approximate nearest neighbor search method that locally optimizes product quantizers per cell, after clustering the data in the original space. When combined with a multi-index, it sets a new state of the art on a billion-scale dataset. At the same time, our approach enjoys query times on the order of a few milliseconds, making it comparable in speed even to hashing approaches. We next focus on large community photo collections. Most applications for such collections focus on popular subsets, e.g. images containing landmarks or associated with Wikipedia articles. In this thesis we are concerned with the problem of accurately finding the location where a photo was taken without needing any metadata, that is, solely from its visual content. We also recognize landmarks where applicable, automatically linking them to Wikipedia. 
We show that the time is right for automating the geo-tagging process, and we show how this can work at large scale. In doing so, we exploit redundancy of content in popular locations, but unlike most existing solutions, we do not restrict ourselves to landmarks. In other words, we can compactly represent the visual content of the thousands of images depicting e.g. the Parthenon and still retrieve any single, isolated, non-landmark image, such as a house or graffiti on a wall. Starting from an existing, geo-tagged dataset, we cluster images into sets of different views of the same scene. This is a very efficient, scalable, and fully automated mining process. We then align all views in a set to one reference image and construct a 2D scene map. Our indexing scheme operates directly on scene maps. We evaluate our solution on a challenging dataset of one million urban images and provide public access to our service through our online application, VIRaL. The thesis concludes with two chapters. The first is a summary of other approaches for visual search and applications, such as geometry indexing, logo detection and clothing recognition, while the second presents conclusions and possible future directions.	en
heal.advisorName	Κόλλιας, Στέφανος	el
heal.committeeMemberName	Σταφυλοπάτης, Ανδρέας	el
heal.committeeMemberName	Εμίρης, Ιωάννης	el
heal.committeeMemberName	Στάμου, Γεώργιος	el
heal.committeeMemberName	Μαραγκός, Πέτρος	el
heal.committeeMemberName	Βασιλείου, Ιωάννης	el
heal.committeeMemberName	Τσανάκας, Παναγιώτης	el
heal.academicPublisher	Σχολή Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών	el
heal.academicPublisherID	ntua
heal.numberOfPages	153
heal.fullTextAvailability	true