Παραγωγή περιγραφικών γράφων σκηνής χρησιμοποιώντας ασθενή επίβλεψη σε περιγραφές εικόνων

Μπενετάτος, Αλέξανδρος; Benetatos, Alexandros

dc.contributor.author	Μπενετάτος, Αλέξανδρος	el
dc.contributor.author	Benetatos, Alexandros	en
dc.date.accessioned	2022-11-28T11:30:01Z
dc.date.available	2022-11-28T11:30:01Z
dc.identifier.uri	https://dspace.lib.ntua.gr/xmlui/handle/123456789/56276
dc.identifier.uri	http://dx.doi.org/10.26240/heal.ntua.23974
dc.rights	Default License
dc.subject	Παραγωγή Γράφων Σκηνής	el
dc.subject	Ασθενής Επίβλεψη	el
dc.subject	Περιγραφικότητα	el
dc.subject	Παραγωγή Γράφων Σκηνής από Περιγραφές Εικόνων	el
dc.subject	COCO	en
dc.subject	VG200	en
dc.subject	Open Images	en
dc.subject	Scene Graph Generation (SGG)	en
dc.subject	Weak Supervision	en
dc.subject	Saliency	en
dc.subject	Scene Graph Generation from Image Captions	en
dc.title	Παραγωγή περιγραφικών γράφων σκηνής χρησιμοποιώντας ασθενή επίβλεψη σε περιγραφές εικόνων	el
dc.title	Salient Scene Graph Generation From Image Captions Using Weak Supervision	en
heal.type	bachelorThesis
heal.classification	Όραση Υπολογιστών	el
heal.classification	Computer Vision	en
heal.language	el
heal.access	free
heal.recordProvider	ntua	el
heal.publicationDate	2022-07-14
heal.abstract	Το πρόβλημα παραγωγής γράφων σκηνής (scene graph generation) του τομέα της όρασης υπολογιστών αφορά την εξαγωγή κατευθυνόμενων γράφων ως αναπαράσταση των σχέσεων (ακμές) μεταξύ των αντικειμένων (κόμβοι) σε μία εικόνα. Παρατηρώντας τη συμπεριφορά σύγχρονων μοντέλων στη βιβλιογραφία σε εικόνες με επισημειωμένα δείγματα, γίνεται σαφές πως τα μοντέλα που εκπαιδεύουμε δυσκολεύονται να ξεχωρίσουν ποιες από τις πιθανές σχέσεις είναι πιο σημαντικές για την περιγραφή της εικόνας. Μάλιστα, αυτό δεν οφείλεται σε κάποιο πρόβλημα εκπαίδευσης καθώς, πολύ συχνά, τα μοντέλα θα προβλέψουν τις σχέσεις που είναι επισημειωμένες, ωστόσο ακόμα και αυτές δεν θα παρέχουν σημαντική πληροφορία για την εικόνα. Θα αναφερόμαστε στην ικανότητα των μοντέλων να εντοπίσουν ποιες από τις πιθανές σχέσεις είναι πιο σημαντικές για την περιγραφή της εικόνας ως saliency και, από όσο γνωρίζουμε, είμαστε οι πρώτοι που αναφερόμαστε σε αυτό το χαρακτηριστικό. Η συνεισφορά αυτής της διπλωματικής αφορά τόσο τη μέτρηση του saliency ενός Scene Graph Generation (SGG) μοντέλου όσο και την παραγωγή πιο salient γράφων σκηνής σύμφωνα με ποιοτικά και ποσοτικά αποτελέσματα που εξάγουμε. Συγκεκριμένα, (α) εισάγουμε μια γενικευμένη μέθοδο εκπαίδευσης SGG μοντέλων με ασθενή επίβλεψη χρησιμοποιώντας περιγραφές εικόνων, (β) εισάγουμε δύο παραλλαγές της μέτρησης του Recall@N όπου, με χρήση των περιγραφών εικόνων, μπορούμε να εξάγουμε μετρήσεις για το saliency SGG μοντέλων και (γ) πραγματοποιούμε τόσο ποσοτική όσο και ποιοτική σύγκριση μεταξύ των μεθόδων που προτείνουμε και με τη σχετική βιβλιογραφία στο VG200, το δημοφιλέστερο σύνολο δεδομένων του προβλήματος, όπου πετυχαίνουμε 35% μέγιστη σχετική βελτίωση συγκριτικά με επαναϋλοποίηση της SOTA μεθόδου. 
Θεμελιώνουμε, λοιπόν, την αιτία έλλειψης saliency στους γράφους σκηνής, προτείνουμε μετρικές για την αξιολόγηση του saliency ενός μοντέλου και τέλος σχεδιάζουμε μια μέθοδο εκπαίδευσης μοντέλων ώστε αυτά να αντιλαμβάνονται καλύτερα την έννοια του saliency και να παράγουν πιο ουσιώδεις γράφους σκηνής. Τα παραπάνω τονίζουν την ανάγκη παραγωγής περιγραφικών γράφων σκηνής και αναδεικνύουν την ανάγκη αλλαγής προσανατολισμού στην αντιμετώπιση του προβλήματος. Η χρήση πλήρως επιβλεπόμενων μεθόδων, δυστυχώς, δεν κλιμακώνεται καλά σε αυξημένο αριθμό από κατηγορίες αντικειμένων ή σχέσεων. Αλλά ακόμα και σε μικρότερα λεξιλόγια, εξαιτίας της αραιής μη-περιγραφικής επισημείωσης, οδηγούμαστε σε μεροληπτικά μοντέλα που δεν κατανοούν την εικόνα και αδυνατούν να εντοπίσουν τη σημαντική πληροφορία σε αυτή.	el
heal.abstract	Scene graph generation (SGG) is a computer vision task that extracts a directed graph representing the relations (edges) between the objects (nodes) in an image. Observing the behavior of state-of-the-art models on images with labeled data, it becomes clear that the models we train struggle to distinguish which of the possible relations are most important for describing the image. In fact, this is not due to a training problem: very often the models do predict relations that are labeled, yet even those do not provide important information about the image. We refer to the ability of a model to identify which of the possible relations are most important for describing the image as saliency and, as far as we know, we are the first to study this characteristic of SGG models. This thesis contributes both to quantifying the saliency of an SGG model and to generating more salient scene graphs, as shown by the qualitative and quantitative results we gather. Specifically, we (a) introduce a generalized method for training SGG models with weak supervision using image captions, (b) introduce two variations of the common Recall@N metric that use image captions to measure the saliency of SGG models, and (c) perform a quantitative and qualitative comparison between the methods we propose and the related literature on VG200, the most common dataset for SGG, where we achieve a 35% maximum relative improvement over a re-implementation of the SOTA method for weakly supervised training with image captions. We thus establish the reason for the lack of saliency in scene graphs, introduce metrics to evaluate the saliency of a model, and propose a generalized method for training SGG models that better incorporate the concept of saliency and generate more descriptive scene graphs.
These findings emphasize the need for descriptive scene graphs and for a change in how the problem is approached. Fully supervised methods, unfortunately, do not scale well to a growing number of object or relationship categories. Even with smaller vocabularies, the sparse, non-salient annotations lead to biased models that do not understand the image and cannot locate its important relationships.	en
heal.advisorName	Μαραγκός, Πέτρος	el
heal.advisorName	Maragos, Petros	en
heal.committeeMemberName	Ροντογιάννης, Αθανάσιος	el
heal.committeeMemberName	Ποταμιάνος, Γεράσιμος	el
heal.committeeMemberName	Μαραγκός, Πέτρος	el
heal.committeeMemberName	Rontogiannis, Athanasios	en
heal.committeeMemberName	Potamianos, Gerasimos	en
heal.committeeMemberName	Maragos, Petros	en
heal.academicPublisher	Εθνικό Μετσόβιο Πολυτεχνείο. Σχολή Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών. Τομέας Σημάτων, Ελέγχου και Ρομποτικής. Εργαστήριο Όρασης Υπολογιστών, Επικοινωνίας Λόγου και Επεξεργασίας Σημάτων	el
heal.academicPublisherID	ntua
heal.numberOfPages	60 σ.	el
heal.fullTextAvailability	false