HEAL DSpace

Action to object knowledge distillation for object-centric representation learning

DSpace/Manakin Repository

dc.contributor.author Γιαννακάκης, Νικόλαος el
dc.contributor.author Giannakakis, Nikolaos en
dc.date.accessioned 2025-03-28T09:26:21Z
dc.date.available 2025-03-28T09:26:21Z
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/61524
dc.identifier.uri http://dx.doi.org/10.26240/heal.ntua.29220
dc.rights Attribution 3.0 Greece *
dc.rights Attribution-NonCommercial 3.0 Greece *
dc.rights.uri http://creativecommons.org/licenses/by-nc/3.0/gr/ *
dc.subject Representation Learning el
dc.subject Object-centric Representation Learning el
dc.subject Object Affordance Categorization el
dc.subject Robotic Manipulation Simulation el
dc.subject Robot Perception el
dc.subject Object-centric Representation Learning en
dc.subject Representation Learning en
dc.subject Slot Attention en
dc.subject Robot Perception en
dc.subject Robotics Simulation en
dc.title Action to object knowledge distillation for object-centric representation learning en
dc.contributor.department Division of Signals, Control and Robotics el
heal.type bachelorThesis
heal.classification Machine Learning en
heal.classification Deep Learning en
heal.classification Computer Vision en
heal.language el
heal.language en
heal.access free
heal.recordProvider ntua el
heal.publicationDate 2024-10-24
heal.abstract This thesis studies whether object-centric image encoders can be improved by enriching them with action-centric representations derived from videos of actions. First, we study a method that distills the representations of a pre-trained Video Masked Auto-encoder (Video MAE) into the representations of two state-of-the-art image encoders in an object-centric manner. We evaluate this method on the task of affordance categorization, using a small-scale dataset that we created from the Something-Something v2 (SSV2) dataset. Experiments show that the Video MAE representations contain information that could be useful to the image encoders, and we test several methods for enriching the encoders with this information. These methods produce a marginal yet consistent improvement; further experimentation with larger-scale models and datasets could potentially unlock additional gains. Second, we propose and study a method based on the Slot Attention object-centric representation learning framework. This method is also evaluated on affordance categorization, where it achieves competitive results while also segmenting the images automatically and substantially reducing the per-object representation size. Finally, we propose a method that combines the object-centric representations of a slot-attention-based model into a flat representation vector for an image, with the aim of learning visuomotor policies. This method is evaluated on a simulated robotic task, where it outperforms other out-of-domain representations. We also show that the performance of the slot representations in simulated robotic manipulation can be improved by fine-tuning the model on videos of actions from the SSV2 dataset.
By creating action-object associations in the representations of object-centric image encoders, this study seeks to contribute to the development of more effective visual perception systems for robots and artificial agents, enabling them to better understand the semantics and dynamics of agent-object interaction. en
heal.advisorName Μαραγκός, Πέτρος el
heal.committeeMemberName Μαραγκός, Πέτρος el
heal.committeeMemberName Ροντογιάννης, Αθανάσιος el
heal.committeeMemberName Κορδώνης, Ιωάννης el
heal.academicPublisher National Technical University of Athens. School of Electrical and Computer Engineering. Division of Signals, Control and Robotics el
heal.academicPublisherID ntua
heal.numberOfPages 119 p. el
heal.fullTextAvailability false



Except where otherwise noted, this item's license is described as Attribution 3.0 Greece.