dc.contributor.author | Γιαννακάκης, Νικόλαος | el |
dc.contributor.author | Giannakakis, Nikolaos | en |
dc.date.accessioned | 2025-03-28T09:26:21Z | |
dc.date.available | 2025-03-28T09:26:21Z | |
dc.identifier.uri | https://dspace.lib.ntua.gr/xmlui/handle/123456789/61524 | |
dc.identifier.uri | http://dx.doi.org/10.26240/heal.ntua.29220 | |
dc.rights | Attribution 3.0 Greece | * |
dc.rights | Attribution-NonCommercial 3.0 Greece | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc/3.0/gr/ | * |
dc.subject | Εκμάθηση Αναπαραστάσεων | el |
dc.subject | Αντικειμενοκεντρική Εκμάθηση Αναπαραστάσεων | el |
dc.subject | Κατηγοριοποίηση Προσφερόμενων Δυνατοτήτων Αντικειμένων | el |
dc.subject | Προσομοίωση Ρομποτικού Χειρισμού | el |
dc.subject | Ρομποτική Αντίληψη | el |
dc.subject | Object-centric Representation Learning | en |
dc.subject | Representation Learning | en |
dc.subject | Slot Attention | en |
dc.subject | Robot Perception | en |
dc.subject | Robotics Simulation | en |
dc.title | Action to object knowledge distillation for object-centric representation learning | en |
dc.contributor.department | Division of Signals, Control and Robotics | el |
heal.type | bachelorThesis | |
heal.classification | Machine Learning | en |
heal.classification | Deep Learning | en |
heal.classification | Computer Vision | en |
heal.language | el | |
heal.language | en | |
heal.access | free | |
heal.recordProvider | ntua | el |
heal.publicationDate | 2024-10-24 | |
heal.abstract | This thesis studies whether object-centric image encoders can be improved by enhancing them with action-centric representations derived from videos of actions. First, we study a method to distill the representations of a pre-trained Video Masked Auto-encoder (Video MAE) into the representations of two state-of-the-art image encoders in an object-centric manner. This method is evaluated on the task of affordance categorization using a small-scale dataset that we created from the Something-Something v2 (SSV2) dataset. Experiments show that the representations of the Video MAE contain information that could be useful to the image encoders, and we test several methods to enrich the encoders with this information. These methods produce a marginal yet consistent improvement. Further experimentation with larger-scale models and datasets could potentially unlock additional improvements. Furthermore, we propose and study a method based on the Slot Attention object-centric representation learning framework. This method is also evaluated on the task of affordance categorization, where it achieves competitive results while also segmenting the images automatically and substantially reducing the per-object representation size. Finally, we propose a method that combines object-centric representations from a slot-attention-based model into a flat representation vector for an image, with the aim of learning visuomotor policies. This method is evaluated on a simulated robotic manipulation task and achieves better results than other out-of-domain representations. We also show that the performance of the slot representations in simulated robotic manipulation can be improved by fine-tuning the model on videos of actions from the SSV2 dataset.
By creating action-object associations in the representations of object-centric image encoders, this study seeks to contribute to the development of more effective visual perception systems for robots and artificial agents, enabling them to better understand the semantics and dynamics of agent-object interaction. | en |
heal.advisorName | Μαραγκός, Πέτρος | el |
heal.committeeMemberName | Μαραγκός, Πέτρος | el |
heal.committeeMemberName | Ροντογιάννης, Αθανάσιος | el |
heal.committeeMemberName | Κορδώνης, Ιωάννης | el |
heal.academicPublisher | Εθνικό Μετσόβιο Πολυτεχνείο. Σχολή Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών. Τομέας Σημάτων, Ελέγχου και Ρομποτικής | el |
heal.academicPublisherID | ntua | |
heal.numberOfPages | 119 σ. | el |
heal.fullTextAvailability | false |