HEAL DSpace

Count-based agent modelling in multi-agent reinforcement learning

DSpace/Manakin Repository

Show simple item record

dc.contributor.author Παπαθανασίου, Κωνσταντίνος el
dc.contributor.author Κοντογιάννης, Ανδρέας el
dc.contributor.author Papathanasiou, Konstantinos en
dc.contributor.author Kontogiannis, Andreas en
dc.date.accessioned 2024-04-08T09:54:24Z
dc.date.available 2024-04-08T09:54:24Z
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/59127
dc.identifier.uri http://dx.doi.org/10.26240/heal.ntua.26823
dc.description National Technical University of Athens--Master's Thesis. Interdisciplinary-Interdepartmental Postgraduate Programme "Data Science and Machine Learning" el
dc.rights Default License
dc.subject Reinforcement Learning en
dc.subject Intrinsic Exploration en
dc.subject Agent Modelling en
dc.subject Multi-Agent Learning en
dc.subject Variational Inference en
dc.title Count-based agent modelling in multi-agent reinforcement learning en
heal.type masterThesis
heal.classification Multi-Agent Reinforcement Learning en
heal.access free
heal.recordProvider ntua el
heal.publicationDate 2023-09-07
heal.abstract In this thesis, we consider the problem of Multi-Agent Reinforcement Learning (MARL) in partially-observable cooperative tasks. Due to their good performance on multiple benchmark MARL testbeds and the efficiency of their on-policy learning, we focus on policy gradient algorithms, such as the Multi-Agent Actor-Critic (MAA2C) algorithm. Following the recent surge of approaches that either (a) adopt the Centralized-Training-Decentralized-Execution (CTDE) schema, or (b) also utilize agent communication during execution for improved performance, we address the question of how to combine the CTDE schema with the benefits of communication methods, in order to train agents able to perform better on difficult tasks, including those with sparse reward settings. To this end, in this thesis, we propose Count-based Agent Modelling (CAM), a MARL framework, built on top of MAA2C, that utilizes agent modelling techniques, variational inference and self-supervised learning to generate information sharing among the agents as latent neural representations. CAM uses the generated information-sharing representations to explicitly enhance the agents' policies, and also to encourage the agents towards coordinated exploration based on intrinsic motivation. Experimentally, we show that CAM outperforms state-of-the-art MARL algorithms on difficult tasks, with and without sparse rewards, from the Multi-Agent Particle Environment (MPE) and Level-Based Foraging (LBF) benchmark testbeds. en
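The "count-based" exploration the abstract refers to is conventionally realised as an intrinsic reward that decays with the visit count of a (discretised) state or latent code. The following is a minimal generic sketch of that idea, not the thesis's actual CAM implementation; the class name, the `beta` coefficient, and the hashable-key interface are illustrative assumptions.

```python
from collections import defaultdict
import math


class CountBasedBonus:
    """Generic count-based intrinsic reward: r_int = beta / sqrt(N(x)),
    where N(x) is the number of visits to a discretised state (or, as in
    agent-modelling approaches, a learned latent code used as the key)."""

    def __init__(self, beta: float = 0.1):
        self.beta = beta
        self.counts = defaultdict(int)  # N(x), starts at zero for unseen keys

    def bonus(self, key) -> float:
        """Record a visit to `key` and return the decaying exploration bonus."""
        self.counts[key] += 1
        return self.beta / math.sqrt(self.counts[key])


# Usage: the bonus is largest on first visit and decays on revisits,
# pushing the policy towards under-explored states.
b = CountBasedBonus(beta=1.0)
first = b.bonus(("agent_0", 3, 4))   # first visit:  1 / sqrt(1)
second = b.bonus(("agent_0", 3, 4))  # second visit: 1 / sqrt(2)
```

In the multi-agent setting described above, such a bonus would typically be added to the environment reward during training, so that agents are intrinsically motivated to visit novel joint configurations even when extrinsic rewards are sparse.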
heal.advisorName Στάμου, Γεώργιος el
heal.committeeMemberName Βουλόδημος, Αθανάσιος el
heal.committeeMemberName Σταφυλοπάτης, Ανδρέας el
heal.academicPublisher National Technical University of Athens. School of Electrical and Computer Engineering el
heal.academicPublisherID ntua
heal.fullTextAvailability false

