dc.contributor.author | Kontogiannis, Andreas |
dc.contributor.author | Papathanasiou, Konstantinos |
dc.date.accessioned | 2023-12-06T07:09:28Z |
dc.date.available | 2023-12-06T07:09:28Z |
dc.identifier.uri | https://dspace.lib.ntua.gr/xmlui/handle/123456789/58371 |
dc.identifier.uri | http://dx.doi.org/10.26240/heal.ntua.26067 |
dc.rights | Default License |
dc.subject | Reinforcement Learning, Multi-Agent Learning, Variational Inference, Agent Modelling, Intrinsic Exploration | en
dc.title | Count-based Agent Modelling in Multi-Agent Reinforcement Learning | en
dc.contributor.department | Data Science and Machine Learning | el
heal.type | masterThesis |
heal.classification | Multi-Agent Reinforcement Learning | en
heal.access | free |
heal.recordProvider | ntua | el
heal.publicationDate | 2023-09-07 |
heal.abstract | In this thesis, we consider the problem of Multi-Agent Reinforcement Learning (MARL) in partially observable cooperative tasks. Owing to their strong performance on multiple MARL benchmark testbeds and the efficiency of their on-policy learning, we focus on policy gradient algorithms, such as Multi-Agent Actor-Critic (MAA2C). Following the recent surge of approaches that either (a) adopt the Centralized-Training-Decentralized-Execution (CTDE) scheme or (b) also utilize agent communication during execution for improved performance, we address the question of how to combine the CTDE scheme with the benefits of communication methods, in order to train agents that perform better in difficult tasks, including those with sparse reward settings. To this end, we propose Count-based Agent Modelling (CAM), a MARL framework built on top of MAA2C that uses agent modelling techniques, variational inference, and self-supervised learning to generate information sharing among the agents as latent neural representations. CAM uses these shared representations both to explicitly enhance the agents' policies and to encourage coordinated exploration driven by intrinsic motivation. Experimentally, we show that CAM outperforms state-of-the-art MARL algorithms on difficult tasks, with and without sparse rewards, from the Multi-Agent Particle Environment (MPE) and Level-based Foraging (LBF) benchmark testbeds. (A generic sketch of a count-based intrinsic bonus follows this record.) | en
heal.advisorName | Στάμου, Γεώργιος |
heal.committeeMemberName | Βουλόδημος, Αθανάσιος |
heal.committeeMemberName | Σταφυλοπάτης, Ανδρέας |
heal.academicPublisher | School of Electrical and Computer Engineering | el
heal.academicPublisherID | ntua |
heal.fullTextAvailability | false |
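The title and abstract center on count-based intrinsic motivation. Since heal.fullTextAvailability is false and this record carries no method details beyond the abstract, the sketch below is a generic, hypothetical illustration of a count-based exploration bonus of the form β/√N(s), not CAM's actual implementation; per the abstract, CAM builds its information-sharing representations with variational inference and agent modelling rather than raw state counts. The class name CountBasedBonus, the coefficient beta, and the use of hashable state keys are assumptions of the sketch.

```python
# Hypothetical sketch of count-based intrinsic motivation (NOT the CAM
# algorithm from the thesis): maintain visitation counts N(s) and pay an
# exploration bonus beta / sqrt(N(s)), so rarely seen states reward more.
from collections import defaultdict
import math


class CountBasedBonus:
    """Visitation-count table yielding an intrinsic reward beta / sqrt(N(s))."""

    def __init__(self, beta: float = 0.1):
        self.beta = beta                # assumed bonus coefficient
        self.counts = defaultdict(int)  # N(s): visits per hashable state key

    def bonus(self, state_key) -> float:
        """Record one visit to state_key and return its exploration bonus."""
        self.counts[state_key] += 1
        return self.beta / math.sqrt(self.counts[state_key])


if __name__ == "__main__":
    cb = CountBasedBonus(beta=0.5)
    # Bonus decays as the same state is revisited: 0.5, 0.354, 0.289
    print([round(cb.bonus("s0"), 3) for _ in range(3)])
```

In a cooperative MARL training loop, each agent's reward would be augmented as r_total = r_env + bonus, with the 1/√N decay letting the bonus fade as states become familiar so that learning eventually follows the environment reward alone.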