HEAL DSpace

Training & acceleration of deep reinforcement learning agents


Show simple item record

dc.contributor.author Αναγνωστόπουλος, Κωνσταντίνος el
dc.contributor.author Anagnostopoulos, Konstantinos en
dc.date.accessioned 2022-10-31T08:31:34Z
dc.date.available 2022-10-31T08:31:34Z
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/56035
dc.identifier.uri http://dx.doi.org/10.26240/heal.ntua.23733
dc.description National Technical University of Athens--Master's Thesis. Interdisciplinary-Interdepartmental Program of Postgraduate Studies (D.P.M.S.) "Data Science and Machine Learning" el
dc.rights Attribution-NonCommercial-NoDerivs 3.0 Greece *
dc.rights.uri http://creativecommons.org/licenses/by-nc-nd/3.0/gr/ *
dc.subject Machine learning en
dc.subject Reinforcement learning en
dc.subject Hardware acceleration en
dc.subject Deep reinforcement learning en
dc.subject Neural networks en
dc.title Training & acceleration of deep reinforcement learning agents en
heal.type masterThesis
heal.classification Machine learning en
heal.classification Reinforcement learning en
heal.language en
heal.access free
heal.recordProvider ntua el
heal.publicationDate 2022-07-13
heal.abstract In recent years, machine learning applications have become increasingly popular. By leveraging the computing power that modern chips provide and the large amounts of data produced daily by people and machines, machine learning models can be trained to solve a vast spectrum of problems that classical programming approaches cannot. Training machine learning models can be based on three different learning paradigms: supervised learning, unsupervised learning and reinforcement learning. In summary, the supervised learning paradigm needs "golden" or "labeled" data in order to supervise the process of training a model, while unsupervised learning relies on techniques that discover hidden patterns in a set of unlabeled data. Reinforcement learning, on the other hand, is a totally different approach and relies on maximizing the reward an agent receives when it interacts with a specific environment. A common real-world reinforcement learning example is that of a trainer trying to train a dog to perform certain tricks. When the dog acts in the proper direction, the trainer rewards it with a biscuit. The dog (agent) tries to maximize its cumulative reward, i.e. eat as many biscuits as possible, by adapting its behavior and acting in the correct direction. Furthermore, in recent years, neural networks, a machine learning model that mimics the way biological neural networks work, have come to the fore due to the state-of-the-art results they achieve in various problems such as natural language processing tasks, image classification, and image or text generation. Scaling neural networks (adding more neurons and thus more parameters) usually leads to better results; for natural language processing tasks, for example, current state-of-the-art models contain billions of trainable parameters. Again, advances in microelectronics enable us to use such models due to the increased computing power we can utilize. In the first part of the present thesis, we will study how the reinforcement learning paradigm and neural networks can be combined in order to train intelligent agents able to interact with complex environments. We will implement four different deep reinforcement learning algorithms, Deep Q Network, REINFORCE, Asynchronous Actor Critic and Proximal Policy Optimization, and we will leverage those implementations to train intelligent agents able to interact with two environments, Cart Pole and DuckieTown. While the Cart Pole environment is considered the "hello world" environment of reinforcement learning, DuckieTown is a more complex one, with an agent trying to learn how to drive properly through the roads of a simulated, animated city. In the second part of the present thesis, we will focus on how we can deploy the trained DuckieTown agent in the real world. More specifically, we will focus on two scenarios. The first scenario concerns the acceleration of the forward pass of the neural network model in order to achieve faster inference and thus a more responsive agent, a trait that is desirable for all autonomous vehicles. The second scenario concerns the control of a swarm of agents by a central device. The central device performs batch computations, with the batch size equal to the total number of agents in the swarm; for this scenario we set a minimum of 100 FPS that must be achieved for each agent. The devices that will be used for the acceleration are the NVIDIA Xavier NX, utilizing a GPU as a hardware accelerator, and the Xilinx Zynq UltraScale+ MPSoC ZCU104, utilizing an FPGA. en
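The agent-environment loop described in the abstract (an agent adapting its behavior to maximize cumulative reward) can be illustrated with a short sketch. This is a minimal, hypothetical example using the Gymnasium API and a random policy on the Cart Pole environment mentioned above; it is not the thesis's actual implementation, only the generic interaction structure that algorithms such as Deep Q Network or REINFORCE build on.

    # Minimal sketch of the reinforcement learning interaction loop on Cart Pole.
    # Assumes the Gymnasium package; the random policy is a stand-in for a
    # trained neural-network agent.
    import gymnasium as gym

    env = gym.make("CartPole-v1")
    obs, info = env.reset(seed=0)

    cumulative_reward = 0.0
    done = False
    while not done:
        action = env.action_space.sample()  # placeholder for a learned policy
        obs, reward, terminated, truncated, info = env.step(action)
        cumulative_reward += reward         # the quantity the agent maximizes
        done = terminated or truncated

    print(f"Episode return: {cumulative_reward}")
    env.close()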
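The swarm-control scenario's 100 FPS requirement can likewise be sketched. Below is a minimal, hypothetical PyTorch timing loop: the SwarmPolicy module, its layer sizes, and the swarm size are illustrative assumptions rather than the thesis's actual DuckieTown network. The sketch only shows how the central device's batched forward pass maps to per-agent frame rate: the batch size equals the number of agents, so each pass yields one action per agent and per-agent FPS equals passes per second.

    # Hypothetical sketch: measuring per-agent FPS for batched inference,
    # where one batched forward pass serves the whole swarm at once.
    import time
    import torch
    import torch.nn as nn

    NUM_AGENTS = 16  # assumed swarm size; batch size equals the swarm size

    class SwarmPolicy(nn.Module):
        """Stand-in policy network; the real DuckieTown model differs."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(64, 128), nn.ReLU(),
                nn.Linear(128, 2),  # e.g. steering and velocity outputs
            )

        def forward(self, x):
            return self.net(x)

    model = SwarmPolicy().eval()
    batch = torch.randn(NUM_AGENTS, 64)  # one observation per agent

    with torch.no_grad():
        for _ in range(10):               # warm-up iterations
            model(batch)
        iters = 100
        start = time.perf_counter()
        for _ in range(iters):
            model(batch)                  # one pass serves all agents
        elapsed = time.perf_counter() - start

    fps_per_agent = iters / elapsed       # each agent gets one action per pass
    print(f"Per-agent FPS: {fps_per_agent:.1f} (requirement: >= 100)")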
heal.advisorName Soudris, Dimitrios el
heal.committeeMemberName Soudris, Dimitrios el
heal.committeeMemberName Theodoridis, Georgios el
heal.committeeMemberName Xydis, Sotirios el
heal.academicPublisher National Technical University of Athens. School of Electrical and Computer Engineering el
heal.academicPublisherID ntua
heal.fullTextAvailability false

