HEAL DSpace

Training & acceleration of deep reinforcement learning agents


Show simple item record

dc.contributor.author Αναγνωστόπουλος, Κωνσταντίνος el
dc.contributor.author Anagnostopoulos, Konstantinos en
dc.date.accessioned 2022-10-31T08:31:34Z
dc.date.available 2022-10-31T08:31:34Z
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/56035
dc.identifier.uri http://dx.doi.org/10.26240/heal.ntua.23733
dc.description National Technical University of Athens--Master's Thesis. Interdisciplinary-Interdepartmental Program of Postgraduate Studies (D.P.M.S.) "Data Science and Machine Learning" el
dc.rights Attribution-NonCommercial-NoDerivs 3.0 Greece *
dc.rights.uri http://creativecommons.org/licenses/by-nc-nd/3.0/gr/ *
dc.subject Machine learning en
dc.subject Reinforcement learning en
dc.subject Hardware acceleration en
dc.subject Deep reinforcement learning en
dc.subject Neural networks en
dc.title Training & acceleration of deep reinforcement learning agents en
heal.type masterThesis
heal.classification Machine learning en
heal.classification Reinforcement learning en
heal.language en
heal.access free
heal.recordProvider ntua el
heal.publicationDate 2022-07-13
heal.abstract In recent years, machine learning applications have become increasingly popular. By leveraging the computing power that modern chips provide and the large amounts of data produced daily by people and machines, machine learning models can be trained to solve a vast spectrum of problems that classical programming approaches cannot. Training machine learning models can be based on three different learning paradigms: supervised learning, unsupervised learning and reinforcement learning. In summary, the supervised learning paradigm needs "golden" or "labeled" data in order to supervise the process of training a model, while unsupervised learning relies on techniques that discover hidden patterns in a set of unlabeled data. Reinforcement learning, on the other hand, is a totally different approach and relies on maximizing the reward an agent receives when it interacts with a specific environment. A common real-world reinforcement learning example is that of a trainer trying to train a dog to perform certain tricks. When the dog acts in the proper direction, the trainer rewards it with a biscuit. The dog (agent) tries to maximize its cumulative reward, i.e. eat as many biscuits as possible, by adapting its behavior and acting in the correct direction. Furthermore, in recent years, neural networks, a machine learning model that mimics the way biological neural networks work, have come to the fore due to the state-of-the-art results they achieve in various problems such as natural language processing tasks, image classification, and image or text generation. Scaling neural networks (adding more neurons and thus more parameters) usually leads to better results; for natural language processing tasks, for example, current state-of-the-art models contain billions of trainable parameters. Again, advances in microelectronics enable us to use such models due to the increased computing power we can utilize. In the first part of the present thesis, we will study how the reinforcement learning paradigm and neural networks can be combined in order to train intelligent agents able to interact with complex environments. We will implement four different deep reinforcement learning algorithms, Deep Q Network, REINFORCE, Asynchronous Actor Critic and Proximal Policy Optimization, and we will leverage those implementations to train intelligent agents able to interact with two environments, Cart Pole and DuckieTown. While the Cart Pole environment is considered the "hello world" environment of reinforcement learning, DuckieTown is a more complex one, with an agent trying to learn how to drive properly through the roads of a simulated, animated city. In the second part of the present thesis, we will focus on how we can deploy the trained DuckieTown agent in the real world. More specifically, we will focus on two scenarios. The first scenario concerns the acceleration of the forward pass of the neural network model in order to achieve faster inference and thus a more responsive agent, a trait that is desirable for all autonomous vehicles. The second scenario concerns the control of a swarm of agents by a central device. The central device performs batch computations, with the batch size equal to the total number of agents in the swarm; for this scenario we set a minimum of 100 FPS that must be achieved for each agent. The devices that will be used for the acceleration are the NVIDIA Xavier NX, utilizing a GPU as a hardware accelerator, and the Xilinx Zynq UltraScale+ MPSoC ZCU104, utilizing an FPGA. en
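The agent-environment loop described in the abstract (an agent adapting its behavior to maximize cumulative reward) can be illustrated with a short sketch. This is a minimal, hypothetical example using the Gymnasium API and a random policy on the Cart Pole environment mentioned above; it is not the thesis's actual implementation, only the generic interaction structure that algorithms such as Deep Q Network or REINFORCE build on.

    # Minimal sketch of the reinforcement learning interaction loop on Cart Pole.
    # Assumes the Gymnasium package; the random policy is a stand-in for a
    # trained neural-network agent.
    import gymnasium as gym

    env = gym.make("CartPole-v1")
    obs, info = env.reset(seed=0)

    cumulative_reward = 0.0
    done = False
    while not done:
        action = env.action_space.sample()  # placeholder for a learned policy
        obs, reward, terminated, truncated, info = env.step(action)
        cumulative_reward += reward         # the quantity the agent maximizes
        done = terminated or truncated

    print(f"Episode return: {cumulative_reward}")
    env.close()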
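The swarm-control scenario's 100 FPS requirement can likewise be sketched. Below is a minimal, hypothetical PyTorch timing loop: the SwarmPolicy module, its layer sizes, and the swarm size are illustrative assumptions rather than the thesis's actual DuckieTown network. The sketch only shows how the central device's batched forward pass maps to per-agent frame rate: the batch size equals the number of agents, so each pass yields one action per agent and per-agent FPS equals passes per second.

    # Hypothetical sketch: measuring per-agent FPS for batched inference,
    # where one batched forward pass serves the whole swarm at once.
    import time
    import torch
    import torch.nn as nn

    NUM_AGENTS = 16  # assumed swarm size; batch size equals the swarm size

    class SwarmPolicy(nn.Module):
        """Stand-in policy network; the real DuckieTown model differs."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(64, 128), nn.ReLU(),
                nn.Linear(128, 2),  # e.g. steering and velocity outputs
            )

        def forward(self, x):
            return self.net(x)

    model = SwarmPolicy().eval()
    batch = torch.randn(NUM_AGENTS, 64)  # one observation per agent

    with torch.no_grad():
        for _ in range(10):               # warm-up iterations
            model(batch)
        iters = 100
        start = time.perf_counter()
        for _ in range(iters):
            model(batch)                  # one pass serves all agents
        elapsed = time.perf_counter() - start

    fps_per_agent = iters / elapsed       # each agent gets one action per pass
    print(f"Per-agent FPS: {fps_per_agent:.1f} (requirement: >= 100)")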
heal.advisorName Soudris, Dimitrios el
heal.committeeMemberName Soudris, Dimitrios el
heal.committeeMemberName Theodoridis, Georgios el
heal.committeeMemberName Xydis, Sotirios el
heal.academicPublisher National Technical University of Athens. School of Electrical and Computer Engineering el
heal.academicPublisherID ntua
heal.fullTextAvailability false

