dc.contributor.author | Αναγνωστόπουλος, Κωνσταντίνος | el |
dc.contributor.author | Anagnostopoulos, Konstantinos | en |
dc.date.accessioned | 2022-10-31T08:31:34Z | |
dc.date.available | 2022-10-31T08:31:34Z | |
dc.identifier.uri | https://dspace.lib.ntua.gr/xmlui/handle/123456789/56035 | |
dc.identifier.uri | http://dx.doi.org/10.26240/heal.ntua.23733 | |
dc.description | National Technical University of Athens--Master's Thesis. Interdisciplinary-Interdepartmental Postgraduate Studies Program "Data Science and Machine Learning" | en |
dc.rights | Attribution-NonCommercial-NoDerivs 3.0 Greece | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/gr/ | * |
dc.subject | Machine learning | en |
dc.subject | Reinforcement learning | en |
dc.subject | Hardware acceleration | en |
dc.subject | Deep reinforcement learning | en |
dc.subject | Neural networks | en |
dc.title | Training & acceleration of deep reinforcement learning agents | en |
heal.type | masterThesis | |
heal.classification | Machine learning | en |
heal.classification | Reinforcement learning | en |
heal.language | en | |
heal.access | free | |
heal.recordProvider | ntua | el |
heal.publicationDate | 2022-07-13 | |
heal.abstract | In recent years, machine learning applications have become increasingly popular. By leveraging the computing power that modern chips provide and the large amounts of data produced daily by people and machines, machine learning models can be trained to solve a vast spectrum of problems that classical programming approaches cannot. Training machine learning models can be based on three different learning paradigms: supervised learning, unsupervised learning and reinforcement learning. In summary, the supervised learning paradigm requires "golden" or "labeled" data in order to supervise the process of training a model, while unsupervised learning relies on techniques that discover hidden patterns in a set of unlabeled data. Reinforcement learning, on the other hand, is a totally different approach and relies on the maximization of the reward an agent receives as it interacts with a specific environment. A common real-world reinforcement learning example is that of a trainer trying to teach a dog to perform certain tricks. When the dog acts in the right direction, the trainer rewards it with a biscuit. The dog (the agent) tries to maximize its cumulative reward, i.e. eat as many biscuits as possible, by adapting its behavior and acting in the correct direction. Furthermore, in recent years, neural networks, machine learning models that mimic the way biological neural networks work, have come to the fore due to the state-of-the-art results they achieve in various problems such as natural language processing, image classification, and image or text generation. Scaling neural networks (adding more neurons and thus more parameters) usually leads to better results; for natural language processing tasks, for example, current state-of-the-art models contain billions of trainable parameters. Again, advances in microelectronics enable us to use such models thanks to the increased computing power we can utilize. In the first part of the present thesis, we will study how the reinforcement learning paradigm and neural networks can be combined in order to train intelligent agents able to interact with complex environments. We will implement four different deep reinforcement learning algorithms, Deep Q Network, REINFORCE, Asynchronous Actor Critic & Proximal Policy Optimization, and we will leverage those implementations to train intelligent agents able to interact with two environments, CartPole and DuckieTown. While the CartPole environment is considered the "hello world" environment of reinforcement learning, DuckieTown is a more complex one, in which an agent tries to learn how to drive properly through the roads of a simulated, animated city. In the second part of the present thesis, we will focus on how the trained DuckieTown agent can be deployed in the real world. More specifically, we will focus on two scenarios. The first scenario concerns the acceleration of the forward pass of the neural network model in order to achieve faster inference times and therefore create a more responsive agent, a trait that is desirable for all autonomous vehicles. The second scenario concerns the control of a swarm of agents by a central device. In this case, the central device performs batch computations, with the batch size equal to the total number of agents in the swarm. For this scenario we set a minimum of 100 FPS that must be achieved for each agent.
The devices used for the acceleration are the NVIDIA Xavier NX, utilizing a GPU as a hardware accelerator, and the Xilinx Zynq UltraScale+ MPSoC ZCU104, utilizing an FPGA. | en |
heal.advisorName | Σούντρης, Δημήτριος | el |
heal.committeeMemberName | Σούντρης, Δημήτριος | el |
heal.committeeMemberName | Θεοδωρίδης, Γεώργιος | el |
heal.committeeMemberName | Ξύδης, Σωτήριος | el |
heal.academicPublisher | National Technical University of Athens. School of Electrical and Computer Engineering | en |
heal.academicPublisherID | ntua | |
heal.fullTextAvailability | false |