dc.contributor.author | Eleftherakis, Panagiotis-Eleftherios | en |
dc.contributor.author | Ελευθεράκης, Παναγιώτης-Ελευθέριος | el |
dc.date.accessioned | 2023-05-24T06:42:11Z | |
dc.date.available | 2023-05-24T06:42:11Z | |
dc.identifier.uri | https://dspace.lib.ntua.gr/xmlui/handle/123456789/57747 | |
dc.identifier.uri | http://dx.doi.org/10.26240/heal.ntua.25444 | |
dc.rights | Attribution-NonCommercial-NoDerivs 3.0 Greece | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/gr/ | * |
dc.subject | General-Purpose GPU | en |
dc.subject | GPU γενικού σκοπού | el |
dc.subject | High Performance Computing | en |
dc.subject | Reconfigurable computing | en |
dc.subject | Instruction Level Parallelism | en |
dc.subject | Out-of-Order | en |
dc.subject | Parallel Systems | en |
dc.subject | Energy Efficient computing | en |
dc.subject | Modelling and Simulation | en |
dc.subject | Υψηλή υπολογιστική επίδοση | el |
dc.subject | Αναδιαμορφώσιμες αρχιτεκτονικές | el |
dc.subject | Παραλληλία επιπέδου εντολής | el |
dc.subject | Εκτός σειράς αρχιτεκτονική | el |
dc.subject | Παράλληλα συστήματα | el |
dc.subject | Ενεργειακή απόδοση υπολογισμού | el |
dc.subject | Μοντελοποίηση και προσομοίωση | el |
dc.title | A novel reconfigurable out-of-order GPU microarchitecture with runtime workload characterization | en |
heal.type | bachelorThesis | |
heal.classification | Computer engineering | en |
heal.language | el | |
heal.language | en | |
heal.access | free | |
heal.recordProvider | ntua | el |
heal.publicationDate | 2023-03-21 | |
heal.abstract | Since the breakdown of Moore’s law, gains in processor performance have been driven by massively parallel processing and hardware specialization. The end of Dennard scaling and the advent of the "Dark Silicon Era" make energy-efficient computing a necessity. In this context, heterogeneous architectures and reconfigurable computing have emerged as flexible approaches to achieving both performance and energy efficiency. Meanwhile, the previously proposed Light-weight Out-of-Order GPU (LOOG) execution scheme addresses the performance stagnation seen in a class of general-purpose GPU workloads by complementing the GPU's traditional reliance on thread-level parallelism (TLP) and fast context switching with exploitation of the instruction-level parallelism (ILP) inherent in these workloads. Because LOOG forms the backbone of this thesis, we implement it in the most recent version of Accel-Sim, a GPU simulation framework that models recent high-end NVIDIA GPU architectures and is built around the performance model of GPGPU-Sim 4.1.0, a cycle-level GPU performance simulator. Having adapted LOOG to an HPC-relevant platform (NVIDIA Quadro GV100, based on the Volta microarchitecture) by right-sizing its structures, implementing a dynamic Instruction Buffer reconfiguration mechanism, and optimally configuring GPU pipeline front-end components, we collect detailed architectural bottleneck statistics across 7 benchmark suites and 100 CUDA kernels. The resulting application characterization and the study of workload characteristics that predict speedup on LOOG, paired with a scalability analysis of LOOG components from an architectural standpoint, motivate the assessment of a Scalable, Reconfigurable Out-of-Order GPU Microarchitecture that appropriately handles both LOOG-sensitive kernels and generic kernels, to maximize performance or energy efficiency.
The reconfigurable microarchitecture is evaluated under different reconfiguration schemes and granularities, including a hardware reconfiguration controller that operates at per-kernel-launch granularity and uses runtime performance counters to predict an application's out-of-order (OOO) performance improvement. A static scale-up LOOG configuration provides a speedup of 1.48 for generic kernels and a 13.7% reduction in energy dissipation compared to the baseline architecture. Reconfiguration under programmer-assisted directives and reconfiguration using the hardware controller can provide the same speedup when needed and have the potential to improve energy efficiency over the baseline (in-order microarchitecture) by 22.4% and 19.5%, respectively. | en |
heal.advisorName | Soudris, Dimitrios | en |
heal.committeeMemberName | Soudris, Dimitrios | en |
heal.committeeMemberName | Xydis, Sotirios | en |
heal.committeeMemberName | Tsanakas, Panagiotis | en |
heal.academicPublisher | Εθνικό Μετσόβιο Πολυτεχνείο. Σχολή Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών. Τομέας Τεχνολογίας Πληροφορικής και Υπολογιστών. Εργαστήριο Μικροϋπολογιστών και Ψηφιακών Συστημάτων VLSI | el |
heal.academicPublisherID | ntua | |
heal.numberOfPages | 204 σ. | el |
heal.fullTextAvailability | false | |