HEAL DSpace

FPGA Design and Analysis of a RISC-V Out-Of-Order GPU

dc.contributor.author Ζέρβα, Μαρία el
dc.contributor.author Zerva, Maria en
dc.date.accessioned 2025-12-01T09:39:08Z
dc.date.available 2025-12-01T09:39:08Z
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/62956
dc.identifier.uri http://dx.doi.org/10.26240/heal.ntua.30652
dc.rights Default License
dc.subject High Performance Computing en
dc.subject GPU Micro-Architecture en
dc.subject Out-Of-Order Execution en
dc.subject RISC-V en
dc.subject RTL Design en
dc.subject Υψηλής Επίδοσης Υπολογισμός el
dc.subject GPU Μικροαρχιτεκτονική el
dc.subject Εκτέλεση Εκτός Σειράς el
dc.subject Σχεδίαση RTL el
dc.title FPGA Design and Analysis of a RISC-V Out-Of-Order GPU en
heal.type bachelorThesis
heal.classification Computer Engineering en
heal.language en
heal.access free
heal.recordProvider ntua el
heal.publicationDate 2025-03-17
heal.abstract Owing to their high computational performance and cost efficiency, GPUs have become the leading platform for accelerating general-purpose workloads. Nonetheless, a subset of these workloads still shows stagnating performance. The previously proposed Light-weight Out-Of-Order GPU (LOOG) execution scheme addresses this issue by augmenting conventional Thread-Level Parallelism with the exploitation of inherent Instruction-Level Parallelism. Although LOOG has been modeled with GPU simulation tools in previous studies, those implementations suffered from limited accuracy in power consumption and critical path estimates, as well as slow application execution. To overcome these limitations, this thesis proposes integrating LOOG into an RTL GPU framework, specifically Vortex GPU version 2.0, an open-source design well suited to deployment on FPGA platforms. To preserve LOOG’s performance gain in Vortex’s RISC-V-based pipeline, the extension is designed to complement the existing micro-architecture and the operations it supports. Furthermore, design optimizations and trade-offs are investigated to enhance performance while constraining the overall Area and Power overhead. Prior to the experimental evaluation, 21 Vortex workloads are characterized in detail by their stalling behavior, enabling the right-sizing of the micro-architecture across the broad design space that Vortex’s configurability supports. The results demonstrate an average speedup of up to approximately 23.5%, while maintaining lower Area-Delay and Power-Delay products than the in-order Vortex in various configurations. en
heal.advisorName Ξύδης, Σωτήριος el
heal.committeeMemberName Ξύδης, Σωτήριος el
heal.committeeMemberName Σούντρης, Δημήτριος el
heal.committeeMemberName Πεκμεστζή, Κιαμάλ el
heal.academicPublisher Εθνικό Μετσόβιο Πολυτεχνείο. Σχολή Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών. Τομέας Τεχνολογίας Πληροφορικής και Υπολογιστών. Εργαστήριο Μικροϋπολογιστών και Ψηφιακών Συστημάτων VLSI el
heal.academicPublisherID ntua
heal.numberOfPages 117 σ. el
heal.fullTextAvailability false

