FPGA Design and Analysis of a RISC-V Out-Of-Order GPU

Ζέρβα, Μαρία; Zerva, Maria

dc.contributor.author	Ζέρβα, Μαρία	el
dc.contributor.author	Zerva, Maria	en
dc.date.accessioned	2025-12-01T09:39:08Z
dc.date.available	2025-12-01T09:39:08Z
dc.identifier.uri	https://dspace.lib.ntua.gr/xmlui/handle/123456789/62956
dc.identifier.uri	http://dx.doi.org/10.26240/heal.ntua.30652
dc.rights	Default License
dc.subject	High Performance Computing	en
dc.subject	GPU Micro-Architecture	en
dc.subject	Out-Of-Order Execution	en
dc.subject	RISC-V	en
dc.subject	RTL Design	en
dc.subject	Υψηλής Επίδοσης Υπολογισμός	el
dc.subject	GPU Μικροαρχιτεκτονική	el
dc.subject	Εκτέλεση Εκτός Σειράς	el
dc.subject	Σχεδίαση RTL	el
dc.title	FPGA Design and Analysis of a RISC-V Out-Of-Order GPU	en
heal.type	bachelorThesis
heal.classification	Computer Engineering	en
heal.language	en
heal.access	free
heal.recordProvider	ntua	el
heal.publicationDate	2025-03-17
heal.abstract	Owing to their exceptional computational performance and cost efficiency, GPUs have become the leading platform for accelerating general-purpose workloads. Nonetheless, a subset of these workloads continues to exhibit performance stagnation. The previously proposed Light-weight Out-Of-Order GPU (LOOG) execution scheme addresses this issue by augmenting conventional Thread-Level Parallelism with the exploitation of inherent Instruction-Level Parallelism. Although LOOG has been modeled with GPU simulation tools in previous studies, those implementations suffered from limited accuracy in power consumption and critical path estimation, as well as slow application execution. To overcome these limitations, this thesis proposes integrating LOOG into an RTL GPU framework, specifically Vortex GPU version 2.0, an open-source design well suited for deployment on FPGA platforms. To preserve LOOG’s performance gain in Vortex’s RISC-V–based pipeline, the extension is designed to complement the existing micro-architecture and the operations it supports. Furthermore, design optimizations and trade-offs are investigated to enhance performance while constraining the overall Area and Power overhead. Prior to the experimental evaluation, 21 Vortex workloads are characterized by their stalling behavior, enabling the right-sizing of the micro-architecture across the broad design space supported by Vortex’s configurability. The results demonstrate an average speedup of up to approximately 23.5%, while maintaining lower Area-Delay and Power-Delay products than the in-order Vortex in various configurations.	en
heal.advisorName	Ξύδης, Σωτήριος	el
heal.committeeMemberName	Ξύδης, Σωτήριος	el
heal.committeeMemberName	Σούντρης, Δημήτριος	el
heal.committeeMemberName	Πεκμεστζή, Κιαμάλ	el
heal.academicPublisher	Εθνικό Μετσόβιο Πολυτεχνείο. Σχολή Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών. Τομέας Τεχνολογίας Πληροφορικής και Υπολογιστών. Εργαστήριο Μικροϋπολογιστών και Ψηφιακών Συστημάτων VLSI	el
heal.academicPublisherID	ntua
heal.numberOfPages	117 σ.	el
heal.fullTextAvailability	false