| dc.contributor.author |
Ζέρβα, Μαρία
|
el |
| dc.contributor.author |
Zerva, Maria
|
en |
| dc.date.accessioned |
2025-12-01T09:39:08Z |
|
| dc.date.available |
2025-12-01T09:39:08Z |
|
| dc.identifier.uri |
https://dspace.lib.ntua.gr/xmlui/handle/123456789/62956 |
|
| dc.identifier.uri |
http://dx.doi.org/10.26240/heal.ntua.30652 |
|
| dc.rights |
Default License |
|
| dc.subject |
High Performance Computing |
en |
| dc.subject |
GPU Micro-Architecture |
en |
| dc.subject |
Out-Of-Order Execution |
en |
| dc.subject |
RISC-V |
en |
| dc.subject |
RTL Design |
en |
| dc.subject |
Υψηλής Επίδοσης Υπολογισμός |
el |
| dc.subject |
GPU Μικροαρχιτεκτονική |
el |
| dc.subject |
Εκτέλεση Εκτός Σειράς |
el |
| dc.subject |
Σχεδίαση RTL |
el |
| dc.title |
FPGA Design and Analysis of a RISC-V Out-Of-Order GPU |
en |
| heal.type |
bachelorThesis |
|
| heal.classification |
Computer Engineering |
en |
| heal.language |
en |
|
| heal.access |
free |
|
| heal.recordProvider |
ntua |
el |
| heal.publicationDate |
2025-03-17 |
|
| heal.abstract |
Owing to their exceptional computational performance and cost efficiency, GPUs have solidified their status as the premier platform for accelerating general-purpose workloads. Nonetheless, a subset of these workloads continues to exhibit performance stagnation. The previously proposed Light-weight Out-Of-Order GPU (LOOG) execution scheme addresses this issue by augmenting conventional Thread-Level Parallelism with the exploitation of inherent Instruction-Level Parallelism. Although LOOG has been modeled with GPU simulation tools in previous studies, those models have suffered from limited accuracy in power-consumption and critical-path estimates, as well as slow application execution.
To overcome these limitations, this thesis proposes integrating LOOG into an RTL GPU framework, specifically Vortex GPU version 2.0, an open-source design that is well suited for deployment on FPGA platforms. To preserve LOOG’s performance gain in Vortex’s RISC-V-based pipeline, the extension is designed to complement the existing micro-architecture and the operations it supports. Furthermore, design optimizations and trade-offs are investigated to enhance performance while constraining the overall area and power overhead.
A detailed characterization of 21 Vortex workloads based on their stalling behavior is performed prior to the experimental evaluation, enabling the right-sizing of the micro-architecture across the broad design space supported by Vortex’s configurability. The results demonstrate an average speedup of up to approximately 23.5%, while maintaining lower Area-Delay and Power-Delay products than the in-order Vortex in various configurations. |
en |
| heal.advisorName |
Ξύδης, Σωτήριος |
el |
| heal.committeeMemberName |
Ξύδης, Σωτήριος |
el |
| heal.committeeMemberName |
Σούντρης, Δημήτριος |
el |
| heal.committeeMemberName |
Πεκμεστζή, Κιαμάλ |
el |
| heal.academicPublisher |
Εθνικό Μετσόβιο Πολυτεχνείο. Σχολή Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών. Τομέας Τεχνολογίας Πληροφορικής και Υπολογιστών. Εργαστήριο Μικροϋπολογιστών και Ψηφιακών Συστημάτων VLSI |
el |
| heal.academicPublisherID |
ntua |
|
| heal.numberOfPages |
117 σ. |
el |
| heal.fullTextAvailability |
false |
|