| dc.contributor.author | Anastasiadis, Petros | |
| dc.date.accessioned | 2025-01-17T07:44:12Z | |
| dc.date.available | 2025-01-17T07:44:12Z | |
| dc.identifier.uri | https://dspace.lib.ntua.gr/xmlui/handle/123456789/60809 | |
| dc.identifier.uri | http://dx.doi.org/10.26240/heal.ntua.28505 | |
| dc.rights | Attribution 3.0 Greece | * |
| dc.rights.uri | http://creativecommons.org/licenses/by/3.0/gr/ | * |
| dc.subject | Linear algebra | en |
| dc.subject | Graphics processing units (GPUs) | en |
| dc.subject | BLAS routines | en |
| dc.subject | Modeling | en |
| dc.subject | Autotuning | en |
| dc.subject | Γραμμική άλγεβρα | el |
| dc.subject | Επεξεργαστές γραφικών | el |
| dc.subject | Ρουτίνες BLAS | el |
| dc.subject | Μοντελοποίηση | el |
| dc.subject | Αυτόματη βελτιστοποίηση | el |
| dc.title | Model-assisted optimization of Linear Algebra routines on multi-GPU computing systems | en |
| dc.contributor.department | Computing Systems Laboratory | el |
| heal.type | doctoralThesis | |
| heal.classification | Performance engineering | en |
| heal.classification | Computer engineering | en |
| heal.classification | Software engineering | en |
| heal.language | en | |
| heal.access | free | |
| heal.recordProvider | ntua | el |
| heal.publicationDate | 2024-09-09 | |
| heal.abstract | Dense linear algebra operations appear frequently in high-performance computing (HPC) applications, making their performance crucial to achieving optimal scalability. As many modern HPC clusters contain multi-GPU nodes, BLAS operations are frequently offloaded to GPUs, which requires optimized libraries to ensure good performance. However, optimizing BLAS for multi-GPU systems introduces numerous challenges similar to those of distributed computing, such as data decomposition, task scheduling, and communication across GPUs with distinct memory spaces. This complexity makes multi-GPU BLAS optimization difficult, leading to sub-optimal performance or to system-specific solutions with reduced portability. To address these issues, we propose a model-based autotuning approach: we introduce several performance models for BLAS and integrate them into PARALiA, an end-to-end BLAS library. PARALiA uses model-driven insights to autotune BLAS execution dynamically, tailoring performance-critical parameters to each specific problem and system at runtime. This autotuning is coupled with an optimized task scheduler, leading to near-optimal data distribution and performance-aware resource utilization. PARALiA provides state-of-the-art performance and energy efficiency and can adapt to heterogeneous systems and scenarios via model-based decisions. Finally, we focus on the GEMM kernel, extending PARALiA with a custom static scheduler that integrates model-driven algorithmic, communication, and autotuning optimizations (PARALiA-GEMMex) and delivers significantly better performance than the state of the art. | en |
| heal.advisorName | Goumas, Georgios | |
| heal.committeeMemberName | Goumas, Georgios | |
| heal.committeeMemberName | Papadopoulou, Nikela | |
| heal.committeeMemberName | Koziris, Nectarios | |
| heal.committeeMemberName | Pnevmatikatos, Dionisios | |
| heal.committeeMemberName | Papaspyrou, Nikolaos | |
| heal.committeeMemberName | Xydis, Sotirios | |
| heal.committeeMemberName | Antonopoulos, Christos | |
| heal.academicPublisher | School of Electrical and Computer Engineering | en |
| heal.academicPublisherID | ntua | |
| heal.numberOfPages | 183 | |
| heal.fullTextAvailability | false | |
The following licenses are associated with this item: