dc.contributor.author | Anastasiadis, Petros | |
dc.date.accessioned | 2025-01-17T07:44:12Z | |
dc.date.available | 2025-01-17T07:44:12Z | |
dc.identifier.uri | https://dspace.lib.ntua.gr/xmlui/handle/123456789/60809 | |
dc.identifier.uri | http://dx.doi.org/10.26240/heal.ntua.28505 | |
dc.rights | Attribution 3.0 Greece | * |
dc.rights.uri | http://creativecommons.org/licenses/by/3.0/gr/ | * |
dc.subject | Linear algebra | en |
dc.subject | Graphics processing units (GPUs) | en |
dc.subject | BLAS routines | en |
dc.subject | Modeling | en |
dc.subject | Autotuning | en |
dc.subject | Γραμμική άλγεβρα | el |
dc.subject | Επεξεργαστές γραφικών | el |
dc.subject | Ρουτίνες BLAS | el |
dc.subject | Μοντελοποίηση | el |
dc.subject | Αυτόματη βελτιστοποίηση | el |
dc.title | Model-assisted optimization of Linear Algebra routines on multi-GPU computing systems | en |
dc.contributor.department | Computing Systems Laboratory | el |
heal.type | doctoralThesis | |
heal.classification | Performance engineering | en |
heal.classification | Computer engineering | en |
heal.classification | Software engineering | en |
heal.language | en | |
heal.access | free | |
heal.recordProvider | ntua | el |
heal.publicationDate | 2024-09-09 | |
heal.abstract | Dense linear algebra operations appear frequently in high-performance computing (HPC) applications, making their performance crucial to achieving optimal scalability. As many modern HPC clusters contain multi-GPU nodes, BLAS operations are frequently offloaded to GPUs, which requires optimized libraries to ensure good performance. However, optimizing BLAS for multi-GPU systems introduces numerous challenges similar to those of distributed computing, such as data decomposition, task scheduling, and communication across GPUs with distinct memory spaces. This complexity makes multi-GPU BLAS optimization difficult, leading to sub-optimal performance or to system-specific solutions with reduced portability. To address these issues, we propose a model-based autotuning approach: we introduce several performance models for BLAS and integrate them into PARALiA, an end-to-end BLAS library. PARALiA uses model-driven insights to autotune BLAS execution dynamically, tailoring performance-critical parameters to each specific problem and system at runtime. This autotuning is coupled with an optimized task scheduler, leading to near-optimal data distribution and performance-aware resource utilization. PARALiA provides state-of-the-art performance and energy efficiency and can adapt to heterogeneous systems and scenarios via model-based decisions. Finally, we focus on the GEMM kernel, extending PARALiA with a custom static scheduler that integrates model-driven algorithmic, communication, and autotuning optimizations (PARALiA-GEMMex), which delivers significantly higher performance than the state of the art. | en |
heal.advisorName | Goumas, Georgios | |
heal.committeeMemberName | Goumas, Georgios | |
heal.committeeMemberName | Papadopoulou, Nikela | |
heal.committeeMemberName | Koziris, Nectarios | |
heal.committeeMemberName | Pnevmatikatos, Dionisios | |
heal.committeeMemberName | Papaspyrou, Nikolaos | |
heal.committeeMemberName | Xydis, Sotirios | |
heal.committeeMemberName | Antonopoulos, Christos | |
heal.academicPublisher | School of Electrical and Computer Engineering | el |
heal.academicPublisherID | ntua | |
heal.numberOfPages | 183 | |
heal.fullTextAvailability | false |
The following licenses are associated with this item: