HEAL DSpace

Model-assisted optimization of Linear Algebra routines on multi-GPU computing systems

dc.contributor.author Anastasiadis, Petros
dc.date.accessioned 2025-01-17T07:44:12Z
dc.date.available 2025-01-17T07:44:12Z
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/60809
dc.identifier.uri http://dx.doi.org/10.26240/heal.ntua.28505
dc.rights Attribution 3.0 Greece *
dc.rights.uri http://creativecommons.org/licenses/by/3.0/gr/ *
dc.subject Linear algebra en
dc.subject Graphics processing units (GPUs) en
dc.subject BLAS routines en
dc.subject Modeling en
dc.subject Autotuning en
dc.subject Γραμμική άλγεβρα el
dc.subject Επεξεργαστές γραφικών el
dc.subject Ρουτίνες BLAS el
dc.subject Μοντελοποίηση el
dc.subject Αυτόματη βελτιστοποίηση el
dc.title Model-assisted optimization of Linear Algebra routines on multi-GPU computing systems en
dc.contributor.department Computing Systems Laboratory el
heal.type doctoralThesis
heal.classification Performance engineering en
heal.classification Computer engineering en
heal.classification Software engineering en
heal.language en
heal.access free
heal.recordProvider ntua el
heal.publicationDate 2024-09-09
heal.abstract Dense linear algebra operations appear frequently in high-performance computing (HPC) applications, making their performance crucial for scalability. Because many modern HPC clusters contain multi-GPU nodes, BLAS operations are frequently offloaded to GPUs, which calls for optimized libraries. However, optimizing BLAS for multi-GPU systems raises challenges similar to those of distributed computing, such as data decomposition, task scheduling, and communication across GPUs with distinct memory spaces. This complexity often leads to sub-optimal performance or to system-specific solutions with reduced portability. To address these issues, we propose a model-based autotuning approach: we introduce several performance models for BLAS and integrate them into PARALiA, an end-to-end BLAS library. PARALiA uses model-driven insights to autotune BLAS execution dynamically, tailoring performance-critical parameters to each specific problem and system at runtime. This autotuning is coupled with an optimized task scheduler, leading to near-optimal data distribution and performance-aware resource utilization. PARALiA provides state-of-the-art performance and energy efficiency and can adapt to heterogeneous systems and scenarios via model-based decisions. Finally, we focus on the GEMM kernel, extending PARALiA with a custom static scheduler that integrates model-driven algorithmic, communication, and autotuning optimizations (PARALiA-GEMMex), which delivers significantly better performance than the state of the art. en
heal.advisorName Goumas, Georgios
heal.committeeMemberName Goumas, Georgios
heal.committeeMemberName Papadopoulou, Nikela
heal.committeeMemberName Koziris, Nectarios
heal.committeeMemberName Pnevmatikatos, Dionisios
heal.committeeMemberName Papaspyrou, Nikolaos
heal.committeeMemberName Xydis, Sotirios
heal.committeeMemberName Antonopoulos, Christos
heal.academicPublisher Σχολή Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών el
heal.academicPublisherID ntua
heal.numberOfPages 183
heal.fullTextAvailability false



Attribution 3.0 Greece. Except where otherwise noted, this item's license is described as Attribution 3.0 Greece.