Optimization of distributed matrix multiplication algorithms for efficient execution on GPU-based supercomputing systems

Βρεττός, Βασίλης; Vrettos, Vasilis

dc.contributor.author	Βρεττός, Βασίλης	el
dc.contributor.author	Vrettos, Vasilis	en
dc.date.accessioned	2025-03-11T08:45:09Z
dc.date.available	2025-03-11T08:45:09Z
dc.identifier.uri	https://dspace.lib.ntua.gr/xmlui/handle/123456789/61318
dc.identifier.uri	http://dx.doi.org/10.26240/heal.ntua.29014
dc.rights	Attribution-NonCommercial 3.0 Greece	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc/3.0/gr/	*
dc.subject	Matrix-matrix multiplication	el
dc.subject	Graphics processors	el
dc.subject	Supercomputing systems	el
dc.subject	Computer clusters	el
dc.subject	Communication/computation optimization	el
dc.subject	Distributed dense matrix multiplication	en
dc.subject	Cluster Systems	en
dc.subject	Graphics Processors (GPUs)	en
dc.subject	HPC	en
dc.subject	Distributed Algorithm Optimization	en
dc.subject	Communication/Computation Overlap	en
dc.title	Optimization of distributed matrix multiplication algorithms for efficient execution on GPU-based supercomputing systems	el
dc.title	Optimization of distributed dense matrix multiplication algorithms for optimal usage in modern GPU Clusters	en
heal.type	bachelorThesis
heal.classification	Computer Engineering	en
heal.classification	Computer Engineering	el
heal.language	el
heal.access	free
heal.recordProvider	ntua	el
heal.publicationDate	2024-10-18
heal.abstract	The aim of this thesis is to study existing distributed matrix multiplication algorithms that run on supercomputers and cluster systems in general, and to implement modern versions of them that run efficiently on graphics processors. Most prior research ignores the particular characteristics of GPUs, leaving a significant amount of performance untapped. Having studied the GPU architecture, we modify the existing algorithms with an emphasis on efficient inter-device communication and minimal memory usage, with the ultimate goal of delivering an efficient matrix multiplication library that runs on modern supercomputing systems using graphics processors.	el
heal.abstract	Dense matrix multiplication is one of the most common and important mathematical kernels, executed on standalone systems and clusters alike. Following the sharp rise of GPUs in HPC driven by the recent "AI boom", the GEMM kernel is now commonly executed on graphics processors rather than on conventional CPUs. The existing distributed algorithms behind GEMM calls were designed primarily for CPU-based systems and fall short when executed on modern GPU nodes. In this thesis, we study these distributed algorithms, as well as PBLAS libraries that target GPU execution, in an attempt to optimize them for modern cluster systems. In the process, we present our own execution method, built upon existing algorithms and modified to exploit many of the special features of contemporary graphics processors in order to increase performance.	en
heal.advisorName	Goumas, Georgios	en
heal.advisorName	Γκούμας, Γεώργιος	el
heal.committeeMemberName	Πνευματικάτος, Διονύσιος	el
heal.committeeMemberName	Κοζύρης, Νεκτάριος	el
heal.committeeMemberName	Koziris, Nektarios	en
heal.committeeMemberName	Pnevmatikatos, Dionisis	en
heal.academicPublisher	National Technical University of Athens. School of Electrical and Computer Engineering. Division of Computer Science	el
heal.academicPublisherID	ntua
heal.numberOfPages	86 p.	el
heal.fullTextAvailability	false