HEAL DSpace

Exploring the effect of block shapes on the performance of sparse kernels

DSpace/Manakin Repository

dc.contributor.author Karakasis, V en
dc.contributor.author Goumas, G en
dc.contributor.author Koziris, N en
dc.date.accessioned 2014-03-01T02:46:09Z
dc.date.available 2014-03-01T02:46:09Z
dc.date.issued 2009 en
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/32574
dc.subject Experimental Evaluation en
dc.subject Memory Bandwidth en
dc.subject Sparse Matrix en
dc.subject Compressed Sparse Row en
dc.subject.other Compressed sparse row en
dc.subject.other Experimental evaluation en
dc.subject.other Memory bandwidths en
dc.subject.other Memory wall en
dc.subject.other Micro architectures en
dc.subject.other Multithreaded en
dc.subject.other Performance variations en
dc.subject.other Sparse kernels en
dc.subject.other Sparse matrix-vector multiplication en
dc.subject.other Storage formats en
dc.subject.other Vectorization en
dc.subject.other Shape memory effect en
dc.subject.other Distributed parameter networks en
dc.title Exploring the effect of block shapes on the performance of sparse kernels en
heal.type conferenceItem en
heal.identifier.primary 10.1109/IPDPS.2009.5161159 en
heal.identifier.secondary http://dx.doi.org/10.1109/IPDPS.2009.5161159 en
heal.identifier.secondary 5161159 en
heal.publicationDate 2009 en
heal.abstract In this paper we explore the impact of the block shape on blocked and vectorized versions of the Sparse Matrix-Vector Multiplication (SpMV) kernel and build upon previous work by performing an extensive experimental evaluation of the most widespread blocking storage format, namely Block Compressed Sparse Row (BCSR) format, on a set of modern commodity microarchitectures. We evaluate the merit of vectorization on the memory-bound blocked SpMV kernel and report the results for single- and multithreaded (both SMP and NUMA) configurations. The performance of blocked SpMV can significantly vary with the block shape, despite similar memory bandwidth demands for different blocks. This is further accentuated when vectorizing the kernel. When moving to multiple cores, the memory wall problem becomes even more evident and may overwhelm any benefit from optimizations targeting the computational part of the kernel. In this paper we explore and discuss the architectural characteristics of modern commodity architectures that are responsible for these performance variations between block shapes. © 2009 IEEE. en
heal.journalName IPDPS 2009 - Proceedings of the 2009 IEEE International Parallel and Distributed Processing Symposium en
dc.identifier.doi 10.1109/IPDPS.2009.5161159 en
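
Illustrative note: the abstract refers to the Block Compressed Sparse Row (BCSR) format and to the influence of the block shape on SpMV. The following is a minimal sketch of that idea, not the authors' implementation. It assumes fixed r x c blocks aligned to a grid, zero padding inside partially filled blocks, and a dense list-of-lists input; the function names dense_to_bcsr and bcsr_spmv and the array names brow_ptr, bcol_ind and bval are hypothetical and chosen only for this example.

```python
def dense_to_bcsr(dense, r, c):
    """Convert a dense list-of-lists matrix to BCSR with aligned r x c blocks.

    Returns (brow_ptr, bcol_ind, bval). Blocks that contain at least one
    nonzero are stored whole, so missing entries become explicit zeros.
    """
    n_rows = len(dense)
    n_cols = len(dense[0]) if dense else 0
    n_brows = (n_rows + r - 1) // r   # number of block rows (rounded up)
    n_bcols = (n_cols + c - 1) // c   # number of block columns (rounded up)
    brow_ptr, bcol_ind, bval = [0], [], []
    for bi in range(n_brows):
        for bj in range(n_bcols):
            block = []
            for i in range(bi * r, bi * r + r):
                for j in range(bj * c, bj * c + c):
                    inside = i < n_rows and j < n_cols
                    block.append(dense[i][j] if inside else 0.0)
            if any(v != 0 for v in block):
                bcol_ind.append(bj)   # block-column index of this block
                bval.extend(block)    # r*c values, row-major within the block
        brow_ptr.append(len(bcol_ind))
    return brow_ptr, bcol_ind, bval


def bcsr_spmv(brow_ptr, bcol_ind, bval, r, c, x, n_rows):
    """Compute y = A * x for a matrix A stored in BCSR with r x c blocks."""
    y = [0.0] * n_rows
    for bi in range(len(brow_ptr) - 1):
        for k in range(brow_ptr[bi], brow_ptr[bi + 1]):
            base = k * r * c          # offset of block k inside bval
            col0 = bcol_ind[k] * c    # first matrix column covered by block k
            for ii in range(r):
                row = bi * r + ii
                if row >= n_rows:     # padded rows beyond the matrix edge
                    break
                acc = 0.0
                for jj in range(c):
                    col = col0 + jj
                    if col < len(x):  # padded columns beyond the matrix edge
                        acc += bval[base + ii * c + jj] * x[col]
                y[row] += acc
    return y


# Usage: the same matrix stored with two different block shapes.
A = [[1.0, 2.0, 0.0, 0.0],
     [0.0, 3.0, 0.0, 0.0],
     [0.0, 0.0, 4.0, 0.0],
     [5.0, 0.0, 0.0, 6.0]]
x = [1.0, 2.0, 3.0, 4.0]
for shape in [(2, 2), (1, 2)]:
    ptr, ind, val = dense_to_bcsr(A, *shape)
    y = bcsr_spmv(ptr, ind, val, shape[0], shape[1], x, len(A))
```

In this sketch every block shape yields the same product, but the number of stored values (including padding zeros) and the inner loop bounds change with r and c. A tuned kernel would typically unroll or vectorize the two innermost loops for each fixed block shape, which is where the shape-dependent behaviour discussed in the abstract would arise.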


Files in this item

There are no files associated with this item.
