Field | Value | Language
dc.contributor.author | Karakasis, V | en
dc.contributor.author | Goumas, G | en
dc.contributor.author | Koziris, N | en
dc.date.accessioned | 2014-03-01T02:46:09Z |
dc.date.available | 2014-03-01T02:46:09Z |
dc.date.issued | 2009 | en
dc.identifier.uri | https://dspace.lib.ntua.gr/xmlui/handle/123456789/32574 |
dc.subject | Experimental Evaluation | en
dc.subject | Memory Bandwidth | en
dc.subject | Sparse Matrix | en
dc.subject | Compressed Sparse Row | en
dc.subject.other | Compressed sparse row | en
dc.subject.other | Experimental evaluation | en
dc.subject.other | Memory bandwidths | en
dc.subject.other | Memory wall | en
dc.subject.other | Micro architectures | en
dc.subject.other | Multithreaded | en
dc.subject.other | Performance variations | en
dc.subject.other | Sparse kernels | en
dc.subject.other | Sparse matrix-vector multiplication | en
dc.subject.other | Storage formats | en
dc.subject.other | Vectorization | en
dc.subject.other | Shape memory effect | en
dc.subject.other | Distributed parameter networks | en
dc.title | Exploring the effect of block shapes on the performance of sparse kernels | en
heal.type | conferenceItem | en
heal.identifier.primary | 10.1109/IPDPS.2009.5161159 | en
heal.identifier.secondary | http://dx.doi.org/10.1109/IPDPS.2009.5161159 | en
heal.identifier.secondary | 5161159 | en
heal.publicationDate | 2009 | en
heal.abstract | In this paper, we explore the impact of block shape on blocked and vectorized versions of the Sparse Matrix-Vector Multiplication (SpMV) kernel, building on previous work with an extensive experimental evaluation of the most widespread blocking storage format, Block Compressed Sparse Row (BCSR), on a set of modern commodity microarchitectures. We evaluate the merit of vectorization on the memory-bound blocked SpMV kernel and report results for single- and multithreaded (both SMP and NUMA) configurations. The performance of blocked SpMV can vary significantly with the block shape, even though different block shapes have similar memory bandwidth demands. This variation is further accentuated when the kernel is vectorized. When moving to multiple cores, the memory wall problem becomes even more evident and may overwhelm any benefit from optimizations targeting the computational part of the kernel. Finally, we discuss the architectural characteristics of modern commodity architectures that are responsible for these performance variations between block shapes. © 2009 IEEE. | en
heal.journalName | IPDPS 2009 - Proceedings of the 2009 IEEE International Parallel and Distributed Processing Symposium | en
dc.identifier.doi | 10.1109/IPDPS.2009.5161159 | en