dc.contributor.author | Kourtis, K | en |
dc.contributor.author | Goumas, G | en |
dc.contributor.author | Koziris, N | en |
dc.date.accessioned | 2014-03-01T02:45:30Z | |
dc.date.available | 2014-03-01T02:45:30Z | |
dc.date.issued | 2008 | en |
dc.identifier.issn | 01903918 | en |
dc.identifier.uri | https://dspace.lib.ntua.gr/xmlui/handle/123456789/32283 | |
dc.subject | Coarse Grained | en |
dc.subject | Memory Bandwidth | en |
dc.subject | Shared Memory | en |
dc.subject | Sparse Matrix | en |
dc.subject | Structured Data | en |
dc.subject.other | Coarse grains | en |
dc.subject.other | Compression schemes | en |
dc.subject.other | Delta encoding | en |
dc.subject.other | Memory bandwidth requirements | en |
dc.subject.other | Multithreaded | en |
dc.subject.other | Numerical values | en |
dc.subject.other | Shared memories | en |
dc.subject.other | Sparse matrixes | en |
dc.subject.other | Structural datums | en |
dc.subject.other | Vector multiplications | en |
dc.subject.other | Data compression | en |
dc.subject.other | Encoding (symbols) | en |
dc.subject.other | Online searching | en |
dc.subject.other | Vectors | en |
dc.subject.other | Data storage equipment | en |
dc.title | Improving the performance of multithreaded sparse matrix-vector multiplication using index and value compression | en |
heal.type | conferenceItem | en |
heal.identifier.primary | 10.1109/ICPP.2008.62 | en |
heal.identifier.secondary | http://dx.doi.org/10.1109/ICPP.2008.62 | en |
heal.identifier.secondary | 4625888 | en |
heal.publicationDate | 2008 | en |
heal.abstract | The Sparse Matrix-Vector Multiplication kernel exhibits limited potential for taking advantage of modern shared memory architectures due to its large memory bandwidth requirements. To decrease memory contention and improve the performance of the kernel we propose two compression schemes. The first, called CSR-DU, targets the reduction of the matrix structural data by applying coarse grain delta encoding for the column indices. The second scheme, called CSR-VI, targets the reduction of the numerical values using indirect indexing and can only be applied to matrices which contain a small number of unique values. Evaluation of both methods on a rich matrix set showed that they can significantly improve the performance of the multithreaded version of the kernel and achieve good scalability for large matrices. © 2008 IEEE. | en |
heal.journalName | Proceedings of the International Conference on Parallel Processing | en |
dc.identifier.doi | 10.1109/ICPP.2008.62 | en |
dc.identifier.spage | 511 | en |
dc.identifier.epage | 519 | en |