dc.contributor.author |
Athanasaki, E |
en |
dc.contributor.author |
Kourtis, K |
en |
dc.contributor.author |
Anastopoulos, N |
en |
dc.contributor.author |
Koziris, N |
en |
dc.date.accessioned |
2014-03-01T02:43:42Z |
|
dc.date.available |
2014-03-01T02:43:42Z |
|
dc.date.issued |
2005 |
en |
dc.identifier.issn |
0302-9743 |
en |
dc.identifier.uri |
https://dspace.lib.ntua.gr/xmlui/handle/123456789/31494 |
|
dc.subject |
Cache Performance |
en |
dc.subject |
Data Layout |
en |
dc.subject |
Data Transformation |
en |
dc.subject |
Loop Tiling |
en |
dc.subject |
Memory Access |
en |
dc.subject |
Memory Hierarchy |
en |
dc.subject |
Memory Performance |
en |
dc.subject |
Performance Improvement |
en |
dc.subject |
Program Transformation |
en |
dc.subject |
Scientific Computing |
en |
dc.subject |
Theoretical Analysis |
en |
dc.subject.classification |
Computer Science, Theory & Methods |
en |
dc.subject.other |
Blocked data |
en |
dc.subject.other |
Loop tiling |
en |
dc.subject.other |
Memory-intensive applications |
en |
dc.subject.other |
Program transformations |
en |
dc.subject.other |
Buffer storage |
en |
dc.subject.other |
Computer applications |
en |
dc.subject.other |
Mathematical transformations |
en |
dc.subject.other |
Natural sciences computing |
en |
dc.subject.other |
Optimization |
en |
dc.subject.other |
Integrated circuit layout |
en |
dc.title |
Tuning blocked array layouts to exploit memory hierarchy in SMT architectures |
en |
heal.type |
conferenceItem |
en |
heal.identifier.primary |
10.1007/11573036_57 |
en |
heal.identifier.secondary |
http://dx.doi.org/10.1007/11573036_57 |
en |
heal.language |
English |
en |
heal.publicationDate |
2005 |
en |
heal.abstract |
Cache misses form a major bottleneck for memory-intensive applications, due to the significant latency of main memory accesses. Loop tiling, in conjunction with other program transformations, has been shown to be an effective approach to improving locality and cache exploitation, especially for dense matrix scientific computations. Beyond loop nest optimizations, data transformation techniques, and in particular blocked data layouts, have been used to boost cache performance. The stability of the performance improvements achieved is heavily dependent on the appropriate selection of tile sizes. In this paper, we investigate the memory performance of blocked data layouts and provide a theoretical analysis for the multiple levels of the memory hierarchy when they are organized in a set-associative fashion. According to this analysis, the optimal tile, the one that maximizes L1 cache utilization, should fit completely in the L1 cache, even for loop bodies that access more than one array. Increased self- and/or cross-interference misses can be tolerated through prefetching. Such larger tiles also reduce mispredicted branches and, as a result, the CPU cycles lost to them. Results are validated through actual benchmarks on an SMT platform. © Springer-Verlag Berlin Heidelberg 2005. |
en |
heal.publisher |
SPRINGER-VERLAG BERLIN |
en |
heal.journalName |
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
en |
heal.bookName |
LECTURE NOTES IN COMPUTER SCIENCE |
en |
dc.identifier.doi |
10.1007/11573036_57 |
en |
dc.identifier.isi |
ISI:000233675500057 |
en |
dc.identifier.volume |
3746 LNCS |
en |
dc.identifier.spage |
600 |
en |
dc.identifier.epage |
610 |
en |