Tuning blocked array layouts to exploit memory hierarchy in SMT architectures

dc.contributor.author Athanasaki, E en
dc.contributor.author Kourtis, K en
dc.contributor.author Anastopoulos, N en
dc.contributor.author Koziris, N en
dc.date.accessioned 2014-03-01T02:43:42Z
dc.date.available 2014-03-01T02:43:42Z
dc.date.issued 2005 en
dc.identifier.issn 0302-9743 en
dc.identifier.uri http://hdl.handle.net/123456789/31494
dc.subject Cache Performance en
dc.subject Data Layout en
dc.subject Data Transformation en
dc.subject Loop Tiling en
dc.subject Memory Access en
dc.subject Memory Hierarchy en
dc.subject Memory Performance en
dc.subject Performance Improvement en
dc.subject Program Transformation en
dc.subject Scientific Computing en
dc.subject Theoretical Analysis en
dc.subject.classification Computer Science, Theory & Methods en
dc.subject.other Blocked data en
dc.subject.other Loop tiling en
dc.subject.other Memory-intensive applications en
dc.subject.other Program transformations en
dc.subject.other Buffer storage en
dc.subject.other Computer applications en
dc.subject.other Mathematical transformations en
dc.subject.other Natural sciences computing en
dc.subject.other Optimization en
dc.subject.other Integrated circuit layout en
dc.title Tuning blocked array layouts to exploit memory hierarchy in SMT architectures en
heal.type conferenceItem en
heal.identifier.primary 10.1007/11573036_57 en
heal.identifier.secondary http://dx.doi.org/10.1007/11573036_57 en
heal.language English en
heal.publicationDate 2005 en
heal.abstract Cache misses form a major bottleneck for memory-intensive applications, due to the significant latency of main memory accesses. Loop tiling, in conjunction with other program transformations, has been shown to be an effective approach to improving locality and cache exploitation, especially for dense matrix scientific computations. Beyond loop nest optimizations, data transformation techniques, and in particular blocked data layouts, have been used to boost cache performance. The stability of the performance improvements achieved depends heavily on the appropriate selection of tile sizes. In this paper, we investigate the memory performance of blocked data layouts and provide a theoretical analysis for multiple levels of the memory hierarchy when they are organized in a set-associative fashion. According to this analysis, the optimal tile size that maximizes L1 cache utilization should fit completely in the L1 cache, even for loop bodies that access more than one array. Increased self- and/or cross-interference misses can be tolerated through prefetching. Such larger tiles also reduce branch mispredictions and, as a result, the CPU cycles lost to them. Results are validated through actual benchmarks on an SMT platform. © Springer-Verlag Berlin Heidelberg 2005. en
heal.publisher SPRINGER-VERLAG BERLIN en
heal.journalName Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) en
dc.identifier.doi 10.1007/11573036_57 en
dc.identifier.isi ISI:000233675500057 en
dc.identifier.volume 3746 LNCS en
dc.identifier.spage 600 en
dc.identifier.epage 610 en
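The abstract describes blocked data layouts and concludes that a tile should fit entirely in the L1 cache. The following is a minimal sketch under stated assumptions, not the authors' code: the hypothetical helpers blocked_offset, to_blocked and l1_tile_size store an n x n matrix tile by tile (each t x t tile contiguous in row-major order, tiles themselves in row-major order) and pick t as the largest power of two whose t x t tile fits in a given L1 capacity. The power-of-two restriction, the row-major tile ordering and the cache sizes used below are illustrative assumptions; the paper's actual layout and tile-size derivation may differ.

```python
def blocked_offset(i: int, j: int, n: int, t: int) -> int:
    """Linear offset of element (i, j) of an n x n matrix in a tile-wise layout.

    Assumption: t x t tiles are stored contiguously, tiles in row-major order,
    elements inside each tile in row-major order, and n % t == 0.
    """
    tiles_per_row = n // t
    tile_index = (i // t) * tiles_per_row + (j // t)
    return tile_index * t * t + (i % t) * t + (j % t)


def to_blocked(matrix: list[list[float]], t: int) -> list[float]:
    """Copy a row-major n x n matrix into the tile-wise layout above."""
    n = len(matrix)
    if n % t:
        raise ValueError("this sketch assumes n is a multiple of t")
    out = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            out[blocked_offset(i, j, n, t)] = matrix[i][j]
    return out


def l1_tile_size(l1_bytes: int, elem_bytes: int) -> int:
    """Largest power-of-two t such that one t x t tile fits in the L1 cache.

    The whole L1 capacity is given to a single tile rather than being split
    among the arrays a loop body touches (one reading of the abstract).
    """
    t = 1
    while (2 * t) * (2 * t) * elem_bytes <= l1_bytes:
        t *= 2
    return t


if __name__ == "__main__":
    # Hypothetical 16 KiB L1 data cache holding 8-byte doubles.
    print(l1_tile_size(16 * 1024, 8))
    n = 4
    m = [[float(i * n + j) for j in range(n)] for i in range(n)]
    print(to_blocked(m, 2))
```

With 8-byte elements, l1_tile_size(16 * 1024, 8) returns 32 and l1_tile_size(32 * 1024, 8) returns 64. Because each tile occupies one contiguous address range, a line-aligned tile no larger than the cache maps onto distinct cache lines under simple modulo set indexing, which is one reason blocked layouts reduce self-interference.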
