HEAL DSpace

A tile size selection analysis for blocked array layouts

DSpace/Manakin Repository

dc.contributor.author Athanasaki, E en
dc.contributor.author Koziris, N en
dc.contributor.author Tsanakas, P en
dc.date.accessioned 2014-03-01T02:43:05Z
dc.date.available 2014-03-01T02:43:05Z
dc.date.issued 2005 en
dc.identifier.issn 15506207 en
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/31223
dc.subject Blocked array layouts en
dc.subject Cache miss analysis en
dc.subject Tile selection en
dc.subject.other Blocked array layouts en
dc.subject.other Cache miss analysis en
dc.subject.other Memory speed en
dc.subject.other Tile selection en
dc.subject.other Cache memory en
dc.subject.other Computational methods en
dc.subject.other Data reduction en
dc.subject.other Data transfer en
dc.subject.other Hierarchical systems en
dc.subject.other Optimization en
dc.subject.other Program processors en
dc.subject.other Data storage equipment en
dc.title A tile size selection analysis for blocked array layouts en
heal.type conferenceItem en
heal.identifier.primary 10.1109/INTERACT.2005.1 en
heal.identifier.secondary http://dx.doi.org/10.1109/INTERACT.2005.1 en
heal.identifier.secondary 1423141 en
heal.publicationDate 2005 en
heal.abstract Efficient use of the memory hierarchy is essential for good performance due to the ever-increasing gap between processor and memory speed. Program transformations such as loop tiling have been shown to be an effective approach to improving locality and cache exploitation, especially for dense matrix scientific computations. In conjunction with tiling, several experimental studies have been conducted on blocked data layouts as a data transformation technique used to boost cache performance. The stability of the achieved performance improvements is heavily dependent on the appropriate selection of tile sizes, taking into account the actual layout of the arrays in memory. In this paper, we first provide a theoretical analysis of the cache and TLB performance of blocked data layouts. According to this analysis, the optimal tile size that maximizes L1 cache utilization should completely fit in the L1 cache, to avoid any interference misses. We prove that when applying optimization techniques such as register assignment, array alignment, prefetching and loop unrolling, tile sizes equal to L1 capacity offer better cache utilization, even for loop bodies that access more than just one array. Increased self- and/or cross-interference misses are now tolerated through prefetching. Such larger tiles also reduce lost CPU cycles due to fewer mispredicted branches. Results are validated through simulations and actual benchmarks on various modern platforms. en
heal.journalName Proceedings - Annual Workshop on Interaction between Compilers and Computer Architectures, INTERACT en
dc.identifier.doi 10.1109/INTERACT.2005.1 en
dc.identifier.volume 2005 en
dc.identifier.spage 70 en
dc.identifier.epage 81 en
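
The abstract refers to blocked array layouts combined with loop tiling, and to tiles sized to the L1 cache capacity. The following is a minimal Python sketch of those two ideas, not the authors' implementation. The function names, the row-major ordering of tiles and of elements within a tile, the power-of-two tile side, and the 32 KiB cache size in the usage note are all assumptions made for illustration; the record does not give the paper's actual layout or selection formula.

```python
# Illustrative sketch only. All names and the sizing rule below are
# assumptions for this example, not taken from the paper.

def blocked_index(i, j, n, t):
    """Offset of element (i, j) of an n x n array stored in a blocked layout.

    Assumed layout: t x t tiles are stored contiguously, tiles are ordered
    row-major, and elements inside a tile are ordered row-major.
    Assumes t divides n.
    """
    tiles_per_row = n // t
    tile_row, in_row = divmod(i, t)
    tile_col, in_col = divmod(j, t)
    tile_id = tile_row * tiles_per_row + tile_col
    return tile_id * t * t + in_row * t + in_col


def tile_side_for_l1(l1_bytes, elem_bytes):
    """Largest power-of-two side t such that one t x t tile fits in l1_bytes."""
    t = 1
    while (2 * t) * (2 * t) * elem_bytes <= l1_bytes:
        t *= 2
    return t


def to_blocked(rows, n, t):
    """Copy a list-of-lists n x n matrix into a flat blocked-layout list."""
    flat = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            flat[blocked_index(i, j, n, t)] = rows[i][j]
    return flat


def tiled_matmul(a, b, n, t):
    """C = A * B over blocked-layout operands, with all three loops tiled by t."""
    c = [0.0] * (n * n)
    for ii in range(0, n, t):
        for jj in range(0, n, t):
            for kk in range(0, n, t):
                # Work on one tile of C using one tile of A and one of B.
                for i in range(ii, ii + t):
                    for k in range(kk, kk + t):
                        a_ik = a[blocked_index(i, k, n, t)]
                        for j in range(jj, jj + t):
                            c[blocked_index(i, j, n, t)] += (
                                a_ik * b[blocked_index(k, j, n, t)]
                            )
    return c
```

For example, under the assumed 32 KiB L1 data cache and 8-byte elements, `tile_side_for_l1(32 * 1024, 8)` selects a 64 x 64 tile, which occupies exactly the cache capacity. Because each tile is contiguous in the blocked layout, the innermost loops touch one dense region per operand instead of strided rows.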


Files in this item

There are no files associated with this item.
