HEAL DSpace

Improving cache locality with blocked array layouts


dc.contributor.author Athanasaki, E en
dc.contributor.author Koziris, N en
dc.date.accessioned 2014-03-01T02:42:48Z
dc.date.available 2014-03-01T02:42:48Z
dc.date.issued 2004 en
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/31090
dc.subject Indexation en
dc.subject Matrix Multiplication en
dc.subject Memory Access en
dc.subject.other Blocked Array Layouts en
dc.subject.other Inter-variable padding en
dc.subject.other Loop skewing en
dc.subject.other Multi-level memory hierarchies en
dc.subject.other Arrays en
dc.subject.other Computer simulation en
dc.subject.other Functions en
dc.subject.other Matrix algebra en
dc.subject.other Program compilers en
dc.subject.other Program processors en
dc.subject.other Cache memory en
dc.title Improving cache locality with blocked array layouts en
heal.type conferenceItem en
heal.identifier.primary 10.1109/EMPDP.2004.1271460 en
heal.identifier.secondary http://dx.doi.org/10.1109/EMPDP.2004.1271460 en
heal.publicationDate 2004 en
heal.abstract Minimizing cache misses is one of the most important factors in reducing the average latency of memory accesses. Tiled codes modify the instruction stream to exploit cache locality for array accesses. In this paper, we further reduce cache misses by restructuring the memory layout of multidimensional arrays that are accessed by tiled instruction code. In our method, array elements are stored in a blocked way, exactly as they are swept by the tiled instruction stream. We present a straightforward way to translate multidimensional array indices into their blocked memory layout using simple binary-mask operations. Indices for such array layouts are easily calculated based on the algebra of dilated integers, similarly to Morton-order indexing. Experimental results, using matrix multiplication and LU decomposition on arrays of various sizes, illustrate that execution time is greatly improved when combining tiled code with tiled array layouts and binary-mask-based index translation functions. Simulations using the SimpleScalar tool verify that the enhanced performance is due to a considerable reduction in total cache misses. en
heal.journalName Proceedings - Euromicro Conference on Parallel, Distributed and Network-based Processing en
dc.identifier.doi 10.1109/EMPDP.2004.1271460 en
dc.identifier.spage 308 en
dc.identifier.epage 317 en
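
The abstract mentions translating two-dimensional indices into a blocked layout with binary masks, and relates this to dilated integers and Morton-order indexing. The record does not give the paper's actual masks or functions, so the following is only a minimal sketch under stated assumptions: a square array of side N = 2**n, square tiles of side T = 2**t, tiles stored in row-major order, and row-major order inside each tile. The names `blocked_index`, `dilate` and `morton_index` are hypothetical and chosen for illustration.

```python
def blocked_index(i: int, j: int, n: int, t: int) -> int:
    """Offset of element (i, j) in a blocked layout (illustrative assumption).

    Assumes an N x N array with N = 2**n, tiles of T x T with T = 2**t,
    tiles laid out row-major, and elements row-major inside each tile.
    Only shifts, masks and ORs are used; no multiplication or division.
    """
    mask = (1 << t) - 1            # low t bits select the position inside a tile
    tile_row = i >> t              # which row of tiles
    tile_col = j >> t              # which column of tiles
    return (
        (tile_row << (n + t))      # tile_row * (N / T) * T * T
        | (tile_col << (2 * t))    # tile_col * T * T
        | ((i & mask) << t)        # row inside the tile * T
        | (j & mask)               # column inside the tile
    )


def dilate(x: int) -> int:
    """Spread the low 16 bits of x so that a zero bit follows each one."""
    x &= 0xFFFF
    x = (x | (x << 8)) & 0x00FF00FF
    x = (x | (x << 4)) & 0x0F0F0F0F
    x = (x | (x << 2)) & 0x33333333
    x = (x | (x << 1)) & 0x55555555
    return x


def morton_index(i: int, j: int) -> int:
    """Morton-order offset: interleave the bits of i (odd) and j (even)."""
    return (dilate(i) << 1) | dilate(j)
```

For an 8 x 8 array with 2 x 2 tiles (n = 3, t = 1), `blocked_index` places the four elements of each tile in consecutive offsets, so a tiled loop nest that sweeps one tile touches one contiguous run of memory. Morton order is shown only for comparison, since the abstract describes its indexing as similar.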


Files in this item

There are no files associated with this item.