Fast indexing for blocked array layouts to improve multi-level cache locality

Athanasaki, E; Koziris, N

dc.contributor.author	Athanasaki, E	en
dc.contributor.author	Koziris, N	en
dc.date.accessioned	2014-03-01T02:42:47Z
dc.date.available	2014-03-01T02:42:47Z
dc.date.issued	2004	en
dc.identifier.uri	https://dspace.lib.ntua.gr/xmlui/handle/123456789/31075
dc.subject	Cycle Time	en
dc.subject	Indexation	en
dc.subject	Memory Access	en
dc.subject	Memory Hierarchy	en
dc.subject	Program Transformation	en
dc.subject	Multi Dimensional	en
dc.subject.other	Control transformations	en
dc.subject.other	Iterative instruction streams	en
dc.subject.other	Loop fusion	en
dc.subject.other	Multi-level cache locality	en
dc.subject.other	Algorithms	en
dc.subject.other	Binary codes	en
dc.subject.other	Cache memory	en
dc.subject.other	Computer simulation	en
dc.subject.other	Computer software	en
dc.subject.other	Hierarchical systems	en
dc.subject.other	Optimization	en
dc.subject.other	Pipeline processing systems	en
dc.subject.other	Program compilers	en
dc.subject.other	Computer architecture	en
dc.title	Fast indexing for blocked array layouts to improve multi-level cache locality	en
heal.type	conferenceItem	en
heal.identifier.primary	10.1109/INTERA.2004.1299515	en
heal.identifier.secondary	http://dx.doi.org/10.1109/INTERA.2004.1299515	en
heal.publicationDate	2004	en
heal.abstract	One of the key challenges facing computer architects and compiler writers is the increasing discrepancy between processor cycle times and main memory access times. To overcome this problem, program transformations that decrease cache misses are used to reduce the average latency of memory accesses. Tiling is a widely used loop iteration reordering technique for improving locality of reference. In this paper, we further reduce cache misses by restructuring the memory layout of multi-dimensional arrays that are accessed by tiled instruction code. In our method, array elements are stored in a blocked way, exactly as they are swept by the tiled instruction stream. We present a straightforward way to translate multi-dimensional array indices into their blocked memory layout using simple binary-mask operations. Indices for such array layouts are calculated using the algebra of dilated integers, similarly to Morton-order indexing. Experimental results on three different hardware platforms, using five benchmarks, illustrate that execution time is greatly improved when tiled code is combined with tiled array layouts and binary mask-based index translation functions. Both TLB and L1 cache misses are minimized concurrently for the same tile size; thus, applying the proposed layouts greatly improves locality of reference. Finally, simulations using the SimpleScalar tool verify that the enhanced performance is due to the considerable reduction of cache misses at all levels of the memory hierarchy.	en
heal.journalName	Proceedings - Eighth Workshop on Interaction between Compilers and Computer Architectures, INTERACT-8 2004	en
dc.identifier.doi	10.1109/INTERA.2004.1299515	en
dc.identifier.spage	109	en
dc.identifier.epage	119	en
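
The abstract mentions translating multi-dimensional indices into a blocked layout with binary-mask operations, and relates this to dilated integers and Morton-order indexing. The sketch below illustrates those general ideas only; it is not taken from the paper. All function names (`dilate`, `dilated_inc`, `morton_index`, `blocked_index`) are hypothetical, and the sketch assumes a square array whose side and tile size are powers of two, with each tile stored row-major.

```python
EVEN_MASK = 0x55555555  # bit positions 0, 2, 4, ... (where a dilated value lives)
ODD_MASK = 0xAAAAAAAA   # bit positions 1, 3, 5, ...


def dilate(x: int) -> int:
    """Spread the low 16 bits of x so that bit k moves to bit 2k."""
    x &= 0xFFFF
    x = (x | (x << 8)) & 0x00FF00FF
    x = (x | (x << 4)) & 0x0F0F0F0F
    x = (x | (x << 2)) & 0x33333333
    x = (x | (x << 1)) & 0x55555555
    return x


def dilated_inc(d: int) -> int:
    """Add 1 to a dilated integer directly, without undilating it.

    Filling the odd (gap) bits with ones lets the carry ripple across
    the gaps; the final mask clears the gaps again.
    """
    return ((d | ODD_MASK) + 1) & EVEN_MASK


def morton_index(i: int, j: int) -> int:
    """Morton (Z-order) offset: interleave the bits of row i and column j."""
    return (dilate(i) << 1) | dilate(j)


def blocked_index(i: int, j: int, n: int, t: int) -> int:
    """Offset of (i, j) in an n x n array stored tile by tile.

    Assumes n and t are powers of two, tiles are ordered row-major,
    and elements inside each t x t tile are also row-major.
    """
    log_t = t.bit_length() - 1       # shift amount for a power-of-two tile
    in_tile = t - 1                  # mask selecting the within-tile bits
    tiles_per_row = n >> log_t
    tile = (i >> log_t) * tiles_per_row + (j >> log_t)
    return (tile << (2 * log_t)) | ((i & in_tile) << log_t) | (j & in_tile)
```

In a loop nest, a dilated loop counter can be advanced with `dilated_inc` instead of calling `dilate` on every iteration, so the per-access cost stays at a handful of mask and add operations.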