HEAL DSpace

A pipelined schedule to minimize completion time for loop tiling with computation and communication overlapping

dc.contributor.author Koziris, N en
dc.contributor.author Sotiropoulos, A en
dc.contributor.author Goumas, G en
dc.date.accessioned 2014-03-01T01:18:33Z
dc.date.available 2014-03-01T01:18:33Z
dc.date.issued 2003 en
dc.identifier.issn 0743-7315 en
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/15081
dc.subject Loop Tiling en
dc.subject.classification Computer Science, Theory & Methods en
dc.subject.other UNIFORM DEPENDENCIES en
dc.subject.other ALGORITHMS en
dc.subject.other SPACES en
dc.title A pipelined schedule to minimize completion time for loop tiling with computation and communication overlapping en
heal.type journalArticle en
heal.identifier.primary 10.1016/S0743-7315(03)00102-3 en
heal.identifier.secondary http://dx.doi.org/10.1016/S0743-7315(03)00102-3 en
heal.language English en
heal.publicationDate 2003 en
heal.abstract This paper proposes a new method for minimizing the execution time of nested for-loops using a tiling transformation. Our approach considers not only the tile size and shape dictated by the required communication-to-computation ratio, but also the overall completion time. We select a time hyperplane that executes tiles more efficiently by exploiting the inherent overlap between the communication and computation phases of successive, atomic tile executions. We assign tiles to processors according to the tile space boundaries, thus taking the iteration space bounds into account. Our schedule considerably reduces overall completion time under the assumption that some part of every communication phase can be efficiently overlapped with atomic, pure tile computations. The overall schedule resembles a pipelined datapath in which computations are no longer interleaved with sends and receives to nonlocal processors. We survey the application of our schedule to modern communication architectures. We conducted two sets of experiments, one using MPI primitives over FastEthernet and one using the SISCI API over an SCI network. In both cases, the total completion time is significantly reduced. (C) 2003 Elsevier Inc. All rights reserved. en
heal.publisher ACADEMIC PRESS INC ELSEVIER SCIENCE en
heal.journalName Journal of Parallel and Distributed Computing en
dc.identifier.doi 10.1016/S0743-7315(03)00102-3 en
dc.identifier.isi ISI:000186551500009 en
dc.identifier.volume 63 en
dc.identifier.issue 11 en
dc.identifier.spage 1138 en
dc.identifier.epage 1151 en


Files in this item

There are no files associated with this item.
