HEAL DSpace

Scheduling of tiled nested loops onto a cluster with a fixed number of SMP nodes

dc.contributor.author Athanasaki, M en
dc.contributor.author Koukis, E en
dc.contributor.author Koziris, N en
dc.date.accessioned 2014-03-01T02:42:57Z
dc.date.available 2014-03-01T02:42:57Z
dc.date.issued 2004 en
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/31152
dc.subject Nested Loops en
dc.subject Parallel Architecture en
dc.subject Time Dependent en
dc.subject.other Cluster configuration en
dc.subject.other Nested loops en
dc.subject.other Network interface cards (NIC) en
dc.subject.other Symmetric MultiProcessors (SMP) en
dc.subject.other Bandwidth en
dc.subject.other Computational methods en
dc.subject.other Computer networks en
dc.subject.other Interfaces (computer) en
dc.subject.other Network protocols en
dc.subject.other Numerical analysis en
dc.subject.other Optimization en
dc.subject.other Synchronization en
dc.subject.other Multiprocessing systems en
dc.title Scheduling of tiled nested loops onto a cluster with a fixed number of SMP nodes en
heal.type conferenceItem en
heal.identifier.primary 10.1109/EMPDP.2004.1271475 en
heal.identifier.secondary http://dx.doi.org/10.1109/EMPDP.2004.1271475 en
heal.publicationDate 2004 en
heal.abstract In this paper we propose several alternative methods for the compile-time scheduling of Tiled Nested Loops onto a fixed-size parallel architecture. We investigate the distribution of tiles among processors under one of two communication modes: a non-overlapping mode, which alternates successive computation and communication steps, or an overlapping mode, which assumes a pipelined, concurrent execution of communication and computation. To utilize the available processors as efficiently as possible, we can adopt a cyclic assignment schedule, assign neighboring tiles to the same CPU, or adapt the size and shape of the tiles so that the required number of processors exactly equals the number available. We compare the proposed schedules theoretically and experimentally, so as to design one that achieves the minimum total execution time, depending on the cluster configuration (e.g. number and type of nodes, interconnect bandwidth), the internal characteristics of the underlying architecture (e.g. NIC and DMA latencies), and the iteration space size and shape. en
heal.journalName Proceedings - Euromicro Conference on Parallel, Distributed and Network-based Processing en
dc.identifier.doi 10.1109/EMPDP.2004.1271475 en
dc.identifier.spage 424 en
dc.identifier.epage 433 en