dc.contributor.author |
Riakiotakis, I |
en |
dc.contributor.author |
Ciorba, FM |
en |
dc.contributor.author |
Andronikos, T |
en |
dc.contributor.author |
Papakonstantinou, G |
en |
dc.contributor.author |
Chronopoulos, AT |
en |
dc.date.accessioned |
2014-03-01T11:47:19Z |
|
dc.date.available |
2014-03-01T11:47:19Z |
|
dc.date.issued |
2012 |
en |
dc.identifier.issn |
1532-0626 |
en |
dc.identifier.uri |
https://dspace.lib.ntua.gr/xmlui/handle/123456789/38126 |
|
dc.subject |
Communication model |
en |
dc.subject |
Dynamic load balancing |
en |
dc.subject |
Heterogeneous systems |
en |
dc.subject |
Inter-processor communication |
en |
dc.subject |
Loops with data dependencies |
en |
dc.subject |
Performance evaluation |
en |
dc.subject |
Performance prediction |
en |
dc.subject |
Pipelined computations |
en |
dc.subject |
Synchronization |
en |
dc.title |
Towards the optimal synchronization granularity for dynamic scheduling of pipelined computations on heterogeneous computing systems |
en |
heal.type |
other |
en |
heal.identifier.primary |
10.1002/cpe.2812 |
en |
heal.identifier.secondary |
http://dx.doi.org/10.1002/cpe.2812 |
en |
heal.publicationDate |
2012 |
en |
heal.abstract |
Loops are the richest source of parallelism in scientific applications. A large number of loop scheduling schemes have therefore been devised for loops with and without data dependencies (modeled as dependence distance vectors) on heterogeneous clusters. The loops with data dependencies require synchronization via cross-node communication. Synchronization requires fine-tuning to overcome the communication overhead and to yield the best possible overall performance. In this paper, a theoretical model is presented to determine the granularity of synchronization that minimizes the parallel execution time of loops with data dependencies when these are parallelized on heterogeneous systems using dynamic self-scheduling algorithms. New formulas are proposed for estimating the total number of scheduling steps when a threshold for the minimum work assigned to a processor is assumed. The proposed model uses these formulas to determine the synchronization granularity that minimizes the estimated parallel execution time. The accuracy of the proposed model is verified and validated via extensive experiments on a heterogeneous computing system. The results show that the theoretically optimal synchronization granularity, as determined by the proposed model, is very close to the experimentally observed optimal synchronization granularity, with no deviation in the best case, and within 38.4% in the worst case. © 2012 John Wiley & Sons, Ltd. |
en |
heal.journalName |
Concurrency and Computation: Practice and Experience |
en |
dc.identifier.doi |
10.1002/cpe.2812 |
en |