Towards the optimal synchronization granularity for dynamic scheduling of pipelined computations on heterogeneous computing systems

Riakiotakis, I; Ciorba, FM; Andronikos, T; Papakonstantinou, G; Chronopoulos, AT

dc.contributor.author	Riakiotakis, I	en
dc.contributor.author	Ciorba, FM	en
dc.contributor.author	Andronikos, T	en
dc.contributor.author	Papakonstantinou, G	en
dc.contributor.author	Chronopoulos, AT	en
dc.date.accessioned	2014-03-01T11:47:19Z
dc.date.available	2014-03-01T11:47:19Z
dc.date.issued	2012	en
dc.identifier.issn	1532-0626	en
dc.identifier.uri	https://dspace.lib.ntua.gr/xmlui/handle/123456789/38126
dc.subject	Communication model	en
dc.subject	Dynamic load balancing	en
dc.subject	Heterogeneous systems	en
dc.subject	Inter-processor communication	en
dc.subject	Loops with data dependencies	en
dc.subject	Performance evaluation	en
dc.subject	Performance prediction	en
dc.subject	Pipelined computations	en
dc.subject	Synchronization	en
dc.title	Towards the optimal synchronization granularity for dynamic scheduling of pipelined computations on heterogeneous computing systems	en
heal.type	other	en
heal.identifier.primary	10.1002/cpe.2812	en
heal.identifier.secondary	http://dx.doi.org/10.1002/cpe.2812	en
heal.publicationDate	2012	en
heal.abstract	Loops are the richest source of parallelism in scientific applications. A large number of loop scheduling schemes have therefore been devised for loops with and without data dependencies (modeled as dependence distance vectors) on heterogeneous clusters. The loops with data dependencies require synchronization via cross-node communication. Synchronization requires fine-tuning to overcome the communication overhead and to yield the best possible overall performance. In this paper, a theoretical model is presented to determine the granularity of synchronization that minimizes the parallel execution time of loops with data dependencies when these are parallelized on heterogeneous systems using dynamic self-scheduling algorithms. New formulas are proposed for estimating the total number of scheduling steps when a threshold for the minimum work assigned to a processor is assumed. The proposed model uses these formulas to determine the synchronization granularity that minimizes the estimated parallel execution time. The accuracy of the proposed model is verified and validated via extensive experiments on a heterogeneous computing system. The results show that the theoretically optimal synchronization granularity, as determined by the proposed model, is very close to the experimentally observed optimal synchronization granularity, with no deviation in the best case, and within 38.4% in the worst case. © 2012 John Wiley & Sons, Ltd.	en
heal.journalName	Concurrency and Computation: Practice and Experience	en
dc.identifier.doi	10.1002/cpe.2812	en