The effect of process topology and load balancing on parallel programming models for SMP clusters and iterative algorithms

Drosinos, N; Koziris, N

dc.contributor.author	Drosinos, N	en
dc.contributor.author	Koziris, N	en
dc.date.accessioned	2014-03-01T01:25:15Z
dc.date.available	2014-03-01T01:25:15Z
dc.date.issued	2006	en
dc.identifier.issn	0920-8542	en
dc.identifier.uri	https://dspace.lib.ntua.gr/xmlui/handle/123456789/17617
dc.subject	High performance computing	en
dc.subject	Hybrid programming	en
dc.subject	Iterative algorithms	en
dc.subject	MPI	en
dc.subject	OpenMP	en
dc.subject	Parallel programming	en
dc.subject	SMP clusters	en
dc.subject	Tiling	en
dc.subject.classification	Computer Science, Hardware & Architecture	en
dc.subject.classification	Computer Science, Theory & Methods	en
dc.subject.classification	Engineering, Electrical & Electronic	en
dc.subject.other	Algorithms	en
dc.subject.other	Iterative methods	en
dc.subject.other	Optimization	en
dc.subject.other	Parallel processing systems	en
dc.subject.other	Process control	en
dc.subject.other	Topology	en
dc.subject.other	High performance computing	en
dc.subject.other	Hybrid programming	en
dc.subject.other	Iterative algorithms	en
dc.subject.other	MPI	en
dc.subject.other	OpenMP	en
dc.subject.other	Parallel programming	en
dc.subject.other	SMP clusters	en
dc.subject.other	Tiling	en
dc.subject.other	Computer systems programming	en
dc.title	The effect of process topology and load balancing on parallel programming models for SMP clusters and iterative algorithms	en
heal.type	journalArticle	en
heal.identifier.primary	10.1007/s11227-006-1156-z	en
heal.identifier.secondary	http://dx.doi.org/10.1007/s11227-006-1156-z	en
heal.language	English	en
heal.publicationDate	2006	en
heal.abstract	This article focuses on the effect of both process topology and load balancing on various programming models for SMP clusters and iterative algorithms. More specifically, we consider nested loop algorithms with constant flow dependencies that can be parallelized on SMP clusters with the aid of the tiling transformation. We investigate three parallel programming models: a popular monolithic message passing implementation and two hybrid ones that employ both message passing and multi-threading. We conclude that the selection of an appropriate mapping topology for the mesh of processes has a significant effect on overall performance, and we provide an algorithm for specifying such an efficient topology according to the iteration space and data dependencies of the algorithm. We also propose static load balancing techniques for distributing computation between threads. These techniques mitigate the disadvantage of the master thread having to handle all inter-process communication, a limitation often imposed by the message passing library. Both improvements are implemented as compile-time optimizations and are evaluated experimentally. We further provide an overall comparison of the above parallel programming styles on SMP clusters, based on micro-kernel experimental evaluation. © 2006 Springer Science + Business Media, Inc.	en
heal.publisher	SPRINGER	en
heal.journalName	Journal of Supercomputing	en
dc.identifier.doi	10.1007/s11227-006-1156-z	en
dc.identifier.isi	ISI:000232531800004	en
dc.identifier.volume	35	en
dc.identifier.issue	1	en
dc.identifier.spage	65	en
dc.identifier.epage	91	en