Advanced hybrid MPI/OpenMP parallelization paradigms for nested loop algorithms onto clusters of SMPs

Drosinos, N; Koziris, N

dc.contributor.author	Drosinos, N	en
dc.contributor.author	Koziris, N	en
dc.date.accessioned	2014-03-01T01:18:36Z
dc.date.available	2014-03-01T01:18:36Z
dc.date.issued	2003	en
dc.identifier.issn	0302-9743	en
dc.identifier.uri	https://dspace.lib.ntua.gr/xmlui/handle/123456789/15105
dc.subject	Coarse Grained	en
dc.subject	Data Dependence	en
dc.subject	Hybrid Model	en
dc.subject	Message Passing	en
dc.subject	Nested Loops	en
dc.subject	Parallel Architecture	en
dc.subject	Parallel Processing	en
dc.subject	Programming Model	en
dc.subject	SMP Cluster	en
dc.subject.classification	Computer Science, Theory & Methods	en
dc.title	Advanced hybrid MPI/OpenMP parallelization paradigms for nested loop algorithms onto clusters of SMPs	en
heal.type	journalArticle	en
heal.identifier.primary	10.1007/978-3-540-39924-7_30	en
heal.identifier.secondary	http://dx.doi.org/10.1007/978-3-540-39924-7_30	en
heal.language	English	en
heal.publicationDate	2003	en
heal.abstract	The parallelization of nested-loop algorithms onto popular multi-level parallel architectures, such as clusters of SMPs, is not a trivial issue, since the existence of data dependencies in the algorithm imposes severe restrictions on the task decomposition to be applied. In this paper we propose three techniques for the parallelization of such algorithms, namely pure MPI parallelization, fine-grain hybrid MPI/OpenMP parallelization and coarse-grain hybrid MPI/OpenMP parallelization. We further apply an advanced hyperplane scheduling scheme that enables pipelined execution and the overlapping of communication with useful computation, thus leading to almost full CPU utilization. We implement the three variations and perform a number of micro-kernel benchmarks to verify the intuition that the hybrid programming model could potentially exploit the characteristics of an SMP cluster more efficiently than the pure message-passing programming model. We conclude that the overall performance of each model is both application- and hardware-dependent, and propose some directions for improving the efficiency of the hybrid model. © Springer-Verlag Berlin Heidelberg 2003.	en
heal.publisher	SPRINGER-VERLAG BERLIN	en
heal.journalName	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)	en
heal.bookName	LECTURE NOTES IN COMPUTER SCIENCE	en
dc.identifier.doi	10.1007/978-3-540-39924-7_30	en
dc.identifier.isi	ISI:000187497500030	en
dc.identifier.volume	2840	en
dc.identifier.spage	204	en
dc.identifier.epage	213	en