Performance comparison of pure MPI vs hybrid MPI-OpenMP parallelization models on SMP clusters

Drosinos, N; Koziris, N

dc.contributor.author	Drosinos, N	en
dc.contributor.author	Koziris, N	en
dc.date.accessioned	2014-03-01T02:42:54Z
dc.date.available	2014-03-01T02:42:54Z
dc.date.issued	2004	en
dc.identifier.uri	https://dspace.lib.ntua.gr/xmlui/handle/123456789/31137
dc.subject	Experimental Evaluation	en
dc.subject	Hybrid Model	en
dc.subject	Message Passing	en
dc.subject	Nested Loops	en
dc.subject	Parallel Models	en
dc.subject	Performance Comparison	en
dc.subject	Shared Memory	en
dc.subject	SMP Cluster	en
dc.subject.other	Bandwidth	en
dc.subject.other	Codes (symbols)	en
dc.subject.other	Computer architecture	en
dc.subject.other	Data communication systems	en
dc.subject.other	Mathematical models	en
dc.subject.other	Parallel algorithms	en
dc.subject.other	Synchronization	en
dc.subject.other	Bandwidth measurements	en
dc.subject.other	Flow data dependencies	en
dc.subject.other	Intra-node messages	en
dc.subject.other	Parallelization	en
dc.subject.other	Parallel processing systems	en
dc.title	Performance comparison of pure MPI vs hybrid MPI-OpenMP parallelization models on SMP clusters	en
heal.type	conferenceItem	en
heal.identifier.primary	10.1109/IPDPS.2004.1302919	en
heal.identifier.secondary	http://dx.doi.org/10.1109/IPDPS.2004.1302919	en
heal.publicationDate	2004	en
heal.abstract	This paper compares the performance of three programming paradigms for the parallelization of nested loop algorithms onto SMP clusters. More specifically, we propose three alternative models for tiled nested loop algorithms, namely a pure message passing paradigm and two hybrid ones that implement communication through both message passing and shared memory access. The hybrid models adopt an advanced hyperplane scheduling scheme that allows both for minimal thread synchronization and for pipelined execution with overlapping of computation and communication phases. We focus on the experimental evaluation of all three models, and test their performance against several iteration spaces and parallelization grains with the aid of a typical micro-kernel benchmark. We conclude that the hybrid models can in some cases be more beneficial than the monolithic pure message passing model, as they better exploit the configuration characteristics of a hierarchical parallel platform, such as an SMP cluster.	en
heal.journalName	Proceedings - International Parallel and Distributed Processing Symposium, IPDPS 2004 (Abstracts and CD-ROM)	en
dc.identifier.doi	10.1109/IPDPS.2004.1302919	en
dc.identifier.volume	18	en
dc.identifier.spage	193	en
dc.identifier.epage	202	en