dc.contributor.author | Drosinos, N | en |
dc.contributor.author | Koziris, N | en |
dc.date.accessioned | 2014-03-01T01:18:36Z | |
dc.date.available | 2014-03-01T01:18:36Z | |
dc.date.issued | 2003 | en |
dc.identifier.issn | 0302-9743 | en |
dc.identifier.uri | https://dspace.lib.ntua.gr/xmlui/handle/123456789/15105 | |
dc.subject | Coarse Grained | en |
dc.subject | Data Dependence | en |
dc.subject | Hybrid Model | en |
dc.subject | Message Passing | en |
dc.subject | Nested Loops | en |
dc.subject | Parallel Architecture | en |
dc.subject | Parallel Processing | en |
dc.subject | Programming Model | en |
dc.subject | SMP Cluster | en |
dc.subject.classification | Computer Science, Theory & Methods | en |
dc.title | Advanced hybrid MPI/OpenMP parallelization paradigms for nested loop algorithms onto clusters of SMPs | en |
heal.type | journalArticle | en |
heal.identifier.primary | 10.1007/978-3-540-39924-7_30 | en |
heal.identifier.secondary | http://dx.doi.org/10.1007/978-3-540-39924-7_30 | en |
heal.language | English | en |
heal.publicationDate | 2003 | en |
heal.abstract | The parallelization of nested-loop algorithms onto popular multi-level parallel architectures, such as clusters of SMPs, is not a trivial issue, since the existence of data dependencies in the algorithm imposes severe restrictions on the task decomposition to be applied. In this paper we propose three techniques for the parallelization of such algorithms, namely pure MPI parallelization, fine-grain hybrid MPI/OpenMP parallelization and coarse-grain hybrid MPI/OpenMP parallelization. We further apply an advanced hyperplane scheduling scheme that enables pipelined execution and the overlapping of communication with useful computation, thus leading to almost full CPU utilization. We implement the three variations and perform a number of micro-kernel benchmarks to verify the intuition that the hybrid programming model could potentially exploit the characteristics of an SMP cluster more efficiently than the pure message-passing programming model. We conclude that the overall performance of each model is both application and hardware dependent, and propose some directions for improving the efficiency of the hybrid model. © Springer-Verlag Berlin Heidelberg 2003. | en |
heal.publisher | SPRINGER-VERLAG BERLIN | en |
heal.journalName | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | en |
heal.bookName | LECTURE NOTES IN COMPUTER SCIENCE | en |
dc.identifier.doi | 10.1007/978-3-540-39924-7_30 | en |
dc.identifier.isi | ISI:000187497500030 | en |
dc.identifier.volume | 2840 | en |
dc.identifier.spage | 204 | en |
dc.identifier.epage | 213 | en |
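
The hyperplane scheduling named in the abstract can be illustrated with a small sketch. The code below is not taken from the paper: the recurrence, the 2D loop nest, and the function names `sequential` and `hyperplane` are assumptions chosen only for illustration, and the sketch runs on a single process rather than with MPI or OpenMP. It shows the core idea that, when each point (i, j) depends on (i-1, j) and (i, j-1), all points on a hyperplane i + j = t are mutually independent and could be computed concurrently.

```python
def sequential(n, m):
    """Reference: plain row-major sweep of the dependent loop nest."""
    a = [[1] * m for _ in range(n)]
    for i in range(1, n):
        for j in range(1, m):
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a


def hyperplane(n, m):
    """Same loop nest, visited one hyperplane i + j = t at a time."""
    a = [[1] * m for _ in range(n)]
    # Interior points run from (1, 1) to (n - 1, m - 1), so t spans 2 .. n + m - 2.
    for t in range(2, n + m - 1):
        lo = max(1, t - (m - 1))  # keep j = t - i at most m - 1
        hi = min(n - 1, t - 1)    # keep j = t - i at least 1
        # Every (i, t - i) below reads only values from hyperplane t - 1,
        # so this inner loop is the part a parallel version could distribute.
        for i in range(lo, hi + 1):
            j = t - i
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a


if __name__ == "__main__":
    print(hyperplane(4, 5) == sequential(4, 5))
```

In a cluster setting the inner loop would be split across processes or threads and boundary values exchanged between hyperplanes; how the paper does this is not described in this record, so that part is left out of the sketch.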