dc.contributor.author |
Rokos, G |
en |
dc.contributor.author |
Peteinatos, G |
en |
dc.contributor.author |
Kouveli, G |
en |
dc.contributor.author |
Goumas, G |
en |
dc.contributor.author |
Kourtis, K |
en |
dc.contributor.author |
Koziris, N |
en |
dc.date.accessioned |
2014-03-01T02:46:59Z |
|
dc.date.available |
2014-03-01T02:46:59Z |
|
dc.date.issued |
2010 |
en |
dc.identifier.uri |
https://dspace.lib.ntua.gr/xmlui/handle/123456789/32979 |
|
dc.subject |
Advection equation |
en |
dc.subject |
Cell Broadband Engine |
en |
dc.subject |
Explicit memory hierarchy |
en |
dc.subject |
Instruction scheduling |
en |
dc.subject |
Parallelization |
en |
dc.subject |
Vectorization |
en |
dc.subject.other |
Advection equations |
en |
dc.subject.other |
Cell Broadband Engine |
en |
dc.subject.other |
Instruction scheduling |
en |
dc.subject.other |
Memory hierarchy |
en |
dc.subject.other |
Parallelizations |
en |
dc.subject.other |
Vectorization |
en |
dc.subject.other |
Aircraft engines |
en |
dc.subject.other |
Distributed parameter networks |
en |
dc.subject.other |
Parallel architectures |
en |
dc.subject.other |
Partial differential equations |
en |
dc.subject.other |
Program compilers |
en |
dc.subject.other |
Advection |
en |
dc.title |
Solving the advection PDE on the Cell Broadband Engine |
en |
heal.type |
conferenceItem |
en |
heal.identifier.primary |
10.1109/IPDPSW.2010.5470761 |
en |
heal.identifier.secondary |
http://dx.doi.org/10.1109/IPDPSW.2010.5470761 |
en |
heal.identifier.secondary |
5470761 |
en |
heal.publicationDate |
2010 |
en |
heal.abstract |
In this paper we present the venture of porting two different algorithms for solving the two-dimensional advection PDE on the CBE platform, an in-place and an out-of-place one, and compare their computational performance, completion time and code productivity. Study of the advection equation reveals data dependencies which lead to limited performance and inefficient scaling to parallel architectures. We explore programming techniques and optimizations which maximize performance for these solver versions. The out-of-place version is straightforward to implement and achieves greater raw performance than the in-place one, but requires more computational steps to converge. In both cases, achieving high computational performance relies heavily on manual source code optimization, due to compiler incapability to do data vectorization and efficient instruction scheduling. The latter proves to be a key factor in pursuit of high GFLOPS measurements. © 2010 IEEE. |
en |
heal.journalName |
Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010 |
en |
dc.identifier.doi |
10.1109/IPDPSW.2010.5470761 |
en |