dc.contributor.author |
Athanasaki, E |
en |
dc.contributor.author |
Anastopoulos, N |
en |
dc.contributor.author |
Kourtis, K |
en |
dc.contributor.author |
Koziris, N |
en |
dc.date.accessioned |
2014-03-01T02:44:03Z |
|
dc.date.available |
2014-03-01T02:44:03Z |
|
dc.date.issued |
2006 |
en |
dc.identifier.issn |
0302-9743 |
en |
dc.identifier.uri |
https://dspace.lib.ntua.gr/xmlui/handle/123456789/31638 |
|
dc.subject |
Instruction Level Parallelism |
en |
dc.subject |
Parallel Applications |
en |
dc.subject |
Performance Monitoring |
en |
dc.subject |
Scientific Application |
en |
dc.subject |
simultaneous multithreading |
en |
dc.subject |
Thread Level Parallelism |
en |
dc.subject.classification |
Computer Science, Theory & Methods |
en |
dc.subject.other |
Instruction level parallelism (ILP) |
en |
dc.subject.other |
Instruction streams |
en |
dc.subject.other |
Simultaneous multithreading (SMT) |
en |
dc.subject.other |
Thread level parallelism (TLP) |
en |
dc.subject.other |
Codes (symbols) |
en |
dc.subject.other |
Communication systems |
en |
dc.subject.other |
Computation theory |
en |
dc.subject.other |
Data flow analysis |
en |
dc.subject.other |
Program processors |
en |
dc.subject.other |
Systems analysis |
en |
dc.subject.other |
Computer architecture |
en |
dc.title |
Exploring the capacity of a modern SMT architecture to deliver high scientific application performance |
en |
heal.type |
conferenceItem |
en |
heal.identifier.primary |
10.1007/11847366_19 |
en |
heal.identifier.secondary |
http://dx.doi.org/10.1007/11847366_19 |
en |
heal.language |
English |
en |
heal.publicationDate |
2006 |
en |
heal.abstract |
Simultaneous multithreading (SMT) has been proposed to improve system throughput by overlapping instructions from multiple threads on a single wide-issue processor. Recent studies have demonstrated that heterogeneity among simultaneously executed applications can bring about significant performance gains under SMT. However, the speedup of a single application that is parallelized into multiple threads is often sensitive to its inherent instruction-level parallelism (ILP), as well as to the efficiency of the synchronization and communication mechanisms between its separate, but possibly dependent, threads. In this paper, we explore the performance limits by evaluating the tradeoffs between ILP and thread-level parallelism (TLP) for various kinds of instruction streams. We evaluate and contrast speculative precomputation (SPR) and TLP techniques for a series of scientific codes executed on an SMT processor. We also examine the effect of thread synchronization mechanisms on multithreaded parallel applications that are executed on a single SMT processor. To support this evaluation, we also present results gathered from the performance monitoring hardware of the processor. © Springer-Verlag Berlin Heidelberg 2006. |
en |
heal.publisher |
SPRINGER-VERLAG BERLIN |
en |
heal.journalName |
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
en |
heal.bookName |
LECTURE NOTES IN COMPUTER SCIENCE |
en |
dc.identifier.doi |
10.1007/11847366_19 |
en |
dc.identifier.isi |
ISI:000241591300019 |
en |
dc.identifier.volume |
4208 LNCS |
en |
dc.identifier.spage |
180 |
en |
dc.identifier.epage |
189 |
en |