Exploring the capacity of a modern SMT architecture to deliver high scientific application performance

Athanasaki, E; Anastopoulos, N; Kourtis, K; Koziris, N

dc.contributor.author	Athanasaki, E	en
dc.contributor.author	Anastopoulos, N	en
dc.contributor.author	Kourtis, K	en
dc.contributor.author	Koziris, N	en
dc.date.accessioned	2014-03-01T02:44:03Z
dc.date.available	2014-03-01T02:44:03Z
dc.date.issued	2006	en
dc.identifier.issn	0302-9743	en
dc.identifier.uri	https://dspace.lib.ntua.gr/xmlui/handle/123456789/31638
dc.subject	Instruction Level Parallelism	en
dc.subject	Parallel Applications	en
dc.subject	Performance Monitoring	en
dc.subject	Scientific Application	en
dc.subject	simultaneous multithreading	en
dc.subject	Thread Level Parallelism	en
dc.subject.classification	Computer Science, Theory & Methods	en
dc.subject.other	Instruction level parallelism (ILP)	en
dc.subject.other	Instruction streams	en
dc.subject.other	Simultaneous multithreading (SMT)	en
dc.subject.other	Thread level parallelism (TLP)	en
dc.subject.other	Codes (symbols)	en
dc.subject.other	Communication systems	en
dc.subject.other	Computation theory	en
dc.subject.other	Data flow analysis	en
dc.subject.other	Program processors	en
dc.subject.other	Systems analysis	en
dc.subject.other	Computer architecture	en
dc.title	Exploring the capacity of a modern SMT architecture to deliver high scientific application performance	en
heal.type	conferenceItem	en
heal.identifier.primary	10.1007/11847366_19	en
heal.identifier.secondary	http://dx.doi.org/10.1007/11847366_19	en
heal.language	English	en
heal.publicationDate	2006	en
heal.abstract	Simultaneous multithreading (SMT) has been proposed to improve system throughput by overlapping instructions from multiple threads on a single wide-issue processor. Recent studies have demonstrated that heterogeneity among simultaneously executed applications can yield significant performance gains under SMT. However, the speedup of a single application that is parallelized into multiple threads is often sensitive to its inherent instruction-level parallelism (ILP), as well as to the efficiency of the synchronization and communication mechanisms between its separate, but possibly dependent, threads. In this paper, we explore the performance limits by evaluating the tradeoffs between ILP and thread-level parallelism (TLP) for various kinds of instruction streams. We evaluate and contrast speculative precomputation (SPR) and TLP techniques for a series of scientific codes executed on an SMT processor. We also examine the effect of thread synchronization mechanisms on multithreaded parallel applications that are executed on a single SMT processor. To support this evaluation, we also present results gathered from the performance monitoring hardware of the processor. © Springer-Verlag Berlin Heidelberg 2006.	en
heal.publisher	SPRINGER-VERLAG BERLIN	en
heal.journalName	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)	en
heal.bookName	LECTURE NOTES IN COMPUTER SCIENCE	en
dc.identifier.doi	10.1007/11847366_19	en
dc.identifier.isi	ISI:000241591300019	en
dc.identifier.volume	4208 LNCS	en
dc.identifier.spage	180	en
dc.identifier.epage	189	en