Exploring the performance limits of simultaneous multithreading for scientific codes

Athanasaki, E; Anastopoulos, N; Kourtis, K; Koziris, N

dc.contributor.author	Athanasaki, E	en
dc.contributor.author	Anastopoulos, N	en
dc.contributor.author	Kourtis, K	en
dc.contributor.author	Koziris, N	en
dc.date.accessioned	2014-03-01T02:44:03Z
dc.date.available	2014-03-01T02:44:03Z
dc.date.issued	2006	en
dc.identifier.issn	01903918	en
dc.identifier.uri	https://dspace.lib.ntua.gr/xmlui/handle/123456789/31639
dc.subject	Instruction Level Parallelism	en
dc.subject	Perforation	en
dc.subject	simultaneous multithreading	en
dc.subject	Thread Level Parallelism	en
dc.subject.other	Performance monitoring hardware	en
dc.subject.other	Simultaneous multithreading (SMT)	en
dc.subject.other	Thread-level parallelism (TLP) techniques	en
dc.subject.other	Interfaces (computer)	en
dc.subject.other	Natural sciences computing	en
dc.subject.other	Program processors	en
dc.subject.other	Synchronization	en
dc.subject.other	Throughput	en
dc.subject.other	Parallel processing systems	en
dc.title	Exploring the performance limits of simultaneous multithreading for scientific codes	en
heal.type	conferenceItem	en
heal.identifier.primary	10.1109/ICPP.2006.41	en
heal.identifier.secondary	http://dx.doi.org/10.1109/ICPP.2006.41	en
heal.identifier.secondary	1690604	en
heal.publicationDate	2006	en
heal.abstract	Simultaneous multithreading (SMT) has been proposed to improve system throughput by overlapping instructions from multiple threads on a single wide-issue processor. The speedup of a single application that is parallelized into multiple threads is often sensitive to its inherent instruction-level parallelism (ILP), as well as to the efficiency of the synchronization and communication mechanisms between its separate, but possibly dependent, threads. In this paper, we evaluate and contrast software prefetching and thread-level parallelism (TLP) techniques for a series of scientific codes executed on an SMT processor. We explore the performance limits by evaluating the tradeoffs between ILP and TLP for various kinds of instruction streams. By learning how such streams interact when executed simultaneously on the processor, and by quantifying their presence within each application's threads, we try to interpret the observed performance of each application when parallelized according to the aforementioned techniques. To support this evaluation, we also present results gathered from the performance monitoring hardware of the processor. © 2006 IEEE.	en
heal.journalName	Proceedings of the International Conference on Parallel Processing	en
dc.identifier.doi	10.1109/ICPP.2006.41	en
dc.identifier.spage	45	en
dc.identifier.epage	54	en