Exploring the performance limits of simultaneous multithreading for memory intensive applications

Athanasaki, E; Anastopoulos, N; Kourtis, K; Koziris, N

dc.contributor.author	Athanasaki, E	en
dc.contributor.author	Anastopoulos, N	en
dc.contributor.author	Kourtis, K	en
dc.contributor.author	Koziris, N	en
dc.date.accessioned	2014-03-01T01:28:23Z
dc.date.available	2014-03-01T01:28:23Z
dc.date.issued	2008	en
dc.identifier.issn	0920-8542	en
dc.identifier.uri	https://dspace.lib.ntua.gr/xmlui/handle/123456789/18832
dc.subject	Instruction-level parallelism	en
dc.subject	Performance analysis	en
dc.subject	Simultaneous multithreading	en
dc.subject	Software prefetching	en
dc.subject	Speculative precomputation	en
dc.subject	Thread-level parallelism	en
dc.subject.classification	Computer Science, Hardware & Architecture	en
dc.subject.classification	Computer Science, Theory & Methods	en
dc.subject.classification	Engineering, Electrical & Electronic	en
dc.subject.other	Data storage equipment	en
dc.subject.other	Parallel programming	en
dc.subject.other	Resource allocation	en
dc.subject.other	Synchronization	en
dc.subject.other	Instruction-level parallelism (ILP)	en
dc.subject.other	Simultaneous multithreading (SMT)	en
dc.subject.other	Speculative precomputation	en
dc.subject.other	Thread-level parallelism	en
dc.subject.other	Multiprocessing systems	en
dc.title	Exploring the performance limits of simultaneous multithreading for memory intensive applications	en
heal.type	journalArticle	en
heal.identifier.primary	10.1007/s11227-007-0149-x	en
heal.identifier.secondary	http://dx.doi.org/10.1007/s11227-007-0149-x	en
heal.language	English	en
heal.publicationDate	2008	en
heal.abstract	Simultaneous multithreading (SMT) has been proposed to improve system throughput by overlapping instructions from multiple threads on a single wide-issue processor. Recent studies have demonstrated that diversity of simultaneously executed applications can bring up significant performance gains due to SMT. However, the speedup of a single application that is parallelized into multiple threads, is often sensitive to its inherent instruction level parallelism (ILP), as well as the efficiency of synchronization and communication mechanisms between its separate, but possibly dependent threads. Moreover, as these separate threads tend to put pressure on the same architectural resources, no significant speedup can be observed. In this paper, we evaluate and contrast thread-level parallelism (TLP) and speculative precomputation (SPR) techniques for a series of memory intensive codes executed on a specific SMT processor implementation. We explore the performance limits by evaluating the tradeoffs between ILP and TLP for various kinds of instruction streams. By obtaining knowledge on how such streams interact when executed simultaneously on the processor, and quantifying their presence within each application's threads, we try to interpret the observed performance for each application when parallelized according to the aforementioned techniques. In order to amplify this evaluation process, we also present results gathered from the performance monitoring hardware of the processor. © 2007 Springer Science+Business Media, LLC.	en
heal.publisher	SPRINGER	en
heal.journalName	Journal of Supercomputing	en
dc.identifier.doi	10.1007/s11227-007-0149-x	en
dc.identifier.isi	ISI:000253523500004	en
dc.identifier.volume	44	en
dc.identifier.issue	1	en
dc.identifier.spage	64	en
dc.identifier.epage	97	en