HEAL DSpace

Facilitating efficient synchronization of asymmetric threads on hyper-threaded processors

dc.contributor.author Anastopoulos, N en
dc.contributor.author Koziris, N en
dc.date.accessioned 2014-03-01T02:45:16Z
dc.date.available 2014-03-01T02:45:16Z
dc.date.issued 2008 en
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/32247
dc.subject Operating System en
dc.subject Performance Optimization en
dc.subject Bottom Up en
dc.subject.other Execution models en
dc.subject.other Parallel and distributed processing en
dc.subject.other Thread synchronization en
dc.subject.other Computer networks en
dc.subject.other Distributed parameter networks en
dc.subject.other Spin dynamics en
dc.subject.other Synchronization en
dc.title Facilitating efficient synchronization of asymmetric threads on hyper-threaded processors en
heal.type conferenceItem en
heal.identifier.primary 10.1109/IPDPS.2008.4536358 en
heal.identifier.secondary 4536358 en
heal.identifier.secondary http://dx.doi.org/10.1109/IPDPS.2008.4536358 en
heal.publicationDate 2008 en
heal.abstract So far, the privileged instructions MONITOR and MWAIT, introduced with the Intel Prescott core, have been used mostly for inter-thread synchronization in operating system code. In a hyper-threaded processor, these instructions offer a "performance-optimized" way for threads involved in synchronization events to wait on a condition. In this work, we explore the potential of using these instructions for synchronizing application threads that execute on hyper-threaded processors and are characterized by workload asymmetry. Initially, we propose a framework through which one can use MONITOR/MWAIT to build condition wait and notification primitives with minimal kernel involvement. Then, we evaluate the efficiency of these primitives in a bottom-up manner: at first, we quantify certain performance aspects of the primitives that reflect the execution model under consideration, such as resource consumption and responsiveness, and we compare them against other commonly used implementations. As a further step, we use our primitives to build synchronization barriers. Again, we examine the same performance issues as before, and using a pseudo-benchmark we evaluate the efficiency of our implementation for fine-grained inter-thread synchronization. In terms of throughput, our barriers yielded 12% better performance on average compared to Pthreads, and 26% compared to a spin-loop-based implementation, for varying levels of thread asymmetry. Finally, we test our barriers in a real-world scenario, specifically in applying thread-level Speculative Precomputation to four applications. For this multithreaded execution scheme, our implementation provided up to 7% better performance compared to Pthreads, and up to 40% compared to spin-loop-based barriers. ©2008 IEEE. en
heal.journalName IPDPS Miami 2008 - Proceedings of the 22nd IEEE International Parallel and Distributed Processing Symposium, Program and CD-ROM en
dc.identifier.doi 10.1109/IPDPS.2008.4536358 en

