Distributed dynamic load balancing for pipelined computations on heterogeneous systems

dc.contributor.author Riakiotakis, I en
dc.contributor.author Ciorba, FM en
dc.contributor.author Andronikos, T en
dc.contributor.author Papakonstantinou, G en
dc.date.accessioned 2014-03-01T02:47:20Z
dc.date.available 2014-03-01T02:47:20Z
dc.date.issued 2011 en
dc.identifier.issn 0167-8191 en
dc.identifier.uri http://hdl.handle.net/123456789/33085
dc.subject Distributed model en
dc.subject Dynamic load balancing algorithms en
dc.subject Loops with dependencies en
dc.subject Master-worker model en
dc.subject Non-dedicated heterogeneous systems en
dc.subject Synchronization en
dc.subject Weighting en
dc.subject.classification Computer Science, Theory & Methods en
dc.subject.other Distributed models en
dc.subject.other Dynamic load balancing algorithms en
dc.subject.other Heterogeneous systems en
dc.subject.other Loops with dependencies en
dc.subject.other Master-worker model en
dc.subject.other Weighting en
dc.subject.other Computational chemistry en
dc.subject.other Computer software selection and evaluation en
dc.subject.other Dynamic loads en
dc.subject.other Dynamics en
dc.subject.other Parallel architectures en
dc.subject.other Synchronization en
dc.subject.other Distributed computer systems en
dc.title Distributed dynamic load balancing for pipelined computations on heterogeneous systems en
heal.type conferenceItem en
heal.identifier.primary 10.1016/j.parco.2011.01.003 en
heal.identifier.secondary http://dx.doi.org/10.1016/j.parco.2011.01.003 en
heal.language English en
heal.publicationDate 2011 en
heal.abstract One of the most significant causes of performance degradation of scientific and engineering applications on high performance computing systems is the uneven distribution of the computational work across the resources of the system. This effect, which is known as load imbalance, is even more noticeable in the case of irregular applications and heterogeneous distributed systems. This motivated the parallel and distributed computing research community to focus on methods that provide good load balancing for scientific and engineering applications running on (heterogeneous) distributed systems. Efficient load balancing and scheduling methods are employed for scientific applications from various fields, such as mechanics, materials, physics, chemistry, biology, and applied mathematics. Such applications typically employ a large number of computational methods in order to simulate complex phenomena on very large scales of time and magnitude. These simulations consist of routines that perform repetitive computations (in the form of DO/FOR loops) over very large data sets, which, if not properly implemented and executed, may suffer from poor performance. The number of repetitive computations in the simulation codes is not always constant. Moreover, the computational nature of these simulations may in fact be irregular, so that one computation may take (unpredictably) more time than others. For successful and timely results, large scale simulations require the use of large scale computing systems, which are often widely distributed and highly heterogeneous. Moreover, large scale computing systems are usually shared among multiple users, which causes the quality and quantity of the available resources to be highly unpredictable. There are numerous load balancing methods in the literature for different parallel architectures. 
The most recent of these methods typically follow the master-worker paradigm, where a single coordinator (master) is responsible for making all the scheduling decisions based on information provided by the workers. Depending on the application requirements, the scheduling policy and the computational environment, the benefits of this paradigm may be limited as follows: (1) its efficiency may not scale as the number of processors increases, and (2) it is quite probable that the scheduling decisions are made based on outdated information, especially on systems where the workload changes rapidly. In an effort to address these limitations, we propose a distributed (master-less) load balancing scheme, in which the scheduling decisions are made by the workers in a distributed fashion. We implemented this method along with two other master-worker schemes (a previously existing one and a recently modified one) for three different scientific computational kernels. In order to validate the usefulness and efficiency of the proposed scheme, we conducted a series of comparative performance tests with the two master-worker schemes for each computational kernel. The target system is an SMP cluster, on which we simulated three different patterns of system load fluctuation. The experiments strongly support the belief that the distributed approach offers greater performance and better scalability on such systems, showing an overall improvement ranging from 13% to 24% over the master-worker approaches. (C) 2011 Elsevier B.V. All rights reserved. en
heal.publisher ELSEVIER SCIENCE BV en
heal.journalName Parallel Computing en
dc.identifier.doi 10.1016/j.parco.2011.01.003 en
dc.identifier.isi ISI:000296179600006 en
dc.identifier.volume 37 en
dc.identifier.issue 10-11 en
dc.identifier.spage 713 en
dc.identifier.epage 729 en
