Distributed dynamic load balancing for pipelined computations on heterogeneous systems

Riakiotakis, I; Ciorba, FM; Andronikos, T; Papakonstantinou, G

dc.contributor.author	Riakiotakis, I	en
dc.contributor.author	Ciorba, FM	en
dc.contributor.author	Andronikos, T	en
dc.contributor.author	Papakonstantinou, G	en
dc.date.accessioned	2014-03-01T02:47:20Z
dc.date.available	2014-03-01T02:47:20Z
dc.date.issued	2011	en
dc.identifier.issn	0167-8191	en
dc.identifier.uri	https://dspace.lib.ntua.gr/xmlui/handle/123456789/33085
dc.subject	Distributed model	en
dc.subject	Dynamic load balancing algorithms	en
dc.subject	Loops with dependencies	en
dc.subject	Master-worker model	en
dc.subject	Non-dedicated heterogeneous systems	en
dc.subject	Synchronization	en
dc.subject	Weighting	en
dc.subject.classification	Computer Science, Theory & Methods	en
dc.subject.other	Distributed models	en
dc.subject.other	Dynamic load balancing algorithms	en
dc.subject.other	Heterogeneous systems	en
dc.subject.other	Loops with dependencies	en
dc.subject.other	Master-worker model	en
dc.subject.other	Weighting	en
dc.subject.other	Computational chemistry	en
dc.subject.other	Computer software selection and evaluation	en
dc.subject.other	Dynamic loads	en
dc.subject.other	Dynamics	en
dc.subject.other	Parallel architectures	en
dc.subject.other	Synchronization	en
dc.subject.other	Distributed computer systems	en
dc.title	Distributed dynamic load balancing for pipelined computations on heterogeneous systems	en
heal.type	conferenceItem	en
heal.identifier.primary	10.1016/j.parco.2011.01.003	en
heal.identifier.secondary	http://dx.doi.org/10.1016/j.parco.2011.01.003	en
heal.language	English	en
heal.publicationDate	2011	en
heal.abstract	One of the most significant causes of performance degradation of scientific and engineering applications on high performance computing systems is the uneven distribution of the computational work across the resources of the system. This effect, known as load imbalance, is even more noticeable in the case of irregular applications and heterogeneous distributed systems. This has motivated the parallel and distributed computing research community to focus on methods that provide good load balancing for scientific and engineering applications running on (heterogeneous) distributed systems. Efficient load balancing and scheduling methods are employed for scientific applications from various fields, such as mechanics, materials, physics, chemistry, biology, and applied mathematics. Such applications typically employ a large number of computational methods in order to simulate complex phenomena on very large scales of time and magnitude. These simulations consist of routines that perform repetitive computations (in the form of DO/FOR loops) over very large data sets, which, if not properly implemented and executed, may suffer from poor performance. The number of repetitive computations in the simulation codes is not always constant. Moreover, the computational nature of these simulations may in fact be irregular, so that one computation may take (unpredictably) more time than others. For successful and timely results, large scale simulations require the use of large scale computing systems, which are often widely distributed and highly heterogeneous. Moreover, large scale computing systems are usually shared among multiple users, which makes the quality and quantity of the available resources highly unpredictable. There are numerous load balancing methods in the literature for different parallel architectures.
The most recent of these methods typically follow the master-worker paradigm, where a single coordinator (master) is responsible for making all the scheduling decisions based on information provided by the workers. Depending on the application requirements, the scheduling policy and the computational environment, the benefits of this paradigm may be limited as follows: (1) its efficiency may not scale as the number of processors increases, and (2) it is quite probable that the scheduling decisions are made based on outdated information, especially on systems where the workload changes rapidly. In an effort to address these limitations, we propose a distributed (master-less) load balancing scheme, in which the scheduling decisions are made by the workers in a distributed fashion. We implemented this method along with two other master-worker schemes (a previously existing one and a recently modified one) for three different scientific computational kernels. To validate the usefulness and efficiency of the proposed scheme, we conducted a series of comparative performance tests against the two master-worker schemes for each computational kernel. The target system is an SMP cluster, on which we simulated three different patterns of system load fluctuation. The experiments strongly support the belief that the distributed approach offers greater performance and better scalability on such systems, showing an overall improvement ranging from 13% to 24% over the master-worker approaches. (C) 2011 Elsevier B.V. All rights reserved.	en
heal.publisher	ELSEVIER SCIENCE BV	en
heal.journalName	Parallel Computing	en
dc.identifier.doi	10.1016/j.parco.2011.01.003	en
dc.identifier.isi	ISI:000296179600006	en
dc.identifier.volume	37	en
dc.identifier.issue	10-11	en
dc.identifier.spage	713	en
dc.identifier.epage	729	en