edoc

rDLB: A Novel Approach for Robust Dynamic Load Balancing of Scientific Applications with Parallel Independent Tasks

Mohammed, Ali Omar Abdelazim and Cavelan, Aurélien and Ciorba, Florina M. (2019) rDLB: A Novel Approach for Robust Dynamic Load Balancing of Scientific Applications with Parallel Independent Tasks. In: International Conference on High Performance Computing & Simulation (HPCS).

Full text not available from this repository.

Official URL: https://edoc.unibas.ch/72155/


Abstract

Scientific applications often contain large and computationally intensive parallel loops. Dynamic loop self-scheduling (DLS) is used to achieve a load-balanced execution of such applications on high performance computing (HPC) systems. Large HPC systems are vulnerable to processor or node failures and to perturbations in the availability of resources. Most self-scheduling approaches either do not consider fault-tolerant scheduling or depend on failure or perturbation detection and react by rescheduling failed tasks. In this work, a robust dynamic load balancing (rDLB) approach is proposed for the robust self-scheduling of independent tasks. The proposed approach is proactive and does not depend on failure or perturbation detection. The theoretical analysis of the proposed approach shows that it is linearly scalable and that its cost decreases quadratically as the system size increases. rDLB is integrated into an MPI DLS library to evaluate its performance experimentally with two computationally intensive scientific applications. Results show that rDLB enables the tolerance of up to (P − 1) processor failures, where P is the number of processors executing an application. In the presence of perturbations, rDLB boosted the robustness of DLS techniques by up to 30 times and decreased application execution time by up to 7 times compared to their counterparts without rDLB.
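The abstract does not spell out the rDLB mechanism, so the following is only a minimal sketch under stated assumptions: it assumes that a proactive, detection-free scheme can work by handing idle workers copies of already scheduled but still unfinished tasks once no unscheduled tasks remain, so that work lost on a failed or slow processor is redone without anyone detecting the failure. Guided self-scheduling (chunk = ceil(remaining / P)) stands in for "DLS techniques" in general. The names `RobustSelfScheduler` and `simulate` are hypothetical and are not the API of the MPI DLS library used by the authors; the toy event-driven simulation replaces real MPI processes.

```python
import heapq
import math


class RobustSelfScheduler:
    """Hypothetical master for self-scheduling N independent tasks on P workers.

    Chunk sizes follow guided self-scheduling (GSS): ceil(remaining / P).
    ASSUMPTION: once every task has been scheduled, an idle worker receives a
    copy of a scheduled-but-unfinished chunk, so lost work is redone without
    any failure or perturbation detection.
    """

    def __init__(self, n_tasks, n_workers):
        self.n_tasks = n_tasks
        self.n_workers = n_workers
        self.next_task = 0      # lowest task index not yet scheduled
        self.chunks = []        # every original chunk handed out so far
        self.finished = set()   # task indices reported as completed
        self.dup_cursor = 0     # round-robin position over unfinished chunks

    def request(self):
        remaining = self.n_tasks - self.next_task
        if remaining > 0:
            size = math.ceil(remaining / self.n_workers)
            chunk = list(range(self.next_task, self.next_task + size))
            self.next_task += size
            self.chunks.append(chunk)
            return chunk
        # No unscheduled tasks left: proactively re-issue unfinished work.
        pending = [c for c in self.chunks
                   if any(t not in self.finished for t in c)]
        if not pending:
            return []
        chunk = pending[self.dup_cursor % len(pending)]
        self.dup_cursor += 1
        return [t for t in chunk if t not in self.finished]

    def complete(self, chunk):
        self.finished.update(chunk)  # a duplicate finishing twice is harmless

    def done(self):
        return len(self.finished) == self.n_tasks


def simulate(n_tasks, speeds, fail_times):
    """Event-driven run; returns the makespan, or None if every worker failed."""
    sched = RobustSelfScheduler(n_tasks, len(speeds))
    events = []  # (finish_time, worker_id, chunk); one entry per live worker
    for w, speed in enumerate(speeds):
        chunk = sched.request()
        if chunk:
            heapq.heappush(events, (len(chunk) / speed, w, chunk))
    while events:
        t, w, chunk = heapq.heappop(events)
        if t > fail_times[w]:
            continue  # worker failed mid-chunk: result lost, nobody is told
        sched.complete(chunk)
        if sched.done():
            return t
        nxt = sched.request()
        if nxt:
            heapq.heappush(events, (t + len(nxt) / speeds[w], w, nxt))
    return None
```

In this toy model, `simulate(100, [1.0] * 4, [math.inf] * 4)` finishes at time 25.0, and `simulate(100, [1.0] * 4, [math.inf, 1.0, 1.0, 1.0])` still completes (at 100.0) because the single surviving worker eventually re-executes the chunks lost by the three failed workers, which mirrors the (P − 1) failure tolerance claimed above. Round-robin selection among unfinished chunks is an arbitrary choice made here for simplicity.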
Faculties and Departments: 05 Faculty of Science > Departement Mathematik und Informatik > Informatik > High Performance Computing (Ciorba)
UniBasel Contributors: Mohammed, Ali Omar Abdelazim and Cavelan, Aurélien and Ciorba, Florina M.
Item Type: Conference or Workshop Item, refereed
Conference or workshop item Subtype: Conference Paper
Publisher: IEEE
Note: Publication type according to Uni Basel Research Database: Conference paper
Last Modified: 11 Mar 2020 12:36
Deposited On: 11 Mar 2020 12:36
