Design of Robust Scheduling Methodologies in High Performance Computing

Mohammed, Ali and Ciorba, Florina M. (2019) Design of Robust Scheduling Methodologies in High Performance Computing. PhD Forum Poster at the 34th International Conference on High Performance Computing (ISC).

Full text not available from this repository.

Official URL: https://edoc.unibas.ch/80772/


Scientific applications are often irregular and characterized by large, computationally intensive parallel loops. Their performance on high performance computing (HPC) systems may degrade due to load imbalance, which may be caused by an irregular computational load per loop iteration or by irregular and unpredictable characteristics of the computing system. Dynamic loop scheduling (DLS) techniques improve the performance of computationally intensive scientific applications by balancing the load during their execution. A number of DLS techniques were proposed between the late 1980s and early 2000s and have been used efficiently in scientific applications.
HPC systems have advanced significantly in recent decades and continue to grow in computing power and memory. State-of-the-art HPC systems have several million processing cores and approximately 1 petabyte of system memory. However, achieving a balanced execution of scientific applications on such systems is challenging due to system heterogeneity, unpredictable performance variations, perturbations, and faults. My Ph.D. aims to improve the performance of computationally intensive scientific applications on HPC systems via robust load balancing under unpredictable application and system characteristics.
Given the significant advancement of HPC systems, the computing systems on which DLS techniques were initially tested and validated are no longer available. Therefore, this work is concerned with minimizing the sources of uncertainty in the implementation of DLS techniques, to avoid unnecessary influences on the performance of scientific applications. It is essential to ensure that the DLS techniques employed in scientific applications today adhere to their original design goals and specifications, and to attain trust in the implementation of DLS techniques in today's studies.
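
As an illustration of what a DLS technique computes, the sketch below derives the chunk sizes handed out under guided self-scheduling (GSS), a well-known technique from the loop scheduling literature in which each work request receives ceil(remaining / P) iterations. The abstract does not name the techniques the poster covers, so GSS is chosen here only as a familiar example, and the function names `gss_chunks` and `static_chunks` are hypothetical.

```python
import math


def static_chunks(n_iterations, n_workers):
    """One near-equal chunk per worker, fixed before execution (no balancing)."""
    base, extra = divmod(n_iterations, n_workers)
    sizes = [base + (1 if i < extra else 0) for i in range(n_workers)]
    return [s for s in sizes if s > 0]


def gss_chunks(n_iterations, n_workers):
    """Guided self-scheduling: every request takes ceil(remaining / P) iterations.

    Chunks start large (low scheduling overhead) and shrink towards the end
    of the loop (finer granularity to even out the finishing times).
    """
    chunks = []
    remaining = n_iterations
    while remaining > 0:
        chunk = math.ceil(remaining / n_workers)
        chunks.append(chunk)
        remaining -= chunk
    return chunks


if __name__ == "__main__":
    print("static:", static_chunks(100, 4))
    print("GSS:   ", gss_chunks(100, 4))
```

The decreasing chunk sizes are what let faster or less loaded workers pick up more of the remaining work at run time, whereas the static split fixes the assignment in advance.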
To achieve this goal, the implementation of DLS techniques was verified by reproducing selected experiments [1], both in simulation and natively. Simulation reduces the need for the large number of exploratory native experiments required to optimize application performance, which may not always be feasible or practical due to the associated time and cost. Bridging the native and simulative executions of parallel applications is needed to attain trustworthy simulation results. To this end, a methodology for bridging the native and simulative executions of parallel applications on HPC systems is devised in this work. The experiments presented in this poster confirm that the simulation reproduces the performance achieved on the past computing platform and accurately predicts the performance achieved on the present computing platform. This reproduction and prediction of performance confirm that the present implementation of the DLS techniques considered, both in simulation and natively, adheres to their original description.
Using the above simulation methodology, trusted simulation of application performance was leveraged to achieve a balanced execution under perturbations via simulation-assisted scheduling (SimAS). SimAS is a new control-theory-inspired approach that predicts and selects the DLS techniques that improve performance under given execution scenarios. The performance results confirm that SimAS-based DLS selection delivered improved application performance in most experiments.
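
The idea of selecting a DLS technique by simulation can be sketched as follows. This is a toy model under stated assumptions, not the SimAS implementation: the simulator is a simple greedy event loop, a perturbation is modelled as a reduced relative speed of one worker, each chunk request costs a fixed overhead, and all names (`chunk_sizes`, `simulate`, `select_technique`) and the three candidate techniques are hypothetical choices made for illustration.

```python
import heapq
import math


def chunk_sizes(technique, n, p):
    """Chunk sizes for three textbook loop scheduling techniques."""
    if technique == "STATIC":          # one near-equal chunk per worker
        base, extra = divmod(n, p)
        sizes = [base + (1 if i < extra else 0) for i in range(p)]
        return [s for s in sizes if s > 0]
    if technique == "SS":              # self-scheduling: one iteration at a time
        return [1] * n
    if technique == "GSS":             # guided: ceil(remaining / p) per request
        sizes, remaining = [], n
        while remaining > 0:
            size = math.ceil(remaining / p)
            sizes.append(size)
            remaining -= size
        return sizes
    raise ValueError(f"unknown technique: {technique}")


def simulate(technique, iteration_costs, speeds, overhead):
    """Predict the loop finishing time: the next chunk goes to the earliest-free worker."""
    free_at = [(0.0, w) for w in range(len(speeds))]
    heapq.heapify(free_at)
    start, finish = 0, 0.0
    for size in chunk_sizes(technique, len(iteration_costs), len(speeds)):
        t, w = heapq.heappop(free_at)
        work = sum(iteration_costs[start:start + size])
        t += overhead + work / speeds[w]   # slower worker => longer chunk time
        start += size
        finish = max(finish, t)
        heapq.heappush(free_at, (t, w))
    return finish


def select_technique(candidates, iteration_costs, speeds, overhead):
    """Simulate every candidate and return the one with the lowest predicted time."""
    predicted = {c: simulate(c, iteration_costs, speeds, overhead) for c in candidates}
    return min(predicted, key=predicted.get), predicted


if __name__ == "__main__":
    costs = [1.0] * 100                  # assumed: uniform iteration costs
    perturbed = [1.0, 1.0, 1.0, 0.25]    # assumed: one worker slowed to 25% speed
    best, times = select_technique(["STATIC", "SS", "GSS"], costs, perturbed, 0.01)
    print(best, times)
```

In this toy model the best choice depends on the scenario: with one slowed worker and a small overhead the fine-grained technique is predicted to finish first, while with equal speeds and a large per-chunk overhead the static split wins. That scenario dependence is what motivates selecting the technique from predictions rather than fixing it in advance.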
Faculties and Departments: 05 Faculty of Science > Departement Mathematik und Informatik > Informatik > High Performance Computing (Ciorba)
UniBasel Contributors: Mohammed, Ali Omar Abdelazim and Ciorba, Florina M.
Item Type: Other
Note: Publication type according to Uni Basel Research Database: Other publications
Last Modified: 19 Jan 2021 07:39
Deposited On: 19 Jan 2021 07:39
