
Resilient N-Body Tree Computations with Algorithm-Based Focused Recovery: Model and Performance Analysis

Cavelan, Aurélien and Fang, Aiman and Chien, Andrew A. and Robert, Yves. (2017) Resilient N-Body Tree Computations with Algorithm-Based Focused Recovery: Model and Performance Analysis. In: High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation. PMBS 2017, 10724.


Official URL: http://edoc.unibas.ch/56464/


Abstract

This paper presents a model and performance study for Algorithm-Based Focused Recovery (ABFR) applied to N-body computations, subject to latent errors. We make a detailed comparison with the classical Checkpoint/Restart (CR) approach. While the model applies to general frameworks, the performance study is limited to perfect binary trees, due to the inherent difficulty of the analysis. With ABFR, the crucial parameter is the detection interval, which bounds the error latency. We show that the detection interval has a dramatic impact on the overhead, and that optimally choosing its value leads to significant gains over the CR approach.
Faculties and Departments: 05 Faculty of Science > Departement Mathematik und Informatik > Informatik > High Performance Computing (Ciorba)
UniBasel Contributors: Cavelan, Aurélien and Ciorba, Florina M.
Item Type: Conference or Workshop Item, refereed
Conference or workshop item Subtype: Conference Paper
Publisher: Springer
ISBN: 978-3-319-72970-1
e-ISBN: 978-3-319-72971-8
Series Name: Lecture Notes in Computer Science
ISSN: 0302-9743
Note: Publication type according to Uni Basel Research Database: Conference paper
Language: English
Identification Number:
Last Modified: 18 May 2018 12:53
Deposited On: 18 May 2018 12:52
