
Handling Soft Errors in Krylov Subspace Methods by Exploiting Their Numerical Properties

EasyChair Preprint 4382

10 pages
Date: October 12, 2020

Abstract

Krylov subspace methods are a popular means of solving sparse linear systems. In this paper, we consider three such methods: GMRES, Conjugate Gradient (CG), and Conjugate Residual (CR). We focus on the problem of efficiently and accurately detecting soft errors that lead to silent data corruption (SDC) in each of these methods. Unlike the limited previous work in this area, our work is driven by an analysis of the mathematical properties of the methods. We identify a quantity, which we refer to as the energy norm, that decreases across iterations for our target class of methods. We also show further applications of the error norm and the residual, and expand the set of algorithms to which they can be applied. We have extensively evaluated our method along three distinct dimensions: detection accuracy, the magnitude of undetected errors, and runtime overhead. First, we show that our methods achieve high detection accuracy. The detection rate exceeds 90% for GMRES in most scenarios and matrices, and it exceeds 70% for most CG and CR cases. Second, we show that for soft errors our methods do not detect, the resulting inaccuracy in the final result is small. Finally, we show that the runtime overhead of our method is low.
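The following sketch illustrates monotonicity-based detection. It runs CG on a small symmetric positive definite (SPD) system and flags any iteration in which the quadratic functional φ(x) = ½xᵀAx − bᵀx increases. For SPD A, φ(x) differs from ½‖x − x*‖²_A (half the squared energy norm of the error) only by a constant. In exact arithmetic it therefore decreases at every CG step, and it can be computed without knowing x*. This is an assumption-laden illustration, not the paper's detector, because the abstract does not give that detector's exact form. The function names, the `fault` injection hook and the `slack` tolerance are all hypothetical, and dense pure-Python lists stand in for a sparse matrix format.

```python
from typing import Callable, List, Optional, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(A: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product (stand-in for a sparse SpMV)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def energy_functional(A: Matrix, b: Vector, x: Vector) -> float:
    """phi(x) = 1/2 x^T A x - b^T x, equal to 1/2 ||x - x*||_A^2 plus a constant for SPD A."""
    return 0.5 * dot(x, matvec(A, x)) - dot(b, x)


def cg_with_energy_check(
    A: Matrix,
    b: Vector,
    tol: float = 1e-10,
    max_iter: int = 100,
    fault: Optional[Callable[[int, Vector], None]] = None,  # hypothetical fault-injection hook
    slack: float = 1e-12,  # hypothetical relative tolerance for rounding noise
) -> Tuple[Vector, List[int]]:
    """Plain CG that records iterations where phi(x) fails to decrease (suspected SDC)."""
    n = len(b)
    x = [0.0] * n
    r = b[:]  # residual b - A x for the zero initial guess
    p = r[:]
    rr = dot(r, r)
    phi_prev = energy_functional(A, b, x)
    flagged: List[int] = []
    for k in range(max_iter):
        if rr ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if fault is not None:
            fault(k, x)  # may corrupt x in place, emulating a soft error
        # Detector: recompute phi from x itself; in exact arithmetic it must drop.
        phi = energy_functional(A, b, x)
        if phi > phi_prev + slack * max(1.0, abs(phi_prev)):
            flagged.append(k)
        phi_prev = phi
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, flagged


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    print(cg_with_energy_check(A, b))

    def flip(k: int, x: Vector) -> None:
        if k == 0:
            x[0] += 100.0  # large corruption of one solution entry

    print(cg_with_energy_check(A, b, fault=flip))
```

Recomputing φ costs one extra matrix-vector product per iteration. CG's recurrence gives the expected decrease, φ(x_{k+1}) − φ(x_k) = −½ α_k ‖r_k‖², more cheaply, but that value never reads x and so cannot reveal a corrupted x. A corruption that happens to lower φ, or one that hits r or p instead of x, may slip past this single check.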

Keyphrases: Conjugate Residual, GMRES, Krylov subspace, conjugate gradients, fault tolerance, iterative solvers

BibTeX entry
BibTeX has no dedicated entry type for preprints. The following workaround produces the correct reference:
@booklet{EasyChair:4382,
  author    = {Muhammed Emin Ozturk and Gagan Agrawal and Yukun Li and Ching-Shan Chou},
  title     = {Handling Soft Errors in Krylov Subspace Methods by Exploiting Their Numerical Properties},
  howpublished = {EasyChair Preprint 4382},
  year      = {EasyChair, 2020}}