Overcoming extreme-scale reproducibility challenges through a unified, targeted, and multilevel toolset

Publication Type pre-print
School or College College of Engineering
Department Computing, School of
Creator Gopalakrishnan, Ganesh
Other Author Ahn, Dong H.; Lee, Gregory L.; Rakamarić, Zvonimir; Schulz, Martin; Laguna, Ignacio
Title Overcoming extreme-scale reproducibility challenges through a unified, targeted, and multilevel toolset
Date 2013-01-01
Description Reproducibility, the ability to repeat program executions with the same numerical result or code behavior, is crucial for computational science and engineering applications. However, non-determinism in concurrency scheduling often hampers achieving this ability on high performance computing (HPC) systems. To aid in managing the adverse effects of non-determinism, prior work has provided techniques to achieve bit-precise reproducibility, but most of them focus only on small-scale parallelism. While scalable techniques have recently emerged, they are disparate and target special purposes, e.g., single-schedule domains. On current systems with O(10^6) compute cores and future ones with O(10^9), any technique that does not embrace a unified, targeted, and multilevel approach will fall short of providing reproducibility. In this paper, we argue for a common toolset that embodies this approach, where programmers select and compose complementary tools and can effectively, yet scalably, analyze, control, and eliminate sources of non-determinism at scale. This allows users to gain reproducibility only to the levels demanded by specific code development needs. We present our research agenda and ongoing work toward this goal.
Type Text
Publisher Institute of Electrical and Electronics Engineers (IEEE)
Issue 41
First Page 44
Language eng
Bibliographic Citation Ahn, D. H., Lee, G. L., Gopalakrishnan, G., Rakamarić, Z., Schulz, M., & Laguna, I. (2013). Overcoming extreme-scale reproducibility challenges through a unified, targeted, and multilevel toolset. Proc. of SE-HPCCSE 2013: 1st Int. Workshop on SE for High Performance Computing in Computational Science and Engineering, 41-4.
Rights Management (c) 2013 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Format Medium application/pdf
Format Extent 432,881 bytes
Identifier uspace,18445
ARK ark:/87278/s64b698s
Setname ir_uspace
Date Created 2014-03-11
Date Modified 2014-03-11
ID 711702
Reference URL https://collections.lib.utah.edu/ark:/87278/s64b698s