Scalable formal dynamic verification of MPI programs through distributed causality tracking

Publication Type dissertation
School or College College of Engineering
Department Computing
Author Vo, Anh
Title Scalable formal dynamic verification of MPI programs through distributed causality tracking
Date 2011-08
Description Almost all high performance computing applications are written in MPI, which will continue to be the case for at least the next several years. Given the huge and growing importance of MPI, and the size and sophistication of MPI codes, scalable and incisive MPI debugging tools are essential. Existing MPI debugging tools have, despite their strengths, many glaring deficiencies, especially when it comes to debugging in the presence of nondeterminism-related bugs, which are bugs that do not always show up during testing. These bugs usually become manifest when the systems are ported to different platforms for production runs. This dissertation focuses on the problem of developing scalable dynamic verification tools for MPI programs that can provide a coverage guarantee over the space of MPI nondeterminism. That is, the tools should be able to detect different outcomes of nondeterministic events in an MPI program and enforce all those different outcomes through repeated executions of the program with the same test harness. We propose to achieve the coverage guarantee by introducing efficient distributed causality tracking protocols that are based on the matches-before order. The matches-before order is introduced to address the shortcomings of the Lamport happens-before order [40], which is not sufficient to capture causality for MPI program executions due to the complexity of the MPI semantics. The two protocols we propose are the Lazy Lamport Clocks Protocol (LLCP) and the Lazy Vector Clocks Protocol (LVCP). LLCP provides good scalability with a small possibility of missing potential outcomes of nondeterministic events, while LVCP provides a full coverage guarantee with a scalability tradeoff. In practice, we show through our experiments that LLCP provides the same coverage as LVCP. This thesis makes the following contributions:
• The MPI matches-before order that captures the causality between MPI events in an MPI execution.
• Two distributed causality tracking protocols for MPI programs that rely on the matches-before order.
• A Distributed Analyzer for MPI programs (DAMPI), which implements the two aforementioned protocols to provide scalable and modular dynamic verification for MPI programs.
• Scalability enhancement through algorithmic improvements for ISP, a dynamic verifier for MPI programs.
Type Text
Publisher University of Utah
Subject Causality tracking; Correctness checking; MPI; Verification
Dissertation Institution University of Utah
Dissertation Name Doctor of Philosophy
Language eng
Rights Management Copyright © Anh Vo 2011
Format Medium application/pdf
Format Extent 2,278,392 bytes
Identifier us-etd3,36193
Source Original housed in Marriott Library Special Collections, QA3.5 2011 .V59
ARK ark:/87278/s6m04m59
Setname ir_etd
ID 194398
Reference URL https://collections.lib.utah.edu/ark:/87278/s6m04m59