Scalable formal dynamic verification of MPI programs through distributed causality tracking

Scalable formal dynamic verification of MPI programs through distributed causality tracking

Publication Type	dissertation
School or College	College of Engineering
Department	Computing
Author	Vo, Anh
Title	Scalable formal dynamic verification of MPI programs through distributed causality tracking
Date	2011-08
Description	Almost all high performance computing applications are written in MPI, which will continue to be the case for at least the next several years. Given the huge and growing importance of MPI, and the size and sophistication of MPI codes, scalable and incisive MPI debugging tools are essential. Existing MPI debugging tools have, despite their strengths, many glaring de ficiencies, especially when it comes to debugging under the presence of nondeterminism related bugs, which are bugs that do not always show up during testing. These bugs usually become manifest when the systems are ported to di fferent platforms for production runs. This dissertation focuses on the problem of developing scalable dynamic verifi cation tools for MPI programs that can provide a coverage guarantee over the space of MPI nondeterminism. That is, the tools should be able to detect diff erent outcomes of nondeterministic events in an MPI program and enforce all those di fferent outcomes through repeated executions of the program with the same test harness. We propose to achieve the coverage guarantee by introducing efficient distributed causality tracking protocols that are based on the matches-before order. The matches-before order is introduced to address the shortcomings of the Lamport happens-before order [40], which is not sufficient to capture causality for MPI program executions due to the complexity of the MPI semantics. The two protocols we propose are the Lazy Lamport Clocks Protocol (LLCP) and the Lazy Vector Clocks Protocol (LVCP). LLCP provides good scalability with a small possibility of missing potential outcomes of nondeterministic events while LVCP provides full coverage guarantee with a scalability tradeoff . In practice, we show through our experiments that LLCP provides the same coverage as LVCP. This thesis makes the following contributions: •The MPI matches-before order that captures the causality between MPI events in an MPI execution. • Two distributed causality tracking protocols for MPI programs that rely on the matches-before order. • A Distributed Analyzer for MPI programs (DAMPI), which implements the two aforementioned protocols to provide scalable and modular dynamic verifi cation for MPI programs. • Scalability enhancement through algorithmic improvements for ISP, a dynamic verifi er for MPI programs.
Type	Text
Publisher	University of Utah
Subject	Causality tracking; Correctness checking; MPI; Verification
Dissertation Institution	University of Utah
Dissertation Name	Doctor of Philosophy
Language	eng
Rights Management	Copyright © Anh Vo 2011
Format Medium	application/pdf
Format Extent	2,278,392 bytes
Identifier	us-etd3,36193
Source	Original housed in Marriott Library Special Collections, QA3.5 2011 .V59
ARK	ark:/87278/s6m04m59
Setname	ir_etd
ID	194398
Reference URL	https://collections.lib.utah.edu/ark:/87278/s6m04m59