Testing and exposing weak graphics processing unit memory models

Publication Type thesis
School or College College of Engineering
Department Computing
Author Sorensen, Tyler Rey
Title Testing and exposing weak graphics processing unit memory models
Date 2014-12
Description Graphics Processing Units (GPUs) are highly parallel shared memory microprocessors, and as such, they are prone to the same concurrency considerations as their traditional multicore CPU counterparts. In this thesis, we consider shared memory consistency, i.e. what values can be read when issued concurrently with writes on current GPU hardware. While memory consistency has been relatively well studied for CPUs, GPUs present substantially different concurrency systems with an explicit thread and memory hierarchy. Because documentation on GPU memory models is limited, it remains unclear what behaviors are allowed by current GPU implementations. To this end, this work focuses on testing shared memory consistency behavior on NVIDIA GPUs. We present a format for describing GPU memory consistency tests (dubbed GPU litmus tests) which includes the placement of testing threads into the GPU thread hierarchy (e.g. cooperative thread arrays, warps) and memory locations into GPU memory regions (e.g. shared, global). We then present a framework for running GPU litmus tests under system stress designed to trigger weak memory model behaviors, that is, executions that do not correspond to an interleaving of the instructions of the concurrent program. We discuss GPU specific incantations (i.e. heuristics) which we found to be crucial for observing weak memory model executions; these include bank conflicts and custom GPU memory stressing functions. We then report the results of running GPU litmus tests in this framework and show that we observe a controversial relaxed coherence behavior on older NVIDIA chips. We present several examples of published GPU applications which may exhibit unintended behavior due to the lack of fence synchronization; one such example is a spin-lock published in the popular CUDA by Example book. We then test several families of tests and compare our results to a proposed operational GPU memory model and show that the model is unsound (i.e. disallows behaviors that we observe on hardware). Our techniques are implemented in a modified version of a memory model testing tool named litmus.
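Illustration (not part of the thesis record): the description defines a weak memory model behavior as an execution that does not correspond to any interleaving of the program's instructions. The minimal Python sketch below makes that definition concrete for a two-thread message-passing test, a standard shape in the memory-model literature chosen here as an assumption; the record does not say which tests the thesis runs. The names `T0`, `T1`, `interleavings`, and `run` are hypothetical, and the encoding is not the thesis's GPU litmus format. It only enumerates interleavings on a simple shared memory; it does not model GPU hardware, thread hierarchy, or memory regions.

```python
from itertools import combinations

# Hypothetical encoding of a message-passing test (an assumed example,
# not the thesis's litmus format). Memory locations x and y start at 0.
T0 = [("store", "x", 1), ("store", "y", 1)]        # writer thread
T1 = [("load", "y", "r0"), ("load", "x", "r1")]    # reader thread

def interleavings(a, b):
    """Yield every merge of a and b that preserves each thread's program order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        chosen = set(slots)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in chosen else next(ib) for i in range(n)]

def run(schedule):
    """Execute one interleaving on a single shared memory; return (r0, r1)."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for op, loc, arg in schedule:
        if op == "store":
            mem[loc] = arg
        else:
            regs[arg] = mem[loc]
    return regs["r0"], regs["r1"]

# Every final state reachable by some interleaving.
sc_outcomes = {run(s) for s in interleavings(T0, T1)}

# The reader sees the flag y set but still reads the old value of x.
weak_outcome = (1, 0)

print(sorted(sc_outcomes))          # [(0, 0), (0, 1), (1, 1)]
print(weak_outcome in sc_outcomes)  # False
```

Because no interleaving produces r0 = 1 with r1 = 0, observing that outcome on hardware would count as a weak behavior in the sense used above.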
Type Text
Subject GPU; Litmus tests; Memory consistency
Dissertation Institution University of Utah
Dissertation Name Master of Science
Language eng
Rights Management Copyright © Tyler Rey Sorensen 2014
Format Medium application/pdf
Format Extent 745,857 bytes
Identifier etd3/id/3258
ARK ark:/87278/s6rz2mbp
Setname ir_etd
ID 196823
Reference URL https://collections.lib.utah.edu/ark:/87278/s6rz2mbp