Testing and exposing weak graphics processing unit memory models

Testing and exposing weak graphics processing unit memory models

Title	Testing and exposing weak graphics processing unit memory models
Publication Type	thesis
School or College	College of Engineering
Department	Computing
Author	Sorensen, Tyler Rey
Date	2014-12
Description	Graphics Processing Units (GPUs) are highly parallel shared memory microprocessors, and as such, they are prone to the same concurrency considerations as their traditional multicore CPU counterparts. In this thesis, we consider shared memory consistency, i.e. what values can be read when issued concurrently with writes on current GPU hardware. While memory consistency has been relatively well studied for CPUs, GPUs present substantially different concurrency systems with an explicit thread and memory hierarchy. Because documentation on GPU memory models is limited, it remains unclear what behaviors are allowed by current GPU implementations. To this end, this work focuses on testing shared memory consistency behavior on NVIDIA GPUs. We present a format for describing GPU memory consistency tests (dubbed GPU litmus tests) which includes the placement of testing threads into the GPU thread hierarchy (e.g. cooperative thread arrays, warps) and memory locations into GPU memory regions (e.g. shared, global). We then present a framework for running GPU litmus tests under system stress designed to trigger weak memory model behaviors, that is, executions that do not correspond to an interleaving of the instructions of the concurrent program. We discuss GPU specific incantations (i.e. heuristics) which we found to be crucial for observing weak memory model executions; these include bank conflicts and custom GPU memory stressing functions. We then report the results of running GPU litmus tests in this framework and show that we observe a controversial relaxed coherence behavior on older NVIDIA chips. We present several examples of published GPU applications which may exhibit unintended behavior due to the lack of fence synchronization; one such example is a spin-lock published in the popular CUDA by Example book. We then test several families of tests and compare our results to a proposed operational GPU memory model and show that the model is unsound (i.e. disallows behaviors that we observe on hardware). Our techniques are implemented in a modified version of a memory model testing tool named litmus.
Type	Text
Subject	GPU; Litmus tests; Memory consistency
Dissertation Institution	University of Utah
Dissertation Name	Master of Science
Language	eng
Rights Management	Copyright © Tyler Rey Sorensen 2014
Format	application/pdf
Format Medium	application/pdf
Format Extent	745,857 bytes
Identifier	etd3/id/3258
ARK	ark:/87278/s6rz2mbp
Setname	ir_etd
ID	196823
Reference URL	https://collections.lib.utah.edu/ark:/87278/s6rz2mbp

Back to Search Results