Reducing irregularities in control flow and memory access on graphics processing unit architectures

Publication Type	dissertation
School or College	College of Engineering
Department	Computing
Author	King, James Sokhom
Title	Reducing irregularities in control flow and memory access on graphics processing unit architectures
Date	2017
Description	Memory access irregularities are a major bottleneck for bandwidth-limited problems on Graphics Processing Unit (GPU) architectures. GPU memory systems are designed so that consecutive memory accesses can be coalesced into a single memory transaction. Noncontiguous accesses within a parallel group of threads executing in lockstep may instead cause serialized memory transfers. Irregular algorithms may have data-dependent control flow and memory access, which can only be evaluated with runtime information. Compile-time methods for evaluating parallelism, such as static dependence graphs, are therefore not capable of evaluating irregular algorithms. The goals of this dissertation are to study irregularities within the context of unstructured mesh and sparse matrix problems, to analyze the impact of vectorization widths on irregularities, and to present data-centric methods that reduce control flow and memory access irregularities within those contexts. Reordering associative operations has often been exploited for performance gains in parallel algorithms. This dissertation presents a method for associative reordering of stencil computations over unstructured meshes that increases data reuse through caching. This novel parallelization scheme offers considerable speedups over standard methods. Vectorization widths can have a significant impact on the performance of vectorized computations. Although the hardware vector width is generally fixed, the logical vector width used within a computation can range from one up to the width of the computation. Depending on the logical width chosen, significant performance differences can arise from thread scheduling and resource limitations. This dissertation analyzes the impact of vectorization widths on dense numerical computations such as 3D discontinuous Galerkin (dG) postprocessing. Dynamic updates are difficult to perform efficiently on traditional sparse matrix formats. Explicitly controlling memory segmentation allows for in-place dynamic updates in sparse matrices.
Dynamically updating the matrix without rebuilding or sorting greatly reduces processing time and improves overall throughput. This dissertation presents a new sparse matrix format, dynamic compressed sparse row (DCSR), which allows for dynamic streaming updates to a sparse matrix. A new method for parallel sparse matrix-matrix multiplication (SpMM) that uses dynamic updates is also presented.
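The following Python sketch is a rough illustration of how explicit memory segmentation can permit in-place updates. It stores each row in one or more fixed-capacity segments drawn from a shared pool. The class SegmentedRowMatrix, its seg_capacity parameter, and the rule that a duplicate entry is accumulated are hypothetical choices made for this example only. This is not the DCSR layout from the dissertation, and it runs sequentially on the CPU rather than in parallel on a GPU.

```python
from dataclasses import dataclass, field


@dataclass
class SegmentedRowMatrix:
    """Hypothetical segmented CSR-like matrix (illustration only, not DCSR).

    Each row owns one or more fixed-capacity segments carved from a shared
    pool, so inserts happen in place instead of rebuilding and re-sorting.
    """
    n_rows: int
    seg_capacity: int = 4
    cols: list = field(default_factory=list)      # shared column-index pool
    vals: list = field(default_factory=list)      # shared value pool
    segments: list = field(default_factory=list)  # per row: list of [start, used]

    def __post_init__(self):
        self.segments = [[] for _ in range(self.n_rows)]
        for r in range(self.n_rows):
            self._new_segment(r)

    def _new_segment(self, row):
        # Append an empty segment to the end of the shared pool for this row.
        start = len(self.cols)
        self.cols.extend([-1] * self.seg_capacity)
        self.vals.extend([0.0] * self.seg_capacity)
        self.segments[row].append([start, 0])

    def insert(self, row, col, val):
        # If the entry already exists, accumulate into it in place.
        for start, used in self.segments[row]:
            for k in range(start, start + used):
                if self.cols[k] == col:
                    self.vals[k] += val
                    return
        # Otherwise write into the row's last segment, growing it if full.
        seg = self.segments[row][-1]
        if seg[1] == self.seg_capacity:
            self._new_segment(row)
            seg = self.segments[row][-1]
        k = seg[0] + seg[1]
        self.cols[k] = col
        self.vals[k] = val
        seg[1] += 1

    def row_entries(self, row):
        # Entries are stored unsorted; sort only when reading a row back.
        return sorted((self.cols[k], self.vals[k])
                      for start, used in self.segments[row]
                      for k in range(start, start + used))

    def to_dense(self, n_cols):
        dense = [[0.0] * n_cols for _ in range(self.n_rows)]
        for r in range(self.n_rows):
            for c, v in self.row_entries(r):
                dense[r][c] = v
        return dense


m = SegmentedRowMatrix(n_rows=2, seg_capacity=2)
for c in (3, 0, 5):
    m.insert(0, c, 1.0)   # the third insert overflows row 0's first segment
m.insert(0, 3, 2.0)       # existing entry: accumulated in place
print(m.row_entries(0))   # [(0, 1.0), (3, 3.0), (5, 1.0)]
print(len(m.segments[0])) # 2
```

When a row fills up, it simply receives a new segment at the end of the pool, so no existing entries are moved or re-sorted. The trade-off is that a row's entries are no longer contiguous in memory. In the example, row 1's segment sits between row 0's two segments, which reintroduces some of the access irregularity described above.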
Type	Text
Publisher	University of Utah
Subject	GPU; Linear Algebra; SIMD
Dissertation Name	Doctor of Philosophy
Language	eng
Rights Management	©James Sokhom King
Format Medium	application/pdf
ARK	ark:/87278/s62r7x0j
Setname	ir_etd
ID	1347651
Reference URL	https://collections.lib.utah.edu/ark:/87278/s62r7x0j
