Large-scale distributed runtime system for dag-based computational framework

Update Item Information
Publication Type dissertation
School or College College of Engineering
Department Computing
Author Meng, Qingyu
Title Large-scale distributed runtime system for dag-based computational framework
Date 2014-08
Description Recent trends in high performance computing present larger and more diverse computers using multicore nodes possibly with accelerators and/or coprocessors and reduced memory. These changes pose formidable challenges for applications code to attain scalability. Software frameworks that execute machine-independent applications code using a runtime system that shields users from architectural complexities oer a portable solution for easy programming. The Uintah framework, for example, solves a broad class of large-scale problems on structured adaptive grids using fluid-flow solvers coupled with particle-based solids methods. However, the original Uintah code had limited scalability as tasks were run in a predefined order based solely on static analysis of the task graph and used only message passing interface (MPI) for parallelism. By using a new hybrid multithread and MPI runtime system, this research has made it possible for Uintah to scale to 700K central processing unit (CPU) cores when solving challenging fluid-structure interaction problems. Those problems often involve moving objects with adaptive mesh refinement and thus with highly variable and unpredictable work patterns. This research has also demonstrated an ability to run capability jobs on the heterogeneous systems with Nvidia graphics processing unit (GPU) accelerators or Intel Xeon Phi coprocessors. The new runtime system for Uintah executes directed acyclic graphs of computational tasks with a scalable asynchronous and dynamic runtime system for multicore CPUs and/or accelerators/coprocessors on a node. Uintah's clear separation between application and runtime code has led to scalability increases without significant changes to application code. This research concludes that the adaptive directed acyclic graph (DAG)-based approach provides a very powerful abstraction for solving challenging multiscale multiphysics engineering problems. Excellent scalability with regard to the different processors and communications performance are achieved on some of the largest and most powerful computers available today.
Type Text
Publisher University of Utah
Subject Distributed system; Heterogeneous computing; MPI; Multithread; Uintah
Dissertation Institution University of Utah
Dissertation Name Doctor of Philosophy
Language eng
Rights Management Copyright © Qingyu Meng 2014
Format Medium application/pdf
Format Extent 3,531,336 bytes
Identifier etd3/id/3106
ARK ark:/87278/s6cg2z8p
Setname ir_etd
ID 196674
Reference URL https://collections.lib.utah.edu/ark:/87278/s6cg2z8p
Back to Search Results