Data scalable approach for identifying correlation in large and multidimensional data

Title Data scalable approach for identifying correlation in large and multidimensional data
Publication Type dissertation
School or College College of Engineering
Department Computing
Author Nguyen, Hoa Thanh
Date 2017
Description Correlation is a powerful relationship measure used in many fields to estimate trends and make forecasts. When the data are complex, large, and high dimensional, correlation identification is challenging. Several visualization methods have been proposed to solve these problems, but they all have limitations in accuracy, speed, or scalability. In this dissertation, we propose a methodology that provides new visual designs that show details when possible and aggregates when necessary, along with robust interactive mechanisms that together enable quick identification and investigation of meaningful relationships in large and high-dimensional data. We propose four techniques using this methodology. Depending on data size and dimensionality, the most appropriate visualization technique can be provided to optimize the analysis performance. First, to improve correlation identification tasks between two dimensions, we propose a new correlation task-specific visualization method called the correlation coordinate plot (CCP). CCP transforms data into a powerful coordinate system for estimating the direction and strength of correlations among dimensions. Next, we propose three visualization designs to optimize correlation identification tasks in large and multidimensional data. The first is snowflake visualization (Snowflake), a focus+context layout for exploring all pairwise correlations. The next proposed design is a new interactive design for representing and exploring data relationships in parallel coordinate plots (PCPs) for large data, called data scalable parallel coordinate plots (DSPCP). Finally, we propose a novel technique for storing and accessing the multiway dependencies through visualization (MultiDepViz). We evaluate these approaches through various use cases, compare them to prior work, and conduct user studies to demonstrate how our proposed approaches help users explore correlation in large data efficiently.
Our results confirmed that CCP/Snowflake, DSPCP, and MultiDepViz methods outperform some current visualization techniques such as scatterplots (SCPs), PCPs, SCP matrix, Corrgram, Angular Histogram, and UntangleMap in both accuracy and timing. Finally, these approaches are applied in real-world applications such as a debugging tool, large-scale code performance data, and large-scale climate data.
Type Text
Publisher University of Utah
Subject Correlation Visualization; Data Scalable visualization; Data Visualization; Multidimensional Data
Dissertation Name Doctor of Philosophy
Language eng
Rights Management ©Hoa Thanh Nguyen
Format application/pdf
Format Medium application/pdf
ARK ark:/87278/s68w7jp6
Setname ir_etd
ID 1348632
Reference URL https://collections.lib.utah.edu/ark:/87278/s68w7jp6