Efficient summarization techniques for massive data

Title Efficient summarization techniques for massive data
Publication Type dissertation
School or College College of Engineering
Department Computing
Author Jestes, Jeffrey
Date 2013-12
Description We are living in an age in which data are generated faster than anyone previously imagined, across a broad range of application domains including customer studies, social media, sensor networks, and the sciences, among many others. In some cases, data are generated in massive quantities, on the order of terabytes or petabytes. Massive data pose numerous emerging challenges, including: (1) the explosion in the size of data; (2) increasingly complex structures and rich semantics, such as temporal data expressed in a piecewise linear representation; (3) uncertain data, which are becoming common in numerous applications, e.g., scientific measurements or observations such as meteorological measurements; and (4) increasingly distributed data, e.g., data collected and integrated from multiple locations, as well as data stored in a distributed file system within a cluster. Because of the massive nature of modern data, it is often infeasible to manage and query them exactly and efficiently. An attractive alternative is to use data summarization techniques to construct data summaries, although, given the enormous size of the data, even constructing these summaries efficiently is a challenging task. The data summaries this thesis focuses on are the histogram and the ranking operator. Both summarize a massive dataset into a far more succinct representation, which can then be used to answer queries orders of magnitude more efficiently while still providing approximation guarantees on the query answers. Our study focuses on the critical task of designing efficient algorithms to summarize, query, and manage massive data.
Type Text
Publisher University of Utah
Subject Big data; Data analytics; Efficient queries; Massive data; Summaries; Computer science
Dissertation Institution University of Utah
Dissertation Name Doctor of Philosophy
Language eng
Rights Management © Jeffrey Jestes
Format application/pdf
Format Medium application/pdf
Format Extent 1,814,296 bytes
Identifier etd3/id/2645
ARK ark:/87278/s61g3vdh
Setname ir_etd
ID 196220
Reference URL https://collections.lib.utah.edu/ark:/87278/s61g3vdh