Supporting scalable data analytics on large linked data

Title	Supporting scalable data analytics on large linked data
Publication Type	dissertation
School or College	College of Engineering
Department	Computing
Author	Le, Wangchao
Date	2013-12
Description	Linked data are the de facto standard for publishing and sharing data on the web. To date, we have been inundated with large amounts of ever-increasing linked data in constantly evolving structures. The proliferation of the data and the need to access and harvest knowledge from distributed data sources motivate us to revisit several classic problems in query processing and query optimization. The problem of answering queries over views is commonly encountered in a number of settings, including when enforcing security policies for accessing linked data or when integrating data from disparate sources. We approach this problem by efficiently rewriting queries over the views into equivalent queries over the underlying linked data, thus avoiding the costs entailed by view materialization and maintenance. An outstanding problem of query rewriting is that the number of rewritten queries is exponential in the size of the query and the views, which motivates us to study the problem of multiquery optimization in the context of linked data. Our solutions are declarative and make no assumptions about the underlying storage, i.e., they are store-independent. Unlike relational and XML data, linked data are schema-less. Because tracking the schema evolution of linked data is hard, keyword search is an ideal tool for performing data integration. Existing works make crippling assumptions about the data and hence fall short in handling massive linked data with tens to hundreds of millions of facts. Our study of keyword search on linked data brings together classical techniques from the literature and our novel ideas, which leads to much better query efficiency and result quality. Linked data also contain rich temporal semantics.
To cope with the ever-increasing data, we have investigated how to partition and store large temporal or multiversion linked data for distributed and parallel computation, in an effort to achieve load balancing and support scalable data analytics for massive linked data.
Type	Text
Publisher	University of Utah
Subject	Database systems; Linked data; Query optimization; Query rewriting; RDF; SPARQL
Dissertation Institution	University of Utah
Dissertation Name	Doctor of Philosophy
Language	eng
Rights Management	Copyright © Wangchao Le 2013
Format	application/pdf
Format Medium	application/pdf
Format Extent	2,009,415 bytes
Identifier	etd3/id/2660
ARK	ark:/87278/s6380hw2
Setname	ir_etd
ID	196235
Reference URL	https://collections.lib.utah.edu/ark:/87278/s6380hw2
