Ontology based data integration (OBDI): a pipeline to integrate and model high dimensional data

Publication Type	dissertation
School or College	School of Medicine
Department	Biomedical Informatics
Author	Raghunath, Sharanya
Title	Ontology based data integration (OBDI): a pipeline to integrate and model high dimensional data
Date	2014-12
Description	Gene expression data repositories provide large and ever-increasing volumes of data for secondary use by translational informatics methods. For example, Gene Expression Omnibus (GEO) houses over 37,000 experiments with the goal of supporting further research. To use these published results in a larger meta-analysis, the data must be consolidated; however, the data are largely unstructured, which hinders data integration efforts. Here, I propose a novel pipeline, Ontology Based Data Integration (OBDI), which uses an ontological approach to combine samples across multiple GEO experiments. The OBDI pipeline uses machine learning algorithms that permit researchers to consolidate and analyze data across GEO experiments. I demonstrate how an ontological approach to integrating samples across experiments can be used to explore the immune response at a molecular level. As part of this process, a Web Ontology Language (OWL) ontology was developed for each data platform used. OWL serves as a core component in successfully processing different sample types. Immunological experiments from GEO were consolidated to evaluate this methodology. The experiments included samples analyzed on expression arrays, BeadChips, and sequencing technologies. The integration of a complex biological system and the incorporation of different biological data types will validate the potential of OBDI. Biological data are high dimensional by nature. OBDI incorporates tools and techniques that can handle the analysis of various biological data. The machine learning analysis performed within the OBDI pipeline successfully evaluated the newly annotated experiments and provides insights that can be further explored. The OBDI pipeline can help researchers annotate experiments using ontologies and analyze the annotated experiments.
To build the pipeline, ontologies served as the backbone for integrating samples from GEO Series records into machine learning experiments using ML-Flex. By using the OBDI pipeline, researchers can access the uncurated experiments from GEO (GEO Data Series) and annotate the data using the terms in the ontologies. This mechanism allows data sets to be organized in relation to new experiments, independent of GEO's GDS curation process. The OBDI system allows ontologies to grow organically around a cluster of experiments. These experiments are then further analyzed in ML-Flex using machine learning algorithms. The curated experiments are analyzed in silico, and the computational analyses are supported by the OBDI ontological system.
Type	Text
Publisher	University of Utah
Subject MESH	Biological Ontologies; Computational Biology; Database Management Systems; Algorithms; Machine Learning; Programming Languages; Software; User-Computer Interface; Systems Integration; Gene Expression Profiling; Immunity, Active; T-Lymphocyte Subsets; Databases, Genetic; Data Analysis; Microarray Analysis; Molecular Sequence Data; Meta-Analysis as Topic; Databases as Topic; Datasets as Topic; Data Curation; Semantic Web
Dissertation Institution	University of Utah
Dissertation Name	Doctor of Philosophy
Language	eng
Relation is Version of	Digital reproduction of Ontology Based Data Integration (OBDI): A Pipeline to Integrate and Model High Dimensional Data
Rights Management	Copyright © Sharanya Raghunath 2014
Format Medium	application/pdf
Format Extent	3,583,085 bytes
Source	Original in Marriott Library Special Collections
ARK	ark:/87278/s6fb9h86
Setname	ir_etd
ID	1422297
Reference URL	https://collections.lib.utah.edu/ark:/87278/s6fb9h86
