Ontology based data integration (OBDI): a pipeline to integrate and model high dimensional data

Publication Type dissertation
School or College School of Medicine
Department Biomedical Informatics
Author Raghunath, Sharanya
Title Ontology based data integration (OBDI): a pipeline to integrate and model high dimensional data
Date 2014-12
Description Gene expression data repositories provide large and ever-increasing volumes of data for secondary use by translational informatics methods. For example, Gene Expression Omnibus (GEO) houses over 37,000 experiments with the goal of supporting further research. To use these published results in a larger meta-analysis, consolidation of the data is needed; however, the data are largely unstructured, which hinders data integration efforts. Here, I propose the use of a novel pipeline, Ontology Based Data Integration (OBDI), which uses an ontological approach to combine the samples across multiple GEO experiments. The OBDI pipeline uses machine learning algorithms that permit researchers to consolidate and analyze data across GEO experiments. Here, I demonstrate how using an ontological approach to integrate samples across experiments can be used to explore the immune response at a molecular level. As part of this process, a Web Ontology Language (OWL) ontology was developed for each data platform used. These OWL ontologies serve as a core component in successfully processing different sample types. Immunological experiments from GEO were consolidated to evaluate this methodology. The experiments included samples analyzed on expression arrays, BeadChips, and sequencing technologies. The integration of a complex biological system and the incorporation of different biological data types will validate the potential of OBDI. Biological data are highly dimensional by nature. OBDI incorporates tools and techniques that can handle the analysis of various biological data. The machine learning analysis performed within the OBDI pipeline successfully evaluated the newly annotated experiments and provides insights that can be further explored. The OBDI pipeline can help researchers annotate experiments using ontologies and analyze the annotated experiments.
To successfully build the pipeline, ontologies served as the backbone of integrating samples from GEO Series records into machine learning experiments using ML-Flex. By using the OBDI pipeline, researchers can access the uncurated experiments from GEO (GEO Data Series) and annotate the data using the terms in the ontologies. This mechanism allows for the organization of data sets in relationship to new experiments independent of GEO's GDS curation process. The OBDI system allows ontologies to grow organically around a cluster of experiments. These experiments are then further analyzed in ML-Flex using machine learning algorithms. The curated experiments are analyzed in silico and the computational analyses are supported by the OBDI ontological system.
Type Text
Publisher University of Utah
Subject MESH Biological Ontologies; Computational Biology; Database Management Systems; Algorithms; Machine Learning; Programming Languages; Software; User-Computer Interface; Systems Integration; Gene Expression Profiling; Immunity, Active; T-Lymphocyte Subsets; Databases, Genetic; Data Analysis; Microarray Analysis; Molecular Sequence Data; Meta-Analysis as Topic; Databases as Topic; Datasets as Topic; Data Curation; Semantic Web
Dissertation Institution University of Utah
Dissertation Name Doctor of Philosophy
Language eng
Relation is Version of Digital reproduction of Ontology Based Data Integration (OBDI): A Pipeline to Integrate and Model High Dimensional Data
Rights Management Copyright © Sharanya Raghunath 2014
Format Medium application/pdf
Format Extent 3,583,085 bytes
Source Original in Marriott Library Special Collections
ARK ark:/87278/s6fb9h86
Setname ir_etd
ID 1422297
Reference URL https://collections.lib.utah.edu/ark:/87278/s6fb9h86