Taxonomer: a fast and accurate metagenomics tool and its uses on clinical specimens

Publication Type dissertation
School or College School of Medicine
Department Biomedical Informatics
Author Simmon, Keith Eugene
Title Taxonomer: a fast and accurate metagenomics tool and its uses on clinical specimens
Date 2016-05
Description Advances in sequencing technologies have made it possible to generate large amounts of microbiological sequence data without culture methods. The data generated pose a significant analysis challenge, especially in clinical diagnostics, where accurate and timely diagnoses are key. To enable infectious disease diagnostics, we created Taxonomer, a kmer-based metagenomics software tool that can rapidly process large amounts of sequence data with accuracy and precision similar to slower alignment-based approaches. A kmer is a nucleotide subsequence of length k. Kmer exact matching is performed in RAM using data structures with rapid query times, making kmer approaches orders of magnitude faster than alignment methods. Prior to Taxonomer, other kmer-based methods were subject to high false-positive rates. Taxonomer differs by 1) providing a workflow that reduces false positives, 2) including host-transcript profiling, and 3) providing a novel protein kmer tool to identify viruses, which are typically too divergent to identify reliably from nucleotide sequence. A web-based front end was created with the D3-enabled iobio framework. Reference sets used in Taxonomer were obtained from the NCBI, Greengenes, UNITE, and UniProt databases. A wide range of simulated datasets and real clinical specimens were created or obtained to evaluate Taxonomer. Taxonomer was compared to previously published pipelines (SURPI), classifiers (Kraken, RDP classifier), and sequence alignment methods (BLAST, SNAP, RapSearch2, DIAMOND). Taxonomer was also compared to a commercially available respiratory virus panel and applied to a large cohort of pneumonia-positive patients who had previously undergone extensive microbiological diagnostics. Taxonomer showed 98.7% agreement with SURPI in assigning reads at the phylum level. Taxonomer, RDP classifier, and Kraken classified simulated 16S rRNA reads correctly at the species level at rates of 59.5%, 61.7%, and 46.0%, respectively. Protein classification of reads derived from viruses showed sensitivity similar to the alignment-based methods RapSearch2 and DIAMOND, but with slightly decreased analysis times. Taxonomer provides an accurate workflow for processing samples in a diagnostic setting. It identifies bacteria, fungi, viruses, and human transcripts from clinical specimens with accuracy comparable to alignment methods. Its web-based front end makes it accessible to laboratories without significant compute resources.
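The kmer exact matching described in the abstract can be illustrated with a minimal Python sketch. This is not Taxonomer's implementation: the function names (kmers, build_index, classify), the toy sequences, the value k=5, and the simple "most hits wins" rule are all assumptions chosen for illustration. It shows only the general idea of splitting a read into kmers and looking each one up in an in-memory hash table.

```python
from collections import Counter

def kmers(seq, k):
    """Yield every overlapping subsequence of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def build_index(references, k):
    """Map each reference kmer to the set of labels whose sequence contains it."""
    index = {}
    for label, seq in references.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(label)
    return index

def classify(read, index, k):
    """Return the label with the most exact kmer hits, or None if no kmer matches."""
    hits = Counter()
    for kmer in kmers(read, k):
        for label in index.get(kmer, ()):  # constant-time dictionary lookup
            hits[label] += 1
    return hits.most_common(1)[0][0] if hits else None

# Toy, made-up reference sequences (hypothetical, for illustration only).
references = {
    "organism_A": "ACGTACGTGACCTGA",
    "organism_B": "TTGACCAGGTTCAAG",
}
index = build_index(references, k=5)
result = classify("GTGACCTG", index, k=5)
print(result)
```

Because every lookup is an exact dictionary match with no alignment step, the cost per read grows only with the number of kmers in the read, which is the general reason kmer approaches can be much faster than alignment.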
Type Text
Publisher University of Utah
Subject MESH Software; Metagenomics; Sequence Analysis, DNA; Sequence Analysis, RNA; High-Throughput Nucleotide Sequencing; Databases, Nucleic Acid; Algorithms; User-Computer Interface; Gene Expression Profiling; Classification; Bacteria; Fungi; Viruses; Web Browser; Medical Informatics; Datasets as Topic; False Positive Reactions; False Negative Reactions
Dissertation Institution University of Utah
Dissertation Name Doctor of Philosophy
Language eng
Relation is Version of Digital version of Taxonomer: A Fast and Accurate Metagenomics Tool and Its Uses On Clinical Specimens
Rights Management Copyright © Keith Eugene Simmon 2016
Format Medium application/pdf
Format Extent 5,829,440 bytes
Source Original in Marriott Library Special Collections
ARK ark:/87278/s6bp4jb4
Setname ir_etd
ID 1426441
Reference URL https://collections.lib.utah.edu/ark:/87278/s6bp4jb4