Improved methods for next generation sequencing-based conotoxin discovery

Publication Type dissertation
School or College School of Medicine
Department Human Genetics
Author Li, Qing
Title Improved methods for next generation sequencing-based conotoxin discovery
Date 2017
Description Cone snails (genus Conus) have attracted scientific interest because their venoms, complex mixtures of peptides known as conotoxins, have great neuropharmacological potential, including for the treatment of chronic pain. For discovery purposes, we surveyed the venom ducts of 22 Conus species using next-generation, high-throughput RNAseq (NGS). In silico analyses of these data are complicated because paralogous conotoxin precursors display both highly conserved and hypervariable regions. As a result, NGS-based discovery involves an inherent trade-off between fidelity of transcript assembly and sensitivity to novel sequences. On the one hand, overly lenient assembly parameters produce a few long but misassembled chimeric transcripts, which lessens the true discovery potential of NGS. On the other hand, overly stringent assembly parameters can mistake sequencing artifacts for novel discoveries. Moreover, many conotoxins likely remain undiscovered, which complicates homology-based discovery with tools such as BLAST: reference databases may lack homologous peptides, leading to false-negative results. With these problems in mind, I developed a comprehensive pipeline for discovering conotoxins and their modification enzymes from high-throughput RNAseq data. My pipeline includes (1) simulation software for benchmarking, (2) a ‘partial extension pipeline’ that employs a novel kmerization tool called Taxonomer to rapidly cluster and taxonomically classify reads prior to assembly, and (3) a discovery engine that can identify novel conotoxins even when they lack significant homologs. Collectively, my pipeline maximizes the discovery potential of Conus RNAseq data, identifying on average ~30% more full-length toxins per sample than any other approach in use today.
Type Text
Publisher University of Utah
Subject Bioinformatics
Dissertation Name Doctor of Philosophy
Language eng
Rights Management (c) Qing Li
Format Medium application/pdf
ARK ark:/87278/s65j2b2c
Setname ir_etd
ID 1484650
Reference URL https://collections.lib.utah.edu/ark:/87278/s65j2b2c