Improved methods for next generation sequencing-based conotoxin discovery

Title	Improved methods for next generation sequencing-based conotoxin discovery
Publication Type	dissertation
School or College	School of Medicine
Department	Human Genetics
Author	Li, Qing
Date	2017
Description	Cone snails (genus Conus) have attracted scientific interest because their venoms, complex mixtures of peptides known as conotoxins, hold great neuropharmacological potential for treating chronic pain. For discovery purposes, we surveyed the venom ducts of 22 Conus species using high-throughput next generation RNA sequencing (RNAseq). In silico analysis of these data is complicated because paralogous conotoxin precursors contain both highly conserved and hypervariable regions. As a result, NGS-based discovery involves an inherent trade-off between the fidelity of transcript assembly and sensitivity to novel sequences. On the one hand, overly lenient assembly parameters produce a few long but misassembled, chimeric transcripts, which reduce the true discovery potential of NGS. On the other hand, overly stringent assembly parameters can mistake sequencing artifacts for novel discoveries. Moreover, many conotoxins likely remain undiscovered. This complicates homology-based discovery with tools such as BLAST, because reference databases may lack homologs of novel peptides, leading to false negatives. With these problems in mind, I developed a comprehensive pipeline for discovering conotoxins and their modification enzymes from high-throughput RNAseq data. The pipeline includes (1) simulation software for benchmarking, (2) a 'partial extension pipeline' that uses a novel k-merization tool called Taxonomer to rapidly cluster and taxonomically classify reads before assembly, and (3) a discovery engine that can identify novel conotoxins even when they lack significant homologs. Collectively, the pipeline maximizes the discovery potential of Conus RNAseq data, identifying on average approximately 30% more full-length toxins per sample than any other approach in use today.
Type	Text
Publisher	University of Utah
Subject	Bioinformatics
Dissertation Name	Doctor of Philosophy
Language	eng
Rights Management	(c) Qing Li
Format	application/pdf
Format Medium	application/pdf
ARK	ark:/87278/s65j2b2c
Setname	ir_etd
ID	1484650
Reference URL	https://collections.lib.utah.edu/ark:/87278/s65j2b2c