Description
The rapidly decreasing cost of sequencing is revolutionizing genetics. Two applications of next-generation sequencing data are of particular importance in this regard. First, high-throughput sequencing now offers a fast and inexpensive means to investigate the genomes and genetics of nonmodel organisms. Second, human personal genomics data offer a unique opportunity for discovering the genetic basis of human traits and diseases. My PhD research has focused on developing computational methods to study genetics using next-generation sequencing data. In the first chapter of my thesis, I present a series of genome-based studies of the venomous cone snail Conus bullatus, a source of pharmaceutically important small cysteine-rich peptides called conopeptides or conotoxins. Using high-coverage transcriptome sequence data from its venom duct together with low-coverage genomic reads, I have developed new methods to characterize key genomic traits in the absence of a complete reference genome, including genome size, sequence diversity, repeat content and mobile element densities. I have also developed an in silico transcriptomics pipeline for conotoxin discovery. I have used it to identify novel conotoxins as well as candidate enzymes that are likely to be involved in the posttranslational processing of conotoxins. In the second and third chapters of my thesis, I describe a probabilistic disease-gene search algorithm, VAAST (the Variant Annotation, Analysis and Search Tool), for finding damaged genes and their disease-causing variants. I also describe a powerful new extension to the original code base, called VAAST 2.0. In these chapters, I demonstrate that VAAST is both an accurate finder of rare Mendelian disease genes and a powerful means of identifying genes and alleles underlying common diseases.
I have also carried out systematic population-genetic simulations to benchmark the performance of VAAST and VAAST 2.0 under different genetic scenarios. These simulations demonstrate that VAAST 2.0 is the most robust and broadly applicable method available today for identifying genes involved in common genetic diseases such as breast cancer, hypertriglyceridemia and Crohn disease.