Description |
The decreasing cost of next-generation sequencing (NGS) has led to its increased use in healthcare to aid in developing patient diagnoses and treatment plans. Sequence data analysis comprises sequence alignment, variant calling, variant annotation, filtering of results, and clinical interpretation of variants. Multiple software tools are available to perform each of these steps, and they can be chained together to create an analysis pipeline. Different pipelines can produce different results, which could affect genomic variant interpretation and patient outcomes. One source of variation is variant annotation, the prediction of the effect of a variant with respect to a reference sequence. This dissertation aims to address the discrepancies in genomic variant representation that arise from variant annotation. First, sequencing technologies and their use in healthcare, the variant annotation process, standards used in variant annotation, and ways to measure discrepancies among variant effect annotations are reviewed. Next, the differences between variant annotation tool terminologies are quantified, and the unification of these terminologies using the Sequence Ontology is presented. Then, a systematic comparison of variant annotation tools commonly used in research and clinical NGS pipelines is presented. Finally, an evaluation framework, which includes a set of test cases and a REST API, is presented for determining discrepancies between tools that generate variant annotations in Human Genome Variation Society (HGVS) format. |