Description |
This dissertation describes the development, implementation, and application of bioinformatic methods and tools. The objective was to provide improved methods for identifying genetic factors underlying common, complex human diseases. The methods were designed for genetic epidemiology applications that analyze candidate genes or regions using single nucleotide polymorphism (SNP) genotype data from resources of independent or pedigree-based subjects. The major theme of the methodological developments was to provide valid joint analyses across multiple SNPs, improving upon standard single-SNP analyses. Methodological developments include a novel haplotype-phasing and association analysis method, a haplotype-mining method, and a haplotype-haplotype interaction method, each of which allows for independent and/or related subjects. The haplotype-phasing algorithm and association method were implemented as additional modules in the Genie software. The haplotype-mining approach was implemented in the program hapConstructor, which uses a stepwise heuristic to search for optimal multi-SNP associations. To explore gene-gene effects and identify interactions between unlinked genetic variants (which may be undetectable by single-SNP or haplotype analyses), we implemented a gene-gene interaction module in hapConstructor. All of these developments were illustrated with applications to real and simulated data to demonstrate their utility and the settings suited to such analyses. The main application was to a two-site breast cancer resource comprising 3,888 subjects with data for 89 tagging SNPs across seven genes in the apoptosis pathway. Applications to colon cancer and chronic lymphocytic leukemia (CLL) resources are also shown. |