| Title | Genetic variation in toll-like receptor 7 and toll-like receptor 8 in humans and howler monkeys and potential implications for susceptibility to yellow fever virus |
| Publication Type | dissertation |
| School or College | College of Social & Behavioral Science |
| Department | Anthropology |
| Author | Torosin, Nicole S. |
| Date | 2019 |
| Description | Yellow fever virus (YFV) is a reemerging disease affecting 200,000 people annually in Africa and South America. In order to investigate genetic factors underlying YFV susceptibility, I studied Toll-like receptor (TLR) 7 and TLR8 in primates that live within the same geographic region and share the pathogenic pressure of YFV. In all primates, single-stranded RNA (ssRNA) viruses such as YFV are detected by TLR7 and TLR8, making them likely candidates for mediation of YFV susceptibility. In 2007-2009, a major YFV outbreak in Northern Argentina decimated howler monkey (Alouatta) populations. I explored the relationship between TLR7 and TLR8 genes and YFV susceptibility using samples from Alouatta individuals alive before the YFV outbreak, individuals that were exposed to the outbreak, individuals alive after the outbreak, as well as nearby human populations. I compared patterns of selection on the genus Alouatta to other primates, measured genetic divergence between Alouatta YFV exposure groups, and evaluated genetic differences in Alouatta for functional consequences. I also used published DNA sequences from Native Americans to measure positive selection and differentiation of sites within TLR7 and TLR8 in populations in YFV endemic areas. While I did not identify common sites under selection in humans and howler monkeys, I discovered 1) that TLR7 is under strong purifying selection in the Alouatta genus; 2) that the species Alouatta guariba clamitans has positively selected sites within functionally important TLR7 regions not seen in any other primates; 3) that in Native American human populations in YFV endemic areas, the minor allele at rs179008 in TLR7, associated with progression of ssRNA viruses, is at the greatest frequency. |
| Type | Text |
| Publisher | University of Utah |
| Dissertation Name | Doctor of Philosophy |
| Language | eng |
| Rights Management | © Nicole S. Torosin |
| Format | application/pdf |
| Format Medium | application/pdf |
| ARK | ark:/87278/s6zq07f5 |
| Setname | ir_etd |
| ID | 1714358 |
| OCR Text | Show GENETIC VARIATION IN TOLL-LIKE RECEPTOR 7 AND TOLL-LIKE RECEPTOR 8 IN HUMANS AND HOWLER MONKEYS AND POTENTIAL IMPLICATIONS FOR SUSCEPTIBILITY TO YELLOW FEVER VIRUS by Nicole S. Torosin A dissertation submitted to the faculty of The University of Utah in partial fulfillment of the requirements for the degree of Doctor of Philosophy Department of Anthropology The University of Utah August 2019 Copyright c Nicole S. Torosin 2019 All Rights Reserved The University of Utah Graduate School STATEMENT OF DISSERTATION APPROVAL The dissertation of Nicole S. Torosin has been approved by the following supervisory committee members: Leslie A. Knapp , Chair(s) 30 April 2019 Date Approved Patrice Showers Corneli , Member 30 April 2019 Date Approved Timothy H. Webster , Member 30 April 2019 Date Approved Lynn B. Jorde , Member 30 April 2019 Date Approved Dennis H. O’Rourke , Member 30 April 2019 Date Approved by Leslie A. Knapp , Chair/Dean of the Department/College/School of Anthropology and by David B. Kieda , Dean of The Graduate School. ABSTRACT Yellow fever virus (YFV) is a reemerging disease affecting 200,000 people annually in Africa and South America. In order to investigate genetic factors underlying YFV susceptibility, I studied Toll-like receptor (TLR) 7 and TLR8 in primates that live within the same geographic region and share the pathogenic pressure of YFV. In all primates, single-stranded RNA (ssRNA) viruses such as YFV are detected by TLR7 and TLR8, making them likely candidates for mediation of YFV susceptibility. In 2007-2009, a major YFV outbreak in Northern Argentina decimated howler monkey (Alouatta) populations. I explored the relationship between TLR7 and TLR8 genes and YFV susceptibility using samples from Alouatta individuals alive before the YFV outbreak, individuals that were exposed to the outbreak, individuals alive after the outbreak, as well as nearby human populations. I compared patterns of selection on the genus Alouatta to other primates, measured genetic divergence between Alouatta YFV exposure groups, and evaluated genetic differences in Alouatta for functional consequences. I also used published DNA sequences from Native Americans to measure positive selection and differentiation of sites within TLR7 and TLR8 in populations in YFV endemic areas. While I did not identify common sites under selection in humans and howler monkeys, I discovered 1) that TLR7 is under strong purifying selection in the Alouatta genus; 2) that the species Alouatta guariba clamitans has positively selected sites within functionally important TLR7 regions not seen in any other primates; 3) that in Native American human populations in YFV endemic areas, the minor allele at rs179008 in TLR7, associated with progression of ssRNA viruses, is at the greatest frequency. CONTENTS ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii ACKNOWLEDGEMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix CHAPTERS 1. 2. 3. INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1 1.2 1.3 1.4 History and significance of YFV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Primate susceptibility to YFV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Toll-like receptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The comparative approach and this dissertation . . . . . . . . . . . . . . . . . . . . . . . . 1 3 4 5 COMPARING THE SELECTIVE LANDSCAPE OF TLR7 AND TLR8 ACROSS PRIMATES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 2.1 Materials and methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.1.1 Sample collection and processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.1.2 Published data and sequence alignment . . . . . . . . . . . . . . . . . . . . . . . . . . 2.1.3 Phylogenetic analyses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.1.4 Maximum likelihood selection analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 2.1.5 Functional assessment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2.1 Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2.2 Phylogenetic trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2.3 Substitution rates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2.4 Branch site test and functional analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 9 10 11 11 11 12 12 12 12 13 14 INDENTIFYING FUNCTIONALLY IMPORTANT VARIANTS IN TLR7 AND TLR8 IN A. CARAYA AND A. GUARIBA CLAMITANS IN NORTHERN ARGENTINA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 3.1 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.1.1 Sample background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.1.2 Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.1.3 Sample verification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.1.4 Shotgun sequencing and data processing . . . . . . . . . . . . . . . . . . . . . . . . . 3.1.5 Targeted sequencing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.1.6 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 22 22 23 24 24 25 26 3.2.1 Samples and sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2.2 PCA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2.3 Diversity and divergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2.4 Coding sequence variation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4. EXPLORING TOLL-LIKE RECEPTOR 7 AND 8 VARIATION IN GLOBAL HUMAN POPULATIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 4.1 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.1.1 Brazilian Native American data and processing . . . . . . . . . . . . . . . . . . . . 4.1.2 SGDP data and processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.1.3 Brazilian Native American data and SGDP data processing and phasing 4.1.4 Fst results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.1.5 Differentiation of variants in YFV endemic Native American populations 4.1.6 Integrated selection of allele favored by evolution (iSAFE) . . . . . . . . . . . 4.1.7 Integrated haplotype score (iHS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.1.8 Population branch statistic (PBS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.1.9 Cross population extended haplotype homozygosity (XP-EHH) . . . . . . 4.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2.1 Fst results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2.2 Differentiation of variants in YFV endemic Native American populations 4.2.3 iSAFE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2.4 iHS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2.5 PBS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2.6 XP-EHH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.3.1 Methodological considerations with X chromosome analyses . . . . . . . . . 5. 26 26 27 27 28 35 35 35 36 36 36 37 37 38 38 39 39 39 40 40 40 41 41 42 CONCLUSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 APPENDICES A. CHAPTER 2 APPENDIX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 B. CHAPTER 3 APPENDIX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 C. CHAPTER 4 APPENDIX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80 REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 v LIST OF FIGURES 1.1 TLR7 and TLR8 innate immunity signaling pathway. . . . . . . . . . . . . . . . . . . . . . 7 1.2 TLR7 and TLR8 adaptive immunity signaling pathway. . . . . . . . . . . . . . . . . . . . 7 2.1 TLR7 maximum likelihood phylogenetic tree. . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 2.2 TLR8 maximum likelihood phylogenetic tree. . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 3.1 Yellow fever virus transmission cycle between humans and howler monkeys. . 31 3.2 Map of samples used in this study. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 4.1 Current YFV endemic range highlighted in pale yellow . . . . . . . . . . . . . . . . . . . 46 4.2 Historic YFV endemic range highlighted in pale yellow . . . . . . . . . . . . . . . . . . . 47 4.3 Neighbor-joining tree of relationships between East Asian and American populations based on pairwise divergence. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 4.4 Human migration into North America. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 4.5 iSAFE on TLR7 and TLR8 in Brazilian Native American populations. . . . . . . . . 49 4.6 iSAFE on TLR7 and TLR8 in SGDP Native American populations within YFV endemic region. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 4.7 iSAFE on TLR7 and TLR8 in Brazilian Native American populations and SGDP Native American populations in modern YFV endemic range. . . . . . . . . 50 4.8 iSAFE on TLR7 and TLR8 in Brazilian Native American populations and SGDP Native American populations in North and South America. . . . . . . . . . . 50 4.9 iHS analysis on TLR7 and TLR8 in Brazilian Native American populations . . . 51 4.10 iHS on TLR7 and TLR8 in SGDP Native American populations in YFV endemic areas. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 4.11 iHS analysis on TLR7 and TLR8 in Brazilian Native American populations and SGDP Native American populations in YFV endemic areas. . . . . . . . . . . . . 52 4.12 iHS analysis on TLR7 and TLR8 in Brazilian Native American populations. . . . 52 4.13 PBS for Brazilian Native American data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 4.14 PBS analysis using SGDP Central Asia Siberians. . . . . . . . . . . . . . . . . . . . . . . . . . 54 4.15 XP-EHH analysis using Brazilian Native Americans [115] and Central Asia Siberians [79]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 4.16 XP-EHH analysis using Brazilian Native Americans [115] and Africans [79]. . . 55 4.17 XP-EHH analysis using SGDP YFV populations and Central Asia Siberians [79]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 4.18 XP-EHH analysis using SGDP YFV populations and Africans [79]. . . . . . . . . . . 56 A.1 Illustration of branch bipartitions selected for SelectionLRT and FEL analyses. 64 A.2 ND1-ND2-CO1 maximum likelihood phylogenetic tree. . . . . . . . . . . . . . . . . . . . 65 B.1 Microsatellite results to determine uniqueness of fecal samples collected from A. guariba clamitans (G) and A. caraya (C). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 B.2 PCA analysis using variants from all Alouatta samples at all mtDNA genetic loci: ND1, ND2, CO1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 B.3 PCA analysis using variants from all Alouatta samples at all TLR7 and TLR8 genetic loci. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 vii LIST OF TABLES 2.1 Primate susceptibility to YFV. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 2.2 Substitution rates for tree partitions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 2.3 Sites under positive selection in only in the howler monkey lineage with functional implications. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 3.1 Summary of samples used in analyses. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 3.2 Samples. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 3.3 Nucleotide diversity within Alouatta species at each gene. . . . . . . . . . . . . . . . . . 33 3.4 Variant sites in A. guariba clamitans samples with functional implications. . . . . 33 4.1 Fst at each SGDP [79] TLR7 and TLR8 polymorphic locus. . . . . . . . . . . . . . . . . . 57 4.2 SGDP population allele frequencies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 4.3 SGDP [79] allele frequencies, quantile rank, and composite score. . . . . . . . . . . . 58 4.4 P-values for each SGDP population [79] at each TLR7 and TLR8 polymorphic locus. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 A.1 Sample metadata. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 A.2 Read depth for at each gene and exon of A. caraya sample shotgun sequenced. 67 A.3 Read depth for at each gene and exon of the A. caraya and A. guariba clamitans museum samples. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 A.4 GenBank accession numbers for TLR7 and TLR8 sequences used in phylogenetic analysis. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 A.5 Tree parameters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 B.1 GPS location for each sample. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 B.2 Synonymous variant loci in A. guariba clamitans. . . . . . . . . . . . . . . . . . . . . . . . . . 75 B.3 A. caraya variants. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 B.4 DnaSP results for A. caraya variant data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 B.5 DnaSP results for A. guariba clamitans variant data. . . . . . . . . . . . . . . . . . . . . . . . 77 C.1 SGDP population sizes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 ACKNOWLEDGEMENTS This dissertation would not have been possible without the support, scientifically and morally, of many people and organizations. I would like to thank the following in particular. Jacki Naughton, Lisa McKenna, and Christie Camel, my high school science teachers and mentors. You were the first to encourage me to participate in the science fair. I absolutely loved it and for four years you supported my crazy ideas, bringing me materials and spending countless hours building a cloud chamber with me after school. Erin Waxenbaum and M. Geoffrey Hayes, my mentors in college. You inspired me to pursue anthropological research and graduate school, particularly at the University of Utah. My committee, Dennis O’Rourke, Lynn B. Jorde, Timothy H. Webster, and Patrice Showers Corneli. For all of your critical feedback, edits, use of laboratory space, career guidance, and words of encouragement over the last five years. Leslie A. Knapp, for selecting me to work in the laboratory and for all of the training you have provided me. I could not have asked for a better Ph.D. advisor. You have encouraged me to ask hard questions and to work independently. You are an inspiration and I will try to channel your fierceness every day as I move forward in my career. All the members of the Primate Immunogenetics and Molecular Ecology lab and the ancient DNA lab past and present. Especially Justin Tackney, who always generously gave up his time to help me plan and learn the protocols for my project, for his continued friendship, and for making sure I am always critical of my life and my choices. Robert Weiss, Kael Fischer, and Mike Powers and Derek Warner of the DNA Sequencing Core for providing research support to help me learn, troubleshoot, and complete the tricky protocols and bioinformatics necessary to finish this dissertation. The University of Utah pilot grant and the Global Change and Sustainability Center for providing funding for this project. Hogle Zoo, and everyone in Balzapote, Mexico, and Misiones, Argentina who helped me obtain howler monkey samples at various points throughout this project. My graduate school cohort, particularly Kirstie Kocjancic and Katie Ward, and my SLC friends, particularly Sarah Bang, Ady Dunk, Savannah Manwill, Chris Eckels, and Andrew Paulson. Thank you for feeding me, buying me drinks, taking me hiking, and generally trying to make sure I did not work myself to death over the last five years. My mom, Maya Torosin who always told me that I could achieve great things and my dad, Aleksey Torosin, who provided me with the lab space and materials to complete my very first science project. Thank you for your love and support and for making sure I was never too broke for wine and cheese. And finally, my fiance, Ian Coley. I have almost no words to describe how much your friendship, love, and partnership have meant to me. We first bonded over our shared interest in obtaining a Ph.D. in our respective fields and your support over the last five years has been indispensable. If we made it through graduate school and being in a long-distance relationship for the last six years, we will make it through anything. I am so excited to spend the rest of my life with you, starting with our post-doc at Rutgers University! x CHAPTER 1 INTRODUCTION One version of the “The Red Queen Hypothesis” is based on Hamilton’s [56] idea that sex evolved to increase species diversity and reduce the risk of infection by pathogens in offspring. The evolution of pathogens drives the evolution of the host in a cyclic manner. This “arms race” idea has been behind many studies of genetic adaptation [113]. Researchers compare the selective landscape of immune genes, for example, across mammals [10], within the Primate order [96], and among human populations [94] to understand species-specific genetic changes in response to the evolution of their pathogens. To investigate environment-specific genetic adaptations to yellow fever virus (YFV), this study compares the selective patterns and genetic polymorphisms of humans and nonhuman primates that have been affected by YFV in Northern Argentina. 1.1 History and significance of YFV Yellow fever virus (YFV) – a single-stranded RNA virus (ssRNA) – is a member of the genus Flavivirus [140]. It originated in Africa, evolving from a Flavivirus ancestor about 3000 years ago [140]. It circulates in tropical and subtropical Africa and South America, particularly in rain forest, forest-savannah, and savannah ecosystems. YFVcarrying mosquito vectors cannot survive above 2300 meters above sea level and thus YFV is absent at high altitudes [86]. YFV, along with the Aedes aegypti mosquito vector, is hypothesized to have been transported from Africa to the coasts of South America during the trans-Atlantic slave trade in the 1600s [27]. Initially an urban transmission cycle amongst humans was established, but, since nonhuman primates can act as hosts, a sylvatic transmission cycle, or a transmission cycle amongst wild animals, quickly established itself as well [27]. YFV is highly virulent compared to other flaviviruses and therefore relies on the high density of human and nonhuman primate populations to sustain transmission [57]. 2 There are seven lineages of YFV [17]. Two are found in South America and they only differ from each other at about 5% of nucleotide sites [17]. All strains of YFV are a single antigenic type, or carry the same structural pattern that stimulates the host’s immune system [17]. The little genetic diversity of YFV is likely due to the fact that this virus only infects primate hosts [57]. There is also little evidence of selection on the virus from host immune responses [17]. The combined slow rate of evolution and maintenance of a single antigenic type made it possible to develop a YFV vaccine that is still highly effective today [17]. After infection, the presentation of YFV symptoms in humans is variable. Some individuals show no outward symptoms of the disease [15]. Those that proceed into the first stage of YFV present fever, nausea, vomiting, and the onset of jaundice after two to three days [15]. In the second stage, the host exhibits a reduction in symptoms for up to 48 hours [15]. Many make a complete recovery after the second stage while about 15% of affected people will enter the third stage [15]. During stage three the symptoms of stage one return accompanied by bleeding and organ failure [15]. Between 30-60% of people who enter the third stage die [137]. In the late 1930s, an extremely successful YFV vaccine was developed [32]. Some countries in Africa and South America have childhood vaccination programs but vaccination is not sufficiently widespread and routine [16]. In 2016 there was a massive outbreak of YFV in Angola, a country that had not seen an outbreak since 1971 [32]. That same year there were simultaneous YFV outbreaks in Uganda and the DRC [32]. Up to 170,000 people are estimated to still be affected by YFV each year with up to 60,000 deaths annually [137]. Increases in rainfall and average temperature has been shown to exacerbate the spread of YFV, since mosquitos reach their highest densities during the rainy season [86]. Warming climates and environmental changes are expanding risk areas to places where vaccination is not routine [100]. For example, recent, ongoing outbreaks in Brazil are exacerbated by the fact that YFV is circulating in areas where vaccination was previously not recommended [100]. Since 2016, YFV outbreaks in humans have been reported in China, Kenya, the Democratic Republic of the Congo, Uganda, French Guiana, Nigeria, and Brazil [137]. These outbreaks indicate that YFV is a reemerging disease [16] and further research is necessary to understand genetic factors affecting susceptibility and ecological factors 3 affecting transmission as outbreaks become larger and more frequent. 1.2 Primate susceptibility to YFV Primates are the only vertebrate hosts in the sylvatic transmission cycle of YFV. Old World primates have short YFV infections and do not show overt signs of illness [57]. Sources report conflicting accounts of susceptibility to YFV in New World primates. Ruch [106] reports that infection in the genus Ateles (spider monkeys) is almost always fatal while the Brazilian Guide of Epizootics of YFV in nonhumans primates claims that they are one of the more resistant genera along with Saimiri spp. (squirrel monkeys), Cebus spp. (capuchin monkeys) (supported by [70, 102], and Callicebus spp. (titi monkeys). Aotus spp. (owl monkeys), Callithrix spp. (marmosets) (supported by [102] and Saguinus spp. (tamarins) are documented as more sensitive to YFV by Brazilian Guide of Epizootics of YFV [84] . Rosen’s [102] experiments resulted in one of four inoculated spider monkeys to develop YFV symptoms, suggesting they are moderately resistant, while zero of three inoculated marmosets developed YFV suggesting they are highly resistant. There are also some discrepancies about the susceptibility to YFV in the genus Alouatta (howler monkeys). Ruch [106] reported that infection of Alouatta by YFV is almost always fatal. Recent outbreaks of YFV in Argentina [60] and Brazil [22, 87] have resulted in extremely high rates of mortality in howler monkeys, supporting Ruch’s laboratory research. However, Kumm and Laemmert [70] tested wild primates in Brazil in 1949 and found that 23 out of 28 howlers were immune, showing that howlers can survive YFV. They hypothesized that a “less lethal” strain passed through these groups of howler monkeys (A. cararya, A. guariba, and A. belzebul). The resurgence of YFV in human populations is a global problem [48, 125]. Warming climates and changes in rainfall patterns are likely helping expand the distribution of virus-carrying mosquitoes [16, 100]. In Argentina, invasive human activities such as deforestation for highway construction and agriculture is disturbing howler monkey habitats [60, 124]. Habitat fragmentation results in group isolation and loss of genetic diversity in already small populations. These activities are bringing the urban transmission cycle closer to the howler monkeys and increasing their exposure to YFV [3, 40, 60]. The Aedes aegypti mosquito will bite a howler monkey if given the opportunity, passing the virus 4 into the sylvatic cycle [57]. Reduced genetic diversity and more frequent exposure to YFV may have led to the unprecedented number of howler monkey deaths in 2007-2009 [3, 60]. Fifty-nine howler monkeys (A. caraya N=39 and A. guariba clamitans N=20) were found dead in the province of Misiones in northeastern Argentina [60]. Deceased monkeys were found in five of the six explored regions of the Atlantic forest throughout Misiones [60]. The presence of YFV was confirmed in all five regions. Researchers did not find not find individuals of other endemic primate species deceased from YFV during the searches for howler monkeys [60]. 1.3 Toll-like receptors To study genetic factors that may contribute to YFV susceptibility, I focus on TLR7 and TLR8. Toll-like receptors (TLRs) are a set of ten proteins of the innate immune system encoded by evolutionarily conserved genes [63]. TLRs initiate the first response of the host to invading pathogens, making them an ideal choice to study how selective pressures act on the immune system [68, 96]. TLR7 and TLR8 are the two TLRs responsible for detecting ssRNA viruses, such as YFV, and polymorphisms have been implicated in progression of other diseases, making these two genes strong candidates for the underlying genetic variation that affects susceptibility to YFV [33, 63, 80]. Both TLR7 and TLR8 are intracellular proteins, and each has a solenoid-shaped extra-compartmental domain called the leucine-rich region (LRR) [83]. The LRR is responsible for recognizing pathogens based on their Pathogen Associated Molecular Patterns, or PAMPs, and therefore displays species-specific divergence [68, 134]. The intra-compartmental (TIR) domain signals the immune system after a pathogen is detected and is more highly conserved between species [134]. Once the PAMP is recognized, TLR7 and TLR8 homodimerize and start a signaling cascade that ends in transcription of type one interferon which prompts natural killer cells to destroy virus-infected cells (Figure 1.1) [62, 63]. In order to activate adaptive immune mechanisms, the signaling cascade also induces transcription of cytokines that stimulate maturation of naı̈ve T-cells. The virus is transported by the endosome and then presented to the naı̈ve T-cells so that they can recognize it and fight it in the future (Figure 1.2) [62, 63]. 5 1.4 The comparative approach and this dissertation It has been shown that certain genetic loci are prone to positive selection and that positive selection can act upon the same genes in distinct species [46]. Positive selection leads to the increase in frequency of advantageous genetic variants in a population. It is valuable to to explore whether selection has acted on the same genes across species. For example, the TRIM5-CypA fusion protein complex evolved independently in New and Old World primates [61] to block HIV-1 and other highly pathogenic lentiviruses [61]. Parallel selection is observed in TLR8 evolution where divergent taxa have positive selection on the same codon sites, caused by species-specific pathogens [10, 109, 136]. Environment-specific adaptations are common [23, 141], however, previous studies do not focus on adaptation across species within the same environment and subject to the same pathogen pressure. Studying genes and loci subject to selection in multiple species and exposed to the same pathogen within the same environment can help identify codons that are most important in adaptation within a particular environment to a particular pathogen. In this dissertation, I compare genetic data from three species, Alouatta caraya, Alouatta guariba clamitans, and Homo sapiens, living within the same geographic region and sharing the pathogenic pressure of YFV, to draw conclusions about variants within two innate immune genes, TLR7 and TLR8, that may play a role in YFV susceptibility. To complete my study, I collected samples from Alouatta individuals that were alive before the YFV outbreak in Northern Argentina in 2007-2009. I also collected samples from deceased individuals that were exposed to the outbreak, and individuals alive after the outbreak. In Chapter 2, I sequence one A. caraya and one A. guariba clamitans sample to evaluate the selective landscape of TLR7 and TLR8 across the Primate order. The goal is to compare patterns of selection across the primates of variable susceptibility to YFV to investigate what may be contributing to the extreme susceptibility of Alouatta. In Chapter 3, I sequence all the Alouatta samples I collected to measure genetic divergence between the YFV exposure groups within each species. I evaluate genetic differences between the exposure groups for functional consequences. Finally, in Chapter 4, I use published DNA sequences from Native Americans to measure positive selection and differentiation of sites within TLR7 and TLR8 in populations in YFV endemic areas. The goal is to explore whether positively selected or differentiated sites in humans in YFV endemic areas may 6 have functional importance and lead to the alleged reduced susceptibility of YFV in these populations. Finally, I check for overlap in sites under selection or diverged in Alouatta TLR7 and TLR8 to those in humans. Sites under parallel selection may be indicative of environment-specific adaptation which may have been a response to past pathogens and now confer resistance to YFV in endemic populations of multiple species. Through this study I hope to discover genetic variants associated with susceptibility to YFV and potentially other ssRNA viruses in primates. 7 Figure 1.1. TLR7 and TLR8 innate immunity signaling pathway. Figure 1.2. TLR7 and TLR8 adaptive immunity signaling pathway. CHAPTER 2 COMPARING THE SELECTIVE LANDSCAPE OF TLR7 AND TLR8 ACROSS PRIMATES Host-pathogen interactions are major drivers of immune gene evolution [113]. Viruses, in particular, have caused close to 30% of all adaptive amino acid changes in mammals [45]. Genetic comparisons between species can help identify substitutions differentially affecting disease susceptibility or resistance [10, 109, 136]. Among primates, susceptibility to yellow fever virus (YFV) ranges from complete resistance to high susceptibility (Table 2.1). Howler monkeys (genus Alouatta) are the most susceptible to YFV and usually die within a week of infection [22]. Due to deforestation in the last 50 years, howler monkey populations in Northern Argentina have been exposed to Aedes aegypti mosquitoes, the vectors of YFV in human populations [27, 40]. As a result, in 2007-2009 a major YFV outbreak devastated the howler monkey species Alouatta caraya (black and gold howler) and Alouatta guariba clamitans (brown howler) in Northern Argentina [60]. Other primate populations were unaffected [40, 60]. Currently, there is no explanation for Alouatta’s high level of susceptibility to YFV. In general, to understand species-specific genetic changes in response to pathogens, researchers compare the selective landscape of immune genes across the Primate order [96, 136]. TLRs such as TLR7 and TLR8 are evolutionarily-constrained innate immune genes specifically responsible for detecting single-stranded (ss) RNA viruses, such as West Nile virus, dengue virus, and YFV [57, 63, 68]. When these two genes detect ssRNAs, they initiate a signaling cascade that prompts the immune system to destroy infected cells and activate adaptive immunity [68]. TLR7 and TLR8 are under strong purifying selection in humans and great apes [96]. Amino acid changes that do occur are likely from selection pressure by environmental or species-specific pathogens [136]. Polymorphisms within TLR7 and TLR8 have been implicated in progression of various human diseases [77]. 9 Therefore TLR7 and TLR8 are good candidates to study variable susceptibility to ssRNA viral pathogens. Previous studies have excluded Alouatta and other New World primates when evaluating evolutionary patterns of endosomal TLRs. To address the question of Alouatta’s susceptibility to YFV, in this study we collected skin samples from one taxidermied howler monkey of each species, A. caraya and A. guariba clamitans, known to be alive in Northern Argentina before the 2007-2009 YFV outbreak [60] and compare the selective landscape of newly sequenced TLR7 and TLR8 from A. caraya and A. guariba clamitans to TLR7 and TLR8 from published New World and Old World primates less susceptible to YFV (Table 2.1). Our comparison of TLR7 and TLR8 across primate species will help identify amino acid substitutions and selection patterns that may contribute to the high susceptibility of Alouatta to YFV. 2.1 2.1.1 Materials and methods Sample collection and processing We collected one Alouatta caraya liver sample from an individual that died during the 2007-2009 YFV outbreak in Misiones, Argentina (Appendix A). To confirm the cause of death, we tested the sample for YFV antibodies upon collection. We stored the tissue at −70o C in 100% ethanol. We also collected 1 − 1.5 cm skin samples from one taxidermied A. caraya and one A. guariba clamitans museum specimen at the Museo Argentina de Ciencias Naturales Bernardino Rivadavia. We stored each sample in a sterile 1.5 ml Eppendorf tube at ambient temperature until DNA extraction. We transported the tissue and taxidermied skin samples to the Universidad de Misiones at Posadas, Misiones, Argentina (transport permit 005436 issued by the “Jefatura de Gabinete de Ministros Secretaria de Ambiente y Desarrollo Sustentable”). At the Universidad, we extracted DNA using a Qiagen Fast DNA tissue kit (Qiagen, GmbH, Germany). We transported extracted DNA and feces back to Buenos Aires (transport permit 001494 issued by the “Presidencia de la Nación Secretarı́a de Ambiente y Desarrolo Sustentable”). We exported extracted DNA and feces from Argentina using CITES export permit 042117 and CDC import permit number 2017-03-014. To obtain DNA sequences, the University of Utah Huntsman Cancer Center Sequenc- 10 ing Core prepared and shotgun sequenced the A. caraya liver tissue sample DNA on an Illumina HiSeq. The coding regions of TLR7 and TLR8 are highly conserved among primate species, so we used BWA software [75] to align the reads from the three exons of TLR7 and two exons of TLR8 to a human reference sequence and an unpublished A. palliata reference sequence (T. Disotell laboratory) to confirm alignment. We aligned mtDNA genes ND1, ND2, and CO1 to the A. palliata reference sequence. We used SPAdes software to create an A. caraya TLR7 and TLR8 consensus sequence [14] (Appendix A). Using the newly generated A. caraya TLR7 and TLR8 reference sequences, we created a custom Ampliseq library IAD149391-168 (LifeTechnologies, Austin, TX). The University of Utah Sequencing Core completed targeted sequencing of the A. caraya and A. guariba clamitans skin samples using an Ion PGM and standard library kit protocols. To process the sequencing reads, we first trimmed the ends of the targeted sequencing data to remove erroneous sequencing and filtered fastq reads to maq > 20 using BBDuk http://jgi.doe.gov/dataand-tools/bb-tools/. Second, we aligned trimmed fastq reads for each sample to the A. caraya TLR7 and TLR8 reference sequences using BWA software [75]. We used Freebayes software [53] to summarize genetic variants from each samples. We used vcftools to remove indels and filter all sites with GQ < 30 [39]. We used bcftools software to create a consensus sequence for each taxidermied skin sample [76] (Appendix A). Since we were working with museum tissue samples, we measured DNA damage. DNA damage results in deamination of cytosine and guanine resulting in and excess of tyrosine and adenine substitutions, especially at the ends of sequencing reads [26, 28]. Sequencing ends did not show depletion of GC content indicating that DNA in the museum samples was not damaged. 2.1.2 Published data and sequence alignment We obtained coding regions of TLR7 and TLR8 for New World and Old World primates from GenBank and dnazoo.org [42] (Appendix A). We aligned DNA sequences using MEGA7 [69] and confirmed protein translation of TLR7 and TLR8 using protein reference NP 057646.1 and NP 619542.1, respectively. We aligned and edited TLR7 and TLR8 protein translations for all species using Aliview [72]. 11 2.1.3 Phylogenetic analyses Optimal models for each gene were chosen on the basis of the Decision-Theoretic (DT) method [119] as implemented in PAUP 4.0 [120] for model selection and estimating parameters (Appendix A). We optimized the trees using these models in the Seaview implementation of PhyML [55]. Approximate likelihood ratio tests (aLRT) describe clades support levels across the tree [9]. 2.1.4 Maximum likelihood selection analysis To test the hypothesis that the Alouatta lineage evolved under a different selection regime from other primates, we used the the positive selection option in HyPhy to “test whether a branch (or branches) in the tree evolve under different dN/dS than the rest of the tree”(HyPhy software [67]). This analysis assumes a Muse Gaut codon model and is designed to test whether the ratio of nonsynonymous to synonymous (dN/dS) substitutions is equal across the phylogenetic tree or is best described by multiple rates [51, 88]. We compared selection coefficients of three previously defined bipartitions: 1) the branch leading to newly sequenced species A. guariba clamitans and A. caraya only; 2) the branch leading to the whole Alouatta genus; and 3) the branch leading to all other New World Primates (Appendix A). A Fixed Effect Likelihood (FEL) test (HyPhy software [67]) identified specific codons under selection in the Alouatta lineage. The null model assumes that the selection pressure for each site is constant along the entire alignment. The alternative model allows the dN/dS rate to vary site by site along the alignment. Results indicate whether certain sites in a selected lineage have been subject to positive or purifying selection. This analysis is designed for small sample sets and minimizes false positives [66]. We also completed FEL analysis for all other New World Primates and Old World primates to see whether sites identified as under selection in Alouatta are unique to the lineage. 2.1.5 Functional assessment We assessed codon sites that showed evidence of positive selection for functional purpose. The TLR7 ligand binding region is within codons 500-589 in protein reference NP 057646.1 and within codons 481-577 in protein reference NP 619542.1 for TLR8 [133]. Leucinerich repeats (LRR), non-LRR insertions, and the TIR domain for TLR7 and TLR8 are out- 12 lined in Bell et al. [19]. We examined codon sites in LRR, non-LRR insertions, and TIR domains [19] to determine if there was evidence for positive selection within any of the binding, LRR, insertion, or TIR regions. 2.2 Results 2.2.1 Sequences We generated two A. caraya sequences and one A. guariba clamitans sequence for nuclear genes TLR7, exons 1 (221 bp), 2 (196 bp), and 3 (5635 bp); TLR8, exons 1 (201 bp) and 2 (4217 bp); and mtDNA genes ND1 (1075 bp), ND2 (1145 bp), and CO1 (1742 bp). Average sequencing coverage available in Appendix A. 2.2.2 Phylogenetic trees Maximum likelihood phylogenetic trees (Figure 2.1, Figure 2.2) showed Alouatta species clustering separately from other New World primates within the New World primate clade. The separate clustering of Alouatta is supported by other studies using nuclear genes [24, 58, 118] and mtDNA (Appendix A). Old World primates clustered as anticipated [93]. In both nuclear DNA trees, the A. guariba clamitans branch length is longer than the other two Alouatta species, but it is longest in the TLR7 phylogenetic tree. 2.2.3 Substitution rates We used a maximum likelihood approach to determine whether Alouatta branches evolve under a different dN/dS ratio compared to the rest of the primate tree. For this test, significance is reached at 0.10, because the standard 0.05 cutoff is too conservative and eliminates too many observations that deviate from the null hypothesis [66]. Our null hypothesis is that dN/dS is the same across the entire phylogenetic tree [51]. We compared the following tree partitions; branches leading to the A. caraya and A. guariba clamitans clade, the branch leading to the whole Alouatta clade, the branch leading to the other New World primates, as well as the branch leading to the Old World primates (see Appendix A for illustration of branches tested). Results of the TLR7 branch-specific rate analyses (Table 2.2) suggest that the Alouatta clade is under significantly stronger purifying selection than the rest of the primates phylogenetic tree (p = 0.057). TLR7 in the Old World Primate lineage evolves under signif- 13 icantly less stringent purifying selection (p = 2.0e − 05) than the New World primates. TLR7 branch-specific rate analysis using only A. palliata also found that Alouatta is under significantly lower dN/dS (p = 0.046). This indicates that the first TLR7 result is not due to usage of a three Alouatta species compared to one species for each other genus. The TLR8 branch-specific rate analyses show no evidence that TLR8 dN/dS varies across the phylogenetic tree. 2.2.4 Branch site test and functional analysis To identify specific codons subject to selection in the Alouatta lineage, we used a FEL test. Our null hypothesis is that the selection pressure for each site is constant along the entire alignment. The alternative model allows the dN/dS rate to vary site by site along the alignment. For this test, significance is reached at 0.10 [66]. Table 2.3 lists the codons evolving under a significantly different dN/dS rate solely in the Alouatta clade, suggesting that these sites may have been subject to different selection pressure than others primates. FEL identified two sites under positive selection in TLR7 and six sites subject to positive selection in TLR8. Within TLR7, sites with varying dN/dS rates are due to a nonsynonymous variant in A. guariba clamitans (Table 2.3). The nonsynonymous change at TLR7 codon 538 is within the pathogen binding region and causes an amino acid change to serine only in A. guariba clamitans. Every other primate in the alignment has the amino acid proline at TLR7 codon 538. Proline is nonpolar while serine is polar. The second positively selected site is in the TLR7 LRR. At TLR7 codon 721 a nonsynonymous change causes the amino acid threonine in A. guariba clamitans while every other primate in the alignment has the amino acid arginine at this site. Threonine and arginine are both polar, but threonine is neutral and arginine is positively charged. FEL analysis identified six variants under selection in functionally relevant regions of TLR8. Five of these variants are found in all three of the Alouatta species used in the study, one variant is unique to A. palliata. Five of the variants are within the LRR region of TLR8. The sixth variant is within the TIR domain. 14 2.3 Discussion Previous studies evaluating evolutionary patterns of endosomal TLRs to identify speciesspecific differences have omitted the genus Alouatta in analyses [96, 136]. Our analyses of TLR7 and TLR8 DNA sequences in primates revealed strong purifying selection on TLR7 in the Alouatta clade and a number of unique sites under positive selection in both genes. This study provides the first insight into genetic factors that may contribute to Alouatta susceptibility to YFV. We first tested TLR7 and TLR8 sequences for disparity in branch-specific selection in Alouatta compared to other New World and Old World primates with variable susceptibility to YFV. Results showed that both TLR7 and TLR8 are under strong purifying selection throughout the Primate order, but TLR7 in the Alouatta clade is under very stringent purifying selection. While it is known that TLR7 and TLR8 are under strong purifying selection due to their essential role in virus recognition [10, 98, 136] our data showed that Alouatta TLR7 is subject to even stronger purifying selection than in other New World and Old World primate lineages. This strong purifying selection on TLR7 may have reduced standing variation in the gene. Since YFV originated in Africa and was introduced to the Americas by the slave trade approximately 400 years ago [27], Alouatta immune genes evolved under selection pressures from a different array of ssRNA viruses. If the stringent purifying selection on TLR7 in the Alouatta lineage was necessary to maintain resistance to another virus, there would be no variants available for selection to act upon to improve resistance to YFV, resulting in extreme susceptibility. Second, we tested Alouatta TLR7 and TLR8 sequences for changes in site-specific selection across the coding region. Most notably, despite strong purifying selection on the gene, TLR7 codon 528 in A. guariba clamitans is under positive selection. This site is in the pathogen binding region, meaning the amino acid directly interacts with ssRNA viruses such as YFV during infection [133]. Amino acid differences within this region usually result from species-specific selection pressure [96, 136]. The second positively-selected codon in TLR7 (codon 721) is due to an amino acid difference in A. guariba clamitans within the extra-compartmental LRR region. While the codon is not in the direct binding region, the LRR also plays a role in pathogen sensing and stabilization during the recognition process [133]. Amino acid variations within TLR genes can alter the protein structure or 15 function and have been noted to change progression of a range of diseases in humans [114]. Since both amino acid changes within TLR7 result in biochemical property changes at protein regions that interact with the pathogen, they are likely to affect binding or stabilization of the pathogen potentially affecting YFV progression [114]. Both amino acid differences in TLR7 are unique to A. guariba clamitans. The TLR7 phylogenetic tree reflects the FEL finding of unique nonsynonymous changes in A. guariba clamitans, as the branch length is much longer than the other two Alouatta species, indicating greater evolution. Further exploring TLR7 variants in A. guariba clamitans individuals will be particularly interesting since the both positively selected variants identified in TLR7 are unique to the species. These amino acid changes may be a cause of intragenus differences in YFV susceptibility previously unknown. FEL identified six positively selected sites in TLR8. This is consistent with results of the branch-specific selection analysis, which found that TLR8 has been under less stringent purifying selection than TLR7 in Alouatta. Weaker purifying selection allows more variation to arise in a gene, as supported by the FEL analysis which found six TLR8 variants in Alouatta but only two in TLR7. Five of the positively-selected codons in TLR8 are within the LRR region and four substitutions result in a biochemical change. Unlike the positively selected sites in TLR7, five positively selected sites in TLR8 are due to an amino acid difference in all three Alouatta species. The LRR region substitutions that cause a biochemical change could result in altered YFV sensing and immune system signaling in the Alouatta genus, affecting susceptibility. These five amino acid differences in TLR8 are found in all members of the Alouatta genus so it is likely that they arose earlier in Alouatta evolution before radiation and speciation across Central and South America. The sixth site, TLR7 codon 844, shows a nonsynonymous variant only in A. palliata. This site is within the TIR domain of the TLR. The TIR domain is the inter-compartmental region of a TLR and is responsible for initiating the signaling cascade to activate the innate and adaptive immune systems upon infection [83]. Species-specific differences in the TIR region are much rarer [10, 83]. While A. palliata is not the focus of this study, the finding of a positively selected variant in the more highly conserved region of TLR8 warrants further investigation. Typically, a complex combination of environmental, pathogen and host genetic factors play a role in determining susceptibility to infection. Our results suggest that a number of 16 genetic factors could impact YFV susceptibility in Alouatta. If polymorphisms in TLR genes play a critical role in YFV susceptibility in Alouatta, it is essential to collect additional samples from living and deceased A. caraya and A. guariba clamitans in YFV endemic ranges. This will help clarify whether amino acid differences seen in this study are associated YFV susceptibility. 17 Figure 2.1. TLR7 maximum likelihood phylogenetic tree. TLR7 maximum likelihood phylogenetic tree generated from A. caraya and A. guariba clamitans TLR7 sequences (this study), New World and Old World primate TLR7 sequences. Each sequence totaled 3141 base pairs equaling 1039 codons. HKY model used, chosen by Decision Theoretic method. Log likelihood of the tree = -6740.7 and aLRT support values on branches. 18 Figure 2.2. TLR8 maximum likelihood phylogenetic tree. TLR8 maximum likelihood phylogenetic tree generated from A. caraya and A. guariba clamitans TLR8 sequences (this study), New World and Old World primate TLR8 sequences. Each sequence totaled 3117 base pairs equaling 1047 codons. TN93 model used, chosen by Decision Theoretic method. Log likelihood of the tree = -7145.9 and aLRT support values on branches. 19 Table 2.1. Primate susceptibility to YFV. Listed by increasing susceptibility [22]. Family Genus Susceptibility Catarrhines All Resistant Platyrrhines Ateles Cebus Saimiri Saguinus Moderate Callithrix Aotus Hominidae Alouatta Extreme Homo Moderate Table 2.2. Substitution rates for tree partitions. Value listed followed by 95% confidence interval. Significance is achieved if p-value < 0.10 considered, indicated by *. The branches selected for each analysis are depicted in Appendix A. Alignment dN/dS A. caraya and A. guariba clamitans p-value dN/dS Alouatta clade p-value dN/dS all other New World primates p-value dN/dS Old World primates p-value TLR7 Coding Region 0, 0-0.555 0.271 0.079, 0.018-0.210 0.057* 0.083, 0.002-0.375 0.3 0.577, 0.423-0.765 2.0e-05* TLR8 Coding Region 0.831, 0.136-2.57 0.48 0.412, 0.207-0.740 0.778 0.426, 0.110-1.038 0.88 0.487, 0.368-0.630 0.168 20 Table 2.3. Sites under positive selection in only in the howler monkey lineage with functional implications. Sites identified by FEL analysis. Binding sites determined by [133]. LRR regions determined by [19]. Abbreviations: AGC = A. guariba clamitans; AC = A. caraya; AP = A. palliata; AA = amino acid *As indicated in “Alouatta Species” column Gene Codon Position Biochemical AA in p-value Alouatta Species Alouatta AA other primates species* property Functional difference Region (Y/N) TLR7 538 0.048 AGC SER PRO Y Binding TLR7 721 0.097 AGC THR ARG Y LRR TLR8 800 0.042 AGC, AC, AP ALA VAL N LRR TLR8 248 0.055 AGC, AC, AP ILE Y LRR TLR8 284 0.063 AGC, AC, AP HIS ASN Y LRR TLR8 844 0.065 AP VAL ALA N TIR THR Y CYS/SER Y LRR Y LRR TLR8 211 0.071 AGC, AC, AP TYR TLR8 744 0.097 AGC, AC, AP HIS ILA Siamiri, THR all others HIS Callithrix, TYR/SER CHAPTER 3 INDENTIFYING FUNCTIONALLY IMPORTANT VARIANTS IN TLR7 AND TLR8 IN ALOUATTA CARAYA AND A. GUARIBA CLAMITANS IN NORTHERN ARGENTINA Yellow fever virus (YFV) is a single-stranded (ss) RNA virus causing hemorrhagic disease endemic to Africa and South America [17]. It was introduced to the Americas on slave trade ships from Africa about 400 years ago [27]. Soon after, the virus established a sylvatic cycle, or transmission, involving nonhuman primates [57]. Today, the virus continues to circulate among humans and nonhuman primates in Northern Argentina [54, 60] (Figure 3.1). New World primates are generally more susceptible to YFV than Old World primates due to relatively recent exposure [57]. New World primates have not experienced host-pathogen interaction with YFV for enough time to potentially evolve resistance [57]. The members of the genus Alouatta (howler monkeys) are particularly susceptible to this flavivirus and usually die within a week of infection [22, 84]. When human and nonhuman primate YFV transmission cycles come into contact, the risk of a sylvatic outbreak, and howler monkey deaths from YFV, increases. In 1966, a sylvatic YFV outbreak occurred in Argentina [18]. Researchers found three dead howler monkeys and reported several human cases [37, 60]. The penultimate outbreak was in 2001 in Brazil near the Argentine border and researchers found eighty deceased brown howler monkeys [60, 108]. No other sylvatic outbreaks had been recorded since 1954 [126]. In 2007-2009, Alouatta caraya (black and gold howler) and A. guariba clamitans (brown howler) in Northern Argentina and Brazil suffered mass population reductions due to a YFV outbreak [22, 60, 87]. The mass deaths during this outbreak were likely due to the deforestation of the Atlantic forest over the last 50 years [40, 60]. Deforestation resulted in 22 the overlap of the urban and jungle transmission cycles, increasing the likelihood that the vector, Aedes aegypti, will transmit the virus to howler monkeys (Figure 3.1) [57]. The 2007-2009 outbreak provided a unique opportunity to study genetic differences between Alouatta individuals exposed and and not exposed to YFV. We generated A. guariba clamitans and A. caraya immune gene sequences to compare genetic variants between individuals alive before the YFV outbreak, those affected by the outbreak, and individuals that survived the outbreak. We hypothesized that the surviving howler monkeys may have different allele frequencies from the monkeys alive prior to the most recent YFV outbreak and those that died of YFV in 2008. Alleles at greater frequency in Alouatta individuals alive after the 2007-2009 YFV outbreak may have been key in the survival of this extremely susceptible species. Previous work found positively selected genetic variants within Toll-like receptor (TLR) 7 and TLR8 in A. guariba clamitans and A. caraya that are unique to one or both species, compared to other Old and New World primates (Chapter 2). For this study, we target TLR7 and TLR8 as well. Polymorphisms in TLR7 and TLR8 have been implicated in progression of other diseases, making these two genes strong candidates to study the underlying genetic variation that affects susceptibility to YFV [33, 63, 80]. 3.1 3.1.1 Methods Sample background This study focuses on El Parque Provincial El Piñalito in Misiones, the northernmost province of Argentina. In the park, A. guariba clamitans (brown howler) and A. caraya (black and gold howler) live sympatrically [5]. After the outbreak in 2007-2009, Holzmann and colleagues [60] found a total of 39 A. caraya and 20 A. guariba clamitans deceased in Misiones province. In El Piñalito, they found seven of each Alouatta species dead. RT-PCR for Flavivirus confirmed YFV in the area [60]. 3.1.2 Collection In 2017 we collected samples from taxidermied Alouatta living in Misiones and the neighboring province, Corrientes, between 1949-1967. These individuals may have been exposed to YFV in the 1954 [126] or the 1966 outbreak [18], but since these outbreaks affected few monkeys and deforestation was not yet leading to mass exposure of human 23 populations to Alouatta, the chance is slight. To collect taxidermied skin samples, we removed 1-1.5 cm of skin from three individuals of A. caraya and one individual of A. guariba clamitans. We stored each sample in a sterile 1.5 ml Eppendorf tube at ambient temperature until DNA extraction. We collected liver samples from deceased YFV positive and negative A. caraya throughout Misiones in 2007-2009 (Figure 3.2). We stored AC tissue 1 and AC tissues 3-6 at −70o C in 100% ethanol and (AC tissue 2) in 100% ethanol. We returned to El Piñalito Provincial Park, Misiones, Argentina in 2018 to determine whether any Alouatta were living in this region and to collect fecal samples. We collected fecal samples under contract with the Ministerio de Ecologı́a y Recursos Naturales Renovables de Misiones Provincia, Provision number 028, file number 9910-00054/17. We placed recovered fecal pieces in sterile 15 ml collection tubes with RNAlater. We extracted DNA from fecal samples in Argentina. We used a Zymo extraction kit (Irvine, CA) with a modified protocol to target host DNA from fecal samples. We transported feces and extracted DNA to Buenos Aires under transport permit number 001494 issued by the “Presidencia de la Nación Secretariı́a de Ambiente y Desarrollo Sustentable.” We exported feces and extracted DNA from Argentina under CITES export permit 042117 and CDC import permit number 2017-03-014. We also extracted DNA from fecal at the Molecular Genetics Laboratory at the University of Utah using the same Zymo extraction kit (Irvine, CA) with a modified protocol. 3.1.3 Sample verification We genotyped all samples using published microsatellite protocols (Api06, Apm 01, and Apm04) [36] to ensure that each was collected from a unique individual. We separated PCR products on an 8% acrylamide gel (Appendix B) to determine microsatellite allele sizes. To determine the sex of the museum and fecal samples, we amplified the SRY region using published protocols [35, 49]. We used Sanger sequencing (University of Utah Sequencing Core) on the SRY products. Male samples had successful SRY sequence results with no ambiguities in the chromatograms. Due to the old age of the museum tissue samples, we measured DNA damage. DNA 24 damage results in deamination of cytosine and guanine resulting in and excess of tyrosine and adenine substitutions, especially at the ends of sequencing reads [26, 28]. We compared overall GC content of the museum samples to tissue and fecal samples and evaluated GC content at the ends of reads. We also removed C >T or G > A substitutions that were unique to museum samples from analysis in case they were artifactual. 3.1.4 Shotgun sequencing and data processing We shotgun sequenced the tissue sample with the highest concentration of DNA (AC tissue 1) to create a reference for TLR7 (exons 1, 2, and 3), TLR8 (exons 1 and 2), ND1, ND2, and CO1. The University of Utah Huntsman Cancer Center Sequencing Core prepared and shotgun sequenced the samples on an Illumina HiSeq. We used BWA software [75] to align Alouatta reads to the human reference genome (hg19) [1] and an unpublished A. palliata reference genome provided by the T. Disotell laboratory at New York University. Aligned BAM files were filtered to include only reads of MAPQ quality > 60 [75]. We used SPAdes software to create a consensus sequence for Alouatta TLR7 and TLR8 from the alignment [14] (Appendix B). 3.1.5 Targeted sequencing Using consensus TLR7, TLR8, ND1, ND2, and CO1 sequences we created a custom Ampliseq library IAD149391-168 (LifeTechnologies, Austin, TX) to perform targeted sequencing on the remainder of our samples. The University of Utah Sequencing Core completed targeted sequencing of the A. caraya and A. guariba clamitans samples using an Ion PGM and standard library kit protocols. To process the sequencing reads, we first trimmed the ends of the targeted sequencing data to remove erroneous sequencing and filtered fastq reads to maq > 20 using BBDuk http://jgi.doe.gov/data-and-tools/bb-tools/. Second, we aligned trimmed fastq reads for each sample to the A. caraya TLR7, TLR8, ND1, ND2, and CO1 reference sequences using BWA software [75]. We used Freebayes software [53] to summarize genetic variants from each species separately then merged the VCF files for analysis. We chose FreeBayes because it is designed to capture variant data in small sample sets. We used vcftools to remove indels and filter all sites with GQ < 30 [39]. We omitted samples with greater than 40% data missingness for TLR genes. After removing samples with too much missing data we removed sites where all samples were monomorphic or all 25 data was missing. Finally, we excluded variable sites when more than six of the thirteen final (Table 3.1, Table 3.2) had data missing. TLR7 and TLR8 are on the X chromosome [110]. However, Alouatta have multiple sex chromosomes. A. caraya females have four X chromosomes and males have two X and two Y chromosomes [117]. A. guariba clamitans females have six X chromosomes while males have three X chromosomes and two Y chromosomes [117]. Therefore, we did not correct male genotypes to be haploid as with X chromosome analysis of other primates. FreeBayes assumes that the genome is diploid when calling variants [53]. As a result, we acknowledge that multiple sex chromosomes may affect calling of lower frequency variants, but since TLR7 and TLR8 are highly conserved we do not expect that low frequency variants are present on one chromosome. 3.1.6 Analysis We used the final variant file to create PCA plots using FlashPCA software [2] to assess how our samples clustered based on mtDNA and TLR data. We used DnaSP6 software [105] to measure measure nucleotide diversity (π) [89] within each species and Dxy [129] between the following groups: A. caraya samples “Pre”, “Exposed”, and “Post” and A. guariba clamitans “Pre” and “Post” (Table 3.1, Table 3.2). DnaSP6 only considers sites without missing data in analysis [105]. Dxy analyses of YFV exposure groups measures genetic differentiation and helps identify whether one group is divergent from the others. We explored genetic differences between the exposure groups, revealed by PCA and Dxy, for functional implications. We compared the coding region of A.guariba clamitans and A. caraya with other primates of various YFV susceptibility (Chapter 2). We obtained coding regions of TLR7 and TLR8 for New World and Old World primates from GenBank and dnazoo.org [42] (Appendix B). The Disotell laboratory at New York University provided TLR7 and TLR8 sequences for A. palliata. We aligned DNA sequences using MEGA7 [69] and confirmed protein translation of TLR7 and TLR8 using reference NP 057646.1 and NP 619542.1, respectively. We aligned and edited TLR7 and TLR8 protein translations for all species using Aliview [72]. We evaluated nonsynonymous changes in the coding region in A.guariba clamitans and 26 A. caraya compared to the other primates in the alignment. Comparing nonsynonymous coding variants in Alouatta with different YFV exposure to other primates with different YFV susceptibilities could help identify variants that are potentially important to the survival of the individuals alive post-YFV outbreak. For example, if YFV-surviving Alouatta individuals have a nonsynonymous variant at a codon locus that other YFV-resistant primates also have, this variant may play a role in reducing susceptibility to YFV. 3.2 3.2.1 Results Samples and sequences The sighting of A. caraya and A. guariba clamitans at El Piñalito Provincial Park was the first official observation of these two species at this location in eight years. We saw two A. caraya, one male and one female. We saw two groups of A. guariba clamitans, one with six individuals, including a juvenile, and one with three individuals. Microsatellite analysis revealed that we obtained one fecal sample from A. caraya and two from A. guariba clamitans. We generated sequences for nuclear genes TLR7, exons 1 (221 bp), 2 (196 bp), and 3 (5635), and TLR8, exons 1 (201 bp) and 2 (4217 bp), and mtDNA genes ND1 (1075 bp), ND2 (1145 bp), and CO1 (1742 bp), and confirmed sex for the thirteen individuals (Table 3.1, Table 3.2). Most samples contained some missing data due to age of the tissue and skin samples resulting in DNA fragmentation, and expected DNA fragmentation in fecal samples [50]. Average sequencing coverage for the genes and regions is included in Appendix B. GC content in museum samples was within the average range of GC content across all samples and did not show depletion of GC content at the ends of sequencing reads suggesting that the DNA was not damaged. C > T and G > A substitutions seen in museum samples were substantiated by other tissues as well indicating they were likely not due to damage. 3.2.2 PCA We created principal component plots for nuclear and mtDNA genetic loci using variant files filtered for completely monomorphic sites and sites with > 45% missing data. PCA using both mtDNA and TLR data indicated that individuals cluster by species. Ge- 27 netic differences at mitochondrial and nuclear loci between the two species account for approximately the same amount of variance (74% and 77% respectively) (Appendix B). No clustering by YFV exposure status is observed. 3.2.3 Diversity and divergence Nucleotide diversity within A. caraya and A. guariba clamitans is low (Table 3.3). There is no nucleotide diversity at TLR7 and TLR8 within each Alouatta species and almost none at the mtDNA loci as well. Analysis resulted in a Dxy value of zero or n/a (when there was only one total segregating site between the two groups being analyzed) for all comparisons of TLR7 and TLR8 sequences between the exposure groups for A. caraya and A. guariba clamitans (Appendix B). Dxy values for mtDNA between the groups are low (Appendix B). These results indicate that our hypothesis regarding genetic differences between exposure groups is not supported. Data indicated that there are no genetic differences in TLR7 and TLR8 in the YFV surviving Alouatta individuals compared to those alive prior to the YFV outbreak or those exposed to YFV. 3.2.4 Coding sequence variation We did not see any divergence in allele frequency between the YFV exposure groups. TLR7 and TLR8 sequences of A. guariba clamitans and A. caraya had a number of the variant sites resulting in a nonsynonymous amino acid change compared to other primates, so we explored these differences. Two of the variants within TLR7 exon 3 in A. guariba clamitans have been identified as positively selected by FEL analysis in Chapter 2 (TLR7 codons 538 and 721) (Table 3.4). These variants are within the extra-compartmental LRR region important in pathogen detection [68]. TLR7 codon 538 is in the binding region of the protein and directly interacts with ss RNA viruses such as YFV [133]. As observed in Chapter 2, the amino acid change to serine from proline at TLR7 codon 538 results in a biochemical change. The GERP score of this site is 5.84, indicating that it is extremely constrained by evolution [34] and the BLOSUM62 score is -1, meaning that this substitution is not expected by chance [59]. It is noteworthy that within a functionally important and evolutionarily constrained region, there is an amino acid change present in one species and the site is under positive selection 28 solely in that species. The amino acid change at TLR7 codon 721 in A. guariba clamitans is also unique to the species. While the GERP score is lower, 1.2, it is still positive and has a BLOSUM score of -1, indicating evolutionary constraint at this locus [59]. One of the other variants in A. guariba clamitans (TLR7 codon 563) results in an amino acid that other YFV resistant genera contain such as Cebus. All but one of the nucleotide differences (TLR7 codon 241) seen in an A. caraya sample (AC tissue 2) are synonymous. The heterozygous variant in sample AC tissue 2 at codon 241 results in an amino acid change. One of the variants results in the amino acid that is consistent with all other New World primates (glycine) while the second variant results in the amino acid that is consistent with Old World primates (aspartic acid). AC tissue 2 is an interesting sample due to its numerous heterozygous variants within the third exon of TLR7. This sample is from one of the tracked Alouatta groups [5] in El Parque Piñalito that died from YFV [60]. Neither pre-YFV museum samples nor YFV surviving samples contained the same heterozygous variants. More samples from A. caraya from El Parque Piñalito are necessary to determine whether these three heterozygous variants are a common or unusual occurrence and explore any potential significance. 3.3 Discussion Our original hypothesis, that there would be significantly different allele frequencies in the post-YFV exposure Alouatta group compared to the preexposure group, was not supported by the available data. Overall, nucleotide diversity within each Alouatta species was found to be low. This is unsurprising and in line with values for other primates [96] for TLR7 and TLR8 given but the low diversity within the mtDNA given the geographic and temporal variation between the samples is less expected [12]. Despite lack of polymorphism within each species, the identification of nonsynonymous, species-specific changes in a highly conserved gene that is known to adapt to speciesspecific selection pressure [134], is informative. We identified three nonsynonymous variants in TLR7 of A. guariba clamitans. Two of the nonsynonymous TLR7 variants under positive selection in A. guariba clamitans (Chapter 2). The other nonsynonymous variant in TLR7 cause an amino acid change to the same amino acid found in more YFV resistant species. A. caraya shows only synonymous changes within TLR7 and TLR8. 29 It is noteworthy that within a highly conserved gene that is under strong purifying selection, we identified nonsynonymous species-specific variants under positive selection. New research supports this finding, stating that broadly conserved virus-interacting proteins, such as TLR7, are subject to strong adaptive events and that these adaptive events are explained by viruses [29]. The consequences of the nonsynonymous variants in TLR7 in A. guariba clamitans and their potential role in YFV susceptibility should be considered. Especially since two of the sites are under evolutionary constraint and are unique to the A. guariba clamitans lineage, not the whole Alouatta genus. There have been varying accounts of Alouatta susceptibility to YFV in general [70, 87, 102, 106]. The mass deaths caused by the last YFV outbreak were attributed to deforestation over the last 50 years and greater exposure to human populations [40, 60]. Widespread howler monkey deaths from YFV were assumed, due to the lack of formal searchers, but not confirmed [60]. Older accounts of Alouatta susceptibility document that they die quickly from the virus, but high rates of immunity in some regions has also been observed [70]. In 2017, two A. caraya individuals were spotted and nine A. guariba clamitans individuals, including one juvenile. The A. guariba clamitans individuals were surprisingly quiet, lack of howling could have caused the brown howlers to be mistakenly declared deceased. The numbers suggest that A. guariba clamitans also had a greater number of survivors of the YFV outbreak. Immediately after the YFV outbreak, although the exact numbers are unknown, approximately half of the number of A. guariba clamitans (N=20) individuals to A. caraya (N=39) individuals were found deceased in Misiones. Exact numbers of each species alive throughout the province prior to the outbreak are also unknown but it is estimated that the density of A. caraya was 15 individuals/km2 and 10 individuals/km2 for A. guariba clamitans [6]. However, the lower number of documented A. guariba clamitans that died of the outbreak combined with the much greater number of survivors implies that they may have better resistance to the virus than previously thought. The nonsynonymous variants in TLR7 found in this species may be responsible for A. guariba clamitans survival. Since the species’ ranges overlap, it is likely that all four tracked groups were affected by YFV [60]. The survivors in this study were found in the same region as the tracked groups [5]. It is possible that the A. guariba clamitans survivors migrated to the area from a different part of El Piñalito and were not 30 exposed to YFV. It is also possible, but unlikely, that they migrated from another part of the greater Atlantic Forest which spreads across Misiones but is not continuous [6]. The regions are separated by roads and towns with minimal connection to other forest sections, so the monkeys cannot migrate outside their home forest region easily [4, 6]. The genus Alouatta has the greatest range of all Neotropical primates [38]. A. caraya resides in dry semi-deciduous and deciduous forests in Brazil, Bolivia, Paraguay and Argentina [38]. A. guariba clamitans prefers a wetter, evergreen environment spanning the southern Atlantic coast of Brazil into Northern Argentina [38]. While sympatric in El Parque Piñalito today, prior to the 1940s, the two species were not sympatric, adhering to their preferred habitats [7]. Potentially, the unique, positively selected genetic variants found in A. guariba clamitans may have resulted from specialized immune gene evolution to their particular environmental niche. While not evaluating genetics, one study suggested that much of the current variability in the Alouatta populations is due to recent events such as habitat alteration and disease [31]. Extending analyses of TLR7 and TLR8 to a greater range of Alouatta species is essential to decipher whether other species have the same amino acid substitutions as identified here in A. guariba clamitans. Additionally, more research is necessary to discover whether the genetic changes in A. guariba clamitans resulted from species-specific, environment specific pathogen-host interaction in the past, or whether they are from more recent changes in habitat and new diseases. Results from this study warrant further investigation into the genetic variants of TLR7 and TLR8 in Alouatta. The sample set available for this study was small and the biological material often degraded resulting in missing data. More samples should be collected, not just from the two Alouatta species in this study, but from Alouatta species across Central and South America, to fully explore the immune genetic variation of the genus. Expanding the dataset of Alouatta immune genes will help answer questions about other Alouatta species and could potentially clarify the relationship between genetic variants and YFV susceptibility. 31 Jungle/Sylvatic Cycle Urban Cycle Haemagogus spp. Aedes aegypti Alouatta spp. Humans Figure 3.1. Yellow fever virus transmission cycle between humans and howler monkeys. YFV circulates among humans through the Aedes aegypti mosquito vectors, called the urban transmission cycle. Aedes aegypti mosquitoes do not discriminate between nonhuman primate hosts and will bite nonhuman primates if presented with the opportunity [57]. YFV can also be transmitted among nonhuman primates hosts by the Haemagogus spp. mosquito, called the sylvatic transmission cycle [16]. Figure 3.2. Map of samples used in this study. 3 1 A. caraya A. guariba clamitans Collection location Pindapoy Arroyo, San Jose Corrientes, Argentina Misiones Corrientes Piiñalito Provincial Park, Misiones Ea Santa Inés, Garupá Barrio La Eugenia, Garupá Ea Santa Inés, Garupá Saint Joseph El Piiñalito Provincial Park Misiones, Argentina El Piiñalito Provincial Park El Piiñalito Provincial Park Species A. caraya A. caraya A. caraya A. caraya A. caraya A. caraya A. caraya A. caraya A. caraya A.caraya A. guariba clamitans A. guariba clamitans A. guariba clamitans AC tissue 1 AC museum 1 AC museum 2 AC museum 3 AC tissue 2 AC tissue 3 AC tissue 4 AC tissue 5 AC tissue 6 AC fecal 1 AGC museum 1 AGC fecal 1 AGC fecal 2 0 (-) = 1 (+) = 5 2017 2017 1952 2017 2009 2009 2009 2009 2009 1961 1949 1967 2009 Collection date Naturales Bernardino Rivadavia skin Fecal Nicole Torosin Nicole Torosin Museo Argentina de Ciencias Taxidermied Fecal Nicole Torosin Hernan Argibay Hernan Argibay Hernan Argibay Hernan Argibay Ilaria Agostini Rivadavia Museo Argentina de Ciencias Naturales Bernardino Fecal Liver Liver Liver Liver Liver Taxidermied skin Rivadavia Museo Argentina de Ciencias Naturales Bernardino Rivadavia skin Taxidermied skin Museo Argentina de Ciencias Naturales Bernardino Taxidermied Collected by Hernan Argibay 2 1 Alive post-outbreak Liver Sample type Exposed to YFV and YFV antibody status Sample Table 3.2. Samples. Pre-outbreak (N) Species Table 3.1. Summary of samples used in analyses. 16C 4C 52.41 AC11 AC14 AC13 AC11 AC09 AC3 13902 49.461 13901 AC10 Source ID F M M M M M F M F M F F M Sex Post Post Pre Post Exposed - Exposed + Exposed + Exposed + Exposed + Pre Pre Pre Exposed + status and antibodies 2007-09 YFV outbreak 32 0 1 3 1 TLR8 ND1 ND2 CO1 0 0 2 2 0 TLR7 TLR8 ND1 ND2 CO1 0 0.0009 0.0007 0 0 6.90E-05 0.0009 0.0003 0 0 π Codon Position 538 563 721 1018 Gene TLR7 TLR7 TLR7 TLR7 AGC fecal 1 (Het) AGC museum 1, AGC fecal 2. AGC fecal 1 AGC museum 1, AGC fecal 2. Individuals with variant PRO THR HIS SER AA in listed Alouatta individuals PRO ARG ARG all other primates HIS: Cebus and Pan, PRO AA in other primates N Y N Y Biochemical property difference (Y/N) TIR LRR LRR binding Functional Region N Y N Y Positively selected (Y/N) 0.05 1.2 -11.7 5.84 GERP score N/A -1 0 -1 BLOSUM62 Score Table 3.4. Variant sites in A. guariba clamitans samples with functional implications. Variant sites identified by comparing to other primates. Codon positions based on protein translation of TLR7, protein reference NP 057646.1. clamitans A. guariba 0 TLR7 A. caraya (w/o missing data) Gene Species Segregating sites Table 3.3. Nucleotide diversity within Alouatta species at each gene. 33 CHAPTER 4 EXPLORING TOLL-LIKE RECEPTOR 7 AND 8 VARIATION IN GLOBAL HUMAN POPULATIONS Yellow fever virus (YFV) is a reemerging disease [27]. Brazil and Africa have experienced recurrent outbreaks over the last two years [137]. Up to 170,000 people are estimated to still be affected by YFV each year with up to 60,000 deaths annually [137]. After infection, the presentation of YFV symptoms in humans is variable. Some individuals show no outward symptoms of the disease [15]. Various studies have suggested that humans with African heritage may have a lower mortality rate from YFV compared to other populations. High rates of resistance may allow for YFV to circulate without resurgence among African populations creating a type of herd immunity [81]. Thus far, none have identified genetic factors that could explain YFV progression [85, 86, 122]. YFV was brought from Africa to the coasts of South America in the 1600s on trans-Atlantic slave trade ships [27], exposing American populations in more recent history. The mortality rate from YFV in Africa is approximately 20% while it ranges from 40-60% in South America, further supporting the potential existence of genetic factors in African populations that contribute to resistance [86]. Despite known circulation of YFV in Northern Argentina due to confirmed epizootic cases [54, 60], no urban outbreaks have been reported in Northern Argentina, adjacent to the Brazilian region with repeated outbreaks [137]. We postulated that YFV may be able to circulate silently without resurgence here as it is thought to in African populations [81, 85, 86, 122]. We hypothesized that Native American populations in YFV endemic areas may have genetic variants that contribute to YFV resistance due to repeated exposure to YFV in the last 400 years. Alternatively, past selection on genetic variants for resistance to other pathogens may have helped the Native American populations with resistance to YFV. 35 In order to explore genetic variation that may lend YFV resistance to Native American populations, we focused on toll-like receptor (TLR) 7 and TLR8 since they are innate immune genes responsible for detecting single-stranded RNA (ssRNA) viruses such as YFV [68, 140]. Single nucleotide polymorphisms in TLR7 and TLR8 have been identified in human populations and are often associated with the progression of other ssRNA viral diseases [114] such as Chikungunya [43], Crimean Congo hemorrhagic fever [11], hepatitis C [130], and HIV [91]. In this study, we use data from the Simons Genome Diversity Project (SGDP) [79] and from Skoglund [115] to evaluate whether there has been positive selection on, or differentiation from the global mean of Native American TLR7 and TLR8 variants in YFV endemic areas (Figure 4.1, Figure 4.2). We then evaluated positions within TLR7 and TLR8 found to be under positive selection or differentiated in Native American populations for functional importance and previous association with pathogenesis. 4.1 4.1.1 Methods Brazilian Native American data and processing The data was originally published by Skoglund [115]: https://reich.hms.harvard.edu/ datasets. The data represents 48 individuals from nine Native American populations in present-day Brazil. Authors generated genotype data using the Affymetrix Human Origins Array. We used PLINK [95] to extract data from the non-pseudoautosomal region on the X chromosome and output to VCF format (Appendix C). 4.1.2 SGDP data and processing SGDP data include a total of 278 fully public samples sequenced at Illumina Ltd. to an average coverage of 43-fold [79]. We downloaded the sequences available for public use in the v3 cteam lite project data format from http://reichdata.hms.harvard.edu/pub/dataset s/sgdp/. We used ctools software to extract all polymorphic sites within non-pseudoautosomal region on the X chromosome for each of the seven populations [79]. Using Eigensoft 4.2 [92] we converted the Reich lab data files first to PLINK format and finally to VCF format for downstream analysis [95] (Appendix C). 36 4.1.3 Brazilian Native American data and SGDP data processing and phasing To prepare the VCF files for the Brazilian Native Americans and the SGDP subpopulations for phasing, we used VCFtools [39] to identify sites in each dataset not present in the Haplotype Reference Consortium (HRC) [82]. We filtered sites not present in the HRC from each VCF using PLINK [95]. Finally, we used python to update the reference alleles in each VCF to match the alleles in the HRC. (Appendix C) To phase the data for each population, we uploaded the VCF file for the Brazilian Native Americans and each of the seven SGDP populations to the Sanger Imputation Server [82]. We selected Haplotype Reference Consortium [82] as the reference panel and “pre-phase with EAGLE2 and impute” as the pipeline option. (Appendix C) 4.1.4 Fst results Fst is a measure of population differentiation at a variant site calculated by comparing the expected heterozygosity in the subpopulations and the expected heterozygosity for the overall population [90]. Fst values range between 0 and 1. A score closer to one indicates complete homogeneity between populations at a locus while a score closer to 1 indicates population divergence at the locus. We filtered the SGDP datasets to include only female samples using BCFtools [76]. Using VCFtools we calculated allele frequencies for variant sites within subpopulations [39]. Using the following formula we calculated Fst at each variant site within TLR7 and TLR8 [90]. (Appendix C) Fst = (Ht − Hs)/Ht 4.1.5 Differentiation of variants in YFV endemic Native American populations To test whether TLR7 and TLR8 allele frequencies in Native American populations in YFV endemic areas are statistically different from global frequencies, we first divided the South American SGDP samples into YFV endemic (SGDP YFV) and non-YFV endemic (SGDP nYFV) groups (Appendix C). We used VCFtools to calculate the allele frequencies of each variant (three within TLR7, five within TLR8) in each of the seven populations [39] (Appendix C). We noted the major allele frequency for the SGDP American populations first and then documented the frequency for that allele for all other populations . 37 To evaluate whether TLR7 and TLR8 variants frequencies are outliers in the YFV endemic Native American populations, we calculated the quantile rank for the allele frequency at each variant site of each population. We subtracted 50 from the quantile ranks to obtain the difference from the median and then summed the values per population. Populations with a large composite score indicate that allele frequencies at TLR7 and TLR8 for that population are far from the median and therefore unusual (Appendix C). To calculate whether the allele frequency at each site for each population is significantly different from the others we calculated z-score and p-value for each of the frequencies. Variant sites within a population with a p-value of < 0.05 for an allele frequency are significantly different from the global mean (Appendix C). 4.1.6 Integrated selection of allele favored by evolution (iSAFE) To test for selection on TLR7 and TLR8 variants, we implemented iSAFE [8]. This method accurately pinpoints the favored mutation in a selective sweep, specifically within a large (∼ 5 megabase-pair (Mbp)) region. The method relies on the fact that different genomic windows have different genealogies based on recombination [8]. In the first step, the method identifies candidate mutations in small, low recombination windows using properties of the the haplotype allele frequency (HAF) score, to assign a SAFE score to each variant [8]. The HAF score is designed to separate carrier haplotypes from noncarrier haplotypes without knowledge of the favored mutation [101]. In the second step the iSAFE method places each identified variant in the smaller windows into each of the other small window and recalculates the SAFE score [8]. The SAFE scores of each variant in each small window are combined to assign an iSAFE score to each variant over the large window (5 Mbp) [8]. An outlying iSAFE score indicates that a mutation is favored [8]. We ran this test on a 5 Mbp region of the X chromosome in the Skogland [115] data surrounding TLR7 and TLR8, on the SGDP YFV populations, on the Brazilian Native American dataset plus the SGDP YFV populations, and finally on all the samples available from populations within the historic YFV range in North America. 4.1.7 Integrated haplotype score (iHS) iHS [128] is similar to iSAFE – the goal is also to detect recent selection. The test is sensitive to detecting selection on alleles occurring within the last 12,000 years that have not yet 38 reached fixation. This statistic measures the decay of homozygosity surrounding a SNP site [128]. We ran iHS on the whole non-pseudoautosomal region of the X chromosome using the Skogland [115] dataset, SGDP YFV populations, the Brazilian Native American dataset plus the SGDP YFV populations, and lastly, all American populations within the historic YFV endemic range. 4.1.8 Population branch statistic (PBS) We implemented PBS as described in Shriver et al. [112]. The PBS is a measure of Fst that determines the relative proportion of genetic variance between populations at each base pair locus. This analysis requires three populations to determine the differentiation of the alleles in the target population. One population must be phylogenetically close to the target population and the other must be a distantly related outgroup [112]. First, we found the PBS scores for the Brazilian Native American populations using the SGDP populations within the modern YFV range as the second population, and the 1000 genomes CEU (Northern European from Utah) as the outgroup. In order to confirm that differentiated variants diverged within North America and not previously, we also ran PBS on the Central Asia Siberian populations from the SGDP dataset that reside over 60o latitude (N=10). These populations would not have had exposure to YFV, and, as shown by the phylogenetic tree and the migration patterns, they are closely related to American populations (Figure 4.3, Figure 4.4). 4.1.9 Cross population extended haplotype homozygosity (XP-EHH) We used XP-EHH as described in Sabeti et al. [107] to test whether any alleles within TLR7 and TLR8 in the American populations are approaching or have achieved fixation compared to other populations. A fixed allele would indicate that it had been selected for in the population [107]. We tested the Skoglund [115] Native American populations against the African and Central Asian Siberian SGDP populations as well as testing the SGDP Native American populations against the African and Central Asian Siberian SGDP populations. 39 4.2 Results 4.2.1 Fst results We calculated Fst at the three variant loci within TLR7. Fst values ranged from 0.07 − 0.21 indicating little divergence at this gene. Fst values at the five variant loci within TLR8 ranged from 0.14 − 0.38. While some of the TLR8 Fst values are higher, they are still closer to zero than one, indicating that the loci are not divergent between global populations (Table 4.1). 4.2.2 Differentiation of variants in YFV endemic Native American populations When calculating whether any of the SGDP populations are outliers based on their TLR7 and TLR8 allele frequencies (Table 4.2), composite scores based on quantile ranks ranged from 125 − 284 (Table 4.3). The SGDP nYFV populations scored 284, indicating that the TLR7 and TLR8 frequencies are the most unusual compared to global frequencies. The SGDP YFV populations scored 229 ranking them third behind the SGDP European populations. When all SGDP American populations (YFV and non-YFV endemic) allele frequencies are compared to the other six populations, the composite score is 284, indicating that when all variants are considered together, the TLR7 and TLR8 variant frequencies in Native American populations are the most different compared to global frequencies. Evaluating each polymorphic locus individually, the SGDP YFV populations are in the lowest percentile for allele frequency at TLR8 base pair locus 12940564, TLR7 locus 12904282, and TLR7 locus 12903659. While statistical testing did not find that the frequencies of these three alleles are significantly different than the global range (Table 4.4), we still explored the implications of the low frequency of these three alleles in the SGDP YFV population. TLR8 base pair locus 12940564, rs5744088, is found in the 3’ UTR region of TLR8 [111]. The SGDP YFV populations have a major allele frequency of 58% at rs5744088, lower than any other global population. While not the main SNP associated with disease progression in case-control studies of various diseases, the minor allele, which is at a higher frequency in the SGDP American populations, alters TLR8 signaling and disease progression [116, 138]. TLR7 base pair locus 12903659, rs179008, is a coding sequence variant causing a missense mutation [111]. The major allele, A, is associated with sponta- 40 neous clearing of hepatitis C virus [44, 47]. The minor allele, T, is associated with increased risk of hepatitis C progression [47] and more rapid progression of HIV [91]. SGDP YFV populations have a major allele frequency of 73%, the lowest of all SGDP populations, and a major allele frequency of 80% among all SGDP American populations. Finally, TLR7 base pair locus 12904282, rs149314023, is a coding sequence missense variant [111]. No studies have been published on the implications of this SNP. SGDP YFV populations have major allele frequency of 92% while the SGDP nYFV populations have a major allele frequency of 100%. Variant information for this locus was only available for three SGDP populations: East Asians, Central Asia Siberians, and Americans. Variant data at this locus for more populations would help identify functional implications and the importance of the frequency discrepancy in populations in YFV endemic and nonendemic areas. 4.2.3 iSAFE We calculated iSAFE scores within a 5 Mbp region of the X chromosome surrounding TLR7 and TLR8 in Brazilian Native American populations [115] (Figure 4.5), in SGDP YFV populations [79] (Figure 4.6), in the Brazilian Native American dataset plus the SGDP YFV populations (Figure 4.7), and in all the Native American populations in North and South America (Figure 4.8) [79, 115]. This did not uncover any mutations under selection within TLR7 and TLR8. 4.2.4 iHS We calculated iHS on the whole non-pseudoautosomal region of the X chromosome using the Skogland [115] dataset (Figure 4.9), on SGDP YFV populations (Figure 4.10), on the Brazilian Native American dataset plus the SGDP YFV populations (Figure 4.11), and lastly on all American populations within the historic YFV endemic range Figure 4.12). No sites were found to be under positive selection in TLR7 and TLR8 using iHS. 4.2.5 PBS We calculated PBS [139] scores for the Brazilian Native American populations. We used SGDP YFV as the second population and the 1000 genomes CEU (Northern European from Utah) as the outgroup. The results of this analysis revealed two slightly outlying variants 41 in TLR8 in the Brazilian Native American population (Figure 4.13). These two variants were not detected by iSAFE or iHS. In order to confirm that these variants differentiated within North America and not elsewhere before human dispersal into the Americas, we also ran PBS on the Central Asia Siberian populations from the SGDP dataset that above 60o latitude (Figure 4.14). We used the same second population (SGDP YFV) and outgroup (CEU). The SGDP Siberian populations would not have had exposure to YFV since YFV is not found over 60o latitude [99]. Also, as shown by the phylogenetic tree and the migration patterns, they are closely related to American populations. One of the variants identified is in a TLR8 exon and the other is a variant within the antisense mRNA regions of TLR8 important for gene expression. 4.2.6 XP-EHH We found no selected alleles in the American populations using XP-EHH. We tested the Skoglund [115] Native American populations against the African and Central Asian Siberian SGDP populations (Figure 4.15, Figure 4.16) as well as testing the SGDP YFV populations areas against the African and Central Asian Siberian SGDP populations (Figure 4.17, Figure 4.18) . 4.3 Discussion We evaluated whether SNPs within TLR7 and TLR8 are under positive selection or are differentiated in Native American populations in YFV endemic areas compared to global populations. We hypothesized that due to exposure to historic pathogens or YFV over the last 400 years, Native American populations may have genetic variants that contribute to YFV resistance. The first approach we took was to calculate Fst and allele frequency differentiation at variant loci within TLR7 and TLR8 using SGDP populations. Fst results indicated little differentiation at TLR7 and TLR8 variant loci between global populations and SGDP YFV allele frequencies are not significantly different compared to global frequencies. However, SGDP nYFV composite TLR7 and TLR8 variant frequencies are outliers compared to global frequencies. The sample size for the SGDP nYFV region was fairly small (N= 9) and the samples are from Mexico, a country that was in the YFV endemic region before the 1960s [99]. Future studies with greater sample size should reevaluate 42 this finding and whether the decrease in major allele frequencies in this population is due to selection on the minor allele or drift. Three variants within TLR7 and TLR8 are at the lowest allele frequency in SGDP YFV populations compared to global allele frequencies at these loci. While the difference in the allele frequencies in the SGDP YFV populations is not significantly different than the global mean, this finding should be reevaluated with a larger sample size. The first variant is rs179008, and the the minor allele has been associated with increased progression of other ssRNA viruses - hepatitis C and HIV [47, 91]. The minor allele, T, results in an amino acid change to leucine from glutamine causing a biochemical change [41] which decreases production of downstream IFNα potentially affecting overall immune system response to ssRNA viruses [91]. The minor allele is at the greatest frequency, 27%, in the SGDP YFV population. It is at 20% frequency in the SGDP American populations overall (YFV and non-YFV endemic). Considering the increased frequency of this allele in American populations, it is possible that, while detrimental for susceptibility to some ssRNA viruses, it could improve susceptibility to others more prevalent in the Americas. iSAFE, iHS, and XP-EHH tests of selection did not reveal any positively selected variants within Native Americans in YFV endemic areas. PBS identified two slightly differentiated variants within TLR8. Genotyping along the X chromosome was sparse in the Skoglund [115] dataset, with only about 4200 variants genotyped in 48 individuals across Brazil. As a result, other variants that may be favored or differentiated were not found. Increased samples size and genotyping would be beneficial to expand and confirm these findings. 4.3.1 Methodological considerations with X chromosome analyses Homo sapiens have two sex chromosomes, X and Y. The sex chromosomes began as a pair of homologous autosomes that started differentiating through a series of recombination suppression events and subsequent gene loss on the Y chromosome [131]. In the process, the X chromosome maintained the original autosomal functional elements while the Y chromosome lost almost all traces of the ancestral autosome [103]. Females typically carry two X chromosomes while males usually carry one X and one Y chromosome, resulting in smaller effective population size for the X chromosome than autosomes [20]. 43 Since each X chromosome spends 2 3 of its time in females and 1 3 of its time in males [132], the X chromosome is disproportionately influenced by female demography [127]. Factors such as unequal sex ratios caused by differences in reproduction, sex-biased migration, and admixture and introgression change the male to female ratio in populations [132]. Population and generation specific fluctuations caused by the described demographic factors alter the number of the X chromosomes in a population and the relative time an X chromosome spends in males and females predicted by most analyses [132]. Due to the complicated history and inheritance patterns of in X chromosome, studying selection on the X chromosome poses a variety of problems, discussed here. First, this study focused on X-linked immune genes. The X chromosome contains the largest number of immune genes of the whole genome [21]. X chromosome inactivation is supposed to control for double expression of genes in females [21], but due to incomplete X inactivation, many genes on the X chromosome show increased expression in females [65]. As a result of overexpression, immune genes on the X chromosome are highly linked to autoimmune diseases [30]. Numerous other disease ranging from digestive system disorders to cancer have been linked to SNPs on the X chromosome [64]. Differences in fitness effects linked to immune genes on the X chromosome, combined with the X chromosome’s haploid state in males, and sex-biased demography will affect how natural selection works, causing fixation or loss to proceed more rapidly than on the autosomes [13]. Studies show that the increased differentiation on the X chromosome is not just a result of drift and sex-biased demographic events, but that positive selection and adaptation played a role [71]. Due to the combined effects of sex-biased demography, drift, and selection, alleles found to be under selection or highly differentiated on the X chromosome are very punctuated, meaning that regions outside those with high Fst immediately return to background levels [71]. It is possible that highly differentiated sites on the X chromosome “mask” signals of recent selection caused by forces such as YFV that tests such as iSAFE and iHS would otherwise detect. On the other hand, since differentiation on the X chromosome proceeds rapidly, if there were sites in TLR7 and TLR8 that were historically adaptive, we would see these sites fixed by analyses such as XP-EHH. Bioinformatic pipelines for appropriately analyzing sex chromosome sequencing data 44 are only currently gaining attention [64]. A number of regions remain where the X and Y chromosomes share homologous genes: pseudoautosomal regions 1 and 2, which are still able to recombine during meiosis, the X-added region, the X-conserved region, and the X-transposed region [103]. The presence of multiple sections of homology throughout the X and Y chromosomes creates problems when aligning sequencing results from the X chromosome [131]. As opposed to the diploid autosomes that are each represented once in a reference genome, the homologous sections of the X and Y are represented twice [131]. When aligning newly sequenced genomes to the reference genome, regions within the homologous regions of the X and Y chromosomes will map to two different regions [131]. When regions map identically to two different regions, the mapping quality will be substantially reduced which will affect downstream analyses such as variant calling [131]. Webster and colleagues [131] showed that even though mapping quality is drastically affected in known X and Y homologous regions, it is also affected throughout the X chromosome, suggesting more homology than previously thought. Masking the Y chromosome before aligning reads improves mapping quality along the entire X chromosome and variant calling. While TLR7 and TLR8 are not in a known X and Y homologous region, the datasets used in this study were not processed using suggested protocols to improve mapping along the X chromosome. Additionally, genotype arrays often underrepresent X chromosome SNPs [135]. Variants on the X chromosome were potentially lost in both the Skoglund [115] and the Mallick [79] studies as a result. In addition to the described analysis problems with the X chromosome, the sample size for the region of interest of this study was small and limited by sampling location. We were particularly interested in Native American human samples within the region of Brazil currently experiencing recurrent YFV outbreaks [137]. Only a handful (N=17) of the samples from the from the Skoglund [115] study are from the Guarani population, situated directly in the zone where YFV outbreaks have not only been recorded since the 1900s [99], but continue to today [137]. There are many factors that could have contributed to lack of selection signal within TLR7 and TLR8 in our population of interest. Uncontrollable factors such as sex-biased demography that tests of selection do not take into account and bioinformatic pipelines that are not sensitive to X chromosome variants could have lost selection signals within 45 these genes. Methods are now being developed to take demographic factors into consideration [73], sequence processing [131], and genome-wide association studies on the X chromosome [52]. It is possible that TLR7 and TLR8 are not responsible for the immunity to YFV in Native American populations. Rotival et al. [104] found that a number of genes in the TLR7/8 signaling pathway have variable expression levels in African and European populations which could contribute to differences in host immune response. In future studies, we will collect a large number of samples from the human populations in YFV endemic areas in Northern Argentina and Southern Brazil and sequence the entire X chromosome. Due to the variable recombination pattern of the X chromosome, we would standardize selection analyses taking inflation and deflation of genetic diversity into consideration [131]. We observed trends in allele frequencies of TLR7 and TLR8 in YFV endemic populations which makes this an interesting, yet complicated region to continue studying. 46 Figure 4.1. Current YFV endemic range highlighted in pale yellow [99]. Sampling locations from Skoglund [115] indicated in orange. Sampling locations of SGDP American populations used in this study indicated in black [79]. 47 Figure 4.2. Historic YFV endemic range highlighted in pale yellow [99]. Sampling locations from Skoglund [115] indicated in orange. Sampling locations of SGDP American populations used in this study indicated in black [79]. 48 Figure 4.3. Neighbor-joining tree of relationships between East Asian and American populations based on pairwise divergence [79]. Figure 4.4. Human migration into North America. 49 Figure 4.5. iSAFE on TLR7 and TLR8 in Brazilian Native American populations [115]. Test ran on 5 Mbp region surrounding TLR7 and TLR8. Region encompassing TLR7 and TLR8 within vertical bars. Figure 4.6. iSAFE on TLR7 and TLR8 in SGDP Native American populations within YFV endemic region [79]. Test ran on 5 Mbp region surrounding TLR7 and TLR8. Region encompassing TLR7 and TLR8 within vertical bars. 50 Figure 4.7. iSAFE on TLR7 and TLR8 in Brazilian Native American populations [115]and SGDP Native American populations [79]. Test ran on 5 Mbp region surrounding TLR7 and TLR8. Region encompassing TLR7 and TLR8 within vertical bars. Figure 4.8. iSAFE on TLR7 and TLR8 in Brazilian Native American populations [115] and SGDP Native American populations in North and South America [79]. Test ran on 5 Mbp region surrounding TLR7 and TLR8. Region encompassing TLR7 and TLR8 within vertical bars. 51 Figure 4.9. iHS analysis on TLR7 and TLR8 in Brazilian Native American populations [115]. Test ran on entire non-pseudoautosomal region of X chromosome. Plot shows 5 Mbp region surrounding TLR7 and TLR8, region encompassing TLR7 and TLR8 within vertical bars. Figure 4.10. iHS on TLR7 and TLR8 in SGDP Native American populations in YFV endemic areas. [79]. Test ran on entire non-pseudoautosomal region of X chromosome. Plot shows 5 Mbp region surrounding TLR7 and TLR8, region encompassing TLR7 and TLR8 within vertical bars. 52 Figure 4.11. iHS analysis on TLR7 and TLR8 in Brazilian Native American populations [115] and SGDP Native American populations in YFV endemic areas [79]. Test ran on entire non-pseudoautosomal region of X chromosome. Plot shows 5 Mbp region surrounding TLR7 and TLR8, region encompassing TLR7 and TLR8 within vertical bars. Figure 4.12. iHS analysis on TLR7 and TLR8 in Brazilian Native American populations [115] and SGDP Native American populations [79]. Test ran on entire non-pseudoautosomal region of X chromosome. Plot shows 5 Mbp region surrounding TLR7 and TLR8, region encompassing TLR7 and TLR8 within vertical bars. 53 Figure 4.13. PBS for Brazilian Native American data [115]. Test ran on entire non-pseudoautosomal region of X chromosome. Plot shows 5 Mbp region surrounding TLR7 and TLR8, region encompassing TLR7 and TLR8 within vertical bars. 54 Figure 4.14. PBS analysis using SGDP Central Asia Siberians [79]. Test ran on entire non-pseudoautosomal region of X chromosome. Plot shows 5 Mbp region surrounding TLR7 and TLR8, region encompassing TLR7 and TLR8 within vertical bars. Figure 4.15. XP-EHH analysis using Brazilian Native Americans [115] and Central Asia Siberians [79]. 55 Figure 4.16. XP-EHH analysis using Brazilian Native Americans [115] and Africans [79]. Figure 4.17. XP-EHH analysis using SGDP YFV populations and Central Asia Siberians [79]. 56 Figure 4.18. XP-EHH analysis using SGDP YFV populations and Africans [79]. 57 Table 4.1. Fst at each SGDP [79] TLR7 and TLR8 polymorphic locus. Gene Position (Mb) Fst TLR7 12903659 0.066 TLR7 12904282 0.018 TLR7 12907658 0.212 TLR8 12937513 0.207 TLR8 12937804 0.322 TLR8 12939112 0.317 TLR8 12939412 0.381 TLR8 12940564 0.138 0.863 0.594 0.447 0.403 0.57 CAS Europe Oceania SouthAsia EastAsia America 0.833 0.730 0.702 0.406 0.385 0.791 0.769 0.889 0.833 0.742 0.745 0.618 0.415 0.791 0.769 0.889 0.947 12939112 Freq 0.171 0.266 0.302 0.387 0.663 0.171 0.231 0.125 0.636 12939412 Freq 0.618 0.850 0.860 0.788 0.798 0.825 0.583 0.667 12940564 Freq 0.400 0.170 0.704 0.585 0.791 0.676 0.346 0.313 0.740 12907658 Freq 1.000 0.984 0.977 0.923 1.000 12904282 Freq 28 71 0 85 57 42 14 49 AmericaNonYFV AmericaYFV CAS Europe Oceania SouthAsia EastAsia America 83 42 28 14 0 71 57 99 85 rank rank 99 12937804 12937513 Africa Population 83 28 42 14 0 71 57 85 99 rank 12939112 16 42 57 71 99 14 28 0 85 rank 12939412 0 83 99 33 49 66 0 16 rank rank 16 0 71 42 99 57 28 14 85 12907658 12940564 99 66 33 0 99 rank 12904282 0 79 19 99 0 59 39 rank 12903659 Table 4.3. SGDP [79] allele frequencies, quantile rank, and composite score. 0.667 0.372 AmericaYFV 0.444 AmericaNonYFV 0.842 12937804 12937513 0.891 Freq Freq Africa Population Table 4.2. SGDP population allele frequencies. 284 173 144 125 265 217 229 284 214 Composite 0.800 0.956 0.835 0.976 0.731 0.889 0.877 12903659 Freq 58 0.631 0.516 America 0.304 Europe EastAsia 0.653 CAS 0.494 0.440 AmericaYFV 0.601 0.602 AmericaNonYFV SouthAsia 0.287 Africa Oceania 12937513 Population 0.366 0.468 0.490 0.713 0.727 0.420 0.437 0.346 0.381 12937804 0.400 0.498 0.495 0.610 0.772 0.453 0.473 0.366 0.316 12939112 0.655 0.561 0.534 0.471 0.278 0.629 0.586 0.661 0.295 12939412 0.807 0.372 0.357 0.468 0.452 0.410 0.766 0.654 12940564 0.630 0.727 0.395 0.471 0.342 0.413 0.624 0.645 0.373 12907658 0.219 0.410 0.460 0.796 0.308 12904282 0.721 0.346 0.584 0.309 0.770 0.476 0.500 12903659 Table 4.4. P-values for each SGDP population [79] at each TLR7 and TLR8 polymorphic locus. 59 CHAPTER 5 CONCLUSION In this dissertation I found that 1) TLR7 is under strong purifying selection in the Alouatta genus; 2) positively selected nucleotide sites in functionally important regions of TLR7 are present in Alouatta guariba clamitans but not seen in other primates, including other Alouatta species; 3) in Native American human populations in YFV endemic areas, an allele that has been associated with increased progression of other single-stranded RNA viruses hepatitis C and HIV [47, 91], is at elevated frequency. I found no support for my original hypothesis that the same TLR7 and TLR8 sites are under positive selection or differentiated in humans and howler monkeys in YFV endemic areas. Results did not reveal genetic differentiation the YFV exposed howler monkey group compared to the preexposure and exposed groups and as anticipated. Regardless, I discovered two unique, important genetic variants in TLR7 and TLR8 in A. guariba clamitans, but not in other Alouatta species. Considering that TLR7 and TLR8 are highly conserved genes, it was surprising that only one species exhibited these variants. Expanding the study to other Alouatta species would help resolve whether these variants are truly unique to A. guariba clamitans, or more widespread. Furthermore, for most genera used in the comparative analysis, I used one species per genus. Sequencing TLR7 and TLR8 in multiple species of each New World genera would clarify whether other genera also have species-specific substitutions in these two genes. Despite the successes of the project, there were a number of limitations that affected the results. First, increased time in the field could have yielded a greater number of samples and more data. Due to permitting issues and flooding in one of the provincial parks in Misiones, the final sample size was smaller than I had originally planned. Our sample size was further decreased due to the age and nature of the biological material. Tissue samples from howler monkeys alive before the outbreak were fairly old, resulting in degraded and 61 low concentration DNA. Some tissue samples from YFV exposed primates were not stored under optimal conditions resulting in low quality DNA. Finally, fecal sample DNA was the only type of sample available from the surviving howler monkeys. It is often degraded, at low concentration, and full of PCR inhibitors, limiting the amount of genetic information attainable [50]. This project originally aimed to collect data from humans living directly adjacent to the howler monkeys. The human populations are known to be affected by YFV epidemics [54]. However, due to permitting issues human samples could not be collected. As a substitute, I used data from the Skoglund [115] study where researchers collected DNA samples from a number of Brazilian Native American populations. Unfortunately, only one of the populations sampled, the Guarani, was directly proximate to the howler monkeys in the present study. I also used data from the Simons Genome Diversity Project (SGDP) [79]. These researchers sampled global indigenous populations that are often excluded from studies. Although I used the American indigenous populations for our analyses, these populations were scattered across Central and South America and none were near the howler monkey study region. Using previously published human X chromosome data also introduced a number of complicated issues (see Chapter 4). In the future, I will collect samples from the populations of interest and consider the bioinformatic issues with the X chromosome to reevaluate the results from Chapter 4. Even considering the unavoidable pitfalls of this study, and all research, I obtained insight into the genetic variation of Alouatta that should not only be reevaluated, but expanded upon. This study targeted two immune genes that putatively could affect susceptibility to YFV [104]. There are many genes in the downstream signaling pathway of TLR7 and TLR8 that could play a role in YFV susceptibility. However, there are not many genes comparable to TLR7 and TLR8, that are at the interface of host-pathogen interaction during exposure [121]. In the future, more samples from Alouatta and humans in Northern Argentina would be a valuable addition to the project to expand the sample size. Additionally, there have been ongoing sylvatic outbreaks in Southern Brazil, adjacent to my Northern Argentinian study site, and samples from this region would contribute to this study. Finally, expanding genetic analysis to RIG-1, sequencing all of the genes in the TLR7/8 pathway, and obtain- 62 ing expression data would provide valuable information about other genetic factors that may affect susceptibility to YFV. APPENDIX A CHAPTER 2 APPENDIX 64 Figure A.1. Illustration of branch bipartitions selected for SelectionLRT and FEL analyses. 65 Figure A.2. ND1-ND2-CO1 maximum likelihood phylogenetic tree. Tree generated from A. caraya and A. guariba clamitans TLR8 sequences (this study), New World and Old World primate mtDNA sequences Table A.4 on page 68. HKY model used, chosen by Decision Theoretic method. Each sequence totaled 3541 base pairs. Log likelihood of the tree = -24404.7 and aLRT support values on branches. Collection location Pindapoy Arroyo, San Jose Corrientes, Argentina Misiones, Argentina Species A. caraya A. caraya A. guariba clamitans Sample AC tissue AC Museum AGC Museum Table A.1. Sample metadata. 1952 1967 2009 Collection date taxidermied skin taxidermied skin liver tissue Sample type Obtained from Museo Argentina de Ciencias Naturales Bernardino Rivadavia Museo Argentina de Ciencias Naturales Bernardino Rivadavia Hernan Argibay Source ID 52.41 13901 AC10 M F M Sex 66 67 Table A.2. Read depth for at each gene and exon of A. caraya sample shotgun sequenced. Gene Exon Average Read Depth TLR7 1 9.13 2 12.45 3 10.6 1 7.53 2 10.68 TLR8 Table A.3. Read depth for at each gene and exon of the A. caraya and A. guariba clamitans museum samples. Average Read Depth Sample Gene Exon Average Read Depth AC Museum TLR7 1 516.96 88.78 2 5050.54 289.12 3 1257.26 214.78 1 349.05 41.47 2 1037.97 182.96 1 21.94 6.42 2 80.83 9.07 3 43.98 8.33 1 10.51 3.94 2 60.63 9.74 TLR8 AGC Museum TLR7 TLR8 after removing duplicates 68 Table A.4. GenBank accession numbers for TLR7 and TLR8 sequences used in phylogenetic analysis. † This assembly was generated by the DNA Zoo team in collaboration with Jessica Alfoldi, Broad Institute, using an early version of DISCOVAR de novo contigger. Species TLR7 GenBank Accession TLR8 GenBank Accession mtDNA Cebus capucinus XM 017508809.1 XM 017508811.1 not available Cebus albifrons NC 002763.1 Aotus nancymaae XM 012435148.1 XM 012435138.2 NC 018116.1 Chlorocebus sabaeus XM 007991035.1 XM 007991038.1 EF597503 Mandrillus leucophaeus XM 011994858.1 XM 011994857.1 NC 028442 Pongo pygmaeus AB445663.1 AB445670.1 NC 001646.1 Papio anubis XM 021932909.1 XM 009197197.3 MG787545.1 Macaca mulatta EU204943.1 EU204944.1 NC 005943 Gorilla gorilla KF321036.1 KF321278.1 NC 001645.1 Pan paniscus XM 003805682.2 XM 008973908.2 NC 001644.1 Saimiri boliviensis DNAzoo.org† DNAzoo.org† KC959987.1 Pan troglodytes KF321089.1 KF321320.1 D38113.1 Callithrix jacchus XM 002762618.4 XM 008988948.2 KM588314 Homo sapiens Variants from SGDP dataset [79] applied to hg19 reference [1] NC 012920.1 Table A.5. Tree parameters: Phylogenetic tree parameters for each alignment as determined by Decision Theoretic method in PAUP [120]. Alignment Model Ts/Tv ratio Invariable sites/ pinv Across-site rate variation/ Gamma rate TLR7 Coding Region HKY 4.54 0.747 0.171 TLR8 Coding Region TN Optimized 0.685 0.164 mtDNA: CO1-ND1-ND2 HKY 27.33 0.391 0.32 69 A.1 Methods We processed shotgun sequencing results and created a reference sequence using the following pipeline and commands: 1. Remove Illumina adapters from forward and reverse sequencing data [25]. java −jar /bin/Trimmomatic−0.32/trimmomatic−0.32.jar PE − threads 24 14457X1 170823 D00294 0336 BCBLJ8ANXX 4 1.txt.gz 14457X1 170823 D00294 0336 BCBLJ8ANXX 4 2.txt.gz 14457 X1 howlerX1 1P.fq.gz 14457X1 howlerX1 1U.fq.gz 14457 X1 howlerX1 2P.fq.gz 14457X1 howlerX1 2U.fq.gz ILLUMINACLIP :/usr/local/bwlab/bin/Trimmomatic−0.32/adapters/TruSeq3−PE .fa:2:30:10:1:true LEADING:3 TRAILING:3 MAXINFO:50:0.2 MINLEN:50 2. Align forward and reverse howler monkey reads to a masked version of the human reference genome (hg19) [1] to reduce repetitive sequences [75]. Output interlaced paired end read alignment. bwa mem −t 32 Homo sapien hg19/masked/hg19 masked 14457 X1 howlerX1 1P.fq.gz 14457X1 howlerX1 2P.fq.gz 3. Sort the aligned BAM files [74]. samtools view −bS hg19 masked 14457X1 howlerX1 PE.sam | samtools sort − hg19 masked 14457X1 howlerX1 PE.sorted.bam 4. BAM files filtered to target aligned reads to TLR7 and TLR8 and only include reads of quality MAPQ > 60 [74]. samtools view −q 60 −b {hg19 masked 14457X1 howlerX1 PE.sorted .bam} 12885000−12941400 > hg19 TLR7−8 chrX :12885000−12941400 MAPQ60.fq 5. Create a consensus sequence of TLR7 and TLR8 using the fastq files [14]. spades.py −−only−error−correction −t 24 −m 128 −−12 hg19 TLR7−8 chrX:12885000 −12941400 MAPQ60.fq −o hg19 TLR7 −8 chrX 12885000−12941400 MAPQ60 TLR7 and TLR8 are extremely conserved, especially within primates, therefore the howler monkey TLR7 and TLR8 genes were expected to align to the human reference genome. The same pipeline was used to align Alouatta caraya sequencing data to an A. palliata reference 70 sequence provided by the Disotell laboratory at New York University. The final A. caraya TLR7 and TLR8 reference genomes created were identical. We processed targeted sequencing results using the following pipeline: 1. Unalign BAM files for AC museum and AGC museum using GATK software [123]. java −jar /GATK Resource/picard−tools−1.134/picard.jar RevertSam I={input.bam} O={output.unaligned bam} 2. Convert unaligned BAM files to Fastq format [97]. bedtools bamtofastq −i {input.unaligned bam} −fq {output. fastq} 3. Assess fastq quality. Software available at http://www.bioinformatics.babraham.ac.uk/projects/fastqc/. fastqc −o fastqc results sample.fastq 4. Trim fastq reads based on fastq quality results. Software available at https://sourceforge.net/projects/bbmap/. bbduk.sh −Xmx1g in={input.fq} out={output.out fq} ftl=10 ftr =200 minlen=75 maq=20 5. Realign trimmed Fastq files for each sample to AC tissue reference [75]. bwa mem −R @RG\\tID:{id}\\tSM:{id}\\tLB:{id}\\tPU:{id}\\tPL: { IonPGM} {input.ref} {input.fq} | samtools fixmate −O bam − − | samtools sort −O bam −o {output.bams} 6. Index each BAM file [74]. samtools index {input} 7. Mark duplicate reads using GATK software [39]. java −jar /GATK Resource/picard−tools−1.134/picard.jar MarkDuplicates I={input.bam} O={output.bam} M={output. metrics} 8. Create a consensus VCF based on the BAM files (with duplicate reads marked) of each species [53]. 71 freebayes −−fasta−reference AC tissue.fasta −−report− monomorphic −−genotype−qualities −−L {list of bam files} −v {output.vcf} 9. Remove indels from VCF file and filter by genotype quality [39]. Indels were removed since they appeared to be related to sequencing error caused by long strings of nucleotide Adenine. vcftools −−vcf {output.vcf} −−minGQ 30 −−remove−indels −− recode −out {output minGQ30.vcf} 10. Create a consensus sequence for the A. caraya and A. guariba clamitans taxidermied skin samples [76]. bcftools consensus −f ACref.fasta −s {sample} −H A −I output minGQ30.vcf.gz > sample consensus.fasta APPENDIX B CHAPTER 3 APPENDIX 73 Figure B.1. Microsatellite results to determine uniqueness of fecal samples collected from A. guariba clamitans (G) and A. caraya (C). Different alleles observed at Api06 for A. guariba clamitans. G1 (AGC fecal 1) and G2 (AGC fecal 2) are also male and female, respectively. Extra bands seen in the C2 lanes are due to nonspecific amplification. A. caraya samples have the same alleles at all three loci. Figure B.2. PCA analysis using variants from all Alouatta samples at all mtDNA genetic loci: ND1, ND2, CO1. 74 Figure B.3. PCA analysis using variants from all Alouatta samples at all TLR7 and TLR8 genetic loci. Table B.1. GPS location for each sample. Sample Latitude Longitude Notes AC tissue 1 -27.658456 -56.074939 AC museum 1 -27.612103 -57.279819 GPS location generalized Corrientes location AC museum 2 -26.942803 -54.516458 GPS location generalized Misiones location AC museum 3 -27.612103 -57.279819 GPS location generalized Corrientes location AC tissue 2 -26.427959 -53.844202 AC tissue 3 -27.5275 -55.867222 AC tissue 4 -27.657928 -56.075358 AC tissue 5 -27.658442 -56.075425 AC tissue 6 -27.6584 -56.0754 AC fecal 1 -26.425012 -53.835272 AGC museum 1 -26.942803 -54.516458 AGC fecal 1 -26.431315 -53.836815 AGC fecal 2 -26.431315 -53.836815 GPS location generalized Misiones location 75 Table B.2. Synonymous variant loci in A. guariba clamitans. Gene Codon Samples with variant: homozygous Samples with variant: heterozygous TLR8 272 AGC museum 1, AGC fecal 1, AGC fecal 2 TLR8 686 AGC museum 1, AGC fecal 1, AGC fecal 2 TLR8 704 AGC museum 1 TLR8 830 AGC museum 1, AGC fecal 1 AGC fecal 2 Table B.3. A. caraya variants: Variant sites in A. caraya samples with functional implications. Codon positions based on protein translation of TLR8, protein reference NP619542.1. Samples with variant: Samples with variant: homozygous heterozygous Gene Codon Position TLR7 112 TLR7 137 AC tissue 2 N TLR7 144 AC tissue 2 N TLR7 146 AC tissue 2 N TLR7 228 AC tissue 2 N TLR7 241 AC tissue 2 N TLR7 860 All AC samples N TLR8 686 All AC samples N TLR8 830 All AC samples N All AC samples Amino acid change (Y/N) Implications N ASP (same as Old World primates) or GLY (same as New World primates) 76 Table B.4. DnaSP results for A. caraya variant data. Only sites without missing data included in analysis. Pop1: AC museum 1, AC museum 2, AC museum 3 Pop2: AC tissue 1, AC tissue 2, AC tissue 3, AC tissue 4, AC tissue 5, AC tissue 6 Pop3: AC fecal 1 Population 1 Pop 2 Loci Total Differences Dxy 2 1 TLR7e2 0 n/a 2 3 TLR7e2 0 n/a 1 3 TLR7e2 0 n/a 2 1 TLR7e3 0 0 2 3 TLR7e3 0 0 1 3 TLR7e3 0 0 2 1 TLR8e2 0 0 2 3 TLR8e2 0 0 1 3 TLR8e2 0 0 2 1 TLR8e1 0 n/a 2 3 TLR8e1 0 n/a 1 3 TLR8e1 0 n/a 2 1 ND2 5 0.0022 2 3 ND2 3 0.0021 1 3 ND2 3 0.0015 2 1 ND1 2 0.00075 2 3 ND1 2 0.00063 1 3 ND1 1 0.00036 2 1 CO1 0 0 2 3 CO1 1 0.00035 1 3 CO1 2 0.000578 77 Table B.5. DnaSP results for A. guariba clamitans variant data. Only sites without missing data included in analysis. Pop1: AC museum 1, AC museum 2, AC museum 3 Pop2: AC tissue 1, AC tissue 2, AC tissue 3, AC tissue 4, AC tissue 5, AC tissue 6 Pop3: AC fecal 1 Population 1 Pop 2 Loci Total Differences Dxy 1 2 TLR7e2 0 n/a 1 2 TLR7e3 0 0 1 2 TLR8e2 0 0 1 2 TLR8e1 0 n/a 1 2 ND2 0 0.00082 1 2 ND1 2 0.0012 1 2 CO1 0 0 78 B.1 Methods We processed targeted sequencing results using the following pipeline: 1. Unalign BAM files for AC museum and AGC museum using GATK software [39]. java −jar /GATK Resource/picard−tools−1.134/picard.jar RevertSam I={input.bam} O={output.unaligned bam} 2. Convert unaligned BAM files to Fastq format [97]. bedtools bamtofastq −i {input.unaligned bam} −fq {output. fastq} 3. Assess fastq quality. Software available at http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ fastqc −o fastqc results sample.fastq 4. Trim fastq reads based on fastq quality results. Software available at https://sourceforge.net/projects/bbmap/ bbduk.sh −Xmx1g in={input.fq} out={output.out fq} ftl=10 ftr =200 minlen=75 maq=20 5. Realign trimmed Fastq files for each sample to AC tissue reference [75, 76]. bwa mem −R @RG\\tID:{id}\\tSM:{id}\\tLB:{id}\\tPU:{id}\\tPL: {IonPGM} {input.ref} {input.fq} | samtools fixmate −O bam − − | samtools sort −O bam −o {output.bams} 6. Index each BAM file [76]. samtools index {input} 7. Mark duplicate reads using GATK software [39]. java −jar /GATK\ Resource/picard−tools−1.134/picard.jar MarkDuplicates I={input.bam} O={output.bam} M={output. metrics} 8. Create a consensus VCF based on the BAM files (with duplicate reads marked) of each species [53]. 79 freebayes −−fasta−reference AC tissue.fasta −−report− monomorphic −−genotype−qualities −−L {list of bam files} −v {output.vcf} 9. Remove samples with > 40% missing data within TLR genes. 10. Merge VCF files [76]. bcftools merge {A caraya.vcf.gz} {A guariba clamitans.vcf.gz} −O v −o {AC AGC merged.vcf} 11. Remove indels from VCF file and filter by genotype quality [39]. Indels were removed since they appeared to be related to sequencing error caused by long strings of nucleotide adenine. vcftools −−vcf {AC AGC merged.vcf} −−minGQ 30 −−remove− indels −−recode −out \newline {AC AGC merged minGQ30.vcf} 12. Remove variant sites where every sample is monomorphic or missing data. 13. Remove variant sites where >6 samples are missing data (out of 13 total). APPENDIX C CHAPTER 4 APPENDIX 81 Table C.1. SGDP population sizes. Population All individuals Females Only America 22 14 America YFV endemic (SGDP YFV) 13 7 America non-YFV endemic (SGDP nYFV) 9 7 Europe 75 23 Africa 44 13 Central Asia Siberia 27 16 10 5 East Asia 46 19 South Asia 39 8 Oceania 25 8 Central Asia Siberians over 60o latitude 82 C.1 Methods C.1.1 Skoglund [115] and SGDP [79] data processing We downloaded Brazilian Native America data from https://reich.hms.harvard.edu/ datasets and converted the datafiles to a VCF, extracting only the X chromosome. plink −−file Brazil n48 2015 −−chr 23 −−recode vcf −−out Brazil n48 2015 We downloaded v3 cteam lite SGDP data from http://reichdata.hms.harvard.edu/pub/datasets/sgdp/ 1. We used Ctools software [79] to extract all polymorphic sites within non-pseudoautosomal region of the X chromosome for each of the seven populations. https://github.com/DReichLab/cTools/blob/master/README.analysis.md cpoly −p {parameter file} −V 2. Convert Reich Lab data files to Plink format using Eigensoft 4.2 [92]. convertf −p {parameter file} 3. Convert Plink format files to VCF format [39]. plink −−ped {file.ped} −−map {file.pedsnp} −−recode vcf −− out {file.vcf} To update reference alleles to match the Haplotype Reference Consortium for both datasets, we downloaded the site list from: http://www.haplotype-reference-consortium.org/site We used the following VCFtools [39], awk, plink [95], and python commands to update the reference allele for the VCF file for each SGDP population before phasing: 1. vcftools −−vcf alleletestX.vcf −−diff SGDP.vcf −−diff−site HRC.r1−1.GRCh37.wgs.mac5.sites.vcf −−out out.diff.sites in files #identify sites in SGDP populations that are not represented in HRC. 2. awk ’$6=="."{print $1,$2}’out.diff.sites in files > SNPStoremove.txt 83 #create a list of SNPS to remove from SGDP VCF file. 3. plink −−vcf SGDP.vcf −−exclude SNPStoremove.txt −−double−id −−recode vcf −−out SGDPfiltered.vcf 4. Manually remove VCF header except column labels from filtered VCF. 5. vcf = pandas.read table("") #import VCF 6. vcols = vcf.columns.tolist() #put column values into a list 7. ID = vcf.ID.tolist() #put SNP IDs into a list 8. map = pandas.read table("HRC.r1−1.GRCh37.wgs.mac5.sites.tab.gz ") #upload the HRC reference allele list 9. REF = map.REF2.tolist() #save HRC reference alleles to list 10. ALT = map.ALT2.tolist() #save HRC alternate alleles to list 11. dREF = dict(zip(ID,REF)) #create a dictionary matching the HRC reference alleles to the SNPS in the SGDP 12. dALT = dict(zip(ID,ALT)) #create a dictionary matching the HRC alternate alleles to the SNPS in the SGDP 13. vcf[’REF’] = vcf.ID.map(dREF) #create a new reference allele column in the VCF using HRC reference alleles 12. vcf[’ALT’] = vcf.ID.map(dALT) #create a new alternate allele column in the VCF using HRC alternate alleles 13. vcf.to csv(’vcfedit’, sep=’\t’) #Save updated VCF We phased the VCFs for downstream analyses using the Sanger Imputation Server https://imputation.sanger.ac.uk/ [82]. We used the Haplotype Reference Consortium reference panel and “pre-phase with EAGLE2 and impute” pipeline. We uploaded a file with the sex of each of the samples in the VCF so that male samples were treated accordingly. C.1.2 Fst To calculate Fst we: 1. Removed all male samples from each SGDP subpopulation [76]. bcftools +fixploidy -vcf file.vcf -s sample list with sexes -p ploidy- 84 file -recode vcf -out outfile 2. Calculated the allele frequencies at variant sites within TLR7 and TLR8 for female subpopulations [39]. vcftools --vcf file.vcf --freq --out file.frq C.1.3 Differentiation of TLR7 and TLR8 variants in YFV endemic Native American populations In order to test whether allele frequencies of TLR7 and TLR8 in YFV endemic Native American populations are outliers compared to the global average, we first used bcftools to haploidize the male samples in the pre-phased VCFs for SGDP subpopulation [76]. bcftools +fixploidy -vcf file.vcf -s sample list with sexes -p ploidyfile -recode vcf -out outfile We used VCFtools to calculate allele frequencies in each population [39]: vcftools --vcf file.vcf] --freq --out file.frq We wrote an python script to first calculate the quantile of the allele frequency for each population at each locus. import pandas import numpy as np import scipy.stats as st from scipy.special import ndtr as ndtr df = pandas.read csv("allele frequencies.csv", sep = ",") C = list(df) for x in C[1:9]: df[x +" rank"]=pandas.qcut(df[x],100,labels=False) C2 = list(df) #assign percentile to all allele frequencies for x in C2[9:19]: df[x+"−50"]= abs(df[x] − 50.0) C3 = list(df) #subtract 50 from all percentile ranks df["quant sum"] = df[’Freq 12937804 rank−50’]+ df[’Freq 12939112 rank−50’]+df[’Freq 12939412 rank−50’] +df[’Freq 12940564 rank−50’]+df[’Freq 12937513 rank−50’] +df[’Freq 12907658 rank−50’]+df[’Freq 12904282 rank−50’] +df[’Freq 12903659 rank−50’] 85 #sum all quantile values to see if allele freq is unusual for col in df.columns[1:9]: df[col + ’ zscore’] = (df[col]−df[col].mean())/df[col].std( ddof=(df[col].count() − 1.0)) C4 = list(df) #compute zscore for each population at each site for col in df.columns[26:34]: df[col + ’pvalues’] = 1 − ndtr(df[col]) C5 = list(df) #compute pvalue for each population at each site significance = 0.05 for col in df.columns[35:44]: df[col + ’stat sig’] = (df[col] >= significance).astype(int) #assign 0 for pvalues under 0.05 df.to csv("export quantile pval.csv") #export data to csv C.1.4 iSAFE To run iSAFE analysis on TLR7 and TLR8 we used a phased VCF of my target population with the following flags. The region is a 5 Mbp re- gion encompassing TLR7 and TLR8 [8]. python2.7 /iSAFE-1.0.3/src/isafe.py --format vcf --i X.vcf.gz --AA /cg02b/NGS/NicoleTorosin Anthro2017/homo sapiens ancestor GRCh37 e71/ homo sapiens ancestor X.fa --region X:10413245-15413245 --IgnoreGaps --out isafeX:10413245-15413245TLR7-TLR8 C.1.5 iHS To calculate integrated haplotype score for each variant in the non-pseodoautosomal region, we first converted the phased VCF to \hapsample" format using bcftools [76]. bcftools convert --hapsample out.prefix phased.vcf.gz iHS bin software will not accept missing data, we filtered missing data, represented by \-" from the .hap file with the following command: awk ’NR==1{for(i=1; i<=NF; i++) if ($i==1 || $i==0) {a[i]++;}}{for (i in a) printf "%s", $i; printf "\n"}’ 86 X.hap > Xed.hap To create the map file necessary for iHS bin using base pair locations along the X instead of rs numbers, we copied the fourth column of the map file as the second column. The X chromosome is coded as \X" in the first column, rather than \23." We used the following ihsbin [78] execute the integrated haplotype analysis: ihsbin --hap Xed.hap --map X.map --out ihs C.1.6 PBS To calculate population branch statistics at each variant sites, we using the following commands: 1. python pbs.py pop1 pop2 pop3 size1,size2,size3 > output #where the pops are the data files for each population in order. First is the population you want to run pbs on. The order here matters. You will also need the sizes. Note that the sizes are all argument four, separated by commas. Next you need to standardize the output. 2. python standard pbs.py output #bins > results #You must specify the number of frequency bins for the standardization process. Running this script will also produce the file freq counts.hist.png. This is a histogram that will show you your sample size for each bin. Selecting the number of bins is a ’’guess and check" process. The resulting histograms should look more or less uniform across bin size. For my analyses, number of bins ranged from 8-10. The script for the first step follows: import numpy as np import sys def get values(first, second,first size,second size): #This block calculates the branch length Hs = (2∗first∗(1−first)∗first size + 2∗second∗(1−second)∗ second size) / (first size+second size) Ht = 2 ∗ ( (first∗first size + second∗second size) / ( first size+ second size) ) ∗ (1− (first∗first size + second∗ second size) / (first size+ second size) ) 87 Fst = (Ht − Hs) / Ht T = −np.log(1−Fst) return T,Fst def find sites(one,two,three,lim): #This is filtering to common sites. #This block for only excluding fixed cites. #values can be chnage to exclude any frequencies. hi = np.array([i for i,j in enumerate(one) if lim < j < 1−lim ]) #two for both fixed for one allele, 0 for both fixed in another ji = np.array([i for i,j in enumerate(two) if lim < j < 1−lim ]) ki = np.array([i for i,j in enumerate(three) if lim < j < 1− lim]) #These are returning indices from the original array of valid snps. keep = np.intersect1d(hi,ji) keep = np.intersect1d(ki,keep) return keep def snpsmaf(x): #Read the data in. I put the data in my own format here. pop = np.genfromtxt(x,dtype = str) var = pop[0] irs = np.where(var == ’SNP’)[0][0] isnps = np.where(var == ’POS’)[0][0] imaf = np.where(var == ’MAF’)[0][0] pop rs = pop[1:,irs] pop snps = pop[1:,isnps].astype(int) pop maf = pop[1:,imaf].astype(float) return pop snps,pop maf,pop rs def get data(one, two, three): # getting data. popa snps,popa p,popa rs = snpsmaf(one) popb snps,popb p,popb rs = snpsmaf(two) ref snps,ref p,ref rs = snpsmaf(three) keep = find sites(popa p,popb p,ref p,0.05) 88 rs = popa rs[keep] popa p = popa p[keep] popb p = popb p[keep] ref p = ref p[keep] snps = popa snps[keep] #This will only work while each population file has a complete list of the same snp sin the same order. return popa p,popb p,ref p,snps,rs if name == ’ main ’: popa = sys.argv[1] popb = sys.argv[2] ref = sys.argv[3] sizes = sys.argv[4] popa size,popb size,ref size = [int(i) for i in sizes.split (’,’)] popa p,popb p,ref p,snps,rs = get data(popa,popb,ref) Tab,fstab = get values(popa p,popb p,popa size,popb size) # find branch lengths Taref,fstar = get values(popa p,ref p,popa size,ref size) Tbref,fstbr = get values(popb p,ref p,popb size,ref size) pbs = (Tab + Taref − Tbref)/2.0 #find pbs for i in range(len(snps)): print "%s\t%d\t%f\t%f\t%f\t%f" % (rs[i],snps[i],pbs[i], popa p[i],popb p[i],ref p[i]) #print out values The script for the second step follows: import numpy as np import sys import matplotlib.pyplot as plt if name == ’ main ’: pop = sys.argv[1] #getting arguments from command line numbins = int(sys.argv[2]) bin size = 0.5/numbins #specifying bins for standardization and histogram rs = np.genfromtxt(pop,dtype=str)[0:,0] data = np.genfromtxt(pop)[0:,[1,2,3]] bins = np.arange(min(data[0:,−1]), 0.5, bin size) 89 bins[−1] = 0.5 #the largest MAF is 0.5 #data[0:,1] = [np.random.random() for i in range(135817)] plt.hist(data[0:,−1],bins=bins) #create histogram plt.xlabel(’MAF’) plt.ylabel(’Count’) plt.savefig(’freq counts.hist.png’,format=’png’,dpi=300) #save histogram plt.clf() bin ind = np.digitize(data[0:,−1],bins) # find which bin each snp belongs in new = np.zeros(len(bin ind)) for bin in set(bin ind): #iterate over bin numbers and standardize within the bin ind = bin ind==bin m = np.mean(data[0:,1][ind]) s = np.std(data[0:,1][ind]) new[ind] = (data[0:,1][ind] − m)/s #print bin,m,s for i in range(len(data)): print "%s\t%d\t%f\t%f" % (rs[i],data[i,0],data[i,1],new[i]) #print out the values. Nathan Harris wrote the PBS scripts REFERENCES [1] 1000 Genomes Project Consortium, A. Auton, L. D. Brooks, R. M. Durbin, E. P. Garrison, H. M. Kang, J. O. Korbel, J. L. Marchini, S. McCarthy, G. A. McVean, and G. R. Abecasis, A global reference for human genetic variation, Nature, 526 (2015), pp. 68–74. [2] G. Abraham, Y. Qiu, and M. Inouye, FlashPCA2: principal component analysis of biobank-scale genotype datasets, Bioinformatics, 33 (2017), pp. 2776–2778. [3] I. Agostini, I. Holzmann, M. S. D. Bitetti, I. Luciana, M. M. Kowalewski, P. M. Beldomnico, M. Martı́nez, E. S. Moreno, A. L. J. Desbiez, and P. Miller, Conservation letter building a species conservation strategy for the brown howler monkey (Alouatta guariba clamitans) in Argentina in the context of yellow fever outbreaks, Trop. Conserv. Sci., 7 (2014), pp. 26–34. [4] I. Agostini, I. Holzmann, and M. S. Di Bitetti, Infant hybrids in a newly formed mixed-species group of howler monkeys (Alouatta guariba clamitans and Alouatta caraya) in northeastern Argentina, Primates, 49 (2008), pp. 304–307. [5] , Ranging patterns of two syntopic howler monkey species (Alouatta guariba and A. caraya) in northeastern Argentina, Int. J. Primatol., 31 (2010), pp. 363–381. [6] I. Agostini, E. Pizzio, C. De Angelo, and M. S. Di Bitetti, Population status of primates in the Atlantic forest of Argentina, Int. J. Primatol., 36 (2015), pp. 244–258. [7] L. M. Aguiar, D. M. Mellek, K. C. Abreu, T. G. Boscarato, I. P. Bernardi, J. M. D. Miranda, and F. C. Passos, Sympatry between Alouatta caraya and Alouatta clamitans and the rediscovery of free-ranging potential hybrids in southern Brazil, Primates, 48 (2007), pp. 245–248. [8] A. Akbari, J. J. Vitti, A. Iranmehr, M. Bakhtiari, P. C. Sabeti, S. Mirarab, and V. Bafna, Identifying the favored mutation in a positive selective sweep, Nat. Methods, 15 (2018), pp. 279–282. [9] M. Anisimova and O. Gascuel, Approximate likelihood-ratio test for branches: A fast, accurate, and powerful alternative, Syst. Biol., 55 (2006), pp. 539–552. [10] H. Areal, J. Abrantes, and P. J. Esteves, Signatures of positive selection in toll-like receptor (TLR) genes in mammals, BMC Evol. Biol., 11 (2011), p. 368. [11] S. Arslan, A. Engin, N. Özbilüm, and M. Bakır, Toll-like receptor 7 Gln11Leu, c.4-151A/G, and +1817G/T polymorphisms in Crimean Congo hemorrhagic fever, J. Med. Virol., 87 (2015), pp. 1090–1095. [12] M. S. Ascunce, E. Hasson, C. J. Mulligan, and M. D. Mudry, Mitochondrial sequence diversity of the southernmost extant new world monkey, alouatta caraya, Mol. Phylogenet. Evol., 43 (2007), pp. 202–215. 91 [13] P. J. Avery, The population genetics of haplo-diploids and x-linked genes, Genet. Res., 44 (1984), pp. 321–341. [14] A. Bankevich, S. Nurk, D. Antipov, A. A. Gurevich, M. Dvorkin, A. S. Kulikov, V. M. Lesin, S. I. Nikolenko, S. Pham, A. D. Prjibelski, A. V. Pyshkin, A. V. Sirotkin, N. Vyahhi, G. Tesler, M. A. Alekseyev, and P. A. Pevzner, SPAdes: A new genome assembly algorithm and its applications to single-cell sequencing, J. Comput. Biol., 19 (2012), pp. 455–477. [15] E. D. Barnett, Yellow fever: epidemiology and prevention, Clin. Infect. Dis., 44 (2007), pp. 850–856. [16] A. D. T. Barrett and S. Higgs, Yellow fever: a disease that has yet to be conquered, Annu. Rev. Entomol., 52 (2007), pp. 209–229. [17] D. W. C. Beasley, A. J. McAuley, and D. a. Bente, Yellow fever virus: Genetic and phenotypic diversity and implications for detection, prevention and therapy, Antiviral Res., 115 (2014), pp. 48–70. [18] J. F. R. Bejarano, Estudio sobre la fiebre amarilla selvática en la República Argentina, Buenos Aires: Ministerio de Bienestar Social de la Nacion/Secretaria de Estado de Salud Publica, (1979), p. 38. [19] J. K. Bell, G. E. D. Mullen, C. A. Leifer, A. Mazzoni, D. R. Davies, and D. M. Segal, Leucine-rich repeats and pathogen recognition in toll-like receptors, Trends Immunol., 24 (2003), pp. 528–533. [20] R. Bergero and D. Charlesworth, The evolution of restricted recombination in sex chromosomes, Trends Ecol. Evol., 24 (2009), pp. 94–102. [21] I. Bianchi, A. Lleo, M. E. Gershwin, and P. Invernizzi, The X chromosome and immune associated genes, J. Autoimmun., 38 (2012), pp. J187–J192. [22] J. C. Bicca-marques and D. S. D. Freitas, The role of monkeys, mosquitoes, and humans in the occurrence of a yellow fever outbreak in a fragmented landscape in south Brazil: Protecting howler monkeys is a matter of public health, Trop. Conserv. Sci., 3 (2010), pp. 78–89. [23] A. W. Bigham, M. J. Wilson, C. G. Julian, M. Kiyamu, E. Vargas, F. LeonVelarde, M. Rivera-Chira, C. Rodriquez, V. A. Browne, E. Parra, T. D. Brutsaert, L. G. Moore, and M. D. Shriver, Andean and Tibetan patterns of adaptation to high altitude, Am. J. Hum. Biol., 25 (2013), pp. 190–197. [24] S. Boissinot, Y. Tan, S. K. Shyue, H. Schneider, I. Sampaio, K. Neiswanger, D. Hewett-Emmett, and W. H. Li, Origins and antiquity of x-linked triallelic color vision systems in new world monkeys, Proc. Natl. Acad. Sci. U. S. A., 95 (1998), pp. 13749–13754. [25] A. M. Bolger, M. Lohse, and B. Usadel, Trimmomatic: A flexible trimmer for illumina sequence data, Bioinformatics, 30 (2014), pp. 2114–2120. 92 [26] A. W. Briggs, U. Stenzel, P. L. F. Johnson, R. E. Green, J. Kelso, K. Prüfer, M. Meyer, J. Krause, M. T. Ronan, M. Lachmann, and S. Pääbo, Patterns of damage in genomic DNA sequences from a Neandertal, Proc. Natl. Acad. Sci. U. S. A., 104 (2007), pp. 14616–14621. [27] J. E. Bryant, E. C. Holmes, and A. D. T. Barrett, Out of Africa: A molecular perspective on the introduction of yellow fever virus into the americas, PLoS Pathog., 3 (2007), p. e75. [28] A. Burrell, T. Disotell, and C. Bergey, The use of museum specimans with highthroughput DNA sequencers, J. Hum. Evol., (2015), pp. 35–44. [29] D. Castellano, L. H. Uricchio, K. Munch, and D. Enard, Viruses rule over adaptation in conserved human proteins, bioRxiv, (2019), p. 555060. [30] D. Chang, F. Gao, A. Slavney, L. Ma, Y. Y. Waldman, A. J. Sams, P. BillingRoss, A. Madar, R. Spritz, and A. Keinan, Accounting for exentricities: analysis of the X chromosome in GWAS reveals x-linked genes implicated in autoimmune diseases, PLoS One, 9 (2014), p. e113684. [31] C. A. Chapman and S. R. Balcomb, Population characteristics of howlers: Ecological conditions or group history, Int. J. Primatol., 19 (1998), pp. 385–403. [32] N. D. Collins and A. D. T. Barrett, Live attenuated yellow fever 17D vaccine: A legacy vaccine still controlling outbreaks in modern day, Curr. Infect. Dis. Rep., 19 (2017), p. 14. [33] D. N. Cook, D. S. Pisetsky, and D. a. Schwartz, Toll-like receptors in the pathogenesis of human disease, Nat. Immunol., 5 (2004), pp. 975–979. [34] G. M. Cooper, E. A. Stone, G. Asimenos, NISC Comparative Sequencing Program, E. D. Green, S. Batzoglou, and A. Sidow, Distribution and intensity of constraint in mammalian genomic sequence, Genome Res., 15 (2005), pp. 901–913. [35] L. Cortés-Ortiz, T. F. Duda, D. Canales-Espinosa, F. Garcı́a-Orduña, E. Rodrı́guez-Luna, and E. Bermingham, Hybridization in large-bodied new world primates, Genetics, 176 (2007), pp. 2421–2425. [36] L. Cortés-Ortiz, E. Mondragón, and J. Cabotage, Isolation and characterization of microsatellite loci for the study of mexican howler monkeys, their natural hybrids, and other neotropical primates, Conserv. Genet. Resour., 2 (2010), pp. 21–26. [37] J. A. Crespo, Comentarios sobre nuevas localidades para mamı́feros de Argentina y de bolivia, Rev. Mus. Argent. Cienc. Nat., 11 (1974), pp. 1–31. [38] C. M. Crockett, Conservation biology of the genus Alouatta, Int. J. Primatol., 19 (1998), pp. 549–578. [39] P. Danecek, A. Auton, G. Abecasis, C. A. Albers, E. Banks, M. A. DePristo, R. E. Handsaker, G. Lunter, G. T. Marth, S. T. Sherry, G. McVean, R. Durbin, and 1000 Genomes Project Analysis Group, The variant call format and VCFtools, Bioinformatics, 27 (2011), pp. 2156–2158. 93 [40] M. S. Di Bitetti, G. Placci, A. D. Brown, and D. I. Rode, Conservation and population status of the brown howling monkey (Alouatta fusca clamitans) in Argentina, Neotrop. Primates, 2 (1994), pp. 1–4. [41] B. P. dos Santos, J. V. Valverde, P. Rohr, O. A. Monticielo, J. C. T. Brenol, R. M. Xavier, and J. A. B. Chies, TLR7/8/9 polymorphisms and their associations in systemic lupus erythematosus patients from southern Brazil, Lupus, 21 (2012), pp. 302– 309. [42] O. Dudchenko, S. S. Batra, A. D. Omer, S. K. Nyquist, M. Hoeger, N. C. Durand, M. S. Shamim, I. Machol, E. S. Lander, A. P. Aiden, and E. L. Aiden, De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds, Science, 356 (2017), pp. 92–95. [43] S. K. Dutta and A. Tripathi, Association of toll-like receptor polymorphisms with susceptibility to chikungunya virus infection, Virology, 511 (2017), pp. 207–213. [44] M. El-Bendary, M. Neamatallah, H. Elalfy, T. Besheer, A. Elkholi, M. ElDiasty, M. Elsareef, M. Zahran, B. El-Aarag, A. Gomaa, D. Elhammady, M. ElSetouhy, A. Hegazy, and G. Esmat, The association of single nucleotide polymorphisms of toll-like receptor 3, toll-like receptor 7 and toll-like receptor 8 genes with the susceptibility to HCV infection, Br. J. Biomed. Sci., 75 (2018), pp. 175–181. [45] D. Enard, L. Cai, C. Gwennap, and D. A. Petrov, Viruses are a dominant driver of protein adaptation in mammals, Elife, 5 (2016), p. e12469. [46] D. Enard, F. Depaulis, and H. R. Crollius, Human and non-human primate genomes share hotspots of positive selection, PLoS Genet., 6 (2010), p. e1000840. [47] F.-Z. Fakhir, M. Lkhider, W. Badre, R. Alaoui, E. F. Meurs, P. Pineau, S. Ezzikouri, and S. Benjelloun, Genetic variations in toll-like receptors 7 and 8 modulate natural hepatitis C outcomes and liver disease progression, Liver Int., 38 (2018), pp. 432–442. [48] N. R. Faria and et al., Genomic and epidemiological monitoring of yellow fever virus transmission potential, Science, 361 (2018), pp. 894 –899. [49] A. D. Fiore, A rapid genetic method for sex assignment in non-human primates, Conserv. Genet., 6 (2006), pp. 1053–1058. [50] M. A. Frantzen, J. B. Silk, J. W. Ferguson, R. K. Wayne, and M. H. Kohn, Empirical evaluation of preservation methods for faecal DNA, Mol. Ecol., 7 (1998), pp. 1423–1428. [51] S. D. W. Frost, Y. Liu, S. L. K. Pond, C. Chappey, T. Wrin, C. J. Petropoulos, S. J. Little, and D. D. Richman, Characterization of human immunodeficiency virus type 1 (HIV-1) envelope variation and neutralizing antibody responses during transmission of HIV-1 subtype B, J. Virol., 79 (2005), pp. 6523–6527. [52] F. Gao, D. Chang, A. Biddanda, L. Ma, Y. Guo, Z. Zhou, and A. Keinan, XWAS: A software toolset for genetic data analysis and association studies of the X chromosome, J. Hered., 106 (2015), pp. 666–671. [53] E. Garrison and G. Marth, Haplotype-based variant detection from short-read sequencing, arXiv [q-bio.GN], (2012), pp. 1–9. 94 [54] S. Goenaga, C. Fabbri, J. C. R. Dueñas, C. N. Gardenal, G. C. Rossi, G. Calderon, M. A. Morales, J. B. Garcia, D. A. Enria, and S. Levis, Isolation of yellow fever virus from mosquitoes in Misiones province, Argentina, Vector Borne Zoonotic Dis., 12 (2012), pp. 986–993. [55] S. Guindon, J.-F. Dufayard, V. Lefort, M. Anisimova, W. Hordijk, and O. Gascuel, New algorithms and methods to estimate maximum-likelihood phylogenies: Assessing the performance of PhyML 3.0, Syst. Biol., 59 (2010), pp. 307–321. [56] W. D. Hamilton, Sex versus non-sex versus parasite, Oikos, 35 (1980), pp. 282–290. [57] K. A. Hanley, T. P. Monath, S. C. Weaver, S. L. Rossi, R. L. Richman, and N. Vasilakis, Fever versus fever: the role of host and vector susceptibility and interspecific competition in shaping the current and future distributions of the sylvatic cycles of dengue virus and yellow fever virus, Infect. Genet. Evol., 19 (2013), pp. 292–311. [58] M. L. S. Harada, H. Schneider, M. P. C. Sampaio, I. Czelusniak, and M. Goodman, DNA evidence on the phylogenetic systematics of New World monkeys: Support for the sister-grouping of Cebus and Saimiri from two unlinked niclear genes, Mol. Phylogenet. Evol., 4 (1995), pp. 331–349. [59] S. Henikoff and J. G. Henikoff, Amino acid substitution matrices from protein blocks, Proc. Natl. Acad. Sci. U. S. A., 89 (1992), pp. 10915–10919. [60] I. Holzmann, I. Agostini, J. I. Areta, H. Ferreyra, P. Beldomenico, and M. S. di Bitetti, Impact of yellow fever outbreaks on two howler monkey species (Alouatta guariba clamitans and A. caraya) in Misiones, Argentina, Am. J. Primatol., 72 (2010), pp. 475–480. [61] A. Iwasaki, Innate immune recognition of HIV-1, Immunity, 37 (2012), pp. 389–398. [62] T. Kaisho and S. Akira, Toll-like receptor function and signaling, J. Allergy Clin. Immunol., 117 (2006), pp. 979–87. [63] T. Kawai and S. Akira, The role of pattern-recognition receptors in innate immunity: Update on toll-like receptors, Nat. Immunol., 11 (2010), pp. 373–384. [64] E. A. Khramtsova, L. K. Davis, and B. E. Stranger, The role of sex in the genomics of human complex traits, Nat. Rev. Genet., 20 (2019), pp. 173–190. [65] S. L. Klein and K. L. Flanagan, Sex differences in immune responses, Nat. Rev. Immunol., 16 (2016), pp. 626–638. [66] S. L. Kosakovsky Pond and S. D. W. Frost, Not so different after all: a comparison of methods for detecting amino acid sites under selection, Mol. Biol. Evol., 22 (2005), pp. 1208–1222. [67] S. L. Kosakovsky Pond, S. D. W. Frost, and S. V. Muse, HyPhy: Hypothesis testing using phylogenies, Bioinformatics, 21 (2005), pp. 676–679. [68] H. Kumar, T. Kawai, and S. Akira, Pathogen recognition by the innate immune system, Int. Rev. Immunol., 30 (2011), pp. 16–34. 95 [69] S. Kumar, G. Stecher, and K. Tamura, MEGA7: Molecular evolutionary genetics analysis version 7.0 for bigger datasets, Mol. Biol. Evol., 33 (2016), pp. 1870–1874. [70] H. W. Kumm and H. W. Laemmert, The geographical distribution of immunity to yellow fever among the primates of Brazil, Am. J. Trop. Med. Hyg., 30 (1950), pp. 733–748. [71] C. A. Lambert, C. F. Connelly, J. Madeoy, R. Qiu, M. V. Olson, and J. M. Akey, Highly punctuated patterns of population structure on the X chromosome and implications for African evolutionary history, Am. J. Hum. Genet., 86 (2010), pp. 34–44. [72] A. Larsson, AliView: A fast and lightweight alignment viewer and editor for large datasets, Bioinformatics, 30 (2014), pp. 3276–3278. [73] C. Lasne, C. M. Sgrò, and T. Connallon, The relative contributions of the X chromosome and autosomes to local adaptation, Genetics, 205 (2017), pp. 1285–1304. [74] H. Li, A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data, Bioinformatics, 27 (2011), pp. 2987–2993. [75] H. Li and R. Durbin, Fast and accurate short read alignment with Burrows-Wheeler transform, Bioinformatics, 25 (2009), pp. 1754–1760. [76] H. Li, B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, R. Durbin, and 1000 Genome Project Data Processing Subgroup, The sequence alignment/map format and SAMtools, Bioinformatics, 25 (2009), pp. 2078– 2079. [77] Y.-T. Lin, A. Verma, and C. P. Hodgkinson, Toll-like receptors and human disease: Lessons from single nucleotide polymorphisms, Curr. Genomics, 13 (2012), pp. 633–645. [78] C. A. Maclean, N. P. Chue Hong, and J. G. D. Prendergast, hapbin: An efficient program for performing Haplotype-Based scans for positive selection in large genomic datasets, Mol. Biol. Evol., 32 (2015), pp. 3027–3029. [79] S. Mallick, H. Li, M. Lipson, I. Mathieson, M. Gymrek, F. Racimo, M. Zhao, N. Chennagiri, S. Nordenfelt, A. Tandon, P. Skoglund, I. Lazaridis, S. Sankararaman, Q. Fu, N. Rohland, G. Renaud, Y. Erlich, T. Willems, C. Gallo, J. P. Spence, Y. S. Song, G. Poletti, F. Balloux, G. van Driem, P. de Knijff, I. G. Romero, A. R. Jha, D. M. Behar, C. M. Bravi, C. Capelli, T. Hervig, A. Moreno-Estrada, O. L. Posukh, E. Balanovska, O. Balanovsky, S. Karachanak-Yankova, H. Sahakyan, D. Toncheva, L. Yepiskoposyan, C. Tyler-Smith, Y. Xue, M. S. Abdullah, A. Ruiz-Linares, C. M. Beall, A. Di Rienzo, C. Jeong, E. B. Starikovskaya, E. Metspalu, J. Parik, R. Villems, B. M. Henn, U. Hodoglugil, R. Mahley, A. Sajantila, G. Stamatoyannopoulos, J. T. S. Wee, R. Khusainova, E. Khusnutdinova, S. Litvinov, G. Ayodo, D. Comas, M. F. Hammer, T. Kivisild, W. Klitz, C. A. Winkler, D. Labuda, M. Bamshad, L. B. Jorde, S. A. Tishkoff, W. S. Watkins, M. Metspalu, S. Dryomov, R. Sukernik, L. Singh, K. Thangaraj, S. Pääbo, J. Kelso, N. Patterson, and D. Reich, The Simons Genome Diversity Project: 300 genomes from 142 diverse populations, Nature, 538 (2016), pp. 201–206. 96 [80] J. N. Mandl, R. Akondy, B. Lawson, N. Kozyr, S. I. Staprans, R. Ahmed, and M. B. Feinberg, Distinctive TLR7 signaling, type I IFN production, and attenuated innate and adaptive immune responses to yellow fever virus in a primate reservoir host, J. Immunol., 186 (2011), pp. 6406–6416. [81] L. Markoff, Yellow fever outbreak in sudan, N. Engl. J. Med., 368 (2013), pp. 689–691. [82] S. McCarthy, S. Das, Haplotype Reference Consortium, and et al., A reference panel of 64,976 haplotypes for genotype imputation, Nat. Genet., 48 (2016), pp. 1279–1283. [83] T. Mikami, H. Miyashita, S. Takatsuka, Y. Kuroki, and N. Matsushima, Molecular evolution of vertebrate toll-like receptors: evolutionary rate difference between their leucinerich repeats and their TIR domains, Gene, 503 (2012), pp. 235–243. [84] Ministério da Saúde, Guia de vigilância de epizootias em primatas não humanos e entomologia aplicada à vigilância da febre amarela, tech. rep., 2014. Accessed: 2019-1-10 from http://bvsms.saude.gov.br/bvs/publicacoes/guia vigilancia epizootias primatas entomologia.pdf. [85] T. P. Monath, R. Nichols, W. T. Archambault, L. Moore, R. Marchesani, J. Tian, R. E. Shope, N. Thomas, R. Schrader, D. Furby, and P. Bedford, Comparative safety and immunogenicity of two yellow fever 17D vaccines (ARILVAX and YF-VAX) in a phase III multicenter, double-blind clinical trial, Am. J. Trop. Med. Hyg., 66 (2002), pp. 533–541. [86] T. P. Monath and P. F. C. Vasconcelos, Yellow fever, J. Clin. Virol., 64 (2015), pp. 160–173. [87] E. S. Moreno, R. Spinola, C. H. Tengan, R. A. Brasil, M. M. Siciliano, T. L. M. Coimbra, V. R. Silveira, I. M. Rocco, I. Bisordi, R. P. de Souza, S. Petrella, L. E. Pereira, A. Y. Maeda, F. G. da Silva, and A. Suzuki, Epizootias de febre amarela em primatas não humanos no estado de são paulo, Brasil, 2008-2009, Rev. Inst. Med. Trop. Sao Paulo, 55 (2013), pp. 45–50. [88] S. V. Muse and B. S. Gaut, A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to the chloroplast genome, Mol. Biol. Evol., 1 (1994), pp. 715–724. [89] M. Nei, Molecular Evolutionary Genetics, Columbia University Press, 1987. [90] M. Nei and R. K. Chesser, Estimation of fixation indices and gene diversities, Ann. Hum. Genet., 47 (1983), pp. 253–259. [91] D.-Y. Oh, K. Baumann, O. Hamouda, J. K. Eckert, K. Neumann, C. Kücherer, B. Bartmeyer, G. Poggensee, N. Oh, A. Pruss, H. Jessen, and R. R. Schumann, A frequent functional toll-like receptor 7 polymorphism is associated with accelerated HIV-1 disease progression, AIDS, 23 (2009), pp. 297–307. [92] N. Patterson, A. L. Price, and D. Reich, Population structure and eigenanalysis, PLoS Genet., 2 (2006), p. e190. 97 [93] L. Pozzi, J. A. Hodgson, A. S. Burrell, K. N. Sterner, R. L. Raaum, and T. R. Disotell, Primate phylogenetic relationships and divergence dates inferred from complete mitochondrial genomes, Mol. Phylogenet. Evol., 75 (2014), pp. 165–183. [94] F. Prugnolle, A. Manica, M. Charpentier, J. F. Guégan, V. Guernier, F. Balloux, U. Institut, D. Recherche, M. D. Télédétection, and J. F. Breton, Pathogen-driven selection and worldwide HLA class I diversity, Curr. Biol., 15 (2005), pp. 1022–1027. [95] S. Purcell, B. Neale, K. Todd-Brown, L. Thomas, M. A. R. Ferreira, D. Bender, J. Maller, P. Sklar, P. I. W. de Bakker, M. J. Daly, and P. C. Sham, PLINK: A tool set for whole-genome association and population-based linkage analyses, Am. J. Hum. Genet., 81 (2007), pp. 559–575. [96] H. Quach, D. Wilson, G. Laval, E. Patin, J. Manry, J. Guibert, L. B. Barreiro, E. Nerrienet, E. Verschoor, A. Gessain, M. Przeworski, and L. Quintana-Murci, Different selective pressures shape the evolution of toll-like receptors in human and African great ape populations, Hum. Mol. Genet., 22 (2013), pp. 4829–4840. [97] A. R. Quinlan, BEDTools: The Swiss-Army tool for genome feature analysis, Curr. Protoc. Bioinformatics, 47 (2014), pp. 11.12.1–34. [98] J. C. Rassa and S. R. Ross, Viruses and toll-like receptors, Microbes Infect., 5 (2003), pp. 961–968. [99] D. J. Rogers, A. J. Wilson, S. I. Hay, and A. J. Graham, The global distribution of yellow fever and dengue, Adv. Parasitol., 62 (2006), pp. 181–220. [100] A. P. M. Romano, Z. G. A. Costa, D. G. Ramos, M. A. Andrade, V. d. S. Jayme, M. A. B. d. Almeida, K. C. Vettorello, M. Mascheretti, and B. Flannery, Yellow fever outbreaks in unvaccinated populations, Brazil, 2008-2009, PLoS Negl. Trop. Dis., 8 (2014), p. e2740. [101] R. Ronen, G. Tesler, A. Akbari, S. Zakov, N. A. Rosenberg, and V. Bafna, Predicting carriers of ongoing selective sweeps without knowledge of the favored allele, PLoS Genet., 11 (2015), p. e1005527. [102] L. Rosen, Experimental infection of new world monkeys with dengue and yellow fever viruses, Am. J. Trop. Med. Hyg., (1958), pp. 406–410. [103] M. T. Ross, D. V. Grafham, A. J. Coffey, S. Scherer, and et al., The DNA sequence of the human X chromosome, Nature, 434 (2005), pp. 325–337. [104] M. Rotival, J. Pothlichet, Y.-H. Eddie, M. Dannemann, N. Zidane, G. Laval, E. Patin, C. Harmant, M. Lopez, M. Deschamps, N. Naffakh, A. Coen, G. Lerouxroels, A. Boland, J. Kelso, M. L. Albert, and L. Quintana-murci, Genetic adap- tation and Neandertal admixture shaped the immune system of modern human populations, Cell, 167 (2016), pp. 643–656.e17. [105] J. Rozas, A. Ferrer-Mata, J. C. Sánchez-DelBarrio, S. Guirao-Rico, P. Librado, S. E. Ramos-Onsins, and A. Sánchez-Gracia, DnaSP 6: DNA sequence polymorphism analysis of large data sets, Mol. Biol. Evol., 34 (2017), pp. 3299–3302. 98 [106] T. C. Ruch, Diseases of laboratory primates, Philadelphia (& London): WB Saunders Co., 1959. [107] P. C. Sabeti, P. Varilly, B. Fry, J. Lohmueller, and et al., Genome-wide detection and characterization of positive selection in human populations, Nature, 449 (2007), pp. 913–918. [108] E. S. V. Sallis, V. L. R. S. de Barros, S. L. Garmatz, R. A. Fighera, and D. L. Graça, A case of yellow fever in a brown howler (Alouatta fusca) in southern Brazil, J. Vet. Diagn. Invest., 15 (2003), pp. 574–576. [109] J. Schad and C. C. Voigt, Adaptive evolution of virus-sensing toll-like receptor 8 in bats, Immunogenetics, 68 (2016), pp. 783–795. [110] N. Shen, Q. Fu, Y. Deng, X. Qian, J. Zhao, K. M. Kaufman, Y. L. Wu, C. Y. Yu, Y. Tang, J.-Y. Chen, W. Yang, M. Wong, A. Kawasaki, N. Tsuchiya, T. Sumida, Y. Kawaguchi, H. S. Howe, M. Y. Mok, S.-Y. Bang, F.-L. Liu, D.-M. Chang, Y. Takasaki, H. Hashimoto, J. B. Harley, J. M. Guthridge, J. M. Grossman, R. M. Cantor, Y. W. Song, S.-C. Bae, S. Chen, B. H. Hahn, Y. L. Lau, and B. P. Tsao, Sex-specific association of x-linked toll-like receptor 7 (TLR7) with male systemic lupus erythematosus, Proc. Natl. Acad. Sci. U. S. A., 107 (2010), pp. 15838–15843. [111] S. T. Sherry, M. H. Ward, M. Kholodov, J. Baker, L. Phan, E. M. Smigielski, and K. Sirotkin, dbSNP: The NCBI database of genetic variation, Nucleic Acids Res., 29 (2001), pp. 308–311. [112] M. D. Shriver, G. C. Kennedy, E. J. Parra, H. A. Lawson, V. Sonpar, J. Huang, J. M. Akey, and K. W. Jones, The genomic distribution of population substructure in four populations using 8,525 autosomal SNPs, Hum. Genomics, 1 (2004), pp. 274–286. [113] M. Sironi, R. Cagliani, D. Forni, and M. Clerici, Evolutionary insights into host–pathogen interactions from mammalian sequence data, Nat. Rev. Genet., 16 (2015), pp. 224–236. [114] C. Skevaki, M. Pararas, K. Kostelidou, A. Tsakris, and J. G. Routsias, Single nucleotide polymorphisms of toll-like receptors and susceptibility to infectious diseases, Clin. Exp. Immunol., 180 (2015), pp. 165–177. [115] P. Skoglund, S. Mallick, M. C. Bortolini, N. Chennagiri, T. Hünemeier, M. L. Petzl-Erler, F. M. Salzano, N. Patterson, and D. Reich, Genetic evidence for two founding populations of the Americas, Nature, 525 (2015), pp. 104–108. [116] J. R. Stark, F. Wiklund, H. Grönberg, F. Schumacher, J. A. Sinnott, M. J. Stampfer, L. A. Mucci, and P. Kraft, Toll-like receptor signaling pathway variants and prostate cancer mortality, Cancer Epidemiol. Biomarkers Prev., 18 (2009), pp. 1859– 1863. [117] E. R. Steinberg, M. Nieves, and M. D. Mudry, Multiple sex chromosome systems in howler monkeys (platyrrhini, Alouatta), Comp. Cytogenet., 8 (2014), pp. 43–69. [118] M. E. Steiper and M. Ruvolo, New world monkey phylogeny based on x-linked G6PD DNA sequences, Mol. Phylogenet. Evol., 27 (2003), pp. 121–130. 99 [119] J. Sullivan and P. Joyce, Model selection in phylogenetics, Annu. Rev. Ecol. Evol. Syst., 36 (2005), pp. 445–466. [120] D. L. Swofford, PAUP*. Phylogenetic analysis using parsimony (*and other methods), 2003. Version 4. Sinauer Associates, Sunderland, Massachusetts. [121] O. Takeuchi and S. Akira, Innate immunity to virus infection, Immunol. Rev., 227 (2009), pp. 75–86. [122] S. H. Tuboi, Z. G. A. Costa, P. F. da Costa Vasconcelos, and D. Hatch, Clinical and epidemiological characteristics of yellow fever in Brazil: Analysis of reported cases 19982002, Trans. R. Soc. Trop. Med. Hyg., 101 (2007), pp. 169–175. [123] G. A. Van der Auwera, M. O. Carneiro, C. Hartl, R. Poplin, G. Del Angel, A. Levy-Moonshine, T. Jordan, K. Shakir, D. Roazen, J. Thibault, E. Banks, K. V. Garimella, D. Altshuler, S. Gabriel, and M. A. DePristo, From FastQ data to high confidence variant calls: the genome analysis toolkit best practices pipeline, Curr. Protoc. Bioinformatics, 43 (2013), pp. 11.10.1–33. [124] P. F. Vasconcelos, Z. G. Costa, E. S. Travassos Da Rosa, E. Luna, S. G. Rodrigues, V. L. Barros, J. P. Dias, H. A. Monteiro, O. F. Oliva, H. B. Vasconcelos, R. C. Oliveira, M. R. Sousa, J. Barbosa Da Silva, A. C. Cruz, E. C. Martins, and J. F. Travassos Da Rosa, Epidemic of jungle yellow fever in Brazil, 2000: Implications of climatic alterations in disease spread, J. Med. Virol., 65 (2001), pp. 598–604. [125] P. F. C. Vasconcelos and T. P. Monath, Yellow fever remains a potential threat to public health, Vector Borne Zoonotic Dis., 16 (2016), pp. 566–567. [126] D. Vezzani and A. E. Carbajo, Aedes aegypti, Aedes albopictus, and dengue in Argentina: Current knowledge and future directions, Mem. Inst. Oswaldo Cruz, 103 (2008), pp. 66–74. [127] B. Vicoso and B. Charlesworth, Evolution on the X chromosome: Unusual patterns and processes, Nat. Rev. Genet., 7 (2006), pp. 645–653. [128] B. F. Voight, S. Kudaravalli, X. Wen, and J. K. Pritchard, A map of recent positive selection in the human genome, PLoS Biol., 4 (2006), pp. 0446–0458. [129] J. Wakeley, The variance of pairwise nucleotide differences in two populations with migration, Theor. Popul. Biol., 49 (1996), pp. 39–57. [130] C. H. Wang, H. L. Eng, K. H. Lin, C. H. Chang, C. A. Hsieh, Y. L. Lin, and T. M. Lin, TLR7 and TLR8 gene variations and susceptibility to hepatitis C virus infection, PLoS One, 6 (2011), p. e26235. [131] T. H. Webster, M. Couse, B. M. Grande, E. Karlins, T. N. Phung, P. A. Richmond, W. Whitford, and M. A. Wilson Sayres, Identifying, understanding, and correcting technical biases on the sex chromosomes in next-generation sequencing data, bioRxiv, (2018), p. 346940. [132] T. H. Webster and M. A. Wilson Sayres, Genomic signatures of sex-biased demography: Progress and prospects, Curr. Opin. Genet. Dev., 41 (2016), pp. 62–71. 100 [133] T. Wei, J. Gong, F. Jamitzky, W. M. Heckl, R. W. Stark, and S. C. Rössle, Homology modeling of human toll-like receptors TLR7, 8, and 9 ligand-binding domains, Protein Sci., 18 (2009), pp. 1684–1691. [134] D. Werling, O. C. Jann, V. Offord, E. J. Glass, and T. J. Coffey, Variation matters: TLR structure and species-specific pathogen recognition, Trends Immunol., 30 (2009), pp. 124–130. [135] A. L. Wise, L. Gyi, and T. A. Manolio, eXclusion: Toward integrating the X chromosome in genome-wide association analyses, Am. J. Hum. Genet., 92 (2013), pp. 643–647. [136] G. Wlasiuk and M. W. Nachman, Adaptation and constraint at toll-like receptors in primates, Mol. Biol. Evol., 27 (2010), pp. 2172–2186. [137] World Health Organization, Yellow fever, (2019). Accessed: 2019-1-9 from https://www.who.int/csr/don/archive/disease/yellow fever/en/. [138] W. Xiao, Z. Liu, J. Lin, J. Li, K. Wu, Y. Ma, C. Xiong, Y. Gong, and Z. Liu, Association of toll-like receptor 7 and 8 gene polymorphisms with Graves’ disease in Chinese Cantonese population, Tissue Antigens, 85 (2015), pp. 29–34. [139] X. Yi, Y. Liang, E. Huerta-Sanchez, X. Jin, Z. X. P. Cuo, J. E. Pool, X. Xu, H. Jiang, N. Vinckenbosch, T. S. Korneliussen, H. Zheng, T. Liu, W. He, K. Li, R. Luo, X. Nie, H. Wu, M. Zhao, H. Cao, J. Zou, Y. Shan, S. Li, Q. Yang, Asan, P. Ni, G. Tian, J. Xu, X. Liu, T. Jiang, R. Wu, G. Zhou, M. Tang, J. Qin, T. Wang, S. Feng, G. Li, Huasang, J. Luosang, W. Wang, F. Chen, Y. Wang, X. Zheng, Z. Li, Z. Bianba, G. Yang, X. Wang, S. Tang, G. Gao, Y. Chen, Z. Luo, L. Gusang, Z. Cao, Q. Zhang, W. Ouyang, X. Ren, H. Liang, H. Zheng, Y. Huang, J. Li, L. Bolund, K. Kristiansen, Y. Li, Y. Zhang, X. Zhang, R. Li, S. Li, H. Yang, R. Nielsen, J. Wang, and J. Wang, Sequencing of 50 human exomes reveals adaptation to high altitude, Science, 329 (2010), pp. 75–78. [140] P. M. d. A. Zanotto, E. A. Gould, G. F. Gao, P. H. Harvey, and E. C. Holmes, Population dynamics of flaviviruses revealed by molecular phylogenies, Proc. Natl. Acad. Sci., 93 (1996), pp. 548–553. [141] X. Zhou, X. Meng, Z. Liu, J. Chang, B. Wang, M. Li, P. O.-T. Wengel, S. Tian, C. Wen, Z. Wang, P. A. Garber, H. Pan, X. Ye, Z. Xiang, M. W. Bruford, S. V. Edwards, Y. Cao, S. Yu, L. Gao, Z. Cao, G. Liu, B. Ren, F. Shi, Z. Peterfi, D. Li, B. Li, Z. Jiang, J. Li, V. N. Gladyshev, R. Li, and M. Li, Population genomics reveals low genetic diversity and adaptation to hypoxia in snub-nosed monkeys, Mol. Biol. Evol., 33 (2016), pp. 2670–2681. |
| Reference URL | https://collections.lib.utah.edu/ark:/87278/s6zq07f5 |



