| Title | The molecular basis of hybrid incompatibility and detecting recurrent positive selection in genomic data |
| Publication Type | dissertation |
| School or College | College of Science |
| Department | Biological Sciences |
| Author | Cooper, Jacob Carter |
| Date | 2019 |
| Description | All organisms must adapt to their environment to prosper. The measure of this in all organisms is fitness, the capacity to transmit their genetic information to the next generation. In some cases, the fitness of two organisms can be at odds with one another. This may happen both between two species and within a single species. When this occurs, the conflict that ensues drives some of the most rapid changes in all of evolution. These changes leave signals in the genome that are the consequence of their tumultuous past and point directly to the specific innovations that were chosen through the course of evolution to improve the fitness of a population. By studying these evolutionary conflicts, the specific details of these changes not only illuminate the course of evolutionary history but also further the understanding of the mechanisms of the genes under selection. Here, I will cover my work on evolutionary conflicts in two main areas. The first focuses on the conflict of the genomes of two closely related species, and their inability to hybridize. I investigate the genetics and molecular biology of this hybrid incompatibility to understand how two perfectly fit parents can fail to produce hybrid offspring. The second is centered on detecting a hallmark of evolutionary conflicts, recurrent positive selection, at the genomic scale. In these chapters I show how sperm channels from distant taxa have experienced similar selective pressures, indicating that similar evolutionary strategies are common over a wide range of conditions. I also conduct a genome-wide scan for recurrent positive selection in six clades of mammals, and present results that show that recurrent positive selection can target the same molecular interface over long stretches of evolutionary history. Together, this work provides two comprehensive examples of the impact that evolutionary conflict has on shaping the living world. |
| Type | Text |
| Publisher | University of Utah |
| Dissertation Name | Doctor of Philosophy |
| Language | eng |
| Rights Management | © Jacob Carter Cooper |
| Format | application/pdf |
| Format Medium | application/pdf |
| ARK | ark:/87278/s60062hb |
| Setname | ir_etd |
| ID | 1709791 |
| OCR Text | THE MOLECULAR BASIS OF HYBRID INCOMPATIBILITY AND DETECTING RECURRENT POSITIVE SELECTION IN GENOMIC DATA
by Jacob Carter Cooper
A dissertation submitted to the faculty of The University of Utah in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Biology
School of Biological Sciences, The University of Utah, August 2019
Copyright © Jacob Carter Cooper 2019. All Rights Reserved.
The University of Utah Graduate School
STATEMENT OF DISSERTATION APPROVAL
The dissertation of Jacob Carter Cooper has been approved by the following supervisory committee members: Nitin Phadnis, Chair (approved May 22, 2019); Michael D. Shapiro, Member (approved May 22, 2019); Nels C. Elde, Member (approved May 22, 2019); Kent G. Golic, Member (approved May 22, 2019); Mark M. Metzstein, Member (approved May 22, 2019); and by M. Denise Dearing, Chair/Dean of the Department/College/School of Biological Sciences, and by David B. Kieda, Dean of The Graduate School.
ABSTRACT
All organisms must adapt to their environment to prosper. The measure of this in all organisms is fitness, the capacity to transmit their genetic information to the next generation. In some cases, the fitness of two organisms can be at odds with one another. This may happen both between two species and within a single species. When this occurs, the conflict that ensues drives some of the most rapid changes in all of evolution. These changes leave signals in the genome that are the consequence of their tumultuous past and point directly to the specific innovations that were chosen through the course of evolution to improve the fitness of a population. By studying these evolutionary conflicts, the specific details of these changes not only illuminate the course of evolutionary history but also further the understanding of the mechanisms of the genes under selection. Here, I will cover my work on evolutionary conflicts in two main areas.
The first focuses on the conflict of the genomes of two closely related species, and their inability to hybridize. I investigate the genetics and molecular biology of this hybrid incompatibility to understand how two perfectly fit parents can fail to produce hybrid offspring. The second is centered on detecting a hallmark of evolutionary conflicts, recurrent positive selection, at the genomic scale. In these chapters I show how sperm channels from distant taxa have experienced similar selective pressures, indicating that similar evolutionary strategies are common over a wide range of conditions. I also conduct a genome-wide scan for recurrent positive selection in six clades of mammals, and present results that show that recurrent positive selection can target the same molecular interface over long stretches of evolutionary history. Together, this work provides two comprehensive examples of the impact that evolutionary conflict has on shaping the living world.
TABLE OF CONTENTS
ABSTRACT
LIST OF FIGURES
Chapters
1. INTRODUCTION: Molecular Basis of Hybrid Incompatibility; Detecting Recurrent Positive Selection in Genomic Data; References
2. AN ESSENTIAL CELL CYCLE REGULATION GENE CAUSES HYBRID INVIABILITY IN DROSOPHILA: References and Notes
3. ALTERED LOCALIZATION OF HYBRID INCOMPATIBILITY PROTEINS IN DROSOPHILA: Introduction; Results; Discussion; Materials and Methods; Acknowledgements; References
4. A TRIPLE-HYBRID CROSS REVEALS A NEW HYBRID INCOMPATIBILITY LOCUS BETWEEN D. MELANOGASTER AND D. SECHELLIA: Abstract; Author Summary; Introduction; Results; Discussion; Materials and Methods; Acknowledgements; References
5. PARALLEL EVOLUTION OF SPERM HYPER-ACTIVATION CA2+ CHANNELS: Abstract; Introduction; Materials and Methods; Results; Discussion; Supplementary Material; Acknowledgements; Literature Cited
6. RECURRENT RECURRENT POSITIVE SELECTION IN MAMMALIAN PHYLOGENIES: Abstract; Introduction; Results; Discussion; Materials and Methods; References
LIST OF FIGURES
Figures
2.1 A genomics screen identifies gfzfsim as a hybrid inviability gene.
2.2 Knockdown of gfzfsim rescues cell proliferation defects and restores hybrid male viability.
3.1 Three genes are required for F1 male lethality.
3.2 GFZF localization is different at chromosome ends in D. melanogaster and D. simulans.
3.3 GFZF does not have species specific binding patterns in polytenes.
3.4 Aberrant co-localization of GFZF and HMR in F1 hybrids requires GFZF sim.
3.5 Over-expression of Hmr and Lhr in D. melanogaster causes HMR binding to GFZF chromatin sites.
4.1 Hybrid incompatibilities between D. melanogaster and the D. simulans clade.
4.2 gfzf knockdown rescues hybrids with D. simulans and D. mauritiana but not D. sechellia.
4.3 RNAi machinery functions in D. melanogaster - D. simulans clade hybrids.
4.4 gfzfsib RNAi construct reduces the expression of gfzfsib in all three species
4.5 D. sechellia is resistant to hybrid male rescue by gfzfsib RNAi due to two loci
5.1 The entire CatSper complex evolves adaptively in primates
5.2 pkd2 evolves adaptively in Drosophila
5.3 A predicted structural model of Drosophila pkd2 shows that nonsynonymous changes between D. melanogaster and D. simulans reside on the extracellular face
5.4 Both intermale sperm competition and female choice can drive the rapid evolution of the sperm hyper-activation channels
6.1 Phylogeny of six mammalian clades used in this study
6.2 Corsair program architecture
6.3 Summary statistics for Corsair run over six clades of mammals
6.4 Pairwise clade analysis
6.5 PIGR sites under selection
CHAPTER 1
INTRODUCTION
Evolutionary conflict occurs when genomes with competing goals come into contact with each other. Competing interests between two different entities result in some of the most evolutionarily dynamic properties of living things, as selection acts decisively to favor winning genotypes and thus sets the new standard for all other genotypes to match. In the study of evolutionary biology, evolutionary conflicts often offer the best handle to understand general evolutionary phenomena because processes governed by evolutionary conflict have a high degree of difference between closely related species. Here, I will present my work on two different facets of evolutionary conflicts: the molecular basis of hybrid incompatibility in Drosophila, and detecting recurrent positive selection in genomic data in mammals.
In this chapter I will provide an introduction to both topics, focusing on general knowledge not covered in the introduction of each specific chapter. In each section, I will comment on the place of my work in the history of the system. Finally, I will address open questions in the field and place my work in the context of these questions. The work presented brings together concepts from genetics, molecular biology, and evolutionary biology to try to understand better how evolutionary conflict shapes the living world.
Molecular Basis of Hybrid Incompatibility
Speciation is the process by which a single species splits into two new species. Under the biological species concept, separate species are populations which produce hybrids that are less fit than intraspecies progeny.1 Speciation is, therefore, the result of competing interests of the two parental genomes, which have evolved such that they now contain genes that make sub-optimal offspring when brought together. The genes that cause this phenomenon are hybrid incompatibility genes, as they are the genetic basis for the decreased fitness of the hybrids. When two populations remain isolated, they have the potential to become new species. As these populations diversify from one another, they will independently accumulate genetic diversity. If the genetic changes that rise to fixation in the first population are incompatible with the changes that rise to fixation in the second population, then hybrids between the two populations will suffer a decrease in fitness, a model known as the Bateson-Dobzhansky-Muller model of hybrid incompatibility.2 In well-isolated species, these hybrid incompatibilities take the form of first generation (F1) hybrid sterility or hybrid inviability. Though hybrid incompatibilities are the basis for segregation of nearly all species, there are still few examples of hybrid incompatibility genes and no explanation of the molecular mechanism by which they cause sterility or lethality.
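The logic of the Bateson-Dobzhansky-Muller model described above can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not an analysis from this dissertation: the two loci, the allele names (a/A and b/B), and the rule that the derived alleles A and B are dominantly incompatible are all hypothetical.

```python
# Toy two-locus Bateson-Dobzhansky-Muller sketch (all names are hypothetical).
# Ancestor: a/a ; b/b. Population 1 fixes derived allele "A" at locus1,
# population 2 fixes derived allele "B" at locus2. Selection never tested
# "A" and "B" together, so we ASSUME they are incompatible when combined.

def f1_genotype(parent1, parent2):
    """Build the F1 genotype from two homozygous parents (one allele each)."""
    return {locus: (parent1[locus][0], parent2[locus][0]) for locus in parent1}

def is_incompatible(genotype):
    """Assumed dominant incompatibility: any derived A with any derived B."""
    return "A" in genotype["locus1"] and "B" in genotype["locus2"]

ancestor = {"locus1": ("a", "a"), "locus2": ("b", "b")}
pop1 = {"locus1": ("A", "A"), "locus2": ("b", "b")}  # fixed A on a b background
pop2 = {"locus1": ("a", "a"), "locus2": ("B", "B")}  # fixed B on an a background

hybrid = f1_genotype(pop1, pop2)

for name, genotype in [("ancestor", ancestor), ("pop1", pop1),
                       ("pop2", pop2), ("F1 hybrid", hybrid)]:
    print(name, genotype, "incompatible:", is_incompatible(genotype))
```

The point of the sketch is that neither lineage passes through an unfit genotype on its way to fixation; only the hybrid, which carries both derived alleles for the first time, is affected.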
Drosophila has been at the center of speciation genetics for a century.3 Since the isolation of Drosophila melanogaster and Drosophila simulans in the early days of genetics,4 there has been a concerted effort in evolutionary genetics to understand what genes keep these two species isolated and the molecular mechanism of that isolation. The study of these two species has not only provided a handle for understanding the evolution of hybrid incompatibilities in this system but also informed the study of speciation in other species of Drosophila and in other clades of life.
The genetics of hybrid incompatibility in the D. melanogaster – D. simulans hybridization
D. melanogaster, the common genetic model organism, has a closely related sister species called D. simulans. These two species originated in sub-Saharan Africa, but have since spread over the globe as commensals to human populations.5,6 They diverged from each other 2-5 million years ago,5,7–10 and are nearly morphologically indistinguishable. They are completely isolated genetically by sterility and inviability of their hybrid F1 progeny. D. melanogaster females crossed to D. simulans males produce sterile hybrid F1 females and lethal hybrid F1 males; D. simulans females crossed to D. melanogaster males produce sterile hybrid F1 males and lethal hybrid F1 females.4,11 Though both directions of this incompatibility have been studied, here I will focus solely on the hybrid F1 male lethality. The first breakthrough in understanding the genetics of this system came from an X-ray mutagenesis screen using triploid females and decoupling of compound chromosomes to isolate the combination of chromosomes that is lethal to the hybrid F1 males in the D. melanogaster female by D. simulans male direction of the cross. The results of this screen demonstrated that the X chromosome of D. melanogaster had a dominantly acting lethal interaction with the 2nd and 3rd chromosomes from D.
simulans.12,13 That is, hybrid F1 male lethality is triggered when the D. melanogaster X (without the D. simulans X) and the D. simulans 2nd and 3rd chromosomes are brought together in the same individual. Subsequently, two hybrid F1 male rescue alleles were recovered from natural populations that explained two of these factors. Lethal hybrid rescue (Lhr) was recovered in D. simulans, maps to the 2nd chromosome, and rescues hybrid F1 males when crossed with D. melanogaster.14 Hybrid male rescue (Hmr) was recovered in D. melanogaster, maps to the X chromosome, and rescues hybrid F1 males when crossed with D. simulans.15 No natural rescue allele was recovered that explains the dominantly acting lethal locus on the 3rd chromosome. These rescue alleles eventually enabled the mapping of the first hybrid incompatibility genes in this system. Hmr mapped to a MADF-domain-containing gene which is rapidly evolving between D. melanogaster and D. simulans.16 Lhr was mapped to Heterochromatin Protein 3 and was also found to be rapidly evolving between D. melanogaster and D. simulans.17 Interestingly, Hmr has signatures of recurrent rapid evolution in the D. melanogaster group of species, indicating that it has been evolutionarily flexible for some time.18 Both Hmr and Lhr are dipteran-specific genes, meaning that the molecular processes that govern the D. melanogaster – D. simulans hybrid F1 male lethality are likely fly-specific. Nonetheless, at the time they were mapped, these represented some of the only known hybrid incompatibility genes in any pair of species. Outside of the dominantly acting rescue alleles, two major efforts have been made to expand the understanding of the genetic architecture of this hybrid incompatibility. The first made use of the Lhr mutation and the tool kit of chromosomal deletions in D. melanogaster to identify recessive factors from D.
simulans that contributed to hybrid F1 male lethality.19 In this scheme, males that were rescued by the Lhr mutation were tested for rescue over a deletion from D. melanogaster, thus exposing any D. simulans alleles that may have recessive effects on hybrid F1 male viability. This study identified more than 40 regions of the D. simulans genome that have a complete or partial effect on F1 male viability. In a second attempt to reveal additional hybrid incompatibility genes, a similar series of chromosomal deletions from D. melanogaster were crossed to D. simulans to see if any would rescue hybrid F1 males in the absence of any other rescue alleles.20 This effort did not identify any additional hybrid rescue alleles. The conclusion from both of these works is that while Hmr represents the only dominantly acting hybrid male rescue allele in D. melanogaster, there are many additional recessively acting alleles that could interact to cause hybrid male lethality. Taken together, these works say something crucial about this system of hybrid incompatibility: studying it can further the understanding of hybrid inviability, but it is certainly not the primary barrier to the intermixing of these two species. Given the extensive sterility observed in the presence of hybrid rescue alleles (rescued hybrid F1 males are sterile, completely lacking testes), this system represents a late stage in the process of speciation. While the potential to rescue hybrid F1 male viability opens the possibility of understanding very strongly isolating barriers, it is not amenable to understanding the early events of speciation. Rather, the D. melanogaster – D. simulans hybrid F1 male lethality represents the end product generated over many generations of divergence.
The molecular biology of hybrid incompatibility in the D. melanogaster – D.
simulans hybridization
The promise of studying the molecular genetics of hybrid incompatibilities is not just to identify the genes involved, but to use the fundamental actions of those genes to illuminate the essential intrinsic properties that diverge between two species to cause hybrid incompatibilities. As mentioned before, the conflict in interspecies hybrids represents incompatibilities in the normal development of the two parental species, even when those species are closely related. In the specific case of hybrid F1 male lethality between D. melanogaster and D. simulans, the developmental defect occurs when hybrid F1 males fail to grow past the 2nd larval instar stage, and arrest as late L2 larvae or small early L3 larvae.21–23 This is a critical juncture in larval development; during the L3 growth stage, larvae expand their body size many times over and build up the energy and resources required for pupation.24 This period is marked by an increase in cell division of diploid tissues that will go on to form the adult fly. Hybrid F1 males exhibit a stall in this cell division, with males displaying far fewer mitotic figures than are common for this stage.22,23 In studying the molecular properties of hybrid incompatibility genes in this system, one of the enduring questions has been to uncover the primary defect that leads to this stall in cell division and ultimately in the growth of the larva. After the identification of Hmr and Lhr, a primary focus of the field became describing what these genes do at the cellular and molecular level to cause hybrid incompatibility.
Both Hmr and Lhr produce nuclear-localized proteins (HMR and LHR).17,25 HMR and LHR physically interact with each other in the cell and form an interaction with Heterochromatin Protein 1.17,25 In yeast two-hybrid assays, HMR and LHR suppress gene expression,25 and in homozygous loss-of-function mutants for either gene, there is a large increase in the expression of many families of transposable elements as observed by RNA-seq.25,26 Therefore, the broadest useful characterization of HMR and LHR is that they are chromatin-associated transcriptional repressors. The exact nature of the localization of HMR and LHR has been a matter of some debate. In polytene chromosomes, HMR and LHR localize to the chromocenter (centromere) and a few discrete locations through the rest of the genome 25 (also, see Chapter 3). In the nurse cells of ovaries, HMR and LHR are found in dense pockets of heterochromatin, further supporting that they play a role in transcriptional repression.26 However, in mitotically active neuroblasts there is no indication that HMR or LHR localize to the centromere.27 Additionally, both in S2 cell culture and in mitotic cells, loss of Hmr and Lhr causes a lag in separation during anaphase in cell division.25,27 These observations suggest that HMR and LHR either have two distinct molecular roles or that their heterochromatin-associated function is central to more than one process (i.e., transcriptional repression and cell division). It is not yet clear which of these effects are direct or indirect, or which might be more important to the phenotype of hybrid F1 male lethality. Given the properties of HMR and LHR, one proposed mechanism for hybrid F1 male lethality is the broad-scale mis-regulation of transposable elements, leading to a scenario resembling mutational meltdown.25,26 Under this model, greatly elevated rates of mutation kill hybrid F1 males. While a tempting explanation, it is likely incorrect because of two empirical observations.
First, there does not appear to be a significant difference between the increase in transposable element expression in hybrid F1 males and hybrid F1 females,25,26 meaning it is unclear why males would be in peril while females are not. Second, neither sex of F1 hybrids appears to have an increased mutational load, as measured by quantifying the frequency of clonal patches that uncover heterozygous recessive visible markers.22 Therefore, if the transcriptional regulatory functions of HMR and LHR play a key role in the hybrid, it is likely more complex than a loss of control of the transposable elements in the genome.
Looking forward and the structure of this dissertation
The largest remaining objective in this field is to connect the molecular function of HMR and LHR to the phenotype of hybrid F1 male lethality in larval development. As the current understanding of these hybrid incompatibility genes does not lend an immediate explanation, there are two useful lines of inquiry. First, Hmr and Lhr likely do not act in the absence of any other variation to cause hybrid F1 male lethality. Finding additional hybrid incompatibility genes or genes that modify the action of dominantly acting hybrid incompatibility genes may be useful for further studying the phenotype. Second, additional characterization of the molecular properties of these hybrid incompatibility genes may open new lines of investigation. A better understanding of the processes that these genes participate in may allow for much finer, testable hypotheses as to their mechanism of killing hybrid F1 males. Here, I present three chapters that focus on this question. In Chapter 2, I present published work on identifying a new hybrid incompatibility gene. In Chapter 3, I present published work that focuses on understanding the interaction between this new gene and Hmr.
In Chapter 4, I present a genetic mapping approach that uses interspecies variation to identify a major modifier of hybrid F1 male lethality. Together, this work furthers our understanding of how the conflict between the genomes of these two species causes hybrid F1 male lethality.
Detecting Recurrent Positive Selection in Genomic Data
While hybrid incompatibilities represent a genetic conflict between two closely related species, there are many other types of genetic conflict that shape evolution. Adaptation to the environment, interactions between a pathogen and its host, and sexual competition are all examples of pressures that cause genes to change rapidly between different species. Positive selection, meaning selection on a gene that improves the host fitness, is an important concept to study because it gives insight into which evolutionary forces have been important for shaping the evolution of a species. Recurrent positive selection is the case where the same gene experiences positive selection in several closely related species. In essence, it occurs in circumstances where changing to a new strategy is often advantageous for fitness. Recurrent positive selection can be the product of adaptation to a moving target. This target might be a quickly fluctuating environment, but is often the result of inter-genomic conflict and a counter-adapting genome. This back and forth battle between two genomes that plays out at the level of protein-coding changes is referred to as a molecular arms race.28 Additionally, studying how recurrent positive selection alters the function of genes can be informative for understanding the molecular and cellular properties of genes without conducting a lengthy genetic screen. Molecular arms races play out at the interfaces of two proteins, usually following a pattern of adaptation for either recognition of a target or evasion of an inhibitor.
For example, this might take the form of host immune proteins changing to match the capsid of a virus 29 or the seminal peptides of a male adapting to induce a female's mating response.30 In both these examples, the key insight to unraveling the underlying biological process was to use natural variation as a tool to understand how genes might interact with each other or their environment. Recurrent positive selection is a useful concept in this sense because it often creates some of the largest differences in natural variation between species. In the following chapters, I will cover my work on detecting recurrent positive selection in two separate systems. In Chapter 5, I present my published work that brings together reproductive biology in flies and in primates. In this instance, an analysis of the calcium ion channels that govern sperm motility reveals that they have very similar patterns of positive selection, despite being nonorthologous genes. In Chapter 6, I present my work to scan for recurrent positive selection at the genome-wide scale in six different clades of mammals. This work provides many new examples of recurrent positive selection and highlights a molecular interface that has likely been the target of a host-pathogen molecular arms race for more than 100 million years. Together, this work highlights how studying recurrent positive selection can make sense of intergenomic conflicts and the signatures they leave in the genome.
References
1. Mayr, E. The Growth of Biological Thought: Diversity, Evolution, and Inheritance. (Harvard University Press, 1982).
2. Orr, H. A. Dobzhansky, Bateson, and the Genetics of Speciation. Genetics 144, 1331–1335 (1996).
3. Barbash, D. A. Ninety Years of Drosophila melanogaster Hybrids. Genetics 186, 1–8 (2010).
4. Sturtevant, A. H. A New Species Closely Resembling Drosophila melanogaster. Psyche J. Entomol. 26, 153–155 (1919).
5. Lachaise, D. et al.
Historical Biogeography of the Drosophila melanogaster Species Subgroup in Evolutionary Biology (eds. Hecht, M. K., Wallace, B. & Prance, G. T.) 159–225 (Springer US, 1988). doi:10.1007/978-1-4613-0931-4_4
6. Pool, J. E. et al. Population Genomics of Sub-Saharan Drosophila melanogaster: African Diversity and Non-African Admixture. PLOS Genet. 8, e1003080 (2012).
7. Ballard, J. W. O. Sequential Evolution of a Symbiont Inferred From the Host: Wolbachia and Drosophila simulans. Mol. Biol. Evol. 21, 428–442 (2004).
8. Dean, M. D. & Ballard, J. W. O. Linking Phylogenetics with Population Genetics to Reconstruct the Geographic Origin of a Species. Mol. Phylogenet. Evol. 32, 998–1009 (2004).
9. Baudry, E., Derome, N., Huet, M. & Veuille, M. Contrasted Polymorphism Patterns in a Large Sample of Populations From the Evolutionary Genetics Model Drosophila simulans. Genetics 173, 759–767 (2006).
10. Kopp, A., Frank, A. & Fu, J. Historical Biogeography of Drosophila simulans Based on Y-chromosomal Sequences. Mol. Phylogenet. Evol. 38, 355–362 (2006).
11. Quackenbush, L. S. Unisexual Broods of Drosophila. Science 32, 183–185 (1910).
12. Muller, H. J. & Pontecorvo, G. Recombinants between Drosophila Species the F1 Hybrids of which are Sterile. (1940). Available at: http://www.nature.com/nature/journal/v146/n3693/abs/146199b0.html. (Accessed: 10th March 2016)
13. Pontecorvo, G. Viability Interactions Between Chromosomes of Drosophila melanogaster and Drosophila simulans. J. Genet. 45, 51–66 (1943).
14. Watanabe, T. K. A Gene that Rescues the Lethal Hybrids Between Drosophila melanogaster and D. simulans. Jpn. J. Genet. 54, 325–331 (1979).
15. Hutter, P. & Ashburner, M. Genetic Rescue of Inviable Hybrids Between Drosophila melanogaster and its Sibling Species. Nature 327, 331–333 (1987).
16. Barbash, D. A., Siino, D. F., Tarone, A. M. & Roote, J. A Rapidly Evolving MYB-related Protein Causes Species Isolation in Drosophila. Proc. Natl. Acad. Sci. 100, 5302–5307 (2003).
17.
Brideau, N. J. et al. Two Dobzhansky-Muller Genes Interact to Cause Hybrid Lethality in Drosophila. Science 314, 1292–1295 (2006).
18. Maheshwari, S., Wang, J. & Barbash, D. A. Recurrent Positive Selection of the Drosophila Hybrid Incompatibility Gene Hmr. Mol. Biol. Evol. 25, 2421–2430 (2008).
19. Presgraves, D. C. A Fine-Scale Genetic Analysis of Hybrid Incompatibilities in Drosophila. Genetics 163, 955–972 (2003).
20. Cuykendall, T. N. et al. A Screen for F1 Hybrid Male Rescue Reveals No Major-Effect Hybrid Lethality Loci in the Drosophila melanogaster Autosomal Genome. G3 Genes Genomes Genetics 4, 2451–2460 (2014).
21. Sánchez, L. & Dübendorfer, A. Development of Imaginal Discs from Lethal Hybrids Between Drosophila melanogaster and Drosophila mauritiana. Wilhelm Roux's Arch. Dev. Biol. 192, 48–50 (1983).
22. Orr, A., Madden, L. D., Coyne, J. A., Goodwin, R. & Hawley, R. S. The Developmental Genetics of Hybrid Inviability: A Mitotic Defect in Drosophila Hybrids. Genetics 1031–1040 (1997).
23. Bolkan, B. J., Booker, R., Goldberg, M. L. & Barbash, D. A. Developmental and Cell Cycle Progression Defects in Drosophila Hybrid Males. Genetics 177, 2233–2241 (2007).
24. Mirth, C. K., Truman, J. W. & Riddiford, L. M. The Ecdysone Receptor Controls the Post-Critical Weight Switch to Nutrition-Independent Differentiation in Drosophila Wing Imaginal Discs. Development 136, 2345–2353 (2009).
25. Thomae, A. W. et al. A Pair of Centromeric Proteins Mediates Reproductive Isolation in Drosophila Species. Dev. Cell 27, 412–424 (2013).
26. Satyaki, P. R. V. et al. The Hmr and Lhr Hybrid Incompatibility Genes Suppress a Broad Range of Heterochromatic Repeats. PLoS Genet. 10, e1004240 (2014).
27. Blum, J. A. et al. The Hybrid Incompatibility Genes Lhr and Hmr Are Required for Sister Chromatid Detachment During Anaphase but Not for Centromere Function. Genetics 207, 1457–1472 (2017).
28. Van Valen, L. A New Evolutionary Law. Evol. Theory 1, 1–30 (1973).
29. Sawyer, S. L., Wu, L.
I., Emerman, M. & Malik, H. S. Positive Selection of Primate TRIM5α Identifies a Critical Species-Specific Retroviral Restriction Domain. Proc. Natl. Acad. Sci. U. S. A. 102, 2832–2837 (2005).
30. Clark, N. L., Alani, E. & Aquadro, C. F. Evolutionary Rate Covariation Reveals Shared Functionality and Coexpression of Genes. Genome Res. 22, 714–720 (2012).

CHAPTER 2

AN ESSENTIAL CELL CYCLE REGULATION GENE CAUSES HYBRID INVIABILITY IN DROSOPHILA

Reprinted with permission from An Essential Cell Cycle Regulation Gene Causes Hybrid Inviability In Drosophila. Science, 350(6267), Nitin Phadnis, Emily ClaireBaker, Jacob C Cooper, Kimberly Frizzel, Emily Hsieh, Aida Flor A. de la Cruz, Jay Shendure, Jacob O. Kitzman, and Harmit S. Malik, 1552–1555. AAAS, Copyright 2015.

CHAPTER 3

ALTERED LOCALIZATION OF HYBRID INCOMPATIBILITY PROTEINS IN DROSOPHILA

Reprinted with permission from Molecular Biology and Evolution, Oxford University Press. Jacob C. Cooper, Andrea Lukacs, Shelley Reich, Tamas Schauer, Axel Imhof, and Nitin Phadnis, Copyright 2019.

CHAPTER 4

A TRIPLE-HYBRID CROSS REVEALS A NEW HYBRID INCOMPATIBILITY LOCUS BETWEEN D. MELANOGASTER AND D. SECHELLIA

Abstract

Hybrid incompatibilities are the result of negative interactions between divergent genes of two species. In Drosophila, hybrid F1 males from crosses between females from D. melanogaster and males from the D. simulans clade (D. simulans, D. mauritiana, D. sechellia) fail to develop past the larval stage. When attempting to rescue hybrid F1 males by depleting the incompatible allele of gfzf, a previously identified hybrid incompatibility gene, we observed robust rescue in crosses of D. melanogaster to D. simulans or D. mauritiana, but no rescue in crosses to D. sechellia. We leveraged this variation to investigate the genetic basis of D. sechellia resistance to hybrid male rescue by designing a triple-hybrid cross to generate recombinant D. sechellia / D.
simulans genotypes. We tested the ability of these recombinant genotypes to rescue hybrid males with D. melanogaster, and used whole genome sequencing to measure the D. sechellia / D. simulans allele frequency of viable F1 males. We found that recombinant genotypes rescued hybrid males when they received two specific loci from their D. simulans grandparent: the first region contains the previously identified Lethal hybrid rescue (Lhr), and the second is a region of chromosome 3L previously unknown to affect hybrid male rescue. Our results show that the genetic basis for the recent evolution of this hybrid incompatibility is simple rather than highly dispersed. Further, these data suggest that fixation of differences at Lhr after the split of the D. simulans clade strengthened the hybrid incompatibility between D. sechellia and D. melanogaster.

Author Summary

Hybrid incompatibility genes keep species reproductively isolated from each other. They are the end product of one species splitting into two new lineages, and are therefore central to the formation and maintenance of new species. Since hybrid incompatibilities are fixed between species, there is often little variation within species that can be leveraged to understand how hybrid incompatibility genes change over time or the networks of genes that they interact with. We find that two closely related fly species, D. simulans and D. sechellia, have a large difference in the effect of a hybrid male rescue when crossed with D. melanogaster. By generating triple hybrids with different mixtures of D. simulans and D. sechellia genomes, we use this variation in hybrid male rescue to uncover the genetic architecture that led to a stronger hybrid incompatibility. The stronger hybrid incompatibility is due to two major effect loci: one that maps to a previously known hybrid incompatibility gene, and one that maps to a new locus.
These findings point to few changes of large effect as a method of altering hybrid incompatibilities, and highlight the power of using the interspecies variation of multiple species to map interacting genes.

Introduction

The evolution of reproductive isolation barriers such as hybrid inviability is a key step in the origins of new species.1 Intrinsic reproductive barriers such as hybrid inviability are caused by hybrid incompatibilities, which are deleterious genetic interactions between the genomes of parental species. Understanding the nature of hybrid incompatibilities can provide insights into how genomes evolve such that two wild type parental genomes interact to produce dysfunctional hybrids. Despite the central role of this problem in evolutionary biology, we still understand very little about the genetic architecture and the genes that underlie hybrid incompatibilities. Our best understanding of the genetic architecture and the genes that underlie hybrid incompatibilities comes from studying crosses between the model genetic system Drosophila melanogaster and its closest sister species D. simulans. Crosses between D. melanogaster females and D. simulans males produce lethal hybrid F1 males. Research into the genetic basis of this hybrid F1 male lethality has a rich, 100-year history that highlights creative and often surprising genetic approaches.2 These diverse approaches span a combination of classical genetic tools, X-rays, chemical mutagenesis, and the isolation of natural rescue alleles. The approaches have so far identified three hybrid incompatibility genes required for hybrid F1 male lethality between D. melanogaster and D. simulans: Hybrid male rescue (Hmr), Lethal hybrid rescue (Lhr), and GST-containing FLYWCH Zinc-Finger protein (gfzf).3–8 In hybrid F1 males, only one allele of each of these three genes is incompatible: Hmr mel, Lhr sim, and gfzf sim (Fig 4.1 A).
Loss of any single incompatible allele is sufficient to rescue the viability of hybrid F1 males. Hmr and Lhr are heterochromatin-associated transcriptional repressors that physically bind to each other and suppress the expression levels of transcripts from many transposable elements and repetitive DNA sequences.9,10

Figure 4.1. Hybrid incompatibilities between D. melanogaster and the D. simulans clade. (A) Schematic of the hybrid incompatibility genes between D. melanogaster and D. simulans (panel label: Hmr mel + Lhr sim + gfzf sim = dead). (B) Cladogram for the D. melanogaster – D. simulans clade relationship (branch labels: ~2.5 mil years to D. melanogaster; ~240K years within the D. simulans clade).

gfzf is a general transcriptional co-activator for approximately 1,700 genes whose expression is controlled by TATA-less promoters.11 In addition, Hmr mislocalizes to many gfzf-bound chromatin sites across the genome in F1 hybrids.12 Together, these results indicate that both proteins can interact with many loci across the genome in pure species and in hybrids, suggesting that the genetic network of these hybrid incompatibility genes may involve many distributed interacting partners. Although the identities of these three hybrid incompatibility genes have now been established, a comprehensive explanation of how they contribute to hybrid lethality has remained elusive. Although much of the effort to understand the genetic architecture of hybrid F1 male lethality has focused on crosses between D. melanogaster and D. simulans, the Drosophila simulans clade contains three closely related species: D. simulans, D. mauritiana, and D. sechellia. These three species diverged from D. melanogaster around 3 million years ago,13–17 and from each other approximately 240 thousand years ago18–20 (Fig 4.1 B).
These three species are isolated by complete F1 male sterility in every direction of crossing, though introgression events between some species have been detected.21,22 Crosses between D. melanogaster and any of the three species of the D. simulans clade produce identical patterns of hybrid lethality in both directions of the crosses.23 Moreover, mutations in D. melanogaster Hmr are sufficient to rescue hybrid F1 male viability in crosses between D. melanogaster and any of the three D. simulans clade species.5 These results suggest that the genetic architecture of the hybrid incompatibility may be at least partially shared between all three hybridizations. The extent of Hmr-mediated hybrid male rescue, however, varies among crosses between D. melanogaster and the three D. simulans clade species. In particular, hybrid male rescue between D. melanogaster and D. sechellia occurs at a lower rate than in the other two hybridizations.5 However, the genetic basis for the lower rate of hybrid male rescue between D. melanogaster and D. sechellia remains unknown. Here, we show that this pattern of poor hybrid F1 male viability rescue with D. sechellia is also true for gfzf-mediated hybrid rescue. There are at least two potential genetic explanations for why D. sechellia hybrids with D. melanogaster have lower rates of hybrid rescue. First, additional hybrid incompatibility interactors may have evolved that are unique to D. sechellia and not shared with D. simulans and D. mauritiana. Second, additional changes in the D. sechellia alleles at the same hybrid incompatibility genes that are shared with D. simulans and D. mauritiana may explain poor hybrid rescue through a more penetrant hybrid incompatibility. If additional hybrid incompatibility interactors have evolved unique to D.
sechellia, it is unclear whether the decreased rate of hybrid male rescue is due to few large effect changes or due to many small changes distributed across the genome. Although work on hybrid incompatibility genes has so far revealed mostly large effect genes, the idea that the poor rescue of D. sechellia hybrids may be due to many distributed changes is not easily dismissed for the following reasons. First, Hmr and gfzf are known to interact with many loci across the genome, representing a highly distributed genetic interaction network. Second, several proposals for the mechanism of hybrid male lethality are consistent with a distributed genetic basis: acting to buffer chromatin or transposable element mediated effects,7 general buffers against lethality,24 or incremental effects on different phases of the cell cycle.25 Finally, the rescue effect of another hybrid rescue system, Maternal hybrid rescue (a component of the female embryonic lethality in crosses between D. melanogaster males and D. simulans females), appears to involve a distributed, multigenic basis.26 Under such a scenario, resolving the genetic architecture of such interactors is complicated. Here, we use a triple-hybrid cross to dissect the interspecies variation in hybrid rescue between D. melanogaster and the D. simulans clade. We use RNAi-mediated knockdown of the D. simulans sibling species allele of gfzf (gfzf sib) to measure the rate of hybrid male rescue between D. melanogaster and the three species of the D. simulans clade. We find that, in contrast to D. simulans and D. mauritiana, there is no rescue of hybrid males between D. melanogaster and D. sechellia. To dissect the genetic architecture underlying this lack of rescue with D. sechellia hybrids, we design a triple-hybrid cross to leverage the variation in rescue between D. simulans and D. sechellia. We find that the lack of male rescue in D. sechellia is due to two dominantly-acting major effect loci.
The first locus maps to chromosome 2R at the same genomic coordinates as Lhr. The second locus maps to a new location on chromosome 3L, separate from Hmr, Lhr and gfzf, which we name Sechellia aversion to hybrid rescue (Satyr). Our results suggest that the variation in gfzf-mediated rescue between D. simulans and D. sechellia is due to few large effect changes, and indicate that major components of the D. melanogaster – D. simulans clade hybrid incompatibility are yet to be identified.

Results

The closest species to D. melanogaster belong to the simulans clade, which includes D. simulans, D. mauritiana and D. sechellia. These three species are estimated to have diverged from their last common ancestor approximately 240,000 years ago, meaning they are relatively young species for Drosophila.20 The phylogenetic relationship between these three species remains an unresolved trichotomy.20 D. melanogaster females carrying null mutations at Hmr produce viable hybrid F1 males in crosses with males from any of the three D. simulans clade species, indicating a shared genetic basis of hybrid F1 male lethality between D. melanogaster and the simulans clade. The extent to which this pattern may be shared with Lhr- and gfzf-mediated rescue is unknown. Null mutants of Lhr have not been isolated in D. mauritiana and D. sechellia, but testing the effects of gfzf-mediated rescue across species is possible. gfzf sim is necessary for the lethality of hybrid F1 males in crosses between D. melanogaster females and D. simulans males. RNAi-induced knockdown of gfzf sim is sufficient to rescue the viability of hybrid F1 males.8 The RNAi target sequences for gfzf are shared across all sibling species of the simulans clade (gfzf sib), allowing us to use the transgenes in D. melanogaster to test whether knockdowns of the gfzf allele from D. mauritiana and D. sechellia are also sufficient to rescue hybrid male viability in the respective hybridizations.
To measure variation in the rate of gfzf sib knockdown mediated hybrid male rescue across the D. simulans clade, we crossed D. melanogaster females carrying the gfzf knockdown constructs to males from several lines each of D. simulans, D. mauritiana, and D. sechellia. We used two RNAi constructs, RNAi-1 and RNAi-2, that specifically target gfzf from the sister species at different regions of the gene, and not the D. melanogaster allele. We first sequenced both RNAi target sites from all of the D. simulans, D. mauritiana and D. sechellia lines used in this study. In D. simulans, we found no sequence variation at either RNAi target sequence. Similarly, the D. mauritiana strains were perfectly matched for both RNAi-1 and RNAi-2 knockdown constructs except for one line (w140, which carries a single nucleotide mismatch for both knockdown constructs). D. sechellia is fixed for the same single nucleotide mismatch for the RNAi-1 knockdown construct target sequence, but has perfect identity with the RNAi-2 construct target sequence (Fig 4.2 A).

Figure 4.2. gfzf knockdown rescues hybrids with D. simulans and D. mauritiana but not D. sechellia. (A) Rescue crosses for hybrids with both gfzf sib RNAi constructs. Summaries are presented for each species. (B) Alignment of RNAi targeting sites in all three D. simulans clade species. Below is an alignment with D. melanogaster, demonstrating the deletion that is fixed in the entire D. simulans clade.

We observed robust rescue in crosses between D. melanogaster females carrying either gfzf sib RNAi knockdown construct and multiple lines of D. simulans males (Fig 4.2 B) (mean 82.9% RNAi-1, 50.0% RNAi-2). We crossed these same D. melanogaster gfzf sib RNAi lines with males from D. mauritiana strains, and again observed robust rescue of hybrid F1 male viability at rates comparable to or slightly better than those observed with D. simulans (mean 90.5% RNAi-1, 63.0% RNAi-2). Only one D. mauritiana line (w140) recorded no rescue,
which is consistent with our observation of mismatches in this line for both RNAi knockdown target sequences. Our results from crosses with D. sechellia, however, were dramatically different. In contrast to our observations of robust hybrid male rescue with D. simulans and D. mauritiana, we observed no rescue with D. sechellia with either RNAi construct despite perfect matches with the RNAi-2 target sequence in all of the lines tested (mean 0.3% RNAi-1, 0.2% RNAi-2). Although we did not sequence rare survivor males from these crosses, these rare males are known to be the result of fertilization of nullo-X eggs from non-disjunction events in D. melanogaster by sperm carrying an X chromosome from the sister species (as observed in the RNAi OFF control crosses). Together, our results show that, unlike Hmr-based hybrid male rescue, RNAi targeting of the incompatible allele of gfzf is sufficient to rescue hybrid F1 male viability in crosses with D. simulans and D. mauritiana, but not with D. sechellia. The D. sechellia resistance to hybrid male rescue mediated by gfzf knockdown may be explained by a failure to reduce the expression of the gfzf sec allele in D. melanogaster-D. sechellia hybrids. Because short interfering RNA (siRNA) pathway genes such as Dicer-2, Ago-2, and R2D2 have diverged rapidly between D. melanogaster and the D. simulans clade, the siRNA pathway itself may not be functional in D. melanogaster-D. sechellia hybrids.27,28 To test whether the siRNA pathway is functional in hybrids between D. melanogaster and its sister species, we tested the efficacy of RNAi in hybrids using a knockdown construct that targets the X-linked white gene.29 When the white gene is knocked down in flies, the eye color changes from red to white. Incomplete knockdown of this gene manifests as an intermediate color, which can be quantified. We generated hybrids between D. melanogaster females carrying this knockdown construct and males from D.
simulans, D. mauritiana and D. sechellia and measured the intensity of eye pigmentation as a readout of RNAi efficacy. The reduction in eye pigmentation was not significantly different across all three hybrid genotypes (Fig 4.3). These results indicate that despite the rapid divergence of the genes involved in the siRNA pathway, this pathway remains functional in inter-species hybrids. However, since the w RNAi construct is driven by the GMR promoter in the eye, this experiment cannot rule out the possibility that the RNAi system is defunct in hybrids at a specific developmental timepoint where gfzf knockdown is required for rescue. To directly test whether the level of knockdown of gfzf sib is comparable across all three crosses, we performed the gfzf-knockdown hybrid rescue crosses with the three species and measured RNA expression levels of gfzf. We measured the levels of gfzf transcript from the parental species by RT-qPCR using primers that amplify only gfzf mel or gfzf sib in the hybrid females. We found that expression of the gfzf sib allele is reduced in all three inter-species hybrids, and that there is no significant difference in the magnitude of the reduction of gfzf sib expression among the three species (Fig 4.4). Together these results indicate that the lack of male rescue in D. sechellia hybrids is not due to a failure to knock down the expression of gfzf sec.

Figure 4.3. RNAi machinery functions in D. melanogaster – D. simulans clade hybrids. (A) Example eyes from the genotypes tested for eye color intensity. (B) Quantification of eye color intensity in control and hybrid genotypes. The lettered bars indicate categories that were significantly different from each other (Pairwise Wilcoxon Rank Sum test, p < 0.05, n=6). Hybrid pigment intensity is significantly reduced by the RNAi construct, and no hybrid genotype was significantly different from any other.

Figure 4.4.
gfzf sib RNAi construct reduces the expression of gfzf sib in all three species. Expression of gfzf was normalized to Rpl32 expression. Values are presented as the ratio of gfzf sib to gfzf mel. * p < 0.05 by Pairwise Wilcoxon Rank Sum test.

Our attempts to rescue D. sechellia hybrid males indicate that the lack of rescue of D. sechellia hybrid F1 males may be explained by either additional loci or additional changes at known hybrid incompatibility loci fixed in the D. sechellia lineage. We reasoned that as D. sechellia is resistant to gfzf-mediated hybrid male rescue, whereas D. simulans is not, recombinant genotypes between the two species would allow us to map the loci responsible for this trait. However, conventional multigeneration recombinant mapping approaches are untenable for this trait because the final cross must include a D. melanogaster female crossed to a D. simulans-D. sechellia hybrid male. Hybrid F1 males between D. simulans and D. sechellia are completely sterile, and thus any subsequent generations of recombinant males that could produce progeny with D. melanogaster would be biased by selection on alleles that rescue D. simulans-D. sechellia hybrid male sterility. This would not allow us to distinguish alleles that rescue hybrid male viability in crosses with D. melanogaster from those that cause hybrid male sterility between D. simulans and D. sechellia. We circumvented this problem by using a D. simulans attached-X (C(1)RM yw / C(1;Y)) stock to alter the direction of the D. melanogaster / D. simulans-D. sechellia cross while still preserving the genotype that we aimed to study (Fig 4.5 A). The attached-X D. simulans / D. sechellia hybrid F1 females produce gametes that are recombinant for their autosomes, and carry either the D. simulans attached-X chromosomes or a D. sechellia Y chromosome. When these hybrid F1 females are crossed with a D. melanogaster male, this cross generates triple-hybrid F1 females with a D.
simulans attached-X chromosome and triple-hybrid F1 males with a D. melanogaster X and a D. sechellia Y chromosome. This direction of the cross is susceptible to the cyto-nuclear incompatibility of a standard D. melanogaster male to D. simulans female cross, since the maternal factor from D. simulans (Mhr) interacts with the X chromosome from D. melanogaster, both of which are present in this cross. To remedy this, we recombined our RNAi-gfzf sib transgene onto the Zhr 1 chromosome, which is known to rescue this cyto-nuclear incompatibility.30 These genotypes allowed us to generate large numbers of rescued triple-hybrid recombinant males that are predicted to be enriched for D. simulans alleles that rescue viability and depleted for the D. sechellia alleles that prevent hybrid male rescue.

Figure 4.5. D. sechellia is resistant to hybrid male rescue by gfzf sib RNAi due to two loci. (A) Cross for generating tri-hybrid progeny for mapping samples. Males and females were collected in three independent replicates for pooled genome sequencing. (B) Map of allele frequencies in the tri-hybrid males (panel title: Allele frequencies of hybrids with gfzf sib knockdown; y-axis: average (male-female) allele frequency, 20KB windows, from −0.30 to 0.30; x-axis: chromosome arms 2L, 2R, 3L, 3R; series: D. melanogaster, D. simulans, D. sechellia; marked loci: Lhr, Satyr, gfzf). For each sample, allele frequencies were calculated in 20KB windows. They were then normalized by subtracting the allele frequency for females in the same window, and outliers removed. This plot contains the average of all three replicates.

Starting with three inbred lines of D.
sechellia, we used this crossing scheme to produce matched pools of 350 rescued triple-hybrid males and triple-hybrid females each. We then performed pooled whole genome sequencing of the rescued males and of the females separately from each replicate to measure the frequencies of D. simulans, D. sechellia, and D. melanogaster alleles across the genome. In these experiments, the triple-hybrid females serve as a control for general effects on hybrid viability. When we analyzed the frequency of D. simulans and D. sechellia alleles in our samples, we found a striking result. There are two locations in the D. simulans genome that are highly enriched in rescued hybrid males as opposed to the hybrid female samples (Fig 4.5 B). In both cases, this difference is due to deviations recorded in the male samples, as the females show nearly even D. simulans / D. sechellia allele frequencies across the genome. The first peak of enrichment is centered between 17.32MB and 17.50MB on chromosome 2R (D. melanogaster coordinates). The maximum difference between rescued males and females in the frequency of the D. simulans allele is 0.258. Conversely, the difference in D. sechellia allele frequency between rescued males and females at this locus is -0.244. The D. simulans and D. sechellia alleles comprise half of the hybrid genome (the other half is D. melanogaster), and the magnitude of the D. simulans and D. sechellia allele frequency difference in males is 0.502. Therefore, our data suggest that all rescued males contain the D. simulans alleles at this locus. This peak sits directly on top of Lhr (17.43MB), a known hybrid incompatibility gene in this system. The second peak appears on chromosome 3L, centered between 8.62MB and 8.68MB (D. melanogaster coordinates). The maximum difference between rescued males and females in the frequency of the D. simulans allele is 0.190. Conversely, the difference in D.
sechellia allele frequency between rescued males and females at this locus is -0.183. This difference means that 74.6% of the rescued males contained the D. simulans allele at this position. No genes near this region have previously been implicated in the D. melanogaster / D. simulans hybrid incompatibility. We name this new locus Sechellia aversion to hybrid rescue (Satyr). Our sequencing data show that every rescued male we recovered contained the D. simulans Lhr allele, while most rescued males contained the D. simulans Satyr allele. Therefore, it appears that either Lhr sec or Satyr sec can at least partially prevent gfzf sib knockdown from rescuing hybrid F1 males. There is a slight elevation in the recovery of D. simulans alleles across the rest of the genome, even in regions unlinked to Lhr or Satyr. The basis of this deviation is unclear. We do not observe a peak in D. simulans allele frequency near gfzf, indicating that the gfzf sim and gfzf sec alleles are equivalent in our experiment. It appears that the genomic architecture of the lack of gfzf-mediated rescue of D. sechellia male F1 hybrids is explained by two dominantly-acting large effect loci, Lhr and Satyr.

Discussion

Although D. simulans, D. mauritiana and D. sechellia are closely related to each other, D. sechellia is different from its sister species in several aspects. D. sechellia is found exclusively on the Seychelles islands in the Indian Ocean, and is specialized on the fruit of Morinda citrifolia, which is otherwise toxic to many other insects.31,32 This species has been utilized extensively as a prime model system to understand inter-species differences in morphology,33–35 toxin-resistance,36–38 and behavioral preferences.39–41 Our study shows that D. sechellia is also special with regard to hybrid incompatibilities, and uncovers the genetic architecture underlying this difference.
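The allele-frequency arithmetic reported in the Results for the Lhr and Satyr peaks can be sketched in a few lines of Python. This is a minimal illustration rather than the analysis pipeline used in this chapter: the function names are hypothetical, the per-window inputs are assumed to be plain lists of allele frequencies, and the outlier removal mentioned in the Figure 4.5 legend is not implemented. The final ratio follows the text in dividing the combined D. simulans / D. sechellia difference by the 0.5 share of the genome that is not D. melanogaster.

```python
from statistics import mean

# From the text: half of each triple-hybrid genome is D. melanogaster, so
# D. simulans and D. sechellia alleles together make up the other 0.5.
NON_MEL_SHARE = 0.5


def normalized_difference(male_freqs, female_freqs):
    """Male-minus-female allele frequency for each window of one replicate."""
    return [m - f for m, f in zip(male_freqs, female_freqs)]


def average_replicates(replicates):
    """Average per-window values across replicates (equal-length lists)."""
    return [mean(window) for window in zip(*replicates)]


def peak_summary(sim_diff, sec_diff):
    """Combine the sim and sec differences at a peak as described in the text.

    Returns the combined magnitude and its ratio to NON_MEL_SHARE.
    """
    magnitude = sim_diff - sec_diff
    return magnitude, magnitude / NON_MEL_SHARE


# Peak values quoted in the Results (male minus female differences).
lhr = peak_summary(0.258, -0.244)
satyr = peak_summary(0.190, -0.183)

print("Lhr   magnitude, ratio:", round(lhr[0], 3), round(lhr[1], 3))
print("Satyr magnitude, ratio:", round(satyr[0], 3), round(satyr[1], 3))
```

With the values quoted above, the Lhr magnitude comes to 0.502 and the Satyr ratio to 0.746, matching the figures given in the text.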
Our results have three important implications for understanding the nature of hybrid male inviability between D. melanogaster and its sister species. First, our experiments represent the first effort to map the variation of hybrid male rescue in the D. simulans clade by using a triple-hybrid crossing approach. Our results demonstrate that the interspecies variation in hybrid male rescue between D. melanogaster and species from the D. simulans clade is not broadly dispersed across the genome, but is instead explained by two dominantly-acting major effect loci. Because we find these differences to be fixed between the three species of the D. simulans clade, it is likely that these changes were fixed in the lineage leading to D. sechellia after its divergence from D. simulans and D. mauritiana. Second, we have uncovered a new locus, Satyr, that acts as a dominant hybrid incompatibility locus between D. melanogaster and D. sechellia. This locus resides near 8.6MB (D. melanogaster coordinates) on chromosome 3L. Our approach relies on sensitization to rescue by reducing rather than completely removing gfzf sec. Therefore, it is possible that complete loss of function at any of gfzf sec, Lhr sec, or Satyr sec may be capable of full rescue of hybrid males. One intriguing possibility is that the action of Satyr is not confined to the D. melanogaster – D. sechellia hybridization, and that loss of function mutations at this locus from any of the D. simulans sibling species may be sufficient to rescue hybrid males. Third, of the two dominantly-acting loci that we mapped, the localization of one of these loci to the region containing Lhr is noteworthy. Our results suggest that both Lhr sim and Lhr sec are incompatible alleles, and yet the Lhr sim allele is enriched among rescued triple-hybrid males. This implies that additional changes at Lhr in the lineage leading to D. sechellia after its split from D.
simulans may have made it even more penetrant in its incompatible effects. In other words, the same hybrid incompatibility genes but with stronger effects may explain the difficulty in rescuing D. sechellia hybrids. Under this scenario, Satyr represents a novel hybrid male inviability locus that is shared between all three species of the simulans clade. This is also consistent with the observation that alleles of D. melanogaster Hmr are sufficient to rescue hybrid F1 males in crosses with all three species of the simulans clade. Alternatively, it is formally possible that Satyr represents a hybrid incompatibility locus that is unique to D. sechellia. Further mapping and identification of this gene will open the door to understanding both the evolution and molecular mechanisms of this hybrid lethal incompatibility. Our triple-hybrid approach has now made studying hybrid incompatibility genes with respect to D. sechellia accessible. Although loss of either Lhr or gfzf alone is sufficient to rescue hybrid F1 male viability in D. melanogaster-D. simulans hybrids, there is little evidence to show a direct genetic interaction between these two hybrid incompatibility genes. Our data suggest for the first time that variants at Lhr interact genetically with gfzf. Combined with recent work showing that Hmr and gfzf co-localize on chromatin in F1 hybrids,12 a more detailed picture of the interactions between these three hybrid incompatibility genes is beginning to emerge in terms of their ability to influence the effects of each other. Previous efforts to locate hybrid incompatibility loci have relied on the recovery of natural alleles,4,5 deficiency screens from D. melanogaster,42,43 and a mutagenesis screen in D. simulans.8 In all of these instances it is entirely plausible that a dominant hybrid incompatibility at the Satyr locus would have been missed, because previous mutagenesis and deficiency screens were not performed to saturation.
Our work highlights how creative crossing approaches may allow interspecies variation to be leveraged towards understanding the genetics and evolution of hybrid incompatibilities.44,45

Materials and Methods

Fly strains
The details regarding natural populations and species variants that we acquired for this experiment can be found in Table 4.1. These lines were gifts from H.S. Malik, D. Matute, or acquired from the Drosophila Species Stock Center. For our triple-hybrid mapping cross, we generated several lines. We built a recombinant RNAi-gfzf sib, Zhr 1 chromosome by recovering the products of RNAi-gfzf sib 8 crossed to Zhr 1 (Bloomington Stock Center 25140) over an FM7i balancer. We confirmed the ability of this chromosome to rescue hybrid F1 males by crossing it to the C(1)RM yw / C(1;Y) (attached-X) D. simulans stock.46 Next, we generated three independent stocks of D. sechellia w (Drosophila Species Stock Center 14021-0248.15) by single pair inbreeding three replicates of the base stock for five generations. To induce our RNAi system, we crossed the RNAi-gfzf sib, Zhr 1 chromosome to an Actin5C-GAL4 / CyO line (Bloomington Stock Center 25374), which expresses in all cell types at all stages of development after zygotic genome activation. We crossed the resulting Zhr 1 UAS.RNAi-gfzf sib; Actin5C-GAL4 F1 males to attached-X D. simulans / D. sechellia F1 hybrid females to make the triple-hybrid progeny.

Table 4.1 Species and strains
| Name | Species | Origin |
|---|---|---|
| WT (07) | D. simulans | Wanie-Rukula, Congo |
| WT (08) | D. simulans | Wanie-Rukula, Congo |
| w 501 | D. simulans | Drosophila Species Stock Center |
| C(1)RM yw / C(1;Y) | D. simulans | Drosophila Species Stock Center |
| iso-105 | D. mauritiana | Drosophila Species Stock Center |
| w139 | D. mauritiana | Malik Lab |
| w140 | D. mauritiana | Malik Lab |
| WT(09) | D. mauritiana | Malik Lab |
| iso-75 | D. mauritiana | Malik Lab |
| w[1] | D. mauritiana | Drosophila Species Stock Center |
| w | D. sechellia | Drosophila Species Stock Center |
| WT(03) | D. sechellia | Drosophila Species Stock Center |
| NF13 | D. sechellia | Matute Lab |
| NF14 | D. sechellia | Matute Lab |
| ArvoB3 | D. sechellia | Matute Lab |

Fly husbandry
For our initial tests of hybrid male rescue, we allowed parental flies to mate for 2 days at 25°C before flipping them to fresh media. We incubated the vials containing hybrid progeny at 18°C, because hybrid larvae become extremely temperature sensitive during the larval stages.47 We counted the progeny at 23 days post mating. In generating the triple-hybrid flies, we used a different mating scheme, as we found that the C(1)RM yw / C(1;Y) genotype has high rates of lethality at 25°C. For these crosses, we allowed mating at 21°C for 2 days, followed by incubating the progeny at 18°C until 23 days post mating.

Measuring gfzf expression in hybrids
As one quarter of our hybrids contained the RNAi knockdown construct and the GAL4 driver, we determined the genotypes of our hybrid samples by removing the head and probing for the presence of the RNAi construct and the GAL4 construct by PCR. To measure gfzf expression, we extracted RNA from the remainder of the body using the DirectZol RNA Miniprep Kit (Zymo Research) and generated cDNA using SuperScript III (Thermo Fisher Scientific). For RT-qPCR, we used iTaq SYBR Green (Bio-Rad). We measured the abundance of gfzf mel, gfzf sib, and Rpl32 as a loading control using the following primers: gfzf F (both species): CCGGACATGGACCTCTCAAA, gfzf R (mel): GGGACACGGATAATGATGCAG, gfzf R (sim): CTTTGGGACACGGATCTGCT, Rpl32 F: ATGCTAAGCTGTCGCACAAATG, Rpl32 R: GTTCGATCCGTAACCGATGT. We rejected any samples in which our no-RT controls showed signs of amplification. To compare expression levels, we first normalized both gfzf samples to the Rpl32 control, and then determined the ratio of gfzf sib to gfzf mel expression. We checked for statistical significance in our samples using a pairwise Wilcoxon rank sum test in R.
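The normalization described above can be illustrated with a short sketch. It is not the analysis script used in this work: it assumes that the inputs are Ct values, that relative expression is computed as 2^-ΔCt (which presumes roughly 100% primer efficiency), and the function names and example Ct values are hypothetical.

```python
def relative_expression(ct_target: float, ct_reference: float) -> float:
    """Target abundance relative to a reference gene, as 2^-(Ct_target - Ct_reference).

    Assumes ~100% amplification efficiency for both primer pairs (an assumption,
    not something stated in the methods).
    """
    return 2.0 ** -(ct_target - ct_reference)


def sib_to_mel_ratio(ct_gfzf_sib: float, ct_gfzf_mel: float, ct_rpl32: float) -> float:
    """Normalize both gfzf alleles to Rpl32, then take the sib/mel ratio."""
    sib = relative_expression(ct_gfzf_sib, ct_rpl32)
    mel = relative_expression(ct_gfzf_mel, ct_rpl32)
    return sib / mel


# Hypothetical Ct values for one hybrid sample (illustration only).
example_ratio = sib_to_mel_ratio(ct_gfzf_sib=26.0, ct_gfzf_mel=24.0, ct_rpl32=18.0)
print(f"gfzf sib / gfzf mel = {example_ratio:.3f}")
```

Because both alleles are normalized to the same Rpl32 measurement, the loading control cancels in the ratio under these assumptions; it still matters for comparing each allele's level across samples.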
Measuring eye pigment in w-RNAi
To test the knockdown of the w gene, we outcrossed GMR-wIR (Bloomington Stock Center 32067) females to w mutant males from all four species. In this design, the only intact allele of w came from the GAL4-wIR line, which controls for potential differences in targeting w between the different species. We gathered images of both eyes from individual flies using a Leica MC120 HD camera on a Leica MC165 FC dissection scope with overhead illumination. To control for changes in ambient lighting, we included a piece of blue construction paper as the background, and made sure to capture the image such that segments of the construction paper were not in the shadow of the fly. We used the gray scale of these images to measure pixel intensity in ImageJ, and normalized the values to that of the construction paper in the background. We normalized all values to the mean of the WT control, and checked for a statistical difference between the samples using a pairwise Wilcoxon rank sum test in R.

Whole fly DNA extraction for pooled genome sequencing
To extract DNA for whole genome sequencing, we used the DNeasy Blood and Tissue kit (Qiagen). We pooled our 350 triple-hybrids by simultaneously freezing all samples in liquid nitrogen and grinding them together with a mortar and pestle, and immediately used the frozen ground tissue as the input for the DNeasy kit. We repeated this process for each of the triple-hybrid male and paired triple-hybrid female samples. For the parental lines that we sequenced, we extracted DNA from a pool of 50 flies, half male and half female.

Pooled whole genome sequencing
To measure allele frequencies in our triple-hybrid samples, we used the PCR-free Illumina HiSeq platform to generate paired end reads of the pooled sample. To generate accurate calls of variants in our different lines, we sequenced all six of our parental lines using the Illumina NovaSeq platform.
Library prep and sequencing were carried out by the Huntsman Cancer Institute High-Throughput Genomics and Bioinformatics Analysis Shared Resource.

Sequence alignment and allele frequency analysis
We trimmed sequencing reads for quality using PicardTools. We aligned the reads to the D. melanogaster reference genome (r6.24 at the time of analysis) using bwa.48 We called variants and re-aligned reads based on these variant calls using GATK 3.6.49 To find positions that would allow us to measure allele frequency in the three species, we wrote our own code to parse vcf files and identify positions with SNPs fixed differently between all three species (we did not use indels in our analyses, because these SNPs were sufficiently abundant). To analyze allele frequencies, we scanned the genome in 20 kb windows and measured the relative abundance of D. melanogaster, D. simulans, and D. sechellia SNPs from high quality sites. We paired these windows between male and female samples, and calculated the difference in allele frequency between males and females for all three of the parental SNP types. The plot that we report in Fig 4.5 is the average allele frequency in each window for all three replicates. All of our code can be found at github.com/jcooper036/tri_hybid_mapping.

Data availability
All of the genomic sequencing data for this project are available on the Sequence Read Archive under accession number SRP190327. They can also be accessed via the BioProject accession number PRJNA530263.

Acknowledgements
We thank the labs of H. S. Malik and D. Matute for sharing their fly lines with us. We thank D. M. Castillo and J. G. Baldwin-Brown for their comments and suggestions. We thank S. Phadnis for her continued support.
Research reported in this publication utilized the High-Throughput Genomics and Bioinformatic Analysis Shared Resource at Huntsman Cancer Institute at the University of Utah and was supported by the National Cancer Institute of the National Institutes of Health under Award Number P30CA042014. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

References
1. Coyne, J. & Orr, H. A. Speciation. (Sinauer Associates, 2004).
2. Barbash, D. A. Ninety Years of Drosophila melanogaster Hybrids. Genetics 186, 1–8 (2010).
3. Pontecorvo, G. Viability Interactions Between Chromosomes of Drosophila melanogaster and Drosophila simulans. J. Genet. 45, 51–66 (1943).
4. Watanabe, T. K. A Gene that Rescues the Lethal Hybrids Between Drosophila melanogaster and D. simulans. Jpn. J. Genet. 54, 325–331 (1979).
5. Hutter, P. & Ashburner, M. Genetic Rescue of Inviable Hybrids Between Drosophila melanogaster and its Sibling Species. Nature 327, 331–333 (1987).
6. Barbash, D. A., Siino, D. F., Tarone, A. M. & Roote, J. A Rapidly Evolving MYB-Related Protein Causes Species Isolation in Drosophila. Proc. Natl. Acad. Sci. 100, 5302–5307 (2003).
7. Brideau, N. J. et al. Two Dobzhansky-Muller Genes Interact to Cause Hybrid Lethality in Drosophila. Science 314, 1292–1295 (2006).
8. Phadnis, N. et al. An Essential Cell Cycle Regulation Gene Causes Hybrid Inviability in Drosophila. Science 350, 1552–1555 (2015).
9. Thomae, A. W. et al. A Pair of Centromeric Proteins Mediates Reproductive Isolation in Drosophila Species. Dev. Cell 27, 412–424 (2013).
10. Satyaki, P. R. V. et al. The Hmr and Lhr Hybrid Incompatibility Genes Suppress a Broad Range of Heterochromatic Repeats. PLoS Genet. 10, e1004240 (2014).
11. Baumann, D. G., Dai, M.-S., Lu, H. & Gilmour, D. S. GFZF, a Glutathione S-Transferase Protein Implicated in Cell Cycle Regulation and Hybrid Inviability, Is a Transcriptional Coactivator. Mol. Cell. Biol. 38, e00476-17 (2018).
12. Cooper, J. C. et al. Altered Chromatin Localization of Hybrid Lethality Proteins in Drosophila. bioRxiv 438432 (2018). doi:10.1101/438432
13. Lachaise, D. et al. Historical Biogeography of the Drosophila melanogaster Species Subgroup. in Evolutionary Biology (eds. Hecht, M. K., Wallace, B. & Prance, G. T.) 159–225 (Springer US, 1988). doi:10.1007/978-14613-0931-4_4
14. Ballard, J. W. O. Sequential Evolution of a Symbiont Inferred From the Host: Wolbachia and Drosophila simulans. Mol. Biol. Evol. 21, 428–442 (2004).
15. Dean, M. D. & Ballard, J. W. O. Linking Phylogenetics with Population Genetics to Reconstruct the Geographic Origin of a Species. Mol. Phylogenet. Evol. 32, 998–1009 (2004).
16. Baudry, E., Derome, N., Huet, M. & Veuille, M. Contrasted Polymorphism Patterns in a Large Sample of Populations From the Evolutionary Genetics Model Drosophila simulans. Genetics 173, 759–767 (2006).
17. Kopp, A., Frank, A. & Fu, J. Historical Biogeography of Drosophila simulans Based on Y-chromosomal Sequences. Mol. Phylogenet. Evol. 38, 355–362 (2006).
18. Kliman, R. M. et al. The Population Genetics of the Origin and Divergence of the Drosophila simulans Complex Species. Genetics 156, 1913–1931 (2000).
19. McDermott, S. R. & Kliman, R. M. Estimation of Isolation Times of the Island Species in the Drosophila simulans Complex from Multilocus DNA Sequence Data. PLOS ONE 3, e2442 (2008).
20. Garrigan, D. et al. Genome Sequencing Reveals Complex Speciation in the Drosophila simulans Clade. Genome Res. 22, 1499–1511 (2012).
21. Lachaise, D., David, J. R., Lemeunier, F., Tsacas, L. & Ashburner, M. The Reproductive Relationships Of Drosophila sechellia with D. mauritiana, D. simulans, and D. melanogaster from the Afrotropical Region. Evol. Int. J. Org. Evol. 40, 262–271 (1986).
22. Matute, D. R. & Ayroles, J. F. Hybridization Occurs Between Drosophila simulans and D. sechellia in the Seychelles Archipelago. J. Evol. Biol. 27, 1057–1068 (2014).
23. Sturtevant, A. H. A New Species Closely Resembling Drosophila melanogaster. Psyche J. Entomol. 26, 153–155 (1919).
24. Castillo, D. M. & Barbash, D. A. Moving Speciation Genetics Forward: Modern Techniques Build on Foundational Studies in Drosophila. Genetics 207, 825–842 (2017).
25. Cooper, J. C. & Phadnis, N. A Genomic Approach to Identify Hybrid Incompatibility Genes. Fly (Austin) 1–7 (2016). doi:10.1080/19336934.2016.1193657
26. Gérard, P. R. & Presgraves, D. C. Abundant Genetic Variability in Drosophila simulans for Hybrid Female Lethality in Interspecific Crosses to Drosophila melanogaster. Genet. Res. 94, 1–7 (2012).
27. Obbard, D. J., Jiggins, F. M., Halligan, D. L. & Little, T. J. Natural Selection Drives Extremely Rapid Evolution in Antiviral RNAi Genes. Curr. Biol. 16, 580–585 (2006).
28. Palmer, W. H., Hadfield, J. D. & Obbard, D. J. RNA-Interference Pathways Display High Rates of Adaptive Protein Evolution in Multiple Invertebrates. Genetics 208, 1585–1599 (2018).
29. Lee, Y. S. et al. Distinct Roles for Drosophila Dicer-1 and Dicer-2 in the siRNA/miRNA Silencing Pathways. Cell 117, 69–81 (2004).
30. Sawamura, K. & Yamamoto, M.-T. Cytogenetical Localization of Zygotic Hybrid Rescue (Zhr), a Drosophila melanogaster gene that Rescues Interspecific Hybrids from Embryonic Lethality. Mol. Gen. Genet. MGG 239, 441–449 (1993).
31. Tsacas, L. Drosophila sechellia, n. sp., Huitieme Espece du Sous-Groupe melanogaster des Iles Sechelles (Diptera, Drosophilidae). Rev. Francaise Entomol. Nouv. Ser. 3, 146–150 (1981).
32. R’Kha, S., Capy, P. & David, J. R. Host-Plant Specialization in the Drosophila melanogaster Species Complex: A Physiological, Behavioral, and Genetical analysis. Proc. Natl. Acad. Sci. U. S. A. 88, 1835–1839 (1991).
33. Sucena, É. & Stern, D. L. Divergence of Larval Morphology Between Drosophila sechellia and its Sibling Species Caused by Cis-Regulatory Evolution of ovo/shaven-baby. Proc. Natl. Acad. Sci. 97, 4530–4534 (2000).
34. Orgogozo, V., Broman, K. W. & Stern, D. L. High-Resolution Quantitative Trait Locus Mapping Reveals Sign Epistasis Controlling Ovariole Number Between Two Drosophila Species. Genetics 173, 197–205 (2006).
35. Orgogozo, V., Muro, N. M. & Stern, D. L. Variation in Fiber Number of a Male-Specific Muscle Between Drosophila Species: A Genetic and Developmental Analysis. Evol. Dev. 9, 368–377 (2007).
36. Jones, C. D. Genetics of egg production in Drosophila sechellia. Heredity 92, 235–241 (2004).
37. Huang, Y. & Erezyilmaz, D. The Genetics of Resistance to Morinda Fruit Toxin During the Postembryonic Stages in Drosophila sechellia. G3 GenesGenomesGenetics 5, 1973–1981 (2015).
38. Lanno, S. M. et al. Transcriptomic Analysis of Octanoic Acid Response in Drosophila sechellia Using RNA-Sequencing. G3 Bethesda Md 7, 3867–3873 (2017).
39. Prieto-Godino, L. L. et al. Evolution of Acid-Sensing Olfactory Circuits in Drosophilids. Neuron 93, 661-676.e6 (2017).
40. Seeholzer, L. F., Seppo, M., Stern, D. L. & Ruta, V. Evolution of a Central Neural Circuit Underlies Drosophila Mate Preferences. Nature 559, 564–569 (2018).
41. Lavista-Llanos, S. et al. Dopamine Drives Drosophila sechellia Adaptation to its Toxic Host. eLife 3, e03785 (2014).
42. Presgraves, D. C. A Fine-Scale Genetic Analysis of Hybrid Incompatibilities in Drosophila. Genetics 163, 955–972 (2003).
43. Cuykendall, T. N. et al. A Screen for F1 Hybrid Male Rescue Reveals No Major-Effect Hybrid Lethality Loci in the Drosophila melanogaster Autosomal Genome. G3 GenesGenomesGenetics 4, 2451–2460 (2014).
44. Orr, H. A. & Coyne, J. The Genetics of Postzygotic Isolation in the Drosophila virilis Group. Genetics 121, 527–537 (1989).
45. Cattani, M. V. & Presgraves, D. C. Incompatibility Between X Chromosome Factor and Pericentric Heterochromatic Region Causes Lethality in Hybrids Between Drosophila melanogaster and Its Sibling Species. Genetics 191, 549–559 (2012).
46. Sawamura, K., Yamamoto, M. T. & Watanabe, T. K. Hybrid Lethal Systems in the Drosophila melanogaster Species Complex. II. The Zygotic Hybrid Rescue (Zhr) Gene of D. melanogaster. Genetics 133, 307–313 (1993).
47. Barbash, D. A., Roote, J. & Ashburner, M. The Drosophila melanogaster Hybrid Male Rescue Gene Causes Inviability in Male and Female Species Hybrids. Genetics 154, 1747–1771 (2000).
48. Li, H. & Durbin, R. Fast and Accurate Short Read Alignment with Burrows–Wheeler Transform. Bioinformatics 25, 1754–1760 (2009).
49. McKenna, A. et al. The Genome Analysis Toolkit: A MapReduce Framework for Analyzing Next-Generation DNA Sequencing Data. Genome Res. 20, 1297–1303 (2010).

CHAPTER 5
PARALLEL EVOLUTION OF SPERM HYPER-ACTIVATION CA2+ CHANNELS
Reprinted with permission from Genome Biology and Evolution, Oxford University Press, Jacob C. Cooper and Nitin Phadnis, Copyright 2017

CHAPTER 6
RECURRENT RECURRENT POSITIVE SELECTION IN MAMMALIAN PHYLOGENIES

Abstract
The most evolutionarily active protein interfaces change rapidly across species. Recurrent positive selection at the codon level is often a sign that a gene is engaged in a molecular arms race: a conflict between the genome of its host and the genome of another species over mutually exclusive access to a resource that has a direct effect on the fitness of both individuals. Detecting molecular arms races has led to a better understanding of how evolution changes the molecular interfaces of proteins when organisms compete over time, especially in the realm of host-pathogen interactions. Here, we design a method for scanning for recurrent positive selection using entire genomes from a clade of species. We deploy this method on six mammalian clades (primates, mice, deer mice, dogs, cows, and bats) both to detect novel instances of recurrent positive selection and to compare the prevalence of recurrent positive selection between clades.
We analyze the frequency at which individual genes are targets of recurrent positive selection in multiple clades. We find that coincidence of selection occurs far more frequently than expected by chance, indicating that all clades may have some highly specific shared selective pressures. Additionally, we highlight Polymeric Immunoglobulin Receptor (PIGR) as a gene which has specific amino acids under recurrent positive selection in multiple clades, indicating that it may have been locked in a molecular arms race for ~100My. These data provide an in-depth comparison of recurrent positive selection across the mammalian phylogeny, and highlight the power of comparative evolutionary approaches to generate specific hypotheses about the molecular interactions of rapidly evolving genes.

Introduction
Most genes in most organisms do not tolerate changes. Across all of life, a large repertoire of highly conserved core genes allows cells to function with extreme consistency. However, some mutations cause changes to a gene which increase the fitness of the individual. If these mutations survive the stochastic mutation birth/death process and continue to convey fitness advantages to individuals, they will eventually rise in frequency in a population, a process known as positive selection. Further, some genes sit at the forefront of evolution and are frequently targets of positive selection. Whether because they are the interface between a host and a lethal pathogen, because they have a direct impact on fertility, or for many other reasons, changes to some genes are repeatedly the target of positive selection, a pattern known as recurrent positive selection. Recurrent positive selection has become a subject of study because it represents highly dynamic evolutionary processes, and is a consistent way to detect host-pathogen interactions using only primary nucleotide sequences.
In the last fifteen years, many examples of recurrent positive selection have proved useful for dissecting the molecular interfaces where hosts and pathogens interact.1–7 These studies have been able to illuminate portions of primary sequences that are important for hosts and pathogens to interact, on what would otherwise be complex binding interfaces. They have also pointed to regions of immune proteins that are targeted by immune suppression systems from pathogens.2,3 The recurrent evolution of host-pathogen interfaces, in which each side attempts to gain an advantage over the other, is often referred to as a molecular arms race.8 Recurrent positive selection in nonimmune factors usually represents highly dynamic evolutionary processes. For example, reproductive genes frequently experience positive selection both within and between species due to their direct effect on fertility and therefore fitness.9 An additional case of recurrent positive selection is that of centromeric histones, which are thought to evolve rapidly because of their role in meiotic drive.10 These examples and others like them have informed the way that evolution of these essential functions is understood.
Both for understanding immune genes and for the discovery of new rapidly evolving processes, there have been multiple efforts to scan for recurrent positive selection in primates and other mammals.11–15 The main pattern revealed by these studies is that immune genes are much more likely to be rapidly evolving than the rest of the genome, leading to further analyses that have suggested viruses to be the main drivers of rapid evolution.16 However, the ability to detect recurrent positive selection increases with the number of genomes used in the analysis,17 and previous analyses of primates have used only a subset of the available genomes. Therefore, it is likely that many cases of recurrent positive selection have gone undetected thus far.
Here, we design a computational pipeline to run the most widely used program for detecting recurrent positive selection at a genome-wide scale. We deploy this pipeline to scan for recurrent positive selection in six different clades of mammals, including primates, two clades of rodents, caniforms, bovids, and bats. We find many novel examples of recurrent positive selection across all these clades, and provide the results as a resource for others examining specific genes of interest. We ask how frequently single genes show signs of recurrent positive selection across multiple clades and find that there is a much higher coincidence of recurrent positive selection than expected by chance. Finally, we examine specific molecular interfaces that have been under recurrent positive selection in multiple clades. As an example, we highlight Polymeric Immunoglobulin Receptor (PIGR) as a gene that has seen recurrent positive selection in the same amino acids across multiple clades in mammals, suggesting that it is an important component of an as yet undescribed molecular arms race. Our results will inform future studies focused on recurrent positive selection, and provide many new examples that will help to understand the evolutionary forces that shape the genome.

Results
To analyze recurrent positive selection across mammals, we identified six different clades that have a sufficient number of genomes to conduct this analysis: primates (humans, apes, monkeys), murinae (mice, rats), cricetidae (deer mice, hamsters), chiroptera (bats), caniformia (dogs, bears, seals), and bovidae (cows, sheep). We also included a subsampling of the primates clade that was more comparable to the other clades in the number of genomes used.
We gathered genomes for all available species in these clades (Figure 6.1) and reference gene sequences from the best-annotated genome in each clade.18 We then identified homologous gene sequences for each gene that has a 1:1 orthologous relationship within its clade, as this analysis is easily confounded by paralogous sequences. We aligned these sequences and tested them for a signature of recurrent positive selection using Phylogenetic Analysis by Maximum Likelihood (PAML).19 For this test, we first measured the log-likelihood difference between Model 7 (no positive selection) and Model 8 (positive selection). A significant difference between the fit of these two models to the distribution of the rate of codon evolution in a gene is evidence that the gene has undergone recurrent positive selection. If the difference between these two models was significant, we then compared Model 8 to Model 8a (neutral evolution), to test against a scenario of neutral evolution that can be missed by Model 7 (Figure 6.2). In our analysis, we did not require that every gene be identified in all species to complete the analysis. Instead, we required that 90% of the gene be identified in at least four species. We report the distribution of the number of species used for each clade (Figure 6.3).

Figure 6.1. Phylogeny of six mammalian clades used in this study. Each of the six clades used in this study is designated by their clade name on the right. The common name for all species is given in a corresponding tree to the right of the scientific names. Species shown, by clade:
Primates: Pan troglodytes (chimpanzee), Pan paniscus (bonobo), Homo sapiens (human), Gorilla gorilla gorilla, Pongo abelii (Sumatran orangutan), Nomascus leucogenys (white-cheeked gibbon), Colobus angolensis (black and white colobus), Rhinopithecus bieti (black snub-nosed monkey), Nasalis larvatus (proboscis monkey), Chlorocebus sabaeus (green monkey), Papio anubis (olive baboon), Cercocebus atys (sooty mangabey), Mandrillus leucophaeus (drill), Macaca fascicularis (crab-eating macaque), Macaca mulatta (rhesus macaque), Macaca nemestrina (pig-tailed macaque), Saimiri boliviensis (Bolivian squirrel monkey), Aotus nancymaae (Nancy Ma's night monkey), Callithrix jacchus (common marmoset).
Murinae: Apodemus speciosus (large Japanese field mouse), Apodemus sylvaticus (European woodmouse), Mus musculus (house mouse), Mus spretus (western wild mouse), Mus caroli (Ryukyu mouse), Mus pahari (shrew mouse), Rattus norvegicus (Norway rat).
Cricetidae: Microtus agrestis (short-tailed field vole), Microtus ochrogaster (prairie vole), Ellobius talpinus (northern mole vole), Ellobius lutescens (Transcaucasian mole vole), Myodes glareolus (bank vole), Mesocricetus auratus (golden hamster), Cricetulus griseus (Chinese hamster), Phodopus sungorus (Djungarian hamster), Neotoma lepida (desert woodrat), Peromyscus maniculatus (North American deer mouse).
Chiroptera: Pteropus alecto (black flying fox), Pteropus vampyrus (large flying fox), Rousettus aegyptiacus (Egyptian rousette), Eidolon helvum (straw-colored fruit bat), Myotis lucifugus (little brown bat), Myotis brandtii (Brandt's bat), Myotis davidii (vesper bat), Eptesicus fuscus (big brown bat), Miniopterus natalensis (Natal long-fingered bat), Desmodus rotundus (common vampire bat), Pteronotus parnellii (Parnell's mustached bat), Rhinolophus sinicus (Chinese rufous horseshoe bat), Rhinolophus ferrumequinum (greater horseshoe bat), Hipposideros armiger (great roundleaf bat), Megaderma lyra (Indian false vampire).
Bovidae: Bos taurus (cow), Bos indicus (zebu), Bos mutus (yak), Bison bison (American bison), Bubalus bubalis (water buffalo), Capra aegagrus (wild goat), Capra hircus (domestic goat), Ovis aries (sheep), Ammotragus lervia (aoudad), Pantholops hodgsonii (chiru).
Caniformia: Lycaon pictus (African hunting dog), Canis lupus (dog), Ursus maritimus (polar bear), Odobenus rosmarus (Pacific walrus), Leptonychotes weddellii (Weddell seal), Neomonachus schauinslandi (Hawaiian monk seal), Ailuropoda melanoleuca (giant panda), Enhydra lutris (sea otter), Mustela putorius (domestic ferret), Ailurus fulgens (red panda).

Figure 6.2. Corsair program architecture. Corsair is designed to run PAML for an individual gene and to scale to the genomic level. Here we use a toy example of the word "recurrent" to illustrate the process.
1) Search all species in a clade for the gene sequence using the reference gene amino acid sequence.
2) Align the genes for each species using Clustal Omega, MUSCLE, and T-Coffee. Drop any sequences that are less than 90% of the reference gene length (less is kept here for clarity).
3) Using the consensus alignment, mask indel codons and +/- 1 of indel codons.
4) Use PAML to detect recurrent positive selection.
5) Scale this process to analyze every gene in the clade that has a 1:1 orthology relationship throughout the clade.

Figure 6.3. Summary statistics for Corsair run over six clades of mammals. Panels: species usage per clade; average pairwise dN/dS per gene; max dS per gene. Each panel is plotted by clade (all primates, nine primates, murinae, cricetidae, caniformia, bovidae, chiroptera).

Strangely, very few genes from bovidae were able to successfully run in our analysis (21%). Upon further examination, we discovered that many potential gene sequences were filtered out of our pipeline because they contained stop codons in the first 90% of the gene sequence. We confirmed that these stop codons were not an artifact of our sequence gathering method, nor our alignment. Since it is unlikely that most bovids are missing the sum of these genes, we hypothesize that this is likely an issue with the reference genomes available for the species in the bovidae clade.
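The two-step model comparison described in the text (Model 7 versus Model 8, then Model 8 versus Model 8a) can be sketched as a pair of likelihood ratio tests. This is a minimal illustration rather than the Corsair implementation: it assumes the conventional chi-square approximation with 2 degrees of freedom for M7 vs. M8 and 1 degree of freedom for M8a vs. M8, a 0.05 threshold before any multiple testing correction, and hypothetical function names and log-likelihood values.

```python
import math


def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail chi-square probability, closed forms for df = 1 and df = 2 only."""
    if stat <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("only df = 1 or df = 2 are supported in this sketch")


def lrt_pvalue(lnl_null: float, lnl_alt: float, df: int) -> float:
    """Likelihood ratio test: statistic is twice the log-likelihood difference."""
    return chi2_sf(2.0 * (lnl_alt - lnl_null), df)


def shows_recurrent_positive_selection(lnl_m7: float, lnl_m8: float,
                                       lnl_m8a: float, alpha: float = 0.05) -> bool:
    """Require M8 to beat M7 first, and only then M8 to beat M8a."""
    if lrt_pvalue(lnl_m7, lnl_m8, df=2) >= alpha:   # assumed df = 2
        return False
    return lrt_pvalue(lnl_m8a, lnl_m8, df=1) < alpha  # assumed df = 1


# Hypothetical log-likelihoods for a single gene (illustration only).
print(shows_recurrent_positive_selection(-5000.0, -4990.0, -4996.0))
```

In a genome-wide scan, the M8a p-values for all genes would then be corrected for multiple testing before a gene is reported; that step is not shown here.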
We find that recurrent positive selection is pervasive in all clades, as has been reported for primates and mice in previous work.13,14 In all clades that we analyzed, between 1.3% (murinae) and 8.6% (bovidae) of the genes analyzed showed a signature of recurrent positive selection by the M8-M8a test after multiple testing correction (Table 6.1). Phylogenetic distance can be a confounding factor for PAML, as long branch lengths make estimations of dN and dS inaccurate. To retroactively confirm that all the clades that we used had an appropriate phylogenetic distance for PAML, we measured the maximum dS that was calculated for each gene run in each clade (Figure 6.3). We find that the average maximum dS is close to 0.25 for all clades, confirming that all of these clades are appropriate for PAML. Another form of evidence for recurrent positive selection is the toggling at specific amino acid positions in multiple species in a phylogeny. These sites are of particular interest because they are the most likely to represent the interface of a molecular arms race, which has been demonstrated experimentally numerous times. To identify these sites, we calculated the Bayes Empirical Bayes (BEB) posterior probability (pp) that the state changes in each site throughout the gene were driving the signature of recurrent positive selection, and considered a site to be significant at a BEB pp > 0.95. Again we find that all clades have several genes with at least one of these sites.
In our subsequent analyses, we use this category as our list of positively selected genes, since it represents a strong signature of positive selection.

Table 6.1 Results for Corsair run over six clades of mammals
| Clade | Max Species | 1:1 Orthology | Genes run | M8a < 0.05 | M8a < 0.05 (FDR) | at least 1 BEB site (pp > 0.95) | at least 3 BEB sites (pp > 0.95) |
|---|---|---|---|---|---|---|---|
| primates (all) | 19 | 14127 | 12589 (0.89) | 2422 (0.192) | 572 (0.045) | 550 (0.043) | 495 (0.039) |
| primates (nine) | 9 | 14127 | 12302 (0.87) | 1689 (0.137) | 325 (0.026) | 307 (0.024) | 275 (0.022) |
| murinae | 7 | 17637 | 15772 (0.89) | 1462 (0.092) | 208 (0.013) | 202 (0.012) | 174 (0.011) |
| cricetidae | 10 | 16370 | 13769 (0.84) | 2019 (0.146) | 417 (0.030) | 396 (0.028) | 347 (0.025) |
| caniformia | 10 | 16461 | 6555 (0.40) | 1347 (0.205) | 410 (0.063) | 392 (0.059) | 357 (0.054) |
| bovidae | 10 | 15724 | 3242 (0.21) | 859 (0.264) | 279 (0.086) | 274 (0.084) | 242 (0.074) |
| chiroptera | 15 | 13289 | 11614 (0.87) | 2583 (0.222) | 762 (0.065) | 739 (0.063) | 682 (0.058) |

After compiling data about recurrent positive selection in each clade, we investigated the coincidence of recurrent positive selection in multiple clades. First, we tested whether any two clades had a greater coincidence of recurrent selection than expected by chance. We counted the frequency of pairwise hits between two clades and then measured this against the expected frequency of recovered hits given the frequency of hits in both of those clades (Figure 6.4). A greater frequency of rapidly evolving genes than expected by chance indicates that the same genes are the targets of evolutionary forces that drive rapid positive selection, even across multiple clades. We find that all clades have a greater number of coinciding genes under recurrent positive selection than expected by random chance, with the greatest coincidence score between murinae and cricetidae.
These two clades are closely related and have a high degree of life-history similarity, indicating that they might share similar selective pressures that drive recurrent positive selection. We do not find any comparison with fewer coinciding genes than expected, indicating that the evolutionary pressures that drive diversifying selection between clades are not entirely distinct.

Figure 6.4. Pairwise clade analysis. Genes that scanned positive for recurrent positive selection in two clades occur more frequently than expected. A) Table with the number of genes analyzed for each comparison (top half) and the number of hits between each comparison (bottom half). B) Rate of pairwise hits for recurrent positive selection vs. the rate of hits expected by chance. The 95% confidence interval is given as a black bar around each point. Every comparison had a greater ratio of hits recovered than expected, with the neutral value represented by the dashed line. C) Contribution to the signal in B from immune genes (red) and nonimmune genes (yellow). The data from B are replotted for comparison (blue).

All organisms are constantly under attack from pathogens in the environment, and previous analyses of rapidly evolving genes have indicated immune function as a common driver of diversifying selection.16 To examine the influence of immune function in our test for coincident recurrent positive selection, we segregated the list of rapidly evolving genes into those that have roles in immunity and those that do not based on KEGG pathway annotation (Figure 6.4). We then re-analyzed the coincidence of recurrent positive selection in these two subsets of the data. We find that recurrent positive selection is detected with a higher degree of coincidence than by chance alone for both immune genes and nonimmune genes.
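The observed-versus-expected comparison used in these pairwise analyses can be sketched as follows. This is a hedged illustration, not the code used to produce Figure 6.4: it assumes that genes are matched between clades by a shared identifier, that the expectation is the product of the two per-clade hit rates among the genes analyzed in both clades (independence), and it does not reproduce the confidence interval calculation. All names and the toy gene sets are hypothetical.

```python
def pairwise_coincidence(analyzed_a: set, hits_a: set,
                         analyzed_b: set, hits_b: set) -> tuple:
    """Return (observed, expected) rates of genes that are hits in both clades.

    Rates are computed over genes analyzed in both clades. The expected rate
    assumes hits in the two clades are independent (an assumption of this sketch).
    """
    shared = analyzed_a & analyzed_b
    n = len(shared)
    if n == 0:
        raise ValueError("no genes were analyzed in both clades")
    a = hits_a & shared
    b = hits_b & shared
    observed = len(a & b) / n
    expected = (len(a) / n) * (len(b) / n)
    return observed, expected


# Toy data: 100 shared genes, 10 hits per clade, 5 of them in common.
genes = {f"gene{i}" for i in range(100)}
clade_a_hits = {f"gene{i}" for i in range(0, 10)}
clade_b_hits = {f"gene{i}" for i in range(5, 15)}

obs, exp = pairwise_coincidence(genes, clade_a_hits, genes, clade_b_hits)
print(f"observed rate {obs:.3f}, expected rate {exp:.3f}, ratio {obs / exp:.1f}")
```

A ratio above 1 corresponds to a point above the dashed neutral line in panel B. The same function could be applied separately to the immune and nonimmune subsets by intersecting the hit sets with each annotation list first.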
In this analysis, there is a general trend that the set of immune genes deviates more from the expected number of hits than the set of nonimmune genes (6/10 comparisons). We find only one case where the opposite is true, and several cases where no distinction can be made (3/10). This analysis demonstrates that specific immune genes are targets of recurrent positive selection repeatedly over multiple clades. However, our data clearly show that immune genes do not make up the entirety of the signal for recurrent positive selection between clades.

Beyond comparison of two clades, we do not have sufficient power to make clear conclusions about the expected and observed rates of coincidence of recurrent positive selection. However, the genes that experience recurrent positive selection in more than two lineages may be interesting candidates to study because they represent truly long-term (or constantly restarting) arms races in the mammalian lineage. In our data, there are 20 genes with a signature of recurrent positive selection in three clades, and only two genes (PIGR and CD72) with signatures of recurrent positive selection in four clades. We did not identify any genes with recurrent positive selection in five or six clades (Table 6.2).
Of the multiclade genes we did identify, most represent novel cases of detecting any form of recurrent positive selection; previous evidence of positive selection in any lineage exists only for SERPINC1, TSPAN8, SAMD9L, and KCNU1.14,20–22

Table 6.2. Genes under recurrent positive selection in more than two clades.
3 Clade Hits
primate + murinae + cricetidae: LCP2
primate + murinae + caniformia: ENSG00000270168
primate + murinae + chiroptera: COL13A1
primate + cricetidae + bovidae: CRAMP1
primate + cricetidae + chiroptera: SERPINC1, TSPAN8
primate + bovidae + caniformia: BTN1A1, GBX1, FYB2
primate + caniformia + chiroptera: SAMD9L, GKN1
murinae + cricetidae + caniformia: LYPD8
murinae + cricetidae + chiroptera: TNFRSF1A, XCR1
murinae + bovidae + chiroptera: PRDM11
cricetidae + bovidae + caniformia: KCNU1
cricetidae + caniformia + chiroptera: FGA, PMFBP1, GLP1R
bovidae + caniformia + chiroptera: CCDC198
4 Clade Hits
primate + cricetidae + caniformia + chiroptera: PIGR
murinae + cricetidae + caniformia + chiroptera: CD72

To ask if the same interfaces of the recurrently selected genes have been exposed to selection in multiple clades, we analyzed the positioning of amino acids under recurrent selection in each of the 22 genes that show recurrent selection in three or more clades. From this analysis, it is clear that several genes have clusters of amino acids that are under recurrent positive selection in multiple clades, implying that a single molecular interface is the recurrent target of some selective pressure. This pattern is most obvious in PRDM11, ENSG00000270168, FYB2, KCNU1, TSPAN8, and PIGR. The greatest density of selected sites in multiple clades appears to fall in PIGR.
PIGR (Polymeric Immunoglobulin Receptor) is a receptor on the basal surface of epithelial mucosal cells that binds to dimeric IgA and pentameric IgM and transports them to the apical surface.23 In our data, we find evidence that PIGR is under recurrent positive selection in primates, cricetidae, caniformia, and chiroptera. In murinae, it appears that the reference sequence for PIGR is truncated by ~560 amino acids, and therefore we did not analyze the entire sequence. In bovidae, we were not able to analyze the gene because of the stop codon issue that seems to plague the bovidae sequences. Therefore, it is unclear if recurrent positive selection in this gene extends to murinae and bovidae as well. To better understand the spatial distribution of the sites under selection in this gene, we plotted the changes on a recently solved structure of PIGR 24 (Figure 6.5). PIGR is composed of five immunoglobulin domains (D1-D5).25 We find that the sites under selection in PIGR are heavily enriched in one of these domains, D2. The positively selected sites appear to cluster on the outward-facing section of D2. Though PIGR has been well studied for over 20 years,23 little is known about the specific function of the D2 domain. Evidence suggests that it plays a role in the transport of IgM but not IgA,26 and as of yet, there has been no evidence of a role for this domain in the innate pathogen binding actions of PIGR.27 Our data suggest that the outer face of the D2 domain participates in an evolutionary arms race across most mammals, indicating that it has an important role in immune function that is yet to be discovered.

Discussion

The modern age of genomics has brought with it many whole sequenced genomes over a broad taxonomic diversity, allowing evolution to be studied in a way that was not possible before. In this study, we sought to leverage this deep and diverse resource to analyze patterns of recurrent positive selection in six different clades of mammals.
We designed a pipeline that increases the throughput of Phylogenetic Analysis by Maximum Likelihood (PAML),19 so that we could deploy this analysis on a genome-wide scale multiple times over. Subsequently, we asked if our analysis detected instances of recurrent positive selection in the same genes over multiple clades. We found instances of recurrent positive selection in the same genes far more often than expected by chance in every pairwise comparison, indicating that there is an excess of genes that are frequently the targets of positive selection. We narrowed our comparison to some of the most frequently recurrently selected genes and identified a single molecular face of Polymeric Immunoglobulin Receptor (PIGR) that has been the target of recurrent positive selection in mammals for nearly 100 million years.

Figure 6.5. PIGR sites under selection. PIGR contains sites under selection in multiple clades of mammals. The crystal structure of PIGR is plotted in yellow on the left, with the immunoglobulin domains annotated as D1-D5. Sites under selection are shown as globular residues as either recurrently selected in multiple clades (red) or recurrently selected in just one clade (grey). The structure is presented in two orientations, so that the domains are visible (top), and rotated 90 degrees clockwise so that the arrangements of the sites are visible (bottom). The amino acid alignment on the right shows the D2 region of PIGR for each species analyzed. Selected sites are denoted using the same color scheme as the structure on the left. Species names are given as a four-letter code (e.g., Homo sapiens: Hsap), with a cladogram of their relationships on the far right.

In developing our method, we identified several sources of error that would lead to false positives and took care to eliminate them.
First, PAML is highly sensitive to insertions and deletions (indels), as even slight misalignment of sequences can easily replicate the signature of recurrent positive selection.28,29 To circumvent this problem, we developed a strategy of aligning each gene with three different aligners, then retaining only sites where all aligners agreed on the alignment. This is similar to previous approaches that have used a post-hoc test to compare the results from multiple runs of PAML using different aligners,30 but our method avoids the situation where false negatives are created when one section of a gene contains indels but another section of the gene has a robust signature of positive selection.

Across multiple clades, our largest loss in power appears to have come from gene sampling. This is most true in bovidae, where we were able to sample only a small fraction of the known genes. Upon further inspection, this drop in gene coverage was largely due to many genes being eliminated from our analysis because they contained mid-gene stop codons, which we took as a sign of pseudogenization, poor input data quality, or both. Given that it is unlikely that bovidae is missing nearly 79% of the known one-to-one orthologs, this is likely to be the result of poor input sequence quality. Upon visual inspection of failed genes, we found that many of the sequences did indeed contain internal stop codons. These did not appear to be the result of misalignment leading to frame shifts, and these sequences otherwise have good surrounding homology. Thus, our results suggest that additional genomic sequencing of the bovids should be made a priority before including them in further genome-wide analyses.

This study follows a history of other studies that have analyzed recurrent positive selection in primates, each time improving the analysis as new resources become available.
In our study, our rate of detection for positively selected genes in primates falls in line with previously observed rates: 4.5% in our analysis vs. 1%-10% in previous work.11–14,28,31,32 In our data, every clade that we analyzed had rates of positive selection in this range. Like previous studies, we consider our detection method to be extremely conservative, and much more likely to produce false negatives than false positives. It is likely that our estimate is an underestimate of recurrent positive selection in most of these cases. Our comparison between clades therefore relies on the fact that we used the same method of detection in every case, not on our data representing a ground truth about recurrent positive selection. Encouragingly, our data set identifies several previously studied examples of recurrent positive selection in at least one clade: for example, OAS1,33 PKR,4 TFRC,3 IZUMO1 and IZUMO4,34 PARP4 and PARP15,6 CATSPER 1-4, D, G, and E,35 RNASEL,36 ZP3,37 and CGAS.5 We compared our results with a list of genes representing the most heavily enriched genes for selected sites from a previous genome-wide scan in primates and found that MUC13, NAPSA, PTPRC, APOL6, MS4A12, SCGB1D2, PIP, CFH, RARRES3, OAS1, and TSPAN8 were detected under recurrent positive selection in at least one clade, while PASD1, CD59, and TRIM5 were missed by our analysis (11 / 14 genes).14 Given the differences in the methodology of the two studies, we believe that this represents a high degree of replication of signals for recurrent positive selection, which has been difficult to obtain in previous work.

Our analysis is the first to scan for recurrent positive selection over the entire genomes of multiple mammalian clades. We used this new set of data to ask how often we detected recurrent positive selection in the same gene in multiple clades. In our pairwise comparison between clades, we detected far more coincident recurrent positive selection than expected by chance.
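The pairwise comparison of observed and expected coincidence can be illustrated with a minimal sketch. It assumes that hits in the two clades are independent under the null, so the expected number of shared hits is the product of the two per-clade hit rates times the number of genes tested in both clades. The function name and the input sets are hypothetical and are not taken from the Corsair package.

```python
from typing import Set, Tuple


def pairwise_coincidence(
    tested_a: Set[str],
    hits_a: Set[str],
    tested_b: Set[str],
    hits_b: Set[str],
) -> Tuple[int, float]:
    """Return (observed, expected) counts of genes that are hits in both clades.

    Assumption (not stated in the source): under the null, hits in the two
    clades are independent, so expected = n * (hits_a / n) * (hits_b / n),
    where n is the number of genes tested in both clades.
    """
    shared = tested_a & tested_b  # only genes analyzed in both clades count
    if not shared:
        return 0, 0.0
    a = hits_a & shared
    b = hits_b & shared
    observed = len(a & b)
    expected = len(a) * len(b) / len(shared)
    return observed, expected
```

An observed-to-expected ratio above 1 corresponds to a point above the neutral line in a plot such as Figure 6.4B.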
Recent work has implicated immune genes as the main targets of positive selection in multiple lineages,16,32 raising the possibility that the signal we detect is largely driven by immune genes. When we asked how immune genes contributed to this trend, we found that immune genes have a higher rate of coincidence of recurrent positive selection than the set of all other genes, though they did not explain the entire signal. Combined with previous work, our data suggest that while immune genes as a class are more frequently the target of recurrent positive selection, specific immune genes are not the only genes to experience recurrent positive selection in multiple clades more often than expected.

It is difficult to say whether the few examples of recurrent positive selection that we find in three clades and four clades constitute a greater rate than expected, because the expected number of genes recovered from these comparisons is too low to be biologically meaningful. However, our data are appropriate for analyzing specific molecular interfaces that might be under recurrent positive selection in multiple clades. Recurrent evolution on three-dimensional protein interfaces is a sign of molecular arms races, thought to be waged over interactions at that interface. Identifying these interfaces has been a fruitful approach in understanding the otherwise very complex interactions between host proteins and pathogen-derived factors that might try to interfere with their functions, often identifying specific amino acids or binding pockets that are crucial for the interaction. In our data, we identify PIGR as having a specific molecular face that has been the target of recurrent positive selection in multiple clades. PIGR contains five extracellular immunoglobulin domains, D1-D5,25 and most of the sites under recurrent selection in one clade and in multiple clades fall in the D2 domain.
Though PIGR was identified more than two decades ago and has been the subject of thorough study, there is no immediate hypothesis that can explain this signature of recurrent positive selection. PIGR is primarily responsible for transporting dimeric IgA and pentameric IgM from the basal to the apical surface of mucosal epithelial cells.23 PIGR detaches from the apical surface and is secreted while bound to IgA and IgM, where it protects secreted IgA and IgM from proteolytic degradation.27 PIGR has some native antibacterial activity,27,38 but this function does not rely on D2. Our results suggest that the D2 domain of PIGR has a yet unknown function that is conserved in its action across mammals but is also a frequent target of selection. Given its prevalence, further investigation of PIGR’s D2 domain may provide insight into a broadly shared route of attack by pathogens.

In contrast to our finding that recurrent positive selection is more common in specific genes than expected, it is worth mentioning that by volume most of the recurrent positive selection that we detect is still clade-specific. Many of these hits represent novel examples of recurrent positive selection, which may prove interesting to study for reasons specific to the biology of those clades.

Understanding the patterns of recurrent positive selection in the genome has been a useful pursuit both for understanding patterns of evolution and for studying health-relevant host-pathogen molecular arms races. Our results add to this growing body of work by contributing data from multiple clades of mammals and point to examples where rapid evolution is the norm rather than the exception. There is a wide breadth of evolutionary history to study, and our results will provide context for future studies looking to analyze ecologically or medically significant instances of recurrent positive selection.
Materials and Methods

Data acquisition and distribution

Genome sequences for all species were obtained from the National Center for Biotechnology Information (NCBI).11,39–67 The coding DNA sequences for all genes in each reference genome were obtained from Ensembl (https://ensembl.org/index.html). To facilitate use for future projects, we have assembled all code and dependencies into a Python package (https://github.com/jcooper036/corsair).

Sequence curation

To identify gene sequences throughout each clade, we developed a search method that utilizes the CDS sequence of one well-annotated species in that clade. This method is a slightly modified, whole-genome-scale version of a previous method we used to study positive selection for a much smaller group of genes.35 We started with a list of all protein-coding CDS sequences for our reference species. For each sequence, we searched for its homolog in each species by first using tBLASTn 68 to identify the genomic scaffold where the sequence resided, followed by exonerate 69 to generate a CDS model. We then removed any sequences that did not represent at least 95% of the total length of the gene. As stop codons can represent pseudogenization, we removed sequences that contained stop codons more than 5% of the distance from the end of the gene (if there was a stop codon within the last 5% of the gene, we clipped away everything that came after it). We aligned the protein translations of these sequences with Clustal Omega,70 T-coffee,71 and Muscle.72 Because PAML can be easily confounded by misalignment caused by insertions and deletions, we compared all three alignments and kept only positions that were agreed upon by all three aligners, while additionally trimming one codon on either side of an insertion or deletion.
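The three-aligner consensus step described above can be sketched as follows. This is a minimal illustration, not the Corsair implementation: it assumes that each alignment is a list of equal-length gapped protein strings in the same sequence order, that two aligners "agree" on a column when they place the same residue of every sequence in it, and that indel columns themselves are dropped along with one flanking column on each side. The function names are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Column = Tuple[Optional[int], ...]


def column_signatures(alignment: List[str]) -> List[Column]:
    """For each column, record which residue index of each sequence it holds.

    A gap ("-") is recorded as None. Two aligners agree on a column when they
    produce the same signature for it.
    """
    counters = [0] * len(alignment)
    signatures: List[Column] = []
    for col in range(len(alignment[0])):
        sig: List[Optional[int]] = []
        for row, seq in enumerate(alignment):
            if seq[col] == "-":
                sig.append(None)
            else:
                sig.append(counters[row])
                counters[row] += 1
        signatures.append(tuple(sig))
    return signatures


def consensus_columns(alignments: List[List[str]], flank: int = 1) -> List[str]:
    """Keep columns of the first alignment that every aligner reproduces.

    Columns containing a gap, and `flank` columns on either side of a gap,
    are also removed (one protein column corresponds to one codon).
    """
    reference = column_signatures(alignments[0])
    shared: Set[Column] = set(reference)
    for other in alignments[1:]:
        shared &= set(column_signatures(other))
    gapped = [None in sig for sig in reference]
    keep = []
    for i, sig in enumerate(reference):
        lo, hi = max(0, i - flank), min(len(reference), i + flank + 1)
        if sig in shared and not any(gapped[lo:hi]):
            keep.append(i)
    return ["".join(seq[i] for i in keep) for seq in alignments[0]]
```

Filtering per column rather than per gene means that a gene with one poorly aligned region can still contribute its well-aligned regions to the downstream test.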
We then determined the phylogenetic relationship for the remaining species using the Phylo module of Biopython.73

Identification of positively selected sites

To test for recurrent positive selection in a given gene, we used Phylogenetic Analysis by Maximum Likelihood (PAML),19 specifically testing between two different models of codon evolution: M7, which fits a beta distribution to the frequency of dN/dS by site but limits the maximum of the distribution to dN/dS = 1, and M8, which is a similar model except that there is no maximum on the distribution. A better fit to M8 than M7, as determined by a likelihood ratio test with a p-value < 0.05, indicates that the evolution of that gene for those sequences is best explained by a model of recurrent positive selection.19 If we found a significant difference between M7 and M8, then we tested for a difference between M8 and M8a, which approximates neutral evolution. We applied a Bonferroni correction to the M8-M8a p-value based on the number of genes that were run in the analysis. If the analysis rejected both the M7 and the M8a null hypotheses, we then turned to a Bayes Empirical Bayes (BEB) analysis to identify amino acid positions that have statistical support for recurrent selection.74 We considered a site to have strong evidence for recurrent selection if the BEB posterior probability was greater than 0.95. Finally, we checked that sites were not called as a result of poor alignment by checking for consistent alignment in the region surrounding each positively selected site.

Whole genome scaling

Our analysis is designed to run on a gene-by-gene basis. To run the analysis for each gene in a reference genome, we broke the process into minimal computing elements and distributed the tasks over an array of computational resources using Amazon Web Services.
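The nested model tests described under "Identification of positively selected sites" can be sketched with the standard library alone. The sketch assumes that the log-likelihoods for M7, M8, and M8a have already been obtained from PAML, and it uses the conventional degrees of freedom for these comparisons (2 for M7 vs. M8, 1 for M8a vs. M8); the source does not state which degrees of freedom were used, and the function names are hypothetical.

```python
import math


def lrt_pvalue(lnl_null: float, lnl_alt: float, df: int) -> float:
    """Chi-square p-value for a likelihood ratio test between nested models."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    if df == 1:
        # Survival function of chi-square with 1 degree of freedom.
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        # Survival function of chi-square with 2 degrees of freedom.
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


def passes_selection_tests(
    lnl_m7: float,
    lnl_m8: float,
    lnl_m8a: float,
    n_genes: int,
    alpha: float = 0.05,
) -> bool:
    """Apply M7 vs. M8, then M8a vs. M8 with a Bonferroni correction."""
    # Step 1: M8 must fit better than M7 at the uncorrected threshold.
    if lrt_pvalue(lnl_m7, lnl_m8, df=2) >= alpha:
        return False
    # Step 2: M8 must also beat M8a after correcting for the number of genes run.
    p_m8a = lrt_pvalue(lnl_m8a, lnl_m8, df=1)
    return min(1.0, p_m8a * n_genes) < alpha
```

Genes that pass both steps would then go on to the BEB analysis, keeping sites with posterior probability above 0.95.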
Coincident recurrent selection analysis

To search for evidence of coincident recurrent selection, we limited our analysis to genes that had a PAML result in all six of the clades we analyzed. We considered a gene to have strong evidence of recurrent selection only if there was at least one site with strong evidence for recurrent selection in that gene.

References

1. Sawyer, S. L., Wu, L. I., Emerman, M. & Malik, H. S. Positive Selection of Primate TRIM5α Identifies a Critical Species-Specific Retroviral Restriction Domain. Proc. Natl. Acad. Sci. U. S. A. 102, 2832–2837 (2005).
2. Zhang, J. et al. Species-Specific Deamidation of cGAS by Herpes Simplex Virus UL37 Protein Facilitates Viral Replication. Cell Host Microbe 24, 234–248.e5 (2018).
3. Barber, M. F. & Elde, N. C. Escape from Bacterial Iron Piracy Through Rapid Evolution of Transferrin. Science 346, 1362–1366 (2014).
4. Elde, N. C., Child, S. J., Geballe, A. P. & Malik, H. S. Protein Kinase R Reveals an Evolutionary Model for Defeating Viral Mimicry. Nature 457, 485–489 (2009).
5. Hancks, D. C., Hartley, M. K., Hagan, C., Clark, N. L. & Elde, N. C. Overlapping Patterns of Rapid Evolution in the Nucleic Acid Sensors cGAS and OAS1 Suggest a Common Mechanism of Pathogen Antagonism and Escape. PLOS Genet. 11, e1005203 (2015).
6. Daugherty, M. D., Young, J. M., Kerns, J. A. & Malik, H. S. Rapid Evolution of PARP Genes Suggests a Broad Role for ADP-Ribosylation in Host-Virus Conflicts. PLOS Genet. 10, e1004403 (2014).
7. Mitchell, P. S. et al. Evolution-Guided Identification of Antiviral Specificity Determinants in the Broadly Acting Interferon-Induced Innate Immunity Factor MxA. Cell Host Microbe 12, 598–604 (2012).
8. Van Valen, L. A New Evolutionary Law. Evol. Theory 1, 1–30 (1973).
9. Swanson, W. J. & Vacquier, V. D. The Rapid Evolution of Reproductive Proteins. Nat. Rev. Genet. 3, 137–144 (2002).
10. Malik, H. S. & Henikoff, S. Adaptive Evolution of Cid, a Centromere-Specific Histone in Drosophila. Genetics 157, 1293–1298 (2001).
11. The Chimpanzee Sequencing and Analysis Consortium, Waterston, R. H., Lander, E. S. & Wilson, R. K. Initial Sequence of the Chimpanzee Genome and Comparison with the Human Genome. Nature 437, 69–87 (2005).
12. Gibbs, R. A. et al. Evolutionary and Biomedical Insights from the Rhesus Macaque Genome. Science 316, 222–234 (2007).
13. George, R. D. et al. Trans Genomic Capture and Sequencing of Primate Exomes Reveals New Targets of Positive Selection. Genome Res. 21, 1686–1694 (2011).
14. van der Lee, R., Wiel, L., van Dam, T. J. P. & Huynen, M. A. Genome-Scale Detection of Positive Selection in Nine Primates Predicts Human-Virus Evolutionary Conflicts. Nucleic Acids Res. (2017). doi:10.1093/nar/gkx704
15. Kosiol, C. et al. Patterns of Positive Selection in Six Mammalian Genomes. PLOS Genet. 4, e1000144 (2008).
16. Enard, D., Cai, L., Gwennap, C. & Petrov, D. A. Viruses are a Dominant Driver of Protein Adaptation in Mammals. eLife 5, e12469 (2016).
17. McBee, R. M., Rozmiarek, S. A., Meyerson, N. R., Rowley, P. A. & Sawyer, S. L. The Effect of Species Representation on the Detection of Positive Selection in Primate Gene Data Sets. Mol. Biol. Evol. 32, 1091–1096 (2015).
18. Ruffier, M. et al. Ensembl Core Software Resources: Storage and Programmatic Access for DNA Sequence and Genome Annotation. Database J. Biol. Databases Curation 2017, (2017).
19. Yang, Z. PAML 4: Phylogenetic Analysis by Maximum Likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).
20. Zhao, S. et al. Identifying Lineage-specific Targets of Darwinian Selection by a Bayesian Analysis of Genomic Polymorphisms and Divergence from Multiple Species. bioRxiv 367482 (2018). doi:10.1101/367482
21. Lemos de Matos, A., Liu, J., McFadden, G. & Esteves, P. J. Evolution and Divergence of the Mammalian SAMD9/SAMD9L Gene Family. BMC Evol. Biol. 13, 121 (2013).
22. Geng, Y. et al. A Genetic Variant of the Sperm-Specific SLO3 K+ Channel has Altered pH and Ca2+ Sensitivities. J. Biol. Chem. 292, 8978–8987 (2017).
23. Song, W., Bomsel, M., Casanova, J., Vaerman, J. P. & Mostov, K. Stimulation of Transcytosis of the Polymeric Immunoglobulin Receptor by Dimeric IgA. Proc. Natl. Acad. Sci. U. S. A. 91, 163–166 (1994).
24. Stadtmueller, B. M. et al. The Structure and Dynamics of Secretory Component and its Interactions with Polymeric Immunoglobulins. eLife 5, e10640 (2016).
25. Mostov, K. E., Friedlander, M. & Blobel, G. The Receptor for Transepithelial Transport of IgA and IgM Contains Multiple Immunoglobulin-Like Domains. Nature 308, 37–43 (1984).
26. Norderhaug, I. N., Johansen, F.-E., Krajči, P. & Brandtzaeg, P. Domain Deletions in the Human Polymeric Ig Receptor Disclose Differences Between its Dimeric IgA and Pentameric IgM Interaction. Eur. J. Immunol. 29, 3401–3409 (1999).
27. Kaetzel, C. S. The Polymeric Immunoglobulin Receptor: Bridging Innate and Adaptive Immune Responses at Mucosal Surfaces. Immunol. Rev. 206, 83–99 (2005).
28. Schneider, A. et al. Estimates of Positive Darwinian Selection Are Inflated by Errors in Sequencing, Annotation, and Alignment. Genome Biol. Evol. 1, 114–118 (2009).
29. Markova-Raina, P. & Petrov, D. High Sensitivity to Aligner and High Rate of False Positives in the Estimates of Positive Selection in the 12 Drosophila Genomes. Genome Res. 21, 863–874 (2011).
30. Hemmer, L. W. & Blumenstiel, J. P. Holding it Together: Rapid Evolution and Positive Selection in the Synaptonemal Complex of Drosophila. BMC Evol. Biol. 16, 91 (2016).
31. Bakewell, M. A., Shi, P. & Zhang, J. More Genes Underwent Positive Selection in Chimpanzee Evolution than in Human Evolution. Proc. Natl. Acad. Sci. 104, 7489–7494 (2007).
32. Shultz, A. J. & Sackton, T. Immune Genes are Hotspots of Shared Positive Selection Across Birds and Mammals. eLife 8, e41815 (2019).
33. Kumar, S., Mitnik, C., Valente, G. & Floyd-Smith, G. Expansion and Molecular Evolution of the Interferon-Induced 2′–5′ Oligoadenylate Synthetase Gene Family. Mol. Biol. Evol. 17, 738–750 (2000).
34. Grayson, P. & Civetta, A. Positive Selection and the Evolution of izumo Genes in Mammals. Int. J. Evol. Biol. 2012, (2012).
35. Cooper, J. C. & Phadnis, N. Parallel Evolution of Sperm Hyper-Activation Ca2+ Channels. Genome Biol. Evol. 9, 1938–1949 (2017).
36. Jin, W., Wu, D.-D., Zhang, X., Irwin, D. M. & Zhang, Y.-P. Positive Selection on the Gene RNASEL: Correlation between Patterns of Evolution and Function. Mol. Biol. Evol. 29, 3161–3168 (2012).
37. Arnoult, C., Zeng, Y. & Florman, H. M. ZP3-Dependent Activation of Sperm Cation Channels Regulates Acrosomal Secretion During Mammalian Fertilization. J. Cell Biol. 134, 637–645 (1996).
38. Mathias, A. & Corthésy, B. N-Glycans on Secretory Component. Gut Microbes 2, 287–293 (2011).
39. Zhang, G. et al. Comparative Analysis of Bat Genomes Provides Insight into the Evolution of Flight and Immunity. Science 339, 456–460 (2013).
40. Lindblad-Toh, K. et al. A High-Resolution Map of Human Evolutionary Constraint using 29 Mammals. Nature 478, 476–482 (2011).
41. Parker, J. et al. Genome-Wide Signatures of Convergent Evolution in Echolocating Mammals. Nature 502, 228–231 (2013).
42. Seim, I. et al. Genome Analysis Reveals Insights into Physiology and Longevity of the Brandt’s Bat Myotis brandtii. Nat. Commun. 4, 2212 (2013).
43. Botero-Castro, F. et al. Next-Generation Sequencing and Phylogenetic Signal of Complete Mitochondrial Genomes for Resolving the Evolutionary History of Leaf-Nosed Bats (Phyllostomidae). Mol. Phylogenet. Evol. 69, 728–739 (2013).
44. Dong, D. et al. The Genomes of Two Bat Species with Long Constant Frequency Echolocation Calls. Mol. Biol. Evol. 34, 20–34 (2017).
45. Archibald, A. L. et al. The Sheep Genome Reference Sequence: A Work in Progress. Anim. Genet. 41, 449–453 (2010).
46. Dong, Y. et al. Sequencing and Automated Whole-Genome Optical Mapping of the Genome of a Domestic Goat (Capra hircus). Nat. Biotechnol. 31, 135–141 (2013).
47. Qiu, Q. et al. The Yak Genome and Adaptation to Life at High Altitude. Nat. Genet. 44, 946–949 (2012).
48. Ge, R.-L. et al. Draft Genome Sequence of the Tibetan Antelope. Nat. Commun. 4, 1858 (2013).
49. Canavez, F. C. et al. Genome Sequence and Assembly of Bos indicus. J. Hered. 103, 342–348 (2012).
50. Zimin, A. V. et al. A Whole-Genome Assembly of the Domestic Cow, Bos taurus. Genome Biol. 10, R42 (2009).
51. Hu, Y. et al. Comparative Genomics Reveals Convergent Evolution Between the Bamboo-Eating Giant and Red Pandas. Proc. Natl. Acad. Sci. 114, 1081–1086 (2017).
52. Jones, S. J. et al. The Genome of the Northern Sea Otter (Enhydra lutris kenyoni). Genes 8, 379 (2017).
53. Liu, S. et al. Population Genomics Reveal Recent Speciation and Rapid Evolutionary Adaptation in Polar Bears. Cell 157, 785–794 (2014).
54. Foote, A. D. et al. Convergent Evolution of the Genomes of Marine Mammals. Nat. Genet. 47, 272–275 (2015).
55. Peng, X. et al. The Draft Genome Sequence of the Ferret (Mustela putorius furo) Facilitates Study of Human Respiratory Disease. Nat. Biotechnol. 32, 1250–1255 (2014).
56. Kirkness, E. F. et al. The Dog Genome: Survey Sequencing and Comparative Analysis. Science 301, 1898–1903 (2003).
57. Church, D. M. et al. Lineage-Specific Biology Revealed by a Finished Genome Assembly of the Mouse. PLOS Biol. 7, e1000112 (2009).
58. Rat Genome Sequencing Project Consortium. Genome Sequence of the Brown Norway Rat Yields Insights into Mammalian Evolution. Nature 428, 493–521 (2004).
59. Prüfer, K. et al. The Bonobo Genome Compared with the Chimpanzee and Human Genomes. Nature 486, 527–531 (2012).
60. Scally, A. et al. Insights into Hominid Evolution from the Gorilla Genome Sequence. Nature 483, 169–175 (2012).
61. Carbone, L. et al. Gibbon Genome and the Fast Karyotype Evolution of Small Apes. Nature 513, 195–201 (2014).
62. Locke, D. P. et al. Comparative and Demographic Analysis of Orang-utan Genomes. Nature 469, 529–533 (2011).
63. The Marmoset Genome Sequencing and Analysis Consortium et al. The Common Marmoset Genome Provides Insight into Primate Biology and Evolution. Nat. Genet. 46, 850–857 (2014).
64. Ebeling, M. et al. Genome-Based Analysis of the Nonhuman Primate Macaca fascicularis as a Model for Drug Safety Assessment. Genome Res. 21, 1746–1756 (2011).
65. Zimin, A. V. et al. A New Rhesus Macaque Assembly and Annotation for Next-Generation Sequencing Analyses. Biol. Direct 9, 20 (2014).
66. Palesch, D. et al. Sooty Mangabey Genome Sequence Provides Insight into AIDS Resistance in a Natural SIV Host. Nature 553, 77–81 (2018).
67. O’Leary, C. E. et al. Identification of Novel MHC Class I Sequences in Pig-Tailed Macaques by Amplicon Pyrosequencing and Full-Length cDNA Cloning and Sequencing. Immunogenetics 61, 689 (2009).
68. Madden, T. The BLAST Sequence Analysis Tool. (National Center for Biotechnology Information (US), 2003).
69. Slater, G. S. C. & Birney, E. Automated Generation of Heuristics for Biological Sequence Comparison. BMC Bioinformatics 6, 31 (2005).
70. Sievers, F. et al. Fast, Scalable Generation of High-Quality Protein Multiple Sequence Alignments using Clustal Omega. Mol. Syst. Biol. 7, 539 (2011).
71. Notredame, C., Higgins, D. G. & Heringa, J. T-coffee: A Novel Method for Fast and Accurate Multiple Sequence Alignment. J. Mol. Biol. 302, 205–217 (2000).
72. Edgar, R. C. MUSCLE: A Multiple Sequence Alignment Method with Reduced Time and Space Complexity. BMC Bioinformatics 5, 113 (2004).
73. Cock, P. J. A. et al. Biopython: Freely Available Python Tools for Computational Molecular Biology and Bioinformatics. Bioinformatics 25, 1422–1423 (2009).
74. Yang, Z., Wong, W. S. W. & Nielsen, R. Bayes Empirical Bayes Inference of Amino Acid Sites Under Positive Selection. Mol. Biol. Evol. 22, 1107–1118 (2005). |
| Reference URL | https://collections.lib.utah.edu/ark:/87278/s60062hb |



