| Title | The Louse Fly-Arsenophonus Arthropodicus association: development of a new model system for the study of insect-bacterial endosymbioses |
| Publication Type | dissertation |
| School or College | College of Science |
| Department | Biological Sciences |
| Author | Smith, Kari Lyn |
| Date | 2012-08 |
| Description | There are many bacteria that associate with insects in a mutualistic manner and offer their hosts distinct fitness advantages, and thus have likely played an important role in shaping the ecology and evolution of insects. Therefore, there is much interest in understanding how these relationships are initiated and maintained and the molecular mechanisms involved in this process, as well as interest in developing symbionts as platforms for paratransgenesis to combat disease transmission by insect hosts. However, this research has been hampered by having only a limited number of systems to work with, due to the difficulties in isolating and modifying bacterial symbionts in the lab. In this dissertation, I present my work in developing a recently described insect-bacterial symbiosis, that of the louse fly, Pseudolynchia canariensis, and its bacterial symbiont, Candidatus Arsenophonus arthropodicus, into a new model system with which to investigate the mechanisms and evolution of symbiosis. This included generating and analyzing the complete genome sequence of Ca. A. arthropodicus, which provided some evidence that Ca. A. arthropodicus has become recently associated with insects and may have evolved from an ancestor that was an insect pathogen. Additionally, I describe the development of methods for genetic modification of this bacterial symbiont and for introducing recombinant symbionts into louse fly hosts, as well as a new microinjection technique that enables the complete replacement of native symbionts with recombinant symbionts. With the generation of the symbiont genome sequence along with strategies for engineering recombinant symbionts and establishing them in an insect host, this work provides an interesting new system with which to investigate the function of specific genes in symbiosis as well as a promising new avenue of research involving paratransgenesis. |
| Type | Text |
| Publisher | University of Utah |
| Subject | Arsenophonus arthropodicus; Endosymbiosis; Evolution; Genomics; Pseudolynchia canariensis; Symbiont transmission |
| Dissertation Institution | University of Utah |
| Dissertation Name | Doctor of Philosophy |
| Language | eng |
| Rights Management | Copyright © Kari Lyn Smith 2012 |
| Format | application/pdf |
| Format Medium | application/pdf |
| Format Extent | 3,525,482 bytes |
| Identifier | etd3/id/3457 |
| ARK | ark:/87278/s6r248pt |
| DOI | https://doi.org/doi:10.26053/0H-9R1Q-JVG0 |
| Setname | ir_etd |
| ID | 197011 |
| OCR Text | THE LOUSE FLY-ARSENOPHONUS ARTHROPODICUS ASSOCIATION: DEVELOPMENT OF A NEW MODEL SYSTEM FOR THE STUDY OF INSECT-BACTERIAL ENDOSYMBIOSES by Kari Lyn Smith A dissertation submitted to the faculty of The University of Utah in partial fulfillment of the requirements for the degree of Doctor of Philosophy Department of Biology The University of Utah August 2012 Copyright © Kari Lyn Smith 2012 All Rights Reserved The University of Utah Graduate School STATEMENT OF DISSERTATION APPROVAL The dissertation of Kari Lyn Smith has been approved by the following supervisory committee members: Colin Dale, Chair, Date Approved June 18, 2012; Dale Clayton, Member, Date Approved June 18, 2012; Maria-Denise Dearing, Member, Date Approved June 18, 2012; Jon Seger, Member, Date Approved June 18, 2012; Robert Weiss, Member, Date Approved June 18, 2012; and by Neil Vickers, Chair of the Department of Biology, and by Charles A. Wight, Dean of The Graduate School. ABSTRACT There are many bacteria that associate with insects in a mutualistic manner and offer their hosts distinct fitness advantages, and thus have likely played an important role in shaping the ecology and evolution of insects. Therefore, there is much interest in understanding how these relationships are initiated and maintained and the molecular mechanisms involved in this process, as well as interest in developing symbionts as platforms for paratransgenesis to combat disease transmission by insect hosts. However, this research has been hampered by having only a limited number of systems to work with, due to the difficulties in isolating and modifying bacterial symbionts in the lab. In this dissertation, I present my work in developing a recently described insect-bacterial symbiosis, that of the louse fly, Pseudolynchia canariensis, and its bacterial symbiont, Candidatus Arsenophonus arthropodicus, into a new model system with which to investigate the mechanisms and evolution of symbiosis. 
This included generating and analyzing the complete genome sequence of Ca. A. arthropodicus, which provided some evidence that Ca. A. arthropodicus has become recently associated with insects and may have evolved from an ancestor that was an insect pathogen. Additionally, I describe the development of methods for genetic modification of this bacterial symbiont and for introducing recombinant symbionts into louse fly hosts, as well as a new microinjection technique that enables the complete replacement of native symbionts with recombinant symbionts. With the generation of the symbiont genome sequence along with strategies for engineering recombinant symbionts and establishing them in an insect host, this work provides an interesting new system with which to investigate the function of specific genes in symbiosis as well as a promising new avenue of research involving paratransgenesis.
TABLE OF CONTENTS
ABSTRACT; LIST OF FIGURES; LIST OF TABLES
Chapter 1 INTRODUCTION: References
Chapter 2 CHARACTERISTICS OF THE COMPLETE GENOME SEQUENCE OF CANDIDATUS ARSENOPHONUS ARTHROPODICUS: Abstract; Introduction; Materials and Methods; Results and Discussion; Conclusion; Acknowledgments; References
Chapter 3 COMPARATIVE ANALYSIS OF THE LOUSE FLY SYMBIONT, CANDIDATUS ARSENOPHONUS ARTHROPODICUS, AND THE PARASITOID WASP SYMBIONT, ARSENOPHONUS NASONIAE: Abstract; Introduction; Materials and Methods; Results and Discussion; Conclusion; Acknowledgments; References
Chapter 4 REPLACEMENT OF NATIVE SYMBIONTS IN THE HIPPOBOSCID LOUSE FLY, PSEUDOLYNCHIA CANARIENSIS: Abstract; Introduction; Materials and Methods; Results; Discussion; Acknowledgments; References
Chapter 5 CONCLUSION: References
Appendix A ATTENUATION OF THE SENSING CAPABILITIES OF PHOQ IN TRANSITION TO OBLIGATE INSECT-BACTERIAL ASSOCIATION
Appendix B QUORUM SENSING PRIMES THE OXIDATIVE STRESS RESPONSE IN THE INSECT ENDOSYMBIONT, SODALIS GLOSSINIDIUS
Appendix C PHYLOGENETIC ANALYSIS OF SYMBIONTS IN FEATHER FEEDING LICE OF THE GENUS COLUMBICOLA: EVIDENCE FOR REPEATED SYMBIONT REPLACEMENTS
LIST OF FIGURES
2.1 Phylogenetic position of Ca. A. arthropodicus based on concatenated sequences of seven conserved orthologous genes
2.2 Plasmids of Ca. A. arthropodicus
2.3 Genome features of Ca. A. arthropodicus
2.4 COG category classification of genes
2.5 Ribosomal RNA operon organization and heterogeneity
2.6 Gene content and organization of type III secretion system islands
3.1 BLASTCLUST analysis of intact Ca. A. arthropodicus CDSs sharing amino acid sequence identity with CDSs in A. nasoniae
3.2 Plot of genome synteny between A. nasoniae and Ca. A. arthropodicus
3.3 GC skew and positions of CDSs and pseudogenes in A. nasoniae and Ca. A. arthropodicus chromosomes
3.4 Average sizes of pseudogenes and intact orthologs for A. nasoniae and Ca. A. arthropodicus
3.5 Monte Carlo simulation of gene inactivation in a subset of Ca. A. arthropodicus orthologs
3.6 The type III secretion system islands in A. nasoniae and Ca. A. arthropodicus
3.7 Schematic illustrating the ymt gene in Ca. A. arthropodicus and its corresponding location in A. nasoniae
4.1 Locations of mutations in pseudogene ψadhE of Ca. A. arthropodicus and the resulting ψΔadhE strain after lambda-Red mediated homologous recombination
4.2 Growth curves of ψΔadhE and WT strains and competitive growth assays
4.4 Number of Ca. A. arthropodicus symbiont genomes throughout host development
4.3 Number of Ca. A. arthropodicus symbiont genomes in microinjected adult flies and F1 offspring
4.5 Number of Ca. A. arthropodicus symbiont genomes in two sets of microinjected puparia
4.6 Number of Ca. A. arthropodicus symbiont genomes in F1 and F2 offspring of microinjected puparia
A.1 Resistance to polymyxin B and cecropin A is PhoP-dependent
A.2 Quantitative PCR analysis of transcripts derived from genes involved in lipid A modifications in S. glossinidius
A.3 Thin layer chromatographic analysis of lipids extracted from wild type (wt) and phoP mutant strains of S. glossinidius grown at high (10 mM) and low (10 µM) concentrations of magnesium
A.4 The putative promoter regions of the S. glossinidius hilA homologue and the mgtCB pseudo-operon contain canonical PhoP boxes
A.5 Response of S. glossinidius to antimicrobial peptides, acidic pH, or magnesium
A.6 Salmonella enterica strains expressing S. glossinidius PhoQ do not respond to magnesium
B.1 Characterization of S. glossinidius AHL
B.2 Interactions of S. glossinidius SogR-OHHL complexes with sogI and carA promoters
B.3 COG-based analysis of microarray expression data
B.4 Influence of OHHL on iron siderophore production in S. glossinidius
B.5 Degeneration of a carbapenem biosynthesis gene cluster in S. glossinidius and SOPE
B.6 Common ancestry of S. glossinidius and SOPE quorum sensing regulatory genes
C.1 Phylogeny of Columbicola spp. symbionts (bold) and related bacteria based on maximum likelihood and Bayesian analyses of a 1.46-kbp fragment of 16S rRNA gene sequences
C.2 Phylogeny of Columbicola spp. symbionts derived from maximum likelihood and Bayesian analyses of a combined data set consisting of 16S rRNA, fusA and groEL gene sequences
C.3 Phylogeny of Columbicola spp. symbionts (bold) and related bacteria based on maximum likelihood and Bayesian analyses of a 1.46-kbp fragment of the 16S rRNA gene sequence
C.4 Homology model depicting the C. veigasimoni symbiont 16S rRNA sequence mapped onto the predicted Y. pestis 16S rRNA structure
C.5 Homology model depicting the C. paradoxus symbiont 16S rRNA sequence mapped onto the predicted Y. pestis 16S rRNA structure
C.6 Homology model depicting the C. columbae symbiont 16S rRNA sequence mapped onto the predicted Y. pestis 16S rRNA structure
C.7 Fluorescent in situ hybridization of the symbiont in C. baculoides
C.8 Comparison of the phylogenies of representative species of Columbicola spp. and their symbiotic bacteria
C.9 Hypothetical host (grey lines) and symbiont (black lines) phylogenies generated under the symbiont replacement model
LIST OF TABLES
2.1 Statistics of de novo hybrid assembly of Illumina and Sanger reads
2.2 Genome features of bacteria with different lifestyles
2.3 Flagellar components in Ca. A. arthropodicus
2.4 Homologs of TccC1 and TccC2 of Ca. A. arthropodicus identified using BLAST
3.1 Features of the Ca. A. arthropodicus and A. nasoniae genome sequences
3.2 Unique CDSs in the Ca. A. arthropodicus genome sequence
3.3 Presence of two-component regulatory systems (TCSs) in bacterial species based on KEGG pathway analysis
4.1 Sequences of primers used for lambda-Red recombineering, qPCR and PCR
4.2 Number of colonies from F2 pupal extracts demonstrating kanamycin resistance
A.1 PCR detection of Sodalis glossinidius seven days following microinjection in tsetse and louse flies
A.2 Distribution of phoP-phoQ, the magnesium transporters mgtA and mgtB, and lipid A modification genes among the insect pathogen Photorhabdus luminescens and recently derived and ancient insect symbionts
B.1 dN:dS ratios computed from pairwise comparisons of genes involved in quorum sensing in S. glossinidius, SOPE and related free-living bacteria
C.1 Specimens of Columbicola used in this study
C.2 Other 16S rDNA sequences used in this study
C.3 Relative-rate tests comparing molecular evolutionary rates of 16S rRNA gene sequences between different lineages of the symbionts of Columbicola spp. and free-living relatives
CHAPTER 1 INTRODUCTION
Bacteria have long been known to interact with a wide range of eukaryotic hosts, though historically much study in this area has focused on disease-causing organisms. However, there are many bacteria that associate with eukaryotic organisms in a mutualistic manner and offer their hosts distinct fitness advantages.
A wide variety of insects are known to harbor mutualistic bacterial symbionts (Buchner, 1965), which enhance host fitness in exchange for a stable, protected environment. These insect-bacterial symbioses may have a very ancient origin, involving "primary" symbionts that have become highly specialized to the insect host environment and are often housed in specific host-derived cells termed bacteriocytes, which together make up an organ called the bacteriome (Dale and Moran, 2006; Douglas, 2011). Primary symbionts are often obligately required by the insect host and supplement the host diet with various nutrients, allowing it to subsist on a food source that may not be nutritionally complete (Dale and Moran, 2006). For example, the ancient symbiont Buchnera aphidicola resides within aphid hosts and supplements the aphid's diet of plant sap with essential amino acids that the sap lacks (Douglas, 1998). Wigglesworthia glossinidia is another ancient primary symbiont, associated with tsetse flies, that plays a role in supplementing the tsetse's vertebrate blood diet with B vitamins and aids in host digestion of the blood meal (Akman et al., 2002; Pais et al., 2008). Without their primary symbionts, insect hosts generally suffer fitness costs such as reduced fecundity or sterility and decreased life span (Nogge, 1976, 1981; Douglas, 1998; Heddi et al., 1999; Pais et al., 2008). Mutualistic insect symbionts are typically transmitted vertically from mother to offspring, often transovarially or through ingestion of symbionts by offspring during a specific life stage (Bright and Bulgheresi, 2010). For example, B. aphidicola is transmitted via direct entry of symbionts into developing eggs or embryos (Wilkinson et al., 2003), while W. glossinidia is introduced into tsetse fly larvae as they feed in utero on maternal milk gland secretions (Attardo et al., 2008). 
Due to this strict vertical transmission and their long evolutionary history with their insect hosts, primary symbionts often show a pattern of cospeciation with their hosts, evidenced by congruent branching patterns of symbiont and host phylogenies (Funk et al., 2000; Hosokawa et al., 2006). In addition to ancient symbionts, insects may also harbor one to a few bacterial symbionts with which they share a recent origin of association, often referred to as "secondary" or "facultative" symbionts. These symbionts may reside in bacteriocytes, although they are often not restricted to these cells and may be found in a variety of host tissues, including the fat body, hemolymph and reproductive tissues (Cheng and Aksoy, 1999; Dale et al., 2006; Oliver et al., 2010). Recently derived symbionts may benefit their hosts through diet supplementation or through functions unrelated to host nutrition. For example, the secondary symbiont of tsetse flies, Sodalis glossinidius, retains genes for the production of B vitamins and may play a role in supplementing its host's blood diet (Nogge, 1981; Toh et al., 2006), and recent symbionts of aphids may be involved in host plant specialization (Tsuchida et al., 2004; Oliver et al., 2010). Secondary symbionts have also been implicated in a range of other functions, such as increasing host resistance to parasites or environmental stresses. In aphids, for example, the symbiont Regiella insecticola helps to protect its host from fungal pathogen infection (Scarborough et al., 2005), the symbiont Hamiltonella defensa increases aphid resistance to parasitoid wasps (Oliver et al., 2009, 2010), and the presence of Serratia symbiotica enhances aphid host survival and fecundity under heat stress (Montllor et al., 2002; Russell and Moran, 2006). However, in many other insect associations, the roles of recently derived symbionts remain unclear. 
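Congruence between host and symbiont phylogenies, cited above as evidence of cospeciation, can be quantified by comparing the clades that two trees contain. The Python sketch below is illustrative only and is not a method used in this dissertation: it computes a rooted variant of the Robinson-Foulds distance, counting clades present in one tree but not the other. Trees are written as nested tuples, and the leaf labels H1 to H4 are hypothetical host taxa; each symbiont leaf is assumed to be labeled with the host it was sampled from, so both trees share one leaf set.

```python
from typing import FrozenSet, Set, Tuple, Union

# A rooted tree is either a leaf label or a tuple of child subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def collect_clades(tree: Tree, out: Set[FrozenSet[str]]) -> FrozenSet[str]:
    """Return the leaves under `tree`, adding every internal clade to `out`."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(collect_clades(child, out) for child in tree))
    out.add(leaves)
    return leaves


def rooted_rf_distance(a: Tree, b: Tree) -> int:
    """Count clades present in exactly one of two rooted trees (0 = same topology)."""
    clades_a: Set[FrozenSet[str]] = set()
    clades_b: Set[FrozenSet[str]] = set()
    if collect_clades(a, clades_a) != collect_clades(b, clades_b):
        raise ValueError("trees must share the same leaf set")
    return len(clades_a ^ clades_b)


# Hypothetical example: symbiont leaves are labeled by the host they came from.
host = ((("H1", "H2"), "H3"), "H4")
congruent_symbiont = ((("H1", "H2"), "H3"), "H4")
incongruent_symbiont = ((("H1", "H3"), "H2"), "H4")

print(rooted_rf_distance(host, congruent_symbiont))    # 0
print(rooted_rf_distance(host, incongruent_symbiont))  # 2
```

A distance of 0 means identical branching order, the pattern expected under strict cospeciation. A real cophylogenetic test would also have to account for uncertainty in the estimated trees, which this sketch ignores.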
Recent mutualistic symbionts are often transmitted vertically in the same manner as primary symbionts, although they may occasionally undergo horizontal transfer between different insect hosts. Unlike those of ancient symbionts, the phylogenies of recently derived symbionts are not concordant with their host phylogenies (Russell et al., 2003; Dale and Moran, 2006). Additionally, some closely related symbionts have been identified in a wide range of insect hosts, indicating that these symbionts have been acquired independently from the environment or through horizontal transfer (Novakova and Hypsa, 2007; Moran et al., 2008; Novakova et al., 2009). With the advent of new sequencing technologies, the number of completed bacterial genome sequencing projects has increased rapidly, including many insect symbionts with both ancient and recent origins of association with insects. This genomic information has provided much insight into the evolution and functional basis of insect-bacterial symbioses. What has become clear is that bacterial symbionts of insects often embark on a trajectory of genome degeneration and size reduction once they become associated with an insect host (Moran et al., 2008). This is thought to occur because, once host-associated, the bacteria inhabit a well-protected niche that is more static than the environments encountered by their free-living relatives. Selection is therefore relaxed on genes that are no longer needed in the insect niche, and these genes may quickly be inactivated in symbionts via large deletions, frameshift mutations or altered start/stop codons (Burke and Moran, 2011). Additionally, genes under relaxed selection may serve as insertion points for transposons or phage elements, and these elements are often expanded in recently derived symbionts (Belda et al., 2010; McCutcheon and Moran, 2012). 
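The inactivating mutations described above often leave a recognizable signature: a frameshift or nonsense mutation can introduce an in-frame stop codon well before the end of the ancestral reading frame. The Python sketch below is a hypothetical illustration, not the annotation pipeline used in this work. It scans a coding sequence codon by codon and reports the first in-frame stop codon that occurs before the final codon. The example sequences are invented, and in practice pseudogene calls often also rely on comparison with intact orthologs in related genomes rather than on stop codons alone.

```python
from typing import Optional

# Stop codons of the standard code; bacterial translation table 11 uses the same three.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stop(cds: str) -> Optional[int]:
    """Return the 0-based codon index of the first in-frame stop codon that
    occurs before the last codon of `cds`, or None if no such stop exists."""
    seq = cds.upper()
    n_codons = len(seq) // 3  # a trailing partial codon (e.g. after a frameshift) is ignored
    for i in range(n_codons - 1):  # a stop in the final codon is the normal terminator
        if seq[3 * i:3 * i + 3] in STOP_CODONS:
            return i
    return None


intact = "ATGCTGACCAAATAA"      # ATG CTG ACC AAA TAA
frameshift = "ATGTGACCAAATAA"   # one-base deletion creates TGA at codon 1
nonsense = "ATGCTGTAGAAATAA"    # ACC -> TAG substitution at codon 2

print(premature_stop(intact))      # None
print(premature_stop(frameshift))  # 1
print(premature_stop(nonsense))    # 2
```

Checking only for premature stops misses inactivating events such as loss of the start codon or large deletions, which is one reason ortholog comparison is usually needed as well.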
Over time, repetitive DNA elements and inactivated genes are deleted from the genome until only genes required for the symbiotic association are maintained and the genome size is greatly reduced (McCutcheon and Moran, 2012). For example, the smallest known bacterial genomes belong to primary symbionts of insects, including the psyllid symbiont Candidatus Carsonella ruddii (160 kb; Nakabachi et al., 2006) and the mealybug symbiont Candidatus Tremblaya princeps (139 kb; Lopez-Madrigal et al., 2011). Both symbionts retain genes required for the biosynthesis of amino acids that are predicted to be lacking in the insect host diet, even though they have lost many other pathways predicted to be essential for bacteria and must rely on their hosts or other symbionts for a variety of basic metabolic intermediates and cofactors (Nakabachi et al., 2006; Lopez-Madrigal et al., 2011; McCutcheon and Moran, 2012). The transition from a free-living bacterium with a large genome to a host-associated obligate symbiont with a small genome is a dynamic process, and there is much interest in understanding the time course and mechanisms by which this occurs. Due to their more recent origin of association, the genomes of secondary symbionts tend to represent an intermediate stage of genome degeneration, in which many genes have acquired inactivating mutations, such as those creating frameshifts and premature stop codons, but have not yet been fully deleted from the genome (Belda et al., 2010; Burke and Moran, 2011). These intermediate genomes provide a snapshot of the reductive process and may reveal the mechanisms involved in the transition from a free-living organism to an intracellular, obligate symbiont. The genome degeneration that occurs over time in insect symbionts has made them very difficult to study in the lab. 
The loss of many genes that free-living bacteria maintain to survive in a variety of environments makes symbionts fastidious and difficult to isolate in culture, especially obligate primary symbionts. To date, only a few insect symbionts have been isolated in pure culture, and those that have proved amenable to culture are more recently derived symbionts that still maintain a larger gene set (Dale and Maudlin, 1999; Dale et al., 2006; Sabri et al., 2010). Culture and genetic manipulation of insect symbionts are important tools for testing predictions about the function and importance of genes in symbioses and for gaining a better understanding of the molecular mechanisms required to initiate and maintain these associations. One insect symbiont that has been isolated in culture is the tsetse fly symbiont S. glossinidius, and recent work has involved genetic modification of this symbiont and its introduction into insect hosts (Dale and Maudlin, 1999; Pontes and Dale, 2011; Pontes et al., 2011). However, the genetic techniques available for symbionts still lag behind those available for insect pathogens. There is a need for additional symbiont study systems that are amenable to culture and genetic manipulation for functional studies in vitro and in vivo. An ideal study system for insect-bacterial symbioses would consist not only of a bacterial symbiont that can be cultured and genetically manipulated, but also of a host insect that can be cultured and infected with genetic variants of symbionts to observe their effects. Since many insects harbor multiple mutualistic symbionts that are transmitted maternally during insect reproduction, these insects often have no aposymbiotic life stage. This makes it challenging to establish a population of genetically modified symbionts and to observe the effects of genetic variants on the association. Recent attempts to establish tsetse flies carrying recombinant S. 
glossinidius have addressed this issue by first treating insects with antibiotics to reduce or eliminate wild type S. glossinidius prior to the introduction of recombinant symbionts (Weiss et al., 2006). However, antibiotics tend to lack specificity, and many insects, including tsetse flies, are known to harbor multiple mutualistic symbionts (Dale and Moran, 2006). This makes it difficult to selectively eliminate a particular symbiont without affecting other members of the symbiotic flora, thereby compromising the fitness of the host insect. Thus, new methodologies are needed for introducing recombinant bacterial symbionts into insect hosts in order to investigate the molecular mechanisms of symbiosis. The work presented here focuses on developing the association between the bacterial symbiont Candidatus Arsenophonus arthropodicus and its louse fly host, Pseudolynchia canariensis, into a tractable system with which to study insect-bacterial symbioses. Closely related Arsenophonus species have been identified as symbionts in a wide range of distantly related arthropod hosts, such as triatomines (Hypsa and Dale, 1997), ticks (Grindle et al., 2003), whiteflies (Thao and Baumann, 2004), hippoboscid flies (Trowbridge et al., 2006), and parasitoid wasps (Gherna et al., 1991), among a number of other insects (Novakova et al., 2009), and these Arsenophonus symbionts interact with their hosts in a variety of ways. They may be bacteriome-associated or distributed throughout the host body, may be transmitted vertically or horizontally, and may maintain mutualistic relationships with their hosts or behave as reproductive parasites that manipulate host reproduction, in a manner similar to many Wolbachia species (Wilkes et al., 2011). 
Given this widespread distribution, it seems unlikely that the general role of Arsenophonus symbionts in insect symbioses is diet supplementation, since their known hosts subsist on diverse diets with differing nutrient availabilities, although the functions of these symbionts in such varied associations remain unclear. Candidatus Arsenophonus arthropodicus, the pigeon louse fly symbiont, displays traits common to other mutualistic endosymbionts of insects, including vertical transmission to offspring, and louse fly hosts do not exhibit any reproductive effects due to this association (Dale et al., 2006). This symbiont has been isolated in axenic culture and has proved amenable to genetic transformation with broad host-range plasmids (Dale et al., 2006). In addition, louse fly hosts can be maintained in the lab by culturing them on their native pigeon hosts. This makes the association a promising study system for investigating the role of this bacterium in symbiosis and the mechanisms involved in initiating and sustaining successful infections in insect hosts. Here, the complete genome sequence of Ca. A. arthropodicus is presented and compared to other bacterial genomes. Understanding its gene inventory may help elucidate the function of this symbiont in its louse fly host, as well as provide an additional genome for comparison to gain a better understanding of insect symbiont evolution. The Ca. A. arthropodicus genome sequence is then compared to the draft sequence of another Arsenophonus symbiont, Arsenophonus nasoniae (Darby et al., 2010), a symbiont of the parasitoid wasp Nasonia vitripennis (Gherna et al., 1991). Arsenophonus nasoniae has a different lifestyle from Ca. A. arthropodicus: it has a male-killing phenotype in its wasp host and can undergo both vertical and horizontal transmission (Gherna et al., 1991; Duron et al., 2010). 
The gene inventories of both symbionts indicate a recent origin of association with insects and contain evidence of a transition from a pathogenic ancestor. The availability of the complete gene inventory of Ca. A. arthropodicus provides a variety of target genes to investigate using molecular genetic techniques. In this work, genetic modification of Ca. A. arthropodicus by homologous recombination mediated by the lambda Red recombineering system (Datsenko and Wanner, 2000) is described, as well as the introduction of modified bacteria into louse fly hosts via microinjection. Recombinant bacteria successfully initiated infections in louse flies and were vertically transmitted to offspring when injected into hosts during the pupal stage. Additionally, aposymbiotic louse flies were obtained as a result of the microinjection procedure, providing a means to investigate possible roles of Ca. A. arthropodicus in this association under varied environmental conditions. Given the broad host distribution of the genus, understanding the functions of this symbiont in the louse fly and the molecular mechanisms of infection might provide insight into the ability of Arsenophonus to infect such a wide range of distantly related insects. There is also much interest in developing insect platforms for studies of paratransgenesis, which involves using bacterial symbionts to express transgenes in insects in order to reduce their capacity to transmit parasites in the wild (Coutinho-Abreu et al., 2010). Thus, the louse fly-Ca. A. arthropodicus association provides a promising new system with which to investigate the mechanisms of insect-bacterial symbioses, as well as techniques of paratransgenesis for disease control.
References
Akman, L., A. Yamashita, H. Watanabe, K. Oshima, T. Shiba, M. Hattori, and S. Aksoy. 2002. Genome sequence of the endocellular obligate symbiont of tsetse flies, Wigglesworthia glossinidia. Nat. Genet. 32:402-407.
Attardo, G.M., C. Lohs, A. Heddi, U.H. Alam, S. Yildirim, and S. Aksoy. 2008. Analysis of milk gland structure and function in Glossina morsitans: milk protein production, symbiont populations and fecundity. J. Insect Physiol. 54:1236-1242.
Bright, M., and S. Bulgheresi. 2010. A complex journey: transmission of microbial symbionts. Nat. Rev. Microbiol. 8:218-230.
Buchner, P. 1965. Endosymbiosis of Animals with Plant Microorganisms. John Wiley, New York. 909 pp.
Burke, G.R., and N.A. Moran. 2011. Massive genomic decay in Serratia symbiotica, a recently evolved symbiont of aphids. Genome Biol. Evol. 3:195-208.
Cheng, Q., and S. Aksoy. 1999. Tissue tropism, transmission and expression of foreign genes in vivo in midgut symbionts of tsetse flies. Insect Mol. Biol. 8:125-132.
Coutinho-Abreu, I.V., K.Y. Zhu, and M. Ramalho-Ortigao. 2010. Transgenesis and paratransgenesis to control insect-borne diseases: current status and future challenges. Parasitol. Int. 59:1-8.
Dale, C., M. Beeton, C. Harbison, T. Jones, and M. Pontes. 2006. Isolation, pure culture, and characterization of "Candidatus Arsenophonus arthropodicus," an intracellular secondary endosymbiont from the hippoboscid louse fly Pseudolynchia canariensis. Appl. Environ. Microbiol. 72:2997-3004.
Dale, C., and I. Maudlin. 1999. Sodalis gen. nov. and Sodalis glossinidius sp. nov., a microaerophilic secondary endosymbiont of the tsetse fly Glossina morsitans morsitans. Int. J. Syst. Bacteriol. 49 Pt 1:267-275.
Dale, C., and N.A. Moran. 2006. Molecular interactions between bacterial symbionts and their hosts. Cell. 126:453-465.
Darby, A.C., J.H. Choi, T. Wilkes, M.A. Hughes, J.H. Werren, G.D. Hurst, and J.K. Colbourne. 2010. Characteristics of the genome of Arsenophonus nasoniae, son-killer bacterium of the wasp Nasonia. Insect Mol. Biol. 19 Suppl 1:75-89.
Datsenko, K.A., and B.L. Wanner. 2000. One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proc. Natl. Acad. Sci. USA. 97:6640-6645.
Douglas, A.E. 1998. Nutritional interactions in insect-microbial symbioses: aphids and their symbiotic bacteria Buchnera. Annu. Rev. Entomol. 43:17-37.
Duron, O., T.E. Wilkes, and G.D. Hurst. 2010. Interspecific transmission of a male-killing bacterium on an ecological timescale. Ecol. Lett. 13:1139-1148.
Funk, D.J., L. Helbling, J.J. Wernegreen, and N.A. Moran. 2000. Intraspecific phylogenetic congruence among multiple symbiont genomes. Proc. Biol. Sci. 267:2517-2521.
Gherna, R.L., J.H. Werren, W. Weisburg, R. Cote, C.R. Woese, L. Mandelco, and D.J. Brenner. 1991. Arsenophonus nasoniae gen. nov., sp. nov., the causative agent of the son-killer trait in the parasitic wasp Nasonia vitripennis. Int. J. Syst. Bacteriol. 41:563-565.
Grindle, N., J.J. Tyner, K. Clay, and C. Fuqua. 2003. Identification of Arsenophonus-type bacteria from the dog tick Dermacentor variabilis. J. Invertebr. Pathol. 83:264-266.
Heddi, A., A.M. Grenier, C. Khatchadourian, H. Charles, and P. Nardon. 1999. Four intracellular genomes direct weevil biology: nuclear, mitochondrial, principal endosymbiont, and Wolbachia. Proc. Natl. Acad. Sci. USA. 96:6814-6819.
Hosokawa, T., Y. Kikuchi, N. Nikoh, M. Shimada, and T. Fukatsu. 2006. Strict host-symbiont cospeciation and reductive genome evolution in insect gut bacteria. PLoS Biol. 4:e337.
Hypsa, V., and C. Dale. 1997. In vitro culture and phylogenetic analysis of "Candidatus Arsenophonus triatominarum," an intracellular bacterium from the triatomine bug, Triatoma infestans. Int. J. Syst. Bacteriol. 47:1140-1144.
Lopez-Madrigal, S., A. Latorre, M. Porcar, A. Moya, and R. Gil. 2011. Complete genome sequence of "Candidatus Tremblaya princeps" strain PCVAL, an intriguing translational machine below the living-cell status. J. Bacteriol. 193:5587-5588.
McCutcheon, J.P., and N.A. Moran. 2012. Extreme genome reduction in symbiotic bacteria. Nat. Rev. Microbiol. 10:13-26.
Montllor, C.B., A. Maxmen, and A.H. Purcell. 2002. Facultative bacterial endosymbionts benefit pea aphids Acyrthosiphon pisum under heat stress. Ecol. Entomol. 27:189-195.
Moran, N.A., J.P. McCutcheon, and A. Nakabachi. 2008. Genomics and evolution of heritable bacterial symbionts. Annu. Rev. Genet. 42:165-190.
Nakabachi, A., A. Yamashita, H. Toh, H. Ishikawa, H.E. Dunbar, N.A. Moran, and M. Hattori. 2006. The 160-kilobase genome of the bacterial endosymbiont Carsonella. Science. 314:267.
Nogge, G. 1976. Sterility in tsetse flies (Glossina morsitans Westwood) caused by loss of symbionts. Experientia. 32:995-996.
Nogge, G. 1981. Significance of symbionts for the maintenance of an optimal nutritional state for successful reproduction in hematophagous arthropods. Parasitol. 82:101-104.
Novakova, E., and V. Hypsa. 2007. A new Sodalis lineage from bloodsucking fly Craterina melbae (Diptera, Hippoboscoidea) originated independently of the tsetse flies symbiont Sodalis glossinidius. FEMS Microbiol. Lett. 269:131-135.
Novakova, E., V. Hypsa, and N.A. Moran. 2009. Arsenophonus, an emerging clade of intracellular symbionts with a broad host distribution. BMC Microbiol. 9:143.
Oliver, K.M., P.H. Degnan, M.S. Hunter, and N.A. Moran. 2009. Bacteriophages encode factors required for protection in a symbiotic mutualism. Science. 325:992-994.
Oliver, K.M., P.H. Degnan, G.R. Burke, and N.A. Moran. 2010. Facultative symbionts in aphids and the horizontal transfer of ecologically important traits. Annu. Rev. Entomol. 55:247-266.
Pais, R., C. Lohs, Y. Wu, J. Wang, and S. Aksoy. 2008. The obligate mutualist Wigglesworthia glossinidia influences reproduction, digestion, and immunity processes of its host, the tsetse fly. Appl. Environ. Microbiol. 74:5965-5974.
Pontes, M.H., and C. Dale. 2011. Lambda Red-mediated genetic modification of the insect endosymbiont Sodalis glossinidius. Appl. Environ. Microbiol. 77:1918-1920.
Pontes, M.H., K.L. Smith, L. De Vooght, J. Van Den Abbeele, and C. Dale. 2011. Attenuation of the sensing capabilities of PhoQ in transition to obligate insect-bacterial association. PLoS Genet. 7:e1002349.
Russell, J.A., A. Latorre, B. Sabater-Munoz, A. Moya, and N.A. Moran. 2003. Sidestepping secondary symbionts: widespread horizontal transfer across and beyond the Aphidoidea. Mol. Ecol. 12:1061-1075.
Russell, J.A., and N.A. Moran. 2006. Costs and benefits of symbiont infection in aphids: variation among symbionts and across temperatures. Proc. Biol. Sci. 273:603-610.
Sabri, A., P. Leroy, E. Haubruge, T. Hance, I. Frere, J. Destain, and P. Thonart. 2010. Isolation, pure culture and characterization of Serratia symbiotica sp. nov., the R-type of secondary endosymbiont of the black bean aphid Aphis fabae. Int. J. Syst. Evol. Microbiol. 61:2081-2088.
Scarborough, C.L., J. Ferrari, and H.C.J. Godfray. 2005. Aphid protected from pathogen by endosymbiont. Science. 310:1781.
Thao, M.L., and P. Baumann. 2004. Evidence for multiple acquisition of Arsenophonus by whitefly species (Sternorrhyncha: Aleyrodidae). Curr. Microbiol. 48:140-144.
Trowbridge, R.E., K. Dittmar, and M.F. Whiting. 2006. Identification and phylogenetic analysis of Arsenophonus- and Photorhabdus-type bacteria from adult Hippoboscidae and Streblidae (Hippoboscoidea). J. Invertebr. Pathol. 91:64-68.
Tsuchida, T., R. Koga, and T. Fukatsu. 2004. Host plant specialization governed by facultative symbiont. Science. 303:1989.
Wilkes, T.E., O. Duron, A.C. Darby, V. Hypsa, E. Novakova, and G.D.D. Hurst. 2011. The genus Arsenophonus. In Manipulative Tenants: Bacteria Associated with Arthropods. E. Zchori-Fein and K. Bourtzis, editors. CRC Press, Boca Raton, FL. 225-244.
Wilkinson, T.L., T. Fukatsu, and H. Ishikawa. 2003. Transmission of symbiotic bacteria Buchnera to parthenogenetic embryos in the aphid Acyrthosiphon pisum (Hemiptera: Aphidoidea). Arthropod Struct. Dev. 32:241-245.
CHAPTER 2
CHARACTERISTICS OF THE COMPLETE GENOME SEQUENCE OF CANDIDATUS ARSENOPHONUS ARTHROPODICUS
Abstract
Hippoboscid louse flies are blood-feeding parasites of pigeons that harbor an ancient obligate symbiont required for nutritional supplementation, as well as a more recently derived symbiont, Candidatus Arsenophonus arthropodicus. This recent symbiont is a member of the Arsenophonus clade, which consists of a wide array of closely related symbionts associated with many distantly related insect hosts. Symbionts in this clade are phylogenetically related to the free-living/opportunistic human pathogen Proteus mirabilis and the nematode symbiont/insect pathogen Photorhabdus luminescens. We have sequenced the genome of Ca. A. arthropodicus and identified factors shared with other insect symbionts, as well as virulence factors related to those used by bacterial pathogens to interact with their hosts. The genome of Ca. A. arthropodicus reflects a bacterium in transition from a free-living lifestyle to a permanent insect association. The presence of virulence factors and of toxin-encoding genes and gene fragments suggests that Ca. A. arthropodicus evolved from a pathogenic ancestor into a mutualistic insect symbiont through attenuation of pathogenic interactions that would have detrimental effects on the host, together with a transition from horizontal to vertical transmission.
Introduction
Many insects have developed intimate relationships with mutualistic bacterial symbionts that allow them to exploit novel niches, and these interactions have likely played an important role in shaping the evolution of insect species (Steinert et al., 2000). While some of these relationships are known to have originated recently, others are ancient and obligate in nature.
It has been proposed that these mutualists evolved from ancestors that were insect parasites, or parasites of other hosts that used insects as vectors, and that the transition involved a switch in transmission strategy from horizontal to vertical, accompanied by attenuation of virulence; once transmission depends on host reproduction, reduced virulence benefits the host and increases the likelihood of symbiont transmission (Ewald, 1987; Steinert et al., 2000; Weeks et al., 2007). Genome sequences for many bacterial symbionts of insects have been generated in the last decade and have provided a framework for understanding the genomic changes that occur as bacteria take up residence in their insect hosts. Over the course of long-term host restriction, bacterial symbionts undergo substantial gene inactivation and loss owing to relaxed selection on genes that are not essential in the symbiotic lifestyle (McCutcheon and Moran, 2012). Genome degeneration is thought to be exacerbated by an increased rate of fixation of slightly deleterious mutations, a consequence of the frequent population bottlenecks that occur during host reproduction (Moran, 1996; Moya et al., 2008). As might be expected, recently derived symbionts have larger gene inventories than ancient symbionts and retain genes that share sequence homology with virulence factors used by related bacterial pathogens (Dale et al., 2001; Degnan et al., 2009; Moya et al., 2008). For example, the tsetse fly symbiont Sodalis glossinidius uses type III secretion systems to facilitate insect cell invasion and intracellular proliferation (Dale et al., 2001; Dale et al., 2005). These secretion systems play major roles in the interactions of pathogens with host cells, allowing pathogens to deliver effector proteins directly into the cytoplasm of target cells (Hueck, 1998).
However, these virulence factors are not found in the substantially reduced genomes typical of ancient symbionts, most likely because they were lost in the transition to a highly specialized, obligate association (Dale and Moran, 2006). Therefore, some virulence factors that are required for the lifestyle of bacterial pathogens may play only a temporary role in the transition from parasitism to mutualism. Here we present the genome sequence of Candidatus Arsenophonus arthropodicus, a recently derived symbiont of the pigeon louse fly, Pseudolynchia canariensis. Members of the Arsenophonus group of bacteria are found in a broad, phylogenetically diverse range of insect hosts (Dale et al., 2006; Novakova et al., 2009), and recent screens identified Arsenophonus species in 5% of the insects surveyed (Duron et al., 2008). The interactions between Arsenophonus symbionts and their insect hosts are quite variable, ranging from beneficial mutualisms to reproductive parasitisms, and involve both vertical and horizontal transmission (Wilkes et al., 2011). The genome of Ca. A. arthropodicus shares traits common to other recently derived insect symbionts, including a genome size and gene inventory that are reduced relative to free-living relatives but substantially larger than those of ancient symbionts. The presence of genes or gene fragments that share sequence homology with known virulence factors involved in pathogenesis suggests that the Ca. A. arthropodicus-louse fly symbiosis originated from a bacterial ancestor that was an insect pathogen, or a pathogen of another host that used insects, such as louse flies, as transmission vectors.
Materials and Methods
DNA preparation and library construction for Sanger sequencing
Ca. A. arthropodicus was isolated from louse fly pupae, and liquid cultures were maintained at 28°C in Mitsuhashi and Maramorosch medium (MM medium) as described previously (Dale et al., 2006).
Purified genomic DNA was extracted using a DNeasy Blood and Tissue Kit (Qiagen) from 3 ml of Ca. A. arthropodicus culture harvested by centrifugation in mid-log phase of growth. The genomic DNA was hydrodynamically sheared by repeated passage through a 0.005-inch orifice to an average size range of 8-12 kb. Sheared DNA was blunt-end repaired with T4 DNA polymerase and phosphorylated with T4 polynucleotide kinase prior to blunt-end ligation with biotinylated adaptor oligonucleotides. Adaptor-ligated DNA was then purified by capture on streptavidin-coated beads. Plasmid vector pWD42 (Robb et al., 2001), a copy-number-inducible derivative of plasmid R1, was prepared and ligated with adaptors complementary to the insert adaptors, and the insert DNA was then annealed to the adaptor-ligated vector. Vectors containing genomic DNA inserts were transformed into chemically competent E. coli XL-10 Gold cells (Stratagene) and plated on Terrific Broth (TB) plates containing ampicillin to select for transformants.
Plasmid induction and Sanger sequencing
After overnight growth at 30°C, clones were picked into liquid TB and grown for 16 hours at 30°C. Runaway plasmid replication was induced by incubation in a 42°C shaking water bath for 2.25 hours. Standard alkaline lysis procedures were used to collect plasmid DNA, and plasmids were digested with NotI (NEB) to check the frequency of clones carrying inserts. Paired-end sequencing reads were generated from the inserts using a BigDye terminator cycle sequencing kit (Applied Biosystems) with primers complementary to vector sequences flanking the insert. DNA was ethanol precipitated to remove excess fluorescent terminators and analyzed on an ABI 3730 96-capillary instrument. Paired-end Sanger sequencing reads were then analyzed and assembled using the Phred/Phrap/Consed assembly software (Ewing and Green, 1998; Ewing et al., 1998; Gordon et al., 1998) with default parameters.
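Phred assigns each base call a quality score Q that is related to the estimated probability P that the call is wrong by Q = -10 log10(P), so Q20 corresponds to a 1% chance of error and Q30 to 0.1%. The short Python sketch below illustrates that relationship only; it is not part of the pipeline used in this study, and its function names are hypothetical rather than Phred or Phrap options. It converts between the two scales and sums per-base error probabilities to estimate the number of miscalled bases expected in a read.

```python
import math
from typing import Sequence


def phred_to_error_prob(q: float) -> float:
    """Error probability P for a Phred quality score Q, from Q = -10*log10(P)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred quality score Q for an error probability P (0 < P <= 1)."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -10 * math.log10(p)


def expected_errors(qualities: Sequence[float]) -> float:
    """Expected number of miscalled bases: the sum of per-base error probabilities."""
    return sum(phred_to_error_prob(q) for q in qualities)


# Hypothetical read: 700 high-quality bases (Q40) followed by a 50-base
# low-quality tail (Q10), mimicking the drop-off often seen at read ends.
read_quals = [40] * 700 + [10] * 50
print(phred_to_error_prob(20))                # prints 0.01
print(round(expected_errors(read_quals), 2))  # prints 5.07
```

In this toy read almost all of the expected errors come from the short low-quality tail, which is why per-base qualities, rather than read length alone, are useful when weighing reads during assembly.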
Library construction and Illumina sequencing
A DNA sequencing library was constructed using the Illumina Paired-End DNA Sample Prep Kit according to the manufacturer's instructions. All DNA purification steps were performed using a Qiagen PCR purification kit. Briefly, 5 µg of purified Ca. A. arthropodicus genomic DNA was fragmented by nebulization. Fragmented DNA was end-repaired with T4 DNA polymerase and the Klenow fragment of DNA polymerase I. Single 'A' bases were added to the 3' ends of the DNA fragments with Klenow exo- (3' to 5' exo minus), and the purified fragments were ligated to adaptors with 3' 'T' overhangs. Ligation products were separated by gel electrophoresis, and products in the 150-200 bp range were gel extracted and purified. DNA fragments with adaptors on both ends were enriched by 18 cycles of PCR using primers complementary to sequences on the adaptor ends. Cluster generation and 36-bp paired-end sequencing were performed on an Illumina Genome Analyzer II.
Hybrid assembly of sequencing reads
A hybrid assembly of paired-end Sanger and Illumina reads was generated using the de novo assembly algorithm of the CLC Genomics Workbench (CLC bio, Aarhus, Denmark). Contigs were manually inspected for misassemblies and edited using Consed (Gordon et al., 1998). Sequence gaps were closed by primer walking on the appropriate gap-spanning Sanger library clones. Remaining physical gaps were closed by Sanger sequencing of PCR products spanning the gaps, which were generated by touchdown PCR using Phusion DNA Polymerase (Finnzymes). The assembly was verified by aligning all sequencing reads back to the assembly consensus sequence to confirm that coverage was evenly distributed and that paired-end reads were correctly arranged.
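The coverage check described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the CLC or Consed procedure: reads are assumed to be already aligned to a linear consensus and supplied as hypothetical (start, length) pairs with 0-based starts. The sketch builds a per-base depth profile with a difference array and flags fixed-size windows whose mean depth falls outside chosen multiples of the overall mean. The default thresholds (0.5x and 2x) are illustrative assumptions; unusually high depth can point to collapsed repeats and unusually low depth to misassemblies. Circular replicons and paired-end orientation checks are not handled.

```python
from typing import List, Sequence, Tuple


def depth_profile(alignments: Sequence[Tuple[int, int]], length: int) -> List[int]:
    """Per-base read depth on a linear sequence of the given length.

    alignments holds (start, aligned_length) pairs with 0-based starts that
    lie inside the sequence. A difference array records +1 where each read
    starts and -1 one base past where it ends; a running sum gives the depth.
    """
    diff = [0] * (length + 1)
    for start, aln_len in alignments:
        end = min(start + aln_len, length)  # clip reads running off the end
        diff[start] += 1
        diff[end] -= 1
    depth, running = [], 0
    for i in range(length):
        running += diff[i]
        depth.append(running)
    return depth


def flag_uneven_windows(depth: Sequence[int], window: int,
                        low: float = 0.5, high: float = 2.0) -> List[Tuple[int, float]]:
    """Return (window_start, window_mean) for windows whose mean depth is
    below low * overall mean or above high * overall mean."""
    # Overall mean depth; equals total aligned bases / length when no read is clipped.
    overall = sum(depth) / len(depth)
    flagged = []
    for w in range(0, len(depth), window):
        chunk = depth[w:w + window]
        mean = sum(chunk) / len(chunk)
        if mean < low * overall or mean > high * overall:
            flagged.append((w, mean))
    return flagged


# Toy example: three reads on a 10-bp sequence.
depth = depth_profile([(0, 4), (2, 4), (6, 2)], 10)
print(depth)                                   # prints [1, 1, 2, 2, 1, 1, 1, 1, 0, 0]
print(flag_uneven_windows(depth, 5, low=0.7))  # prints [(5, 0.6)]
```

The difference array keeps the depth calculation linear in genome length plus read count, which matters when tens of millions of short reads are involved.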
Genome annotation and analysis
The genome was annotated using the NCBI Prokaryotic Genomes Automatic Annotation Pipeline (PGAAP), which uses a combination of GeneMark and Glimmer to predict genes (Borodovsky and McIninch, 1993; Lukashin and Borodovsky, 1998; Delcher et al., 1998) and tRNAscan-SE (Lowe and Eddy, 1997) to predict tRNAs. Ribosomal RNAs were predicted using BLAST searches (Altschul et al., 1990) against an RNA database. The Conserved Domain Database (Marchler-Bauer et al., 2011) and Clusters of Orthologous Groups (Tatusov et al., 1997) were used to assign putative functions to genes. The PGAAP annotation was then manually examined in Artemis (Rutherford et al., 2000), and each predicted CDS was verified by a BLAST search against the NCBI non-redundant protein database to confirm start and stop codons and to identify candidate pseudogenes containing frameshifting mutations or modified start/stop codons. Circular genome representations were generated using DNAPlotter (Carver et al., 2009). Sequence alignments were performed using Clustal X (Larkin et al., 2007). Metabolic capability was assessed using Pathway Tools (Dale et al., 2010) and the KEGG (Kyoto Encyclopedia of Genes and Genomes; Kanehisa and Goto, 2000) Automatic Annotation Server (KAAS; Moriya et al., 2007).
Phylogenetics
A bacterial species phylogeny was generated based on the concatenated nucleotide sequences of seven conserved genes (frr, gyrB, infB, pheS, prfA, pth, and tsf). The concatenated gene sequences were aligned using Clustal X (Larkin et al., 2007), and maximum likelihood trees were constructed using PhyML (Guindon and Gascuel, 2003) with the HKY85 model of sequence evolution, 20 random starting trees, the proportion of invariable sites estimated from the data, and 100 bootstrap replicates.
Results and Discussion
Genome sequencing
The Ca. A. arthropodicus genome sequence was generated via a hybrid approach combining Sanger sequencing and next-generation Illumina sequencing of paired-end libraries. The Sanger and Illumina reads were assembled together using the CLC Genomics Workbench de novo assembly algorithm, yielding 123 contigs ranging in size from 95 bp to 187,172 bp (Table 2.1). Sequence gaps in the assembly were closed by primer walking on gap-spanning clones, while physical gaps were closed by primer walking on PCR products (Table 2.1). The final assembly comprises three large contigs (1,437,952 bp, 945,492 bp, and 489,601 bp) that make up the chromosome, separated by gaps of approximately 20 kb, and three closed circular plasmid contigs. Sequence reads that are not integrated into these contigs assemble into small contigs (< 2 kb) of repetitive phage sequences that have not yet been placed into the chromosomal gaps. Given the coverage obtained by both sequencing methods (~3X Sanger coverage, ~200X Illumina coverage), it is highly unlikely that any unique coding sequences remain within these gaps.
Table 2.1. Statistics of de novo hybrid assembly of Illumina and Sanger reads.
Chromosomal contigs in initial assembly (size): 119 (2,726,065 bp)
Average chromosomal contig length: 22,908 bp
Plasmid contigs in initial assembly (size): 4 (95,856 bp)
Average genome coverage: 259X
Number of reads in assembly: 25,920,686
Number of primer walks: 386
Chromosomal contigs in final assembly (size): 3 (2,873,045 bp)
Plasmids closed in final assembly (total size): 3 (98,909 bp)
General features of the genome
The genome features of Ca. A. arthropodicus were compared with those of related gamma-proteobacteria with available genome sequences and lifestyles ranging from free-living bacteria to ancient, obligate symbionts of insects (Table 2.2). A phylogeny of these bacteria was constructed (Fig. 2.1) from concatenated nucleotide sequence alignments of seven conserved orthologous genes using PhyML (Guindon and Gascuel, 2003). Based on this phylogeny, and supported by other phylogenetic analyses based on 16S ribosomal RNA sequences (Dale et al., 2006; Darby et al., 2010; Novakova et al., 2009), the closest sequenced relatives of Ca. A. arthropodicus are the free-living Proteus mirabilis and the nematode symbiont/insect pathogen Photorhabdus luminescens (Fig. 2.1). Trees were also generated with nhPhyML (Boussau and Gouy, 2006), using a nonhomogeneous model of sequence evolution that allows base composition (G+C content) to vary among branches of the tree, and the resulting tree topology remained the same.
Table 2.2. Genome features of bacteria with different lifestyles.
Organism: chromosome (bp); GC content (%); predicted CDS; pseudogenes; rRNA operons; tRNAs; plasmids
Free-living or symbiotic relatives
Escherichia coli K12: 4,639,221; 50.8; 4,146; 179; 7; 89; 0
Dickeya dadantii: 4,922,802; 56.3; 4,549; 22; 7; 75; 0
Proteus mirabilis: 4,063,606; 38.9; 3,693; 24; 7; 83; 1
Photorhabdus luminescens: 5,688,987; 42.8; 4,683; 222; 7; 85; 0
Recently acquired symbionts
Ca. A. arthropodicus: 2,873,045; 38.1; 2,231; 395; 7; 64; 3
Sodalis glossinidius: 4,171,146; 54.7; 2,432; 1,501; 7; 69; 3
Arsenophonus nasoniae: 3,567,128 (a); 37.3; 3,332; 135; 8-10 (b); 52; 2 or more (b)
Ancient symbionts
Buchnera aphidicola: 640,681; 26.2; 571; 13; 2; 32; 2
Wigglesworthia glossinidia: 697,742; 22.5; 611; 8; 2; 34; 1
(a) Size of chromosome and plasmid scaffolds.
(b) Estimates based on fractured assembly.
Figure 2.1. Phylogenetic position of Ca. A. arthropodicus based on concatenated sequences of seven conserved orthologous genes. Phylogenetic analysis was based on maximum likelihood estimation using PhyML (Guindon and Gascuel, 2003); bootstrap values are indicated, based on 100 replicates. GenBank accession numbers for sequences used for tree construction are provided in Table 2.2.
The Ca. A. arthropodicus genome is predicted to comprise three plasmids (Fig. 2.2) and a single 2.88 Mb chromosome (Fig. 2.3), which is reduced in size from that of its close free-living relative, Proteus mirabilis, which has a genome size of 4.06 Mb (Table 2.2; Pearson et al., 2008). The genome of Ca. A. arthropodicus displays a moderate AT bias, with an AT content of 61.9%, a feature that is not typical of free-living bacteria and recently associated insect symbionts, such as Escherichia coli (Blattner et al., 1997), Dickeya dadantii (Glasner et al., 2011) and Sodalis glossinidius (Toh et al., 2006), but that is often observed in more ancient insect symbionts, such as Buchnera aphidicola (Shigenobu et al., 2000) and Wigglesworthia glossinidia (Akman et al., 2002). However, the genomes of the Arsenophonus relatives P. mirabilis (Pearson et al., 2008) and P. luminescens (Duchaud et al., 2003) are also AT-biased, with AT contents around 60% (Table 2.2). Thus, the AT bias of Ca. A. arthropodicus may be a feature inherited from its lineage rather than a result of insect host restriction. The Ca. A. arthropodicus genome contains numerous pseudogenes (~15% of the predicted genes; Fig. 2.3), a feature common to other symbionts in the early stages of genome degeneration, such as S. glossinidius, in which ~38% of predicted genes are thought to be pseudogenes (Belda et al., 2010; Toh et al., 2006), but not a prominent feature of free-living bacteria and long-established endosymbionts, both of which tend to maintain relatively low proportions of pseudogenes (Table 2.2). Additionally, the genome of Ca. A. arthropodicus contains repetitive insertion sequences and phage regions (15% of the CDSs; Fig. 2.3), another characteristic common to recently acquired symbionts such as S. glossinidius (21% of CDSs; Belda et al., 2010; Toh et al., 2006) and the aphid symbiont Hamiltonella defensa (21%; Degnan et al., 2009), but less common in free-living bacteria and ancient symbionts (McCutcheon and Moran, 2012).
Figure 2.2. Plasmids of Ca. A. arthropodicus. Depictions of the three plasmids present in the Ca. A. arthropodicus genome. Red arrows indicate genes on the leading strand, green arrows indicate genes on the lagging strand, and blue arrows indicate pseudogenes.
Figure 2.3. Genome features of Ca. A. arthropodicus. From the outer track: genes encoded on the leading strand (red), genes on the lagging strand (green), pseudogenes (light blue), tRNAs (orange), rRNAs (dark blue), phage genes (pink), type III secretion system genes (brown), genes encoding predicted toxins (TC, RTX, Mcf; purple), and ICS/IS911 repeats (light green). The innermost circle depicts GC skew. Gaps between contigs in the scaffold are indicated by (X); contigs were ordered based on PCR analysis.
To better understand the gene inventory of Ca. A. arthropodicus, Clusters of Orthologous Groups (COG; Tatusov et al., 1997) categories were assigned to predicted genes (Fig. 2.4). There are 698 genes (31% of the CDSs) that are currently annotated as encoding "hypothetical proteins" and could not yet be assigned to COG categories. Among the remaining genes, the COG categories with the lowest percentages of pseudogenes are F (3.1%; nucleotide transport and metabolism), O (4.4%; posttranslational modification, protein turnover and chaperones) and J (4.8%; translation, ribosomal structure and biogenesis). Genes in these categories are most likely retained because they provide core functions required for bacterial survival and replication within the insect host.
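The genome-wide pseudogene figures quoted above (~15% for Ca. A. arthropodicus and ~38% for S. glossinidius) are reproduced from Table 2.2 when pseudogenes are divided by the sum of predicted CDSs and pseudogenes. The Python sketch below applies that same convention per COG category, which is an assumption since the denominator behind the per-category percentages is not stated. It also includes a crude sequence-level screen for disrupted reading frames; this is only an illustration of the idea and is not the BLAST-based comparison described in the Methods. The per-category counts in the example are hypothetical, and all function names are illustrative.

```python
from typing import Dict, List, Tuple

# Stop codons shared by the standard and bacterial genetic codes.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_candidate_pseudogene(cds: str) -> bool:
    """Crude screen: flag a CDS whose length is not a multiple of three
    (possible frameshift) or that has an in-frame stop codon before its
    final codon (possible premature truncation)."""
    seq = cds.upper()
    if len(seq) % 3 != 0:
        return True
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return any(codon in STOP_CODONS for codon in codons[:-1])


def pseudogene_percent(intact: int, pseudo: int) -> float:
    """Percentage of pseudogenes among all genes (intact + pseudogenes)."""
    total = intact + pseudo
    if total == 0:
        raise ValueError("no genes in this category")
    return 100.0 * pseudo / total


def rank_categories(counts: Dict[str, Tuple[int, int]]) -> List[Tuple[str, float]]:
    """Rank COG categories from lowest to highest pseudogene percentage.
    counts maps a category letter to (intact_genes, pseudogenes)."""
    ranked = [(cat, pseudogene_percent(i, p))
              for cat, (i, p) in counts.items() if i + p > 0]
    return sorted(ranked, key=lambda item: item[1])


# Genome-wide check against Table 2.2 (predicted CDSs, pseudogenes).
print(round(pseudogene_percent(2231, 395), 1))   # prints 15.0
print(round(pseudogene_percent(2432, 1501), 1))  # prints 38.2
# Hypothetical per-category counts, for illustration only.
print(rank_categories({"F": (97, 3), "U": (60, 40), "J": (190, 10)}))
# prints [('F', 3.0), ('J', 5.0), ('U', 40.0)]
print(is_candidate_pseudogene("ATGTAAGCCTAA"))  # prints True (in-frame TAA)
```

A sequence-only screen like this misses pseudogenes that are disrupted without an internal stop, which is one reason a comparison against intact homologs is needed in practice.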
COG categories with higher percentages of pseudogenes include U (36.8%; intracellular trafficking, secretion and vesicular transport), N (16.5%; cell motility) and T (15.4%; signal transduction mechanisms). A larger proportion of genes in these categories may be unnecessary or redundant in the static insect environment, encoding products that are dispensable or are provided by the insect host. For example, COG category U includes genes involved in protein secretion and conjugation, such as the type III and type IV secretion systems, which are often absent in ancient symbionts, including W. glossinidia and B. aphidicola (Akman et al., 2002; Shigenobu et al., 2000). Additionally, many ancient symbionts, such as Blochmannia and Baumannia species, have lost genes associated with flagellar motility (COG category N) over the course of their intimate intracellular relationship with an insect host (Toft and Fares, 2008), suggesting that motility is not required by these symbionts. However, some ancient intracellular symbionts, such as W. glossinidia and B. aphidicola, have retained a subset or a nearly complete inventory of the flagellar genes, which may be required during specific life stages or may function as a protein secretion system (Akman et al., 2002; Maezawa et al., 2006). The need for regulatory signal transduction systems (COG category T) may be reduced in the constant environment within an insect's body, leading to the loss of a higher proportion of genes that alter gene expression in response to changing environmental conditions. Preferential loss of genes in these COG categories suggests that many systems of regulation, secretion and motility may not be required once a bacterium becomes sequestered in a static, protected host environment.
Therefore, these genes are under relaxed selection and can be inactivated and lost from the genome without detrimental fitness effects.
[Figure 2.4: bar chart of the number of genes (x-axis) in each COG category (y-axis: J, K, L, D, V, T, M, N, U, O, C, G, E, F, H, I, P, Q, R, S), shown separately for intact genes and pseudogenes.]
Figure 2.4. COG category classification of genes. Dark bars indicate the number of intact predicted coding sequences and light bars indicate the number of pseudogenes per category.
Plasmids
Ca. A. arthropodicus harbors three plasmids of 55,855 bp, 33,161 bp and 9,893 bp (Fig. 2.2). The largest plasmid, pARA1, carries open reading frames (ORFs) that share homology with tra conjugative transfer genes, although many have been truncated or have accumulated frameshifting mutations that are predicted to render them nonfunctional. The loss of function of these genes is not surprising, given that a symbiont sequestered within an insect host likely has little opportunity to engage in parasexual recombination (McCutcheon and Moran, 2012). Intact coding sequences (CDSs) carried by pARA1 include a set of ascorbate-specific phosphotransferase system (PTS) components.
The chromosome encodes additional PTS components specific to other sugars. Even so, this plasmid-borne system might play an important metabolic role. Plasmids are often present in multiple copies, so plasmid-borne genes may be expressed at higher levels than chromosomal genes, and plasmids may also provide greater regulatory control over gene expression (Moran et al., 2003). The presence of these plasmid-borne PTS genes may allow Ca. A. arthropodicus to use ascorbate as an additional carbohydrate source. Since ascorbate is found at high levels in white blood cells (Corti et al., 2010), this vitamin may be available in the louse fly's blood meal for use by Ca. A. arthropodicus. Plasmid pARA2 encodes a set of genes that share homology with the tri/virB type IV secretion system genes. These genes are found in a variety of bacteria and are widespread among Wolbachia strains, where they may play a role in associations with insect hosts (Pichon et al., 2009). The vir genes present on pARA2 all appear to be intact. This suggests either that these genes are important in the symbiosis or that this plasmid is a recent acquisition that has not yet undergone degenerative evolution. Type IV secretion systems commonly function in conjugation and DNA transport, as well as in the transport of effector molecules into the cytoplasm of eukaryotic cells (Rances et al., 2008). A CDS homologous to the gene encoding the queuosine biosynthesis protein QueC is present on both the pARA1 and pARA2 plasmids, and a further copy is located on the Ca. A. arthropodicus chromosome. Queuosine is involved in tRNA modifications that are predicted to enhance the efficiency of protein synthesis (Cicmil and Huang, 2008). Analysis of the three queC CDSs shows that the two plasmid-borne copies are 94% identical at the amino acid level but share only 70% identity with the chromosomal queC.
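Percent identity values, such as those quoted above for QueC and in Tables 2.3 and 2.4, depend on how the alignment denominator is defined. This matters most for truncated proteins like the chromosomal QueC. The short Python sketch below is illustrative only and is not the procedure used to obtain those values. It assumes the sequences are already aligned, with gaps written as "-". It shows two common conventions:

- dividing identities by the full alignment length, similar to the Identities field reported by BLAST;
- dividing only by the columns in which both sequences have a residue.

A second helper marks fully conserved columns with asterisks, in the style of Figure 2.5. The function names are hypothetical.

```python
from typing import List, Sequence

GAP = "-"


def percent_identity(a: str, b: str, count_gap_columns: bool = True) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    count_gap_columns=True divides identities by the full alignment length,
    so gapped columns count against identity; False divides only by the
    columns in which both sequences have a residue.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    identical = 0
    ungapped = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == GAP or y == GAP:
            continue  # skip columns where either sequence has a gap
        ungapped += 1
        if x == y:
            identical += 1
    denominator = len(a) if count_gap_columns else ungapped
    return 100.0 * identical / denominator if denominator else 0.0


def conservation_line(alignment: Sequence[str]) -> str:
    """Return '*' for columns with the same non-gap residue in every sequence."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must be aligned to the same length")
    marks: List[str] = []
    for column in zip(*(seq.upper() for seq in alignment)):
        residues = set(column)
        conserved = len(residues) == 1 and GAP not in residues
        marks.append("*" if conserved else " ")
    return "".join(marks)


if __name__ == "__main__":
    # 16S fragment transcribed from Figure 2.5C: six identical copies, one with a gap.
    fragment = ["GTGGAGGAATACCGGTGGCGAAG"] * 6 + ["GTGGAGGAA-ACCGGTGGCGAAG"]
    print(conservation_line(fragment).count("*"))                                # 22
    print(round(percent_identity(fragment[0], fragment[6]), 1))                  # 95.7
    print(percent_identity(fragment[0], fragment[6], count_gap_columns=False))   # 100.0
```

The 23-base fragment contains a single gapped column, so 22 columns are marked as conserved. The pairwise identity with the gapped copy is about 95.7% when gapped columns are counted and 100% when they are excluded. An N-terminally truncated protein aligned with leading gaps shows the same effect, only much larger.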
The chromosomally encoded QueC shares 97% amino acid sequence identity with the QueC of Arsenophonus nasoniae, but has an N-terminal truncation of 55 amino acyl residues. Two explanations are possible. This truncation may have diminished or abolished the function of QueC, necessitating the recruitment of additional gene copies. Alternatively, the acquisition of additional plasmid-borne copies may have relaxed selection on the chromosomal copy, making its maintenance unnecessary. Either way, QueC is likely important in the lifestyle of Ca. A. arthropodicus, given that it is present on multiple plasmids, which may result in enhanced expression. Plasmid pARA3 is a small plasmid composed mostly of pseudogenes. However, it maintains intact genes encoding a LuxR-like transcriptional regulator, a colicin Ib immunity protein and a cytosine permease. Two additional genes encoding cytosine permeases are located on the chromosome of Ca. A. arthropodicus. Multiple copies of cytosine permeases are also present in the genomes of the insect symbionts S. glossinidius and A. nasoniae (Toh et al., 2006; Darby et al., 2010), suggesting an important role for cytosine transport in these recently established symbionts.

Genes encoding structural RNAs

Candidatus Arsenophonus arthropodicus maintains 64 tRNAs covering all 20 amino acids and seven ribosomal RNA operons with three different organizational structures (Fig. 2.5A). The rRNA operons differ in the identities of the interspersed tRNAs and in the number of 5S subunits included. This variation is also observed in P. mirabilis, a close free-living relative of Ca. A. arthropodicus (Pearson et al., 2008). Bacterial phylogenies are often based on rRNA sequences, which can be confounded by heterogeneity among paralogous copies (Sorfova et al., 2008). We therefore aligned the paralogous copies of these rRNA genes to determine whether they encoded divergent sequences. The 5S subunits were relatively polymorphic (Fig.
2.5B), while the 16S subunit showed little heterogeneity (Fig. 2.5C) and the 23S subunit was completely conserved in all seven operons.

Figure 2.5. Ribosomal RNA operon organization and heterogeneity. A. Three different rRNA operon structures (16S, tRNA, 23S and 5S genes) in Ca. A. arthropodicus. B. Alignment of complete 5S rDNA sequences from all seven operons. C. Portions of 16S sequence alignments containing non-conserved positions. Asterisks indicate sites conserved in all sequences. Base numbers are indicated below the alignment.

Metabolic capability

Analysis using the KEGG automatic annotation server (KAAS; Moriya et al., 2007) indicates that Ca. A. arthropodicus retains complete metabolic pathways for glycolysis, gluconeogenesis, the tricarboxylic acid cycle and the pentose phosphate pathway. Ca. A. arthropodicus also retains pathways for the synthesis of certain B vitamins, including biotin, thiamine, nicotinic acid and pantothenate. B vitamins are predicted to be lacking in the louse fly host's blood diet and may need to be supplemented by bacterial symbionts, as observed in tsetse flies. Both the primary and secondary symbionts of the tsetse fly, Wigglesworthia glossinidia and S. glossinidius respectively, are predicted to play a role in diet supplementation. They are thought to provide essential B vitamins that are not present in sufficient quantities in vertebrate blood (Akman et al., 2002; Nogge, 1981). Ca. A. arthropodicus may play a role in the louse fly analogous to that of S. glossinidius in the tsetse fly. Like the tsetse fly, the louse fly also harbors an ancient bacteriome-associated symbiont closely related to W. glossinidia (Dale et al., 2006).

Motility

Many recently acquired and some ancient insect symbionts retain the genes necessary to assemble functional flagella for locomotion (Akman et al., 2002; Darby et al., 2010; Degnan et al., 2010; Toh et al., 2006). Because these genes are prevalent in many symbiont genomes, it has been postulated that flagella play important roles. They may allow symbionts to infect specific tissues in the host insect, or may be required for motility during certain insect life stages, such as during reproduction when symbionts are maternally transmitted (Rio et al., 2012). Flagellar genes are present in Ca. A. arthropodicus and are located within a single 38 kb island. However, the genes encoding the essential flagellar components FliF, FliI, FliJ, FliP, FliS and FlhA (Berg, 2003) appear to have been truncated by frameshifting mutations or premature stop codons in Ca. A. arthropodicus (Table 2.3). In addition, genes encoding truncated components of the FliG and FliH proteins have been fused into a single ORF. We tested Ca. A. arthropodicus for motility using swarm plate assays, and no motility was observed. Furthermore, neither basal bodies nor complete flagella were visualized using transmission electron microscopy (data not shown). Based on this information, we predict that Ca. A. arthropodicus does not produce functional flagella. However, the limited inventory of flagellar gene components may provide an alternative function, perhaps in protein secretion, as has been predicted for B. aphidicola (Maezawa et al., 2006).

Table 2.3. Flagellar components in Ca. A. arthropodicus.

| Component | Status^a | Length^b | % Identity^c | Ortholog length^d | Function |
|---|---|---|---|---|---|
| FliZ | + | 171 | 76 | 176 | Regulation |
| FliA | + | 240 | 73 | 240 | Sigma factor |
| FliC | + | 378 | 58 | 363 | Flagellin |
| FliD | + | 468 | 47 | 472 | Filament cap |
| FliS | p | 47 | 58 | 132 | Chaperone (FliC) |
| FliT | + | 112 | 40 | 117 | Chaperone (FliD) |
| FliE | p | 42 | 72 | 110 | Basal body |
| FliF | p | 417 | 54 | 573 | M ring |
| FliG | p^e | 347^e | 72 | 332 | Rotor/switch complex |
| FliH | p^e | 347^e | 57 | 240 | Protein export |
| FliI | p | 289 | 71 | 457 | Export ATPase |
| FliJ | p | 108 | 49 | 148 | Chaperone |
| FliK | p | 322 | 37 | 470 | Hook-length control |
| FliL | + | 160 | 60 | 160 | Rotational control |
| FliM | + | 342 | 65 | 343 | Switch complex |
| FliN | + | 139 | 71 | 136 | Switch complex |
| FliO | + | 148 | 35 | 148 | Protein export |
| FliP | p | 146 | 80 | 256 | Protein export |
| FliQ | + | 89 | 62 | 89 | Protein export |
| FliR | + | 267 | 57 | 260 | Protein export |
| FlgL | + | 312 | 34 | 314 | Hook-filament junction |
| FlgK | + | 548 | 39 | 547 | Hook-filament junction |
| FlgJ | p | 65 | 54 | 328 | Peptidoglycan hydrolase |
| FlgI | p | 197 | 73 | 368 | P-ring |
| FlgH | + | 203 | 74 | 247 | L-ring |
| FlgG | + | 260 | 81 | 260 | Distal rod |
| FlgF | p | 134 | 60 | 251 | Proximal rod |
| FlgE | + | 419 | 57 | 406 | Hook |
| FlgD | + | 238 | 45 | 264 | Hook assembly |
| FlgC | + | 134 | 78 | 134 | Proximal rod |
| FlgB | + | 135 | 64 | 137 | Proximal rod |
| FlgA | p | 107 | 45 | 218 | P-ring assembly |
| FlgM | + | 100 | 33 | 99 | Anti-sigma factor |
| FlgN | p | 104 | 45 | 146 | Chaperone (FlgK/FlgL) |
| FlhA | p | 422 | 80 | 696 | Protein export |
| FlhB | + | 383 | 64 | 382 | Protein export |
| MotB | + | 293 | 59 | 349 | Stator |
| MotA | + | 297 | 71 | 297 | Stator |
| FlhC | + | 189 | 76 | 193 | Regulation |
| FlhD | + | 125 | 80 | 116 | Regulation |

^a (+) indicates an intact component; (p) indicates a predicted pseudogene.
^b Length of the largest component ORF in amino acyl residues.
^c Percent amino acid sequence identity shared by Ca. A. arthropodicus and P. mirabilis orthologs.
^d Length of P. mirabilis orthologs in amino acyl residues.
^e Truncated FliG/FliH components fused into a single ORF of the indicated size.

Insecticidal toxins

The Ca. A. arthropodicus genome retains remnants of the high molecular weight insecticidal toxin complexes (TCs) discovered in P. luminescens (Bowen et al., 1998).
Photorhabdus luminescens is a mutualistic symbiont of entomopathogenic nematodes. It plays a vital role in its host's life cycle, owing to the large number of insect-targeted toxins it produces (Waterfield et al., 2009). When a nematode harboring P. luminescens infects an insect, it releases the bacterial symbionts into the insect hemocoel. Under the conditions encountered in the insect, P. luminescens expresses the TCs and other toxic compounds that rapidly kill the insect. Both nematode and symbiont then obtain nutrition from, and reproduce within, the cadaver (Waterfield et al., 2009). The TC products encoded by P. luminescens are highly toxic to a wide range of insects (Blackburn et al., 2005; Waterfield et al., 2009). These insecticidal toxin genes would therefore not be expected in the genome of a mutualistic symbiont of insects. Full toxicity of the TCs requires three different types of toxin component: a TcA-like component, a TcB-like component and a TcC-like component (ffrench-Constant and Waterfield, 2005). Two adjacent genes in the Ca. A. arthropodicus genome, tcdA and tcdB, are homologs of members of the TcA and TcB classes, respectively. However, both ORFs contain one or more frameshifting mutations, likely rendering them nonfunctional. Two CDSs encoding TcC components, homologous to the tccC genes of P. luminescens, are located in separate regions of the genome. Both tccC CDSs maintain intact reading frames similar in size to homologs in P. luminescens and other related bacteria (Table 2.4), even though their predicted partner components have acquired frameshifting mutations. The presence of intact insecticidal tccC homologs in Ca. A. arthropodicus is intriguing, given that the other components required to produce a functional toxin complex have been inactivated by mutations.
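The text above contrasts genes disrupted by frameshifting mutations with genes that "maintain intact reading frames". A short Python sketch can illustrate that contrast. It scans reading frame 1 for stop codons (TAA, TAG and TGA, the stops of the bacterial genetic code) and reports whether one occurs before the final codon. It also compares an ORF's length with that of an ortholog, using lengths of the kind tabulated in Tables 2.3 and 2.4. The helper names and the 0.8 length-ratio cutoff are hypothetical choices made for illustration. They are not the criteria used to annotate pseudogenes in this study, and the toy DNA sequence is invented.

```python
from typing import Optional

# Stop codons shared by the standard and bacterial (table 11) genetic codes.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(cds: str) -> Optional[int]:
    """Return the codon index of the first stop in reading frame 1, or None."""
    cds = cds.upper()
    for start in range(0, len(cds) - 2, 3):
        if cds[start:start + 3] in STOP_CODONS:
            return start // 3
    return None


def has_premature_stop(cds: str) -> bool:
    """True if a stop codon occurs before the last complete codon of the CDS."""
    stop = first_in_frame_stop(cds)
    return stop is not None and stop < len(cds) // 3 - 1


def looks_truncated(orf_aa: int, ortholog_aa: int, min_ratio: float = 0.8) -> bool:
    """Flag an ORF much shorter than its ortholog (lengths in amino acyl residues).

    min_ratio is an arbitrary, hypothetical cutoff used only for illustration.
    """
    return orf_aa / ortholog_aa < min_ratio


if __name__ == "__main__":
    intact = "ATGCTAAGCTTTTAA"              # toy ORF: ATG CTA AGC TTT TAA
    frameshifted = intact[:3] + intact[4:]  # 1-bp deletion: ATG TAA GCT TTT AA
    print(has_premature_stop(intact))        # False
    print(has_premature_stop(frameshifted))  # True
    print(looks_truncated(47, 132))          # FliS vs. P. mirabilis (Table 2.3): True
    print(looks_truncated(1035, 1043))       # TccC1 vs. P. luminescens (Table 2.4): False
```

Deleting a single base from the toy ORF shifts the reading frame so that a stop codon appears at the second codon. This is the kind of frameshift reported for tcdA and tcdB. The length check alone is crude. For example, the fused FliG/FliH ORF in Table 2.3 (347 residues against a 332-residue FliG ortholog) would pass it, so inspection of the reading frame is still needed.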
The TccC components in P. luminescens show sequence similarity to other RHS/YD-repeat proteins. They have ADP-ribosyltransferase activity and function in actin polymerization and cytoskeletal rearrangement in insect cells (Lang et al., 2010a; 2010b). This is also the mode of action of many other bacterial toxins, as well as of some type III secretion system effector proteins (Aktories et al., 2011; Dean, 2011). Thus, TccC may play a role independent of its TC partners within the Ca. A. arthropodicus-louse fly association, again reflecting a transition from a parasitic to a mutualistic lifestyle. It is also possible that these toxin components do not provide a useful function in the symbiosis. These candidate genes may simply not yet have accumulated mutations that allow us to identify them as inactive. After association with a host insect, many genes that are not required in the new environment are predicted to evolve under relaxed selection. Such genes are prone to accumulating nonsense mutations or insertions/deletions (Burke and Moran, 2011). However, a lag is anticipated between the onset of relaxed selection and the accumulation of an inactivating mutation, so any given genome could maintain a subset of "cryptic" pseudogenes (Burke and Moran, 2011). Given that the TcA and TcB components have already been inactivated in Ca. A. arthropodicus, the remaining TcC components may represent such cryptic pseudogenes with no functional role in the symbiosis.

Table 2.4. Homologs of TccC1 and TccC2 of Ca. A. arthropodicus identified using BLAST.

| Organism | Locus tag | Product | Size^a | % ID^b | % ID^c |
|---|---|---|---|---|---|
| Ca. A. arthropodicus | ARA_06730 | TccC1 | 1035 | - | 58 |
| Ca. A. arthropodicus | ARA_04415 | TccC2 | 860 | 58 | - |
| Photorhabdus luminescens | plu4167 | TccC1 | 1043 | 52 | 58 |
| Photorhabdus luminescens | plu4182 | TccC6 | 965 | 58 | 57 |
| Serratia entomophila | pADAP_57 | SepC | 973 | 58 | 61 |
| Serratia proteamaculans | Spro_0382 | YD repeat protein | 852 | 57 | 59 |
| Xenorhabdus bovienii | XBJ1_1574 | TccC | 932 | 59 | 63 |
| Xenorhabdus nematophila | XNC1_2567 | TccC | 1016 | 47 | 59 |
| Yersinia pestis | y2020 | Insecticidal toxin | 874 | 55 | 59 |
| Yersinia pseudotuberculosis | YPTS_2310 | YD repeat protein | 994 | 55 | 59 |

^a Size of product in amino acyl residues.
^b Percent amino acid identity shared with TccC1 of Ca. A. arthropodicus.
^c Percent amino acid identity shared with TccC2 of Ca. A. arthropodicus.

Other toxins

A CDS sharing 68% amino acid sequence identity with the product of the Yersinia murine toxin (ymt) gene is present in Ca. A. arthropodicus. This murine toxin displays phospholipase D activity and is toxic to mice and rats. It has also been shown to be required for Yersinia pestis survival in the midgut of its flea vector (Hinnebusch et al., 2002). In Y. pestis, Ymt is predicted to protect the bacterium from cytotoxic products originating from the flea's digestion of its blood meal (Hinnebusch et al., 2002). Since the louse fly host of Ca. A. arthropodicus also feeds exclusively on vertebrate blood, this gene may play a similar role. In addition, a number of candidate ORFs in Ca. A. arthropodicus share homology with the repeats-in-toxin (RTX) family of proteins (Lin et al., 1999) and the RTX ABC transporter. Some of the transporter genes appear to be intact. However, the genes encoding the large RTX proteins contain numerous frameshifting mutations and are broken up into many fragments.
Small gene fragments also remain that share significant sequence identity with the mcf (makes caterpillars floppy) gene. This gene produces a toxin involved in cell lysis of insect midguts (Dowling et al., 2004) and is used by P. luminescens to kill insects. The TC components identified here, together with the remaining fragments of RTX and Mcf toxins, suggest that Ca. A. arthropodicus was an insect pathogen at some point in its evolutionary past. These virulence properties appear to have been attenuated in the switch to a mutualistic relationship with the louse fly.

Type III secretion systems

Type III secretion systems (T3SSs) are commonly used by pathogenic bacteria to inject effector proteins directly into the cytosol of eukaryotic host cells (Hueck, 1998). These effector proteins often cause host cytoskeletal rearrangements that allow bacterial pathogens to invade host cells. They may also alter host immune responses or induce host cell apoptosis (Mota and Cornelis, 2005). Genes encoding the type III secretion apparatus, or injectisome, have been identified in divergent bacterial pathogens and in recently associated bacterial symbionts, and many of the structural genes are conserved between them (Cornelis, 2006). The effectors translocated via this system, however, may differ substantially between bacterial species, leading to different effects on the host (Dean, 2011; Mota and Cornelis, 2005). For example, the tsetse symbiont S. glossinidius has two islands encoding T3SS genes, syntenic with T3SS islands identified in Yersinia and Salmonella spp. (Dale et al., 2001; Dale et al., 2005). In these two bacterial pathogens, the T3SS effectors suppress host immune responses and inhibit phagocytosis, allowing invasion of host cells, and may also induce host cell apoptosis (Cornelis, 2002; Pavlova et al., 2011).
Sodalis glossinidius appears to use these systems in a similar manner to initiate successful intracellular infections in a tsetse fly host, although without detrimental cytotoxic effects on host cells (Dale et al., 2001; Dale et al., 2005). The genome of Ca. A. arthropodicus contains three islands of genes encoding T3SS components. As in S. glossinidius, island 1 appears to be related to the T3SS of Yersinia spp., with similar gene content and organization (Fig. 2.6A). The other two islands show synteny with the SPI-1 island found in Salmonella spp. (Fig. 2.6B; Dale et al., 2005; Hueck, 1998). One of the SPI-1-like islands in Ca. A. arthropodicus contains an inversion of the genes encoding PrgH-HrpE relative to SPI-1. It is also interrupted between invA and invB by a phage element (Fig. 2.6B). Each island contains genes that have accumulated frameshifting mutations or large truncations, indicating that no single island encodes a complete secretion apparatus (Fig. 2.6A, B). However, the individual genic components of the different islands might function together to facilitate secretion. At least one full-length copy is present for every gene predicted to be required for successful secretion, with three exceptions: the genes encoding SctV/InvA, SctU/SpaS and SctC/InvG. The SctU and SctV proteins each contain a transmembrane domain and a cytosolic domain. The transmembrane domain is involved in formation of the export pore in the inner membrane, and the cytosolic domain in recognition and switching of the substrates to be exported. SctC is a secretin that forms the channel through the outer membrane (Diepold et al., 2011). These three proteins are predicted to be necessary for formation of a functional injectisome. In Ca. A. arthropodicus, however, all three copies of the genes encoding these products contain mutations causing frameshifts or premature stop codons.

Figure 2.6. Gene content and organization of type III secretion system islands. A. The T3SS island from Yersinia enterocolitica compared to island 1 (23.7 kb) in Ca. A. arthropodicus. B. The SPI1 T3SS island from Salmonella enterica serovar Typhimurium compared to two islands of genes found in Ca. A. arthropodicus: island 2 (22.9 kb) and island 3 (21.0 kb of T3SS genes, split by a 26.8 kb phage). Arrows indicate gene orientation and colors indicate the predicted role of the gene products named (secretion apparatus, translocator/effector, hypothetical protein, chaperone, regulation). Dotted lines indicate locations which contain genes in other islands. Anticipated pseudogenes are marked.

Flagellar gene products might complement the T3SS components in Ca. A.
arthropodicus, given that many structural elements of T3SSs and flagella are functionally similar (Aizawa, 2001) and the two structures are predicted to share a common ancestry (Gophna et al., 2003; Saier, 2004). A recent study by Stone et al. (2010) shows that the non-motile intracellular pathogen Chlamydia pneumoniae retains three flagellar homologs, flhA, fliF and fliI. Their products can interact and co-purify with certain T3SS components. The truncated T3SS proteins in Ca. A. arthropodicus, SctV and SctU, are structurally related to flagellar FlhA and FlhB, respectively. An intact CDS encoding FlhB is present in Ca. A. arthropodicus (Table 2.3). However, flhA contains a frameshifting mutation that creates a premature stop codon (Table 2.3). It is still unclear which flagellar protein is structurally and functionally closest to SctC. The outer ring component FlgI has been suggested (Aizawa, 2001), and flgI also contains frameshifting mutations in Ca. A. arthropodicus (Table 2.3). At this point it is unclear whether any of the truncated T3SS or flagellar products remain functional. Whether Ca. A. arthropodicus requires an operational T3SS needs to be tested using genetic approaches.

Insertion sequences and phage regions

There are 22 copies of a bacterial insertion sequence (IS) element that shares homology with IS911, a member of the IS3 family of IS elements. These IS elements generally consist of two adjacent, partially overlapping ORFs (Rousseau et al., 2004). BLAST homology searches show that the two consecutive ORFs in Ca. A. arthropodicus are most similar to a 283 amino acid ORF and a 103 amino acid ORF of an IS3/IS911 family element of Shewanella denitrificans. At seven positions in the genome, this IS element appears full-length and intact. The remaining 15 copies have similar truncations in both ORF components.
These truncated elements may still retain the activity required for transposition, or may be mobilized in trans using the integrase and transposase enzymes from the intact elements.

A large number of repetitive phage genes are present in the Ca. A. arthropodicus genome. Putative phage elements comprise 10% of the genome sequence and 15% of the predicted CDSs. Some of these phage genes share sequence similarity with those of the APSE bacteriophages present in H. defensa, a facultative symbiont of aphids (Degnan et al., 2009; Moran et al., 2005). The APSE phages in H. defensa carry genes encoding a variety of toxins, such as Shiga toxin, cytolethal distending toxin and YD-repeat proteins. These toxins protect the aphid host against parasitoid wasps (Degnan and Moran, 2008; Oliver et al., 2009). The Ca. A. arthropodicus genome contains multiple copies of APSE phage structural and regulatory sequences, but no intact toxin-encoding genes from these elements. A gene fragment sharing homology with the YD-repeat proteins is present. However, only a small portion of the gene remains, and it has likely been inactivated by deletion of most of the ORF. Therefore, Ca. A. arthropodicus does not appear to use these phages as H. defensa does, to protect the louse fly against parasitoid infection. However, we cannot rule out the possibility that other insecticidal proteins, such as the TccC components, provide an analogous function, allowing these symbionts to play a role in host defense.

Conclusion

The genome sequence of Ca. A. arthropodicus provides evidence of a genome in transition from a bygone parasitic lifestyle to that of a mutualistic associate of insects.
It shares genomic traits common to secondary symbionts with a recent host association. These include a decreased inventory of functional genes and increased numbers of repetitive elements compared with free-living relatives. It still retains a larger gene inventory than insect symbionts with an ancient origin of association, however, demonstrating that this transition to insect mutualism is still in progress. The genome contains potential toxins and virulence factors, as well as islands of genes encoding type III secretion system components and effectors. Their presence implies that the ancestor of this symbiont was a pathogen. The switch to a mutualistic insect-associated lifestyle is predicted to have led to inactivation of toxins and effectors that would cause direct harm to the host. This is evidenced by the inactivated remnants of various toxins, such as the RTX and TC components. The continued maintenance of some intact predicted virulence factors shared with pathogenic bacteria may indicate that mutualistic symbionts interact with host cells using the same machinery as pathogens, but without decreasing host fitness. However, this hypothesis would need to be tested using genetic experiments. The genome sequence of Ca. A. arthropodicus provides further insight into potential roles this symbiont might play in its wide range of insect hosts. Within the louse fly, this symbiont may supplement the host blood diet with B vitamins, given the retention of genes involved in vitamin synthesis, although other functions cannot be ruled out. Other insect symbionts are known to perform a variety of roles beyond diet supplementation. Some increase host defenses against parasitoids by expressing toxins with insecticidal activity (Oliver et al., 2003, 2009). Others increase host tolerance to thermal stress (Montllor et al., 2002; Russell and Moran, 2006). The genome of Ca. A.
arthropodicus contains elements related to thermal stress and insect pathogenicity that might also be used in these manners to enhance louse fly host fitness, including numerous heat shock proteins 42 43 as well as T3SS effectors and the TccC components of the insecticidal toxin complexes. Genetic experimentation and the use of aposymbiotic lines of flies will be necessary to further elucidate if Ca. A. arthropodicus functions in one of these capacities similar to other symbionts or if it has a unique role within louse flies. Since Ca. A. arthropodicus has been isolated in culture, the genome sequence can now be used to enable genetic testing of the utility of specific genes in the symbiosis. To date, little is known about the interactions that occur between symbiont and host on a molecular level since few insect symbionts have proved amenable to axenic culture and genetic modification. This work provides a complete gene inventory for a symbiont that can be cultured and manipulated in the lab, facilitating further study of symbiont gene functions in vivo. References Aizawa, S. I. 2001. Bacterial flagella and type III secretion systems. FEMS Microbiol Lett, 202:157-64. Akman, L., A. Yamashita, H. Watanabe, K. Oshima, T. Shiba, M. Hattori, and S. Aksoy. 2002. Genome sequence of the endocellular obligate symbiont of tsetse flies, Wigglesworthia glossinidia. Nat Genet, 32:402-7. Aktories, K., A. E. Lang, C. Schwan, and H. G. Mannherz. 2011. Actin as target for modification by bacterial protein toxins. FEBS J, 278:4526-43. Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. 1990. Basic local alignment search tool. JMolBiol, 215:403-10. Belda, E., A. Moya, S. Bentley, and F. J. Silva. 2010. Mobile genetic element proliferation and gene inactivation impact over the genome structure and metabolic capabilities of Sodalis glossinidius, the secondary endosymbiont of tsetse flies. BMC Genomics, 11:449. Blackburn, M. B., J. M. Domek, D. B. Gelman, and J. S. Hu. 
2005. The broadly insecticidal Photorhabdus luminescens toxin complex a (Tca): activity against the Colorado potato beetle, Leptinotarsa decemlineata, and sweet potato whitefly, 44 Bemisia tabaci. J Insect Sci, 5:32. Blattner, F. R., G. Plunkett, 3rd, C. A. Bloch, N. T. Perna, V. Burland, M. Riley, J. Collado-Vides, J. D. Glasner, C. K. Rode, G. F. Mayhew, et al. 1997. The complete genome sequence of Escherichia coli K-12. Science, 277:1453-62. Borodovsky, M. and J. McIninch. 1993. GeneMark: Parallel gene recognition for both DNA strands. Comput. Chem, 17:123-133. Boussau, B., and M. Gouy. 2006. Efficient likelihood computations with nonreversible models of evolution. SystBiol, 55:756-68. Bowen, D., T. A. Rocheleau, M. Blackburn, O. Andreev, E. Golubeva, R. Bhartia, and R. H. ffrench-Constant. 1998. Insecticidal toxins from the bacterium Photorhabdus luminescens. Science, 280:2129-32. Burke, G. R., and N. A. Moran. 2011. Massive genomic decay in Serratia symbiotica, a recently evolved symbiont of aphids. Genome BiolEvol, 3:195-208. Carver, T., N. Thomson, A. Bleasby, M. Berriman, and J. Parkhill. 2009. DNAPlotter: circular and linear interactive genome visualization. Bioinformatics, 25:119-20. Cicmil, N., and R. H. Huang. 2008. Crystal structure of QueC from Bacillus subtilis: an enzyme involved in preQ1 biosynthesis. Proteins, 72:1084-8. Cornelis, G. R. 2002. Yersinia type III secretion: send in the effectors. J Cell Biol, 158:401-8. Cornelis, G. R. 2006. The type III secretion injectisome. Nat Rev Microbiol, 4:811-25. Corti, A., A. F. Casini, and A. Pompella. Cellular pathways for transport and efflux of ascorbate and dehydroascorbate. Arch Biochem Biophys, 500:107-15. Dale, C., M. Beeton, C. Harbison, T. Jones, and M. Pontes. 2006. Isolation, pure culture, and characterization of "Candidatus Arsenophonus arthropodicus," an intracellular secondary endosymbiont from the hippoboscid louse fly Pseudolynchia canariensis. ApplEnviron Microbiol, 72:2997-3004. 
Dale, C., T. Jones, and M. Pontes. 2005. Degenerative evolution and functional diversification of type-III secretion systems in the insect endosymbiont Sodalis glossinidius. Mol Biol Evol, 22:758-66.
Dale, C., and N. A. Moran. 2006. Molecular interactions between bacterial symbionts and their hosts. Cell, 126:453-65.
Dale, C., S. A. Young, D. T. Haydon, and S. C. Welburn. 2001. The insect endosymbiont Sodalis glossinidius utilizes a type III secretion system for cell invasion. Proc Natl Acad Sci U S A, 98:1883-8.
Dale, J. M., L. Popescu, and P. D. Karp. 2010. Machine learning methods for metabolic pathway prediction. BMC Bioinformatics, 11:15.
Darby, A. C., J. H. Choi, T. Wilkes, M. A. Hughes, J. H. Werren, G. D. Hurst, and J. K. Colbourne. 2010. Characteristics of the genome of Arsenophonus nasoniae, son-killer bacterium of the wasp Nasonia. Insect Mol Biol, 19 Suppl 1:75-89.
Dean, P. 2011. Functional domains and motifs of bacterial type III effector proteins and their roles in infection. FEMS Microbiol Rev, 35:1100-25.
Degnan, P. H., T. E. Leonardo, B. N. Cass, B. Hurwitz, D. Stern, R. A. Gibbs, S. Richards, and N. A. Moran. 2010. Dynamics of genome evolution in facultative symbionts of aphids. Environ Microbiol, 12:2060-2069.
Degnan, P. H., and N. A. Moran. 2008. Diverse phage-encoded toxins in a protective insect endosymbiont. Appl Environ Microbiol.
Degnan, P. H., Y. Yu, N. Sisneros, R. A. Wing, and N. A. Moran. 2009. Hamiltonella defensa, genome evolution of protective bacterial endosymbiont from pathogenic ancestors. Proc Natl Acad Sci U S A, 106:9063-9068.
Delcher, A. L., D. Harmon, S. Kasif, O. White, and S. L. Salzberg. 1999. Improved microbial gene identification with GLIMMER. Nucleic Acids Res, 27:4636-4641.
Diepold, A., U. Wiesand, and G. R. Cornelis. 2011. The assembly of the export apparatus (YscR,S,T,U,V) of the Yersinia type III secretion apparatus occurs independently of other structural components and involves the formation of an YscV oligomer. Mol Microbiol, 82:502-14.
Dowling, A. J., P. J. Daborn, N. R. Waterfield, P. Wang, C. H. Streuli, and R. H. ffrench-Constant. 2004. The insecticidal toxin Makes caterpillars floppy (Mcf) promotes apoptosis in mammalian cells. Cell Microbiol, 6:345-53.
Duchaud, E., C. Rusniok, L. Frangeul, C. Buchrieser, A. Givaudan, S. Taourit, S. Bocs, C. Boursaux-Eude, M. Chandler, J. F. Charles, et al. 2003. The genome sequence of the entomopathogenic bacterium Photorhabdus luminescens. Nat Biotechnol, 21:1307-13.
Duron, O., D. Bouchon, S. Boutin, L. Bellamy, L. Zhou, J. Engelstadter, and G. D. Hurst. 2008. The diversity of reproductive parasites among arthropods: Wolbachia do not walk alone. BMC Biol, 6:27.
Ewald, P. W. 1987. Transmission modes and evolution of the parasitism-mutualism continuum. Ann N Y Acad Sci, 503:295-306.
Ewing, B., and P. Green. 1998. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res, 8:186-194.
Ewing, B., L. Hillier, M. Wendl, and P. Green. 1998. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res, 8:175-185.
ffrench-Constant, R., and N. Waterfield. 2005. An ABC guide to the bacterial toxin complexes. Adv Appl Microbiol, 58C:169-183.
Glasner, J. D., C. H. Yang, S. Reverchon, N. Hugouvieux-Cotte-Pattat, G. Condemine, J. P. Bohin, F. Van Gijsegem, S. Yang, T. Franza, D. Expert, et al. 2011. Genome sequence of the plant-pathogenic bacterium Dickeya dadantii 3937. J Bacteriol, 193:2076-7.
Gophna, U., E. Z. Ron, and D. Graur. 2003. Bacterial type III secretion systems are ancient and evolved by multiple horizontal-transfer events. Gene, 312:151-63.
Gordon, D., C. Abajian, and P. Green. 1998. Consed: a graphical tool for sequence finishing. Genome Res, 8:195-202.
Guindon, S., and O. Gascuel. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol, 52:696-704.
Hinnebusch, B. J., A. E. Rudolph, P. Cherepanov, J. E. Dixon, T. G. Schwan, and A. Forsberg. 2002. Role of Yersinia murine toxin in survival of Yersinia pestis in the midgut of the flea vector. Science, 296:733-5.
Hueck, C. J. 1998. Type III protein secretion systems in bacterial pathogens of animals and plants. Microbiol Mol Biol Rev, 62:379-433.
Kanehisa, M., and S. Goto. 2000. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res, 28:27-30.
Lang, A. E., G. Schmidt, A. Schlosser, T. D. Hey, I. M. Larrinua, J. J. Sheets, H. G. Mannherz, and K. Aktories. 2010a. Photorhabdus luminescens toxins ADP-ribosylate actin and RhoA to force actin clustering. Science, 327:1139-42.
Lang, A. E., G. Schmidt, J. J. Sheets, and K. Aktories. 2010b. Targeting of the actin cytoskeleton by insecticidal toxins from Photorhabdus luminescens. Naunyn Schmiedebergs Arch Pharmacol, 383:227-35.
Larkin, M. A., G. Blackshields, N. P. Brown, R. Chenna, P. A. McGettigan, H. McWilliam, F. Valentin, I. M. Wallace, A. Wilm, R. Lopez, J. D. Thompson, T. J. Gibson, and D. G. Higgins. 2007. Clustal W and Clustal X version 2.0. Bioinformatics, 23:2947-8.
Lin, W., K. J. Fullner, R. Clayton, J. A. Sexton, M. B. Rogers, K. E. Calia, S. B. Calderwood, C. Fraser, and J. J. Mekalanos. 1999. Identification of a Vibrio cholerae RTX toxin gene cluster that is tightly linked to the cholera toxin prophage. Proc Natl Acad Sci U S A, 96:1071-6.
Lowe, T. M., and S. R. Eddy. 1997. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res, 25:955-964.
Lukashin, A., and M. Borodovsky. 1998. GeneMark.hmm: new solutions for gene finding. Nucleic Acids Res, 26:1107-1115.
Maezawa, K., S. Shigenobu, H. Taniguchi, T. Kubo, S. Aizawa, and M. Morioka. 2006. Hundreds of flagellar basal bodies cover the cell surface of the endosymbiotic bacterium Buchnera aphidicola sp. strain APS. J Bacteriol, 188:6539-43.
Marchler-Bauer, A., S. Lu, J. B. Anderson, F. Chitsaz, M. K. Derbyshire, C. DeWeese-Scott, J. H. Fong, L. Y. Geer, R. C. Geer, N. R. Gonzales, et al. 2011. CDD: a Conserved Domain Database for the functional annotation of proteins. Nucleic Acids Res, 39:D225-9.
McCutcheon, J. P., and N. A. Moran. 2012. Extreme genome reduction in symbiotic bacteria. Nat Rev Microbiol.
Montllor, C. B., A. Maxmen, and A. H. Purcell. 2002. Facultative bacterial endosymbionts benefit pea aphids Acyrthosiphon pisum under heat stress. Ecol Entomol, 27:189-195.
Moran, N. A. 1996. Accelerated evolution and Muller's rachet in endosymbiotic bacteria. Proc Natl Acad Sci U S A, 93:2873-8.
Moran, N. A., P. H. Degnan, S. R. Santos, H. E. Dunbar, and H. Ochman. 2005. The players in a mutualistic symbiosis: insects, bacteria, viruses, and virulence genes. Proc Natl Acad Sci U S A, 102:16919-26.
Moran, N. A., G. R. Plague, J. P. Sandstrom, and J. L. Wilcox. 2003. A genomic perspective on nutrient provisioning by bacterial symbionts of insects. Proc Natl Acad Sci U S A, 100 Suppl 2:14543-8.
Moriya, Y., M. Itoh, S. Okuda, A. C. Yoshizawa, and M. Kanehisa. 2007. KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res, 35:W182-5.
Mota, L. J., and G. R. Cornelis. 2005. The bacterial injection kit: type III secretion systems. Ann Med, 37:234-49.
Moya, A., J. Pereto, R. Gil, and A. Latorre. 2008. Learning how to live together: genomic insights into prokaryote-animal symbioses. Nat Rev Genet, 9:218-29.
Nogge, G. 1981. Significance of symbionts for the maintenance of an optimal nutritional state for successful reproduction in hematophagous arthropods. Parasitol, 82:101-104.
Novakova, E., V. Hypsa, and N. A. Moran. 2009. Arsenophonus, an emerging clade of intracellular symbionts with a broad host distribution. BMC Microbiol, 9:143.
Oliver, K. M., P. H. Degnan, M. S. Hunter, and N. A. Moran. 2009. Bacteriophages encode factors required for protection in a symbiotic mutualism. Science, 325:992-4.
Oliver, K. M., J. A. Russell, N. A. Moran, and M. S. Hunter. 2003. Facultative bacterial symbionts in aphids confer resistance to parasitic wasps. Proc Natl Acad Sci U S A, 100:1803-7.
Paul, K., G. Gonzalez-Bonet, A. M. Bilwes, B. R. Crane, and D. Blair. 2011. Architecture of the flagellar rotor. EMBO J, 30:2962-71.
Pavlova, B., J. Volf, P. Ondrackova, J. Matiasovic, H. Stepanova, M. Crhanova, D. Karasova, M. Faldyna, and I. Rychlik. 2011. SPI-1-encoded type III secretion system of Salmonella enterica is required for the suppression of porcine alveolar macrophage cytokine expression. Vet Res, 42:16.
Pearson, M. M., M. Sebaihia, C. Churcher, M. A. Quail, A. S. Seshasayee, N. M. Luscombe, Z. Abdellah, C. Arrosmith, B. Atkin, T. Chillingworth, et al. 2008. Complete genome sequence of uropathogenic Proteus mirabilis, a master of both adherence and motility. J Bacteriol, 190:4027-37.
Pichon, S., D. Bouchon, R. Cordaux, L. Chen, R. A. Garrett, and P. Greve. 2009. Conservation of the Type IV secretion system throughout Wolbachia evolution. Biochem Biophys Res Commun, 385:557-62.
Rances, E., D. Voronin, V. Tran-Van, and P. Mavingui. 2008. Genetic and functional characterization of the type IV secretion system in Wolbachia. J Bacteriol, 190:5020-30.
Rio, R. V., R. E. Symula, J. Wang, C. Lohs, Y. N. Wu, A. K. Snyder, R. D. Bjornson, K. Oshima, B. S. Biehl, N. T. Perna, M. Hattori, and S. Aksoy. 2012. Insight into the transmission biology and species-specific functional capabilities of tsetse (Diptera: Glossinidae) obligate symbiont Wigglesworthia. MBio, 3.
Robb, F. T., D. L. Maeder, J. R. Brown, J. DiRuggiero, M. D. Stump, R. K. Yeh, R. B. Weiss, and D. M. Dunn. 2001. Genomic sequence of hyperthermophile, Pyrococcus furiosus: implications for physiology and enzymology. Methods Enzymol, 330:134-57.
Rousseau, P., E. Gueguen, G. Duval-Valentin, and M. Chandler. 2004. The helix-turn-helix motif of bacterial insertion sequence IS911 transposase is required for DNA binding. Nucleic Acids Res, 32:1335-44.
Russell, J. A., and N. A. Moran. 2006. Costs and benefits of symbiont infection in aphids: variation among symbionts and across temperatures. Proc Biol Sci, 273:603-10.
Rutherford, K., J. Parkhill, J. Crook, T. Horsnell, P. Rice, M. A. Rajandream, and B. Barrell. 2000. Artemis: sequence visualization and annotation. Bioinformatics, 16:944-5.
Saier, M. H., Jr. 2004. Evolution of bacterial type III protein secretion systems. Trends Microbiol, 12:113-5.
Shigenobu, S., H. Watanabe, M. Hattori, Y. Sakaki, and H. Ishikawa. 2000. Genome sequence of the endocellular bacterial symbiont of aphids Buchnera sp. APS. Nature, 407:81-6.
Sorfova, P., A. Skerikova, and V. Hypsa. 2008. An effect of 16S rRNA intercistronic variability on coevolutionary analysis in symbiotic bacteria: molecular phylogeny of Arsenophonus triatominarum. Syst Appl Microbiol, 31:88-100.
Steinert, M., U. Hentschel, and J. Hacker. 2000. Symbiosis and pathogenesis: evolution of the microbe-host interaction. Naturwissenschaften, 87:1-11.
Stone, C. B., D. C. Bulir, J. D. Gilchrist, R. K. Toor, and J. B. Mahony. 2010. Interactions between flagellar and type III secretion proteins in Chlamydia pneumoniae. BMC Microbiol, 10:18.
Tatusov, R. L., E. V. Koonin, and D. J. Lipman. 1997. A genomic perspective on protein families. Science, 278:631-7.
Toft, C., and M. A. Fares. 2008. The evolution of the flagellar assembly pathway in endosymbiotic bacterial genomes. Mol Biol Evol, 25:2069-76.
Toh, H., B. L. Weiss, S. A. Perkin, A. Yamashita, K. Oshima, M. Hattori, and S. Aksoy. 2006. Massive genome erosion and functional adaptations provide insights into the symbiotic lifestyle of Sodalis glossinidius in the tsetse host. Genome Res, 16:149-56.
Waterfield, N. R., T. Ciche, and D. Clarke. 2009. Photorhabdus and a host of hosts. Annu Rev Microbiol, 63:557-74.
Watson, W. T., T. D. Minogue, D. L. Val, S. B. von Bodman, and M. E. Churchill. 2002. Structural basis and specificity of acyl-homoserine lactone signal production in bacterial quorum sensing. Mol Cell, 9:685-94.
Weeks, A. R., M. Turelli, W. R. Harcombe, K. T. Reynolds, and A. A. Hoffmann. 2007. From parasite to mutualist: rapid evolution of Wolbachia in natural populations of Drosophila. PLoS Biol, 5:e114.
Wilkes, T. E., O. Duron, A. C. Darby, V. Hypsa, E. Novakova, and G. D. D. Hurst. 2011. The genus Arsenophonus. In E. Zchori-Fein and K. Bourtzis (eds.), Manipulative tenants: bacteria associated with arthropods. CRC Press, Boca Raton, FL.
CHAPTER 3
COMPARATIVE ANALYSIS OF THE LOUSE FLY SYMBIONT, CANDIDATUS ARSENOPHONUS ARTHROPODICUS, AND THE PARASITOID WASP SYMBIONT, ARSENOPHONUS NASONIAE
Abstract
The Arsenophonus clade of bacteria consists of symbionts associated with a wide range of insect hosts that have different diets and lifestyles. These bacteria interact with their hosts in many ways, ranging from mutualistic to parasitic, with modes of transmission ranging from vertical to horizontal. We have compared the genome sequences of two members of the Arsenophonus clade: the mutualistic louse fly symbiont, Candidatus Arsenophonus arthropodicus, and the reproductive parasite of parasitoid wasps, Arsenophonus nasoniae. These two symbionts are closely related and have genomes that share a high level of synteny, although they reside in different insect hosts and have distinct effects on them. Analysis of gene inventory and pseudogene content suggests that both of these symbionts have become associated with insects recently, although Ca. A. arthropodicus has undergone more degenerative evolution than A. nasoniae.
Compared to free-living relatives, both Arsenophonus symbionts have lost regulatory elements, likely owing to their lifestyles within the static insect host environment, and have accumulated pseudogenes and repetitive elements, similar to other recently associated insect symbionts. Given the diversity of associations formed by Arsenophonus symbionts, this group presents an interesting system with which to study the molecular mechanisms of bacterial interactions with insects.
Introduction
Many microbial genome sequencing projects have been completed over the past decade, providing a wealth of information on a wide variety of bacterial species, including many bacterial symbionts of insects. Comparative genomics provides a means of using these data to understand the evolutionary histories of these bacteria and their roles in their hosts. Here we compare the genome sequences of two members of the genus Arsenophonus, which includes species now identified in association with a wide range of distantly related arthropod hosts, including plant-feeding insects such as aphids and psyllids, and blood-feeding parasites of vertebrates such as ticks, triatomines and flies (Duron et al., 2008; Novakova et al., 2009). Arsenophonus species include both obligate, bacteriome-associated symbionts and recently acquired facultative symbionts, and different species may be transmitted vertically or horizontally (Wilkes et al., 2011). Owing to the diversity of association types involving Arsenophonus species and the differences in their transmission strategies among insect hosts, this genus is a useful group of bacteria with which to investigate the mechanistic basis of insect symbiosis.
Two Arsenophonus species have been isolated in axenic culture, and they are the first members of this group to have their genomes sequenced: Arsenophonus nasoniae, a symbiont of a parasitoid wasp (Darby et al., 2010; Gherna et al., 1991; Werren et al., 1986), and Candidatus Arsenophonus arthropodicus, a symbiont of the pigeon louse fly (Dale et al., 2006). These two closely related symbionts inhabit insects with different lifestyles, and the bacteria appear to have distinct effects on their hosts, making them useful for comparing genes involved in insect association.
Arsenophonus nasoniae is a bacterial symbiont of the hymenopteran parasitoid wasp Nasonia vitripennis. Parasitoid wasps such as N. vitripennis parasitize cyclorrhaphous dipteran flies, ovipositing their own offspring into host fly pupae (Danneels et al., 2010). Prior to oviposition, the wasps inject venom into the fly pupa, which leads to immune suppression, apoptosis and eventually death of the fly, allowing the wasp larvae to develop and subsist on the fly pupa (Danneels et al., 2010). Only a proportion of N. vitripennis wasps harbor populations of A. nasoniae, but infected wasps inject the bacteria along with their eggs into the fly pupal host. Arsenophonus nasoniae then colonizes and replicates within the parasitized fly pupa (Werren et al., 1986; Wilkes et al., 2010). When the developing wasp larvae feed on the pupa, they ingest the bacteria, which then pass through the gut to colonize the wasp (Darby et al., 2010). Although this is the route by which A. nasoniae is transmitted vertically from mother to offspring, it also allows horizontal transfer to the offspring of uninfected parents when the same fly pupa is parasitized by multiple wasps (Duron et al., 2010; Huger et al., 1985). Nasonia vitripennis wasps that have acquired an A. nasoniae infection produce female-biased offspring because male embryos are killed (Gherna et al., 1991; Werren et al., 1986). Nasonia vitripennis is haplodiploid, and this male killing results from the failure of maternally derived centrosomes to form in unfertilized (male) eggs (Ferree et al., 2008). This type of male-killing is predicted to be caused by a small molecule or effector that can cross the wasp egg membrane and disrupt formation of the maternal centrosome (Ferree et al., 2008; Wilkes et al., 2010). Thus, A. nasoniae causes reproductive distortions in its host similar to those observed in insects infected with Wolbachia (Bandi et al., 2001).
Candidatus Arsenophonus arthropodicus is a symbiont of the dipteran hippoboscid louse fly, Pseudolynchia canariensis. The louse fly is an obligate blood-feeding parasite of rock pigeons (Columba livia) that spends most of its life on its bird host, with females leaving only to deposit offspring in the nest material or to transfer to a new host (Bequaert, 1953; Marshall, 1981). Candidatus Arsenophonus arthropodicus resides in a variety of louse fly tissues, including the hemolymph, gut, fat body and reproductive tissues (Dale et al., 2006). This symbiont is found in both male and female louse flies and does not appear to have any reproductive effects on its host, with an observed sex ratio of 50:50 (Dale et al., 2006). The louse fly is related to the tsetse fly, and both reproduce by adenotrophic viviparity, in which a single egg is fertilized at a time and all larval stages occur within the uterus of the mother, with the larva obtaining nutrition from milk gland secretions (Attardo et al., 2008). The tsetse fly maintains two different symbionts: a bacteriome-associated primary symbiont, Wigglesworthia glossinidia (Aksoy, 1995), and a secondary symbiont found in multiple fly tissues, Sodalis glossinidius (Dale and Maudlin, 1999).
In addition to Ca. A. arthropodicus, the louse fly, like tsetse, harbors an obligate primary symbiont that is closely related to W. glossinidia (Dale et al., 2006). The louse fly primary symbiont likely plays a nutritive role in its host, similar to that of W. glossinidia, which contributes to dietary supplementation and blood meal digestion (Akman et al., 2002; Nogge, 1981; Pais et al., 2008). Both louse fly symbionts are transmitted vertically to offspring through the milk gland secretions on which larvae feed while developing in utero, in a manner similar to symbiont transmission in tsetse flies (Attardo et al., 2008). Moreover, Ca. A. arthropodicus has no known life stage outside the louse fly host.
Here we compare the genome sequence of Ca. A. arthropodicus with the draft genome sequence of A. nasoniae (Darby et al., 2010; Wilkes et al., 2010) with the aim of understanding how these two closely related bacteria interact with hosts that have distinct lifestyles. Our results suggest that both symbionts have relatively recent origins of association with insect hosts, although the Ca. A. arthropodicus genome appears more degenerate, with a smaller genome size and gene inventory than A. nasoniae. We identify factors potentially involved in insect association that are shared by both species, as well as factors that differ between the two and may account for the ability of these bacteria to infect different insect hosts with different outcomes.
Materials and Methods
Genome alignments
The 143 individual A. nasoniae draft genome scaffold sequences deposited in GenBank (Darby et al., 2010) were aligned with the Ca. A. arthropodicus genome sequence using progressiveMauve (Darling et al., 2004; Darling et al., 2010). A single GenBank file was then produced for A. nasoniae by concatenating the individual scaffold GenBank files in their aligned order using the union tool in EMBOSS (Rice et al., 2000).
The two Arsenophonus genomes were then compared using Artemis v13.2.0 (Rutherford et al., 2000) and DNAPlotter (Carver et al., 2009).
Genome analyses
FASTA files of conceptually translated amino acid sequences from both Arsenophonus genomes were compared using BLAST (Altschul et al., 1990) and BLASTCLUST to identify genes sharing high levels of sequence identity. Regions of genome synteny were identified using Crossmatch, a component of the Phred/Phrap/Consed assembly package (Ewing and Green, 1998; Ewing et al., 1998; Gordon et al., 1998). Plots of genome synteny were then generated using Circos (Krzywinski et al., 2009). Metabolic capabilities were compared using the KEGG (Kyoto Encyclopedia of Genes and Genomes; Kanehisa and Goto, 2000) automatic annotation server (KAAS; Moriya et al., 2007).
Pseudogene analysis
The Ca. A. arthropodicus and A. nasoniae genome scaffolds were each independently aligned to the genome sequence of a closely related outgroup, Proteus mirabilis (Pearson et al., 2008), using progressiveMauve (Darling et al., 2010) to determine genome synteny. Orthologs sharing at least 50% identity over at least 70% of their length were exported, and the lengths of the intact P. mirabilis orthologs were used as a common reference to compare the sizes of pseudogenes with those of intact coding sequences. A simple Monte Carlo approach was developed to simulate the evolution of pseudogenes in Ca. A. arthropodicus. The program simulates the accumulation of random mutations in all orthologs of intact genes and pseudogenes shared by Ca. A. arthropodicus and its closest free-living relative, Proteus mirabilis. Over a number of cycles, mutations accumulate in a randomly selected class of neutral genes at a rate that scales with ORF size. At preset cycle intervals, the simulation records (i) the difference in size between intact and disrupted sequences, (ii) the number of neutral genes that have accumulated one or more disrupting mutations, and (iii) the density of disrupting mutations.
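The simulation is described above only in outline, so the Python sketch below is one possible reading of it, not the program actually used. The function name `simulate_pseudogenes`, the neutral fraction, the per-hit disruption probability `p_disrupt`, and the rule that disruptions in non-neutral genes are purged by selection are all hypothetical choices made for illustration. Mutation sites are drawn uniformly along the concatenated ORFs, so each gene is hit with a probability proportional to its length, and the three quantities listed in the text are recorded at fixed intervals.

```python
import bisect
import itertools
import math
import random
from dataclasses import dataclass


@dataclass
class Snapshot:
    cycle: int
    mean_intact_bp: float      # mean ORF length of genes with no disrupting mutation
    mean_disrupted_bp: float   # mean ORF length of genes with >= 1 disrupting mutation
    size_difference_bp: float  # (i) disrupted minus intact mean ORF length
    n_disrupted: int           # (ii) neutral genes carrying >= 1 disrupting mutation
    density_per_kb: float      # (iii) disrupting mutations per kb of disrupted ORF


def _mean(values):
    return sum(values) / len(values) if values else math.nan


def simulate_pseudogenes(orf_lengths, neutral_fraction=0.2, p_disrupt=0.05,
                         cycles=10_000, record_every=1_000, seed=0):
    """Hypothetical sketch. orf_lengths: ortholog ORF sizes (bp) taken from the
    free-living relative; neutral_fraction and p_disrupt are assumed values."""
    rng = random.Random(seed)
    n = len(orf_lengths)
    # A randomly selected class of genes is treated as neutral (free to decay).
    neutral = set(rng.sample(range(n), int(neutral_fraction * n)))
    # Cumulative lengths map a uniform position to a gene, so the chance of a
    # gene being hit is proportional to its ORF length.
    cumulative = list(itertools.accumulate(orf_lengths))
    total_bp = cumulative[-1]
    hits = [0] * n
    snapshots = []
    for cycle in range(1, cycles + 1):
        gene = bisect.bisect_right(cumulative, rng.randrange(total_bp))
        # Disruptions are retained only in neutral genes; in all other genes
        # they are assumed to be removed by purifying selection.
        if gene in neutral and rng.random() < p_disrupt:
            hits[gene] += 1
        if cycle % record_every == 0:
            disrupted = [length for length, h in zip(orf_lengths, hits) if h > 0]
            intact = [length for length, h in zip(orf_lengths, hits) if h == 0]
            mean_i, mean_d = _mean(intact), _mean(disrupted)
            density = sum(hits) / (sum(disrupted) / 1000) if disrupted else 0.0
            snapshots.append(Snapshot(cycle, mean_i, mean_d, mean_d - mean_i,
                                      len(disrupted), density))
    return snapshots


if __name__ == "__main__":
    # Hypothetical ORF lengths; real input would be the P. mirabilis ortholog sizes.
    rng = random.Random(42)
    lengths = [rng.randint(300, 3000) for _ in range(2000)]
    for snap in simulate_pseudogenes(lengths):
        print(snap.cycle, snap.n_disrupted, round(snap.size_difference_bp, 1),
              round(snap.density_per_kb, 3))
```

Under these assumptions, longer ORFs are larger mutational targets, so disrupted genes will tend to be longer on average than intact ones early in a run; snapshots of this kind could then be set against the observed pseudogene size distribution.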
Results and Discussion
General features
The A. nasoniae genome is currently a draft sequence comprising 143 scaffolds (Darby et al., 2010), whereas the Ca. A. arthropodicus genome sequence consists of a single scaffold (Table 3.1). Arsenophonus nasoniae has a larger genome and a larger gene inventory than Ca. A. arthropodicus, suggesting that Ca. A. arthropodicus has undergone more extensive degenerative evolution over the course of its association with the louse fly (Table 3.1). To identify genes shared between the two symbionts, BLASTCLUST was used to cluster coding sequences (CDSs) from the two genomes by amino acid sequence identity, with cutoffs of 75% and 95%. Of the 2,203 CDSs in Ca. A. arthropodicus predicted to encode functional gene products, 66% clustered with an A. nasoniae CDS sharing at least 75% amino acid sequence identity, and 26% with one sharing more than 95% identity (Fig. 3.1). The remaining CDSs shared less than 75% amino acid sequence identity or were unique to one of the Arsenophonus genomes. Excluding phage-associated CDSs and repetitive elements, Ca. A. arthropodicus contains just 37 unique chromosomal CDSs that are intact and have no apparent homologs in the A. nasoniae genome sequence (Table 3.2). Of these 37 CDSs, nine would lie between scaffolds or within sequence gaps at their predicted locations in the A. nasoniae sequence (Table 3.2). Ten of the CDSs form a block in Ca. A. arthropodicus (ARA_06845-ARA_06530) and share homology with nine consecutive CDSs in Proteus mirabilis. One CDS interspersed within this block is predicted to encode a cysteine desulfurase involved in thiamine biosynthesis and shares homology with that of Sphaerobacter thermophilus, a distantly related bacterium, suggesting independent acquisition of this enzyme. A second block of seven consecutive Ca. A. arthropodicus CDSs (ARA_10960-ARA_10995), encoding products predicted to be involved in lipopolysaccharide (LPS) biosynthesis, is absent from the A. nasoniae sequence (Table 3.2). The absence from A. nasoniae of multiple adjacent CDSs that are present in both Ca. A. arthropodicus and closely related bacteria, such as P. mirabilis and Photorhabdus luminescens, suggests that A. nasoniae has undergone several large deletions. For the most part, however, Ca. A. arthropodicus appears to retain a subset of the A. nasoniae gene set, with few unique protein-coding genes.
Table 3.1. Features of the Ca. A. arthropodicus and A. nasoniae genome sequences.
Feature: Ca. A. arthropodicus; A. nasoniae
Size (chromosome + plasmids): 2,971,954 bp; 3,567,128 bp
GC content (%): 38.1; 37.4
No. of scaffolds (no. of contigs): 1 (3); 144 (665)
Predicted CDSs: 2,231; 3,203
Pseudogenes: 395; 135
rRNA operons: 7; 8-10 (a)
tRNAs: 64; 52
Plasmids: 3; 2 or more (a)
Prophage sequence (%): 10; 10
(a) Estimates based on fragmented assembly.
Figure 3.1. BLASTCLUST analysis of intact Ca. A. arthropodicus CDSs sharing amino acid sequence identity with CDSs in A. nasoniae. A. Venn diagram showing the number of CDSs with protein sequence identity of 75% or higher. B. CDSs with greater than 95% identity.
Genome alignment and synteny
To determine the level of synteny between the two genomes, A. nasoniae scaffolds at least 5 kb in size were aligned to the Ca. A. arthropodicus sequence. These alignments were then compared using Crossmatch with a minscore of 100 and otherwise default parameters (Gordon et al., 1998) to identify CDSs located in the same order and orientation in both genome sequences. This yielded 491 alignments between the two genome sequences, with an average length of 5,002 bp, an average nucleotide identity of 88.5%, and a total alignment length of 2,456,051 bp.
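The summary statistics above can be derived from a list of alignment blocks. The Python sketch below assumes a hypothetical `Alignment` record holding a length and a percent identity for each block; parsing of the actual Crossmatch output is not shown. Because the text does not say whether the 88.5% average identity is a simple mean over alignments or weighted by alignment length, both are computed. The final line checks the reported figures for internal consistency: 2,456,051 bp over 491 alignments is about 5,002.1 bp, in line with the stated average length.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical record for one alignment block (Crossmatch parsing not shown)."""
    length_bp: int
    percent_identity: float


def summarize_alignments(alignments):
    """Return count, total length, mean length and two kinds of mean identity."""
    n = len(alignments)
    total_bp = sum(a.length_bp for a in alignments)
    # Unweighted mean: every alignment counts equally, regardless of length.
    mean_identity = sum(a.percent_identity for a in alignments) / n
    # Length-weighted mean: approximates identity over all aligned bases pooled,
    # assuming each block's identity is computed over its own length.
    weighted_identity = sum(a.length_bp * a.percent_identity for a in alignments) / total_bp
    return {
        "n": n,
        "total_bp": total_bp,
        "mean_length_bp": total_bp / n,
        "mean_identity": mean_identity,
        "weighted_identity": weighted_identity,
    }


# Consistency check on the reported totals: 2,456,051 bp / 491 alignments.
print(round(2_456_051 / 491))  # 5002
```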
The Crossmatch output was used to generate a plot of synteny (Fig. 3.2) using Circos (Krzywinski et al., 2009). The Circos plot shows a high level of synteny between the A. nasoniae scaffolds and segments of the Ca. A. arthropodicus genome, indicating that many orthologs are retained in the same order and orientation in both organisms.
Table 3.2. Unique CDSs in the Ca. A. arthropodicus genome sequence.
Locus tag; Product; BLAST homology (organism - locus tag)
ARA_02740; Hypothetical protein; Photorhabdus asymbiotica - PAU02164
ARA_04590; Hypothetical protein; Hamiltonella defensa - Hdef_1253
ARA_04830; CDP-diacylglycerol pyrophosphatase; Burkholderia sp. - bgla_1g12250
ARA_06360; Hypothetical protein; Proteus mirabilis - PMI1624
ARA_06845; Citrate lyase beta; Proteus mirabilis - PMI0231
ARA_06490; Siderophore biosynthesis; Proteus mirabilis - PMI0232
ARA_06495; TonB siderophore receptor; Proteus mirabilis - PMI0233
ARA_06500; Diaminopimelate decarboxylase; Proteus mirabilis - PMI0234
ARA_06505; Pyridoxal-phosphate-dependent enzyme; Proteus mirabilis - PMI0235
ARA_06510; Octopine/opine dehydrogenase; Proteus mirabilis - PMI0236
ARA_06515; MFS-family transporter; Proteus mirabilis - PMI0237
ARA_06520; Iron ABC transporter; Proteus mirabilis - PMI0238
ARA_06525; Cysteine desulfurase; Sphaerobacter thermophilus - Sthe3332
ARA_06530; Hypothetical protein; Proteus mirabilis - PMI_0239
ARA_07095 (a); Fumarate reductase Fe-S subunit; Proteus mirabilis - PMI3587
ARA_07110 (a); Fumarate reductase subunit D; Proteus mirabilis - PMI3585
ARA_07725; Murine toxin; Yersinia pestis - y1069
ARA_07820; Hypothetical protein; Pseudomonas syringae - Psyr_2271
ARA_09180; Xylose isomerase; Burkholderia sp. - BC1001_0668
ARA_09190; Multidrug efflux transporter; Burkholderia sp. - BC1001_0669
ARA_09195; Glycosyltransferase family 2 protein; Burkholderia sp. - BC1001_0670
ARA_09795; Phosphoesterase; Proteus mirabilis - PMI3123
ARA_10960; LPS biosynthesis protein; Photorhabdus luminescens - plu4812
ARA_10965; Aminotransferase; Photorhabdus luminescens - plu4813
ARA_10970; LPS biosynthesis protein; Photorhabdus luminescens - plu4814
ARA_10975; Glycosyltransferase family protein; Shewanella baltica - Sbal195_3023
ARA_10980; Hypothetical protein; Escherichia coli - ECED1_2379
ARA_10990; Glycosyltransferase family protein; Haemophilus parasuis - HPS_03029
ARA_10995; Glycosyltransferase family protein; Capnocytophaga ochracea - Coch0695
ARA_11150; Prevent-host-death protein; Pectobacterium carotovorum
ARA_11245 (a); Phosphate ABC transporter, pstS; Proteus mirabilis - PMI_2893
ARA_11250 (a); Phosphate ABC transporter, pstC; Proteus mirabilis - PMI_2894
ARA_12095 (a); MFS-family transporter; Photorhabdus luminescens - plu0476
ARA_12100 (a); Phosphoglycolate phosphatase; Photorhabdus luminescens - plu0475
ARA_12105 (a); Alcohol dehydrogenase; Photorhabdus luminescens - plu0474
ARA_12135 (a); DNA-directed DNA polymerase; Proteus mirabilis - PMI2485
ARA_15258 (a); PTS system, HPr; Photorhabdus luminescens - plu1394
(a) Predicted location in the A. nasoniae genome contains sequence gaps or scaffold breaks.
Figure 3.2. Plot of genome synteny between A. nasoniae and Ca. A. arthropodicus. Individual scaffolds of the A. nasoniae genome are indicated by the outermost colored boxes along the top and aligned to the concatenated Ca. A. arthropodicus genome sequence, represented by the gray bar along the bottom. Colored bands joining the two represent individual areas of synteny along the lengths of the genomes. The track of blue lines along each genome represents the number and positions of pseudogenes. Due to the fragmented nature of the A. nasoniae genome sequence, we cannot predict the order and orientation of scaffolds and in F |
| Reference URL | https://collections.lib.utah.edu/ark:/87278/s6r248pt |



