| Title | Structures of the Klebsiella oxytoca phage Phi KO2 and Vibrio harveyi myovirus-like protelomerase far C-terminal domains |
| Publication Type | dissertation |
| School or College | College of Pharmacy |
| Department | Medicinal Chemistry |
| Author | Smith, Diana K. |
| Date | 2010-08 |
| Description | A small but growing number of bacteria and phages are known to contain linear, hairpin-ended genomes. The hairpin "protelomeres" are created by the action of a dedicated enzyme known as protelomerase that acts on a palindromic DNA target sequence. Phage protelomerases are typically longer than their bacterial counterparts and contain an additional far C-terminal region of limited sequence conservation. Studies of the protelomerase of the Klebsiella oxytoca phage ΦKO2 have shown that although the far C-terminal region is not required to produce hairpin ends, truncation of the region has a drastic effect on enzyme kinetics. To date, no other studies have been reported on the far C-terminal region of this or any other protelomerase. We present the solution structures of the far C-terminal regions of two phage protelomerases. The regions form homologous, compact structures that adopt a fold similar to the canonical double-stranded RNA-binding domain and have been called the far C-terminal domains. Sequence alignment and secondary structure predictions show that all known and putative phage protelomerases contain C-terminal regions that will almost certainly form homologous domains. A sequence comparison of these proteins with all known protelomerases is presented, along with an analysis of the sequence and structure of proteins that adopt a similar fold. Based on structural homology and comparative sequence conservation of key binding regions, we propose that the domain belongs to the growing family of three-stranded β-sheet DNA-binding proteins that is a subclass of the double-stranded RNA-binding domain superfamily. |
| Type | Text |
| Publisher | University of Utah |
| Subject MESH | Klebsiella oxytoca; Enzyme Precursors; Telomerase; DNA-Directed DNA Polymerase; DNA Replication; Gene Expression Regulation, Enzymologic; RNA-Binding Proteins; Protein Binding; Protein Folding; DNA Hairpins; Oligonucleotides; Bacteriophages; Promoter Regions, Genetic; DNA-Binding Proteins; Viral Proteins |
| Dissertation Institution | University of Utah |
| Dissertation Name | Doctor of Philosophy |
| Language | eng |
| Relation is Version of | Digital reproduction of Structures of the Klebsiella Oxytoca Phage Phi KO2 and Vibrio Harveyi Myovirus-Like Protelomerase Far C-Terminal Domains. Spencer S. Health Sciences Library. Print version available at J. Willard Marriott Library Special Collections. |
| Rights Management | Copyright © Diana K. Smith 2010 |
| Format | application/pdf |
| Format Medium | application/pdf |
| Format Extent | 3,397,792 bytes |
| Source | Original in Marriott Library Special Collections, QR6.5 2010.S64 |
| ARK | ark:/87278/s60k5hss |
| DOI | https://doi.org/doi:10.26053/0H-554N-N3G0 |
| Setname | ir_etd |
| ID | 196406 |
| OCR Text | STRUCTURES OF THE KLEBSIELLA OXYTOCA PHAGE PHI KO2 AND VIBRIO HARVEYI MYOVIRUS-LIKE PROTELOMERASE FAR C-TERMINAL DOMAINS

by Diana K. Smith

A dissertation submitted to the faculty of The University of Utah in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Department of Medicinal Chemistry
University of Utah
August 2010

Copyright © Diana K. Smith 2010
All Rights Reserved

The University of Utah Graduate School
STATEMENT OF DISSERTATION APPROVAL
The dissertation of Diana K. Smith has been approved by the following supervisory committee members:
Darrell Davis, Chair (date approved: April 28, 2010)
Chris Ireland, Member (date approved: April 28, 2010)
Michael Kay, Member (date approved: April 28, 2010)
Eric Schmidt, Member (date approved: April 28, 2010)
Wes Sundquist, Member (date approved: April 28, 2010)
and by Chris Ireland, Chair of the Department of Medicinal Chemistry, and by Charles A. Wight, Dean of The Graduate School.

To my Matt, my Mimsy, and my munchkins

TABLE OF CONTENTS
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
Chapter
1. INTRODUCTION
Background
Protelomerase overview
Preliminary work
Research overview
2. STRUCTURES OF THE KLEBSIELLA OXYTOCA PHAGE ΦKO2 AND VIBRIO HARVEYI PHAGE VHML PROTELOMERASE FAR C-TERMINAL DOMAINS
Abstract
Introduction
Experimental procedures
Results and discussion
3. CONCLUSION
Summary
Future directions
Conclusion
Appendices
A. COMPARISON OF THE SECONDARY STRUCTURE PREDICTION AND 13Cα CHEMICAL SHIFT INDICES FOR THE TEL-KO2 AND TEL-VHML FAR-CTDS
B. LITERATURE REVIEW OF THE ABILITY OF DSRBDS TO RECOGNIZE SEQUENCE SPECIFICITY IN DOUBLE-STRANDED RNA
C. SUMMARY OF PROJECT ONE: HIV TRANSACTIVATION CHIMERAS
REFERENCES

LIST OF FIGURES
1.1 Approaches to the "chromosome end problem" by bacteria and viruses
1.2 Schematic: Protelomerases resolve replicated DNA to form hairpin telomeres
1.3 Structure of the Tel-KO2 dimer bound to DNA
1.4 Surface of the Tel-KO2 dimer
1.5 Domain architecture comparison of the published structures of the C-terminally truncated Klebsiella oxytoca phage ΦKO2 protelomere resolvase (Tel-KO2) and lambda integrase as calculated in a DaliLite pairwise comparison
1.6 Comparison of the kinetics of hairpin resolution of replicated oligonucleotide substrate by full-length (1-640) and truncated (1-531) Tel-KO2
1.7 Effect of DNA target site truncation on efficiency of Tel-KO2 hairpin resolution
2.1 Analysis of the full far C-terminal domain of Tel-KO2 to determine suitability of construct for spectroscopic study
2.2 Determination of structured regions of Tel-KO2 (530-640)
2.3 NMR ensembles of Tel-KO2 and Tel-VHML farCTD structures
2.4 Comparison of the hydrophobic cores of the Tel-VHML and Tel-KO2 farCTDs
2.5 Conserved network of hydrophobic residues in the interior of phage protelomerase farCTDs
2.6 Conserved residues of the phage far C-terminal domain
2.7 Electrostatic surfaces of the Tel-KO2 and Tel-VHML farCTDs show similarly charged surfaces along homologous faces
2.8 Comparative consensus sequences of dsRNA-binding dsRBDs and DNA-binding dsRBDs with known and putative protelomerase farCTDs
2.9 Comparison of VHML farCTD with reported structures of RNA-binding dsRBDs bound to dsRNA
2.10 Crystal structure of the human DGCR8 core showing dsRBD/protein contacts
2.11 Interactions of the dsRBD of bacterial RNase IIIs
2.12 Comparison of VHML farCTD with reported structures of DNA-binding dsRBDs bound to DNA
2.13 Comparison of β-strand residues of DNA-binding dsRBDs and protelomerase farCTDs
2.14 Comparison of the external-facing residues of the β-sheet of transposon Tn916 integrase DNA-binding domain (Tn916-Int-DBD) to the probable homologous residues of putative protelomerase farCTD Tel-PY54
2.15 Effects of base-step substitution of Tn916 Int-DBD cognate DNA on complex affinity, and comparison to PY54 protelomerase target site
2.16 Comparison of the binding sites of known phage protelomerases
A.1 Secondary structure prediction and comparative 13Cα chemical shift indices support the solved structures of the farCTDs of Tel-KO2 and Tel-VHML
C.1 Tat sequence comparison
C.2 HIV TAR
C.3 The EIAV transactivating complex
C.4 Comparison of the TAR RNA stem-loop regions from EIAV and HIV-1
C.5 The TAR-interacting residues of the EIAV Tat ARM

LIST OF TABLES
1.1 Organisms that contain known or putative protelomerases
2.1 Tel-KO2 farCTD (581-640) missing or unassigned resonances
2.2 Comparison of structural statistics of Tel-KO2 and Tel-VHML farCTD structures
2.3 Comparison of Tn916 Int-DBD residues important for DNA-binding affinity to probable positionally homologous residues of putative PY54 protelomerase farCTD

CHAPTER 1
INTRODUCTION

Background

The "chromosome end problem"
Genomic stability rests upon the ability to replicate and repair a complete genome with fidelity. Fidelity is ensured in both repair and replication by the action of a polymerase, which uses the complementary strand as a template for synthesizing a new polynucleotide. Typically, the synthesis of a new strand by a polymerase also requires a primer, an existing oligonucleotide with a free 3′-OH onto which the polymerase adds nucleotides. Because all known DNA polymerases synthesize DNA only in the 5′→3′ direction, the gap left at the 5′ end of each newly made lagging strand when its terminal primer is removed cannot be filled, so terminal genomic DNA requires specialized care. Successive rounds of genomic replication therefore progressively shorten the lagging strand of linear DNA, eventually causing loss of genetic information (1). Additionally, free or exposed genome ends are recognized by the cell as damaged or foreign DNA and are subject to the normal means of cellular protection and repair, including degradation by exonucleases and recombination via nonhomologous end joining (2, 3). The complications of replication and repair at the terminal regions of genomes constitute what has been called the "chromosome end problem."

Organisms across all kingdoms have developed different strategies for maintaining genomic termini. Typically, eukaryotes protect their linear genomes by adding noncoding sequences to DNA ends. Drosophila chromosomes, for instance, are maintained by the addition of a transposon sequence to the termini (4, 5; see (6) for a recent review). More common is the addition of repeated, guanine-rich sequences, which fold into specialized structures.
The folded structures interact with dedicated proteins, forming the protective nucleoprotein complexes called telomeres (reviewed in (7-9)). Using an entirely different strategy, bacterial and archaeal genomes are usually circular, bypassing the issue of exposed DNA ends entirely (summarized in (10)).

Although relatively rare, notable exceptions to this broad categorization of genome architecture are known. There are several known examples, for instance, of bacteria with linear genomes. Among these, two general methods are used to protect exposed genomic termini (see Figure 1.1 for an overview of the classes of linear bacterial and phage genomes). In the first class, as with linear eukaryotic chromosomes, terminal DNA ends are protected by associated proteins. Representatives of this class include Streptomyces lividans (11) and Saccharopolyspora erythraea (12), in which the free ends of linear genomic DNA are protected by "terminal proteins" (TPs) covalently attached to the 5′ ends. Some viruses also use this strategy, including adenovirus (13) and Bacillus subtilis phage Φ29 (14). In this mode of genome protection, the covalently attached proteins also serve to prime DNA synthesis during replication. In adenovirus and phage Φ29, replication initiates at the genome ends, with the attached terminal protein acting as the primer, whereas in Streptomyces, replication initiates from an interior origin and the terminal protein helps initiate "end patching" of the single-stranded region left after replication (15; see (16) for a recent review of the Streptomyces chromosome).

Figure 1.1 Approaches to the "chromosome end problem" by bacteria and viruses. Panels: circular genome without free ends; linear genome with distal ends protected by a protein covalently attached to the 5′ ends; linear genome with covalently closed hairpin ends.

Another solution to the chromosome end replication problem has been discovered in both bacteria and viruses with linear genomes.
In these systems, the termini of linear chromosomes or plasmids are covalently closed by terminal "hairpin caps," which prevent both exonucleolytic degradation and terminus shortening during replication (17, 18). Covalent closure of the hairpin caps has been demonstrated by a combination of denaturation/renaturation studies, direct microscopic examination of denatured, single-stranded circles, and Maxam-Gilbert sequence analysis (19-21). Because the hairpin caps protect the genome termini, the term "protelomere" was coined to represent "prokaryotic telomere" (22).

Hairpin-capped genomes and their protelomeres
In all systems studied so far, replication of a linear, hairpin-ended genome is relatively simple. Common replication enzymes initiate bidirectional replication at an internal origin, producing a circular, concatenated genomic dimer (23, 24). In vitro assays have shown that the interlinked genome is resolved into monomers by a single dedicated enzyme (17, 25-29). Because resolution into monomers produces the terminal hairpin ends, the enzyme has been called both a "protelomerase" and a "telomere resolvase" (25, 26). The presence of a protelomerase-type protein in a bacterium is a good indication that the system carries a linear replicon. Table 1.1 lists the known protelomerases and the organisms in which they were found, and gives a brief description of each system.

Table 1.1 Organisms that contain known or putative protelomerases
| Organism | Enzyme length (residues) | Description |
| Borrelia burgdorferi | 449 | A gram-negative spirochete that is a significant tick-borne human pathogen and the causative agent of Lyme disease (30). Symptoms of Lyme disease include fever, chills, headache, and musculoskeletal pain. Progressive illness results in facial palsy, arthritis, heart palpitations, and chronic neurological complaints, including tremor and short-term memory loss. Infections have been associated with non-Hodgkin lymphoma (31). The fragmented genome consists of a linear chromosome plus 12 linear and nine circular plasmids (19, 20, 32-34). The protelomerase gene (called ResT) is essential for B. burgdorferi survival (35, 36). |
| Borrelia hermsii | 449 | A gram-negative spirochete and a significant tick-borne human pathogen, B. hermsii is the causative agent of relapsing fever. Manifestations include headache, myalgia, arthralgia, chills, vomiting, and abdominal pain (37, 38). |
| Agrobacterium tumefaciens | 442 | A rod-shaped, gram-negative bacterium of the family Rhizobiaceae found ubiquitously in soil, A. tumefaciens causes crown gall disease in plants by inserting a fragment of its DNA (called T-DNA) into the host genome (39, 40). The wide variety of plants affected by the bacterium (including grapevines, stone fruits, nut trees, sugar beets, horseradish, and rhubarb) makes it of great concern to the agricultural industry (41). The segmented genome consists of one linear and one circular chromosome and two circular plasmids. |
| Escherichia coli phage N15 | 630 | A non-integrative, temperate, lambdoid phage of the Siphoviridae family (22, 42). E. coli is a gram-negative, rod-shaped bacterium commonly found in the lower intestine of warm-blooded organisms (endotherms). |
| Halomonas aquamarina phage ΦHAP-1 | 520 | A myovirus-like temperate phage induced from H. aquamarina, a gram-negative halophilic gammaproteobacterium that has been isolated from a variety of marine and hypersaline environments, including the pelagic ocean, deep-sea hydrothermal vents, the brine-seawater interface of deep-sea brine pools, and coastal surface waters. The ΦHAP-1 genome shares synteny and gene similarity with coliphage N15 and vibriophages VP882 and VHML (43). |
| Klebsiella oxytoca phage ΦKO2 | 640 | A non-integrated prophage of K. oxytoca with a genome size of approximately 51.6 kb. Its genome organization is similar to that of coliphage N15 (44). K. oxytoca is a gram-negative, rod-shaped bacterium closely related to K. pneumoniae. |
| Vibrio harveyi phage VHML | 509 | VHML (Vibrio harveyi myovirus-like) is a temperate phage classified in the family Myoviridae. It infects the free-living, gram-negative marine bacterium V. harveyi and causes the normally nonpathogenic host to become virulent to a variety of aquatic organisms (45). Economic losses in recent years due to infection of aquacultured species have been particularly devastating (46, 47). The phage was sequenced, but the host-phage system was reported as lost (43). VHML has also been reported to lyse the strains Vibrio alginolyticus ACMM102 and Vibrio cholerae ATCC 14035. |
| Vibrio parahaemolyticus O3:K6 phage VP58.5 | | A 42.612 kb myovirus closely related to VHML that was isolated from a V. parahaemolyticus strain belonging to the serovar O3:K6 pandemic clonal complex. The clone has been associated with many seafood-borne diarrhea outbreaks in Southeast Asia and South America, particularly Chile (48). |
| Vibrio parahaemolyticus O3:K6 phage VP882 | 538 | A Myoviridae bacteriophage isolated from a pandemic strain of V. parahaemolyticus that also infects and lyses high proportions of Vibrio vulnificus and Vibrio cholerae strains (49). V. parahaemolyticus is a gram-negative, rod-shaped, motile, facultatively anaerobic bacterium found in brackish saltwater. When ingested, it causes gastrointestinal illness in humans. Wound, eye, and ear infections can also develop from swimming or working in affected waters (50). |
| Yersinia enterocolitica phage PY54 | 617 | Yersinia enterocolitica is a member of the family Enterobacteriaceae, some strains of which are enteropathogenic to humans and are predominantly transmitted by ingestion of undercooked meat (51, 52). Nonpathogenic strains are readily found in the environment and may have the potential to become pathogenic. PY54 is a linear plasmid prophage with a genome size of approximately 46 kb that infects Y. enterocolitica (53). |
| Vibrio campbellii | 691 | A gram-negative, facultative anaerobe that contains a putative protelomerase (DBSOURCE accession code ABGR01000073.1). The protelomerase is likely encoded by a phage that infects the bacterium. |

Protelomerase overview

Creation and maintenance of hairpin protelomeres
Creation and maintenance of hairpin protelomeres requires only a protelomerase and the DNA recognition sequence upon which it acts. Indeed, normally circular plasmids engineered to encode the Escherichia coli phage N15 protelomerase and its DNA recognition sequence have been shown to linearize in vivo once transformed into E. coli, and to replicate stably as hairpin-ended linear plasmids (18). The DNA binding site of all known protelomerases is a palindromic inverted repeat at the junction of the genomic dimers (see Figure 1.2). Studies of the ΦKO2 and N15 phage protelomerases and of the bacterial Borrelia burgdorferi enzyme (usually called ResT) have shown that these enzymes, which are monomeric in solution, bind target DNA as dimers and make transient staggered cuts six base pairs apart on opposite strands, each three base pairs from the axis of dyad symmetry of the palindrome (29, 54, 55). The 6-nt overhangs created by the staggered cuts fold back and are joined to the opposite strand to form the hairpin caps.

Protelomerase nearest neighbors and reaction mechanism
Protelomere resolvases share sequence homology and an expected two-step reaction mechanism with the site-specific integrases known as tyrosine recombinases or lambda integrases, as well as with type IB topoisomerases (28, 29, 56-58; see also (59) for a minireview).
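The staggered-cut geometry described in "Creation and maintenance of hairpin protelomeres" (a dimer cutting opposite strands three base pairs on either side of the dyad axis, leaving 6-nt overhangs that fold back and are joined to the opposite strand) can be illustrated with a toy string model. The Python sketch below is a simplification under stated assumptions, not a model of the enzyme: the 20-bp site is hypothetical, cutting the top strand left of the axis (which yields 5′ overhangs, as reported for Tel-KO2) is an assumed orientation, and all function names are illustrative. The mismatch counter reflects the fact that natural sites need not be perfectly symmetric.

```python
"""Toy string model of protelomerase staggered cleavage and hairpin closure.

Assumptions (illustrative only): sequences are plain A/C/G/T strings written
5'->3'; the top strand is cut `offset` bp left of the dyad axis and the bottom
strand `offset` bp right of it, which leaves 5' overhangs.
"""

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a 5'->3' DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def palindrome_mismatches(site: str) -> int:
    """Count left-half positions that break inverted-repeat symmetry."""
    n = len(site)
    return sum(site[i] != revcomp(site[n - 1 - i]) for i in range(n // 2))


def staggered_cut(site: str, offset: int = 3):
    """Split a duplex site into two half-site fragments with 5' overhangs.

    Each fragment is a dict of its top and bottom strands, both written 5'->3'.
    The overhang length is 2 * offset (6 nt for offset=3).
    """
    n = len(site)
    if n % 2:
        raise ValueError("expected an even-length site with a central dyad axis")
    axis = n // 2
    top_cut, bottom_cut = axis - offset, axis + offset  # top-strand coordinates
    left = {"top": site[:top_cut], "bottom": revcomp(site[:bottom_cut])}
    right = {"top": site[top_cut:], "bottom": revcomp(site[bottom_cut:])}
    return left, right


def close_hairpins(left, right):
    """Join each fragment's recessed 3' end to the 5' end of its overhang.

    Returns each hairpin end as one continuous strand read 5'->3'.
    """
    left_hairpin = left["top"] + left["bottom"]     # top 3' end -> bottom 5' end
    right_hairpin = right["bottom"] + right["top"]  # bottom 3' end -> top 5' end
    return left_hairpin, right_hairpin


if __name__ == "__main__":
    site = "AAGGCCTTACGTAAGGCCTT"  # hypothetical 20-bp perfect palindrome
    left, right = staggered_cut(site)
    print("mismatches:", palindrome_mismatches(site))
    print("overhang length:", len(left["bottom"]) - len(left["top"]))
    print("hairpins:", close_hairpins(left, right))
```

For a perfectly palindromic site, both strands returned by `close_hairpins` have the same sequence as the original top strand, folded back on itself at the dyad axis, so the two products of a symmetric junction are equivalent; changing a single base in one half makes `palindrome_mismatches` return 1.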
Site-specific tyrosine recombinases/lambda integrases function as tetramers bound to two DNA duplexes, catalyzing the exchange of four DNA strands to integrate or excise a DNA segment from the host genome (60, 61). Type IB topoisomerases are found across eukarya, in many bacterial genera, and in two known families of eukaryotic viruses (poxviruses and mimivirus (59, 62)).

Figure 1.2 Schematic: Protelomerases resolve replicated DNA to form hairpin telomeres. Based on the figure from Aihara et al., Molecular Cell, Volume 27, Issue 6, 21 September 2007, Pages 901-913. A) Cartoon showing the current understanding of replication of hairpin telomere-containing linear genomes. Replication of a linear chromosome with hairpin telomeres produces a dimeric circular intermediate that is resolved into unit-length chromosomes by the activity of protelomerase. L and R refer to left and right hairpin ends, respectively. Panel labels: bidirectional replication of the linear chromosome initiates in the interior; replication results in a junction of concatenated dimers; protelomerase resolves the dimer and creates hairpin ends. B) A model for the hairpin formation reaction by the protelomerase Tel-KO2, proposed on the basis of the crystal structure presented in that study. The dots represent the phosphates at the sites of cleavage. Panel labels: L/L′ junction of replicated, concatenated chromosomes and protelomerase monomers showing reactive tyrosines; protelomerase dimer binding distorts the DNA and disrupts pairing of the central six bases; the reaction intermediate is a covalent phosphotyrosine linkage; spontaneous folding/reaction forms hairpin ends and releases the reactive tyrosines.

Type IB topoisomerases are monomeric and relax supercoiling by cleaving a single strand of duplex DNA, allowing the DNA to rotate about the intact strand, and then rejoining the cleaved strand (63). The reaction mechanism for all three enzyme families is similar.
A conserved catalytic pentad (R-K-R-H-Y) carries out cleavage and religation of DNA through consecutive transesterification reactions that require no cofactor. In the first step, nucleophilic attack by a catalytic tyrosine on the scissile phosphodiester bond within the DNA target site forms a covalent 3′ DNA-phosphotyrosyl linkage and a free 5′ hydroxyl. In the second step, a free 5′ hydroxyl from the same strand (type IB topoisomerases), the complementary strand (protelomerases), or a strand from a neighboring duplex (lambda integrases) attacks the phosphotyrosyl intermediate in a second nucleophilic reaction, regenerating a continuous DNA strand and releasing the catalytic tyrosine.

Protelomerase domain organization
All known protelomerases contain a highly conserved central region flanked by N- and C-terminal regions that vary in length, sequence, and predicted secondary structure. Protelomerases can be roughly divided into two clades based primarily on size. The bacterial protelomerases are shorter, approximately 450 residues in length, whereas the phage protelomerases are longer; most consist of more than 600 residues. The phage protelomerases contain an additional C-terminal region of varying length that terminates in a short region of moderate conservation. The conserved central core is highly homologous to tyrosine recombinases and type IB topoisomerases and contains all five catalytic pentad residues.

Protelomerase structure
The crystal structure of the minimally catalytic N-terminal portion (residues 1-531 of 640) of the Klebsiella oxytoca phage ΦKO2 protelomerase (Tel-KO2), complexed with the central 44 base pairs of its 50-bp DNA target site, was recently solved (64). To "trap" the enzyme in an intermediate state, the DNA target was nicked and contained orthovanadate (VO₄³⁻) to mimic the pentavalent transition state of scissile phosphate cleavage.
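The size-based split into bacterial and phage clades described under "Protelomerase domain organization" can be checked against the enzyme lengths listed in Table 1.1. The Python sketch below is illustrative only: the 500-residue cutoff is an assumption chosen to fall between the two groups, not a value from the text; VP58.5 is omitted because the table gives no length for it; and the Vibrio campbellii enzyme is grouped with the phage enzymes only because Table 1.1 notes it is likely phage-encoded.

```python
from statistics import mean

# Protelomerase lengths (residues) transcribed from Table 1.1.
BACTERIAL = {
    "Borrelia burgdorferi ResT": 449,
    "Borrelia hermsii": 449,
    "Agrobacterium tumefaciens": 442,
}
PHAGE = {
    "E. coli phage N15": 630,
    "H. aquamarina phage PhiHAP-1": 520,
    "K. oxytoca phage PhiKO2": 640,
    "V. harveyi phage VHML": 509,
    "V. parahaemolyticus phage VP882": 538,
    "Y. enterocolitica phage PY54": 617,
    "Vibrio campbellii (putative, likely phage-encoded)": 691,
}
CUTOFF = 500  # illustrative assumption, not a value from the text


def classify(length: int, cutoff: int = CUTOFF) -> str:
    """Assign a clade from enzyme length alone."""
    return "phage-like" if length > cutoff else "bacterial-like"


def summarize(group: dict) -> tuple:
    """Return (shortest, longest, mean rounded to 0.1) length for a group."""
    lengths = list(group.values())
    return min(lengths), max(lengths), round(mean(lengths), 1)


if __name__ == "__main__":
    print("bacterial", summarize(BACTERIAL))
    print("phage", summarize(PHAGE))
    over_600 = sum(length > 600 for length in PHAGE.values())
    print(f"{over_600} of {len(PHAGE)} phage-associated enzymes exceed 600 residues")
```

With these values the bacterial enzymes span 442-449 residues and the phage-associated enzymes 509-691 residues, so length alone separates the two groups; four of the seven phage-associated enzymes (counting the putative V. campbellii enzyme) exceed 600 residues.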
The crystallographic analysis was simplified by slightly altering the DNA sequence to make it fully symmetric (two bases on each side normally interrupt the otherwise perfect symmetry), with an additional nucleotide substitution at the distal end. The structure shows two interlocked Tel-KO2 monomers, each bound to a half-site of the DNA target (see Figure 1.3). The dimer makes extensive protein-protein and protein-DNA contacts. The Tel-KO2 monomers bind with their C-terminal domains at the distal ends of the DNA target site and their N-terminal domains interlinked near the central six base pairs. Each monomer contains an extended linker helix (K) between its first and second domains. When the monomer binds its target site, helix K rests in the major groove, allowing the monomer to wrap completely around the DNA while remaining predominantly on one side of it. The authors propose that the enzyme dimer functions by bending and "spring-loading" the DNA, and that, following nicking, the 6-bp, 5′-overhanging ends reorganize spontaneously within the enzyme active site, allowing the protelomerase to act as a ligase and produce two hairpin ends from the original duplex DNA. The authors further surmise that the reaction is made essentially irreversible by the surprising degree of DNA distortion visible in the crystal structure, which they presume is induced by protein binding. The DNA is not only bent in a curved arc (73° within a plane parallel to the two-fold axis of the complex), but its ends are also bent outward from their otherwise linear path. The structure indicates that disruption of the dimer interface may be required for hairpin formation following initial strand cleavage.

Figure 1.3 Structure of the Tel-KO2 dimer bound to DNA. A) "Bottom view" of the structure oriented parallel to the two-fold non-crystallographic axis, with the DNA helix extending horizontally from left to right, showing the extensive coverage of the DNA by the protelomerase dimer. B) "Side view" of the structure (rotated 90° about the z-axis relative to the view in (A)) showing the protein-induced curvature of the DNA. DNA nicking sites are indicated. C) "Top view" of the structure (rotated 90° about the z-axis relative to the view in (B)) showing the offset in the DNA helical axis, highlighted by dotted lines indicating the axes for each half-site. In all views, one monomer is colored grey, while the second monomer is colored with a spectrum in which the N-terminus begins in blue and the C-terminus finishes in red. The N- and C-termini, linker helix K of monomer 1, and the two-fold axis of symmetry are indicated.

The Tel-KO2 monomers are described as having three domains: an amino-terminal domain, the central portion of which the authors refer to as the "muzzle"; the highly conserved catalytic domain, which contains all five residues of the catalytic pentad; and a carboxy-terminal domain, which the authors call the "stirrup" (see Figure 1.4). Linker helix K connects the N-terminal and catalytic domains. The C-terminal stirrup domain is connected to the catalytic domain by an extended segment. The stirrup contacts the DNA but does not interact extensively with other portions of the protein. The carboxy end of the stirrup domain seems to loop back toward the catalytic domain, but there is no known or presumed function for this configuration, which may be an artifact of the shortened construct. The protelomerase dimer extensively covers the DNA, except for a small, 8-9 bp stretch of the target site located eleven base pairs to either side of the palindrome center.
Figure 1.4 Surface of the Tel-KO2 dimer. The figures are colored as follows:
- monomer 1: N-terminal muzzle domain in blue, linker helix in green, catalytic domain in yellow, and C-terminal stirrup domain in red;
- monomer 2: N-terminal muzzle domain in cyan, linker helix in gray-green, catalytic domain in orange, and C-terminal stirrup domain in purple.

The views are oriented identically to those in Figure X.X, as follows. A) Bottom view of the dimer, showing the extensive coverage of the DNA binding site by Tel-KO2 and the interactions between the catalytic domains (orange and yellow) of the two monomers. B) Side view of the dimer, showing that although each monomer wraps around the DNA, each lies predominantly on one side of the DNA arc. C) Top view of the dimer, showing the extensive interaction between the N-terminal muzzle domains (blue and cyan), each of which also contacts the DNA of the neighboring half-site. Each N-terminal muzzle domain cradles linker helix K of the neighboring monomer.

The domain organization of the Tel-KO2 monomers is similar to, but distinct from, that of lambda integrase and the other tyrosine recombinases, with which protelomere resolvases share sequence homology. The core of lambda integrase has a two-lobed architecture. It consists of an amino-terminal domain (sometimes called the core-binding domain) and a catalytic, or core, domain, connected by an extended linker (60). The catalytic domains of lambda integrase and Tel-KO2 are especially well conserved in both sequence and secondary-structure organization, as observed in a DaliLite pairwise comparison (see Figure 1.5) (65). However, the structures differ in several respects.
Lambda integrase does not contain a homologous carboxy-terminal stirrup domain. Its amino-terminal (core-binding) domain is also much smaller and seemingly lacks most of the muzzle present in Tel-KO2 (helices D-I, residues 79-200). Additionally, lambda integrase contains an amino-terminal arm-binding domain, absent from Tel-KO2, that is attached by an extended linker. This domain is crucial to the function of lambda integrase. It interacts with the arm-binding domains of three other integrase subunits to form a functional tetramer that binds the accessory (arm) DNA sites involved in recombination. The interaction between the arm-binding tetramer and the accessory DNA appears to shape the recombination complex. This arrangement suggests that arm binding shifts the reaction equilibrium in favor of recombinant products (60).

Figure 1.5 Domain architecture comparison of the published structures of the C-terminally truncated Klebsiella oxytoca phage ΦKO2 protelomere resolvase (Tel-KO2) and lambda integrase, as calculated in a DaliLite pairwise comparison. (A) Superposition of monomers. The Tel-KO2 monomer is shown in light cyan and the integrase monomer in red. (B) Close-up view of the conserved catalytic regions, rotated 90° about the z-axis relative to the full-protein overlay. Coloring is the same as in (A).

Preliminary work

Far C-terminal region

The Tel-KO2 construct used for crystallographic analysis was the shortest construct shown to be catalytically active in in vitro deletion studies (Huang, W.M., personal communication, unpublished data). These studies show that C-terminally truncated enzyme can resolve oligomer substrates containing the full-length Tel-KO2 target site into hairpin products in vitro. All deletion mutants studied produce hairpins in the in vitro assay; these were truncated to residues 605, 545, 538, and 531 of the full-length 640-residue protein. However, a comparison of resolution kinetics for the shortest truncation (TelK531) and the full-length enzyme (TelK640) reveals a large difference. Although the truncated enzyme is still functional, its efficiency of hairpin turnover is greatly reduced: it functions at only approximately 1/50th the rate of the full-length enzyme (see Figure 1.6).

The Huang group also studied the effect of truncating the DNA target site on protelomerase K function. In an in vitro analysis, the palindromic 56-base-pair target sequence was truncated distally by as little as three base pairs on both sides. This reduced the efficiency of full-length Tel-KO2 (1-640) by 75%. Further truncating the target site to its central 42 base pairs (seven base pairs removed from each end) decreased the efficiency of TelK640 to less than 1%. Surprisingly, the most highly truncated enzyme (TelK531) processed the shortened target-site substrates better. Its efficiency was restored to nearly 15% of that of full-length Tel-KO2 acting on full-length oligomer substrates (see Figure 1.7). This combination permitted catalytic activity, although kinetic efficiency was reduced compared with full-length protein and substrate. A C-terminally truncated Tel-KO2 and a minimal DNA target were therefore selected for structural studies by crystallographic analysis. A truncated Tel-KO2 was sufficient to process truncated DNA target sites in vitro. Nevertheless, the distinctive effects of the C-terminal region on substrate processing indicated that this region warranted further study.
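The 1/50th figure can be made concrete with a simple model. The sketch below assumes, purely for illustration, that hairpin formation follows single-exponential (first-order) kinetics, f(t) = 1 - exp(-kt). The data in Figure 1.6 were not necessarily analyzed this way, and the rate constant `K_FULL` is a hypothetical value. Under this assumption, a 50-fold lower rate constant stretches the time needed to reach any given extent of conversion by the same 50-fold factor.

```python
import math

# Hypothetical first-order rate constant for full-length TelK640 (per minute).
# Illustrative only; it is not fitted to the data in Figure 1.6.
K_FULL = 0.2
RATE_RATIO = 50  # truncated TelK531 taken to act ~50-fold more slowly


def fraction_hairpin(k: float, t_min: float) -> float:
    """Fraction of substrate converted to hairpins after t_min minutes (first-order model)."""
    return 1.0 - math.exp(-k * t_min)


def time_to_fraction(k: float, fraction: float) -> float:
    """Minutes needed to reach a given fractional conversion under first-order kinetics."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    return -math.log(1.0 - fraction) / k


if __name__ == "__main__":
    k_trunc = K_FULL / RATE_RATIO
    print(f"half-time, full length: {time_to_fraction(K_FULL, 0.5):.1f} min")
    print(f"half-time, truncated:   {time_to_fraction(k_trunc, 0.5):.1f} min")
    print(f"converted at 10 min: {fraction_hairpin(K_FULL, 10):.2f} "
          f"vs {fraction_hairpin(k_trunc, 10):.2f}")
```

Because the ratio of half-times equals the ratio of rate constants, the 50-fold factor holds whatever value is assumed for `K_FULL`.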
Removal of the C-terminal region negatively affects the ability of Tel-KO2 to produce hairpins from linear, double-stranded target DNA. The distal ends of the target site also affect catalysis, in a way that cannot be explained by their interaction with the catalytically active portion of the protelomerase. Given these in vitro results, it seems probable that the far C-terminal region of the protelomerase interacts with the distal DNA ends, the core portion of the protelomerase, or both. The C-terminal region of Tel-KO2 contains an unusually high number of acidic residues and was deemed unsuitable for crystallographic analysis. Structural studies by NMR were therefore proposed to elucidate the function of this interesting domain.

Figure 1.6 Comparison of the kinetics of hairpin resolution of replicated oligonucleotide substrate by full-length (1-640) and truncated (1-531) Tel-KO2. The oligonucleotide target sequence contains the full-length 56 base-pair binding site plus an additional ten base pairs for better visualization of the resulting hairpins. Hairpin turnover by truncated Tel-KO2 occurs at 1/50th the rate of full-length (1-640) Tel-KO2. A) Agarose gels comparing resolution progression. B) Reaction profile comparing the activity of truncated Tel-KO2 (1-531) (squares) with that of full-length Tel-KO2 (1-640) (circles). Activity is scaled to percent hairpin formation by wild-type (full-length) Tel-KO2. Wai Mun Huang lab, Pathology Department, University of Utah, unpublished data.
[Figure 1.6B axes: time (min) versus hairpin formation (%).]

Figure 1.7 Effect of DNA target site truncation on the efficiency of Tel-KO2 hairpin resolution. Reducing the length of the oligomer substrate decreases the efficiency of full-length Tel-KO2 in hairpin resolution. A C-terminally truncated Tel-KO2 (1-531) can partially restore efficiency for the shortest truncated substrate.
- Lanes 1-4: the oligonucleotide substrate contains the full-length (56 base-pair) binding site.
- Lanes 5-8: the Tel-KO2 binding site in the oligonucleotide substrate has been truncated by three base pairs on either end. The resulting oligonucleotide contains the central 50 base pairs of the Tel-KO2 binding site.
- Lanes 9-14: the binding site has been truncated by seven base pairs on either end. The resulting oligonucleotide contains the central 42 base pairs.
- Lanes 1, 5, and 9: no protelomerase added.
- Lanes 2-4 and 6-8: increasing amounts of full-length Tel-KO2 were titrated with the corresponding oligomer substrate.
- Lanes 10-12: a much higher concentration of Tel-KO2 than that used in lanes 2-4 and 6-8 was incubated with the 42 base-pair oligomer.
- Lanes 13 and 14: truncated Tel-KO2 (1-531) was incubated with the 42 base-pair oligomer.

The efficiency of the full-length and truncated Tel-KO2 enzymes is shown as a percentage of the efficiency of full-length enzyme acting on full-length oligomer substrate. Wai Mun Huang lab, Pathology Department, University of Utah, unpublished data.
[Figure 1.7 panel labels: substrates pSKN (56 bp), pSKK (50 bp), and pSKN-7S (42 bp); enzymes TelK640 and TelK531; resulting activities 100%, 25%, <1%, and 15%.]

Research overview

Hairpin-ended genomes are relatively rare. Even so, the number of phage and bacterial organisms known to protect their linear genomes with covalently closed hairpin ends is growing, and this means of genome maintenance is only beginning to be understood. Hairpin ends are created and maintained by a single dedicated enzyme known as a protelomere resolvase, or protelomerase. A sequence alignment of protelomerases of phage and bacterial origin shows that phage protelomerases are generally longer. They also contain an additional C-terminal region, consisting of a linker of variable length that terminates in a short region of limited conservation.
We have called this region the far C-terminal domain (farCTD). Its role is not understood. However, studies of the protelomerase from Klebsiella oxytoca phage ΦKO2 have shown that truncating the farCTD has drastic effects on enzyme function in vitro. Protelomerase constructs lacking the farCTD can still produce hairpins in vitro from synthetic DNA oligomers that contain the native protelomerase binding site. The truncated protelomerase, however, functions at only a fraction of the normalized activity of the full-length enzyme. Additionally, truncating the binding site in synthetic DNA oligomers dramatically decreases the efficiency of the protelomerase. This decrease cannot be explained by interaction with the catalytic portion of the molecule described in the solved structure. The function of the far C-terminal region is unknown. However, the comparative in vitro studies of truncated and full-length protelomerase indicate that this region plays an important role in enzyme function. It may act through interaction either with the catalytic portion of the protelomerase or with the DNA target itself. To understand the role of the farCTD, we asked several questions:
- How much of the C-terminal region of phage protelomerases is structured, and what is the structure of this domain?
- How much sequence and structural conservation is there between the farCTD of the Klebsiella oxytoca phage ΦKO2 protelomerase and those of other phage protelomerases?
- What is the likely binding partner of the protelomerase farCTD?
- How does the phage protelomerase farCTD relate to bacterial protelomerases?

Chapter two reports my work toward answering these questions. We report the solution structures of the farCTDs of two phage protelomerases.
We also compare the sequences of these protelomerase domains with the homologous regions of all known phage protelomerases. In addition, we include information about likely structural analogues in other proteins, a report of initial studies seeking to identify the farCTD binding partner, and a comparison with bacterial protelomerases.

CHAPTER 2

STRUCTURES OF THE KLEBSIELLA OXYTOCA PHAGE ΦKO2 AND VIBRIO HARVEYI PHAGE VHML PROTELOMERASE FAR C-TERMINAL DOMAINS

Abstract

We present the solution structures of the far C-terminal domains of two phage protelomerases. A sequence comparison of these structures with all known protelomerases is presented, along with an analysis of the sequence and structure of proteins that adopt a similar fold. The solution structures show that the far C-terminal domain of phage protelomerases adopts a fold similar to the canonical double-stranded RNA-binding domain (dsRBD). Based on structural homology and limited sequence conservation, we propose that the domain belongs to the growing family of three-stranded β-sheet DNA-binding proteins.

Introduction

A small but growing number of bacteria and viruses have been found to have linear genomes protected by covalently closed hairpin ends. The hairpin ends function like eukaryotic telomeres. They protect the genome from aberrant recombination and repair as well as from exonucleolytic degradation, and have therefore been called "protelomeres" (17-22, 43). In organisms that use this method of genome protection, bidirectional replication produces a concatenated genome dimer. This dimer is resolved into unit lengths by a single dedicated enzyme called a protelomere resolvase (protelomerase) (23, 24, 26-29, 54). The protelomerase binds as a dimer to a palindromic DNA sequence at the end-to-end junctions of the genome dimers, where it produces and maintains the hairpin caps following replication. Known protelomerases share a similar domain organization.
Protelomerases have a highly conserved central catalytic region, flanked by amino- and carboxy-terminal regions of varying length and conservation. Phage protelomerases tend to be longer than their bacterial homologues. They also have an additional region of varying length and limited sequence homology at the far C-terminal end of the protein. The far C-terminal region is present in all phage protelomerases, but its purpose is not yet understood. Only two studies of the far C-terminal domain (farCTD) have been undertaken so far, both in the Klebsiella oxytoca phage ΦKO2 system. These analyses show that the far C-terminal region is not required for catalytic activity, but that its absence has a dramatic negative effect on enzyme function. In vitro studies have examined the ΦKO2 protelomerase (Tel-KO2, also called TelK). They show that a truncated protelomerase lacking this region can still form hairpins from synthetic double-stranded DNA oligomers containing the protelomerase recognition site. Nevertheless, enzyme efficiency is severely affected by shortening or removing the far C-terminal region. Constructs lacking the far C-terminal region function at less than 1/10th the rate of the full-length enzyme (Wai Mun Huang, personal communication, unpublished data).

Other protelomerase analyses are not explained by the current understanding of the catalytic region. The structure of C-terminally shortened Tel-KO2 (lacking the far C-terminal region) bound to the central 44 base pairs of its target site was recently described (64). The structure indicates that the catalytic portion of the enzyme extends only over the central 42 base pairs of the normally 56 base-pair target site. However, shortening the DNA target site has a dramatic negative effect on enzyme function (Wai Mun Huang, personal communication, unpublished data).
In vitro assays have measured the ability of protelomerase to create hairpins from double-stranded oligomers containing the target site. Shortening the DNA target site by as few as three base pairs per side reduced enzyme activity to only 25% of that of full-length protelomerase acting on the full-length (56 base-pair) target site. Shortening the target site further (by seven base pairs per side, leaving a 42 base-pair target site) reduced activity even more, to less than 1%. The solved structure does not explain how the external regions of the DNA affect enzyme efficiency. Interestingly, in in vitro assays, enzyme constructs lacking the far C-terminal region partially restored enzyme efficiency on shortened DNA oligomers. Together, the data suggest that the far C-terminal region of Tel-KO2 may interact with the distal regions of the DNA palindrome, the catalytic region, or both. Such an interaction would influence catalytic ability. Secondary structure predictions for the far C-terminal region of Tel-KO2 and other protelomerases indicate a region of flexibility. However, the varying length and limited sequence conservation of this region make it difficult for automated sequence alignment programs to compare the far C-terminal regions. No further information has been reported about the structure or function of the far C-terminal region of this or any other protelomerase. To better understand the role of this region in phage protelomerase function, we determined which portions of the isolated Tel-KO2 far C-terminal region are well structured, and we report the solution structure of this region. To verify the structure, we also solved the solution structure of a homologous region of a second phage protelomerase. We report that the structures form compact domains that adopt a similar fold and belong to the double-stranded RNA-binding domain (dsRBD)-like superfamily.
We present an alignment of these domains with the far C-terminal regions of all known phage protelomerases. From it, we identify putative homologous far C-terminal domains (farCTDs) and conserved residues. The protelomerase farCTD shows structural homology and limited sequence conservation with a small but growing group of proteins. These proteins recognize DNA in a sequence-specific way by inserting a three-stranded β-sheet into the DNA major groove, and we propose that the farCTD belongs to this group.

Experimental procedures

DNA, plasmids, and cloning

DNA encoding the Tel-KO2 far C-terminal region was amplified from established vectors encoding full-length Tel-KO2 (cloning of vectors described in (29)). The initial Tel-KO2 farCTD construct comprised residues 530-640 in the context of the full-length protelomerase. It was designed to overlap slightly with the N-terminal portion of the protelomerase that had been studied by X-ray crystallography (64). The structured portion of the Tel-KO2 farCTD was identified using comparative 15N-HSQC and 1H-15N heteronuclear NOE experiments, combined with a standard suite of NMR assignment experiments (described below). The construct selected for structure determination (residues 581-640) was predicted to have a short N-terminal unstructured region. The Tel-VHML farCTD construct (residues 449-509) was chosen to correspond to the solved structure of the Tel-KO2 farCTD, based on direct sequence homology and predicted secondary structure. The encoding DNA was amplified using PCR primers designed by Wai Mun Huang and synthesized as oligonucleotides by the University of Utah DNA/Peptide core facility. The resulting DNA fragments encoding either the Tel-KO2 or the Tel-VHML farCTD were ligated into the NdeI/BamHI sites of T7 promoter-driven pET16b vectors (Invitrogen). The resulting template encoded a fusion protein with an amino-terminal 6X histidine affinity tag followed by a thrombin protease site.
By design, the purified proteins contained four additional amino-terminal residues (G-S-H-M) remaining from the thrombin cleavage site. The expression constructs were verified by DNA sequencing at the University of Utah DNA Sequencing Core Facility. The resulting plasmids were transformed into E. coli BL21 (DE3) Gold competent cells (Novagen).

Protein expression and purification

Tel-KO2 and Tel-VHML farCTD fusion proteins were isolated from cells grown in two-liter quantities of M9 minimal medium supplemented with 130 μg/ml ampicillin. The medium also contained either 2 g/liter 13C-glucose and 1 g/liter 15NH4Cl, or 1 g/liter 15NH4Cl alone. Cells harboring the Tel-KO2 plasmid were grown at 37°C to a cell density of approximately A590 = 0.8. Protein expression was induced with 0.7 mM (final concentration) isopropyl-1-thio-β-D-galactopyranoside (IPTG) for 12-16 hours at room temperature. Cells harboring the Tel-VHML farCTD plasmid were grown at room temperature to a cell density of A600 = 0.55, and protein expression was induced with 0.25 mM IPTG for 16 hours at 18°C. In both cases, cells were harvested by centrifugation and stored frozen at -80°C.

Purification of the Tel-KO2 and Tel-VHML farCTDs was similar, with only minor exceptions. For the Tel-KO2 farCTD, the frozen pellet from one liter of culture was thawed and suspended in 20 ml of lysis buffer. The lysis buffer contained 25% (w/v) sucrose and 50 mM Tris-HCl (pH 8.0), supplemented with protease inhibitors (0.13 mM benzamidine and 0.6 mM phenylmethylsulfonyl fluoride (PMSF)), 25 mM β-mercaptoethanol (BME), and 1 mg/ml lysozyme (Sigma). The resulting suspension, containing 0.6 mg of cell pellet/ml, was incubated at 0°C for approximately 30 minutes. The suspension was then sonicated (4 × 20 seconds with a micro-probe). It was treated with RNase (150 μg/ml) and DNase (5 μg/ml in 5 mM MgSO4) at 0°C for one hour.
The following were then added to the indicated concentrations before an overnight incubation at 0°C: Thesit (Boehringer Mannheim), 0.6% (v/v); NaCl, 2 M; benzamidine, 0.13 mM; and PMSF, 0.625 mM. The suspension was clarified by centrifugation (25,000 rpm, 4°C, SW40 rotor) to remove cell debris. Soluble protein was purified by nickel sepharose chromatography (Amersham Pharmacia) as follows. Clarified lysate was applied in two batches to a 12-15 ml column of Ni-NTA agarose (Qiagen). The column had been equilibrated with buffer consisting of 10% (v/v) glycerol, 50 mM Tris-HCl (pH 7.5), 10 mM BME, and 0.1 mM benzamidine. The protein-loaded column was washed with 10 column volumes of equilibration buffer supplemented with 500 mM NaCl and 20 mM imidazole. The protein was then eluted with equilibration buffer supplemented with 500 mM NaCl and 800 mM imidazole. Fractions containing the protein were pooled. They were then successively diluted and reconcentrated using 10,000 MWCO Amicon® Centricon® Centrifugal Filter Devices (Millipore) to remove excess NaCl and imidazole. The final buffer composition was 100 mM NaCl and 50 mM imidazole in 50 mM sodium phosphate, pH 7.4. The 6X His tag was removed using biotinylated thrombin, and NaCl was added to 250 mM. The cleavage reaction mixture was passed first over a streptavidin column to remove the protease. It was then passed over a nickel column to remove the cleaved tag as well as any uncleaved protein. Recovery was nearly 100%, as indicated by Bradford assay (using Pentex BSA as standard). Recovered protein was concentrated using a fresh 10,000 MWCO Centricon (Millipore). Successive dilutions in NMR buffer followed by concentration brought the protein to a final buffer composition of 25 mM sodium phosphate, pH 6.4, and 10 mM NaCl in 90% H2O/10% D2O. The final protein yield, as indicated by Bradford assay, was 0.350 ml of 1.0 mM protein.
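The successive dilute-and-reconcentrate steps used to lower the NaCl and imidazole concentrations behave like repeated batch dilutions for any solute small enough to pass through the membrane. The sketch below estimates how many cycles are needed. It assumes ideal mixing, free passage of NaCl and imidazole through the 10,000 MWCO membrane, and an exchange buffer that already has the final composition (100 mM NaCl, 50 mM imidazole). The retained and refilled volumes, and the function names, are hypothetical.

```python
# Hypothetical volumes (ml) for one concentrate/refill cycle in a centrifugal filter.
RETAINED_ML = 0.5
REFILLED_ML = 15.0


def after_one_cycle(c_sample: float, c_buffer: float,
                    retained_ml: float = RETAINED_ML,
                    refilled_ml: float = REFILLED_ML) -> float:
    """Concentration of a membrane-permeable solute after one cycle.

    Spinning down to retained_ml leaves the solute concentration unchanged;
    refilling to refilled_ml with exchange buffer (concentration c_buffer)
    then mixes the two volumes.
    """
    return c_buffer + (c_sample - c_buffer) * retained_ml / refilled_ml


def cycles_needed(c_start: float, c_buffer: float, tolerance: float,
                  retained_ml: float = RETAINED_ML,
                  refilled_ml: float = REFILLED_ML) -> int:
    """Number of cycles until the solute is within `tolerance` of the buffer level."""
    if tolerance <= 0 or not 0 < retained_ml < refilled_ml:
        raise ValueError("need tolerance > 0 and 0 < retained_ml < refilled_ml")
    c, n = c_start, 0
    while abs(c - c_buffer) > tolerance:
        c = after_one_cycle(c, c_buffer, retained_ml, refilled_ml)
        n += 1
    return n


if __name__ == "__main__":
    # Elution: 800 mM imidazole, 500 mM NaCl; exchange buffer: 50 mM imidazole, 100 mM NaCl.
    print("imidazole cycles:", cycles_needed(800.0, 50.0, tolerance=1.0))
    print("NaCl cycles:", cycles_needed(500.0, 100.0, tolerance=1.0))
```

With the assumed 30-fold volume ratio, each cycle removes about 97% of the excess solute. The same function can be rerun with the actual device volumes to check how many cycles a given protocol requires.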
The purification of the Tel-VHML farCTD was similar, with the following differences. The protease inhibitors originally added to the lysis buffer were 0.15 mM benzamidine and 0.75 mM PMSF. After nuclease digestion, the protease inhibitors were replenished to their original concentrations. Detergent and sodium chloride were then added to final concentrations of 0.25% Thesit (Boehringer Mannheim) and 2.5 M NaCl. Column purification also differed. After lysate centrifugation, the clarified supernatant was diluted with 50 mM Tris, pH 7.4, to a total salt concentration of 1 M NaCl. Imidazole was added to 10 mM before loading onto a nickel sepharose column. The column was washed first with 50 mM Tris, pH 7.4, 0.5 M NaCl. It was then washed with a low-imidazole wash (the same buffer containing 20 mM imidazole). The protein was eluted with 0.8 M imidazole in 50 mM sodium phosphate, pH 7.4, with 0.5 M NaCl. Pooled fractions were concentrated using a 3000 MWCO Amicon® Centricon® Centrifugal Filter Device (Millipore). The final protein yield, as indicated by Bradford assay, was 0.450 ml of 1.0 mM protein.

NMR spectroscopy: data collection and resonance assignments

NMR spectra were recorded on a Varian Inova 500 MHz or 600 MHz NMR spectrometer. Both instruments were equipped with a triple-resonance 1H/13C/15N probe (cryogenic probe at 600 MHz; room-temperature probe at 500 MHz) and z-axis pulsed-field gradients. To optimize resolution and minimize line broadening, Tel-KO2 experiments were conducted at 30°C and Tel-VHML experiments at 25°C. NMR data were processed using FELIX 2004 (Felix NMR, Inc., Accelrys, San Diego). Resonances were assigned with SPARKY (T. D. Goddard and D. G. Kneller, University of California, San Francisco), using standard approaches and the tools provided within that program.
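SPARKY supplies tools for this step. Independently of any particular program, the core logic of a sequential backbone walk can be sketched in a few lines. Each amide spin system carries its own Cα/Cβ shifts (as read from an HNCACB) and those of the preceding residue (as read from a CBCA(CO)NH). Two spin systems are linked when the preceding-residue shifts of one match the intraresidue shifts of the other. The tolerances, spin-system labels, and shift values below are hypothetical, and this is not SPARKY's implementation.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical matching tolerances (ppm).
CA_TOL = 0.2
CB_TOL = 0.3


@dataclass
class SpinSystem:
    """Carbon shifts (ppm) linked to one backbone amide.

    ca, cb: this residue (HNCACB); ca_prev, cb_prev: preceding residue
    (CBCA(CO)NH). cb is None for glycine, which has no beta carbon.
    """
    label: str
    ca: float
    cb: Optional[float]
    ca_prev: float
    cb_prev: Optional[float]


def shifts_match(a: Optional[float], b: Optional[float], tol: float) -> bool:
    """Two missing CB shifts (glycine) match each other; otherwise compare within tol."""
    if a is None or b is None:
        return a is None and b is None
    return abs(a - b) <= tol


def successors(current: SpinSystem, pool: list[SpinSystem]) -> list[SpinSystem]:
    """Spin systems whose preceding-residue shifts match the current system's own shifts."""
    return [
        s for s in pool
        if s.label != current.label
        and shifts_match(current.ca, s.ca_prev, CA_TOL)
        and shifts_match(current.cb, s.cb_prev, CB_TOL)
    ]


def walk(start: SpinSystem, pool: list[SpinSystem]) -> list[str]:
    """Extend a chain from `start` while exactly one unvisited successor matches."""
    chain = [start.label]
    current = start
    while True:
        nxt = [s for s in successors(current, pool) if s.label not in chain]
        if len(nxt) != 1:  # stop at a gap (e.g. proline) or an ambiguous match
            return chain
        current = nxt[0]
        chain.append(current.label)


if __name__ == "__main__":
    pool = [  # hypothetical spin systems, deliberately listed out of order
        SpinSystem("c", 54.0, 41.0, 45.2, None),
        SpinSystem("a", 58.1, 30.2, 62.0, 32.5),
        SpinSystem("b", 45.3, None, 58.1, 30.3),  # glycine-like: no CB
    ]
    print(walk(pool[1], pool))
```

Stopping at any ambiguity is a deliberately conservative choice. In practice, ambiguous links are resolved with additional spectra or residue-type information rather than by guessing.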
Main-chain atoms, β carbons, and β protons were assigned from 3D triple-resonance experiments, including CBCA(CO)NH, HNCACB, HBHA(CO)NH, and HNHA (66). Aliphatic side-chain atoms beyond the β position were assigned from additional 3D spectra, including C(CO)NH, H(CCO)NH, HCCH-TOCSY, and HCCH-COSY. Aromatic side-chain assignments were made using CBHD, CBHE, CHSQC-TOCSY, and CT-HMQC-TOCSY (3D) experiments. Side-chain amide resonances were assigned from intraresidue NOEs and from HNCACB and CBCA(CO)NH spectra. Backbone torsion-angle restraints were derived from chemical-shift information using the TALOS program (67). NOE connectivities were obtained from 3D [1H,13C,1H] NOESY and 3D [1H,15N,1H] NOESY experiments.

Structure calculations

After the backbone and side-chain resonances were assigned, the tools in SPARKY were used to identify NOE correlations and their intensities in the relevant NOESY spectra. These NOE data and the torsion-angle restraints described above were used for NOE assignment and initial structure calculations in CYANA (version 2.1) (68), with its automated NOE assignment and torsion-angle dynamics functions. Briefly, NOE constraints were introduced in a stepwise manner, using the criteria of chemical-shift agreement, network anchoring, and consistency with an initial structure. In this way, a total of 100 randomized conformers were folded into 3D structures. Each conformer underwent seven cycles, with 10,000 steps of torsion-angle dynamics per cycle. The lowest-energy structures calculated by CYANA were further refined in XPLOR-NIH. The refinement used a simulated annealing protocol with the hydrogen-bond restraints and assignments determined by CYANA. The structures were validated with PROCHECK-NMR (69) and the validation programs at the PDB deposition site (http://deposit.pdb.org/adit/).
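CYANA works with upper distance limits derived from NOESY cross-peak intensities. A simple form of that calibration assumes the cross-peak volume scales as r^-6, so a single reference peak of known distance fixes the proportionality constant. The sketch below illustrates only that relationship and is not CYANA's own calibration routine. The reference volume, the reference distance, and the clamping bounds are hypothetical.

```python
# Minimal r^-6 calibration: V = C / r**6, hence r = (C / V) ** (1/6).
# All constants are hypothetical placeholders.
REF_VOLUME = 1.0e6    # cross-peak volume of a reference pair (arbitrary units)
REF_DISTANCE = 2.5    # known distance for that reference pair (angstrom)
MIN_UPPER = 2.5       # floor applied to upper limits (angstrom)
MAX_UPPER = 6.0       # ceiling applied to weak peaks (angstrom)

CALIBRATION_CONSTANT = REF_VOLUME * REF_DISTANCE ** 6


def distance_from_volume(volume: float) -> float:
    """Distance implied by an NOE volume under a pure r^-6 dependence."""
    if volume <= 0:
        raise ValueError("NOE volume must be positive")
    return (CALIBRATION_CONSTANT / volume) ** (1.0 / 6.0)


def upper_limit(volume: float) -> float:
    """Upper distance restraint, clamped to the [MIN_UPPER, MAX_UPPER] range."""
    return min(MAX_UPPER, max(MIN_UPPER, distance_from_volume(volume)))


if __name__ == "__main__":
    for vol in (1.0e6, 1.0e6 / 64, 1.0e3):
        print(f"volume {vol:.3g}: r = {distance_from_volume(vol):.2f} A, "
              f"upper limit {upper_limit(vol):.2f} A")
```

The sixth-root dependence is why NOE restraints are robust to intensity errors: a 64-fold weaker peak corresponds to only a doubling of the distance.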
PDB files will be deposited in the PDB prior to publication. Structures were visualized and figures created using PyMOL (DeLano Scientific) (70) and MOLMOL (71).

Use of sequence/structure databases and sequence alignment

Structural homology searches and structural alignments were performed using Dali and DaliLite (65). Complete protelomerase sequences were aligned using CLUSTAL W (72-75). Coordinates used in structure comparison calculations were obtained from the Protein Data Bank (PDB). They comprised the dsRBD-like DNA-binding domains of the integrase protein from conjugative transposon Tn916 (76), the I-PpoI homing endonuclease (77), the AtERF1 protein (78), and lambda integrase (60). They also comprised the dsRBDs of Xenopus laevis XlrbpA (dsRBD2) (79), Drosophila Staufen (dsRBD3) (80), and Aquifex aeolicus RNase III (81).

Results and discussion

Identification of the structured portion of the Tel-KO2 far C-terminal region

NMR was used to evaluate the suitability of the construct for structure determination and to design the size of the construct and the sample conditions. The initial Tel-KO2 farCTD construct comprised residues 530-640 in the context of the full-length protelomerase. It was designed to include a short stretch of the N-terminal portion of the protelomerase that had been studied by X-ray crystallography (64). 15N-HSQC spectra of this initial construct gave mixed results: a small number of well-resolved, well-dispersed peaks were overshadowed by a central cluster of strong but poorly resolved peaks (see Figure 2.1). An 15N-HSQC with this pattern can indicate that a protein is not well folded or that parts of it are unstructured. In unfolded peptides, backbone amides have 15N and 1H frequencies close to those of free amino acids, with little variation from residue to residue.
It is the distinct magnetic environments created by a folded protein that alter the chemical shifts of individual amide resonances. Poorly resolved peaks can also indicate that multiple amide bond vectors are in very similar environments. The Tel-KO2 far C-terminal region also has a distinctive compositional profile, with a large number of negatively charged residues.

Figure 2.1 Analysis of the full far C-terminal domain of Tel-KO2 to determine the suitability of the construct for spectroscopic study. A) Sequence profile showing the unusually high number of negatively charged residues. B) Secondary structure prediction generated by the PSIPRED program (82, 83). The prediction includes several helices in the N-terminal half of the peptide, but with a low level of confidence. C) 15N-HSQC of the full far C-terminal domain. Although a number of peaks are well dispersed and well resolved, the central portion of the spectrum (bordered by a dashed box) contains a group of peaks with poor dispersion and resolution.
[Figure 2.1A residue counts: Ala 11, Arg 3, Asn 4, Asp 15, Cys 0, Gln 2, Glu 22, Gly 10, His 4, Ile 2, Leu 5, Lys 4, Met 4, Phe 4, Pro 5, Ser 5, Thr 2, Tyr 3, Val 6, Trp 2. Figure 2.1B legend: Pred, predicted secondary structure (strand, helix, coil); AA, target sequence; Conf, confidence of prediction. Figure 2.1C axes: ω1-1H (ppm) and ω2-15N (ppm).]
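The residue counts listed for Figure 2.1A can be converted into the percentages discussed below with simple arithmetic. The sketch transcribes those counts. The crude net-charge estimate is an added illustration that ignores histidine, the termini, and pKa shifts. Exact percentages depend on which residues (for example, tag-derived residues) are included in the count, so they may differ slightly from the rounded figures quoted in the text.

```python
# Residue counts transcribed from the Figure 2.1A composition table
# (Tel-KO2 far C-terminal region construct).
COUNTS = {
    "A": 11, "R": 3, "N": 4, "D": 15, "C": 0, "Q": 2, "E": 22,
    "G": 10, "H": 4, "I": 2, "L": 5, "K": 4, "M": 4, "F": 4,
    "P": 5, "S": 5, "T": 2, "Y": 3, "V": 6, "W": 2,
}


def percent(residue: str, counts: dict[str, int] = COUNTS) -> float:
    """Percentage of the construct made up of one residue type."""
    return 100.0 * counts[residue] / sum(counts.values())


def crude_net_charge(counts: dict[str, int] = COUNTS) -> int:
    """Rough net charge near neutral pH: (Lys + Arg) - (Asp + Glu).

    Ignores His, the termini, and pKa shifts, so it is only an indicator.
    """
    return counts["K"] + counts["R"] - counts["D"] - counts["E"]


if __name__ == "__main__":
    total = sum(COUNTS.values())
    acidic = COUNTS["D"] + COUNTS["E"]
    print(f"residues: {total}")
    print(f"Glu {percent('E'):.1f}%, Asp {percent('D'):.1f}%, "
          f"Asp+Glu {100 * acidic / total:.1f}%")
    print(f"crude net charge: {crude_net_charge()}")
```

From these counts, roughly one residue in three is acidic and the crude net charge is strongly negative. That profile is consistent with the clustering of similar amide environments seen in the 15N-HSQC.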
The construct contains a proportionately large number of similar amino acids (approximately 20% glutamate and 14% aspartate by residue type). In addition, the secondary structure analysis predicted a number of α-helices (see Figure 2.1). From 15N-HSQC studies alone, it was therefore unclear whether the poorly resolved central peaks indicated a partly or poorly folded protein, or whether the low resolution resulted from the unusual composition of the construct. HNNOE experiments were used to assess whether the construct was likely to be well folded. The 1H-15N heteronuclear NOE (HNNOE) experiment can serve as a probe of the structural state of a protein. On their own, HNNOE data are not sufficient to characterize molecular dynamics completely. They do, however, give a qualitative indication of the mobility of individual amide bond vectors. As such, they offer a useful assessment of which residues within a polypeptide are structured and which are not (84, 85). The HNNOE experiments showed that the central, largely unresolved group of 15N-HSQC resonances came from residues undergoing rapid motions characteristic of an unstructured polypeptide. The resolved resonances, in contrast, had values indicating that those residues were relatively immobile, as expected for residues within a well-folded structure. These results were supported by the secondary structure predictions and by initial backbone assignments (vide infra). To determine whether the folded portions localized to a distinct region within the construct, 15N/13C-labelled protein was purified. Resonances were then assigned using a variety of standard triple-resonance experiments. Unstructured protein regions typically yield very strong peaks, and these peaks dominated the spectra of all NMR experiments.
Thus, although many of the resolved peaks in the 15NHSQC could be assigned, the overpowering resonances from the unstructured regions obscured peaks within the triple-resonance spectra and prevented a complete structure determination of this construct. The analysis did, however, identify and map the folded regions of the protein, which were localized to the extreme carboxy-terminal end of the far C-terminal construct. Figure 2.2 shows the comparative 15NHSQC and 15N-HNNOE profiles and the construct residues that were assigned. Based on the location of the folded portion of the far C-terminal region and on the secondary structure prediction, constructs with increasing amino-terminal truncation were designed. The encoded proteins were expressed and purified, and their 15NHSQC profiles were compared to determine the optimal construct length for structure determination. The construct selected for structure determination, consisting of residues 581-640, was still predicted to contain an N-terminal unstructured region, but was chosen nevertheless because it purified more efficiently. The purified protein also contained four additional amino-terminal residues (G-S-H-M) left by the thrombin cleavage site engineered to facilitate purification. The 15NHSQC of this construct showed fewer resonances than expected, as well as a range of peak widths, with some individual peaks broadened. Sedimentation equilibrium studies of the first KO2 protelomerase farCTD construct (residues 530-640) had shown a possible multimeric equilibrium (data not shown), so a series of solution and experimental conditions was evaluated by NMR to define the optimal conditions for structure determination. Sample conditions were varied to evaluate the effects of protein concentration, pH, and salt and buffer concentrations.
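The origin of the extra G-S-H-M residues can be illustrated with a simple string operation. The sketch below assumes that the 581-640 construct carries the same N-terminal His-tag and thrombin site shown in the sequence of the 530-640 construct in Figure 2.2C, and uses the standard thrombin recognition motif LVPR↓GS, which is cut between arginine and glycine. It is only a sequence-level model: it uses the first occurrence of the motif and ignores cleavage efficiency and possible secondary sites.

```python
THROMBIN_MOTIF = "LVPRGS"  # thrombin cleaves between R and G (LVPR^GS)
CUT_OFFSET = 4             # motif residues (LVPR) that stay on the tag side


def thrombin_product(sequence):
    """Return the C-terminal fragment released by cleavage at the first
    LVPR^GS motif, or the unchanged sequence if no motif is present."""
    site = sequence.find(THROMBIN_MOTIF)
    if site == -1:
        return sequence
    return sequence[site + CUT_OFFSET:]


# N-terminus of the 530-640 construct as printed in Figure 2.2C (truncated here).
tagged = "MGSSHHHHHHSSGLVPRGSHMVLPDEEILEP"
product = thrombin_product(tagged)
print(product)  # the released protein begins with the tag-derived G-S-H-M
print(product.startswith("GSHM"))
```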
Figure 2.2 Determination of structured regions of Tel-KO2 (530-640). A) 15NHSQC of the full far C-terminal domain. Although a number of peaks are well dispersed and well resolved, the central portion of the spectrum (bordered by a dashed box) contains a group of peaks with poor dispersion and resolution. B) 15N-HNNOE of the full-length C-terminal region of Tel-KO2 (530-640). Red corresponds to positive peaks and green to negative peaks. The 15N-HNNOE indicates that most peaks in the central region of the spectrum belong to residues with molecular motion similar to that of free amino acids, and are most likely in an unstructured region of the peptide, while the well-dispersed peaks belong to residues in a structured region. C) Sequence of Tel-KO2 (530-640) indicating residues whose amide resonances were assigned using a standard suite of triple-resonance experiments: MGSSHHHHHHSSGLVPRGSHMVLPDEEILEPMDDVDLDDENHDDETLDDDEIEVDESEGEELEEAGDAEEAEVAEQEEKHPGKPNFKAPRDNGDGTYMVEFEFGGRHYAWSGAAGNRVEAMQSAWSAFK. All residues with assigned amides (indicated by bold, underlined text in the figure) are located at the far C-terminal end of the peptide. D) 15NHSQC of the selected N-terminally truncated construct. Most resonances from amides in unstructured regions have been removed, while the amides from structured regions are all present. (Spectral axes: ω1, 1H (ppm); ω2, 15N (ppm).)

The sample temperature and NMR field strength (500 and 600 MHz) were also varied, since these affect spectral appearance in the presence of exchange. While the overall pattern of the 15NHSQC was maintained, different conditions produced surprising variability in peak resolution across different regions of the spectrum.
Because higher temperatures and lower protein concentrations were expected to shift the monomer/multimer equilibrium toward the monomeric form, conditions were selected that reproduced the changes in the 15NHSQC pattern produced by those conditions, without resorting to such extremes. Lower temperatures are usually better for sample stability and also slow the solvent exchange of amide protons in exposed regions, which would otherwise interfere with the detection of amide resonances, and higher protein concentrations generally facilitate structure analysis because of the better NMR signal-to-noise ratio. Although the majority of the resonances were assigned and the structure determined, several expected resonances were unaccounted for (see Table 2.1). These included resonances that were present but overwhelmed and obscured by the strong signals from the unfolded region of the peptide, as well as resonances that were presumably weakened or absent because of excessive broadening. Among the former were overlapping signals from residues in the unstructured region of the protein, including the HB* and HG* resonances of residues E582, E585, and E587. Among the latter were several resonances within a compact region of the protein (residues 616-620).
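The expectation that lower protein concentration favors the monomer follows from mass action. As a minimal sketch, assuming a simple monomer-dimer equilibrium (2M ⇌ D, with K_d = [M]²/[D]) rather than the unspecified multimeric equilibrium suggested by the sedimentation data, the fraction of protomers present as monomer can be solved from the total protein concentration. The K_d and concentrations below are hypothetical; temperature would act through its effect on K_d, which this sketch does not model.

```python
import math


def monomer_fraction(total, kd):
    """Fraction of protomers that are monomeric for 2M <-> D.

    total = [M] + 2[D] and kd = [M]**2 / [D], in the same concentration units.
    Substituting [D] = [M]**2 / kd gives 2[M]**2/kd + [M] - total = 0, whose
    positive root is used below.
    """
    if total <= 0 or kd <= 0:
        raise ValueError("concentrations must be positive")
    m = kd * (math.sqrt(1.0 + 8.0 * total / kd) - 1.0) / 4.0
    return m / total


# Hypothetical Kd of 0.5 mM; compare dilute and concentrated samples.
for total_mm in (0.1, 0.5, 1.0, 2.0):
    print(f"{total_mm:.1f} mM total: {monomer_fraction(total_mm, 0.5):.2f} monomer")
```

The monomer fraction falls monotonically as the total concentration rises, which is the trade-off against signal-to-noise described above.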
Loss of signal in this region is likely caused by local dynamics or intermediate exchange rather than solvent exchange, especially given the high proportion of missing HA resonances (four of the five residues in this region), which are not typically subject to solvent exchange-based signal attenuation. Given previous data suggesting a possible multimeric equilibrium of the far C-terminal region, this portion of the protein may be the site of multimer interaction. Additionally, the structure shows that this region contains an unusually high number of phenylalanines and tyrosines, which may have contributed to signal attenuation through ring-flipping motions about the Cβ-Cγ bond axis. Several amide resonances within the 616-620 region were absent from the 15NHSQC. Traditional molecular graphics programs such as PyMOL (70) and MOLMOL (71) use the Kabsch-Sander algorithm (86) to determine secondary structure elements; this algorithm essentially looks for specific hydrogen-bonding patterns between backbone amide and carbonyl groups. Because the missing amide resonances left this region with fewer restraints, its backbone hydrogen-bonding geometry was presumably too poorly defined, and these programs did not assign secondary structure to it.

Table 2.1 Tel-KO2 farCTD581-640 missing or unassigned resonances

| Residue | Missing resonance(a) | Note |
|---|---|---|
| GLU 582 | 1HB/2HB, 1HG/2HG | In unstructured N-terminal tail |
| GLU 585 | 1HB/2HB, 1HG/2HG | In unstructured N-terminal tail |
| GLU 587 | 1HB/2HB, 1HG/2HG | In unstructured N-terminal tail |
| LYS 589 | 1HG, 1HE/2HE | In unstructured N-terminal tail |
| LYS 593 | 1HG/2HG, 1HE/2HE | Beginning of structured region |
| LYS 597 | 1HB/2HB | |
| PHE 611 | H, HZ | |
| PHE 613 | HZ | |
| ARG 616 | HA, 1HB/2HB, 1HG/2HG, 1HD/2HD | Region of third β-strand |
| HIS 617 | H, HA, HD2 | Region of third β-strand |
| TYR 618 | H, HA, HE1/HE2 | Region of third β-strand |
| ALA 619 | HA | Region of third β-strand |
| TRP 620 | H, HD1 | Region of third β-strand |
| GLY 622 | H | Region of third β-strand |
| PHE 639 | HZ | |

29 missing chemical shift assignments; assignment completeness 90.5%.
(a) Atom description follows PDB nomenclature.
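The dependence of secondary-structure assignment on hydrogen-bond geometry can be made concrete with the electrostatic energy used by the Kabsch-Sander (DSSP) method, E = 0.084 × 332 × (1/r_ON + 1/r_CH − 1/r_OH − 1/r_CN) kcal/mol, with a hydrogen bond assigned when E < −0.5 kcal/mol. The sketch below evaluates that expression for one donor N-H and one acceptor C=O. It is not the code used by PyMOL or MOLMOL, which may implement their own variants, and the coordinates in the example are idealized, hypothetical values. Because the energy falls off quickly with distance and misalignment, a poorly restrained backbone can miss the cutoff even where it is extended.

```python
import math

# Partial charges (units of e) and conversion factor used by DSSP:
# E = q1 * q2 * f * (1/r_ON + 1/r_CH - 1/r_OH - 1/r_CN)  [kcal/mol]
Q1, Q2, F = 0.42, 0.20, 332.0
HBOND_CUTOFF = -0.5  # kcal/mol; energies below this count as a hydrogen bond


def hbond_energy(n, h, c, o):
    """Electrostatic H-bond energy between donor N-H and acceptor C=O.

    n and h are the donor amide N and H; c and o are the acceptor carbonyl
    C and O. Coordinates are (x, y, z) tuples in Angstroms.
    """
    r_on = math.dist(o, n)
    r_ch = math.dist(c, h)
    r_oh = math.dist(o, h)
    r_cn = math.dist(c, n)
    return Q1 * Q2 * F * (1.0 / r_on + 1.0 / r_ch - 1.0 / r_oh - 1.0 / r_cn)


def is_hbond(n, h, c, o):
    return hbond_energy(n, h, c, o) < HBOND_CUTOFF


# Idealized, hypothetical linear N-H...O=C arrangement along the x axis.
n, h = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
o, c = (2.9, 0.0, 0.0), (4.13, 0.0, 0.0)
print(f"E = {hbond_energy(n, h, c, o):.2f} kcal/mol, H-bond: {is_hbond(n, h, c, o)}")

# The same acceptor moved 10 Angstroms further away no longer qualifies.
far_o, far_c = (12.9, 0.0, 0.0), (14.13, 0.0, 0.0)
print(is_hbond(n, h, far_c, far_o))
```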
However, analysis of solved structures has led to the observation that secondary structure produces a characteristic shift of Cα, Cβ and Cγ resonances relative to those observed for residues in random coil conformation (87). The values of these shifts indicate the propensity of regions of the structure to populate extended or α-helical conformations. Typically, Cα shifts are better predictors of α-helical regions, while Cβ and Cγ shifts are better indicators of β-strand or extended conformation. A consensus of the assigned Cα, Cβ, and Cγ shifts for the residues in this region shows that this region, although underdetermined, most likely adopts a β-strand/extended conformation (data not shown). For the residues with missing 15NHSQC amide resonances, other intraresidue frequencies were observable and were assigned from other data sets, such as the side-chain frequencies in the aromatic or aliphatic regions of CHSQC experiments. In no case were all resonances for a given residue missing or unobservable. Nevertheless, because of the missing resonances, the structure is unavoidably somewhat underdetermined, particularly in the 616-620 region. To verify the structure and to identify themes of conservation among phage protelomerases, the far C-terminal region of a homologous phage protelomerase was selected for structural study. A sequence comparison of known phage protelomerases revealed that the protelomerase of phage VHML (Vibrio harveyi myovirus-like), which infects the free-living aquatic proteobacterium V. harveyi, was the most distant relative of Tel-KO2 within the phage clade, so this protelomerase was selected. Sequence conservation between the two regions was low (16.4%: only 9.8% identical residues and a further 6.6% conservative substitutions) as determined by BLAST comparisons (88, 89). Secondary structure predictions for the two domains were similar, however, with two contiguous β-strands followed by a carboxy-proximal α-helix.
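The secondary-shift argument can be sketched in a few lines. The example below uses only the combined Cα and Cβ secondary shift, (δCα − δCα,rc) − (δCβ − δCβ,rc), as a simplified stand-in for the Cα/Cβ/Cγ consensus described above: positive values suggest helix and negative values suggest strand. The 0.7 ppm threshold, the three-residue run requirement, and all shift values in the example are assumptions for illustration; real use requires a published random-coil reference table, which is not reproduced here.

```python
def secondary_shift_label(ca_obs, cb_obs, ca_rc, cb_rc, threshold=0.7):
    """Label one residue from its combined secondary shift (ppm).

    delta = (Ca_obs - Ca_rc) - (Cb_obs - Cb_rc). Helices tend to shift Ca
    downfield and Cb upfield (delta > 0); strands do the reverse (delta < 0).
    The threshold is an assumption of this sketch.
    """
    delta = (ca_obs - ca_rc) - (cb_obs - cb_rc)
    if delta > threshold:
        return "helix"
    if delta < -threshold:
        return "strand"
    return "coil"


def consensus(labels, min_run=3):
    """Keep helix/strand labels only in runs of at least min_run residues."""
    result = ["coil"] * len(labels)
    i = 0
    while i < len(labels):
        j = i
        while j < len(labels) and labels[j] == labels[i]:
            j += 1
        if labels[i] != "coil" and j - i >= min_run:
            result[i:j] = labels[i:j]
        i = j
    return result


# Hypothetical (Ca_obs, Cb_obs, Ca_rc, Cb_rc) values for a short stretch.
shifts = [
    (52.4, 19.2, 52.5, 19.1),
    (50.9, 21.0, 52.5, 19.1),
    (54.1, 42.5, 56.0, 40.3),
    (53.8, 33.0, 55.8, 31.0),
    (56.5, 30.1, 56.0, 30.4),
]
labels = [secondary_shift_label(*s) for s in shifts]
print(labels)
print(consensus(labels))
```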
The VHML far C-terminal region contains fewer phenylalanines and tyrosines, and early tests indicated that the VHML construct designed to be homologous to the Tel-KO2 far C-terminal region did not appear to have an unstructured region or to suffer from peak broadening or signal attenuation caused by intermediate exchange or other local dynamics, making it a good candidate for structure determination.

Structure of Tel-KO2 and Tel-VHML far C-terminal domains

The 2D homonuclear and 3D 13C- and 15N-edited NOESY experiments provided 625 and 1098 conformationally restrictive NOE distance restraints for the Tel-KO2 and Tel-VHML far C-terminal regions, respectively. The NMR structure ensembles have root mean square deviations of 0.51 Å (Tel-KO2 farCTD) and 0.43 Å (Tel-VHML farCTD) for backbone atoms in structured regions. All prolines were in the trans conformation, as confirmed by 13Cβ and 13Cγ chemical shifts typical of trans-proline, checked as part of the CYANA program (version 2.1) (68). A full summary of structural statistics is given in Table 2.2. Both the Tel-KO2 and Tel-VHML far C-terminal structures fall well within acceptable ranges on every measure: on the basis of agreement between individual structures within the ensembles, good geometries, low residual target function values, and a lack of NOE violations, the structures of both constructs are well determined. However, the Tel-VHML far C-terminal region, which was not hampered by the broadening from local dynamics or intermediate exchange that affected the Tel-KO2 study, is better defined, and its structure verifies and improves on the Tel-KO2 structure.
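The ensemble precision quoted above (average RMSD to the mean coordinates) can be computed as follows. This minimal sketch assumes the models have already been superimposed on the selected atoms (for example, backbone atoms of the structured region); it performs no fitting of its own, and the two-model ensemble in the example is hypothetical.

```python
import math


def mean_structure(ensemble):
    """Atom-by-atom average of superimposed models (lists of (x, y, z) tuples)."""
    n_models = len(ensemble)
    n_atoms = len(ensemble[0])
    if any(len(model) != n_atoms for model in ensemble):
        raise ValueError("all models must contain the same atoms in the same order")
    return [
        tuple(sum(model[a][k] for model in ensemble) / n_models for k in range(3))
        for a in range(n_atoms)
    ]


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length coordinate lists."""
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def average_rmsd_to_mean(ensemble):
    """Mean over models of the RMSD between each model and the mean structure."""
    mean = mean_structure(ensemble)
    return sum(rmsd(model, mean) for model in ensemble) / len(ensemble)


# Hypothetical two-model, two-atom ensemble, already superimposed.
models = [
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    [(0.0, 0.0, 0.4), (1.5, 0.0, 0.4)],
]
print(f"average RMSD to mean: {average_rmsd_to_mean(models):.2f} Å")
```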
Compared with the 29 missing or unassigned resonances for the Tel-KO2 construct, the Tel-VHML construct has only two unassigned resonances (tyrosine 57 HD1/HD2 and methionine 64 1HE/2HE/3HE). The Tel-VHML structure has more assigned and fewer unassigned NOESY peaks than its Tel-KO2 counterpart, a lower average RMSD to mean coordinates, and more favorable Ramachandran statistics. This is most likely due to the absence of the N-terminal unstructured region in the Tel-VHML construct and to better determination of the C-terminal region in the absence of the exchange-broadened NMR signals that confounded the Tel-KO2 study. As with the Tel-KO2 construct, a consensus of Tel-VHML Cα, Cβ, and Cγ shifts predicts three β-strands before the C-terminal α-helix. For the Tel-VHML construct, however, all expected peaks were accounted for in the region of the third predicted β-strand. The resulting structure confirms the presence of the third β-strand and, by homology, verifies that the Tel-KO2 construct also contains a three-stranded β-sheet.

Table 2.2 Comparison of structural statistics of Tel-KO2 and Tel-VHML farCTD structures

| Parameter | Tel-VHML | Tel-KO2 |
|---|---|---|
| Residues | 449-509 | 581-640 |
| Constraints used for structure calculation | | |
| Short-range NOEs, \|i-j\| ≤ 1 | 1554 | 934 |
| Medium-range NOEs, 1 < \|i-j\| < 5 | 208 | 137 |
| Long-range NOEs, \|i-j\| ≥ 5 | 416 | 288 |
| Dihedral angle constraints(a) | 90 | 82 |
| Hydrogen bonds(b) | 24 | 14 |
| Average deviations from idealized geometry, RMSD (SD), from XPLOR(c) | | |
| Bonds (Å) | 0.002 (0.000) | 0.003 (0.000) |
| Angles (°) | 0.435 (0.003) | 0.441 (0.004) |
| Impropers (°) | 0.278 (0.012) | 0.259 (0.009) |
| Average CYANA assignments per constraint | 1 | 1 |
| CYANA target function value (Å²) | 0.29 | 0.02 |
| Average RMSD to mean coordinates(d) | residues 452-509 | residues 593-640 |
| Average backbone to mean (Å) | 0.43 | 0.51 |
| Average heavy atom to mean (Å) | 1 | 1.03 |
| RMSD from ideal geometry: bond lengths (Å) | 0.012 | 0.012 |
| RMSD from ideal geometry: bond angles (°) | 1.5 | 1.4 |
| Ramachandran analysis for structured regions(e) | PROCHECK | PROCHECK |
| Most favored regions (%) | 96.6 | 92.3 |
| Allowed regions (%) | 3.4 | 7.7 |
| Generously allowed regions (%) | 0 | 0 |
| Disallowed regions (%) | 0 | 0 |

(a) Dihedral angle constraints generated using TALOS (67).
(b) Determined from agreement among structures resulting from initial CYANA calculations (68).
(c) Statistics (average (SD)) calculated for the bundle of the 20 best-energy conformers.
(d) Average of the 20 best-energy structures calculated using XPLOR with upper-limit distance constraints and hydrogen bonds from CYANA and dihedral angle constraints from TALOS.
(e) Determined using PROCHECK-NMR (69).

The resulting Tel-VHML and Tel-KO2 structures thus adopt the same fold, a compact domain with a βββα topology in which the three β-strands form an antiparallel β-sheet that packs against and folds around the C-terminal α-helix. The structures have been called the far C-terminal domains (farCTDs). The solved Tel-KO2 and Tel-VHML farCTD structures are each represented by an ensemble of the 20 lowest-energy conformers in Figure 2.3. Although the farCTDs from the KO2 and VHML protelomerases share the same fold, they are not identical. The two domains share only limited sequence identity and differ in the length and curvature of the helix: the C-terminal helix in the Tel-KO2 farCTD consists of 13 residues, while the helix in the Tel-VHML farCTD consists of 18 residues. The N-terminal region of the Tel-KO2 farCTD appears to be completely unstructured, while the N-terminal region of the Tel-VHML farCTD is structured and somewhat extended, but does not appear to conform to common secondary structural elements. The relevance of this extended region has yet to be established. The hydrophobic cores of the two proteins are also different.
As shown in Figure 2.4, there are more aromatic residues in the Tel-KO2 farCTD core, especially near the C-terminal side of the helix. The aromatic residues appear to be stacked and angled in a way that would enhance the stability of the folded structure. In particular, the W635/F639 and F596/F611 pairs are stacked in a nearly perfectly planar conformation in all five of the best-energy structures, with average distances of slightly less than 5 Å between their centroids. F613 is approximately perpendicular to the W635/F639 pair, which would provide additional stability to the stacked pair (see Figure 2.4). In the Tel-VHML farCTD structure, only three residues (W455/W483/Y502) are involved in the aromatic network on the C-terminal edge of the α-helix, and they are arranged in orthogonal conformations to each other. The Tel-VHML farCTD appears to be additionally stabilized by the interaction of the C-terminus of the longer helix with the somewhat extended N-terminal region. The homologous Tel-KO2 farCTD N-terminal region lacks observable structure.

Figure 2.3 NMR ensembles of Tel-KO2 and Tel-VHML farCTD structures. NMR ensembles of the 20 lowest-energy structures of the Tel-KO2 farCTD (A) and Tel-VHML farCTD (C) shown as backbone diagrams in stereo. NMR ensembles in ribbon representation of the five lowest-energy structures of the Tel-KO2 farCTD (B; structured region only, residues 593-640) and Tel-VHML farCTD (D). N- and C-termini are labeled.

Figure 2.4 Comparison of the hydrophobic cores of the Tel-VHML and Tel-KO2 farCTDs. A) Ribbon diagrams of the top five structures of the Tel-VHML farCTD. The hydrophobic residues that make up the interior are shown as sticks; aliphatic residues are colored wheat and aromatic residues white. B) The aromatic stacking network of the Tel-VHML farCTD. The ten best-energy structures are shown as ribbon diagrams of the backbone, with the side-chains of relevant residues shown as sticks. Residues W455 (yellow), W483 (green), and Y502 (blue) are oriented orthogonally to each other. C) Ribbon diagrams of the top five structures of the Tel-KO2 farCTD. The hydrophobic residues that make up the interior are shown as sticks; aliphatic residues are colored pale blue and aromatic residues white. D) The aromatic stacking network of the Tel-KO2 farCTD. The ten best-energy structures are shown as ribbon diagrams of the backbone, with the side-chains of relevant residues shown as sticks. The W635/F639 stacking pair is shown in red; F613, which is orthogonal to this pair, is shown in yellow; and the F596/F611 pair is shown in green. The interior of the Tel-KO2 farCTD contains more aromatic residues, with an extensive network of both planar and perpendicular stacking interactions.

Because of the very limited sequence conservation, differing secondary structure predictions, and the varying lengths of the unstructured linker regions between the farCTDs and the N-terminal protelomerase regions, automated sequence alignment programs have difficulty recognizing the conserved domain as homologous among protelomerases. Most prediction programs predict two rather than three β-strands, some identifying strands β2β3 (as with the Tel-KO2 farCTD) and some identifying strands β1β2 (as with the Tel-VHML farCTD), which further complicates alignment (see Appendix A for a comparison of Tel-KO2 and Tel-VHML secondary structure predictions and comparative Cα shifts).
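The stacking descriptions above, such as the centroid separation of slightly less than 5 Å for the W635/F639 pair and the roughly perpendicular orientation of F613, correspond to two simple measurements: the distance between ring centroids and the angle between ring planes. The sketch below computes both from ring-atom coordinates. Estimating the plane normal from three alternate atoms of a six-membered ring is a simplification that assumes a planar ring; for tryptophan one would have to choose which indole atoms to include. The hexagon coordinates in the example are idealized and hypothetical.

```python
import math


def centroid(points):
    """Geometric centre of a list of (x, y, z) tuples."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ring_normal(ring):
    """Unit normal of a six-membered ring from atoms 0, 2 and 4 (assumes planarity)."""
    n = _cross(_sub(ring[2], ring[0]), _sub(ring[4], ring[0]))
    length = math.sqrt(sum(x * x for x in n))
    return tuple(x / length for x in n)


def stacking_geometry(ring_a, ring_b):
    """Return (centroid distance in Angstroms, interplanar angle in degrees, 0-90)."""
    distance = math.dist(centroid(ring_a), centroid(ring_b))
    cos_angle = abs(sum(x * y for x, y in zip(ring_normal(ring_a), ring_normal(ring_b))))
    angle = math.degrees(math.acos(min(1.0, cos_angle)))
    return distance, angle


def hexagon(center, radius=1.39, plane="xy"):
    """Idealized, hypothetical aromatic ring used only for the example below."""
    cx, cy, cz = center
    pts = []
    for k in range(6):
        u = radius * math.cos(math.radians(60 * k))
        v = radius * math.sin(math.radians(60 * k))
        pts.append((cx + u, cy + v, cz) if plane == "xy" else (cx + u, cy, cz + v))
    return pts


stacked = stacking_geometry(hexagon((0, 0, 0)), hexagon((0, 0, 3.6)))
t_shaped = stacking_geometry(hexagon((0, 0, 0)), hexagon((0, 4.8, 0), plane="xz"))
print("parallel: %.1f A, %.0f deg" % stacked)
print("perpendicular: %.1f A, %.0f deg" % t_shaped)
```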
Having solved the Tel-VHML and Tel-KO2 farCTD structures, we used the DaliLite program to overlay them and identified several conserved and similar residues (65). The hydrophobic interior of the two proteins was especially well conserved, resulting in a positionally homologous network of hydrophobic residues (both aromatic and aliphatic), shown in Figure 2.5. Aside from the hydrophobic core, the structural overlay revealed additional regions of sequence similarity that appear to be conserved among the other phage protelomerases (see Figure 2.6). The loop between strands β1 and β2, for instance, consists of a GD-rich region that is highly conserved among the other putative protelomerase farCTDs. The loop between strands β2 and β3 is shorter than the β1β2 loop and contains a conserved glycine. The short loop between strand β3 and the α-helix is also composed of small residues. The external face of the α-helix shows considerable conservation according to residue type. When viewed down the helix axis, a series of charged and polar residues align in a surprisingly linear fashion, especially given the variability in the curvature of the helix. Judging from the aligned putative protelomerase farCTDs, this pattern is nearly universally conserved and includes, starting from the N-terminal base of the helix, a small residue, a negatively charged residue, and two successive polar/charged residues.

Figure 2.5 Conserved network of hydrophobic residues in the interior of phage protelomerase farCTDs. A) Cartoon showing an overlay of the solved farCTDs. The Tel-KO2 farCTD is shown in cyan and the Tel-VHML farCTD in red. Hydrophobic residues that are conserved by type and location are represented as sticks. Labels denote the comparative residues, with (V) indicating a Tel-VHML residue and (K) a Tel-KO2 residue. The two structures have an RMSD of 2.1 Å over comparable regions, as calculated by a DaliLite structure comparison (65). B) A sequence alignment of the two domains showing the locations of the similarly conserved residues. A consensus sequence is shown beneath the alignment, in which % indicates an aromatic residue and h a hydrophobic residue (aromatic or aliphatic). The symbols above the alignment mark the secondary structure elements identified by structure determination (β, β-strand or extended conformation; α, α-helix). An arrow indicates the first residue for which DaliLite structure comparisons were calculated.

Figure 2.6 Conserved residues of the phage far C-terminal domain. A) and B) Structural overlay showing the externally facing residues that are conserved according to position and type. The overlaid backbones of Tel-VHML and Tel-KO2 are shown in red and cyan, respectively. Views A) and B) are rotated 90° about the vertical axis relative to each other. Labels give the identity and sequence number of residues in the context of the full-length protelomerase, with (V) indicating that the residue belongs to Tel-VHML and (K) that it belongs to Tel-KO2. C) An alignment of all known and putative phage protelomerase farCTDs (VHML, KO2, N15, PY54, VP882, Halm, and VCamp) compared with the solved Tel-KO2 and Tel-VHML structures. The number to the right of each aligned sequence indicates the C-most residue of that sequence in the context of the full-length protelomerase. Residues that appear to be conserved through a majority of the protelomerases are shown in red and underlined, and are numbered (1-11) in both the aligned sequences and the displayed overlays. Beneath the alignment are shown, in order, the degree of conservation among the domains, a consensus sequence, and the location of the conserved residues within the aligned Tel-VHML and Tel-KO2 structures. Consensus sequence symbols: h, hydrophobic (aliphatic or aromatic); u, small (Gly, Ala); %, aromatic; ^, polar/charged; ±, charged; +, positively charged; „, negatively charged. Degree of conservation: *, total conservation; :, high conservation; ., moderate conservation. Location of conserved residues: i, side-chain extends into the interior of the protein; α, side-chain extends externally from the face of the α-helix; e, side-chain extends externally from the face of the β-sheet; l, residue resides in a loop region.
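The observation that polar and charged residues line up along one face of the helix can be illustrated with a helical-wheel calculation. On an ideal α-helix (3.6 residues per turn) each residue is rotated about 100° around the helix axis relative to the previous one, so residues spaced i, i+3 or i+4, and i+7 cluster on one face. The sketch below returns the helix positions that fall within an arbitrary ±50° window; the positions are generic indices, not the actual farCTD helix residues, and the curvature noted for these helices is ignored.

```python
DEG_PER_RESIDUE = 100.0  # ideal alpha-helix: 3.6 residues per turn


def wheel_angle(position, start=0):
    """Angle (degrees, 0-360) of a helix position relative to `start`."""
    return ((position - start) * DEG_PER_RESIDUE) % 360.0


def same_face(positions, face_angle=0.0, half_width=50.0, start=0):
    """Positions whose wheel angle lies within +/- half_width of face_angle.
    The 50-degree half-width is an arbitrary choice for this sketch."""
    selected = []
    for p in positions:
        offset = (wheel_angle(p, start) - face_angle + 180.0) % 360.0 - 180.0
        if abs(offset) <= half_width:
            selected.append(p)
    return selected


# Generic helix positions 0-17 (an 18-residue helix, hypothetical numbering).
for p in range(18):
    print(p, wheel_angle(p))
print("facing 0 degrees:", same_face(range(18)))
```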
Perhaps as notable as the conservation within the loops and on the face of the α-helix is the complete lack of conservation on the external face of the β-sheet. There is, however, notable conservation at the beginning and end of strand β1, which consist of basic or polar residues, as well as a universally conserved lysine just before the start of the same strand. The electrostatic surface potentials of the Tel-KO2 and Tel-VHML structures, calculated with the Adaptive Poisson-Boltzmann Solver (APBS) plugin within PyMOL (70, 90), are very similar. Both structures display a positive surface along the β1/helix face and a negative surface along the β2/helix face. The calculated surface potentials of these faces are shown for both proteins in Figure 2.7. With the structure of the farCTD solved and regions of conservation identified by structural and sequence alignment, identifying its binding partner will aid in understanding the function of this domain. Initial attempts to identify this partner by surface plasmon resonance and by NMR were inconclusive. However, a careful comparative analysis of the farCTD with the sequences and structures of proteins with similar domains gives a good indication of the probable class of binding partner of the farCTD, which will facilitate the design of future studies.

Figure 2.7 Electrostatic surfaces of the Tel-KO2 and Tel-VHML farCTDs show similarly charged surfaces along homologous faces. Red indicates negative charge, blue positive charge, and white neutral. The surfaces are rendered at partial transparency to show the secondary structure cartoon beneath, which is colored in a spectrum from blue at the N-terminus to red at the C-terminus. A) Tel-KO2 farCTD oriented to show the most negatively charged surface, along the β3/helix face. B) Tel-KO2 farCTD oriented to show the most positively charged surface, along the β1/helix face. C) Tel-VHML farCTD oriented to show the most negatively charged surface, along the β3/helix face. D) Tel-VHML farCTD oriented to show the most positively charged surface, along the β1/helix face. The two views of each protein are rotated approximately 180° relative to each other about the horizontal axis.

A Dali structure search revealed that the protelomerase farCTD structures most closely resemble the fold known as the double-stranded RNA-binding domain (dsRBD), the founding member of the double-stranded RNA-binding domain-like superfamily (91). The dsRBD-like superfamily also includes the homologous-pairing domain of Rad52 recombinase and the N-terminal domain of ribosomal protein S5. The structural statistics of the protelomerase farCTDs compare favorably with those of other recently published dsRBDs (92). Proteins that contain dsRBDs can be separated into classes based on their known binding partners: dsRBDs are known to bind RNA, DNA, and other proteins. An analysis of the conserved features of each class of dsRBD will help to indicate which class of macromolecule is the likely partner of the protelomerase farCTD and will aid in the design of future studies.

The double-stranded RNA binding domain

The classic dsRBD is an αβββα fold in which an antiparallel three-stranded β-sheet packs against a carboxy-proximal α-helix and is further stabilized by an additional amino-proximal α-helix. The dsRBD was identified in 1992 by St. Johnston et al., who noted repeated regions of sequence similarity in the Drosophila melanogaster gene staufen and a Xenopus laevis gene of then-unknown function (93).
The group used the repeats in both systems to identify a minimal consensus sequence, which they used to search the protein sequence database. In addition to the repeats in staufen (required for the localization of maternal mRNAs) and the X. laevis gene, whose product was later named Xlrbpa (for Xenopus laevis RNA-binding protein A, a homolog of human TRBP involved in formation of the RISC complex), additional proteins containing the consensus sequence were identified, several of which were known or thought to bind double-stranded RNA. These included human dsRNA-activated inhibitor (DAI), human trans-activating region (TAR) binding protein (TRBP), and RNase III. The thousands of examples of this domain identified since then come from widely diverse origins, ranging from archaea, bacteria, and viruses to eukaryotes, in which they are found in both the nucleus and the cytoplasm of plants and animals. Although all dsRBDs share a structural fold, dsRBDs that bind different classes of macromolecules have distinct characteristics that reflect their different modes of macromolecular recognition. Based on sequence conservation, general architecture, and the hierarchy of important binding regions observed in the farCTD and other dsRBDs, the farCTD most likely binds DNA. However, the domain may also interact with other protein elements, either from protelomerase or from another source.

dsRBD as RNA-binder

As the eponymous and founding member of the superfamily, RNA-binding dsRBDs define the classical architectural fold (αβββα) and preferentially bind dsRNA over dsDNA and RNA/DNA hybrids. Their sequence conservation has been extensively studied and is well characterized (see Figure 2.8). It extends across all four elements of secondary structure and the β1β2 and β3α2 loops, and is especially high in the three regions involved in direct RNA interaction (94).
Classifications of dsRBDs have been made in terms of sequence, the number of copies of dsRBDs within the dsRBD-containing protein, the presence of other intraprotein domains such as a catalytic domain, and RNA-binding ability (95). The five separate dsRBDs in Drosophila Staufen, for instance, are classified according to whether or not they bind dsRNA at all (dsRBDs 1, 3, and 4 do) as well as which of them binds RNA with the greatest affinity (96, 97). Beyond a preference for dsRNA, the question of binding specificity by the dsRBD must be addressed in three distinct but related respects: RNA sequence specificity, RNA helix geometry specificity, and RNA "secondary structure" specificity.
Figure 2.8 Comparative consensus sequences of dsRNA-binding dsRBDs and DNA-binding dsRBDs with known and putative protelomerase farCTDs. A) The consensus sequence for RNA-binding dsRBDs derived from the sequence logo of the hidden Markov model (HMM) created by Tian et al. (94). Reprinted by permission from Macmillan Publishers Ltd: [NATURE REVIEWS | MOLECULAR CELL BIOLOGY] (Tian et al., 5, 1013-1023 (December 2004), doi:10.1038/nrm1528), copyright (2004). The HMM was created from all dsRBDs then available in the InterPro database (1,428 total). Adjacent residues are separated by a line, the thickness of which indicates the probability of an insertion of random residues. The width of each residue indicates the probability of having a residue at that position (narrow residues tend to be deleted at their positions compared to wide residues). The height of a residue indicates how conserved it is among all dsRBDs. Secondary structures of the dsRBD are indicated above the sequence logo, and the three dsRNA interaction regions (regions 1, 2, and 3) are marked below it. Intervening sequences between α-helices (α) and β-strands (β) are known as loops: loop 1 is between helix 1 and strand 1, loop 2 between strand 1 and strand 2, loop 3 between strand 2 and strand 3, and loop 4 between strand 3 and helix 2. A consensus sequence is shown below the RNA-binding dsRBD HMM, with residues that are also conserved among protelomerase farCTDs highlighted. B) An alignment of known and putative farCTD sequences, with a farCTD consensus sequence displayed both above and below the alignment. Within the alignment, residues conserved among farCTDs are indicated in red text and underlined, while secondary structure (determined from the solution structures or by homology) is highlighted and indicated above the alignment (α, alpha helix; β, extended or beta strand conformation). In the consensus shown above the alignment, residues conserved between farCTDs and RNA-binding dsRBDs are highlighted; in the consensus displayed below the alignment, residues conserved between farCTDs and DNA-binding dsRBDs are highlighted. C) A consensus sequence and alignment of DNA-binding dsRBDs. Secondary structure is indicated as described in B. Within the β-strands, externally facing residues are indicated by colored text. Conserved residues are underlined. The consensus sequence is displayed above the alignment, with residues that are also conserved among protelomerase farCTDs highlighted. All three consensus sequences contain alternating hydrophobic residues in the region of strand β2, which probably reflects conservation of core-stabilizing residues, as well as a small residue (glycine or proline) at the beginning of strand β1 and at the end of strand β3. Outside of these regions, only one other residue is well conserved between RNA-binding dsRBDs and protelomerase farCTDs (an alanine in the middle of the C-terminal α-helix). None of the points of conservation between farCTDs and RNA-binding dsRBDs lie in the regions noted for RNA interaction. Outside of the regions conserved across all three consensus sequences, already mentioned, similarity between the protelomerase farCTD and DNA-binding dsRBD consensus sequences is greatest just N-terminal to the beginning of strand β1 and at the C-terminal end of strand β1, which contain conserved positive and primarily positive or polar residues, respectively. In the DNA-binding dsRBDs, these residues interact with the DNA phosphate backbone.
A. RNA-binding dsRBD consensus and hidden Markov model of 1,428 dsRBDs (logo labels: α1, β1, β2, β3, α2; RNA interaction regions 1, 2, and 3)
RNA-binding dsRBD consensus sequence: PK^-L^E-----+----P-Y-------GP--H-P-F-h-V-h-G/P-----G-G-SKK--AE^-AA--AL---L--
B. farCTD sequence alignment and consensus sequence
farCTD consensus sequence (above alignment): ------------K-P^h^u--+----GDG---%-h-h-h-G---%----G--u---h'Ah+-A%^-%------
Secondary structure: --------------βββββββββ---------βββββββ---ββββββββ----ααααααααααααααααααα
VHML QEEDQKVSWPKAKDIKVQ--SKKE---GD--MWHVWTEVNG-MRWEN-WSK-GR-KTEAVKALRQQYERESAEM
KO2 AEVAEQEEKHPGK-PNFKA-PRDN--GDG--TYMVEFEFGG-RHYAW-SGAAG-NRVEAMQSAWSAYFK
N15 EEGPEEHQPTALK-PVFKP-AKNN--GDG--TYKIEFEYDG-KHYAW-SGP-ADSPMAAMRSAWETYYS
PY54 SDNASDEDKPEDK-PRFAAPIRRS--ED---SWLIKFEFAG-KQYSW-EG-NAESVIDAMKQAWTENME
VP882 VPAAEKQPKKAQK-PRLVAH-QV---DDE--HWEAWALVEG-EEVAR-VKIKG-TRVEAMTAAWEASQKALDD
Halm VAAAVPKEVAEAK-PRLNAHPQ----GDG--RWVGVASING-VEVAR-VGNQAG-RIEAMKAAYKAAGGR
VCamp VVETKPKDETVIK-PKMKGH-KE---DDG--TWLVDVTIED-KSWQISVGKEPKNVMDAFKLAWNEFvyrk…
farCTD consensus sequence (below alignment): ----------K-P^h^u--+^---GDG---%-h-h-h-G---%----G--u---h'Ah+-A%^-%-------
C. DNA-binding dsRBD consensus sequence and alignment
DNA-binding dsRBD consensus sequence: +-G/P----+^----G----%-h-h------G----hG-------------------------
Secondary structure: -----------------βββββββ------βββββββ-------βββββββ--ααααααααααααααααααα
1gcc …AVTAAKGKHYR--G--VRQRPW---G---KFAAEIRDPAKNGARVWLGTFETAE-DAALAYDRAAFRMR…
Tn9 MSEKRRDNRGRILKT-G--ESQRK---DG---RYLYKYIDSF--GEPQFV-YS… …WKLVATDRVPAGKRDAISLREKIAEL
λINT MGRRRSHERRDLPPNLYIRNN---G---YY--CYRDPRT-G-K-EFGL-GRDRRIAITEAIQANIELFS…
i-Ppo1 …ALTNAQILAVIDSWEETVGQFP… …LGGGLQGTLHCYEIPLAAPYGVG-FAKN----G-PTRWQYKRTINQV---VHRWGSHTVPFLLEPDNINGKTCTA…
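The consensus lines in Figure 2.8 summarize each aligned column of several sequences as a single symbol. The published consensus also uses residue-class symbols (such as % and h) whose exact rules are not given in this section, so the following is only a minimal majority-rule sketch in Python, not the procedure used to build the figure. It assumes equal-length gapped rows and reports a residue only when it occurs in at least a chosen fraction of all rows (the 0.5 default is an arbitrary assumption). The input is a 15-column window transcribed from the Figure 2.8B farCTD alignment around the conserved GD(G) turn, and the function name `majority_consensus` is hypothetical.

```python
from collections import Counter
from typing import Iterable


def majority_consensus(alignment: Iterable[str], threshold: float = 0.5, gap: str = "-") -> str:
    """Majority-rule consensus of equal-length, pre-aligned sequences.

    A column reports its most common residue only if that residue occurs in at
    least `threshold` of all rows (gapped rows count against it); otherwise the
    gap symbol is reported.
    """
    rows = list(alignment)
    if not rows:
        raise ValueError("alignment is empty")
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all aligned rows must have the same length")
    consensus = []
    for column in zip(*rows):
        counts = Counter(residue for residue in column if residue != gap)
        if counts:
            residue, n = counts.most_common(1)[0]
            if n / len(rows) >= threshold:
                consensus.append(residue)
                continue
        consensus.append(gap)
    return "".join(consensus)


# 15 columns around the GD(G) turn, transcribed from the Figure 2.8B farCTD alignment
window = {
    "VHML": "-GD--MWHVWTEVNG",
    "KO2": "GDG--TYMVEFEFGG",
    "N15": "GDG--TYKIEFEYDG",
    "PY54": "ED---SWLIKFEFAG",
    "VP882": "DDE--HWEAWALVEG",
    "Halm": "GDG--RWVGVASING",
    "VCamp": "DDG--TWLVDVTIED",
}
print(majority_consensus(window.values()))  # -DG---W----E--G
```

With seven sequences and a 0.5 threshold, a residue must appear in at least four rows to be reported. The fully conserved tryptophan column appears to fall where the figure's consensus shows the aromatic symbol (%); other differences from the published consensus are expected because its class-based rules are not reproduced here.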
Although most data suggest that dsRBDs do not impart binding specificity, a growing number of studies imply degrees of binding preference. At this juncture, it is perhaps only possible to say that while most dsRBDs bind most dsRNAs, some dsRBDs bind dsRNA more tightly, and do so with varying exclusivity. Appendix B contains a literature review of the current understanding of the ability of dsRBDs to impart binding specificity to the proteins in which they are located. Multiple structures of dsRBDs in complex with dsRNA have revealed common recognition and binding patterns. As reviewed by Tian et al., in all cases, including the second dsRBD of X. laevis XlrbpA with a non-physiological dsRNA (pdb accession code 1DI2) (79), D. melanogaster Staufen dsRBD3 in complex with an RNA hairpin (pdb accession code 1EKZ) (80), A. aeolicus RNase III dsRBD in complex with dsRNA (pdb accession codes 1RC7 and 1YYW) (81), and S. cerevisiae RNase III (Rnt1p) in complex with the 5′ terminal RNA hairpin of the snR47 precursor (pdb accession code 1T41), the domain binds along a single face of the dsRNA (94). Protein-RNA interactions are made primarily to RNA phosphate and 2′-hydroxyl groups rather than through sequence-specific contacts, and are often water-mediated. Along the face of the RNA, successive minor, major, and minor grooves are contacted by protein helix α1 (2′-hydroxyls contacted), the β3-α2 loop (backbone phosphates contacted), and the β1-β2 loop (2′-hydroxyls contacted), respectively. Figure 2.9 shows an overlay of three dsRBDs that have been solved bound to dsRNA, along with our Tel-VHML farCTD structure. Of the three important RNA interaction motifs of the known RNA-binding dsRBDs, only the third, the loop between strand β3 and helix α2, is comparatively well conserved structurally in the farCTD. Among RNA binders, however, the highly conserved sequence in this region consists of an SKK*AE motif, which is not present in the farCTD. The first interaction motif, helix α1, is not present in the solved structures of the protelomerase farCTD. It is possible that a helix forms in the unstructured region of the farCTD on binding, or that the domain interacts with a helix from a remote region of the protein, but such interactions are not described for known RNA-binding dsRBDs, in which the secondary structural elements are always contiguous. The second interaction motif, the loop between strands β1 and β2, is the site with the most variability in length among RNA binders, but it is nevertheless usually longer than the loop observed in the protelomerase farCTD (94). The RNA-binding dsRBDs contain a highly conserved GP*H sequence in this region (94), which in the farCTD is replaced by a DG-rich sequence. The β1β2 turn in DNA-binding dsRBDs, on the other hand, contains a glycine but lacks other notable conservation.
Figure 2.9 Comparison of VHML farCTD with reported structures of RNA-binding dsRBDs bound to dsRNA. Overlay of the solved dsRBDs bound to dsRNA with Tel-VHML farCTD as calculated in a DaliLite structure comparison: Xenopus laevis XlrbpA dsRBD2 with a non-physiological dsRNA (blue) (79), Drosophila Staufen dsRBD3 in complex with an RNA hairpin (yellow) (80), Aquifex aeolicus RNase III dsRBD in complex with dsRNA (green), and Tel-VHML farCTD (red). The structures show the common recognition modes as indicated: minor groove 1, contacted by helix α1 (missing in farCTDs; VHML in red); the major groove, contacted by the β3-α2 loop; and minor groove 2, contacted by the β1-β2 loop (shorter in farCTDs; VHML in red).
dsRBDs as protein binders
Some dsRBDs are known to interact with protein elements, although this function is much less well characterized than RNA recognition.
The majority of protein-binding dsRBDs are grouped with RNA-binding dsRBDs, and the degree to which a domain is dedicated to RNA versus protein binding is sometimes a matter of controversy, as in the case of the two dsRBDs of rat ADAR2 (98). However, the dsRBDs in several proteins have been shown to act as dimerization domains, allowing dsRBD-containing proteins to form homodimers or to dimerize with heterologous proteins that also contain dsRBDs (99-103). dsRBDs have also been shown to interact with other, non-dsRBD protein domains (104, 105). Despite the classification of some dsRBDs as protein-binding domains, however, the principles behind protein binding have remained elusive. A comprehensive or definitive analysis of the characteristics of protein-binding dsRBDs, whether by sequence conservation or by mapping binding sites on the domain, has yet to be accomplished. St. Johnston originally identified two types of dsRBD based on level of sequence conservation (93, 106). Type A dsRBDs maintained conservation over the entire length of the domain, while type B dsRBDs maintained sequence conservation primarily in the C-terminal region and less so on the N-terminal side, with only moderate conservation in the second RNA-interacting region (the β1β2 loop). While some of the identified type B dsRBDs (such as Xenopus XlrbpA dsRBD3 and human TRBP dsRBD3) have since been shown to bind protein elements, no consistent sequence consensus has been identified for this relatively small group, other than a comparative lack of conservation in N-terminal regions. A consistent or definitive mapping of important protein-binding regions is also lacking.
An early study of the dsRBDs of PKR identified the hydrophobic residues on one side of an amphipathic helix as the dimerization site, but a later structure showed that this side of the helix interacted with the three-stranded β-sheet and was buried in the protein, so the mutations (A or L to E) were more likely to have compromised the entire structural fold than simply to have interrupted dimerization (107, 108). RNA-independent interaction of dsRBDs has been documented primarily by biochemical means, and although a few structures showing dsRBD-protein interactions have been solved, more are needed before a consensus can be described. The current structures provide examples of different binding surfaces being used for protein interaction. Although each of these structures provides only a single example of a protein-binding face, the interactions are worth noting as possibilities for the function of the protelomerase farCTD. The first is the only known example of a dsRBD/dsRBD interaction. The structure of the human DGCR8 core, containing two dsRBDs and a C-terminal region (pdb accession code 2YT4), was recently determined at a resolution of 2.6 Å (109). Together with its partner Drosha, DGCR8 processes primary microRNA (pri-miRNA) substrates into precursor miRNA, initiating microRNA maturation. As the structure shows, the two dsRBDs interact through the β2β3 loop of dsRBD1 and the C-terminal end of helix α2 of dsRBD2 (see Figure 2.10). Both domains make many more contacts to a central helix that is sandwiched between the two dsRBDs, primarily through the C-terminal regions of their second helices (α2). Both dsRBDs also interact with other protein elements through one side of their first helices, α1.
Interestingly, the regions of the dsRBDs that interact with each other and with the sandwiched helix differ from those traditionally associated with dsRNA binding, which may indicate that in this case dsRNA binding and protein binding by a dsRBD need not be mutually exclusive. A model showing dsRNA interaction with an intact DGCR8 core was proposed based on a superimposition of the DGCR8 dsRBDs with the Xlrbpa-dsRNA complex structure (79). Although the dsRNA modeling is compelling, DGCR8 was not solved in the presence of its RNA substrate, and more data are needed to support the biochemical evidence suggesting that the protein does not undergo extensive remodeling in order to function with its catalytic partner. Nevertheless, the structure provides an example of protein-interacting surfaces of a dsRBD.
Figure 2.10 Crystal structure of the human DGCR8 core showing dsRBD/protein contacts. A) The DGCR8 core colored in spectrum, with the N-terminus beginning in blue and the C-terminus finishing in red. B) The DGCR8 core showing the two dsRBDs and the central helix: dsRBD1 is colored green, and dsRBD2 is colored yellow. The central helix is shown in red. Both dsRBDs interact with the central helix through the C-terminal ends of their second α-helices (helix α2). Loop β2β3 of dsRBD1 interacts with the C-terminal end of helix α2 of dsRBD2. The dsRBDs also make protein contacts with their first helices, α1: dsRBD1 with β-strands from elsewhere in the core, and dsRBD2 with the central α-helix. C) The model created by Sohn et al. by superimposing the Xlrbpa-dsRNA complex structure onto the dsRBDs of DGCR8. The molecule is rotated approximately 90° around a horizontal axis from the views in A) and B), and shows that the RNA-binding surfaces are different from those that bind protein. Reprinted by permission from Macmillan Publishers Ltd: [NATURE STRUCTURAL & MOLECULAR BIOLOGY] (Sohn, S. Y., Bae, W. J., Kim, J. J., Yeom, K. H., Kim, V. N., and Cho, Y. (2007) Crystal structure of human DGCR8 core, Nature structural & molecular biology 14, 847-853), copyright (2007).
Proteins with dsRBDs are often highly modular, containing flexible or extended linkers between domains, which has resulted in a less complete understanding of protein-binding elements. The family of bacterial RNase III enzymes, for example, contains an N-terminal catalytic domain and a C-terminal dsRBD attached by a flexible linker. Previous crystal structures, such as that of Mycobacterium tuberculosis RNase III, solved as a homodimer at 2.1 Å, did not contain credible electron density for the carboxy-terminal dsRBD, leading the authors to speculate that the domain was highly mobile with respect to the nuclease domain (110). A recent crystal structure of Aquifex aeolicus RNase III was solved in the presence of dsRNA and showed both the endonuclease domain and the dsRBD engaged with the dsRNA, but there was no interaction between the two domains, which are separated by a seven-residue linker (111). The Thermotoga maritima RNase III structure, however, solved at 2.0 Å resolution, provides a second example of a dsRBD/protein interaction. The T. maritima structure was solved as a homodimer without RNA and shows an apparent interaction between the carboxy-terminal dsRBD and its amino-terminal nuclease domain. The T. maritima RNase III structure was deposited by the Center for Structural Genomics (Wilson, I.A., deposition: 2002-09-13, release: 2002-11-13, publication forthcoming; DOI: 10.2210/pdb1o0w/pdb; pdb accession code 1O0W) and shows that in this case, the primary protein-interacting surface of the dsRBD is the face of the second helix, α2, which interacts with the nearly parallel helix of the endonuclease domain (see Figure 2.11).
Figure 2.11 Interactions of the dsRBD of bacterial RNase IIIs. A) and B) show the dsRBD-endonuclease interaction of the Thermotoga maritima RNase III dimer (pdb accession code 1O0W). C) and D) show the dsRBD-RNA interaction of Aquifex aeolicus RNase III (pdb accession code 2NUG). A) One monomer is shown in gray, while the other is colored in spectrum, with the N-terminus beginning in blue and the C-terminus finishing in red. In the spectrum-colored monomer, the dsRBD is on the upper left-hand side and colored red and orange. B) A close-up of one monomer of the RNase III from T. maritima. The dsRBD is colored red and orange. Extensive interactions occur between the face of the C-terminal α-helix (red) and the nearly parallel helix of the endonuclease domain (yellow). C) RNase III from A. aeolicus, crystallized with RNA. The enzyme is oriented with the catalytic domain positioned similarly to the catalytic domain of T. maritima RNase III shown in A), and is colored similarly, with one monomer in gray and the other shown in spectrum. The dsRBD of the spectrum-colored monomer (red and orange) has clearly changed orientation (as described in the manuscript presenting the structures). D) The A. aeolicus structure has been rotated to show that the dsRBD (red) is not bound to the catalytic domain (green); instead, the linker has allowed it to perform its expected function (RNA binding).
This series of structures, like the model for the DGCR8 core, not only provides an example of a protein-interacting face of a dsRBD but also implies that a single dsRBD can bind more than one type of macromolecule. Despite the growing understanding of the role of dsRBDs in protein interactions, too few examples are known. Further biochemical and structural analysis will be required before principles such as sequence conservation and binding patterns can be identified.
Thus, while the possibility of farCTD/protein interaction cannot be ruled out, such a functional role is also not conclusively supported by comparison to known structures.
DNA-binding dsRBDs: Nonclassical dsRBDs
A small number of proteins containing a nonclassical dsRBD have recently been reported to bind dsDNA in a novel way, and they comprise a new DNA-binding fold. The proteins include the DNA-binding domain of Arabidopsis thaliana ethylene responsive factor 1 (AtERF1-DBD) (78), the amino-terminal DNA-binding domain of the conjugative transposon Tn916 integrase (Tn916-Int-DBD) (76), and the arm-binding domain of Escherichia coli bacteriophage λ integrase, sometimes called the N-domain (60). All structures were reported as protein/DNA complexes. The domains are a variation on the classical dsRBD in which the first helix, α1, is missing, resulting in a contiguous, antiparallel, three-stranded β-sheet that packs against a carboxy-proximal α-helix (βββα). A fourth protein, a homing endonuclease from the slime mold Physarum polycephalum (I-PpoI) (77), has been found to bind DNA in a similar way, but differs from the other structures in the arrangement of its secondary structural elements: the helix against which the three-stranded β-sheet packs is not directly carboxy-proximal to the β-strands, but originates instead from a noncontiguous location on the amino-terminal side of the protein. For all four proteins, the DNA-binding surface is the face of a β-sheet composed of three contiguous, antiparallel strands whose lengths are similar but not identical among the proteins. The β-sheet inserts into the major groove of its cognate DNA, where alternating residues project either toward the interior of the protein or directly out from the face of the β-sheet, where they interact site-specifically with DNA bases or with the phosphodiester backbone.
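The alternating inward/outward pattern described above underlies a common sequence heuristic: in an amphipathic β-strand, the positions whose side chains face the hydrophobic core tend to be enriched in hydrophobic residues. The Python sketch below, under that assumption only, picks whichever parity (even or odd positions) carries more hydrophobic residues and labels those positions interior (i) and the rest exterior (e), loosely in the spirit of the i/e annotation used in Figure 2.13. The hydrophobic residue set, the function name `label_strand_faces`, and the example strand are hypothetical; residue orientations in the figures come from the solved structures, not from this rule.

```python
# Assumed hydrophobic set; a different hydropathy scale would change the labels.
HYDROPHOBIC = set("AVLIMFWYC")


def label_strand_faces(strand: str) -> str:
    """Label each residue of a beta-strand 'i' (interior) or 'e' (exterior).

    Side chains alternate between the two faces of a beta-strand, so the parity
    (even or odd index) carrying more hydrophobic residues is taken to face the
    core. Ties default to even positions facing inward.
    """
    even = sum(1 for k, aa in enumerate(strand) if k % 2 == 0 and aa in HYDROPHOBIC)
    odd = sum(1 for k, aa in enumerate(strand) if k % 2 == 1 and aa in HYDROPHOBIC)
    inward = 0 if even >= odd else 1
    return "".join("i" if k % 2 == inward else "e" for k in range(len(strand)))


print(label_strand_faces("TVKVEIR"))  # eieieie (hypothetical strand)
```

The rule ignores β-bulges and strong twists, which, as noted for AtERF1 and the λ-Int-N-domain, break strict alternation in real DNA-binding strands.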
The dsRBD is oriented within the major groove such that strand β1 (N to C) is parallel to the proximal DNA backbone, which is oriented in a 5′ to 3′ direction. Strand β3 (N to C) is proximal to the complementary strand's backbone, which runs 3′ to 5′ in a parallel manner (see Figure 2.12). The β-sheet adopts a conserved curvature such that the DNA-facing side of strands β1-β2 forms a concave surface, while the β2-β3 DNA-facing surface is convex. This hybrid curvature maximizes the DNA-contacting area by molding the protein to the shape of the major groove. The location of the residues within the β-sheet that are important for base recognition varies among the four proteins. A typical three-stranded β-sheet is slightly wider than can be easily accommodated within the DNA major groove, and the four proteins have different strategies to overcome that problem (112). The Tn916-Int-DBD is rotated slightly within the major groove so that strands β2-β3 insert more deeply and contain all but one of the residues responsible for base recognition, while strand β1 abuts the proximal DNA backbone and contains small externally facing residues that accommodate backbone proximity. The AtERF1-DBD is also rotated slightly within the major groove, but in the opposite direction from the Tn916-Int-DBD, inserting strands β1-β2 more deeply, with small residues on the C-terminal end of strand β3 to accommodate the proximal DNA backbone. The I-PpoI structure shows that the central strand of the β-sheet plays the main role in base-specific recognition, with three of the four externally facing residues making base-specific contacts and the fourth contacting the phosphate backbone. One residue in each of the first and third strands of the sheet (β3 and β5 in the context of the full protein) also makes a base-specific contact, and portions of the first and third strands are highly twisted, allowing residues from both to insert into the major groove, which would otherwise accommodate recognition residues from only two of the three strands. A recent solution structure of the λ-Int-N-domain, however, shows the opposite: none of the residues projecting outward from the central strand make base-specific contacts with its DNA-binding partner. Instead, residues from the first and third strands, together with a residue originating from the β1β2 loop, are primarily responsible for base recognition. Strand β1 is highly twisted and also contains a bulge, which appears to allow residues from both strands β1 and β3 to insert into the major groove (113).
Figure 2.12 Comparison of VHML farCTD with reported structures of DNA-binding dsRBDs bound to DNA. Overlay of the solved DNA-binding dsRBDs bound to dsDNA with Tel-VHML farCTD as calculated in a DaliLite structure comparison: 1z1g, λ-integrase arm-binding domain bound to accessory (arm) DNA (slate) (60); 1tn9, amino-terminal DNA-binding domain of the Tn916 integrase protein with its dsDNA binding partner (magenta) (76, 112); 1gcc, Arabidopsis thaliana ethylene responsive factor domain 1 and its DNA binding partner (lime green) (78); Tel-VHML farCTD (red). The structures show the common recognition modes as indicated. A) A view showing the insertion of the β-sheets into the major groove. B) A view showing the similarity of the alignment of the β-strands and of the lengths of the intervening loops. C) A view showing the orientation of the helices relative to the β-sheets. Much more variation is present in helix length and orientation than in RNA-binding dsRBDs.
Unlike comparisons of RNA-binding dsRBDs, the significance of sequence alignments of DNA-binding dsRBDs lies in the lack of conservation in key binding regions (see Figure 2.13). Sequence conservation among DNA-binding dsRBDs is extremely limited (the AtERF1 and Tn916-Int-DBD domains, for example, retain only 12% sequence homology when their secondary structural elements are aligned), especially among the externally facing residues of the β-sheet. This lack of conservation reflects the ability of DNA-binding dsRBDs to recognize and bind different DNA sequences through base-specific major groove contacts. Only one study has compared the sequences of DNA-binding dsRBDs. In 2000, Connolly et al. reported a comparative analysis of the Tn916-Int-DBD/DNA complex with the previously published AtERF1 and I-PpoI structures (112). After noting the minimal sequence conservation, Connolly et al. instead identified four related, positionally conserved DNA-protein contacts. These included two interfacial contacts to the phosphodiester backbone (a glycine prior to strand β1 and a positively charged residue at the beginning of strand β2) and two base-specific hydrogen bonds (originating from the second external residue on strand β1 and from a charged residue in strand β2). The λ-Int-N-domain structure published after Connolly's study reduces conservation even further.
Figure 2.13 Comparison of β-strand residues of DNA-binding dsRBDs and protelomerase farCTDs. A) Sequences of DNA-binding dsRBDs are aligned according to secondary structure, with the orientation of residues (internally or externally facing) indicated beneath each sequence. Within each sequence, externally facing residues are indicated with colored text, and residues important for base-specific interaction (as described in the original manuscripts) are shown in bold text and are underlined. Symbols representing orientation of residues are as follows: interior-facing residue, (i); backbone-interacting residue, (b); other externally facing residue, (e); conserved, interior-facing aromatic residue in strand β2, (%). A consensus sequence is located below the alignment. B) Sequences of Tel-VHML and Tel-KO2 are shown, with the orientation of residues in β-strands indicated beneath each sequence as in A. The putative protelomerase farCTDs are also shown, aligned according to the relatively high conservation of hydrophobic residues found at roughly alternating positions within the strands. Residues with relatively high conservation are indicated with red text and are underlined. A consensus sequence is located above the alignment.
A. Strand positions: ---------------->>>>>>---->>>>>>>------>>>>>>-----
Ppo1 ...QGTLHCYEIPLAAPYGVGFAKN-GPTRWQYKRTINQV---VHRWGSHTV... (orientation: eiebe e%eiebb eiei)
Aterf ...AVTAAKGKHYR---G-VRQRPW-G-KFAAEIR-DPAKNGARVWLGTFET... (orientation: beieie*eb b%eieiebi eiei bi)
Tn916 MSEKRRDNRGRILK-T-G-ESQRK-DG-RYLYKYIDSF--GEPQFV-YSWKL... (orientation: beieieb b%eieiei eiei-e b)
λINT MGRRRSHERRDLPPNLYIRNN-G-YY--CYRDPRT-G-K-EF-GLGRD... (orientation: b iibiebe b%--eibii b-ei-ei)
Consensus: + G/P + G % h h G h
B. Consensus: -----------K-P^h^u--+--GDG-%-h-h-h-G--%-------u---
Strand positions: -------------->>>>>>>>----->>>>>>----->>>>>>-----
VHML QEEDQKVSWPKAKDIKVQ--SKKEGD-MWHVWTEVNGMRWEN-WS-KGR-K... (orientation: eie ie e e%eieie ieiei ei)
KO2 AEVAEQEEKHPGK-PNFKA-PRDNGDGTYMVEFEFGGRHYAW-SGAAGN-R... (orientation: eie ie e e%eieie ?e i ei)
N15 EEGPEEHQPTALK-PVFKP-AKNNGDGTYKIEFEYDGKHYAW-SGP-ADSP...
PY54YC SDNASDEDKPEDK-PRFAAPIRRSED-SWLIKFEFAGKQYSW-EG-NAESV...
VP882 VPAAEKQPKKAQK-PRLVAH-QV-DDEHWEAWALVEGEEVAR-VKIKG-TR...
Halm VAAAVPKEVAEAK-PRLNAHPQ--GDGRWVGVASINGVEVAR-VGNQAG-R...
VCamp VVETKPKDETVIK-PKMKGH-K-EDDGTWLVDVTIEDKSWQISVGKEPKNV...
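Pairwise figures such as the 12% value quoted for AtERF1 and Tn916-Int-DBD depend on how aligned positions are counted. One common convention, assumed here and not necessarily the one used by Connolly et al., is percent identity over the columns in which both sequences carry a residue, with gapped columns excluded. A minimal Python sketch follows; `percent_identity` is a hypothetical helper, and because the AtERF1 and Tn916 rows in the figures are truncated, the inputs reuse 15-column farCTD windows transcribed from the Figure 2.8B alignment purely for illustration.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences over columns where neither has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only columns in which both sequences carry a residue.
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


# Windows around the GD(G) turn, transcribed from the Figure 2.8B farCTD alignment
ko2 = "GDG--TYMVEFEFGG"
vhml = "-GD--MWHVWTEVNG"
n15 = "GDG--TYKIEFEYDG"
print(percent_identity(ko2, vhml))  # 25.0
print(round(percent_identity(ko2, n15), 1))  # 69.2
```

Dividing instead by the full alignment length, or by the length of the shorter ungapped sequence, would give different values for the same pair, which is one reason published identity figures are only loosely comparable across studies.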
The conserved anchoring residues noted in Connolly's study (the glycine located just proximal to strand β1 and the basic residue that is the first externally facing residue of strand β2) are replaced in the λ-Int-N-domain structure by Pro14 and Tyr23, respectively, and there are no specific base contacts originating from strand β2. However, the addition of the λ-Int-N-domain to the comparison highlighted conservation that was not as obvious among the three prior structures. Absent in the I-PpoI structure, but conserved among the three proteins that maintain the dsRBD fold, is a basic residue located before strand β1 that appears to interact with the phosphodiester backbone in a manner that further anchors the domain within the major groove. The structures also all contain a conserved basic or polar residue at the end of strand β1 that is poised to interact with the phosphodiester backbone. In terms of overall architecture, there is much more variability in the DNA-binding structures than in RNA-binding dsRBDs. The length of the individual strands that make up the β-sheet varies from protein to protein, and the presence and degree of twist among strands vary significantly as well. In AtERF1, a β-bulge in strand β3 provides a proper fit against the proximal DNA strand, while the λ-Int-N-domain has a β-bulge in strand β1. In Tn916-Int-DBD, an extra insertion between strand β3 and the α-helix creates a loop that specifically interacts with bases just 3′ of the dsRBD binding site, while the λ-Int-N-domain has recently been shown to have an extended region N-terminal to the dsRBD that becomes structured on binding, making two additional contacts in each of the major and minor grooves. The extended segment forms a short 3₁₀ helix and then wraps around the DNA strand, inserting into the minor groove (113).
Among the four DNA-binding dsRBDs, the length and orientation of the α-helix relative to the β-sheet also vary significantly (see Figure 2.12). Despite these differences, the general size, conformation, and fit of all structures within the major groove are analogous, and clearly define a structural fold and binding motif.
Protelomerase farCTD/dsRBD class comparison
Both the observed similarities and the variabilities of different aspects of the Tel-KO2 and Tel-VHML structures are more consistent with DNA-binding dsRBDs than with RNA-binding dsRBDs. The protelomerase farCTDs share the overall architecture of DNA binders (βββα), with comparable lengths for individual β-strands and intervening loops (see Figure 2.12). The β1β2 loop in particular (important in RNA recognition for RNA binders) is shorter in both DNA-binding dsRBDs and the protelomerase farCTDs than in RNA-binding dsRBDs (Figure 2.8). The length and orientation of the α-helix relative to the β-sheet show much more variability in the βββα structures than in RNA binders. The paradigm of two of the three β-strands of DNA-binding dsRBDs being more significant in DNA recognition is also reflected in the protelomerase farCTDs: in Tel-VHML, strands β2 and β3 are longer and better defined, while in the farCTD of Tel-KO2, strands β1 and β2 are better defined. The sequence alignment and secondary structure predictions of the putative protelomerase farCTDs against those of the solved protelomerase farCTDs indicate that they will also reflect DNA-binding dsRBD characteristics, including identical architecture, similar strand length, and variability in helix length. Like the DNA-binding dsRBDs, all solved and putative protelomerase farCTDs have a conserved basic residue prior to the start of strand β1, and an additional basic residue positioned to be homologous to the second externally facing residue of strand β1.
All known and putative protelomerase farCTDs except that of Tel-VHML have a conserved proline preceding strand β1, reminiscent of the λ-Int-N-domain (see Figure 2.13). It is especially interesting that the λ-Int-N-domain belongs to the protein with which protelomerases share a catalytic mechanism and significant core domain homology. Lambda integrase is often considered the archetypal member of the tyrosine recombinase family, and it is this family that is most closely related to protelomerases in general. In fact, based on the similar overall genome organizations and close relationships of phages N15, KO2, and PY54 with E. coli phage λ and other lambdoid phages, it has been suggested that these protelomerase-containing phages should be classified as a lambdoid phage subgroup (44). Lambda integrase has even been noted to produce covalently closed ends at att sites in in vitro experiments in which mismatches are present in the core region (114). The λ-Int-N-domain lies N-terminal to the two-lobed catalytic and core-binding domains that make up the λ-integrase central region, to which it is attached by a flexible linker. The close relationship of λ-integrase to protelomerases, and their similarity in architecture, suggest that the dsRBDs of protelomerase and λ-integrase may serve a related function. The λ-integrase N-domain recognizes and binds accessory DNA (called ARM sites) outside of the core-binding site during integration and excision (see (61) for a recent review). Lambda integrase forms a tetrameric complex, and the N-domains (also called ARM-binding domains) appear to influence the equilibrium of the recombination reaction by shaping the complex through their interactions with each other in addition to their interactions with accessory DNA (60). The function of the λ-integrase N-domain not only suggests that the protelomerase farCTD binds DNA, but also provides a precedent for a possible second, protein-based interaction in its role.
Sequence conservation between a DNA-binding dsRBD and a putative phage protelomerase farCTD
Conservation of key residues between a known DNA-binding dsRBD and a putative phage protelomerase farCTD, along with accompanying conservation of the cognate DNA sequence, provides further compelling indication that the farCTD binds DNA. The Tn916-Int-DBD is the DNA-binding dsRBD for which the most structure/function information is known. Tn916 is a well-characterized conjugative transposon encoding tetracycline resistance that excises and integrates in a manner similar to E. coli bacteriophage λ. The protein that accomplishes both the strand cleavage and rejoining steps of transposition is the transposon-encoded integrase (Int), which is composed of two domains (115). The C-terminal catalytic domain is, like the minimally enzymatically active N-terminal region of Tel-KO2, a member of the tyrosine family of site-specific integrases. It recognizes and cleaves inverted repeats at the transposon/chromosome junction. The catalytic domain is delivered to the site of recombination by the site-specific in |
| Reference URL | https://collections.lib.utah.edu/ark:/87278/s60k5hss |



