Nuclear magnetic resonance studies of an intrinsically disordered protein: the N protein of bacteriophage lambda

Title	Nuclear magnetic resonance studies of an intrinsically disordered protein: the N protein of bacteriophage lambda
Publication Type	thesis
School or College	College of Science
Department	Biological Sciences
Author	Bhattacharje, Gourab
Date	2014-12
Description	Nuclear magnetic resonance (NMR) spectroscopy was employed to characterize structural and dynamic properties of bacteriophage λ N protein (λN). λN is an intrinsically disordered protein (IDP) that interacts with multiple partners to prevent termination in the phage λ-Escherichia coli transcription apparatus. Limited dispersion in the 1H dimension of the 1H-15N heteronuclear correlation spectra confirmed the extensively disordered nature of λN. Resonance assignments were made for the amide-15N, amide-1H, 13Cα and 13Cβ nuclei of more than 90% of the nonproline residues at pH 7 and 5.5, which were subsequently used to calculate secondary structure propensities. Residues 2-7 and 55-75 showed propensities to form α-helical structures, whereas the residues 34-47 and 95-107 showed propensities to form extended structures. Previous studies have shown that residues 1-22 of λN adopt a helical structure when bound to a site in the RNA transcript (boxB) and residues 34-47 form an extended structure to interact with E. coli host transcription factor NusA protein. We have discovered that the residues 55-75 of λN protein, hitherto uncharacterized, have propensities to form transient helical secondary structures. This putative transient helical region spanning residues 55-75 is amphipathic and may form coiled-coil structures, which further suggests a possible structural or functional role of this segment in the antitermination apparatus. To characterize the backbone dynamics of λN, 15N longitudinal relaxation rates (R1), transverse relaxation rates (R2), and steady-state 15N-1H nuclear Overhauser effects were measured. Significantly elevated transverse relaxation rates (R2) for the amide groups of residues 55-75 indicated slow conformational exchange on the µs-ms timescale, consistent with a transient secondary structure in this segment of λN.
Faster amide-bond motions were analyzed by mapping reduced spectral density functions, derived from the 15N relaxation parameters, which further revealed backbone motions on two or more timescales, as expected for a nonglobular disordered protein. The results of this NMR study suggest the presence of previously unknown functional domains of λN protein, which may enhance our understanding of the phage λ-Escherichia coli antitermination apparatus and allow further investigations of the binding mechanisms of IDPs with their interacting partners.
Type	Text
Publisher	University of Utah
Subject	Amphipathic alpha-helix; Antitermination apparatus; Bacteriophage Lambda N; Conformational selection; Intrinsically disordered protein (IDP); Transient secondary structure
Dissertation Institution	University of Utah
Dissertation Name	Master of Science
Language	eng
Rights Management	Copyright © Gourab Bhattacharje 2014
Format	application/pdf
Format Medium	application/pdf
Format Extent	1,680,604 bytes
Identifier	etd3/id/3353
ARK	ark:/87278/s6w69v0d
DOI	https://doi.org/doi:10.26053/0H-FDAX-0S00
Setname	ir_etd
ID	196917
OCR Text	NUCLEAR MAGNETIC RESONANCE STUDIES OF AN INTRINSICALLY DISORDERED PROTEIN: THE N PROTEIN OF BACTERIOPHAGE LAMBDA
by Gourab Bhattacharje
A thesis submitted to the faculty of The University of Utah in partial fulfillment of the requirements for the degree of Master of Science
Department of Biology, The University of Utah, December 2014
Copyright © Gourab Bhattacharje 2014. All Rights Reserved.
The University of Utah Graduate School
STATEMENT OF THESIS APPROVAL
The thesis of Gourab Bhattacharje has been approved by the following supervisory committee members: David P. Goldenberg, Chair (date approved 07.08.2014); Martin P Horvath, Member (date approved 09.21.2014); David F Blair, Member (date approved 07.09.2014); and by M Denise Dearing, Chair/Dean of the Department/College/School of Biology, and by David B. Kieda, Dean of The Graduate School.
ABSTRACT
Nuclear magnetic resonance (NMR) spectroscopy was employed to characterize structural and dynamic properties of bacteriophage λ N protein (λN). λN is an intrinsically disordered protein (IDP) that interacts with multiple partners to prevent termination in the phage λ-Escherichia coli transcription apparatus. Limited dispersion in the 1H dimension of the 1H-15N heteronuclear correlation spectra confirmed the extensively disordered nature of λN. Resonance assignments were made for the amide-15N, amide-1H, 13Cα and 13Cβ nuclei of more than 90% of the nonproline residues at pH 7 and 5.5, which were subsequently used to calculate secondary structure propensities. Residues 2-7 and 55-75 showed propensities to form α-helical structures, whereas the residues 34-47 and 95-107 showed propensities to form extended structures. Previous studies have shown that residues 1-22 of λN adopt a helical structure when bound to a site in the RNA transcript (boxB) and residues 34-47 form an extended structure to interact with E. coli host transcription factor NusA protein.
We have discovered that the residues 55-75 of λN protein, hitherto uncharacterized, have propensities to form transient helical secondary structures. This putative transient helical region spanning residues 55-75 is amphipathic and may form coiled-coil structures, which further suggests a possible structural or functional role of this segment in the antitermination apparatus. To characterize the backbone dynamics of λN, 15N longitudinal relaxation rates (R1), transverse relaxation rates (R2), and steady-state 15N-1H nuclear Overhauser effects were measured. Significantly elevated transverse relaxation rates (R2) for the amide groups of residues 55-75 indicated slow conformational exchange on the µs-ms timescale, consistent with a transient secondary structure in this segment of λN. Faster amide-bond motions were analyzed by mapping reduced spectral density functions, derived from the 15N relaxation parameters, which further revealed backbone motions on two or more timescales, as expected for a nonglobular disordered protein. The results of this NMR study suggest the presence of previously unknown functional domains of λN protein, which may enhance our understanding of the phage λ-Escherichia coli antitermination apparatus and allow further investigations of the binding mechanisms of IDPs with their interacting partners.
Thesis dedicated to my beloved friends: Pigaret-sin, Sanja-gan and Syad-lan!
TABLE OF CONTENTS
ABSTRACT
LIST OF FIGURES
ACKNOWLEDGEMENTS
CHAPTERS
1. INTRODUCTION
1.1 Intrinsically disordered proteins
1.1.1 Sequence-structure-function paradigm and its contradiction by IDPs
1.1.2 Comparisons of the physical properties of IDPs and folded proteins
1.1.3 Significance of studying IDPs
1.2 Bacteriophage λ N protein
1.2.1 Bacteriophage λ-E. coli transcription antitermination apparatus
1.2.2 Events before transcription antitermination
1.2.3 Specific interactions of λN in the antitermination apparatus
1.2.3.1 λN protein-boxB mRNA
1.2.3.2 λN protein-NusA protein
1.2.3.3 λN protein-RNAP
1.3 Aim of this study
2. MATERIALS AND METHODS
2.1 NMR sample preparation
2.2 NMR spectroscopy
2.3 Helical wheel plot and coiled-coil predictions
3. RESULTS
3.1 Resonance assignment
3.2 Proline isomerization
3.3 Secondary chemical shifts
3.4 The helix formed by the residues 55-75 is amphipathic
3.5 15N relaxation measurements
3.6 Mapping of spectral density functions
4. DISCUSSION
4.1 Evidence of disorder
4.2 Secondary structures
4.3 Dynamic properties
4.4 Conclusion
APPENDICES
A. RESONANCE ASSIGNMENTS OF THE 13Cα, 13Cβ, 1HN and 15N NUCLEI OF λN AT pH 7
B. RESONANCE ASSIGNMENTS OF THE 13Cα, 13Cβ, 1HN and 15N NUCLEI OF λN AT pH 5.5
REFERENCES
LIST OF FIGURES
1.1: boxB mRNA(1-15)-λN(2-36) peptide structure derived from NMR studies
3.1: 1H-15N HSQC (600 MHz) spectrum of 1 mM λN in 50 mM phosphate buffer at pH 7 and 25°C showing 12 resonance assignments
3.2: 13C secondary chemical shifts of λN at pH 5.5 and 7
3.3: Secondary structure propensity (SSP) scores of λN at pH 5.5 and 7
3.4: Model representation of the SSP scores at the N-terminal region (pH 7) mapped onto the NMR structure of λN(2-36) peptide when bound to boxB mRNA
3.5: Amphipathicity of the transient α-helix formed on residues 55-78 of λN
3.6: Relaxation parameters of 1 mM λN (600 MHz) in 50 mM phosphate buffer at pH 7 and 25°C
3.7: Transverse relaxation rates (R2) of 1 mM λN at 500 and 600 MHz (50 mM phosphate buffer, pH 7 and 25°C)
3.8: Spectral density mapping (J(0.87ωH), J(ωN), and J(0)) of λN (relaxation parameters measured at 600 MHz, in 50 mM phosphate buffer, at pH 7 and 25°C)
3.9: Illustration of Lipari-Szabo (LS) mapping and backbone dynamics analysis of a folded protein by LS mapping, taken from Hanson et al. [113]
3.10: Lipari-Szabo (LS) mapping analysis of λN at pH 7 (relaxation parameters measured at 600 MHz, in 50 mM phosphate buffer and at 25°C)
3.11: Lipari-Szabo (LS) mapping analysis of λN at pH 5.5 (relaxation parameters measured at 600 MHz, in 50 mM succinate buffer and at 25°C)
ACKNOWLEDGEMENTS
I would like to express my sincere gratitude to my thesis adviser Dr. David Goldenberg for his continuous support, encouragement and patience during the completion of my thesis. He introduced me to the exciting field of intrinsically disordered proteins (IDPs), explained the intricate concepts of protein structures and NMR, and always patiently answered my silliest questions, being enthusiastic and inspirational about any idea or suggestion. He not only honed my approach towards scientific problems, but also helped me improve my writing skills. On a personal note, he stood by me at that phase of my life when very few things went according to my plans. As a mentor and thesis advisor, he has been more than I could ever ask for. I am thankful to Dr.
Jack Skalicky of the Biochemistry department, who helped with the NMR work, from setting up the experiments and collecting the data to guiding me in solving the NMR assignments. I am also thankful to Brian Argyle for teaching me the wet-lab protocols of NMR sample preparation during my early days. I am grateful to Dr. Martin Horvath, one of my thesis committee members, for always being kind, helpful, supportive, and enthusiastic. Besides his valuable advice on the present work, I also learned several concepts in biochemistry from him when I was a teaching assistant in his biological chemistry class. I want to thank Dr. David Blair, the other thesis committee member, for being patient, caring and encouraging. I could not have reached the completion of my thesis without the unstinted support from all my thesis committee members. I want to thank Shannon Nielsen and the other staff in the Biology department for being very helpful with all the administrative procedures. I am also grateful to Dr. Neil Vickers, the chair of the department of Biology, who provided me crucial financial support to finish my thesis. I want to thank my lab mates, Sally, Sabrina, Brian, and Tim, who always maintained a friendly and creative atmosphere in the lab. I must make a special mention of Kigen, a fellow graduate student in the Biology department, for being a supportive and brotherly friend through good and bad times. I am especially thankful to my friends Parth, Aayush, Debosmita, Tosifa, Cheenu, Rahul, Javed, Tiwary-G, Vanshaj, James, Anusha, Sundar-G, Tridib-da, Balu, Riddhita-di, Anamika-Di, Ankit, and Manas for making my stay at Salt Lake City interesting and memorable. I thank my relatives, especially, mama-ra, mami-ra, masi-ra, meso-ra, gabbu-da, jintu-da, nantul, tumpa-di, and tani-di, who were always with me through thick and thin.
I am also thankful to my teachers from Saha Institute of Nuclear Physics (SINP), Indian Institute of Technology-Kharagpur (IITKGP) and Howrah Vivekananda Institution, whose teachings have become an integral part of my life in many ways. I am really thankful to my friend Lipika, who came into my life at the worst phase of my life and provided me support and inspiration at the time when it was required the most. Last but not least, I thank my parents for their genuine attempt to make a decent human being out of me and their constant encouragement and faith in me, which provides me the daily strength and motivation to pursue science.
CHAPTER 1
INTRODUCTION
1.1 Intrinsically disordered proteins
Intrinsically disordered proteins (IDPs) represent a recently recognized class of biological macromolecules [1] and have enhanced our understanding of the functional mechanisms of proteins. Since at least the 1930s, biophysical and biochemical experiments have indicated that proteins generally assume a stable folded three-dimensional structure in order to become functional [2, 3]. This folded structure is determined by the amino acid sequence [4], and this causal relationship among sequence, structure, and function has been popularly considered a paradigm of protein biochemistry [5]. However, with the discovery of IDPs, growing evidence suggests that a protein does not always require a stable folded structure in order to execute its function; instead, some proteins function while in a disordered state, interconverting among many different conformations under physiological conditions. In the following subsections, the sequence-structure-function paradigm and the apparent contradiction of this paradigm by the discovery of IDPs are briefly reviewed, the physical properties of IDPs and folded proteins are compared, and the biophysical and biomedical motivations behind studying IDPs are discussed.
1.1.1 Sequence-structure-function paradigm and its contradictions by IDPs
As has been mentioned frequently [5, 6, 7], the foundation of the sequence-structure-function paradigm was established by Emil Fischer in 1893, when he hypothesized the lock and key model of enzyme specificity [8], long before an actual three-dimensional structure of a protein was determined [9]. In this model, it was proposed that enzyme-substrate specificity is determined by accurate complementarity of the three-dimensional structures of the enzymes and the substrates. By 1925, observations showing that the loss of protein activity due to heat or altered solution conditions could be reversed supported the idea of a defined native structure determining protein activity [10]. Although the evidence of helical and coiled-coil structures was clear by 1952 [11, 12], it was the advent of protein crystallography that reinforced the classical sequence-structure-function paradigm by 1965 [13, 14, 15, 16]. Atomic-level crystal structures of myoglobin [9, 15], hemoglobin [13, 16] and hen-egg lysozyme [14] clearly showed that the secondary structures form sequentially long-range contacts to assemble globular tertiary structures, which define the biologically active native structures of proteins. However, X-ray crystallography, the very tool used to determine atomic-level structures, introduced a bias that favored the study of proteins with stable three-dimensional structures. This helps to explain why challenges to the sequence-structure-function paradigm were slow to emerge. Although X-ray crystallography initially corroborated the classical paradigm, loop regions that are important for functional activity were sometimes found to be missing in protein structures determined by this method. As mentioned by Uversky and Dunker [17], the very first case of missing residues of significant size was observed in the structure of the extracellular nuclease of Staphylococcus aureus [18].
The reason for this missing electron density appears to be variability in the positions of these residues in the different molecules making up the crystalline array [5]. Missing residues in this and other crystal structures provided the first examples of disorder or flexibility [5, 18, 19]. Later on, simultaneous studies on the same proteins by nuclear magnetic resonance (NMR) spectroscopy and X-ray crystallography confirmed the flexibility of residues that were invisible in crystal structures [20]. Early NMR spectroscopy studies on proteins, published in 1978, showed that some biologically active regions are flexible, lacking stable and defined structures in solution [21, 22]. The importance of the disordered regions was further emphasized by protein folding studies, in which partially folded intermediates, missing crucial native tertiary contacts and having increased flexibility of the loop regions and the termini, were shown to interact with membranes, ligands, and chaperones [23, 24, 25]. These observations, indicating that the disordered regions of proteins are sometimes involved in binding macromolecules [26] and multiple interacting partners [27], supported a new notion that functional proteins do not always require unique folded structures. Advances in NMR spectroscopy further revealed that some biologically important proteins, such as tau or prothymosin-α, behave like random-coil polymers and lack any of the secondary structures characteristic of globular proteins [28, 29]. In the late 1990s, various experimental groups reported that several proteins associated with transcription, translation, cell-cycle regulation, and signaling remained disordered under a wide range of solution conditions [1, 30, 31, 32]. The disordered regions identified experimentally were associated with particular trends in amino acid compositions [32]. These correlations were used to develop algorithms to predict disordered regions from amino acid sequences [33].
Advances in genome sequencing, bioinformatics and computational approaches resulted in a surge in the number of proteins predicted to be partially or entirely disordered in their active states [1, 33]. Bioinformatics predictors have estimated that more than 5% of all proteins in prokaryotes and approximately 35-50% of all eukaryotic proteins have stretches of 40 or more disordered residues [34, 35]. These proteins have variously been described as "rheomorphic, flexible, mobile, partially folded, natively denatured, natively unfolded or intrinsically unstructured" [1, 6, 7, 36, 37]; however, "intrinsically disordered proteins" was chosen at a conference in 2010 as the most explanatory and least confusing name to describe this new class of proteins [1]. In this way, the discovery of IDPs, along with the advancement of experimental and computational tools, strongly challenged the core idea of the classical sequence-structure-function paradigm.
1.1.2 Comparisons of the physical properties of IDPs and folded proteins
The newly discovered IDPs and classically known folded proteins differ in their physical properties. A significant difference between IDPs and folded globular proteins lies in their amino acid compositions. In folded globular proteins, significant numbers of nonpolar residues collapse to form the hydrophobic cores, and polar and charged amino acids generally reside at the surface and interact with water molecules. In contrast, IDPs contain smaller proportions of hydrophobic residues, resulting in the absence of a hydrophobic core, and higher proportions of polar and charged residues, often resulting in a net positive or net negative charge at physiological pH [36]. The repulsion of these charged residues and the lack of hydrophobic residues in IDPs favor more extended structures compared to folded globular proteins.
Amino acid sequence analysis further showed that IDPs can be clearly distinguished from globular proteins based on mean hydrophobicity and net normalized charge [38]. The fundamental and defining difference between an IDP and a folded globular protein is that the ensemble of an IDP contains an extremely large number of interconverting conformations, whereas the degree of conformational heterogeneity of a folded globular protein is restricted. Measurements of structural properties by standard methods such as circular dichroism (CD) or small-angle X-ray scattering (SAXS) provide ensemble-averaged information for both IDPs and folded globular proteins. In the case of a folded globular protein, however, the ensemble members have very similar structures, especially when examined at low resolution (< 3 Å); thus CD or SAXS measurements describe a principal, static structure. In the case of an IDP, by contrast, the ensemble conformations differ from each other significantly; thus the structural information obtained from CD or SAXS measurements is an ensemble average and cannot be described as a single static structure [6]. The various conformations of an IDP are dynamic, i.e., they interconvert on many different timescales [5]. The conformational ensembles of unbound IDPs have been broadly classified, based on size-exclusion chromatography and CD studies, into three groups: molten globule, premolten globule and coil-like [39]. In contrast, the structure of a folded globular protein is a compact ordered conformation in its free state and, if denatured, can assume any of the three different conformations: molten globule, premolten globule, and coil-like. The degree of compactness in an IDP has been correlated with various factors related to amino acid composition and sequence, such as net charge and fraction of proline residues, which contribute to less compact structures, and poly-histidine tags, which have been shown to favor more compact structures [40].
Molten globule-like conformations, which were initially identified as folding intermediates between globular folded proteins and coil-like unfolded proteins [5], lack rigid tertiary structures but contain regular secondary structures [41]. Premolten globules and coil-like conformations are considered to be more extended [5]. Although they are less compact than molten globule conformations, premolten globule conformations still contain some residual or partial secondary structure. By contrast, coil-like conformations possess very little secondary structure [42]. Comparisons of hydrodynamic radii among molten globule-like, premolten globule-like and coil-like conformations showed that sequence length also influences the compactness of an IDP [43]. The extended structures and conformational heterogeneity of some disordered proteins are believed to enable them to bind to multiple partners [37]. In some cases, IDPs assume interwoven complex structures in macromolecular interactions that could not be achieved if the proteins had rigid globular structures [44]. Disordered regions are predominantly found in large macromolecular complexes, such as cytoskeletal proteins, ribosomes, or histones, interacting with other proteins and nucleic acids [7]. It has been suggested that the extended conformations of IDPs allow polypeptides to expose larger surface areas and have larger capture radii than they would as folded globular proteins, enhancing their ability to form complexes and making them more susceptible to regulation by posttranslational modifications and proteolysis [1].
1.1.3 Significance of studying IDPs
Studying IDPs is of crucial biological importance [34]. Protein-protein [45], protein-DNA [46] and protein-RNA networks [47] are frequently organized around hubs [48] composed of small numbers of proteins connected with multiple partners. Several hub proteins use disordered regions to interact with their partners.
Some are mostly disordered in solution, such as α-synuclein, high mobility group protein A (HMGA), and synaptobrevin, whereas others are partially disordered, such as Mdm2, subunit A of calcineurin, and p53 [49]. Examples of processes regulated by disordered hub proteins include transcription, cell signaling, stress responses, phosphorylation, chaperone activity, and the storage of small molecules [7]. It has been suggested that the roles of disordered proteins increase as the complexity of an organism increases [36]. As IDPs are involved in the regulation of crucial biological processes, mutations or malfunctions in IDPs are associated with diseases in a large number of cases [1]. In the case of cancer, it has been hypothesized that structural differences in the disordered segments of crucial hub proteins may play significant roles. The tumor suppressor protein p53 contains disordered N-terminal and C-terminal ends and binds to multiple partners using the disordered regions [50]. Residues 367-388, at the C-terminal end of p53, can adopt an α-helical structure with S100bb, a β-sheet structure with sirtuin, or two different irregular structures while interacting with CBP and cyclin A2 [50]. Altered structures in the disordered regions may lead to differences in the resulting complexes, which may escape the regular surveillance mechanisms of the cell [51]. Improper conformations of IDPs can result in amyloid-like aggregates, which are commonly observed in neurodegenerative diseases, such as prion disease, Alzheimer's disease, Huntington's disease and Parkinson's disease [1, 52]. In Alzheimer's disease, intrinsically disordered tau proteins are hyperphosphorylated and subsequently dissociate from microtubules to form neurofibrillary tangles, which aggregate and spread throughout the brain in the later stages of the disease [28, 53].
Changes in post-translational modifications and the absence of interaction with regular partners are possible explanations for these conformational diseases [1]. Moreover, mutations in the disordered regions can lead to changes in important post-translational modification sites, which can cause diseases [54]. A few virulent strains of disease-causing viruses have been found to be richer in intrinsic disorder [55]. IDPs have also been implicated in several other diseases, including cardiovascular diseases, diabetes, and autoimmune diseases [1, 7, 17]. Improved knowledge of the structures and functions of IDPs could help us decipher crucial physiological processes and, in turn, enhance our ability to design better therapeutic drugs. Besides being biologically and biomedically important, studying IDPs contributes to a more comprehensive understanding of protein folding. Understanding how the amino acid sequence of a protein specifies its folding into a defined tertiary structure requires a better understanding of disordered conformations, which represent the starting point of the folding process. Furthermore, as seen in several biological processes such as cell cycle control, transcription, and translation, the transition from disordered to ordered structures upon the binding of IDPs to their partners involves highly accurate control of thermodynamic energies [37]. Knowledge of the determinants of thermodynamic stability is important to understanding the stability of folded proteins in general and their transitions from folded to unfolded states.
1.2 Bacteriophage λ N protein
The N protein of bacteriophage λ (λN) is an intrinsically disordered protein that plays a critical role in phage development by suppressing normal transcription termination signals of the host, Escherichia coli [56]. λN contains 24 positively charged residues and 9 negatively charged residues of 107 total residues.
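The residue counts quoted above can be turned into the net charge and the mean net charge per residue, the quantity used in charge-hydropathy comparisons of IDPs and globular proteins. The following is a minimal sketch, not the analysis performed in the thesis: the function names are hypothetical, and the boundary line ⟨R⟩ = 2.785⟨H⟩ − 1.151 is the commonly quoted charge-hydropathy relation, assumed here rather than taken from this text.

```python
# Residue counts for lambda N as quoted in the text above.
N_POSITIVE = 24
N_NEGATIVE = 9
N_TOTAL = 107


def net_charge(n_pos: int, n_neg: int) -> int:
    """Net charge, counting +1 per positive and -1 per negative residue."""
    return n_pos - n_neg


def mean_net_charge(n_pos: int, n_neg: int, n_total: int) -> float:
    """Absolute net charge divided by sequence length."""
    return abs(n_pos - n_neg) / n_total


def boundary_hydropathy(mean_charge: float,
                        slope: float = 2.785,
                        intercept: float = 1.151) -> float:
    """Solve the ASSUMED boundary <R> = slope * <H> - intercept for <H>.

    On a charge-hydropathy plot, sequences whose mean normalized
    hydropathy lies below this value (for their mean net charge) fall
    on the disordered side of the assumed boundary line.
    """
    return (mean_charge + intercept) / slope


if __name__ == "__main__":
    q = net_charge(N_POSITIVE, N_NEGATIVE)
    r = mean_net_charge(N_POSITIVE, N_NEGATIVE, N_TOTAL)
    print(f"net charge: {q:+d}")
    print(f"mean net charge per residue: {r:.3f}")
    print(f"boundary mean hydropathy (assumed line): {boundary_hydropathy(r):.3f}")
```

With the counts given in the text, the net charge is +15 and the mean net charge is 15/107, roughly 0.14 per residue. Placing λN on such a plot would also require its mean normalized hydropathy, which needs the full sequence and is not computed here.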
Previous CD studies of λN demonstrated a deep trough at ~200 nm, indicating random coil-like disordered structures with no hint of ordered secondary structures [57]. Small-angle X-ray scattering (SAXS) studies indicated that the radius of gyration of λN falls into the regime of extended, disordered, coil-like proteins and is insensitive to the denaturant urea [58]. Previously published two-dimensional 1H-15N heteronuclear single quantum correlation (HSQC) spectra of λN showed poor dispersion in the 1H dimension, further indicating a disordered structure [59]. In the antitermination apparatus, λN interacts with a nascent phage transcript, a host transcription factor and the host RNA polymerase (RNAP), and subsequently suppresses transcription termination signals. In this work we have studied λN as a model IDP that interacts with multiple partners. The following section provides a brief background of the previous structural and functional studies on λN and is divided into three parts. The first part deals with the different components of the phage λ-E. coli transcription antitermination apparatus, the second part describes the known sequence of events that occur before antitermination, and the third part briefly addresses the interactions of λN with three of its partners in the antitermination apparatus: boxB mRNA, NusA protein and RNA polymerase (RNAP).
1.2.1 Bacteriophage λ-E. coli transcription antitermination apparatus
Transcriptional termination is a crucial mechanism that prokaryotic cells employ to control normal gene expression [60], and it also helps bacteria to defend themselves against infecting bacteriophages. To counter this transcriptional termination system in the host E. coli, bacteriophage λ engages its N protein, which positively regulates genes expressed from the phage promoters pL and pR and maintains the balance between the lytic and lysogenic cycles [61]. λN forms a ribonucleoprotein complex that alters the normal role of the E.
coli transcription complex such that it skips through the terminator sequences of the crucial genes of phage λ [56]. The antitermination complex contains RNAP; the transcription factors NusA, NusB, NusE and NusG; and the N-utilization (nut) segments of the nascent phage transcript, composed of two parts, boxA and boxB [56]. Interestingly, in vitro transcription assays showed that λN, boxB mRNA and NusA protein are sufficient to effectively mute terminators, which can be located hundreds of base pairs away from the boxB site [62]. However, the fully effective antitermination apparatus further contains the E. coli transcription elongation factors NusB and NusG and the ribosomal protein NusE (S10), and can suppress terminators, in vitro and in vivo, placed thousands of base pairs downstream of the boxA and boxB sequences [62].
1.2.2 Events before transcription antitermination
A basic understanding of the events before λN-mediated transcription antitermination requires some knowledge of RNAP, the key enzyme involved in transcription, and the early events of transcription, i.e., initiation and elongation. The core of RNAP is conserved among eukaryotes and prokaryotes [63]. Much knowledge about RNAP and E. coli transcription has been derived from crystal structures of E. coli RNAP subunits [64], sequence homology and available crystal structures of homologs (such as Thermus aquaticus RNAP) [63, 65], electron microscopy, ab initio structural predictions [66], proteolytic cleavage experiments [67], deletion mutations, in vitro transcription assays [68], and various other biochemical and biophysical methods (mentioned in the references of Severinov et al. [63]). RNAP of E. coli, with an approximate molecular mass of 400 kDa, consists of five subunits (α2ββ′ω). The active site residues lie in the β and β′ subunits; however, correct RNAP assembly requires the N-terminal domains of the two α subunits [63]. Although not essential for RNAP functionality, the ω subunit supports the whole assembly.
In the early phase of transcription, σ factors associate with the RNAP core to form the holoenzyme. The holoenzyme then binds to the promoter DNA sequence to form an open transcription initiation complex [63]. Several transcription factors bind to the open initiation complexes to form early and late elongation complexes that assist in RNA synthesis [63, 69]. Chromatin immunoprecipitation with microarray (ChIP-on-chip) experiments revealed that the first of these transcription factors to interact with RNAP is the NusA protein [70]. The N-terminal domain of NusA interacts near the RNA exit channel of RNAP, and the AR2 domain binds to the C-terminal domain of one of the α subunits [71]. Once an early elongation complex is formed with NusA and RNAP, the other transcription factors, NusG, NusB and NusE, join the assembly while the σ factor dissociates from RNAP [72]. In vitro transcription assays suggest that the formation of the λ antitermination complex starts as soon as the boxA mRNA is transcribed and proceeds with the simultaneous association of NusB protein with boxA mRNA [73, 74] and dimerization of the NusB and NusE proteins [73]. Immediately after boxB mRNA is transcribed, λN interacts with boxB mRNA, NusA and RNAP to form the active antitermination complex, which allows transcription through the normal terminator sequences [73, 75].

1.2.3 Specific interactions of λN in the antitermination apparatus

Interactions of λN protein with boxB mRNA, NusA, and RNAP are briefly discussed below.

1.2.3.1 λN protein-boxB mRNA

The association of λN and boxB mRNA is essential for efficient transcription antitermination [76, 77], and this interaction has been studied extensively [57, 59, 76, 78, 79, 80, 81, 82, 83]. The N-terminal segment of λN contains a positively charged arginine-rich motif (ARM), which binds negatively charged boxB mRNA.
Circular dichroism (CD) spectroscopy studies indicate that any structural changes in λN protein due to the boxB mRNA interaction are limited to the N-terminal 36 residues [59]. NMR studies on λN(1-22) and λN(1-36) peptides have shown that residues 1-22 form a bent α-helix upon binding to the boxB mRNA (Fig. 1.1) [57, 80]. NMR analyses of chemical shifts, backbone coupling constants and nuclear Overhauser effects (NOE) indicate that the residues E9RRA12 form a bent structure [81]. Structures derived from NOE constraints indicate that the α-helical region of the λN(1-22) or λN(1-36) peptide is specifically bent at residue R11 [57, 80]. Three out of the five arginine residues (T5RRRERR11) in the ARM of λN protein have been shown to play major roles in recognition and in the subsequent formation of intermolecular H-bonds (R7 and R8) and an electrostatic interaction (R11) with the negatively charged RNA [57, 80, 81]. Upon binding to the N-terminal region of the λN protein, the boxB mRNA pentaloop (G6AAAA10) folds into a stable GNRA tetraloop, with the base A9 extruded [80]. Sedimentation equilibrium studies indicate a 1:1 molecular stoichiometry for the λN(1-22) peptide or the full-length λN protein bound to boxB mRNA [59, 83]. In addition to the arginine residues, six other residues of λN, A3, Q4, K14, Q15, W18 and K19, also form contacts with boxB mRNA [57, 80]. Genetic analyses [84] and mutagenesis studies [57] revealed that residues A3 and W18 interact strongly with boxB mRNA via their nonpolar side chains.

Figure 1.1: Structure of the boxB mRNA(1-15)-λN(2-36) peptide complex derived from NMR studies: N-terminal residues 3-22 of λN protein (shown in blue cartoon) are helical when bound to boxB mRNA (shown in purple ribbon). The nucleotides G1, A7 and A9 of boxB mRNA are shown in cyan, and the residues A3, W18 and N22 of λN protein are shown in green (image constructed in PyMOL using coordinates from 1QFQ.pdb [57]).
The stacking interaction between the W18 residue of λN protein and the A7 nucleotide of boxB mRNA is a critical contact for this mRNA-protein interaction and for the activity of the antitermination apparatus [57, 80, 81]. Changes of residues K14 and Q15 do not affect binding of boxB mRNA [85]; however, K14-Q15 mutants of λN protein show a significant decrease in antitermination [78]. It should be noted that the most detailed structural knowledge of the N-terminal part is derived from studies of λN bound to boxB mRNA; the structural properties of this N-terminal segment in the absence of boxB mRNA have not been studied in detail.

1.2.3.2 λN protein-NusA protein

Residues 1-47 of the λN protein are sufficient to reduce the termination-enhancing effect of the E. coli transcription factor NusA [83]. Affinity chromatography studies using various deletion mutants of λN fused to glutathione S-transferase (GST), together with in vitro transcription assays, showed that this particular region binds NusA protein in order to support antitermination [83]. Further studies using amino-terminal and carboxy-terminal deletion mutants revealed that residues 34-47 of λN are involved in binding NusA [83]. The carboxy-terminal domain of NusA contains two acidic repeats, AR1, spanning residues 345-426, and AR2, spanning residues 427-495 [86]. A crystallographic study using this carboxy-terminal domain of NusA, i.e., residues 350-495, and residues 34-47 of λN showed that residues N34-L40 of λN adopt an extended conformation when bound to two copies of the AR1 domain of the NusA protein [87]. In this study, no electron density was observed for residues 41-47 of λN or for the AR2 domain of NusA. The amino acid sequences of the AR1 and AR2 domains are similar to one another, and residues N34-L40 and N41-R47 of λN also display internal similarity. These observations led to the suggestion that residues N41-R47 of λN protein may bind to the AR2 domain of NusA [87].
However, NMR spectroscopy studies using a fragment of λN containing residues 1-53 have revealed that residues N41-R47 of λN also interact with the AR1 domain of the NusA protein [88]. Recent in vivo and in vitro studies showed that the interaction between λN and the AR1 domain of NusA protein is important neither for antitermination nor for bacteriophage growth; in contrast, λN was shown to interact with the N-terminal domain of NusA [89]. In order to gain further insight into λN-NusA interactions, it may be helpful to know the structural properties of residues 34-47 of λN protein, both in the presence and in the absence of NusA protein.

1.2.3.3 λN protein-RNAP

In vitro binding assays, gel mobility shift assays and in vitro transcription assays of GST-λN peptides and RNAP suggest that residues 1-47 and 73-107 of λN may bind two distinct sites of RNAP [83]. Furthermore, in vitro transcription assays with various deletion mutants of λN revealed that residues 89-107 can significantly affect antitermination in the absence of NusA protein [83]. However, the particular residues of λN protein and RNAP that are responsible for binding are yet to be identified.

1.3 Aim of this study

As discussed above, bacteriophage λN protein is a highly disordered IDP, as shown by SAXS, CD and NMR 1H-15N HSQC studies [57, 58, 59]. IDPs or disordered regions of a globular protein can form stable folded structures upon binding to their partners via a disorder-to-order transition [7]. In the case of λN, the N-terminal 22 residues bind to boxB mRNA and form a stable α-helical structure, and residues 34-47 of this protein bind to NusA protein in an extended conformation. Previous NMR spectroscopy studies showed that the functional regions of some IDPs, even in their unbound states, tend to have transient secondary structures that become fully formed secondary structures after binding to their partner [6].
However, it was previously unknown whether λN, a coil-like IDP, has any secondary structure when it is not bound to any of its partners. Although N-terminal peptides of λN, bound to specific interacting partners, have been structurally characterized [57, 80, 88], there has been little characterization of the full-length protein. Previous CD and SAXS studies on free λN provided information on the average structural properties of the entire ensemble of conformations, but were unlikely to have detected transient secondary structures. NMR measurements, such as chemical shifts and 15N relaxation rates, can provide residue-specific information to detect transient secondary structures in IDPs, and thus can complement the previous SAXS and CD measurements. In this study, NMR chemical shifts of the full-length protein were analyzed to investigate the existence of transient secondary structures. NMR 15N relaxation rates of the (1H-15N) amide bonds were measured to further analyze residue-specific backbone dynamics. The details of the NMR studies on λN are discussed in the following chapters.

CHAPTER 2
MATERIALS AND METHODS

2.1 NMR sample preparation

A mutant form of the bacteriophage λN protein, C93S-λN, was used for all the NMR analyses described in this study. Singly labeled (15N) or doubly labeled (15N-13C) λN was produced in E. coli BL21(DE3) bacteria transformed with the expression plasmid pET-N1, purified by ion exchange chromatography, and stored after lyophilization as described previously [58]. In order to prepare the NMR samples, lyophilized protein was dissolved in 1 ml of buffer containing 6 M GuHCl, 50 mM Tris-Cl, 0.1 mM EDTA, 1 mM benzamidine, and fresh 1 mM DTT (pH 7.6). The dissolved protein was dialyzed in a Slide-A-Lyzer dialysis cassette (Thermo Scientific), with a MW cut-off of 7,000, against either 50 mM phosphate buffer at pH 7 or 50 mM succinate buffer at pH 5.5.
The purity of the dialyzed N protein was checked by SDS gel electrophoresis and nondenaturing gel electrophoresis. UV absorption at 280 nm (molar extinction coefficient 280 = 14100 cm-1 M-1 83, 90) was used to determine the concentration of N. The pH of the NMR samples was adjusted by adding HCl or NaOH and determined by a glass electrode. Volumes of the NMR samples were either 600 l in a standard NMR tube or 300 l in a Shigemi tube, containing 1mM N protein, 50 mM 19 phosphate buffer (at pH 7) or 50 mM succinate buffer (at pH 5.5), 10% D2O, 0.05 mM NaN3, and 0.10 mM EDTA. 2.2 NMR spectroscopy Spectra for resonance assignments and 15N relaxation data of N protein were collected using a Varian INOVA 600 MHz spectrometer, equipped with a cryogenic, triple resonance (H13C15N) probe and a Z-axis pulsed field gradient. Transverse relaxation rates of N at pH 7 were collected at both 500 MHz (Varian INOVA spectrometer) and 600 MHz. Two-dimensional 1H-15N HSQC spectra had a spectral width in the 1H dimension of 9611.92 Hz and 1024 complex points and a spectral width in the 15N dimension of 1823.30 Hz and 182 complex points. NMR data were processed with FELIX 2007 (Felix NMR, Inc.) using in-house processing macros and with NMRPipe and NMRView91, 92. Resonance assignments were obtained from analyses of two-dimensional 1H-15N HSQC and three-dimensional HNCACB, CBCA(CO)NH and C(CO)NH-TOCSY spectra93 using the programs SPARKY (T.D. Goddard and D.G. Kneller, University of California, San Francisco) and AutoAssign94. The 1H chemical shifts were referenced to the 4,4-dimethyl-4-silapentane-1-sulphonic acid (DSS) methyl protons at 0 ppm, and 13C and 15N chemical shifts were referenced indirectly95. All 15N relaxation measurements (R1, R2 and steady state heteronuclear (15N-1H) NOE) were performed at 250C using the pulse sequences of Farrow et al96. 
At pH 7, longitudinal relaxation rates (R1) were determined at 600 MHz using relaxation delays of 50, 100, 200, 300, 400, 600, 800, 1000, 2000, and 3000 ms, with duplicate spectra collected for 50 and 200 ms; transverse relaxation rates (R2) were measured at both 600 MHz and 500 MHz using pulse delays of 10, 30, 50, 90, 150, 250, and 350 ms, with duplicate spectra measured at 30 and 90 ms. At pH 5.5, longitudinal relaxation rates (R1) were determined using the same delays used at pH 7; however, the longest delay in the transverse relaxation experiment at pH 5.5 was increased from 350 to 450 ms. A CPMG refocusing delay of 625 μs was used in the R2 measurements. R1 and R2 relaxation rates were estimated by fitting the amide peak intensities to a single exponential decay function using the Curvefit program (laboratory of Arthur G. Palmer III). Amides for which the estimated errors in the relaxation rates were greater than 10% were excluded from further analyses. Steady-state heteronuclear (15N-1H) NOE values were calculated as the ratio of peak intensities in the presence and absence of proton presaturation; the peak intensity errors were calculated from the root-mean-square base plane noise and propagated to estimate the errors in the NOE values.

For calculations of spectral density functions [97] at 600 MHz, the dipolar relaxation constant, d, and the chemical shift anisotropy constant, c, of an amide bond were calculated to be -7.21 × 10^4 s^-1 and -3.75 × 10^4 s^-1, respectively. The dipolar relaxation constant is $d = (\mu_0 h \gamma_N \gamma_H / 8\pi^2)\,\langle 1/r_{NH}^3 \rangle$, where $\mu_0$ is the permeability of free space, $\gamma_H$ and $\gamma_N$ are the gyromagnetic ratios of the 1H and 15N nuclei, and $\langle 1/r_{NH}^3 \rangle^{-1/3}$ is the bond-stretching averaged length of an amide bond [98]. The chemical shift anisotropy constant is $c = \Delta\,(\omega_N/\sqrt{3})$, where $\Delta$ is the 15N chemical shift anisotropy. The values of $r_{NH}$ and $\Delta$ were taken as 1.02 Å and -170 ppm [98, 99].
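As a numerical check on the two constants quoted above, the following minimal sketch evaluates d and c from the stated definitions. It is an illustration under stated assumptions, not the calculation actually used in this work: the gyromagnetic ratios and physical constants are commonly tabulated values supplied here, the 15N Larmor frequency is taken as a magnitude so that the sign of c follows the sign of the anisotropy, and the function names are hypothetical.

```python
import math

# Physical constants (commonly tabulated values; supplied here as assumptions)
MU0_OVER_4PI = 1.0e-7        # T m A^-1
HBAR = 1.054571817e-34       # J s
GAMMA_H = 2.6752218744e8     # rad s^-1 T^-1, 1H
GAMMA_N = -2.7126e7          # rad s^-1 T^-1, 15N

# Values stated in the text
R_NH = 1.02e-10              # m (1.02 Angstrom)
CSA = -170e-6                # 15N chemical shift anisotropy (-170 ppm)


def dipolar_constant(r_nh=R_NH):
    """d = (mu0 h gammaN gammaH / 8 pi^2) / r^3; note mu0*h/(8 pi^2) = (mu0/4pi)*hbar."""
    return MU0_OVER_4PI * HBAR * GAMMA_N * GAMMA_H / r_nh ** 3


def csa_constant(proton_mhz, csa=CSA):
    """c = csa * omegaN / sqrt(3), with omegaN taken as a positive magnitude (assumption)."""
    b0 = 2.0 * math.pi * proton_mhz * 1.0e6 / GAMMA_H   # static field in tesla
    omega_n = abs(GAMMA_N) * b0                          # rad s^-1
    return csa * omega_n / math.sqrt(3.0)


d = dipolar_constant()
c = csa_constant(600.0)
print(f"d = {d:.3e} s^-1, c = {c:.3e} s^-1")
```

Under these assumptions both values agree with the constants quoted in the text to within rounding.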
2.3 Helical wheel plot and coiled-coil predictions

Helical wheel projections were drawn using the online script created by Don Armstrong and Raphael Zidovetzki, University of California, Riverside (http://rzlab.ucr.edu/scripts/wheel/wheel.cgi). Coiled-coil formation probabilities of the λN protein sequence were estimated by several methods available online. For the COILS program [100], the available scoring matrices derived from datasets of coiled-coil proteins, both MTK and MTIDK, yielded identical probability scores (http://embnet.vital-it.ch/software/COILS_form.html). For the Paircoil2 program [101] (http://groups.csail.mit.edu/cb/paircoil2/), the P-score cut-off, a probability cut-off determining sensitivity and specificity, was set to 0.99. For the MultiCoil program [102], coiled-coil scores were obtained by setting the dimeric scorer at 3,4,5 and the trimeric scorer at 2,3,4 in order to calculate pairwise interactions (http://groups.csail.mit.edu/cb/multicoil/cgi-bin/multicoil.cgi). To calculate coiled-coil propensities with the MARCOIL program, the amino acid probability matrices (9FAM emission probability matrix and MARCOIL-H transition probability matrix) and default parameter values for i, r and t were used (http://bcf.isb-sib.ch/webmarcoil/webmarcoilC1.html). Coiled-coil domains were also predicted with the MultiCoil2 program using its webserver (http://groups.csail.mit.edu/cb/multicoil2/cgi-bin/multicoil2.cgi).

CHAPTER 3
RESULTS

3.1 Resonance assignment

Resonance assignment of the backbone amide (1H-15N) nuclei and side-chain carbon (13C) nuclei of a uniformly isotopically labelled protein is the first step towards the elucidation of its structural and dynamic properties by NMR [93]. The λN protein used for all the NMR studies described here contained a replacement of the single cysteine residue at position 93 with serine to prevent the formation of intermolecular disulfides or other modifications of the Cys residue.
Initial NMR studies were carried out using a sample containing 1 mM λN protein in 50 mM phosphate buffer at pH 7 and 25°C, the solution conditions used in our previous SAXS and SANS (small-angle neutron scattering) studies on the same protein [58]. A two-dimensional (1H-15N) HSQC spectrum of the λN protein under these conditions is shown in Fig. 3.1. The limited chemical shift dispersion in the 1H dimension of the HSQC spectrum is consistent with previous CD, NMR, and SAXS studies indicating an extensively disordered protein [57, 58, 59]. As is often seen for unfolded proteins, more pronounced chemical shift dispersion was observed in the 15N dimension, which facilitated resonance assignments and further analysis.

Figure 3.1: 1H-15N HSQC (600 MHz) spectrum of 1 mM λN in 50 mM phosphate buffer at pH 7 and 25°C showing 12 resonance assignments: Low dispersion in the 1H dimension indicates that λN protein is a disordered protein. Examples of 4 residues with duplicate assignments, A20, L38, T58, and I107, are labelled.

In the hope of identifying more favorable solution conditions for NMR studies, seven HSQC spectra of the λN protein were collected from samples with pH values covering the range of 3.0 to 7.0 at 25°C. Another series of HSQC spectra was measured at temperatures ranging from 5°C to 45°C, at pH 7.0. The largest number of resolved peaks was found in the spectrum recorded at pH 5.5 and 25°C. Therefore, pH 7 and pH 5.5, both at 25°C, were the two conditions chosen for the resonance assignment of λN. For samples at pH 5.5, 50 mM succinate buffer was used. To assign chemical shifts, the following triple resonance (1H-13C-15N) spectra were recorded under both conditions: HNCACB, CBCA(CO)NH, and C(CO)NH-TOCSY [93]. These experiments provided the chemical shifts of the side-chain carbon nuclei along with those of the backbone amide 15N and amide 1H nuclei.
Resonance assignments were made for the backbone 1HN and 15N nuclei of 91 of the 100 nonproline residues of λN at pH 7. In addition, 13Cα and 13Cβ chemical shifts of 99 nonglycine residues and 13Cα chemical shifts of all five glycine residues were obtained. At pH 5.5, chemical shifts of 97 backbone 1HN and 15N nuclei, 105 backbone 13Cα nuclei, and 100 13Cβ nuclei of λN were assigned. Assigned resonances of all backbone amide (1H-15N) nuclei and side-chain carbon (13C) nuclei of λN at pH 7 and pH 5.5 are listed in Appendices A and B.

3.2 Proline isomerization

Proline isomerization, i.e., interconversion between cis and trans prolyl-peptide conformations, is often associated with the highly flexible nature of IDPs [103]. In order to check whether proline isomerization occurs in λN protein, an IDP that contains seven proline residues, nuclei assigned to two peaks in the HSQC spectra were investigated. At pH 7, twelve residues of λN each gave rise to two peaks in the HSQC spectrum. The HSQC peaks for four of these residues are labeled in Fig. 3.1. At pH 5.5, three of these residues were found to generate two peaks each. The relative intensities of the two peaks originating from the same residue were in most cases close to 10:1 (Table 3.1). Each of these residues is in close proximity to a Pro residue, at position 23, 36, 54, or 105, suggesting that the two peaks arise from the cis and trans isomers of the peptide bonds preceding the proline residues. To test this hypothesis, couplings to the amide peaks of the sequentially following residues were used to obtain the 13Cβ and 13Cγ chemical shifts of the proline residues from the C(CO)NH-TOCSY spectrum. The alternative chemical shifts of the amide nuclei of L24 and L106 at pH 7 and of L106 at pH 5.5 were each linked to the 13Cβ and 13Cγ nuclei of the preceding proline residues. These 13Cβ and 13Cγ chemical shifts and their differences, Δβγ = δ(13Cβ) - δ(13Cγ), are listed in Table 3.2.
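The interpretation of such differences can be illustrated with a short sketch. It assumes the literature ranges cited in the paragraph that follows (0 to 4.8 ppm for trans and 9.15 to 14.4 ppm for cis) and uses the two sets of P23 shifts at pH 7 from Table 3.2; the function name and the "ambiguous" label for values outside both ranges are illustrative choices, not part of the original analysis.

```python
# Ranges (ppm) for the 13C-beta minus 13C-gamma difference, as cited in the text
TRANS_RANGE = (0.0, 4.8)
CIS_RANGE = (9.15, 14.4)


def classify_prolyl_bond(c_beta_ppm, c_gamma_ppm):
    """Return (difference in ppm, label) for one proline isomer."""
    delta = round(c_beta_ppm - c_gamma_ppm, 2)
    if TRANS_RANGE[0] <= delta <= TRANS_RANGE[1]:
        return delta, "trans"
    if CIS_RANGE[0] <= delta <= CIS_RANGE[1]:
        return delta, "cis"
    # Values falling between or outside the two ranges are left unclassified
    return delta, "ambiguous"


# The two sets of P23 shifts observed at pH 7 (Table 3.2)
major = classify_prolyl_bond(32.16, 27.39)
minor = classify_prolyl_bond(34.46, 25.02)
print(major, minor)
```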
Previous studies have shown that prolyl-peptide bonds in the trans or cis conformation result in Δβγ values in the range of 0 to 4.8 ppm or 9.15 to 14.4 ppm, respectively [104, 105]. The Δβγ values (Table 3.2) confirmed the presence of significant populations of the cis isomer of the P23 and P105 residues at pH 7. At pH 5.5, although one of the 13Cγ chemical shifts was missing, the different 13Cβ chemical shifts in the TOCSY spectrum confirmed cis-trans isomerization of the P105 residue. However, this quantitative chemical shift analysis could not validate cis-trans isomerization of the P36 and P54 residues at pH 7, owing to the poor signal-to-noise ratio for the P36 side-chain carbons in the C(CO)NH-TOCSY spectrum (corresponding to the weaker amide peak of the I37 residue) and to the lack of an alternative assignment for the I55 residue in the HSQC spectrum. The significant population of both cis and trans prolyl-peptide conformations is consistent with the highly disordered nature of λN.

Table 3.1: Amino acid residues displaying alternate resonance frequencies

| pH | Residues assigned twice | S/N ratio of the two assignments in HSQC | Nearby proline residue |
| 7.0 | A20, A21, N22, L24 | 10.56:1, 8.25:1, 5.58:1(d), 10.03:1(d) | P23 |
| 7.0 | I37, L38 | 2.23:1(d), 11.68:1 | P36 |
| 7.0 | T58, V59 | 12.67:1, 13.03:1 | P54 |
| 7.0 | I104, L106, I107 | 4.93:1, 11.04:1(d), 8.71:1 | P105 |
| 5.5 | I104, L106, I107 | 6.85:1, 8.44:1, 8.56:1 | P105 |

(d) One of the assignments overlaps with another assignment.

Table 3.2: Evidence of cis-trans isomerization of P23 and P105 at pH 7.0 and of P105 at pH 5.5

| pH | Proline residue | δ(13Cβ) of the two isomers (ppm) | δ(13Cγ) of the two isomers (ppm) | Δβγ = δ(Cβ) - δ(Cγ) (proline conformation) |
| 7.0 | P23 | 32.16 | 27.39 | 4.77 (trans) |
| 7.0 | P23 | 34.46 | 25.02 | 9.44 (cis) |
| 7.0 | P105 | 32.18 | 27.52 | 4.66 (trans) |
| 7.0 | P105 | 34.50 | 24.94 | 9.56 (cis) |
| 5.5 | P105 | 32.15 | 27.24 | 4.91 (trans) |
| 5.5 | P105 | 34.50 | N.A. | (cis) |

3.3 Secondary chemical shifts

The chemical shifts of NMR-active nuclei in a protein are particularly sensitive to secondary structure.
To assess the propensity of N to take on regular secondary structures, secondary chemical shifts were calculated by subtracting the observed 1H, 13C,and 15N chemical shifts from the corresponding residue-specific chemical shifts measured in disordered glutamine based penta-peptides106. This set of random-coil reference chemical shifts were chosen for several reasons that are listed here: a. database derived random-coil chemical shifts tend to average out different effects of various solvent conditions106; b. glutamine is more representative of the other amino acid residues, with respect to its backbone dihedral angles, than is glycine, the other residue commonly used in the model-peptide based libraries106; c. the random coil chemical shifts in this library were determined at pH 6.5, which is close to the solvent conditions of this study; and d. this library also provided sequence correction factors to compensate for effects on the chemical shifts due to neighboring residues. The 13C chemical shifts are known to be particularly sensitive to backbone dihedral angles. The secondary chemical shifts for these 13Cnuclei of N are plotted against the residue numbers in Fig. 3.2. Consistently positive or negative 13C secondary chemical shifts for a segment of residues indicate formation of -helical or -strand secondary structures, respectively. Continuous stretches of positive secondary chemical shifts for the 13C nuclei of amino acid residues 3-19 and 56-80 indicate significant population of -helical conformation at both pH 5.5 and pH 7. To further analyze the secondary structure content in N, the secondary structure propensity (SSP) parameter developed by Marsh et al.107 was applied. 
Figure 3.2: 13Cα secondary chemical shifts of λN at pH 5.5 and 7: positive 13Cα secondary chemical shifts, as calculated from the glutamine-based random-coil chemical shift library [106], indicate α-helical secondary structure at residues 3-19 and 56-80.

The SSP parameter represents a normalized and weighted average of multiple secondary chemical shifts of a peptide over a default window of five residues and provides a single SSP score for each residue [107]. A positive SSP score indicates helical propensity of a residue, whereas a negative SSP score suggests a propensity to be in an extended or β-strand conformation. Results of the SSP calculation for λN based on the secondary chemical shifts of the 13Cα, 13Cβ, 1HN, and 15N nuclei are shown in Fig. 3.3. An SSP score of +1 or -1 for a particular residue indicates that the residue resides in a fully formed α-helix or β-strand conformation, respectively. At the N-terminus of λN protein at pH 7, a stretch of positive SSP scores with a maximum of ~0.2 indicates the likely presence of a transient helical structure at residues 2-7, consistent with the 13Cα secondary chemical shift analysis shown above. Similarly, the SSP scores of residues 55-75 indicate the likely presence of another transient helical structure, populated to ~30%. It has been previously shown from studies of N-terminal peptides (λN(1-22) and λN(1-36)) that the first 19 residues of λN adopt a bent α-helical structure upon binding to the boxB mRNA [57, 80]. Here, the secondary chemical shift analyses demonstrate that the same N-terminal region of the full-length λN protein has measurable helical propensity even in the absence of the mRNA (Fig. 3.4). It has also been shown previously that residues 34-47 interact with NusA protein in an extended conformation [87, 88]. The SSP score analysis shows that this region has a significant propensity to form extended structure even in the absence of its interaction partner.
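The sign convention behind these analyses (observed minus random-coil shift, with runs of positive 13Cα values read as helical propensity) can be illustrated with a small sketch. This is not the SSP algorithm of Marsh et al., which weights and normalizes several nuclei; it only shows the subtraction and the search for continuous positive stretches. The shift values, the minimum run length, and the function names are hypothetical and chosen purely for illustration.

```python
def secondary_shifts(observed_ppm, random_coil_ppm):
    """Secondary shift per residue: observed minus random-coil value (ppm)."""
    return [round(o - r, 2) for o, r in zip(observed_ppm, random_coil_ppm)]


def positive_runs(values, min_len=3):
    """Return inclusive (start, end) index pairs of runs of positive values."""
    runs, start = [], None
    for i, v in enumerate(values):
        if v > 0:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last residue
    if start is not None and len(values) - start >= min_len:
        runs.append((start, len(values) - 1))
    return runs


# Hypothetical 13C-alpha shifts for six consecutive residues (not measured data)
observed = [58.9, 59.1, 59.0, 58.8, 56.0, 55.9]
random_coil = [58.0, 58.0, 58.0, 58.0, 56.5, 56.5]

dd = secondary_shifts(observed, random_coil)
helical_like = positive_runs(dd)
print(dd, helical_like)
```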
Furthermore, positive SSP scores for residues 23-26 indicate a transient α-helical population, and negative SSP scores for residues 95-107 indicate a transient extended conformation for these C-terminal residues. An NMR study previously indicated the presence of a turn at residues 23-26 in the presence of boxB mRNA [57]. To the best of our knowledge, formation of α-helical secondary structure over residues 55-80 and of extended structure over residues 95-107 of λN protein has not been previously detected.

Figure 3.3: Secondary structure propensity (SSP) scores of λN at pH 5.5 and 7: SSP scores were calculated from the secondary chemical shifts of the 13Cα, 13Cβ, 1HN, and 15N nuclei over a window size of 5 residues. Positive SSP scores of residues 2-7 and residues 55-75 show evidence of transient α-helical structures; negative SSP scores of residues 34-47 and 95-107 show β-strand propensities.

Figure 3.4: Model representation of the SSP scores at the N-terminal region (pH 7) mapped onto the NMR structure of the λN(2-36) peptide bound to boxB mRNA: The N-terminal residues of λN have a propensity to form secondary structure even in the absence of boxB mRNA. The figure is derived from the coordinates of the NMR structure of the λN(2-36) peptide bound to boxB mRNA (1QFQ.pdb [57]), and the image was constructed in PyMOL. Color scheme: residues with SSP scores from +0.15 to +0.25, +0.10 to +0.15, +0.03 to +0.10 and less than +0.03 are marked in red, salmon, orange and blue, respectively. The residues for which no SSP score was obtained are shown in green.

The other interesting observation from these 13Cα secondary chemical shift and SSP analyses is that the α-helical content decreases as the pH is lowered from 7.0 to 5.5. The only significant change in ionization expected for this pH change is the protonation of the side chain of His-64, the only histidine residue of λN.
It has been shown before that, if placed in the interior of an α-helix, a positively charged His side chain can decrease helix stability, presumably by interacting with the helix dipoles [108]. In the case of λN, a positively charged H64 side chain inside the transient α-helix formed over residues 55-75 possibly lowers the helical content at pH 5.5. The decrease in the helical content in the N-terminal segment is not readily explained, but may arise from transient tertiary interactions.

3.4 The helix formed by the residues 55-75 is amphipathic

The secondary chemical shift analyses of λN indicated the presence of a transient α-helical structure over residues 55-75, a region that has not previously been associated with any secondary structure or function. We speculated that residues 55-75 may be involved in protein-protein interactions, as is often observed for the transient secondary structures of IDPs [7]. Amphipathic α-helices play crucial roles in protein-protein recognition and are often observed in the binding of IDPs to their partners [109]. To examine whether this newly found α-helical region is amphipathic, a helical wheel projection of residues 55-78 of λN protein is shown in Fig. 3.5. The helical wheel projection indicates that the transient helix formed by these residues of λN would be strongly amphipathic.

Figure 3.5: Amphipathicity of the transient α-helix formed on residues 55-78 of λN: Hydrophobic residues are represented by shaded diamonds, hydrophilic residues by circles, negatively charged residues by triangles and positively charged residues by pentagons. The hydrophobic moment, a vector calculated from the amino acid sequence, is indicated at the center of the helical wheel.

Amphipathic α-helices can oligomerize, driven by the favorable interactions of their hydrophobic surfaces, to form intramolecular or intermolecular, parallel or antiparallel coiled-coils that are dimers, trimers, tetramers or pentamers [100].
These coiled-coil regions are marked by heptad repeats in the amino acid sequence (a-b-c-d-e-f-g), where positions a and d are usually occupied by hydrophobic residues. To explore the possibility that residues 57-78 of λN might form a coiled-coil structure, the sequence was tested using several prediction programs: COILS [100], Paircoil2 [101, 110], MultiCoil [102], MARCOIL [111], and MultiCoil2 [112]. No clear conclusions could be drawn from these analyses, however, since the first three of these programs predicted significant propensities to form coiled-coils, while the other two did not.

3.5 15N relaxation measurements

In order to probe the backbone dynamics of λN over the picosecond to millisecond regime, the 15N NMR longitudinal relaxation rates (R1), transverse relaxation rates (R2) and heteronuclear (1H-15N) nuclear Overhauser effects (NHNOE) were measured for the amide 15N nuclei. At pH 7, R1 relaxation rates of 86 residues and R2 relaxation rates and NHNOE values of 90 residues were quantified. These relaxation data are plotted versus residue number in Fig. 3.6. The residues corresponding to the missing relaxation values in the figure were either unassigned, prolines, or displayed poor fits to a single exponential function. The R1 relaxation rates (Fig. 3.6a) were relatively uniform across the whole sequence of λN at pH 7. Slightly lower R1 values for the residues near the termini and for residues 24-35 indicate more extensive internal mobility of these amides on timescales shorter than that of molecular tumbling [93]. The relatively low R2 values and large negative NOE values (Fig. 3.6b, c) observed for amides in the terminal regions are also consistent with greater flexibility on the fast timescale.

Figure 3.6: Relaxation parameters of 1 mM λN (600 MHz) in 50 mM phosphate buffer at pH 7 and 25°C: High mobility of the amides at the N- and C-termini and of residues 27-33 of the λN protein is evident from the R1 relaxation rates and NHNOE values (A and C); R2 relaxation rates (B) indicate probable conformational exchange of residues 55-85.

The R2 relaxation rates of residues 55-85 were significantly larger than those in the other regions of λN (Fig. 3.6b). These high transverse relaxation (R2) rates most likely reflect the loss of phase coherence of nuclei in different molecules randomly changing conformations on the μs-ms timescale [113]. The amide nitrogens of a few residues outside this segment, namely R8, N22, and K102, also showed elevated R2 values (Fig. 3.6b). Further evidence of conformational exchange was obtained by measuring R2 at a second magnetic field. Conformational exchange places nuclei in distinct chemical environments, causing a change in precession frequency and loss of phase coherence among nuclei in different molecules. The resulting increase in transverse relaxation rates depends on several factors, including the magnitude of the change in precession frequency, which is proportional to the static magnetic field (B0). Thus, an increase in the static magnetic field is predicted to lead to larger R2 relaxation rates [93, 114]. More specifically, the exchange contribution to R2 is predicted to be proportional to the square of the static field strength. The R2 relaxation rates of λN at pH 7 measured at 500 and 600 MHz are plotted in Fig. 3.7. The R2 rates of residues 56-75 increased as the strength of the static magnetic field was increased. The average R2 rates for residues 56-75 of λN were found to be 6.45 and 7.92 s-1 at 500 and 600 MHz, respectively, as compared to average values of 4.62 and 4.28 s-1 at 500 and 600 MHz, respectively, for the residues lying outside the segment.
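The field dependence of these averages can be checked with simple arithmetic. The sketch below assumes, as the text does, that the average R2 of residues outside the 56-75 segment approximates the non-exchange contribution within the segment; the function and variable names are illustrative only.

```python
def excess_r2(segment_avg, baseline_avg):
    """Excess transverse relaxation rate attributed to exchange (s^-1)."""
    return segment_avg - baseline_avg


# Average R2 values quoted in the text (s^-1)
rex_500 = excess_r2(6.45, 4.62)   # 500 MHz
rex_600 = excess_r2(7.92, 4.28)   # 600 MHz

observed_ratio = rex_600 / rex_500
predicted_ratio = (600.0 / 500.0) ** 2   # exchange term scales with B0 squared

print(f"Rex(500) = {rex_500:.2f} s^-1")              # 1.83
print(f"Rex(600) = {rex_600:.2f} s^-1")              # 3.64
print(f"observed ratio  = {observed_ratio:.2f}")     # 1.99
print(f"predicted ratio = {predicted_ratio:.2f}")    # 1.44
```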
The average values for the residues outside the segment were assumed to also represent the nonexchange contributions to the R2 rates of residues 56-75. The average exchange contributions were thus estimated to be approximately 1.8 s⁻¹ at 500 MHz and 3.6 s⁻¹ at 600 MHz. The 2-fold increase in the excess transverse relaxation rate with magnetic field is somewhat greater than predicted (a 1.44-fold increase) [93], but is qualitatively consistent with conformational exchange. Formation and breakdown of the transient α-helix spanning residues 55-75, as indicated by the secondary chemical shift analyses, is a likely model for the conformational exchange giving rise to the large R2 values.

Figure 3.7: Transverse relaxation rates (R2) of 1 mM λN at 500 and 600 MHz (50 mM phosphate buffer, pH 7 and 25°C): The positive difference between the R2 relaxation rates collected at 600 MHz and at 500 MHz presents strong evidence of conformational exchange for residues 56-75.

3.6 Mapping of spectral density functions

Quantitative interpretation of NMR relaxation data is based on spectral density functions (SDF), which describe the relative probabilities of motions with different frequencies [93, 115]. For the case of simple Brownian rotational motion with a single time constant, the SDF, J(ω), has the form of a Lorentzian function:

$$J(\omega) = \frac{2}{5}\,\frac{\tau_m}{1+\omega^2\tau_m^2}$$ [3.1]

where ω is the angular frequency and τm is the correlation time of rotational tumbling, which can be defined approximately as the average time required for a rotation of 1 radian. The factors that contribute to relaxation processes, such as dipole-dipole relaxation and chemical shift anisotropy, depend on the probabilities of quantum transitions due to molecular motions, which in turn are determined by the frequencies and intensities of magnetic fluctuations [93].
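As a small illustration of equation 3.1, the Lorentzian can be evaluated at the three frequencies that appear later in the spectral density mapping. This is a minimal sketch: the tumbling time of 5 ns is a hypothetical value chosen only for the example, and the function name is not from the thesis.

```python
import math


def lorentzian_sdf(omega, tau_m):
    """Equation 3.1: J(omega) = (2/5) * tau_m / (1 + omega^2 * tau_m^2)."""
    return 0.4 * tau_m / (1.0 + (omega * tau_m) ** 2)


TAU_M = 5e-9                     # hypothetical tumbling time (s), example only
OMEGA_H = 2 * math.pi * 600e6    # 1H Larmor frequency at 600 MHz (rad/s)
OMEGA_N = 0.1014 * OMEGA_H       # |gamma_N / gamma_H| is approximately 0.1014

for label, omega in [("J(0)", 0.0),
                     ("J(wN)", OMEGA_N),
                     ("J(0.87wH)", 0.87 * OMEGA_H)]:
    print(f"{label:10s} = {lorentzian_sdf(omega, TAU_M):.3e} s")
```

The function falls monotonically with frequency, so for any single tumbling time J(0) is the largest of the three values and J(0.87ωH) the smallest.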
If the motions of all the amide bonds are influenced only by the overall molecular tumbling time τm, i.e., without any internal motion, J(ω) of an amide (15N-1H) bond can be represented by the above equation, and the 15N relaxation parameters R1, R2 and NHNOE, due to dipolar coupling and chemical shift anisotropy, can be calculated from the following equations:

$$R_1 = \frac{d^2}{10}\left[J(\omega_H-\omega_N) + 3J(\omega_N) + 6J(\omega_H+\omega_N)\right] + c^2 J(\omega_N)$$ [3.2]

$$R_2 = \frac{d^2}{20}\left[4J(0) + J(\omega_H-\omega_N) + 3J(\omega_N) + 6J(\omega_H) + 6J(\omega_H+\omega_N)\right] + \frac{c^2}{6}\left[3J(\omega_N) + 4J(0)\right]$$ [3.3]

$$\mathrm{NHNOE} = 1 + \frac{d^2}{10}\,\frac{\gamma_H}{\gamma_N}\,\frac{6J(\omega_H+\omega_N) - J(\omega_H-\omega_N)}{R_1}$$ [3.4]

where γN and γH are the gyromagnetic ratios of the 15N and 1H nuclei, respectively; d and c are the dipolar and chemical shift anisotropy constants of an amide bond, respectively; and ωH and ωN are the Larmor frequencies of the 1H and 15N nuclei.

However, internal motions faster than overall molecular tumbling often play significant roles in amide bond dynamics, and thus contribute to 15N relaxation. In order to account for the contributions of both overall tumbling and internal motions to the 15N relaxation rates of a globular protein, the SDF is often expressed as a sum of two or more Lorentzian functions. One such SDF, introduced by Lipari and Szabo [116], is:

$$J(\omega) = \frac{2}{5}\,\frac{S^2\tau_m}{1+\omega^2\tau_m^2} + \frac{2}{5}\,\frac{(1-S^2)\,\tau}{1+\omega^2\tau^2}$$ [3.5]

In the above expression, S2 is an order parameter that reflects the spatial restriction of the internal motions on a scale of zero to one, and τ = 1/(τe⁻¹ + τm⁻¹), where τe and τm are, respectively, the correlation times for the internal motions of an amide bond and for overall molecular tumbling [116]. The usual practice for globular proteins is to fit experimental relaxation data to equations 3.2, 3.3 and 3.4 above to estimate an overall tumbling time τm for the molecule, and the order parameter S2 and internal correlation time τe for individual amide bonds [117]. However, fitting to the Lipari-Szabo SDF is problematic when there are exchange contributions (Rex) to R2, and for disordered proteins in general it may not be appropriate to invoke a single overall tumbling time.
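The behaviour of the Lipari-Szabo spectral density (equation 3.5) in its two limits can be sketched in a few lines. The parameter values below (τm = 5 ns, τe = 50 ps) are hypothetical and serve only to show that S2 = 1 recovers the single Lorentzian of equation 3.1, while S2 = 0 gives a single Lorentzian at the effective time τ.

```python
import math


def lorentzian(omega, tau):
    """Single Lorentzian of equation 3.1."""
    return 0.4 * tau / (1.0 + (omega * tau) ** 2)


def lipari_szabo_sdf(omega, s2, tau_m, tau_e):
    """Equation 3.5 with tau = 1 / (1/tau_e + 1/tau_m)."""
    tau = 1.0 / (1.0 / tau_e + 1.0 / tau_m)
    return s2 * lorentzian(omega, tau_m) + (1.0 - s2) * lorentzian(omega, tau)


# Hypothetical parameters, for illustration only.
TAU_M, TAU_E = 5e-9, 50e-12
OMEGA_N = 0.1014 * 2 * math.pi * 600e6   # 15N Larmor frequency at 600 MHz

for s2 in (0.0, 0.5, 1.0):
    print(f"S2 = {s2:.1f}: J(wN) = {lipari_szabo_sdf(OMEGA_N, s2, TAU_M, TAU_E):.3e} s")
```

Because the expression is a weighted sum of two Lorentzians, intermediate values of S2 interpolate linearly between the two limiting cases at any given frequency.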
An alternative is to calculate J(ω) for selected frequencies from the experimental data, and then compare the patterns with those predicted by simple models. The angular frequencies (ω) at which the spectral density functions J(ω) can be evaluated depend on the measured relaxation parameters. In the reduced approach to spectral density mapping [97], the standard relaxation rates R1 and R2 and the heteronuclear NOE allow J(ω) to be calculated for each residue at three values of ω: 0.87 times the 1H Larmor frequency (J0.87H), the 15N Larmor frequency (JN), and zero frequency (J0):

$$J_{0.87H} = R_1\,(\mathrm{NOE}-1)\,\frac{\gamma_N}{\gamma_H}\,\frac{4}{5d^2}$$ [3.6]

$$J_N = \frac{R_1 - \frac{7d^2}{4}\,J_{0.87H}}{\frac{3d^2}{4} + c^2}$$ [3.7]

$$J_0 = \frac{R_2 - \left(\frac{3d^2}{8} + \frac{c^2}{2}\right)J_N - \frac{13d^2}{8}\,J_{0.87H}}{\frac{d^2}{2} + \frac{2c^2}{3}}$$ [3.8]

Equation 3.8 for J0 assumes no conformational exchange contribution to the R2 relaxation rates. If exchange does contribute to R2, the J0 calculated from the relaxation data will be larger than the value describing tumbling and faster internal motions.

The spectral density functions J0, JN and J0.87H of λN at pH 7 are plotted in Fig. 3.8. The terminal regions and residues 24-35 of λN show relatively lower JN values and higher J0.87H values, indicating faster internal motions in these regions (Fig. 3.8a and 3.8b) [118] relative to the rest of the molecule. The larger J0 values of residues 56-80 (Fig. 3.8c) can be attributed to motions slower than overall tumbling, indicating conformational exchange [113], as previously suggested by direct examination of the R2 relaxation rates and their dependence on the static field strength. Comparison of the values of J0.87H and JN derived from the relaxation data (equations 3.6 and 3.7) with those of simple models is facilitated by a plot of J0.87H versus JN, introduced by Andrec et al. [119], who described this plot as a Lipari-Szabo map.
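The reduced spectral density mapping of equations 3.6-3.8 can be sketched as follows. The thesis does not state numerical values for d and c, so the sketch assumes the commonly used definitions d = (μ0/4π)ħγHγN/r³ and c = ωNΔσ/√3, with an N-H bond length of 1.02 Å and a 15N chemical shift anisotropy of -160 ppm; these values, the function names, and the example rates are all assumptions. The helper `rates_from_sdf` is simply the algebraic inverse of equations 3.6-3.8 and is included only as a consistency check.

```python
import math

# Physical constants (SI units).
MU0_OVER_4PI = 1e-7          # T m / A
HBAR = 1.054571817e-34       # J s
GAMMA_H = 2.6752218744e8     # rad s^-1 T^-1
GAMMA_N = -2.7126e7          # rad s^-1 T^-1 (negative for 15N)

# Assumed amide parameters (not given in the text).
R_NH = 1.02e-10              # N-H bond length (m)
DELTA_SIGMA = -160e-6        # 15N chemical shift anisotropy


def constants(proton_mhz):
    """Return (d^2, c^2) for the given 1H spectrometer frequency."""
    b0 = 2 * math.pi * proton_mhz * 1e6 / GAMMA_H
    omega_n = GAMMA_N * b0
    d = MU0_OVER_4PI * HBAR * GAMMA_H * GAMMA_N / R_NH ** 3
    c = omega_n * DELTA_SIGMA / math.sqrt(3)
    return d * d, c * c


def reduced_sdm(r1, r2, noe, proton_mhz=600.0):
    """Equations 3.6-3.8: return (J0, JN, J0.87H) in s/rad."""
    d2, c2 = constants(proton_mhz)
    j_h = r1 * (noe - 1.0) * (GAMMA_N / GAMMA_H) * 4.0 / (5.0 * d2)   # eq. 3.6
    j_n = (r1 - j_h * 7.0 * d2 / 4.0) / (3.0 * d2 / 4.0 + c2)         # eq. 3.7
    j_0 = (r2 - j_n * (3.0 * d2 / 8.0 + c2 / 2.0)
           - j_h * 13.0 * d2 / 8.0) / (d2 / 2.0 + 2.0 * c2 / 3.0)     # eq. 3.8
    return j_0, j_n, j_h


def rates_from_sdf(j_0, j_n, j_h, proton_mhz=600.0):
    """Algebraic inverse of equations 3.6-3.8 (consistency check only)."""
    d2, c2 = constants(proton_mhz)
    r1 = (d2 / 4.0) * (3.0 * j_n + 7.0 * j_h) + c2 * j_n
    r2 = ((d2 / 8.0) * (4.0 * j_0 + 3.0 * j_n + 13.0 * j_h)
          + (c2 / 6.0) * (3.0 * j_n + 4.0 * j_0))
    noe = 1.0 + (GAMMA_H / GAMMA_N) * (d2 / 4.0) * 5.0 * j_h / r1
    return r1, r2, noe


# Hypothetical relaxation values for one residue, for illustration only.
j0, jn, jh = reduced_sdm(r1=1.6, r2=4.3, noe=0.1)
print(f"J0 = {j0:.3e}, JN = {jn:.3e}, J0.87H = {jh:.3e} (s/rad)")
```

As noted for equation 3.8, any exchange contribution to the measured R2 passes directly into the computed J0, so an elevated J0 from this calculation is not by itself a measure of slow tumbling.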
In this map, J0.87H and JN are plotted against each other and compared with the J0.87H and JN values predicted from a single Lorentzian function (equation 3.1, or S2 = 1 in equation 3.5). The continuous triangular curve (Fig. 3.9a) representing the single Lorentzian function, referred to as the rigid tumbling curve, was generated using global tumbling correlation times (τm) ranging from 1 ps to 20 ns. For rigid tumbling, i.e., if the motions of an amide bond depend entirely on the overall molecular tumbling, the experimental J0.87H and JN values should lie on this curve at the position corresponding to the molecular tumbling time (τm). If there are completely unconstrained motions that are faster than overall tumbling, i.e., S2 = 0 in the Lipari-Szabo SDF (equation 3.5), the effective tumbling time of the amide bonds is influenced by both global tumbling and internal motions, and the J0.87H and JN values lie on the curve at the position corresponding to τ = 1/(τe⁻¹ + τm⁻¹).

Figure 3.8: Spectral density mapping (J0.87H, JN, and J0) of λN (relaxation parameters measured at 600 MHz, in 50 mM phosphate buffer, at pH 7 and 25°C): High J0.87H and low JN (A, B) indicate more flexible terminal amides; large J0 values (C) indicate conformational exchange for residues R8, N22, K102 and residues 56-80.

Figure 3.9: Illustration of Lipari-Szabo (LS) mapping and backbone dynamics analysis of a folded protein by LS mapping, taken from Hanson et al. [113]. (A): The triangular curve is the rigid tumbling curve, marked at the 0.02, 0.05, 0.1, 0.2, 0.5, 1, 5 and 10 ns timepoints, which predicts the values of the spectral density functions J0.87H and JN if the amide bond motions are influenced only by overall molecular tumbling [113, 119]. S2 values other than 0 and 1 shift the experimental J0.87H-JN point inside the rigid tumbling curve. (B): The LS map of the folded globular protein BPTI indicates that the amide bond motions are mostly influenced by overall molecular tumbling [113].
If there is restricted internal motion, i.e., 0 < S2 < 1, the experimentally derived J0.87H-JN points lie in the interior of the rigid tumbling curve. These points lie on the line joining the positions on the curve for τm and for the effective internal time τ (which approaches τe when τe is much shorter than τm), and the position of a J0.87H-JN point along this line is proportional to S2, as depicted by S2 = 0.25, 0.50 and 0.75 in Fig. 3.9a. For folded globular proteins, the motions of the amides are mostly determined by the global tumbling of the molecule, and thus the experimental J0.87H and JN values are generally expected to lie close to the rigid tumbling curve if internal motions are restricted (S2 close to 1). An LS map for the folded globular protein bovine pancreatic trypsin inhibitor (BPTI) in Fig. 3.9b indicates that the motions of the amide bonds of this protein mostly follow a single global tumbling time τm.

LS maps of λN at pH 7 and pH 5.5 are shown in Figs. 3.10 and 3.11, respectively. All the J0.87H-JN points under the two solution conditions lie near the center of the triangular curve, indicating that the backbone motions of the amide bonds are influenced by at least two different timescales. It appears, although only qualitatively, that the order parameter S2 for a majority of the amides in λN at both pH values may be close to 0.5, suggesting significant restriction of the faster internal motions. Residues 55-85, which are prone to conformational exchange, and the residues at the termini, which have faster amide bond motions, lie at opposite extremes of the cluster of points, suggesting more restricted internal motions for the residues prone to conformational exchange.
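The geometry of the LS map follows directly from equation 3.5, and can be sketched numerically. The example below assumes a 600 MHz spectrometer and hypothetical correlation times (τm = 5 ns, τe = 50 ps); the function names are illustrative. It shows that a point with 0 < S2 < 1 lies on the straight line between the rigid-curve positions for τm and for τ = 1/(τe⁻¹ + τm⁻¹), at a fraction set by S2.

```python
import math

OMEGA_H = 2 * math.pi * 600e6     # 1H Larmor frequency at 600 MHz (rad/s)
OMEGA_N = 0.1014 * OMEGA_H        # 15N Larmor frequency (magnitude)
OMEGA_087H = 0.87 * OMEGA_H


def lorentzian(omega, tau):
    """Single Lorentzian of equation 3.1."""
    return 0.4 * tau / (1.0 + (omega * tau) ** 2)


def rigid_curve_point(tau):
    """(JN, J0.87H) on the rigid tumbling curve for correlation time tau."""
    return lorentzian(OMEGA_N, tau), lorentzian(OMEGA_087H, tau)


def ls_map_point(s2, tau_m, tau_e):
    """(JN, J0.87H) predicted by equation 3.5."""
    tau = 1.0 / (1.0 / tau_e + 1.0 / tau_m)
    jn_m, jh_m = rigid_curve_point(tau_m)
    jn_t, jh_t = rigid_curve_point(tau)
    return (s2 * jn_m + (1.0 - s2) * jn_t,
            s2 * jh_m + (1.0 - s2) * jh_t)


# Hypothetical correlation times, for illustration only.
TAU_M, TAU_E = 5e-9, 50e-12

# Rigid tumbling curve sampled from 1 ps to 20 ns, as in the text.
curve = [rigid_curve_point(10 ** (-12 + 0.1 * i)) for i in range(44)]

for s2 in (0.25, 0.50, 0.75):
    jn, jh = ls_map_point(s2, TAU_M, TAU_E)
    print(f"S2 = {s2:.2f}: JN = {jn:.3e}, J0.87H = {jh:.3e}")
```

Plotting `curve` together with the three printed points would place the points along a chord of the curve, in the same arrangement as the S2 = 0.25, 0.50 and 0.75 markers described for Fig. 3.9a.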
The difference between the patterns of the calculated J0.87H-JN points in the LS maps at pH 7 and 5.5 is also notable. The greater dispersion of the J0.87H-JN points of λN at pH 5.5, compared to pH 7.0, indicates increased heterogeneity in the internal motions of the amide bonds at pH 5.5.

Figure 3.10: Lipari-Szabo (LS) mapping analysis of λN at pH 7 (relaxation parameters measured at 600 MHz, in 50 mM phosphate buffer and at 25°C): The backbone dynamics of the λN protein at pH 7 are influenced by both molecular tumbling and internal motions.

Figure 3.11: Lipari-Szabo (LS) mapping analysis of λN at pH 5.5 (relaxation parameters measured at 600 MHz, in 50 mM succinate buffer and at 25°C): The backbone dynamics of the λN protein at pH 5.5 are influenced by both molecular tumbling and internal motions. More heterogeneous amide bond motions are observed at pH 5.5 than at pH 7.

CHAPTER 4

DISCUSSION

To characterize the λN protein, an IDP, structurally and dynamically, NMR secondary chemical shift analyses and 15N relaxation methods were used. In agreement with previous structural studies, this study supports the disordered nature of λN. Secondary chemical shift analyses of λN indicated the presence of transient secondary structure in two segments in which such structure has not previously been detected. NMR 15N relaxation studies indicated the presence of both fast internal motions, i.e., on the ps-ns timescale, and slow internal motions, i.e., on the μs-ms timescale, of the amide bonds of λN. Based on the results of this study, the disordered nature, the secondary structures and the dynamic properties of λN are briefly discussed in the following subsections. In the last subsection, we conclude this chapter with ideas for future studies of this system.

4.1 Evidence of disorder

The disordered nature of λN was suggested by the narrow dispersion in the 1H dimension of the 1H-15N HSQC spectrum (Fig. 3.1), consistent with previous NMR studies [59, 83].
Although λN is extensively disordered, resonances were assigned for more than 90% of its residues in this study, owing to the large chemical shift dispersion of the amide 15N nuclei and the use of (1H-13C-15N) triple resonance experiments. Alternative sets of chemical shifts for the nuclei neighboring proline residues (Table 3.1) indicated that proline isomerization can contribute to the conformational heterogeneity of λN. Chemical shift measurements of the Cβ and Cγ nuclei of residues P23 and P105 at pH 7 (Table 3.2) confirmed the presence of significant populations of both the cis and trans conformations, a phenomenon often observed in flexible IDPs [103]. The presence of transient secondary structures (Fig. 3.3) over a significant number of residues indicated partial order, whereas the other residues appear to be fully disordered [120]. Steady-state heteronuclear 1H-15N NOE (NHNOE) values were also suggestive of the disordered nature of λN. For a folded globular protein, the NHNOE values of internally rigid amide bonds are close to ~0.8 [121, 122], whereas for λN the NHNOE values (Fig. 3.6c) were between +0.3 and -0.2, indicating high flexibility of the amide bonds. The LS maps of λN (Figs. 3.10 and 3.11) illustrated the difference in amide bond dynamics from BPTI (Fig. 3.9b), a folded globular protein, indicating the presence of at least two timescales. Similar LS maps have previously been observed for the unfolded λ repressor protein [118]. In sum, the NMR studies, which were originally aimed at identifying local structural properties of full-length λN, again confirmed its globally disordered nature, as was previously suggested by CD and SAXS measurements [58, 59].

4.2 Secondary structures

Previously, there was no reported evidence of secondary structure formation in unbound λN. Smaller secondary chemical shifts (Fig.
3.2), when compared to those of fully formed or stable secondary structures, and fractional SSP scores [107] both suggested the formation of transient secondary structures in λN. Among the transient secondary structures formed, the segments spanning residues 2-7, 23-26 and 34-47 overlap with known functional domains of λN [80, 83], whereas the transient secondary structures spanning residues 55-75 and 95-107 of λN have not been previously mentioned in the literature, so far as we are aware.

Previous NMR studies of λN fragments showed that the N-terminal residues, 1-22, form an α-helical structure when bound to nascent boxB mRNA [57, 80]. Circular dichroism (CD) studies suggested that the N-terminal residues of full-length λN or the λN1-36 peptide are disordered in the absence of boxB mRNA [57]. In contrast, the 13C secondary chemical shifts in our study indicated that the N-terminal residues 2-7 of unbound, full-length λN form transient α-helical secondary structures on their own (Fig. 3.2), which was further supported by the SSP scores (Fig. 3.3). Previously, structure calculations based on NOE constraints and molecular dynamics simulations identified the presence of a turn or a short helix at residues 23-26 of the λN1-36 peptide bound to boxB mRNA [57]. The secondary chemical shift analyses in our study indicate the presence of a transient helical structure for these residues even in the absence of mRNA. Similarly, previous crystallographic [87] and NMR studies [88] showed that residues 34-47 of λN bind to the NusA protein in an extended conformation; SSP scores in this study showed that the same residues have propensities to form transient extended β-strand structures, even in the absence of NusA (Fig. 3.3). These observations suggest that residues 2-7, 23-26 and 34-47 of λN form transient secondary structures in isolation and that these structures are stabilized upon binding their partners.
A disordered region of a protein often folds into a defined secondary structure upon binding its partner [7]. However, the mechanism behind this coupled folding, or disorder-to-order transition, is poorly understood [1]. To describe the coupled-folding phenomenon, two limiting models, binding-induced folding and conformational selection, have been proposed [1]. In the binding-induced folding model, the disordered region of the IDP adopts the necessary fold upon interacting with its partner. In the conformational selection model, binding-competent structures are selected from the dynamically interconverting ensemble of IDP conformations to form the complex. Upon complex formation, the selection process shifts the equilibrium in the IDP ensemble towards the functional conformation. The secondary chemical shifts of λN in the present study and the results of the previous studies of λN peptides bound to boxB mRNA [57, 59, 80, 88] and NusA [87, 88] suggest that the conformational selection model is appropriate for the λN-boxB mRNA and λN-NusA interactions.

Residues 27-52 of λN show SSP scores that are symmetric about residue L40, which indicates a similar structural environment for these residues in unbound λN. Based on the crystallographic study [87], it had previously been suggested that N34-L40 and N41-R47 may bind to two different domains of the NusA protein, AR1 and AR2. However, chemical shift changes in an NMR study [88] showed that only the AR1 domain of NusA is affected upon binding residues 34-47 of λN. Recently, it has been shown that the interaction of these residues with the AR1 domain of NusA is not important for λN-mediated antitermination in vitro and in vivo [89] and that residues 34-47 of λN probably interact with the N-terminal domain of the NusA protein, which also binds to RNAP [71]. Previously, in vitro binding assays and antitermination assays indicated that residues 39-47 of λN might also interact with RNAP [83].
How residues 34-47 of λN, which form transient extended structures in the unbound form, can interact with both NusA and RNAP in the transcription antitermination apparatus is not clear.

Our study further showed that two more regions of λN, residues 55-75 and 95-107, whose functional activity in the antitermination apparatus has yet to be determined, also form transient secondary structures. The secondary chemical shifts of the 13C nuclei first identified a possible α-helical structure over residues 55-80 of λN (Fig. 3.2). The incorporation of additional chemical shifts in the SSP parameter of Marsh et al. further showed that residues 55-75 of λN have a significant tendency to form transient α-helical secondary structures (Fig. 3.3). The secondary structure propensity (SSP) analysis also indicates that residues 95-107 may form a transient extended structure (Fig. 3.3). As the formation of transient secondary structures in an IDP does not necessarily imply a functional role [123, 124], it is possible that one or both of the newly identified segments with structure-forming propensities may not, in fact, be critical for the antitermination activity of λN. However, residues 89-107 of λN are known to interact with NusA to form an efficient in vitro antitermination system [83]. Residues 73-107 of λN may have an RNAP binding site, as was previously suggested by in vitro assays [83]. Further experimental evidence is required to understand the functional significance of residues 95-107 in the antitermination activity of λN.

The transient α-helix formed by residues 55-75 is amphipathic (Fig. 3.5). Amphipathic transient α-helices in IDPs are known to play important functional roles. The myelin basic protein, for instance, contains three such amphipathic transient α-helical regions, which have distinct binding partners [125]. In the case of the phage λ antitermination apparatus, in vitro transcription and binding assays in E.
coli indicated that residues 73-107 of λN might contain an RNAP binding site [83]. However, the limited number of C-terminal λN peptides used in that study precludes precise identification of the RNAP binding sites. Mutational analysis of residues in the segment 55-75 may provide a means of further testing the possible functional role of the transient helical structure described here.

Coiled-coil structures derived from intertwined amphipathic helices often play important structural and functional roles in biology [126]. λN has not been observed to dimerize or form specific higher-order oligomers in this study or in previous SAXS and SANS measurements [58]. The MultiCoil prediction algorithm [102] did predict a significant probability (~0.2) for residues 57-78 to form a three-stranded coiled coil [100]. However, consistent predictions were not obtained using other prediction programs [100, 101, 102, 110, 111, 112]. To test whether residues 57-78 form intertwined helical bundles, the crucial hydrophobic residues, namely those at positions a and d of the predicted heptad repeats, could be mutated to hydrophilic residues, and the secondary structure propensities compared with those of wild-type λN.

The coiled-coil prediction programs were also applied to the N proteins of other lambdoid phages, φ21, φ80, and P22, and to the Nun protein of phage HK022 [127]. The N proteins of φ21 and P22 each contain an ARM sequence that binds to specific N-utilization (nut) sites to prevent transcription termination [82], whereas the N protein of φ80 lacks an ARM sequence [127]. The Nun protein of phage HK022 binds to the boxB mRNA encoded by phage λ and promotes termination of transcription in E. coli [128]. Among all the above-mentioned lambdoid phages, significant coiled-coil prediction scores were obtained only for the N-terminal 30 residues of the N protein of phage φ80.
Although the sequence of the N protein of phage φ80 shows the least resemblance to the other lambdoid N proteins [127], it can be speculated that a transient amphipathic α-helix, similar to the one formed by residues 55-75 of λN, may also form in the N-terminal 30 residues of the φ80 N protein.

Secondary chemical shift analyses (Figs. 3.2 and 3.3) indicated that the transient helical structure formed over residues 55-75 of λN is destabilized when the pH is lowered from 7.0 to 5.5. Protonation of residue H64 at pH 5.5, within the transient α-helix formed by residues 55-75, may decrease the helix stability [108]. With the change of pH from 7.0 to 5.5, a significant change in the amide 1H chemical shift of residue S94 was also observed; however, the reason for this change is not clear. Compared to pH 7, the helical propensities of the transient α-helix-forming residues at the N-terminus also decrease at pH 5.5, suggesting the possibility of a long-range interaction between residue H64 and the N-terminal transient α-helix. Measurements of residual dipolar couplings (RDCs) or paramagnetic relaxation enhancements (PREs) could be used to test this speculation [129].

4.3 Dynamic properties

The difference between the transverse relaxation rates (R2) at 500 and 600 MHz (Fig. 3.7) indicates that residues 56-75 of λN undergo conformational exchange on the μs-ms timescale. Nearly the same segment displayed secondary chemical shifts indicative of a transient α-helix (Fig. 3.3). Typically, the time required for a coil-to-helix transition is also in the ~μs regime [130], suggesting that the motions detected by NMR relaxation may correspond to the formation and breakdown of a transient helix.
The larger secondary structure propensity (SSP) scores of residues 59-65 of λN, compared to those of the neighboring residues, indicate that the conformational exchange of residues 55-75 involves more than two states, a characteristic typically observed for helix-coil transitions [131]. A few other residues of λN, namely R8, N22, and K102, also show evidence of conformational exchange at pH 7 in the transverse relaxation (R2) experiments (Figs. 3.6b and 3.7); however, the significance of the slow conformational exchange of these residues is not clear.

Fast internal motions on the ps-ns timescale are also detected in λN from the relaxation parameters and the spectral density mapping (Figs. 3.6 and 3.8), indicating high flexibility of the amide bonds of the terminal regions and of residues 27-33. In the LS map of a globular protein, such as BPTI (Fig. 3.9b), most of the spectral density function (SDF) values of J0.87H and JN converge on the rigid tumbling curve, near the overall correlation time (τm), as the amide bond motions are mostly restricted, i.e., S2 ~ 0.8-1 [equation 3.1, or S2 ~ 1 in equation 3.5]. In the LS maps of λN (Figs. 3.10 and 3.11), the SDF values of J0.87H and JN lie in the interior of the rigid tumbling curve, which suggests that the amide bonds of λN are neither rigid (S2 = 1) nor rotating freely (S2 = 0). As the SDFs J0.87H and JN cannot be described by a single Lorentzian function [equation 3.1], at least two different timescales are required to explain the amide bond dynamics of λN. Similar LS maps, in which the SDF J0.87H and JN values lie in the interior of the rigid tumbling curve, were also obtained for the denatured state of the monomeric λ repressor protein at pH 6 [118], indicating a similar pattern of amide bond dynamics in disordered states. The SDFs J0.87H and JN of λN are more dispersed at pH 5.5 than at pH 7 (Figs. 3.10 and 3.11), suggesting that more heterogeneous internal motions affect the amide bond dynamics of λN at the lower pH.
The secondary chemical shift analyses (Figs. 3.2 and 3.3) also indicate that lowering the pH from 7.0 to 5.5 leads to a decreased propensity to form transient α-helical structures. The loss of ordered secondary structure in λN may result in more diverse internal motions of the amide bonds at pH 5.5.

To describe the backbone motions of globular proteins, multiple timescales are often incorporated into the formulation of SDFs [116, 132], such as the two timescales representing the correlation times of overall tumbling (τm) and internal motions (τe) in the Lipari-Szabo SDF (equation 3.5). However, the assumption of a single global correlation time (τm) in the SDFs may not be appropriate for a disordered protein. Distributions of correlation times have been incorporated into SDFs to explain the backbone dynamics of other disordered proteins [133, 134]; however, slow conformational exchange was either not a concern or was excluded from the R2 relaxation data in these cases.

4.4 Conclusion

One of the principal findings of this NMR study of intrinsically disordered, unbound λN is the discovery of two new transient secondary structures, one spanning residues 55-75, which form an amphipathic transient α-helix, and the other spanning residues 95-107, which form a transient extended structure. Residues 55-75 also exchange between conformations on the μs-ms timescale, indicating a helix-coil transition. The functional significance of these two segments of λN in the antitermination apparatus is currently unknown, but should be studied further. In addition, the chemical shifts and the 15N relaxation rates from this study and the results of the previous SAXS study [58] may be combined with computational approaches [135, 136] to describe the structural ensemble of intrinsically disordered λN.
The other major finding of this study is that residues 2-7, which form a stable α-helical structure when bound to boxB mRNA, and residues 34-47, which form a stable extended structure upon binding the NusA protein, form transient α-helical and extended structures, respectively, even in the absence of their binding partners. This finding suggests that there are preexisting populations of binding-competent conformers whose equilibrium may shift towards the final, bound structures, supporting the conformational selection mechanism of IDP binding. On the other hand, much evidence already suggests that folding of IDPs occurs only upon binding their partners, supporting the binding-induced folding mechanism [137]. Recently, phi-value analysis, which can probe the presence of bound-like structures in the transition states, has been used to differentiate the contributions of these two competing mechanisms of IDP binding [138, 139]. In the future, λN may be subjected to phi-value analysis, or similar experiments, to investigate the roles of the two competing mechanisms in IDP binding, which may lead to deeper insight into the structures and functions of IDPs.
APPENDIX A RESONANCE ASSIGNMENTS OF THE 13C, 13C, 1HN AND 15N NUCLEI OF N AT pH 7 Table A.1: List of residues and their chemical shifts Residue Numbers 13Ca 13Cb 1HN 15N 1 2 54.089 41.594 3 53.574 18.949 8.543 124.488 4 56.946 28.95 8.456 118.416 5 63.522 69.203 8.118 116.123 6 57.051 30.412 8.23 122.953 7 56.835 30.549 8.272 121.948 8 56.246 30.765 8.405 121.939 9 10 56.984 30.468 11 56.851 30.64 8.323 122.152 12 53.017 18.909 8.274 124.636 13 57.026 30.146 8.3 120.271 14 56.95 32.718 8.228 121.594 15 56.37 29.168 8.253 120.733 16 53.112 18.887 8.237 124.666 17 56.489 29.043 8.228 119.007 18 58.021 29.393 8.042 121.894 19 56.443 33.327 7.859 122.759 20 52.441 19.042 7.911 124.084 21 52.287 19.206 8.031 122.451 22 51.073 38.992 8.125 118.571 23 63.67 32.16 24 55.376 41.86 8.082 119.87 25 54.884 42.149 7.867 121.723 26 62.428 32.848 7.825 120.331 27 45.286 8.407 112.345 60 Table A.1 continued 28 62.267 32.819 7.934 119.107 29 58.197 63.775 8.349 119.337 30 52.267 19.292 8.246 126.206 31 54.274 32.553 8.183 122.055 32 63.053 32.041 33 62.184 32.896 8.224 120.361 34 53.041 39.077 8.46 122.599 35 53.978 30.501 8.235 122.849 36 63.072 32.123 37 61.181 38.514 8.252 121.63 38 55.037 42.44 8.281 126.584 39 58.255 63.79 8.232 116.798 40 55.353 42.327 8.243 124.225 41 53.262 38.745 8.347 118.965 42 55.888 30.828 8.17 121.563 43 54.334 32.501 8.32 123.983 44 63.025 32.156 45 56.462 33.004 8.469 121.908 46 58.231 64.001 8.333 117.256 47 56.343 30.833 48 62.558 32.66 8.157 121.445 49 56.777 30.207 8.477 124.811 50 58.397 63.841 8.248 116.78 51 52.543 19.168 8.262 125.878 52 55.035 42.463 8.043 120.571 53 51.28 39.008 8.34 120.416 54 63.67 32.17 55 61.462 38.75 8.064 119.872 56 54.173 41.219 8.134 123.662 57 56.213 41.82 8.292 123.271 58 64.02 69.272 8.239 114.335 59 63.375 32.301 7.683 121.498 60 55.955 42.151 7.938 123.675 61 53.717 18.885 8.096 123.23 62 57.487 29.856 8.137 118.642 63 59.195 38.353 7.986 120.82 64 57.157 30.454 8.1 119.542 65 57.396 32.728 7.94 121.051 66 
56.772 28.957 8.175 120.859 67 62.071 38.341 7.98 121.101 68 57.511 30.093 8.273 123.344 69 59.227 63.658 8.254 116.228 61 Table A.1 continued 70 51.282 39.008 8.347 120.499 71 56.259 42.005 72 56.677 28.985 8.101 119.397 73 56.997 30.53 8.071 121.214 74 61.876 38.609 8.019 121.553 75 56.925 30.318 8.364 124.217 76 57.134 30.609 8.342 122.139 77 57.1 32.718 8.223 121.627 78 53.492 38.637 8.347 118.697 79 56.336 29.388 8.154 120.212 80 56.457 30.642 8.278 121.533 81 62.044 69.742 8.028 114.755 82 57.193 29.848 8.017 123.059 83 57.539 39.025 7.819 121.212 84 58.005 63.994 7.994 117.589 85 54.407 32.618 8.244 124.232 86 63.765 31.935 87 45.275 8.484 109.538 88 56.559 30.505 8.046 120.559 89 56.32 30.75 8.4 121.828 90 45.312 8.389 109.704 91 61.224 38.769 8.014 120.039 92 61.612 69.894 8.264 117.929 93 58.275 63.809 94 58.8 63.813 8.245 116.792 95 45.408 8.384 110.693 96 56.278 30.73 8.143 120.519 97 55.938 29.513 8.37 121.26 98 56.403 32.982 8.337 123.095 99 60.871 38.753 8.151 122.554 100 56.578 32.948 8.446 126.475 101 45.153 8.403 110.521 102 56.369 33.287 8.17 120.84 103 58.166 63.619 104 58.72 38.755 8.155 124.168 105 63.13 32.184 106 55.354 42.306 8.311 123.204 107 62.619 40.022 7.483 125.321 Duplicate chemical shifts 20-a 7.921 124.655 62 Table A.1 continued 21-a 52.133 8.019 123.568 22-a 7.894 117.338 23-a 62.911 34.462 24-a 55.559 42.403 8.338 121.956 36-a 62.835 37-a 61.665 38.214 8.283 120.23 38-a 54.934 42.616 8.348 127.301 57-a 56.269 41.907 58-a 64.135 69.112 8.355 114.323 59-a 63.333 32.217 7.606 121.334 103-a 58.245 63.995 105-a 62.75 34.505 106-a 55.946 42.144 8.494 123.151 107-a 62.62 40.078 7.542 126 APPENDIX B RESONANCE ASSIGNMENTS OF THE 13C, 13C, 1HN AND 15N NUCLEI OF N AT pH 5.5 Table B.1: List of residues and their chemical shifts Residue Numbers 13Ca 13Cb 1HN 15N 1 2 54.112 41.476 3 53.427 19.142 8.555 124.73 4 56.704 29.107 8.491 118.697 5 63.176 69.293 8.133 115.938 6 56.822 30.617 8.256 123.125 7 56.742 30.269 8.418 121.937 8 30.505 
8.338 122.315 9 56.669 30.413 10 56.864 30.506 8.311 122.248 11 56.67 30.672 8.294 122.039 12 52.942 19.07 8.303 124.791 13 56.871 30.311 8.317 120.209 14 56.768 32.748 8.251 121.723 15 56.243 29.294 8.297 120.883 16 53.045 18.966 8.263 124.694 17 56.403 29.156 8.26 119.047 18 57.887 29.341 8.051 121.779 19 56.363 33.276 7.885 122.778 20 52.551 19.136 7.953 124.242 21 52.401 19.311 8.07 122.545 22 51.035 38.986 8.167 118.513 23 63.699 32.102 24 55.362 41.84 8.124 119.996 25 54.805 42.123 7.907 121.66 26 62.401 32.817 7.852 120.197 27 45.257 8.434 112.432 64 Table B.1 continued 28 62.256 32.772 7.953 119.039 29 58.165 63.825 8.365 119.227 30 52.338 19.401 8.259 126.188 31 54.163 32.506 8.199 121.978 32 63.026 31.993 33 62.155 32.879 8.227 120.256 34 52.996 39.038 8.467 122.45 35 53.864 30.283 8.221 122.678 36 63.137 32.024 37 61.218 38.494 8.262 121.578 38 55.059 42.37 8.294 126.398 39 58.111 63.62 8.238 116.653 40 55.386 42.258 8.23 124.05 41 53.271 38.706 8.362 118.895 42 55.87 30.857 8.149 121.22 43 54.277 32.452 8.322 123.951 44 63.117 32.157 45 56.484 32.971 8.462 121.91 46 58.165 63.891 8.305 117.025 47 56.304 30.837 8.426 123.317 48 62.497 32.632 8.159 121.406 49 56.653 30.221 8.494 124.678 50 58.347 63.792 8.267 116.768 51 52.615 19.228 8.293 125.891 52 55.034 42.411 8.072 120.586 53 51.233 39.008 8.368 120.369 54 63.717 32.123 55 61.416 38.73 8.116 120.004 56 54.056 41.126 8.187 123.566 57 55.972 41.904 8.294 123.24 58 63.623 69.306 8.266 114.212 59 63.061 32.28 7.746 121.377 60 55.667 42.114 7.987 123.914 61 53.419 18.994 8.125 123.697 62 56.954 30.025 8.17 118.923 63 58.52 38.569 8.058 120.963 64 55.842 29.06 8.228 119.92 65 56.993 33.033 8.139 121.921 66 56.463 29.207 8.344 121.637 67 61.593 38.539 8.111 121.706 68 56.934 30.159 8.386 124.108 69 58.787 63.706 8.295 116.57 65 Table B.1 continued 70 53.901 38.547 8.438 120.678 71 55.939 42.078 8.085 121.445 72 56.367 29.192 8.188 119.84 73 56.556 30.611 8.149 121.568 74 61.575 38.633 8.087 121.791 75 
56.747 30.259 8.427 124.638
76        56.717   30.613   8.347   122.181
77        56.95    32.804   8.304   121.689
78        53.37    38.676   8.385   118.844
79        56.056   29.388   8.204   120.341
80        56.359   30.651   8.305   121.665
81        61.921   69.784   8.04    114.745
82        57.202   29.786   8.044   123.092
83        57.559   39.016   7.855   121.195
84        57.996   63.94    8.022   117.521
85        54.396   32.534   8.257   124.17
86        63.838   31.924   -       -
87        45.269   -        8.486   109.539
88        56.421   30.463   8.085   120.537
89        56.295   30.745   8.383   121.622
90        45.325   -        8.391   109.762
91        61.209   38.767   8.026   120.063
92        61.657   69.907   8.275   117.778
93        58.411   63.731   8.369   118.145
94        58.582   63.754   8.38    117.722
95        45.449   -        8.385   110.752
96        56.205   30.736   8.133   120.515
97        55.916   29.576   8.389   121.313
98        56.401   32.985   8.347   123.073
99        60.877   38.736   8.162   122.468
100       56.532   32.962   8.445   126.3
101       45.185   -        8.4     110.511
102       56.07    33.371   8.169   120.887
103       58.15    63.631   8.377   117.609
104       58.767   38.683   8.167   124.172
105       63.223   32.153   -       -
106       55.316   42.27    8.344   123.3
107       62.542   39.863   7.538   125.325

Duplicate chemical shifts
103-a     58.237   64.026   -       -
104-a     58.256   40.051   7.834   120.857
105-a     62.609   34.505   -       -
106-a     55.912   42.029   8.542   123.293
107-a     62.523   39.838   7.599   125.897
Reference URL	https://collections.lib.utah.edu/ark:/87278/s6w69v0d