| Title | Molecular dynamics simulations on G-quadruplexes |
| Publication Type | dissertation |
| School or College | College of Pharmacy |
| Department | Medicinal Chemistry |
| Author | Cang, Xiaohui |
| Date | 2010-12 |
| Description | Guanine-rich DNA/RNA sequences have the potential to form four-stranded structures called G-quadruplexes. Because of the rigidity and glycosidic bond orientation diversity in the stems, the flexibility of loops, and the featured alignment of several cations within one quadruplex channel, G-quadruplexes can serve as a sensitive benchmark to assess and validate current force fields for nucleic acids in molecular dynamics (MD) simulations. |
| Type | Text |
| Publisher | University of Utah |
| Subject | AMBER; DNA; Glycosidic conformation; G-quadruplex; Molecular dynamics; Structural polymorphism |
| Subject LCSH | Quadruplex nucleic acids; Molecular dynamics |
| Dissertation Institution | University of Utah |
| Dissertation Name | PhD |
| Language | eng |
| Rights Management | © Xiaohui Cang |
| Format | application/pdf |
| Format Medium | application/pdf |
| Source | Original in Marriott Library Special collections, QP6.5 2010 .C255 |
| ARK | ark:/87278/s6697j7w |
| DOI | https://doi.org/doi:10.26053/0H-2HAP-EBG0 |
| Setname | ir_etd |
| ID | 194086 |
| OCR Text | Show MOLECULAR DYNAMICS SIMULATIONS ON G-QUADRUPLEXES by Xiaohui Cang A dissertation submitted to the faculty of The University of Utah in partial fulfillment of the requirements for the degree of Doctor of Philosophy Department of Medicinal Chemistry The University of Utah December 2010 Copyright © Xiaohui Cang 2010 All Rights Reserved The University of Utah Graduate School STATEMENT OF DISSERTATION APPROVAL The dissertation of Xiaohui Cang has been approved by the following supervisory committee members: Thomas E. Cheatham, III , Chair 8/31/2010 Date Approved Darrell R. Davis , Member 8/31/2010 Date Approved Eric W. Schmidt , Member 9/14/2010 Date Approved Julio C. Facelli , Member 8/31/2010 Date Approved Valeria Molinero , Member 8/31/2010 Date Approved and by Darrell R. Davis , Chair of the Department of Medicinal Chemistry and by Charles A. Wight, Dean of The Graduate School. ABSTRACT Guanine-rich DNA/RNA sequences have the potential to form four-stranded structures called G-quadruplexes. Because of the rigidity and glycosidic bond orientation diversity in the stems, the flexibility of loops, and the featured alignment of several cations within one quadruplex channel, G-quadruplexes can serve as a sensitive benchmark to assess and validate current force fields for nucleic acids in molecular dynamics (MD) simulations. Our investigations have shown that the current nucleic acid force field (ff99bsc0) failed to represent the correct diagonal dT4 loop geometry; failed to maintain the noncanonical backbone geometries of the 5´-anti-syn and 5´-syn-syn steps; and also failed to represent the correct cation-cation distances within the quadruplex channel. However, our results also suggested that these problems could be significantly alleviated by simply refining the backbone torsion parameters or cation van der Waals parameters of the current force fields, which indicates that the simple pair-wise additive force fields still have space to be improved and the current efforts in refining the torsion potentials of the nucleic acid force fields are in the right direction. G-quadruplexes are highly polymorphic in solutions which make them challenging to be fully explored by the conventional experimental techniques. Even though the current nucleic acid force fields are not perfect, MD simulations have shown advantages in investigating G-quadruplex structure, stability and polymorphism. Our MD simulations and free energy analyses on two-quartet stem models have shown the different stabilities iv for the four glycosidic steps, and successfully explained the glycosidic patterns for anti-parallel G-quadruplexes. To investigate the effect of loops on G-quadruplex folding topology and stability, MD simulations were carried out on a series of single-loop models. Energetic estimates on seven loop types (variants of lateral, diagonal and double-chain-reversal loops) for each of the five sequences were consistent with previous experiments in most cases. However, our simulations also revealed that generally short loops are less stable than longer loops, which is conflicting with previous experimental conclusions. This result led to our proposal that anti-parallel/parallel equilibrium and their balance with parallel-multimer is critical for understanding G-quadruplex folding and stability. When the above work on stems and the work on loops are combined together, more folding rules on G-quadruplexes have been proposed, and reasonable explanations have been given to a series of G-quadruplex-related problems. TABLE OF CONTENTS ABSTRACT....................................................................................................................... iii ACKNOWLEDGMENTS ............................................................................................... viii Chapter 1. INTRODUCTION ................................................................................................. 1 A Brief History of G-quadruplexes ....................................................................... 1 Building-blocks of DNA....................................................................................... 1 G-quartet and G-quadruplex Structures................................................................. 4 Telomere, Telomerase and Telomeric G-quadruplexes......................................... 5 Nontelomeric G-quadruplexes............................................................................... 7 Molecular Dynamics Simulations.......................................................................... 7 MM-PBSA Calculations........................................................................................ 9 Outline of the Following Chapters ...................................................................... 10 2. MD SIMULATIONS ON THE DIAGONAL G-QUADRUPLEXES FORMED FROM [d(G4T4G4)]2: LIMITATIONS OF THE CURRENT NUCLEIC ACIDS FORCE FIELD..................................................................... 17 Abstract................................................................................................................ 17 Introduction.......................................................................................................... 18 Methods ............................................................................................................... 20 Initial structures for MD simulations...................................................... 20 MD simulation protocol.......................................................................... 21 Locally enhanced sampling simulations ................................................. 22 Free-energy calculations ......................................................................... 23 Experimental structures used for statistics.............................................. 23 Results.................................................................................................................. 24 Various different loop geometries observed in MD simulations ............ 24 The stable over-stacked loop geometry .................................................. 24 Stability of the over-stacked loop comes primarily from torsion potentials ................................................................................................. 25 Failure to keep the detailed backbone properties.................................... 26 Dihedral angle distributions in the four glycosidic steps........................ 27 Effect of the stacking in backbone dihedral angles of anti-syn steps ..... 29 Discussion............................................................................................................ 30 vi 3. EXPLAINING THE VARIED GLYCOSIDIC CONFORMATIONAL, G-TRACT LENGTH, AND SEQUENCE PREFERENCES FOR ANTI-PARALLEL G-QUADRUPLEXES.................................................................... 43 Abstract................................................................................................................ 43 Introduction.......................................................................................................... 44 Methods ............................................................................................................... 49 Building the two-quartet models............................................................. 49 Molecular dynamics simulations and free-energy calculations .............. 50 Results.................................................................................................................. 51 Free energy calculations.......................................................................... 51 Relative stabilities of different glycosidic steps and G-quadruplex folding rules ............................................................................................ 53 Entropy calculations................................................................................ 54 Molecular dynamics of the two-quartet models...................................... 54 MM-PBSA calculations for the three-quartet and four-quartet stem models ..................................................................................................... 56 Discussions .......................................................................................................... 58 Validate theoretical G-quadruplex folding rules with experimental structures ................................................................................................. 58 Dihedral angles in anti-syn and syn-syn steps......................................... 60 Comparison to earlier results .................................................................. 61 Noncanonical G-quadruplexes................................................................ 62 G-quadruplexes with modified residues ................................................. 63 G-quadruplex of G2 and G4-tracts: more definite folding topologies ..... 64 G-quadruplex of G3-tracts: the extreme polymorphism.......................... 64 Supporting Information ....................................................................................... 72 4. INSIGHT INTO TELOMERE STRUCTURAL POLYMORPHISM AND G-QUADRUPLEX FOLDING FROM SEQUENCE AND LOOP CONNECTIVITY THROUGH FREE ENERGY ANALYSIS .......................... 74 Abstract................................................................................................................ 74 Introduction.......................................................................................................... 75 Methods ............................................................................................................... 83 Starting structures ................................................................................... 83 MD simulation protocol and free-energy calculations............................ 85 Results.................................................................................................................. 86 MD simulations - summarizing the observed structural features........... 86 Free energy estimation from MM-PBSA calculations............................ 88 MD simulations of the left-handed double-chain-reversal loop ............. 92 The effect of loop length on G-quadruplex stability............................... 92 A stable cation binding site..................................................................... 93 Limitations of the current work .............................................................. 94 Discussions and Conclusions............................................................................... 95 Comparison of simulation and experimental observations..................... 96 Predominant geometries of human telomeric G-quadruplexes............... 98 vii Conservation of the telomere sequence d(TTAGGG) .......................... 100 Parallel or anti-parallel? Correlation of parallel structures with multimers .............................................................................................. 101 Further hypotheses on G-quadruplex folding - combining stems with loops.............................................................................................. 103 Concluding Remarks ......................................................................................... 108 Supporting Information ..................................................................................... 118 5. ION BEHAVIOR OF G-QUADRUPLEXES IN MD SIMULATIONS........... 122 Abstract.............................................................................................................. 122 Introduction........................................................................................................ 122 Methods ............................................................................................................. 125 QM optimization on G-quartet bases.................................................... 125 Experimental structures ........................................................................ 125 MD simulations and free energy calculation ........................................ 126 Results................................................................................................................ 127 Cation binding sites in G-quadruplexes of different layers .................. 127 Correlation of the bifurcated hydrogen bonding with cations .............. 128 Comparison of QM, MD and crystal structures.................................... 129 Different parallel/anti-parallel equilibriums with Na+ and K+.............. 131 Discussions ........................................................................................................ 132 6. CONCLUSIONS AND FUTURE DIRECTIONS............................................. 138 REFERENCES ............................................................................................................... 143 ACKNOWLEDGMENTS First of all, I want to thank my advisor, Thomas E. Cheatham, III. I thank him for accepting me as his student, and thank him for keeping me in the group when I was not that into the research in the first two years. If it were not for him, I would hardly have had any chance to graduate, and more importantly, to find work that I could really enjoy. Thanks for his guidance in my research, and thanks for his enormous support when I needed to be there for my family. One is so lucky to meet an advisor like him who enjoys his work, his life and is always so nice to others. I also want to thank our collaborator Jiří Šponer. Tom and Jiří have spent a lot of time and effort in editing and revising the manuscripts, which is highly appreciated. Thanks to all my committee members for their advice on my research and the time for reviewing my thesis. Their enthusiasm and dedication to science are so impressive, and this will affect my academic career thereafter. I also want to thank all my lab mates, In-suk Joung, Niel Henriksen, Jianyin Shao, Kiumars Shahrokh and Scott Pendley, for their help and valuable discussion. Support from the U.S. National Institutes of Health (TEC3) [grant number R01-GM59306890], the Center for High Performance Computing at the University of Utah and the U.S. National Science Foundation supported TeraGrid (TEC3) [grant number MCA01S027] is acknowledged. Last but not least, I thank my husband for his love and patience of waiting, and thank all my family for their support on whatever decisions I've made. CHAPTER 1 INTRODUCTION A Brief History of G-quadruplexes Formation of polycrystalline gels from guanosine was observed as early as the 19th century, and in the 1960s Gellert et al.1 proposed the tetrameric arrangement of the associated guanine bases. Four guanine bases form a co-planar array called G-quartet through eight hydrogen bonds (Figure 1.1). In the 1970s, four-stranded structures with stacked quartets termed G-quadruplexes (or G-DNA) were demonstrated.2, 3 Identification of the G-rich repetitive sequences at the end of chromosomes ignited great interest in G-quadruplex structures in early 1990s.4 In the 2000s, potential quadruplex-forming sequences were identified in other genomic area such as gene promoter regions,5-8 and this lead to the rapid growth of physiological relevant research on G-quadruplexes. 9 Building-blocks of DNA Nucleic acids are polymers of nucleotides, and nucleotides are composed of three building blocks: the bases, the pentose sugars and the phosphate groups. A base attached to a pentose sugar ring is termed a nucleoside. In DNA the sugar is deoxyribose, and there are four distinct types of bases: guanine, cytosine, adenine, and thymine (Figure 2 1.2). In double-helix DNA, base pairing is formed between purines (Guanine, Adenine) and pyrimidines (Thymine, Cytosine) through Watson-Crick edges (Figure 1.3). The G:C base pair has three hydrogen bonds and shows more stability than the A:T base pair that has two hydrogen bonds (Figure 1.4). Beside the Watson-Crick edge, the Hoogsteen edge of the base can also be used in hydrogen bond formation (Figure 1.3). A nucleotide conformation can be defined by eight backbone torsions as shown in Figure 1.5: α (P-O5´), β (O5´-C5´), γ (C5´-C4´), δ (C4´-C3´), ε (C3-O3´), ζ (O3´-P), the phase angle of pseudorotation P and the glycosidic bond χ. The five internal dihedral angles (τ0, τ1, τ2, τ3, τ4) in the deoxyribose ring are also shown in Figure 1.5. The five-membered pentose sugar ring shows inherent non-planarity, which is termed sugar puckering. The puckers can be defined by the phase angle of pseudorotation P (Eq. 1.1) and the maximum degree of pucker τm (Eq. 1.2).10 2 (sin 36 sin 72 ) tan ( ) ( ) 2 4 1 3 0 P [1.1] m P cos 2 [1.2] Various distinct deoxyribose ring pucker geometries have been observed by experimental techniques. When the major out-of-plane deviation atom is on the same side of the base and C4´-C5´ bond, it is termed endo. If on the opposite side, it is called exo. C2΄ endo and C3´ endo (Figure 1.6) are the two main geometries for isolated nucleosides and nucleotides. B-form DNA double-helices mainly occupy C2΄ endo sugar puckers and A-form DNA double-helices mainly occupy C3´ endo sugar puckers. Changes in sugar puckering will alter the orientations of C1´, C3´ and C4´ and result in changes in 3 backbone conformation and overall structure. Steric hindrance alone makes the angles α, β, γ and ζ each restricted to three energy minima: -60° (gauche-, g-), 60° (gauche+, g+), 180° (trans, t), and the angle ε with a broad range around -120°.11 In double helix DNA, these backbone angles are highly correlated and only show limited sets of low-energy conformations. The phosphate conformation ε and ζ are correlated and usually show two conformations: t/g- (80%) and g-/t. The dihedral angles α and ζ show a conformation of g-/g- in right-handed helices. All three ranges have been observed for the dihedral angle γ, and g+ is the predominant conformation in right-handed double helix. The dihedral angle β is always around trans. The dihedral angle δ is quite related to sugar pucker: ~75° for C3´ endo and ~150° for C2΄ endo.11 The deoxyribose sugar ring and the base are linked through the glycosidic bond. The glycosidic bond angle χ is defined as O4´-C1´-N9-C4 for purines and O4´-C1´-N1-C2 for pyrimidines. Two common glycosidic conformations are observed in folded DNA: anti (-120° < χ < 180°) and syn (0° < χ < 90°), as shown in Figure 1.7. In the anti conformation, the hydrogen attached to C8 in purines and C6 in pyrimidines lies over the sugar ring; while in syn conformation, the orientations are reversed (Figure 1.7). For pyrimidines, the syn conformation is less preferred than the anti conformation due to the unfavorable contact between the phosphate group and the O2 atom of the base. For adenosine, the anti conformation is slightly more stable than the syn conformation, whereas for guanosine-containing nucleotides, the syn conformation is slightly more favored because of the electrostatic interactions between the phosphate group and the N2 amino group.11 Substituting the H8 atom of the guanosine with bulky group such as bromine or the methyl group will predominantly move the equilibrium toward syn 4 conformation. The glycosidic bond conformations are affected by the puckering of the sugar ring: syn conformations are not found with C3´ endo because of the unfavorable contact between the base and the H3´ atom (Figure 1.6). G-quartet and G-quadruplex Structures In a G-quartet, hydrogen bonds are formed between the Watson-Crick edge and the Hoogsteen edge of two adjacent guanosines, and all eight hydrogen bonds within a quartet follow the same electron donor→acceptor directions around the quartet (Figure 1.8). In this thesis, the direction from electron pair donor to electron pair acceptor is defined as the direction of the hydrogen bond. Within a quartet, there are four geometries between two adjacent guanines: anti→syn, syn→anti, anti→anti and syn→syn along the hydrogen bonding direction, and these four different geometries show different strand polarities and groove widths (Figure 1.9). The anti→syn residues along the hydrogen bonding electron donor→acceptor direction form a narrow groove; the syn→anti residues form a wide groove; the anti→anti and syn→syn residues form middle-width groove (Figure 1.9). Within a quartet, if two residues show the same glycosidic conformation, they show the same strand 5´→3´ orientations; and if two residues show opposite glycosidic conformations, they show opposite strand orientations (Figure 1.9).12 In G-quadruplexes, if two residues within the same G-strand show the same glycosidic bond angle conformation, the two quartets in which they are located show the same hydrogen bonding directions; while if two residues show opposite glycosidic bond angle conformations, the two quartets in which they are located show opposite hydrogen bonding directions (Figure 1.10). If two stem-forming strands show the same directions, they show exactly the same glycosidic bond angle conformation along the strands; 5 whereas if two stem-forming strands show opposite directions, they show reversed and opposite glycosidic bond angle conformations from 5´→3´ direction along the strands. For example, if one strand shows the conformation of 5´-syn-anti-anti, the other strand oriented in the opposite direction will adopt a 5´-syn-syn-anti conformation. We have shown above the relationships between the groove widths and the geometries of two adjacent residues within a quartet. There is a simpler way to tell the widths of a groove based on the relative orientations of two adjacent strands: if the two strands show the same direction, a middle-width groove forms; if two strands are aligned in a anti-clockwise (or left-handed) way, a narrow-width groove forms; if two strands are aligned in a clockwise (or left-handed) way, a wide-width groove forms (Figure 1.11). More details on the G-quadruplex structures can be found in the Introduction parts of the following chapters and in previous reviews.12-19 Telomere, Telomerase and Telomeric G-quadruplexes Within cells, DNA is organized into higher order structures known as chromosomes. The bacteria and archea typically possess a single circular chromosome and thus do not suffer from the end replication problem.20 The eukaryotes have multiple linear chromosomes, and they solve the end replication problem through creating specialized regions of simple repetitive DNA at the ends of the linear chromosomes. Such repetitive DNA sequences are called telomeres,21 which are synthesized by telomerase.22, 23 Telomerase is a reverse transcriptase enzyme that carries its own RNA template, and this enzyme is active only in germ cells, stem cells and some white blood cells.24 In somatic cells, telomerase is inactive, and telomeres are steadily shortened with each replication cycle.25 When the telomere is too short, the cell riches the Hayflick limit,26, 27 and it stops 6 dividing and commences apoptosis.20 It's suggested that telomere shortening is a mechanism to prevent accumulation of replication mistakes. In most cancer cells, however, telomerase is highly expressed,24 which leads to the elongation of telomeres, therefore the cells never reach the Hayflick limit no matter how many divisions the cells undergo, and the cells become immortal. Potential anti-cancer drugs targeting to turning off telomerase in cancer cells have received intensive investigation,28-31 and some drugs are already in early clinical trials. Telomeres are simple repetitive DNA sequences, and most telomere repeats show a (Tn)AGn pattern.32 Formation of quadruplex structure from telomeric sequences in vitro has been investigated for a long time, and in recent years more and more evidence has accumulated to prove the existence of telomeric G-quadruplexes in vivo.33-35 The human telomeric repeat sequence is TTAGGG,36 and this is conserved among all the vertebrates.37 Human telomeres are usually several kilobases long with a single-stranded overhang of 100-200 nucleotides at the 3´-end of the chromosomes.38 A variety of structures39-47 have been reported to form from the human telomeric sequences by both NMR method and crystallography. Another subject under intensive research is the Oxytricha nova telomeric G-quadruplexes formed from 1.5 telomere repeats d(GGGGTTTTGGGG) 48-53 or 3.5 telomere repeats d(G4T4G4T4G4T4G4) 54, 55. The Tetrahymena telomeric repeat d(T2G4)4 forms a three-quartet quadruplex structure similar to the folding topology of human telomeric G-quadruplex in K+ solution.56 Other than directly turning off telomerase, stabilizing telomeric G-quadruplex structure is also a good way to prevent telomerase from elongating telomeres. Therefore, G-quadruplex stabilizing agents such as TMPyP4 are also potential anti-cancer drugs.57, 58 7 Nontelomeric G-quadruplexes Computational searches on complete genome sequences have elucidated the potential global gene regulation functions of G-quadruplexes. For example, computational results have revealed that potential quadruplex-forming sequences (or G4 motifs) are also enriched in promoter regions of genomes of various different organisms,5-8, 59 and these G4 motifs correlate with gene function: they are enriched in promoters of oncogenes, whereas they are found at very low frequencies in tumor suppressor genes.60 It has also been reported that G4 motifs are enriched in the nucleosome excluded regions.61 Genome-wide experimental techniques such as single-strand antibody experiments also suggest the role of G-quadruplexes as part of a mechanism for gene regulation.62-64 Very recent experiments indicate the regulatory role of RNA G-quadruplexes at translational level.65-69 More details on the gene regulatory roles of G-quadruplexes can be found in several reviews.9, 70-73 Molecular Dynamics Simulations Molecular dynamic (MD) simulations have become important tools to understand structure and function of biomolecules.74-77 In MD simulations, atoms are allowed to interact and move for a period of time (termed ‘time step') following approximations of known physics, so that a view of motions of the molecules can be obtained at the atomic level as a function of time. The functional form and parameter sets that are used to describe potential energy of a system of atoms are usually referred to as a "force field". Currently, simple pair-wise additive force fields are commonly used in the MD simulations targeting macromolecules such proteins and DNA, and one widely used potential energy functions is: 8 nonbonds( , ) 0 12 6 angles dihedrals 2 0 bonds 2 0 4 ( / ) ( / ) ( ) ( ) ( ) ( / 2)(1 cos[ ]) i j ij i j ij ij ij ij b n r q q A r B r U r K b b K V n . [1.3] Here U is potential energy of the model system, and r refers to the atomic positions of all atoms in the system. b K , K and n V are constants for bonds, angles and dihedrals respectively. 0 b is equilibrium bond length and 0 is equilibrium angle; n and are periodicity and phase shift for a dihedral angle. b , , and are bond distance, angle, and dihedral angle respectively of the given geometry. ij A and ij B are Lennard-Jones coefficients for nonbonded atoms i and j. i q and j q are the atom charges; 0 is the permittivity in free space; ij r is the distance between two nonbonded particle i and j. The AMBER (Assisted Model Building and Energy Refinement)78, 79 MD simulation package has been widely used in the simulation of biomolecules such as proteins and nucleic acids. This package is composed of several subprograms, and the most commonly used programs in this thesis are leap, sander, ptraj and MM-PBSA. Leap is the program to prepare the prmtop file used for MD simulations, and this prmtop file contains all the force field parameters of the molecular system. Sander is the main molecular dynamics engine, and it generates 3D coordinates of the system along the progression of simulation time that are termed as trajectories. The generated trajectories are analyzed by the program ptraj to obtain the useful information we need. The force fields in AMBER program use the potential energy function as described in Eq. 1.3. Compared to other force fields for nucleic acids, the AMBER force fields perform best in representing the structural and dynamic properties of nucleic acids. In 9 this thesis, the parmbsc0-modified ff99 force field (or ff99bsc0 force field)80 was applied for all the simulations, and sometimes the ff99 force field81 was also used for comparison purposes. In simulations with the original ff99 force field, / backbone torsions move from the canonical g-/g+ conformation to undesirable g+/t forms after tens of nanoseconds during MD simulations.82 The ff99bsc0 force field was introduced based on the ff99 force field with the improved description of / conformers. MM-PBSA Calculations The MM-PBSA method83-85 was intensively used in this research for free energy calculations. In this method, solvent molecules are first removed from the trajectories generated by MD simulations and replaced with an implicit continuum solvent model. In MM-PBSA calculations, free energy usually includes three parts: molecular mechanic energy of the solute (EMM), solvation energy (GPBSA) and solute entropy contribution (-TS). EMM is calculated with the Sander program. Solvation energy is the summation of two parts: an electrostatic part calculated by Delphi program86, 87 with Poisson-Boltzmann (PB) equation, and a non-electrostatic part calculated from solvent-accessible surface area (SASA) with mol-surf program. The solute entropy is calculated by normal mode or quasi-harmonic analysis.84, 85 The MM-PBSA method is an approximate free energy calculation method because of the continuum solvent assumption and simplified entropy term. However, this method is very useful when two conformations of a solute are compared. There are various other methods to calculate free energies, which have been thoroughly summarized in previous reviews.88, 89 10 Outline of the Following Chapters The studies in this thesis center on two main topics: one is about what G-quadruplexes can tell us about the performance and accuracy of the current nucleic acid force field, and the other side is about what the current force field, even though not perfect, can tell us about the G-quadruplexes. Chapter 2 of this thesis focuses on the force field deficiencies exposed through MD simulations on the diagonal dimeric G-quadruplexes formed from Oxytricha nova telomere sequence [d(G4T4G4)]2. The ff99bsc0 force field failed to represent the correct diagonal dT4 loop geometry, and also failed to represent the backbone geometries of anti-syn and syn-syn steps. These results consistently suggest deficiencies in the torsion potential of the current force field. Even though the current force fields are not perfect, they can still provide valuable information that may be impossible for conventional experimental techniques, especially for the highly polymorphic G-quadruplexes. The successful application results of MD simulations in G-quadruplexes are shown in Chapters 3 and 4. In Chapter 3, MD simulations on the simple two-quartet stem models revealed different stabilities of the four glycosidic steps, and therefore G-tract lengths predominantly determine the glycosidic patterns of the anti-parallel G-quadruplexes. This work is a step forward toward understanding the sequence-structure relationship of G-quadruplexes and has been submitted for publication in Nucleic Acids Research. In Chapter 4, the effect of loops on G-quadruplex folding was investigated through MD simulations on various different single-loop models. This simulation work presents results that are consistent with previous experiments in most cases. Moreover, it also provides us new insight into the G-quadruplex behavior and suggests the importance of the parallel/anti-parallel equilibrium and their balance 11 between parallel multimer in understanding G-quadruplex properties. The two studies in Chapters 3 and 4 together shed more light on G-quadruplex structure, stability and polymorphism, and detailed conclusions and hypotheses are given in the Discussion part of Chapter 4. The work in Chapter 4 has been submitted for publication in Journal of the American Chemical Society. Ions are critical for the formation and stability of G-quadruplexes, and both "good" and "bad" ion-related results during the MD simulations are shown in Chapter 5. In this chapter, we reveal the failures of MD simulations to represent the ion-related detailed properties of G-quadruplexes such as the ion-ion distances within the channel. We also show the successful results of the MD simulations to correctly represent the trends of different parallel/anti-parallel equilibrium in K+ and Na+. In the final chapter, conclusions and future directions are summarized. 12 Figure 1.1. The planar structure of a G-quartet Figure 1.2. The four bases of DNA Watson-Crick Edge Sugar Edge HO O O P- O N N N N O O O N H H H H H H H Hoogsteen Edge Figure 1.3. Depiction of interaction edges of deoxyribonucleotide. This figure is adapted from Leontis's previous paper.90 Adenine H O N N N R H H H H3C O O N N H R H H H H H N R O N N N N H H H H R N N N N N Guanine Thymine Cytosine 13 (a) (b) H N N N N O R H N H H R N N N O H Figure 1.4. (a) GC and (b) AT base pairs through Watson-Crick hydrogen bonding. Figure 1.5. Atomic numbering and dihedral angles for a polydeoxyribonucleotide chain. The five internal dihedral angles (τ0~τ4) in a deoxyribose ring are also shown. This figure is adapted from Saenger's classic and out of print book "Principles of Nucleic Acid Structure." ε δ β ζ τ4 α γ χ N C4 (purines) C2 (pyrimidines) τ0 τ1 τ2 τ3 nucleotide unit chain direction R N N O O H CH3 H R N N N N N H 14 Figure 1.6. Schematic illustration of C2΄ endo and C3´ endo sugar pucker conformations. This figure is adapted from Neidle's out of print book "DNA structure and recognition." Figure 1.7. Deoxyribonucleotide unit in anti and syn glycosidic bond angle conformations Figure 1.8. Eight hydrogen bonds follow the same direction around the G-quartet. The arrows point from the electron pair donor to the electron pair accepter. O4´ Base O5´ O3´ C2΄ endo C3´ endo H3´ O4´ Base O3´ O5´ H3´ anti syn 15 Figure 1.9. Four possible geometries of two adjacent guanine residues within a quartet. The relationships of the glycosidic conformation, strand direction and groove width in G-quadruplexes are also shown. (a) anti→syn resides along the hydrogen bonding direction, forming a narrow groove; (b) syn→anti residue along the hydrogen bonding direction, forming a wide groove; (c) anti→anti residues along the hydrogen bonding direction, forming a middle groove; (d) syn→syn residues along the hydrogen bonding direction, forming a middle groove. ×: 5´→3´ strand direction pointing into the paper; •: 5´→3´ strand direction pointing toward viewers. syn anti anti anti syn syn anti syn narrow groove wide groove middle groove middle groove (a) (c) (d) (b) 16 Figure 1.10. Relationship of glycosidic angle conformation and hydrogen bonding direction. White box: anti glycosidic bond conformation; grey: syn glycosidic bond conformation. Dash line with arrow: hydrogen bond electron donor→acceptor direction within a quartet. (a) Same glycosidic angle conformation → same hydrogen bonding direction. (b) Opposite glycosidic angle conformation → opposite hydrogen bonding direction. Figure 1.11. Schematic illustration of the relationships of groove width and relative strand orientations viewed from the side of G-quadruplexes. (a) Two parallel strands form middle grooves; (b) Two strands arranged in the left-handed (or anti-clockwise) manner form narrow grooves; (c) Two strands arranged in the right-handed (or clockwise) manner form wide grooves. (a) (b) (c) middle narrow wide (a) (b) CHAPTER 2 MD SIMULATIONS ON DIAGONAL G-QUADRUPLEXES FORMED FROM [d(G4T4G4)]2: LIMITATIONS OF THE CURRENT NUCLEIC ACIDS FORCE FIELD Note that the vdW parameters characterized by a 10 percent increase or decrease in attractions between nucleobase groups were provided by Niel Henriksen from our lab. Abstract In the current work, to fully expose deficiencies of current force fields, MD simulations were run on the diagonal dimeric G-quadruplexes formed from [d(T4G4T4)]2. Problems both on diagonal dT4 loops and on stems have been revealed. We found that with longer simulation time (up to 100 ns), the original dT4 loop geometry was not maintained. The results suggest that our systems may circumvent energy barriers near the native structures to locate more stable yet incorrect over-stacked loop geometries. The results also suggested that the over-stacked loop geometry is caused by the imbalance of the torsion potentials of the current force fields. Even though G-stems were very rigid during simulations, we failed to reproduce some detailed and intrinsic characteristics. For example, backbone deformations were observed between the two middle quartets. Detailed investigation suggests that these abnormal behaviors may be related to the 18 different glycosidic steps in G-quadruplexes. There are four types of glycosidic steps in quadruplex structures: anti-anti, syn-anti, anti-syn and syn-syn. To investigate different structural properties of the four glycosidic steps and to better evaluate the MD simulations on G-quadruplexes, the probability distribution for each dihedral angle in each type of glycosidic step was calculated from all the available experimental structures in PDB databank. MD simulations on G-quadruplex stem models were carried out, and distribution of the dihedral angles were also calculated and compared to the experimental statistics. The comparisons showed that the current force fields perform pretty well for the anti-anti and syn-anti steps. For the anti-syn and syn-syn steps, in which noncanonical backbone geometries are more common, the force fields failed to keep the original noncanonical values. The modification to the α/γ torsion potential in the ff99bsc0 force field partially mitigates the imbalance and greatly improved the simulations for the anti-syn and syn-syn steps. We also ran simulations on G-quadruplexes with 10% increased or decreased attractive vdW potentials of the base groups, and the results showed that vdW interactions are not significantly involved in the backbone misrepresentation. Introduction G-quadruplexes are featured with rigid stems and flexible loops. Because of the prevalence of hydrogen bonds within each G-quartet and the very favorable stacking interactions between adjacent G-quartets, G-quadruplexes are quite stable structures when the electronegative channels are filled with appropriate monovolent cations such as K+ or Na+. Once formed, some G-quadruplexes are extremely stable. The quadruplex structure is even partially formed at 95-100°C in the presence of potassium.91, 92 The 19 hydrogen involved in the hydrogen bond formation within the G-quartet requires days to exchange even at 60°C.93 The formation of quadruplex structures is clearly an enthalpy driven process, with an enthalpy per quartet of -15 ~ -25 kcal/mol,94 which is more negative than the enthalpy per base pair in a double-helix.95 The exceptional rigidity of quadruplex stems was also confirmed in MD simulations, in which both parallel and anti-parallel structures yield very stable trajectories, and the RMSD values between the simulated structures and the X-ray structures are very small.96-98 Sequences connecting the rigid stem strands in dimeric or monomeric G-quadruplexes form a variety of loops, which make G-quadruplexes good models for studying loop behavior. These G-quadruplex loops can serve as prototype for RNA loops, which are extremely important in RNA-protein interactions. G-quadruplexes formed form Oxytricha nova telomere repeats d(T4G4)n have been intensively investigated, and many crystal and solution structures have been solved. All single repeat sequences were reported to form parallel structures (Figure 2.1(a)), with all guanine residues in anti glycosidic angle conformation. The 1.5 repeat sequences [d(G4T4G4)]2 predominantly form diagonal dimeric quadruplexes (Figure 2.1(b)),48-53 with only one case reported to form head-to-tail lateral dimeric quadruplexes (Figure 2.1(c)).99 The 3.5 repeat sequences d(G4T4G4T4G4T4G4) were reported to form a basket-type structure (Figure 2.1(d)),54, 55 even though a chair-type structure was also modeled base on NMR data (Figure 2.1(e)).100 In Oxytricha nova telomeric quadruplexes, all the anti-parallel structures share the same characteristics: from 5´ to 3´ direction, each strand shows a syn-anti-syn-anti glycosidic angle conformation (Figure 2.1(b)(c)(d)(e)). Among these structures, diagonal dimeric structures are most interesting to us, 20 because they have been intensively investigated, and both X-ray and NMR structures show quite definite and consistent loop geometries (Figure 2.2) with different cations such as K+, TI+ and Na+.51-53, 101 In previous simulation work, this definite loop geometry was maintained in the conventional 10 ns MD.102 However, during the LES (locally enhanced sampling) MD simulations, both the upper and lower loops underwent marked structural rearrangement, and at the end of simulation, the upper and lower loops did not converge to the same geometries.102 Moreover, the cations at the loop-stem junctions were not maintained in their original positions, indicating the limitations of the current force field. In the current work, to further expose the force field deficiencies, we ran both the conventional and the LES MD simulations on the same diagonal structure for much longer time scales (up to 100 ns for conventional MD and 40 ns for LES MD), and both loop geometries and stem backbone geometries were investigated. Methods Initial structures for MD simulations Initial coordinates for the diagonal anti-parallel Oxytricha nova telomeric quadruplex (Figure 2.1.(b)) were taken from the X-ray structure 1JPQ52 with the initial five K+ cations. In the crystal structure, the five cations stay in a line. Three cations are in the channel, between adjacent two quartet layers, and two cations are located at the loop-stem junction just above the channel entrance. Coordinates for parallel stem models (AAA) were taken from the first three quartets of the solution structure 139D.103 The anti-parallel stem model (SAA) was built from the stem of the NMR structures 2GKU42 (first model). For the models built from solutions 21 structures, two channel K+ ions104 were added manually in the middle of two adjacent quartet layers. MD simulation protocol Explicit solvent MD simulations and free energy analyses were carried out with AMBER 9 or AMBER 10.78, 79 Both the ff99 and ff99bsc0 force fields were applied.80, 105 Each solute was solvated in an octahedral box of TIP3P106 water with an extension of at least 10 Å from each side of the solute, and additional K+ were added to net-neutralize the system. Both the old potassium parameter (atomic radius 2.6580 Å and well depth 0.000328 kcal/mol )107 and the new TIP3P specific potassium parameters (atomic radius 1.705 Å and well depth 0.1936829 kcal/mol ) of Joung and Cheatham104 were used. Periodic boundary conditions were applied, and the particle mesh Ewald (PME) method108 was used to calculate the long range electrostatics with a charge grid spacing of ~1Å and a DSUM_TOL value of 10-5 Å. SHAKE109 was used to constrain covalent bonds involving hydrogen with a tolerance of 0.00001 Å, and the integration time step was set to 2 fs. Langevin dynamics110 was used for temperature control with a collision frequency of 1 ps-1. The nonbonded list was updated whenever any atom moves more than 0.5 Å since last list update, and a cutoff of 9 Å was applied to the Lennard- Jones and direct space electrostatic interactions. The solvated system was first minimized (500 steepest descent cycles followed by 500 conjugate gradient cycles) with a restraint of 100 kcal/(mol•Å2) on the solute (including the channel K+ ion). This was followed by 25 ps MD simulation with 100 kcal/(mol •Å2) positional restraints applied on the solute atoms, and the temperature was slowly increased from 0 K to 300 K with a collision frequency of 0.2 ps-1. This was followed by another cycle of the above 22 minimization and dynamics steps with a decreased restraint force constant of 25 kcal/(mol•Å2). Equilibration continued with five rounds of 1000 step minimizations with a solute restraint force constants of 20, 15, 10, 5, 0 kcal/(mol•Å2) respectively. The equilibration was followed by a 250 ps unrestrained MD simulation during which the temperature was increased from 0 K to 300 K with a collision frequency of 0.2 ps-1. The above equilibration steps were carried out at constant volume. Finally, a 250 ps unrestrained MD simulation at 300 K with 1 atm constant pressure was carried out to equilibrate the density, and this was followed by production MD simulations under equivalent conditions without restraints. Locally enhanced sampling simulations The ADDLES module of AMBER was used to split each of the two loop regions into five independent copies, and sander.LES was used for running locally enhanced sampling simulations. Force field parameters were scaled among the five copies so that energy barriers are lowered to efficiently increase the sampling.111 The simulation was started at 500 K to kick the five loop copies apart. Then a long relaxation phase followed to allow the five copies to have sufficient freedom to settle to different conformations. This slow relaxation process was controlled by decreasing the temperature from 500 K gradually to 300 K over 1.5 ns. During the first 750 ps, the temperature was first slowly dropped to 350 K, and the pressure was set to 100 atm to avoid blowing up the system at high temperatures. Distance restraints (R1 = 0.0, R4 = 6.0, PK2 = 5.0, PK3 = 10.0; R2 and R3 bracket the desired value by 0.5 Å more or less than the distance between the two restraint atoms) were applied to the two heavy atoms of each hydrogen bond within each quartet to keep the quadruplex structure intact through the high temperature phase of the 23 simulations. After the LES equilibration, the production period was run at 300 K without restraints for 30 ns, followed by conventional MD simulations. Free-energy calculations The approximate free energies for each loop geometry were calculated over the trajectories with snapshots taken at 200 ps intervals using the MM-PBSA script distributed with AMBER-9.0112. The channel bound K+ ion was included explicitly in the calculation as has been shown to make the results consistent and meaningful.97 When the new K+ parameter104 was used, the size of K+ ions was modified to 1.705 Å in the MM-PBSA script. Experimental structures used for statistics The experimental G-quadruplex structures released before January of 2009 in the RCSB PDB databank were used for analysis. The NMR structures include: 139D, 186D, 1EEG, 1EVN, 1EVM, 1EVO, 1NP9, 1NZM, 1OZ8, 1MYQ, 1XAV, 2O3M, 1EMQ, 1XCE, 143D, 1A6H, 1A8N, 1A8W, 1AFF, 1I34, 1F3S, 1JJP, 1LVS, 1U64, 230D, 2AQY, 2A5P, 1Y8D, 2A5R, 2GKU, 2JSQ, 2JSL, 2JSM, 2JSK, 2F8U, 2HY9, 2JPZ, 1NYD, 1QDI, 1QDK, 1QDF, 2JWQ, 148D, 1QDH, 1FQP, 1D6D, 201D, 2JT7, 1RDE 1K4X 2AKG 156D 1C32, 1C34, 1C35, 1C38. The X-ray structures include: 1K8P, 1KF1, 1O0K, 1S45, 1S47, 2AVJ, 2GW0, 2E4I, 2HRI, 2O4F, 2JWQ, 3CDM, 3CE5, 1JRN, 1JPQ, 2GWQ, 2GWE, 2HBN, 1JB7, 1L1H, 3EUM, 3ET8, 3ES0, 3ERU, 3EQW, 3EM2, 1PH1, 1PH2-9 1PHJ, 1PA6, 1HUT, 1HAO, 1HAP, 244D. These structures lead to a total of 1971 anti-anti steps, 1543 syn-anti steps, 440 anti-syn steps and 109 syn-syn steps. 24 Results Various different loop geometries observed in MD simulations Several MD simulations were carried out on the diagonal anti-parallel G-quadruplexes formed from Oxytricha nova telomere sequences d(G4T4G4)2: independently with the old Aqvist-adapted and new K+ parameters. The simulations applied the ff99 force field with and without parmbsc0 modifications and performed either conventional MD or MD with locally enhanced sampling (LES). These simulations all yielded very stable trajectories in the stem region, and the overall quadruplex structures were well maintained. However, compared to the rigid stems, loops are more mobile, and a variety of loop geometries were observed. These different loop geometries and their interconversions observed during simulations are shown in Figure 2.4. The free energies for the different loop geometries calculated with the MM-PBSA method are also shown. The results suggest that the experimental loop geometry is less stable than most other geometries observed in MD simulations (Figure 2.4). The stable over-stacked loop geometry The over-stacked loop geometry shows the most stability as judged by the MM-PBSA calculation, with a big difference of -8 kcal/mol compared to the experimental loop geometry. In the over-stacked loop geometry, the first three residues of the loop stack perfectly in a line, and the first loop residue stacks on the quartet plane; the final loop residue also stacks on the quartet plane and its methyl group was positioned just above the channel entrance (Figure 2.4. STACK-I). This over-stacked structure was observed in the conventional MD simulation (84 ns long) with ff99bsc0 force field and old K+ parameter. Before the over-stacked structure was formed, the T7@O4 geometry (Figure 25 2.4) was kept for almost 30 ns. The 2-D RMSD curve for this loop during the MD simulation is shown (Figure 2.5). This observation indicates that with longer simulation time, our force field may circumvent bigger energy barriers and locate to more stable yet incorrect loop geometries. In the LES MD simulations, energies barriers between different conformations are dramatically lowered and molecules are easier to converge to the local minimum. In our LES simulations, the upper and lower loops converged to exactly the same loop geometry, with the same over-stacked property for the first three loop residues (Figure 2.6) as in STACK-I geometry. The over-stacked loop structure was also observed in our simulation of the two-quartet anti-parallel G-quadruplex formed from the thrombin binding aptamer d(GGTTGGTGTGGTTGG) (Figure 2.7). Stability of the over-stacked loop comes primarily from torsion potentials At the first sight of the over-stacked loop geometry (Figure 2.4. STACK-I), we thought its stability may quite possibly come from the favorable stacking interactions. Therefore, we recalculated the free energies of the different loop geometries with the van der Waals potential increased or decreased by 10 percent. However, this change in calculation did not alter the relative ranking of different loop geometries, and the over-stacked loop geometry still shows the most stability (results are not shown). Then the free energy components of these different loops were analyzed, and the results indicate that the favorable internal free energy (bonds, angles, dihedrals) is the main reason for the stability of the over-stacked loop geometry, with a -7 kcal/mol more stable internal free energy (195 kcal/mol vs. 202 kcal/mol) than the experimental loop geometry ( Table 2.1). Potentials of bonds and angles are very close in the different loop geometries, so the 26 difference primarily comes from the dihedral potentials. This result suggests that the current force fields are deficient in correctly describing the backbone torsion potentials. Failure to keep the detailed backbone properties Beside the failure to reproduce the correct diagonal loop geometry as is shown above, simulations also failed to represent some detailed backbone properties, specifically between two middle quartets: the distances between two C1´ atoms along each strand were significantly decreased, from an average of 6.4 Å in the experimental structure to an average of 5.2 Å in the simulated structure (the ff99bsc0 force field and the new K+ parameter), and the two sequential bases showed more overlap; the syn residues showed an O4´-endo instead of C2΄-endo for the sugar pucker; the original backbone geometries were also not maintained (Figure 2.7(a)). The syn residues between two middle quartets show a noncanonical α/γ (g+/g-) geometry in the crystal structure (PDB-ID 1JPQ), however during the simulations the backbones predominantly showed the canonical α/γ (g-/g+) geometry (Figure 2.7(b)). These structural deformations were not observed between the two outer quartets. The residues between the two middle quartets all show the anti glycosidic angle conformation for the first residue and the syn conformation for the second residue from 5´ to 3´ direction along the strand. Therefore, the structural deformations observed between two middle quartets led to our hypotheses that our force fields may behave quite differently for different glycosidic steps. There are four types of glycosidic steps in G-quadruplexes: 5´-anti-anti-3´, 5´-syn-anti-3´, 5´-anti-syn-3´ and 5´-syn-syn-3´ (Figure 2.3). The noncanonical α/γ (g+/g-) geometries are observed in anti-syn steps of the diagonal G-quadruplex structure investigated in this work. Is it a specific case, or a more 27 general structural property for all anti-syn steps? Therefore, we also hypothesize that the dihedral angle distributions are different in the four glycosidic steps in experimental G-quadruplex structures. Dihedral angle distributions in the four glycosidic steps To test the above hypotheses, all the experimental G-quadruplex structures available in the PDB databank were used, regardless of their folding topologies, and this led to 1971 structures for the anti-anti step, 1543 structures for the syn-anti step, 440 structures for the anti-syn step and 109 structures for the syn-syn step. For each type of glycosidic steps, probability distributions were calculated for each dihedral angle. To assess how well our current force fields represent the dihedral angle geometries of the four glycosidic steps, we ran 100 ns MD simulations on two models: the parallel model (Figure 2.9(a)) and the hybrid-type anti-parallel model (Figure 2.9(b)), with both ff99 and ff99bsc0 force fields. The trajectories of the parallel model were used to calculate the dihedral angle distributions for anti-anti steps, and the trajectories of the hybrid-type model were used to calculate distributions for syn-anti and syn-syn steps. The simulated distribution for anti-syn steps were calculated from the previous trajectories of the Oxytricha nova telomeric G-quadruplexes. The probability distribution of each backbone dihedral angle in the four glycosidic steps in experimental structures and in MD simulation trajectories with ff99 and ff99bsc0 force fields are shown in Figure 2.10 to Figure 2.13. The results confirmed our hypothesis: the probability distributions of the dihedral angles are quite different in the four types of glycosidic steps in experimental structures, and our force fields behave quite differently. In experimental structures both anti-anti and syn-anti steps exhibit canonical 28 backbone dihedral angles, and both ff99 and ff99bsc0 force fields nicely represent these canonical values except some small details (Figure 2.10 and Figure 2.11). The ff99 force field overestimates the γ (t) population in anti-anti steps (Figure 2.10), consistent with the α/γ = (g+, t) overpopulation problem exposed in DNA duplex.82 While ff99bsc0 fails to represent this noncanonical population γ (t) in anti-anti steps (Figure 2.10), indicating that the ff99bsc0 force field may overcorrect the α/γ problem of the ff99 force field. In experimental syn-anti steps, there are more noncanonical distributions in α (g+, t), γ (t), ε (g+) and ζ (g+) than in anti-anti steps, and both ff99 and ff99bsc0 force fields fail to represent any of these small noncanonical peaks (Figure 2.11). Even though the ff99 force field shows overpopulations of α/γ = (g+, t) in DNA duplexes and anti-anti steps in G-quadruplexes, it fails to represent any α/γ = (g+, t) population in syn-anti steps. An apparent shift between the most favorable χ angle for the syn residues in MD simulations and experimental structures is observed, and O4´-endo populations are also overestimated (Figure 2.11). In anti-syn steps of experimental structures, majority of population is at the noncanonical α/γ = (g+, g-) conformation. The ff99 force field failed to represent this backbone geometry: in the first 2 ns or during the equilibration, the original α/γ = (g+, g-) conformation changed to the canonical α/γ = (g-, g+), and then it changed to the α/γ = (g-, t) conformation and never came back. The ff99bsc0 force field behaves much better than the ff99 force field: several interconversions between the original α/γ = (g+, g-) and the noncanonical α/γ = (g-, g+) conformations were observed on a 100 ns time scale, even though the majority population is still canonical (Figure 2.8(b)). Beside the α/γ geometries, the ff99 force field was totally unable to describe the correct β, χ (for the syn 29 residue) angles for the anti-syn step (Figure 2.12). Refinement on the description of the α/γ conformers in the ff99bsc0 force field also dramatically improves the description on other dihedral angles: β, χ, ε and ζ (Figure 2.12), even though experimental values still cannot be fully represented. The syn-syn steps in experimental structures also show a big population of noncanonical backbone geometries. Different from anti-syn steps, both ff99 and ff99bsc0 force fields totally failed to represent these noncanonical values: the original noncanonical backbone geometries change to the canonical values during the equilibration steps or at the very beginning of the production period, leading to an improper description of the dihedral angles of syn-syn steps. Effect of the stacking in backbone dihedral angles of anti-syn steps As shown above, the anti-syn steps show greater nucleobase overlap in the simulations compared to the experimental structures (Figure 2.8). To test whether the stacking interactions are involved in the backbone deformation of the anti-syn step, MD simulations were run for the Oxytricha nova telomeric G-quadruplex (Figure 2.1(b)) with a 10% increase or 10% decrease in the vdW attraction for the N, O, C atoms of the nucleobase groups. The probability distribution plots show that either decreasing or increasing the attractive vdW interactions between the two nucleobase groups of the anti-syn steps moves the backbone distributions towards even less accurate values (Figure 2.14). These results suggest that the backbone deformation or the inability to keep the original dihedral angles is not related to the stacking interactions. 30 Discussion MD simulations on G-quadruplex loops and stems both suggest deficiencies of torsion potentials of current nucleic acids force fields. The backbone geometries are strongly dependent on the glycosidic bond conformations in experimental structures, and our force fields cannot reproduce the noncanonical backbone geometries in anti-syn and syn-syn conformations. It was pointed out that sugar-phosphate backbone is difficult to deal with for two reasons: (1) the constant point charges do not reproduce the electrostatic potential equally well for different backbone geometries; (2) the phosphate backbone is highly polarizable, and the electronic structure is very complicated because of the solvation and conformational dynamics.113 However, the simple refinement of the α/γ conformers not only significantly improves the behavior on the canonical B-DNA80 but also dramatically improves the description of the noncanonical backbone geometries of the anti-syn steps as shown in this work. These results indicate that the current effort in refining the torsion potentials of nucleic acid force fields is heading the right direction. 31 Figure 2.1. Schematic diagrams of G-quadruplex structures formed from Oxytricha nova telomere repeats. (a) Parallel tetrameric structures93, 99 formed from single repeats. (b) Diagonal dimeric structure48-53 (this structure with diagonal dT4 loops was investigated in the current work). (c) Head-to-tail lateral dimeric structure.99 (d) Basket-type monomeric structure.54, 55 (e) Chair-type monomeric model structures.100 Open and black boxes indicate anti and syn conformation of guanine residues. The arrows point from 5´ side to 3´ side along the strand. Each strand shows 5´-syn-anti-syn-anti-3´ glycosidic conformation in all the anti-parallel structures. (c) (e) (a) (b) (d) 32 Figure 2.2. Schematic illustration of the diagonal dT4 loop in dimeric Oxytricha nova telomeric G-quadruplexes. (a) Geometry of the diagonal dT4 loop in the crystal structure52 with the terminal quartet shown in white. The first to the last loop residues are shown in red, yellow, blue and green respectively. The K+ cation at the loop-stem junction (above the channel entrance) is shown in green sphere. (b) Hydrogen bond between T5 and T7 in the loop. Figure 2.3. Four glycosidic steps in G-quadruplexes. (a) syn-anti step, (b) anti-anti step, (c) anti-syn step, (d) syn-syn step. The residue shown in black is the one at the 5´ side. T5 T6 T7 T8 (a) (b) (c) (d) (a) (b) T7 T5 2.02 33 Figure 2.4. Different geometries of diagonal dT4 loop in MD simulations. The first to the last loop residues are shown in red, yellow, blue and green respectively. Each geometry name, free energy (in kcal/mol) and the hydrogen bond within the loop are shown. The arrows show the directions of structural transitions observed during the MD simulations. LES denotes the simulations with the locally enhanced sampling method applied. The stable water molecules observed in the TRIAD-II and WAT geometries are also shown. STACK-I (-634) X-RAY (-626 kcal/mol) T7-N3-H3···O2-T5 T7@O4 (-628) T7-O4···H3-N3-T5 TRIAD-I (-626) T7-N3-H3···O2-T5 TRIAD-II (-628) T8-O4···H3-N3-T5 WAT (-633) T7-N3-H3···O2-T5 STACK-II (-627) STACK-III (-623) LES LES TRIAD-III (-630) T8-O4···H3-N3-T5 34 Figure 2.5. The 2-D RMSD plot of the diagonal dT4 loop structure. Structures at different points during the simulation are compared to each other. Each blue block around the diagonal refers to a family of the same loop geometry during that simulation time. For each geometry family, the name and the free energy estimate calculated from the MM-PBSA method are also shown. -624 kcal/mol (x-ray) -626 kcal/mol (triad-II) -626 kcal/mol (x-ray) -628 kcal/mol (7@O4) -634 kcal/mol (stack-I) 0 100 200 300 400 0 100 200 300 400 6. 21 4.67 3.12 1.56 0.00 35 Table 2.1. Free energies of the different diagonal dT4 loop geometries X-ray X-ray X-ray X-ray triad-I triad-I triad-I triad-II triad- III stack-I stack-II stack-III ELE 599 577 587 555 615 588 607 604 607 607 604 617 VDW -66 -62 -66 -68 -69 -67 -68 -66 -69 -62 -63 -72 INT 202 202 205 203 204 205 205 204 202 195 200 206 GAS 735 716 726 690 750 726 744 742 740 740 740 750 PBSUR 2 2 2 2 2 2 2 2 2 3 3 2 PBCAL -1364 -1343 -1353 -1317 -1378 -1351 -1370 -1372 -1372 -1376 -1370 -1375 PBSOL -1362 -1341 -1351 -1315 -1376 -1349 -1368 -1370 -1370 -1373 -1367 -1373 PBELE -765 -767 -766 -762 -763 -763 -762 -768 -766 -769 -766 -758 PBTOT -626 -625 -626 -625 -626 -623 -624 -628 -630 -634 -627 -623 GBSUR 3 3 3 3 3 3 3 3 3 4 3 3 GB -1293 -1272 -1282 -1246 -1310 -1283 -1302 -1299 -1303 -1300 -1299 -1311 GBSOL -1290 -1269 -1279 -1243 -1308 -1281 -1300 -1296 -1300 -1296 -1296 -1309 GBELE -694 -696 -695 -692 -696 -695 -695 -695 -696 -693 -696 -694 GBTOT -555 -553 -554 -554 -558 -554 -556 -554 -560 -556 -556 -558 36 Figure 2.6. Converged loop geometries with over-stacked property during LES simulations. This simulation was run with parmbsc0 modified ff99 force field and old K+ parameter. The first to the last loop residues from 5´ to 3´ are shown in red, yellow, blue and green respectively. Figure 2.7. The over-stacked structure observed during the MD simulations of TBA. (a) The experimental TBA structure (148D). (b) The simulated TBA structure. The simulation was run with parmbsc0 modified ff99 force field and new K+ parameter for 50ns. A solvent K+ was attracted to the channel entrance position during the simulation. (a) (b) 37 Figure 2.8. Backbone deformations observed between the two middle quartets during MD simulations. (a) Structural overlap between the simulated structure and the X-ray structure. The simulated structure (ff99bsc0, new K+) is shown in pink. The X-ray structure (1JPQ) is shown in grey. (b) The α (shown in red) and γ (shown in black) angles of the residue 3 and the residue 11 of the two chains along progression of the simulation. The simulation was run with parmbsc0 modified ff99 force field and new K+ parameter for 100 ns. Figure 2.9. Schematic diagram of the parallel and the hybrid-type anti-parallel stem models used in this work. (a) The parallel stem. (b) The hybrid-type anti-parallel stem. Open and black boxes indicate syn and anti conformation of guanine residues. (a) (b) res 3 res 3 res 11 res 11 (b) 0 20 40 60 80 100 ns 120 60 0 -60 -120 120 60 0 -60 -120 120 60 0 -60 -120 120 60 0 -60 -120 (a) O4'-endo C2΄-endo 38 Figure 2.10. Probability distributions for each dihedral angle in anti-anti steps. Orange bar: experimental structures; solid line: ff99bsc0 force field; dash line: ff99 force field. 39 Figure 2.11. Probability distributions for each dihedral angle in syn-anti steps. Orange bar: experimental structures; solid line: ff99bsc0 force field; dash line: ff99 force field. 40 Figure 2.12. Probability distributions for each dihedral angle in anti-syn steps. Orange bar: experimental structures; solid line: ff99bsc0 force field; dash line: ff99 force field. 41 Figure 2.13. Probability distributions for each dihedral angle in syn-syn steps. Orange bar: experimental structures; solid line: ff99bsc0 force field; dash line: ff99 force field. 42 Figure 2.14. Probability distributions for each dihedral angle in anti-syn steps with 10% change in vdW attraction of base groups (ff99bsc0, new K+). Orange bar: experimental results, solid line: original vdW paramter, doted line: 10% decrease in vdW attraction, dash line: 10% increase in vdW attraction. CHAPTER 3 EXPLAINING THE VARIED GLYCOSIDIC CONFORMATIONAL, G-TRACT LENGTH, AND SEQUENCE PREFERENCES FOR ANTI-PARALLEL G-QUADRUPLEXES Note that this chapter has been submitted for publication in Nucleic Acids Research, authors Xiaohui Cang, Jiří Šponer, and Thomas E. Cheatham, III. All of the simulation work and analysis was performed by Cang and Šponer and Cheatham helped edit and write the paper. Abstract Guanine-rich DNA sequences tend to form four-stranded G-quadruplex structures. Characteristic glycosidic conformational patterns along the G-strands, such as 5´-syn-anti- syn-anti observed with the Oxytricha nova telomeric G-quadruplexes, have been well documented. However, an explanation for these featured glycosidic patterns has not emerged. This work presents MD simulation and free energetic analyses for simplified two-quartet [d(GG)]4 models and suggests that the four base pair step patterns show quite different relative stabilities: syn-anti (-4 kcal/mol) > anti-anti (0 kcal/mol) > anti-syn (3 kcal/mol) > syn-syn (5 kcal/mol). This suggests the following rule: when folding, anti- 44 parallel G-quadruplexes tend to maximize the number of syn-anti steps and avoid the unfavorable anti-syn and syn-syn steps. This rule is consistent with most of the anti-parallel G-quadruplex structures in the Protein Databank (PDB). Structural polymorphisms of G-quadruplexes relate to these glycosidic conformational patterns and the lengths of the G-tracts. The folding topologies of G2-tracts and G4-tracts are not very polymorphic since each strand tends to populate the stable syn-anti repeat. G3-tracts, on the other hand, cannot present this repeating pattern on each G-tract. This leads to smaller energy differences between different geometries and helps explain the extreme polymorphism of the human telomeric G-quadruplexes. Introduction Guanine-rich oligonucleotides have the potential to self-assemble into right-handed four-stranded helical structures called G-quadruplexes (sometimes also termed G-DNA or G4-DNA). Helical aggregates of guanosine 5´-monophosphate (GMP) were first observed almost six decades ago,1 and in the late 1980s, based on gel-mobility shift assays, a parallel four-stranded structure was hypothesized.114 Since then, many parallel and anti-parallel G-quadruplex structures have been solved and hypothesized to have in vivo roles in meiosis, telomere maintenance and gene regulation.18, 115, 116 G-quadruplex structures have also drawn increasing interest as potential anti-cancer targets and therapeutic targets.15, 71, 117-121 The basic structural unit of the G-quadruplex is the G-quartet (or G-tetramer, G-tetrad) which is a coplanar structure formed by four hydrogen-bonded guanines.114 In the G-quartet, the four electronegative carbonyl oxygens of the guanines are directed towards the center of the G-quartet, and metal ions coordinate between the G-quartet planes to 45 stabilize the structures. This enables several G-quartets to stack on each other to form the G-quadruplex. G-quadruplex structures are highly polymorphic due to the variability in the number of stacked G-quartets, the G-strand (G-tract) orientations, the orientation of glycosidic bonds of the guanines, and the various different loop types that can connect the G-strands.115 In parallel structures, all of the guanine residues adopt anti conformation for the glycosidic bond orientation, and the four strands all have the same orientation.115 In anti-parallel G-quadruplexes, both anti and syn guanines are found in the structures, and at least one of the four strands is oriented anti-parallel to the others.115 There are three types of non-parallel G-strand scaffolds that have been observed: abab, where each strand has two adjacent neighbors of opposite direction (with orientation denoted by a or b); aabb, where each strand has one adjacent neighbor in the same direction and the other in the opposite direction; and the (3+1) scaffold, in which three strands are in one direction and a fourth one is in the opposite direction. The (3+1) scaffold is sometimes referred to as the hybrid (parallel and anti-parallel) type, however, in this work we classify this scaffold within the anti-parallel class. In addition to strand orientation and glycosidic bond orientation pattern, generally three types of loops connecting the strands in G-quadruplexes are observed: lateral, diagonal and double-chain reversal (propeller) loops.115 Lateral loops connect two adjacent anti-parallel G-strands, diagonal loops join the opposite anti-parallel G-strands, and the double-chain reversal loops connect two adjacent parallel G-strands. Besides such factors as the ion identity, loop length and sequence, and 5´, 3´-flanking sequences, the G-tract length also has a great influence on the stability and folding topologies of G-quadruplexes. Lee et al. showed that anti-parallel G-quaduplexes with 46 four quartets are significantly more stable than quadruplexes with three quartets,122 and further that single-base mutations in one of the G-tracts dramatically influences the conformational dynamics of the human telomeric G-quadruplex.123 Abu-Ghazalah and Macgregor reported that modifying the G-tract length of the Oxytricha nova telomeric sequence d(T4G4)4 has profound effects on the folding topology and leads to polymorphic behavior.124 Thermal melting, CD, and gel electrophoresis experiments also showed quite various topologies and stabilities for sequences of different G-tract lengths intervened by the short dT and dT2 sequences.125 Human telomeric G-quadruplexes d(TTAGGG)4 which contains three quartets show remarkable structural polymorphism and dynamic equilibrium, and this is suggested to be intrinsic to its highly conserved sequence,126 a feature that is quite distinct from the Oxytricha nova telomeric quadruplex. In general, it is well established that more quartets lead to more stability.127-129 However, the rules regarding how the G-tract length affects the folding topology and polymorphism of G-quadruplexes have not been fully elucidated. In anti-parallel G-quadruplexes, different glycosidic bond orientation patterns have been observed around the G-quartets, specifically: syn•anti•syn•anti, syn•syn•anti•anti, syn•anti•anti•anti and syn•syn•syn•anti. These χ angle orientation patterns are directly related to the relative G-strand orientations. For example, with the abab strand orientation pattern, each G-quartet displays the syn•anti•syn•anti (or anti•syn•anti•syn) glycosidic bond orientation pattern; with the aabb strand orientation pattern, each G-quartet displays the syn•syn•anti•anti (or anti•anti•syn•syn) pattern; in the (3+1) scaffold, both syn•syn•syn•anti and anti•anti•anti•syn quartets have been observed. There is a simple rule to describe the relationship between the glycosidic conformation and the 47 strand orientation, specifically: same strand direction, same glycosidic conformation. In other words, for a given G-quartet, if any two guanines belong to the two strands oriented in the same direction, the two guanines display identical glycosidic conformation. If their strand orientations are opposite, they show opposite glycosidic conformations. Beyond the glycosidic bond orientations around the G-quartets, the characteristic glycosidic conformations along the G-tracts have also been well documented. The alternating syn-anti conformation for each adjacent pair of guanines along the strand was first found two decades ago in the G-quadruplexes formed by d(G2T5G2) or d(G2T4CG2) sequences.130, 131 Since then, many G-quadruplex structures have been observed with the syn-anti alternating glycosidic conformation. For example, the thrombin binding aptamer (TBA, d(GGTTGGTGTGGTTGG)) forms an anti-parallel G-quadruplex structure composed of two G-quartets and three lateral loops. As shown in both crystal and NMR experiments, these structures always adopt 5´-syn-anti glycosidic bond orientation patterns in each of the G-tracts.49, 132-135 This 5´-syn-anti conformation allows the TBA to form a well-defined G-quadruplex structure, whereas changing the glycosidic bond orientation pattern would perturb the formation, stability or folding topology of the G-quadruplex. 136 The G-quadruplex formed by the Oxytricha nova telomere sequence d(G4T4G4)2 or d(G4T4G4T4G4) inevitably features a 5´-syn-anti-syn-anti pattern along each G4-tract, regardless of the monovalent ion type (Na+ or K+). G-quadruplexes formed by the human telomere sequences d(TTAGGG)4 do not show this syn-anti alternating glycosidic conformation and their structures are highly polymorphic,126 however, the predominant conformation observed is the (3+1) scaffold with three tracts with a 5´-syn-anti- anti pattern in one direction and one 5´-syn-syn-anti pattern pointing in the opposite 48 direction. All the above sequences consistently show the syn conformation at 5´-end and the anti conformation at 3´-end of each G-tract, and the observations suggest that the glycosidic bond orientation pattern is related to the length of the G-tract involved. However, little is known about why these molecules display such variable glycosidic conformations along the G-tracts. The G-tracts glycosidic bond orientation patterns can be regarded as different combinations of the four possible glycosidic base pair step patterns, specifically: the 5´- anti-anti step, 5´-syn-anti step, 5´-anti-syn step and 5´-syn-syn step (Figure 2.3). Analysis of the known structures suggests that in standard anti-parallel G-quadruplexes, a quadruplex core either contains only syn-anti steps, contains a combination of syn-anti steps with anti-syn steps, or the core contains syn-anti steps together with anti-anti and syn-syn steps. In those few interlocked or V-shaped G-quadruplexes137-140, syn-anti steps are observed only with anti-anti steps. The definite glycosidic bond conformational patterns observed along G-tracts led to our hypothesis that different glycosidic steps show differential stabilities and these differences in stability lead to the different glycosidic bond orientation patterns observed in anti-parallel G-quadruplexes. To test this hypothesis, we built up six two-quartet models (Figure 3.1), and performed explicit solvent molecular dynamics (MD) simulations with modern force field and simulation protocols. The simulation results provide an understanding of the relative free energies of different G-DNA syn/anti patterns and explain the structural polymorphism of particular G-quadruplex sequences. 49 Methods Building the two-quartet models Six two-quartet models were investigated in this work (Figure 3.1). The starting structures were generated from experimental structures found in the PDB databank.141 Three syn-anti (SA) models (SA-aabb, SA-abab and SA-aaab) were built. They all have four syn-anti steps but with different strand orientations ("a" and "b" denote the strand directions around the G-quartet). The coordinates of the first and the second quartets with sandwiched K+ ions were obtained from the crystal structure of the diagonal anti-parallel G-quadruplex [d(G4T4G4)]2 (PDB: 1JPQ)52 and were used as the initial structure for the SA-aabb model. The SA-abab structure was built from the NMR solution structure of the thrombin-binding DNA aptamer d(GGTTGGTGTGGTTGG) (PDB: 148D, the first frame).132 The SA-aaab was built from the first two quartets of the NMR solution structure of the human telomere G-quadruplex d(T2G3TTAG3TTAG3TTAG3A) (PDB: 2GKU, first frame).42 In the AA model, all of the guanines have anti glycosidic bond orientations, and the structures have four parallel anti-anti steps. The respective coordinates were taken from the first and the second G-quartets of the parallel quadruplex d(TTGGGGT)4 NMR solution structure (PDB: 139D; the first frame was used).103 The AS model was built from the second and the third quartets of 1JPQ 52; this model has four anti-syn steps arranged in the "aabb" scaffold. In the "3AA+1SS" model, there are three anti-anti steps and one syn-syn step, and the initial coordinates were obtained from the second and third quartets of 2GKU.42 The 2GKU experimental structure is characterized by 12 representative solution structure models. Among these models, the syn-syn steps show two types of noncanonical backbone geometries with either g-/g- or g+/g- α/γ 50 dihedral combinations. Thus, we built two initial structures ("3AA+1SS-I" and "3AA+1SS-II") from the 1st model (α/γ g-/g-) and the 10th model (α/γ g+/g-) structures, respectively. Note that the canonical α/γ values are g-/g+. For all of the models built from solution structures, a K+ ion was added manually into the channel between the two quartets in each structure. The TIP3P specific potassium parameters (atomic radius 1.705 Å and well depth 0.1936829 kcal/mol) of Joung and Cheatham were used.104 Molecular dynamics simulations and free-energy calculations The ff99bsc0 force field was applied for all the MD simulations, and simulation protocols are the same as stated in Chapter 2. The approximate free energies for each of the two-quartet models were calculated over the MD trajectories taking snapshots at 200 ps intervals (unless specified otherwise) using the MM-PBSA script distributed with AMBER 9.112 The channel bound K+ ion was included explicitly in the free energy calculations, which is essential to obtain consistent results of quadruplex stems, as in detail explained elsewhere.142 The size of the K+ ion was modified to 1.705 Å in the MM-PBSA script to obtain consistent solvation energies.142 Note that solute rotational, translational and vibrational entropies were not included in the MM-PBSA estimates. As all the models are of similar size, small, and have similar degrees of freedom and flexibility, the differences in the solute entropy among the models is likely very small (see the later discussion and information provided in the supporting information). 51 Results Free energy calculations Although the two-quartet stems are often unstable on longer simulation time scales, as will be shown below, we were able to calculate approximate free energies from the stable portions of the respective MD trajectories. Our aim was to understand the relative free energy order of the four glycosidic steps. Table 3.1 shows the estimated free energies calculated with MM-PBSA. The three SA models show very similar free energies. The implication is that in anti-parallel G-quadruplexes, the strand orientations alone do not significantly alter the stem stability assuming the 5´-syn arrangement. The influence of strand orientation on stability is therefore likely only dependent on the types of loops necessary to connect the strands. This result is consistent with the previous conclusion that the syn•anti•syn•anti and syn•syn•anti•anti G-quartets are equally stable.143 As anti-anti steps are dominant in the DNA double helix, the AA model was used as a reference to calculate the relative free energy of each model. Surprisingly, the free energy results suggest that the SA models are significantly more stable than the AA model. The differences in estimated free energies are large (-32 kcal/mol) and dominated by the internal (bond, angle and dihedral) and electrostatic free energy contributions. Much of this over-stabilization is due to hydrogen bonds at the 5´ syn-dG due to the dangling ends (see the ΔG† values in Table 3.1). When the 5´-terminal guanine has a syn glycosidic bond orientation, a hydrogen bond between O5´-H···N3 will form with a high occupancy during the MD simulations (Figure 3.2); this is not observed when the 5´- terminal guanine has an anti glycosidic bond orientation. Similar hydrogen bonding is also observed in experimental structures, with at most one 5´-dG syn O5´-H···N3 52 hydrogen bond in monomeric anti-parallel quadruplexes and two in the dimeric anti-parallel quadruplexes. The simple two-quartet models studied here introduce more 5´- ends because of the absence of connecting loops. The SA models have four O5´- H5T···N3 hydrogen bonds which brings extra stability to the system; however, this is not representative of the native folded quadruplex structures generated from a single contiguous sequence. To exclude these contributions, we ran an additional simulation on the SA-aabb model with a restraint to prevent the formation of the O5´-H···N3 hydrogen bonds (referred to as SA-aabb-r). A lower bound restraint distance of 3.5 Å with a force constant of 5 kcal/(mol•Å2) was applied to the H5T and N3 atoms at each of the four 5´- ends. A stable MD trajectory over 33 ns was observed, and the last 10 ns were used to estimate MM-PBSA energetics calculated at 200-ps intervals. The restrained SA-aabb-r model is now -14.4 kcal/mol (-3.6 kcal/mol per strand) more stable than the AA model. Additionally, we recalculated the energetics omitting these hydrogen bonds with the three unrestrained SA trajectories and similar results are obtained. The free energetic results suggest that the four 5´-end hydrogen bonds in total contribute approximately -18 kcal/mol to the free energy of the SA-aabb model, i.e., approximately -4.5 kcal/mol for each hydrogen bond which is an overestimate due to approximations of the MM-PBSA approach. When these dangling end hydrogen bonds are omitted, a more reasonable ~3 kcal/mol per strand difference is obtained. Although the values are large, in part due to the approximate nature of the MM-PBSA method and potential deficiencies in the glycosidic torsional potential, they are rather consistent across the different SA trajectories. As the AS model quickly underwent structural change (at ~300 ps), we estimated the 53 free energy for the initial 300 ps (with a 6-ps interval between frames in the MM-PBSA) and also for the models over 15-25 ns (with a 200-ps interval). The results show that before the structural change, the AS model is ~14 kcal/mol less stable than the AA model, while after the change the system gained ~9 kcal/mol in stability compared to the first 300 ps. The AS model with restraints on the α/γ angles (AS-r) only maintained the quadruplex structure for 2.5 ns; over this interval it was similarly stable as the initial AS model even though the component energies show some differences. For the "3AA+1SS" models, the structure with canonical α/γ angles for the syn-syn step is -6 kcal/mol more stable than the AA model, while both α/γ (g-/g-) (3AA+1SS_I, 5-7 ns) and α/γ (g+/g-) (3AA+1SS_II-r, 5-7 ns) initial geometries show similar free energies to the AA model. Relative stabilities of different glycosidic steps and G-quadruplex folding rules Column IX ( ΔG† ) in Table 3.1 shows that the SA model is ~14.4 kcal/mol more stable than the AA model; the AS model is ~13.7 kcal/mol less stable than the AA model; and the 3AA-1SS models are ~5 kcal/mol less stable than the AA model. To facilitate comparison per quadruplex strand, the relative free energies of the SA, AA and AS models were divided by four to get the relative free energies of each of the syn-anti, anti-anti and anti-syn steps. The relative instability of ~5 kcal/mol of each of the 3AA+1SS models comes entirely from the syn-syn steps. Considering the results together, the free energetic results suggest that the approximate relative stabilities of the four types of step are: syn-anti (-4 kcal/mol) > anti-anti (0 kcal/mol) > anti-syn (3 kcal/mol) > syn-syn (5 kcal/mol). Based on the significant differences in the relative stabilities, representing ~2- 54 3 orders of magnitude shifts to lower populations between syn-anti to anti-anti and anti-anti to anti-syn and syn-syn, we propose the following rules: (1) Anti-parallel quadruplexes have a strong propensity to form as many syn-anti steps as possible; and (2) once the maximum numbers of syn-anti steps are obtained, avoid the anti-syn and syn-syn steps as much as possible. Based on these rules, we infer that the glycosidic bond orientation patterns are predetermined by the length of the G-tract, and the most favorable glycosidic conformation along the G2, G3 and G4 tracts are syn-anti, syn-anti-anti and syn-anti-syn-anti, respectively. Entropy calculations As entropy calculations are less reliable and harder to converge compared to the other energy components within the MM-PBSA approximations, we did not include entropy contributions in the free energy calculations shown. However, free energies with entropy contributions are provided separately in the supplementary Table S3.1, and the results suggest that the inclusion of entropy does not change the relative stability ranking of the four glycosidic steps. Molecular dynamics of the two-quartet models The all-atom RMSD values for the four two-quartet models over the MD trajectories are shown in Figure 3.3. The SA-aabb model remained close to the initial structure indicative of a stable trajectory throughout the 100 ns of MD simulation. The other two SA models, SA-abab and SA-aaab, were also built to test the influence of the arrangement of the four strands, and they also both sampled very stable trajectories as suggested by low RMSD (an average all-atom RMSD of 0.8 Å for SA-abab and 0.9 Å for 55 SA-aaab) throughout 25-ns length simulations. The starting structure of the AA model was maintained for the first 40 ns. Then the molecule converted from a right-hand helical to a left-hand helical structure which was stable until 56 ns whereupon the quadruplex structure was lost entirely. The model structure then sampled several different geometries with the most typical exemplified by a structure with all eight bases stacked effectively into a single-stranded helix (Figure S3.1). This structure was found to be considerably more stable by MM-PBSA free energy analysis than the two-quartet AA model. The free energy estimations suggest that this collapse process to the fully stacked structure is mainly driven by improvement of van der Waals interactions (Figure S3.2) and also partly by reductions in the internal (bonds, angles and dihedrals) free energy. Note that the loss of quadruplex structure was not preceded by loss of channel bound ion. The instability of the AA model in our MD simulation is consistent with the observation that no experimental structure has been reported for the two-quartet parallel G-quadruplex except in the case where it forms multimers by stacking with other G-quadruplex molecules.144, 145 For the AS model, the quadruplex structure was maintained for 98 ns (Figure 3.3), however subtle structural adjustments were observed at the very beginning of the production simulation (at ~300 ps) which led to an increased overlap between the nucleobases in the sequential anti-syn steps accompanied with reduced co-planarity within each of the G-quartets. In the initial structure of the AS model, the four anti-syn steps all show the noncanonical dihedral angles: either α/γ (g-/g-) or α/γ (g+/g-). In the simulations, three out of the four anti-syn steps moved to α/γ (g+/trans) conformation between 300 ps to 2 ns, and the fourth step moved to the α/γ (g+/trans) conformation at 56 35 ns. Even though α/γ (g+/trans) is not the most favorable crankshaft conformation, particularly with the parmbsc0 force field modifications which strongly disfavor γ = trans,80 this likely represents a compromise between the dihedral angles and desire for co-planarity of the G-quartets. As the α/γ (g-/g-) and α/γ (g+/g-) conformations were not maintained with this force field, simulations with a restraint of 5 kcal/(mol•A2) on the α/γ angles (AS-r) were also performed to force the dihedral angles to the original conformations. However, in the MD simulations with these restraints on the α/γ angles, the quadruplex structure was lost very quickly, namely, ~2.5 ns after simulation start. Neither of the "3AA+1SS" models started from two different initial structures maintained the quadruplex structure in MD simulation: The "3AA+1SS_I" lost the quadruplex structure at ~15 ns and "3AA+1SS_II" lost the structure at ~10ns. In the "3AA+1SS_I" model structure, the initial α/γ (g-/g-) conformation of the syn-syn step was maintained for 8ns before moving directly to the canonical α/γ (g-/g+) value. In contrast, with the "3AA+1SS_II" model structure, the initial α/γ (g+/g-) changed rapidly to the α/γ (120°/trans) geometry during equilibration, and then moved to canonical α/γ (g-/g+) values at the beginning (800 ps) of the production MD simulation. As the syn-syn step in the 3AA+1SS_II model changed its original α/γ (g+/g-) conformation so quickly, a 5 kcal/(mol•A2) restraint on the α/γ angles was applied in MD simulation (3AA+1SS_II-r) for 33 ns; this model structure lost its quadruplex structure after ~9 ns of MD. MM-PBSA calculations for the three-quartet and four-quartet stem models The two-quartet models applied in this work are very simplified models for MD simulations; they were used to get reasonable and straightforward energetic estimates for 57 the four glycosidic steps. To test whether the same trends are maintained in the stems of more quartet layers, similar work investigated various three-quartet and four-quartet stems models (Figure 3.4). The (3SAA+1SSA) model was built from the human telomeric G-quadruplex 2GKU42; AAA was built from the first three G-quartets of NMR parallel structure 139D.103 A model with four parallel syn-ant-anti strands (SAA-parallel) was also built based on the previous NMR experiments indicating that [d(TGMeGGT)]4 form a parallel quadruplex possessing an all syn quartet.146 The (3SAA+1SSA) model was used as the template to build the SAA-parallel model through UCSF Chimera.147 In this SAA-parallel model, only syn-anti and anti-anti steps exist, therefore comparison to the AAA model was straightforward. The four-quartet parallel and anti-parallel stem models were built from the G-quartets of the structures 139D and 1JPQ,52 respectively. Additional three-quartet stem models were investigated, however the anti-syn and syn-syn steps in these models lost the original backbone α/γ geometry during the simulation equilibration steps, so these results are not shown here. The MM-PBSA calculations showed trends consistent with the results from the two-quartet models: the anti-parallel stems that accommodate more favorable syn-anti steps are generally more stable than their parallel counterpart (Table 3.2 and Table 3.3). The SAA-parallel model is much more stable than the AAA parallel model, which is consistent with the experimental results that [d(TGMeGGT)]4 shows a higher Tm of 66ºC vs. 45ºC than [d(TGGGT)]4,146 and this result is also consistent with the calculations from the two-quartet model: the syn-anti step is more stable than the anti-anti step. The results consistently suggest that the anti-parallel structures are more favorable than the parallel counterparts. However, the tetrameric quadruplex structures 58 predominantly adopt parallel conformations with all residues showing anti glycosidic bond angles. This discrepancy may be explained based on previous experiments showing that rapid annealing produces parallel structures, whereas slow annealing produces anti-parallel structures.125 This suggests that anti-parallel structures are thermodynamically favored, whereas parallel structures are kinetically favored (possibly since there is no need to circumvent the energy barriers of the anti→syn conversion). Therefore, the tetrameric parallel structures could possibly be the kinetically favored; once formed, disruption of the structure to convert to the more thermodynamically favored structures is disfavored. Discussions Validate theoretical G-quadruplex folding rules with experimental structures Based on the theoretical estimates on the relative stabilities of the four glycosidic steps, we've put forward simple folding rules with an emphasis on the glycosidic conformations of G-quadruplexes. To validate these folding rules, most of the anti-parallel G-quadruplex structures available in the PDB databank were investigated. To reach the maximum number of syn-anti steps, each tract of the sequence composed of G2 repeats needs to adopt a syn-anti conformation. Therefore, all the G-quadruplexes formed from thrombin binding aptamer (TBA) sequences show the same glycosidic bond orientation pattern of 5´-syn-anti-loop-syn-anti-loop-syn-anti-loop-syn-anti.49, 132-135, 148 The quadruplex formed from d(GGTTTTGGCAGGGTTTTGGT) has a double chain reversal loop and two diagonal loops, and each of the G2-tract adopts a syn-anti conformation.149 In the tetrameric quadruplexes formed from [d(TAGG)]4, the G2-tract in each strand also 59 adopts the syn-anti pattern.150 The same pattern is also observed exclusively in dimeric quadruplexes with two quartet layers.151, 152 For sequences composed of G4-tracts, each G4-tract can accommodate at most two syn-anti steps, in a pattern 5´-syn-anti-syn-anti. This explains why the dimeric or monomeric G-quadruplexes formed from the Oxytricha nova telomeric sequences all show the alternating 5´-syn-anti-syn-anti glycosidic bond orientation pattern along each G-tract, regardless of the nature of the coordinated cations.48, 50-53, 101 As mentioned in the Introduction, if two guanines within the same G-quartet belong to two strands oriented in parallel, the two residues in each strand show the same glycosidic bond orientation pattern, whereas if the two guanines belong to two anti-parallel strands, inverted opposite glycosidic bond orientations are observed. In G2 and G4 tracts, the glycosidic bond orientation patterns (5´-syn-anti and 5´-syn-anti-syn-anti) are self-symmetric, that is, the inverted glycosidic patterns are the same regardless of strand orientation. Therefore, in anti-parallel G-quadruplexes formed from G2, G4 or other even-numbered G-tracts, each of the four G-tracts will adopt the syn-anti alternating conformation to maximize the number of syn-anti steps. G3-tracts are quite different from G2 and G4-tracts: the glycosidic conformations must be different for two G3-tracts to align in opposite directions. There are four glycosidic bond orientation patterns that will accommodate the maximum of one syn-anti step: 5´- syn-anti-anti, 5´-syn-anti-syn, 5´-anti-syn-anti and 5´-syn-syn-anti. According to the relative stability of the four steps, 5´-syn-anti-anti is the most favorable geometry, and this is likely the reason why most of the monomeric anti-parallel quadruplexes from human telomeric sequences share the same characteristic: i.e., three strands with the 5´- 60 syn-anti-anti conformation with a less favorable 5´-syn-syn-anti strand accommodated to finish the anti-parallel folding.44, 45, 153 The data suggest that other anti-parallel G-quadruplexes composed of G-tracts of odd-numbered guanines will also show this characteristic: i.e., at least one tract will not present the most stable glycosidic bond orientation pattern. Dihedral angles in anti-syn and syn-syn steps As shown, the noncanonical α/γ dihedral angle geometries of the anti-syn and syn-syn steps from the experimental structures were not maintained in the MD simulations. This suggests that the parmbsc0 force field modifications to AMBER ff99 may over-destabilize the correct, yet noncanonical, dihedral angles of the anti-syn and syn-syn steps. Moreover, the energy barrier may be underestimated thereby facilitating the conformational changes. Similar result was shown recently for the first loop nucleotide in parallel stranded X-ray structure of the human telomeric quadruplex.154 This nucleotide has γ-trans in the experiment while it is shifted to the canonical conformation in parmbsc0 simulations. Similarly, in A-RNA double helices, parmbsc0 appears to overcorrect γ-trans substates.155 However, in reality, the force field issue is likely more complex and we should not prematurely conclude that parmbsc0 universally over-penalizes γ-trans conformations. The torsional profiles of simple force fields represent only very approximate and rather unphysical models of electronic structures of the real molecules. Thus, we should not expect to be deriving perfect force fields by parameterizing the torsional profile. The parmbsc0 force field suppressing the gamma-trans was shown to be absolutely crucial force field refinement to achieve global stability of B-DNA simulations. In contrast to the above-mentioned cases, for B-DNA it still 61 appears that γ-trans substates remain little excessively populated albeit they no more represent the incorrect global minimum.156 The inability of the current force field to fully populate the expected geometries for the anti-syn and syn-syn steps could brings errors to the free energy calculations. More accurate assessment on the four glycosidic steps may depend on the refinement of current nucleic acid force fields or more accurate methods such as quantum mechanical calculations. Despite this, the energetic trends observed are consistent with experimental G-quadruplex structure. Comparison to earlier results Earlier molecular-mechanics calculations of base stacking suggested that the sequential syn-syn, anti-anti and syn-anti quartet stacks are almost equal in energy, and the anti-syn stack is less favorable by 4 kcal/mol.157 The results are inconsistent with the current data. However, it is not surprising, since the earlier analysis is based on crude in vacuo computations of energy contributions based on short 150 ps MD simulations. Moreover, the relative stabilities of different steps were derived solely from calculated base stacking energies, neglected important contributions from the DNA backbone and also the bound channel cations. Such calculations are not sufficient to derive stabilities of nucleic acids, which stem from a very complex balance of molecular forces. It is well established, that in the final balance of forces the intrinsic (gas phase) stacking trends are often completely lost or even inverted.158, 159 In contrast, the current sets of MD simulations with explicit water were performed for much longer time scale (up to 100 ns) with current state-of-the-art force fields and simulation protocols for DNA while including full balance of all energy contributions. As our results show that the stability differences of the four steps comes primarily from the dihedral potentials (Table 3.1, 62 MM_int) rather than from the stacking interactions, this reasonably explains the differences between the two studies. Noncanonical G-quadruplexes Not all anti-parallel G-quadruplexes contain an equal number of guanines in each G-tract, yet these G-tracts may still be connected through the standard lateral, diagonal or double-chain reversal loops. These noncanonical G-quadruplexes show more complicated and diversified glycosidic bond orientation patterns. Despite this, the current results still help explain many of these structures. One interesting example is the bimolecular quadruplex formed with [d(G4T4G3)]2, where the two G4 repeats display the expected 5´-syn-anti-syn-anti pattern to maximize the number of syn-anti steps, and the two G3 repeats display 5´-syn-anti-syn and 5´-anti-syn-anti patterns, respectively, to finish the self-folding.160 The related sequence [d(G3T4G4)]2 forms a quite different V-shape folding topology139; this structure cannot maximize the number of syn-anti steps, however this is compensated for by avoiding the unfavorable anti-syn and syn-syn steps. For the other G-quadruplexes with the V-shaped scaffold,137, 152 the unstable anti-syn and syn-syn steps are also avoided. In the modified TBA sequence containing a 5´-3´ inversion of polarity, the first two guanines reversed their glycosidic bond orientation patterns to maintain the syn-anti conformation along the 5´→3´ direction.161 This example suggests that what matters are the glycosidic bond orientation patterns along the G-tract, rather than the local glycosidic bond orientation pattern around the G-quartet. As our models only considered the guanine residues, it is not clear if the rules proposed will generalize to noncanonical G-quadruplexes with dC or dA residues in the stem. 63 G-quadruplexes with modified residues Based on the significant energetic differences for the four glycosidic steps discussed, the most stable glycosidic conformation for a certain length of G-tract appears to be well defined. Knowing what leads to instability may suggest modifications that could stabilize the less stable glycosidic bond orientation patterns to increase the stability of the G-quadruplex. For example, bulky bromide or methyl groups may be introduced at C8 position to shift guanine to favor the syn conformation, while rG or LNA modifications with 3´-endo sugar conformations have a greater propensity to form anti glycosidic bond orientation patterns. Previous experiments have shown clear relationships among the glycosidic conformations, stabilities and folding topologies,44, 123, 136, 162, 163 and the current results provide a simple means to understand these observations based on energetics. For example, in the work by Tang, they observed that when rG was introduced at the second position of each G2-tract of TBA, the molecule kept the anti-parallel structure with greater stability; while if rG was introduced at the first position of each G2-tract, the molecule changed to a parallel structure.136 According to our results and conclusions, when rG is introduced at the second position of each G2-tract of TBA, it stabilizes the syn-anti conformation of each tract and therefore increases the stability of the molecule; while if rG is introduced at the first position of each G2-tract, it is hard to form stable syn-anti steps, and therefore the most stable step the G2-tract can form with the first position in an anti conformation is the anti-anti step, and this forces the molecule to fold into a parallel form. Our simulation results on the two-quartet AA model and also the absence of the two-quartet parallel G-quadruplexes in the PDB databank indicate that a TBA sequence cannot form a stable parallel quadruplex structure by itself. Their 64 electrophoresis and thermal denaturation studies have shown that the molecule solves this problem by forming multistranded structures.136 G-quadruplex of G2 and G4-tracts: more definite folding topologies The results also suggest that the G-tract lengths determine the glycosidic conformations in anti-parallel G-quadruplexes, and that G-tracts of different lengths show quite different behaviors. For the G-quadruplexes formed from sequences of G2 and G4- tracts, each strand adopts the same syn-anti alternative pattern, and the quadruplex core with this conformation is more stable than that with any other pattern. We have also shown that different strand orientations do not significantly affect the quadruplex core stability. This implies that loops play an extremely important role in determining the folding topologies of the G-quadruplexes of G2 and G4-tracts, and the folding topologies of these molecules are more definite and less polymorphic. G-quadruplex of G3-tracts: the extreme polymorphism The most stable geometry of the G-quadruplex core in quadruplexes with G3 repeats is the (3+1) scaffold, since two G-tracts with the same glycosidic bond orientation pattern cannot be aligned in opposite directions. Thus, three strands are in the most stable syn-anti- anti conformation in the same direction, and the fourth strand in the less stable syn-syn- anti conformation in the opposite direction. This geometry is the major form of the monomeric human telomeric G-quadruplexes in K+ solution.42, 44, 45 The asymmetric dimeric G-quadruplex formed from one and three human telomeric repeats d(TAGGG), d(GGGTTAGGGTTAGGGT) also formed (3+1) scaffold.164 In addition to the (3+1) geometry, other geometries have also been observed for the human telomeric sequences. 65 The monomeric basket type structure is reported in Na+ solution, and this has two strands with the 5´-syn-anti-syn pattern and the other two strands with the 5´-anti-syn-anti pattern.39 Recently a two-quartet structure in K+ solution was reported.43 Even though only two quartets are formed in this structure, the unfavorable syn-syn steps are avoided, and moreover, the guanines not involved in the quartets still form extensive base pairing and stacking, which contribute to the overall stability of this folding geometry. Between the two quartets are the four stable syn-anti steps, which is consistent with our previous conclusions. More recently, another two-quartet structure was reported to form from the human telomeric sequences, and between the quartets are also four 5´-syn-anti steps.46 Monomeric or dimeric parallel structures have also been reported for the human telomeric sequences,47, 165 but the equilibrium between the anti-parallel structure and intramolecular parallel structure might involve more complex kinetics and multimer association issues and will not be covered here. In the (3+1) geometry, the unfavorable syn-syn step effectively lowers the energy barriers differences between different conformations, thus contributing to the greater structural polymorphism for the human telomeric G-quadruplexes. It seems that the structural polymorphism lies in the length of G-tract, and sequences of G3-tracts (or G-tracts of odd-numbered guanines) show more structural polymorphism than sequences carrying G-tracts of even-numbered guanines. The human telomeric sequence d(TTAGGG)n is strongly conserved among the vertebrates,37 and majority of the guanine-rich sequences found in eukaryotic promoter regions also forms G-quadruplexes of three quartets.166 The structural polymorphism, folding kinetics, and the relative stability of G-quadruplexes may be very important for their biological function. 66 In summary, MD simulation and free energy analysis of two-quartet stem models suggests that the four types of glycosidic steps in G-quadruplexes have different stabilities, specifically with a favoring order of syn-anti > anti-anti > anti-syn > syn-syn. This energetic order is consistent with the glycosidic conformations experimentally observed for G-quadruplexes. When folding, quadruplex molecules are prone to create more syn-anti steps and avoid the unfavorable anti-syn and syn-syn steps. The energetic analysis brings us a step closer to understanding the effect of G-tract lengths on quadruplex folding and polymorphism. To further elucidate the folding rules underlying G-quadruplexes more work is needed to investigate the effect of the intervened loop sequence; and this work is currently in progress in our labs. Additional lessons learned relate to the inability of the current AMBER nucleic acid force fields to describe the noncanonical backbone geometries in the anti-syn and syn-syn steps which could lead to errors regarding the magnitude of the stability differences between four glycosidic steps. However, considering the consistency of the calculation results with experimental observations, the deficiencies are unlikely to affect the relative stability order. 67 Figure 3.1. Six two-quartet models used in this work. (a) SA-aabb, (b) SA-abab, (c) SA-aaab, (d) AA, (e) AS, (f) 3AA+1SS. Yellow is for syn and blue is for anti. The channel cation (K+) is not shown. Figure 3.2. The hydrogen bond formed with a high occupancy within the 5´-terminal syn guanosine during the MD simulation (a) (b) (c) (d) (e) (f) 68 Figure 3.3. RMSD curves for the SA-aabb (black), AA (red), AS (green) and 3AA+1SS (blue) model simulations 69 Figure 3.4. three-quartet and four-quartet stem models tested in this work. (a) 3SAA+1SSA, (b) AAA, (c) SAA-parallel, (d) ASA-parallel, (e) SASA, (f) AAAA. Yellow is for syn and blue is for anti. The channel cations (K+) are not shown. Because the original backbone geometry of the anti-syn steps in the SASA model couldn't be kept in the MD simulation, restraints were applied to the α/γ angles of the anti-syn steps as in the two-quartet models. The MD simulations on these stem model all produced very stable trajectories. In three-quartet models, the two channel-K+ were very stable through out whole simulations. In four-quartet models, the three channel-K+ are more mobile and there were two conformations as to different channel cation arrangements: three K+ are all within the channel; two K+ are in the channel and one K+ is at the channel entrance position. (a) (b) (c) (d) (e) 70 Table 3.1. MM-PBSA results (kcal/mol) of the two-quartet stem models Model Regions used for calculation MM_ele MM_vdw MM_int PB_sur PB_cal G ΔG ΔG† SA-aabb 30-40 ns -1362.1 -41.7 340.4 11.3 -486.6 -1538.7 -32.2 -14.4 SA-abab 15-25 ns -1362.9 -40.4 342.8 11.2 -489.2 -1538.6 -32.0 -14.2 SA-aaab 15-25 ns -1366.7 -39.4 342.4 11.3 -485.1 -1537.5 -31.0 -13.2 SA-aabb-r 23-33 ns -1333.5 -43.6 353.4 11.7 -509.0 -1520.9 -14.4 -14.4 AA 30-40 ns -1348.2 -39.9 368.7 11.9 -499.0 -1506.5 0 0 AS 15-25 ns -1203.5 -41.0 358.2 11.8 -527.4 -1502.0 4.6 4.6 AS 0-300 ps -1309.4 -43.0 367.3 11.5 -518.8 -1492.5 14.0 14.0 AS-r 1.5-2.5 ps -1306.1 -44.5 370.9 11.5 -524.6 -1492.8 13.7 13.7 3AA+1SS_I (α/γ: g-/g-) 5-7 ns -1348.5 -37.8 362.0 11.6 -493.6 -1506.3 0.3 4.8 3AA+1SS_I (α/γ: g-/g+) 10-12 ns -1341.1 -41.1 359.2 11.6 -501.7 -1513.0 -6.4 -1.9 3AA+1SS_II (α/γ: g-/g+) 5-7 ns -1345.4 -41.2 363.1 11.6 -499.4 -1511.3 -4.8 -0.3 3AA+1SS_II-r (α/γ: g+/g-) 5-7 ns -1336.9 -42.1 362.7 11.6 -501.4 -1506.2 0.4 4.9 Note: As discussed, solute entropic contributions were not included. †: Contributions from the H-bonds formed at 5' syn-dG are excluded ( -4.5 kcal/mol for each H-bond). MM_ele, MM_vdw and MM_int represent the electrostatic potential, van der Waals potential and the internal (bond, angle, dihedral angle) potential energy respectively. PB_sur stands for the non-electrostatic solvation energy, and PB_cal is the electrostatic solvation energy. G is the estimated absolute free energy of each model, which is the sum of the above five energy components. 71 Table 3.2. MM-PBSA results (kcal/mol) of the three-quartet stem models Model regions used for calculation MM_ele MM_vdw MM_int PB_sur PB_cal G ΔG ΔG† 3SAA+1SSA 0-100 ns -1959.4 -97.0 536.0 14.5 -1092.4 -2598.2 -25.9a -7.9 3SAA+1SSA 0-5 ns -1980.5 -99.7 542.7 14.2 -1065.0 -2588.3 -20.6b -2.6 AAA 0-100 ns -1968.9 -100.6 557.2 14.8 -1074.8 -2572.3 0 0 AAA 0-5 ns -1965.4 -100.2 559.7 14.9 -1076.7 -2567.7 0 0 SAA-parallel 0-50 ns -1995.9 -97.2 534.8 14.7 -1058.5 -2602.2 -29.9a -11.9 SAA-parallel 0-5 ns -1996.5 -99.0 539.1 14.6 -1055.8 -2597.5 -29.8b -11.8 In the (3SAA+1SSA) model, the original backbone geometries (α/γ=g-/g-) of the syn-syn step were kept for 5.4 ns. Free energies were estimated both for the whole trajectory and for the 0-5 ns region for each model. a: relative to the free energy of AAA(0-100 ns); b: relative to the free energy of AAA (0-5 ns). Table 3.3. MM-PBSA results (kcal/mol) of the four-quartet stem models Model regions used for calculation MM_ele MM_vdw MM_int PB_sur PB_cal G ΔG ΔG† SASA_r a 18-28 ns -2349.8 -164.7 726.5 17.2 -1880.7 -3651.5 -22.0 -4.0 SASA_r b 5-15 ns -2337.2 -164.6 726.4 17.3 -1898.1 -3656.2 -23.2 -5.2 AAAA a 8.3-10.4 ns -2388.2 -164.6 750.0 17.7 -1844.3 -3629.5 0 0 AAAA b 10.5-20 ns -2355.5 -163.3 746.5 17.9 -1878.6 -3633.0 0 0 In four-quartet stem models, the three channel K+ are more mobile, and one of the terminal K+ frequently moved to the channel entrance position, so free energies were estimated for these two conformations: a: three K+ in the channel; b: two K+ in the channel and one K+ at the channel entrance position. 72 Supporting Information Figure S3.1. After the AA G-DNA two-quartet stem model structure is disrupted, the structure adopts helix-like arrangement stabilized evidently by stacking. The shown structure is taken from interval between 79 and 81 ns. Different residues are shown in different colors. Figure S3.2. The change of the van der Waals potential energy (kcal/mol) versus time (ps) during the 100 ns MD simulation of the AA model. 73 Table S3.1. Free energy calculations with solute entropy contributions included nmode quasiharmonic Model ΔG† -TS -TΔS ΔG -TS -TΔS ΔG SA-aabb -14.4 -227.1 7.5 -6.9 -66.2 5.8 -8.6 AA 0 -234.6 0.0 0.0 -72.0 0.0 0.0 AS-r 13.7 -228.5 6.1 19.8 -70.6 1.4 15.1 3AA+1SS_I (α/γ: g-/g-) 4.8 -231.0 3.6 8.4 -71.6 0.4 5.2 ΔG† is the relative free energy compared to the AA model without the entropy contribution. Two methods were used to calculate the solute entropy: nmode and quasiharmonic.167 Following the same method in the Result section, when entropy contributions are included, the relative stabilities of the four types of step are: syn-anti (-2 kcal/mol) > anti-anti (0 kcal/mol) > anti-syn (5 kcal/mol) > syn-syn (8 kcal/mol) if the entropy was calculated by the nmode method, or : syn-anti (-2 kcal/mol) > anti-anti (0 kcal/mol) > anti-syn (4 kcal/mol) > syn-syn (5 kcal/mol) if the entropy was calculated by the quasiharmonic method. CHAPTER 4 INSIGHT INTO TELOMERE STRUCTURAL POLYMORPHISM AND G-QUADRUPLEX FOLDING FROM SEQUENCE AND LOOP CONNECTIVITY THROUGH FREE ENERGY ANALYSIS Note that this chapter has been submitted for publication in the Journal of the American Chemical Society, authors Xiaohui Cang, Jiří Šponer, and Thomas E. Cheatham, III. All of the work was performed by Cang with paper editing and refinements by Šponer and Cheatham. Abstract The lengths of G-tracts and their connecting loop sequences determine G-quadruplex folding and stability patterns, complete understanding of the sequence-structure relationships remains unclear. Here, single-loop G-quadruplexes were simulated to investigate the effect of loop and G-tract length on the folding topologies and stability of G-quadruplexes. Seven loop types, including different variants of lateral, diagonal, and double-chain reversal loops, and five different loop sequences (dT, dT2, dT3, dTTA and dT4) were investigated through molecular dynamics simulation and free energy analysis. The results suggest: (1) The single nucleotide dT loop has a strong preference to form parallel structures; that di-nucleotide dT2 loops are compatible with 75 both parallel and anti-parallel structures; and that tri-nucleotide or longer loop sequences prefer the formation of anti-parallel structures. (2) Short loops are less stable than longer loops. The high stabilities experimentally observed for sequences with short loops likely result from multimer formation. (3) Single nucleotide loop sequences are able to form lateral-narrow loops over narrow grooves. (4) The left-handed double-chain-reversal loops are extremely unstable. (5) In MD simulation, the final loop residues frequently stack with nearby terminal quartets; this trend is less obvious in experiment due to the potential presence of loop-loop and flanking sequence interactions. Overall, in most cases the free energetic estimates consistently rank well with the experimental observations. Beyond providing insight into G-quadruplex folding, the results also suggest why human telomeric sequences predominantly form hybrid-type geometries and further help to explain the structural polymorphisms observed with some G-quadruplex structures. Moreover, consideration of the parallel/anti-parallel equilibrium and its balance with multimer formation is critical to understand G-quadruplex folding and stability. The hypotheses and explanations presented, including more general folding rules, provide insight into inconsistencies in the experimental thermodynamics data and the polymorphic behaviors of different G-quadruplex sequences. Introduction G-quadruplexes are a class of noncanonical four-stranded helical structures formed from guanine-rich DNA/RNA sequences. Four guanine bases associate through eight hydrogen bonds to form the characteristic planar structural unit called a G-quartet.114 In tetrameric quadruplexes, four separate strands of guanine repeat sequences come together to form parallel structures with all strands oriented in the same direction. In dimeric (i.e., 76 two strands) or monomeric (i.e., folded from a single sequence) G-quadruplexes, the intervening sequences between G-rich repeats form different types of loops. Depending on sequence, loop length, G-tract length, and choice of monovalent ion, a diversity of folding topologies is observed.115, 168 Human telomeres comprise thousands of tandem repeats of the d(GGGTTA)n sequence36 followed by a single-stranded overhang of 100-200 nucleotides at the extreme 3´-ends of chromosomes.38 Small molecules that stabilize G-quadruplex structure can inhibit telomere elongation and inhibit the growth of cancer cells in cases where the telomere maintenance ribonucleoprotein enzyme telomerase is usually over-expressed.169 For this reason, telomeric G-quadruplexes have drawn intense interest as therapeutic targets for the treatment of cancer.71, 72, 170-176 Genomewide sequence searches have shown a high prevalence of the putative quadruplex sequence motif G3-5N1-7G3-5N1-7G3- 5N1-7G3-5 (where N denotes any nucleotide) in the human genome,177 and such sequences are enriched in promoter regions,5 especially in promoters of proto-oncogenes.60 Experimental evidence also supports the hypothesis that G-quadruplex structures may be involved in regulating transcription, replication and translation.9, 68, 178 In combination with antisense oligonucleotides, RNA/DNA heteroquadruplexes have been shown to be effective inhibitors of reverse transcription.179 G-quadruplexes can also self-assemble to form higher order structures, so there is a growing interest in nanotechnology applications.72 In parallel monomeric and dimeric G-quadruplexes, the four strands forming the G-quartet stem all orient in the same direction and because of this only double-chain-reversal connecting loops (denoted as R loops in this study) are possible (connected 3´ to 77 5´ on opposite sides of the quadruplex stem). In anti-parallel G-quadruplexes, at least one strand orients oppositely to the other strands, and three types of loops are observed: lateral (L) loops connecting adjacent strands at one terminal quartet, diagonal (D) loops spanning across the quartet and the above-noted R loops that connect opposite ends of the quadruplex (see Figure 4.2). Besides these loop types, the so-called V-shaped loop has also been reported in G-quadruplexes formed from sequences with unequal number of guanines in the G-tracts.137, 140, 180 However, for simplicity here we consider this arrangement as a specific stem scaffold rather than a standard quadruplex loop type, so it will not be discussed further. For more convenient description of different loop topologies, we introduce nomenclature to distinguish different G-quadruplexes. This simple nomenclature differs somewhat from that proposed by Webba da Silva181 and describes: right-handed vs. left-handed loops, parallel vs. anti-parallel double-chain-reversal loops and right-handed vs. left-handed progression directions. Facing one side of a G-quadruplex and considering one of the strands with its 5´→3´ direction pointing upward, if the other strand on the face is on the first strands right side, the connecting loop is defined as right-handed loop (Figure 4.1(a) and Figure 4.1(c)); if the next strand is on its left, the loop is defined as left-handed loop (Figure 4.1(b) and Figure 4.1(d)). In G-quadruplexes, groove width is determined by the relative orientations of the two adjacent strands: if the two strands are oriented in the same direction, a medium-sized groove forms (Figure 4.1(c) and Figure 4.1(d)); if two anti-parallel strands are arranged in a right-handed (or clockwise) order a wider groove forms (Figure 4.1(a)); and if two strands are arranged in a left-handed (or anti-clockwise) manner a narrower groove arises (Figure 4.1(b)). Therefore, all R loops 78 span middle grooves, all right-handed L loops span wide grooves (denoted as Lw loops, Figure 4.1(a)) and all left-handed L loops span narrow grooves (Ln loops, Figure 4.1(b)). Note that a double-chain-reversal (R) loop may span a different number of quartet layers, and thus Rn denotes an R loop spanning "n" quartet layers. In parallel G-quadruplexes, all stem guanines display anti glycosidic bond orientation angles, so an R loop connects two anti dG nucleotides. However, in anti-parallel G-quadruplexes, each G-tract always starts with a syn dG and ends with an anti dG. Therefore, the R loop connects an anti dG on the 5´ side with a syn dG on the 3´ side. The different glycosidic bond orientation angles of the guanine strands connected to the 3´ side of the R loops between the parallel and anti-parallel G-quadruplexes strongly influence-and very likely lead to significant differences-in the respective loop structures and stabilities. Therefore both parallel double-chain-reversal loop (donated as Rnp) and anti-parallel double-chain-reversal loop (denoted as Rna) were investigated in this work. A monomeric G-quadruplex may progress in two opposite directions, and we define a right-handed progression if the first loop (from the 5´-end) is right-handed and a left-handed progression if the first loop is left-handed (Figure 4.1(e) and Figure 4.1(f)). To better understand this nomenclature, refer to Figure 4.1 and Figure 4.2. Previous CD spectroscopy, UV thermal melting studies, and energetic analyses have shown the significant influence of loop length on the stability and topology of intra-molecular G-quadruplexes.182-192 Additionally, bioinformatics and statistical methods have been applied to understand G-quadruplex stability as a function of sequence.193 Some trends can be generalized: (1) Parallel structures are favored by short loops; (2) longer loops favor anti-parallel conformations; (3) sequences with single-nucleotide 79 loops usually show extremely high melting temperatures; and (4) increases in the loop length usually lead to decreased melting temperatures. However, a single guanine-rich sequence may populate multiple folding conformations in solution, and sometimes two or more G-quadruplexes may associate to form higher order structures (multimers).168, 185 Such structural polymorphisms complicate kinetic and thermodynamic analyses and often make the experimental results difficult to interpret and explain. For example, although G-quadruplexes containing only single-nucleotide loops show high stability, it has also been shown that for quadruplexes with total loop length of 4-5nt, the presence or addition of single-nucleotide loop leads to reduced stability.187 To better understand the biological roles of G-quadruplexes and their sequence dependent structures, accurate prediction of the predominant G-quadruplex conformations for a given sequence is of high importance. This has been attempted through active learning procedures on databases of known G-quadruplex structures with some success.193 However, given the polymorphic structure, dependence on ion identity, and strong sequence dependence, free energetic based approaches have the potential to be more successful. For example, sampling of different model and experimental conformations of nucleic acids using molecular dynamics (MD) simulations in explicit solvent with subsequent free energetic analyses have already proven useful for investigating the structure and stability of a wide variety of nucleic acid molecules194-197 and such technologies have been widely applied to provide useful adjunct information to experiment for a variety of G-quadruplexes.113 For example-thro |
| Reference URL | https://collections.lib.utah.edu/ark:/87278/s6697j7w |



