Determination of electric fields along vibrational probes in photoactive yellow protein

Publication Type	honors thesis
School or College	College of Science
Department	Chemistry
Faculty Mentor	Matthew T. Kieber-Emmons
Creator	Zagorec-Marks, Chase
Title	Determination of electric fields along vibrational probes in photoactive yellow protein
Date	2020
Description	It has been known for some time that proteins are constantly in motion and that this motion likely impacts the protein's function1. There remains a lack of knowledge of how this motion and the transition from one conformation to another occurs and results in this altered functionality. To better understand this transition, several theories have been proposed2, but experimental work has not yet been able to determine which, if any, is best supported. In this work, molecular dynamics (MD) simulations using the software package AMBER were used to correlate the dynamics of Photoactive Yellow Protein (PYP) to shifts in the frequency of five genetically incorporated vibrational probes3,9. This was done by calculating the electric field present at the center of mass of these probes across the trajectories of these simulations, and performing a linear least squares regression optimization of these fields as a function of experimentally measured shifts in IR absorption spectra for each probe. These simulations correlated well (R2 = 0.91) to the observed experimental shifts in four of the five probes, indicating that these simulations were representative of the PYP's dynamics.
Type	Text
Publisher	University of Utah
Subject	protein dynamics; molecular dynamics simulations; photoactive yellow protein
Language	eng
Rights Management	(c) Chase Zagorec-Marks
Format Medium	application/pdf
ARK	ark:/87278/s66kr83f
Setname	ir_htoa
ID	2964406
OCR Text	Show DETERMINATION OF ELECTRIC FIELDS ALONG VIBRATIONAL PROBES IN PHOTOACTIVE YELLOW PROTEIN by Chase Zagorec-Marks A Senior Honors Thesis Submitted to the Faculty of The University of Utah In Partial Fulfillment of the Requirements for the Honors Degree in Bachelor of Science In Chemistry Approved: ______________________________ Matthew T. Kieber-Emmons, PhD Thesis Faculty Supervisor _____________________________ Matthew S. Sigman, PhD Chair, Department of Chemistry _______________________________ Thomas G. Richmond, PhD Honors Faculty Advisor _____________________________ Sylvia D. Torti, PhD Dean, Honors College May 2020 Copyright © 2020 All Rights Reserved ii ABSTRACT It has been known for some time that proteins are constantly in motion and that this motion likely impacts the protein’s function1. There remains a lack of knowledge of how this motion and the transition from one conformation to another occurs and results in this altered functionality. To better understand this transition, several theories have been proposed2, but experimental work has not yet been able to determine which, if any, is best supported. In this work, molecular dynamics (MD) simulations using the software package AMBER were used to correlate the dynamics of Photoactive Yellow Protein (PYP) to shifts in the frequency of five genetically incorporated vibrational probes 3,9. This was done by calculating the electric field present at the center of mass of these probes across the trajectories of these simulations, and performing a linear least squares regression optimization of these fields as a function of experimentally measured shifts in IR absorption spectra for each probe. These simulations correlated well (R2 = 0.91) to the observed experimental shifts in four of the five probes, indicating that these simulations were representative of the PYP’s dynamics. iii TABLE OF CONTENTS ABSTRACT ii INTRODUCTION 1 METHODS 7 RESULTS 12 CONCLUSION 18 REFERENCES 19 SUPPLEMENTARY INFORMATION 21 1 INTRODUCTION PYP is a protein found in the bacterium Halorhodospira halophila whose chromophore undergoes a trans- to cis-isomerization upon absorption of blue light (446 nm) which results in a negative phototactic response by the bacterium4. This isomerization also leads to a proton transfer from Glu46 to the chromophore as the rest of PYP transitions to its excited state, and another proton transfer from the chromophore to Glu46 as PYP transitions back to its ground state5. However, the process by which PYP transitions from the ground state to its excited state is still largely unknown. It is for this reason that MD simulations were utilized to better understand this process and develop a methodology for monitoring this motion experimentally by shifts in vibrational absorption spectra. The simulations used here were completed using the MD software package AMBER. While there are other computational methods that have higher levels of accuracy than MD simulations (such as quantum mechanical computational programs), these methods are far more computationally intensive, requiring orders of magnitude more time to determine the mechanism by which processes occur. Such methods are primarily implemented for situations in which the molecule of interest is composed of less than one hundred atoms. MD simulations, on the other hand, are less accurate but far less computationally intensive. This is because these simulations make use of Newton’s second law which states that the change in an object’s momentum with respect to time is equal to the total force acting on said object. The forces used in these calculations depend solely on the positions, velocities, and physical properties of atoms that compose the molecule of interest. As a result of this, all that one needs to perform such a simulation is 2 an initial arrangement of atoms, velocities for each atom, masses of each atom, and the distribution of electrical charges across the molecule. Once these components have been determined all that remains for the simulator to decide is which computational algorithm to make use of (e.g. a fourth order Runge-Kutta, Verlet, or other method), and the specific conditions under which the simulation is run (e.g. pressure, volume, temperature, environment, and other considerations). Proteins and other biomolecules are the perfect objects for investigation with MD simulations as their large sizes require incredible amounts of computational power to determine the dynamics over just a microsecond (10-6 second) period, even by simple computational methods. This same feat done by a quantum mechanical computational program would require so much time, given the current computing power of GPUs and CPUs, that by the time the computation had completed all the original investigators would be long gone. There are some workarounds to this in which the two methods are used but on different scales. For instance, most of the system (molecule of interest and the solvent surrounding it) is treated classically (i.e. with an MD simulation), and then a smaller region is treated quantum mechanically6. This combination allows for low accuracy in regions of lower interest and higher accuracy for the regions under the simulator’s investigation. A similar process can be used to determine the electrostatic properties in regions of interest. In this work and the work done by Webb7 and coworkers, these regions of interest were genetically incorporated nitrile probes. Specifically, the work done by Webb focused on measuring electrostatic properties of guanosine triphosphate binding proteins, artificially placed probes, and the correlation of these probe’s properties to 3 experimentally observed shifts in vibrational absorbance spectra. However, as the work done here only focused on the correlation of changes in the electric field of probes to experimentally observed shifts, that aspect of Webb’s work will be the main focus here. Currently, it is only possible to experimentally measure the shift in absorbance spectra and not directly measure the electric field, Webb developed a method for determining the electric field along the genetically incorporated probe from computational simulations. This method involves three calculations: one that is straightforward, and two that share an explanation but require a more in-depth discussion. Once the electric field is found, it is possible to use the reasoning provided later to determine how to make the comparison between experiment and simulation. How then does Webb’s method work? Its first calculation is easy to understand as it relies on Coulomb’s Law for Point Particles (see Equation 9 and note that this is the field form of the equation). This law states that the force acting on two charged particles is proportional to the product of their charges and inversely proportional to the square of their separation distance, and that the direction of said force is along the vector connecting the two charges. The law also follows the principle of superposition (i.e. the total electrical force acting on a charge is the sum of all electrical forces acting on that charge), which allows us to treat the interactions between each pair of charges as independent of the other charges present. Thus, the first calculation in Webb’s method is merely determining the contribution of the protein on the electric field at the center of the nitrile bond. The remaining two calculations are more complicated, but the theory behind them is simple when broken down. Specifically, these two calculations rely on numerical solutions to the Poisson-Boltzmann equation (PBE). Only a brief overview of the 4 rationale behind this equation will be provided here for brevity’s sake (for a more indepth discussion see Holst8). Imagine a ball of some material that is large enough in size that the atoms that compose it are incredibly tiny in comparison. For such a system, we can make use of the continuous form of Coulomb’s Law (Equation 1). ′ ′ 𝜌(𝑟 )(𝑟−𝑟 ) 𝐸⃗ (𝑟) = ∫𝑉′ 4𝜋𝜀 \|𝑟−𝑟′\|3 𝑑𝑉′ 𝑜 (Equation 1) Note that the difference between this form and the form for individual particles is that in the place of a finite sum, the equation has an integral, and in place of individual charges, there is now a continuous distribution of charges denoted by 𝜌(𝑟′) (In other words, each position, denoted by 𝑟′, corresponds to some tiny amount of charge belonging to our ball of volume, 𝑉′). If we then make use of the fact that electric fields are the negative gradient of electric potentials (𝜙), it is possible to take the gradient of Coulomb’s Continuous Law to obtain Poisson’s equation (Equation 2). ∇2 𝜙 = ∆𝜙 = 𝜌(𝑟) 𝜀𝑜 (Equation 2) This tells us that if we can find a solution to the Poisson equation for some given charge distribution, 𝜌(𝑟), then we can determine the electric field at any point that is caused by this distribution. Notice, however, that this requires us to have a charge distribution in the first place. This is where the Boltzmann portion of the PBE comes into play. Specifically, this model designates three regions of different charge distributions. The first is the molecule of interest itself (PYP in this case) which has its charge distribution provided by the simulator (AMBER). The second is called the mobile ion exclusion zone, which is a small region of space just off the surface of the molecule of interest in which there is no charge present. The third region is where all the mobile ions and solvent molecules are located. The charge distribution of this region is modeled by 5 assuming that the mobile ions that compose it are distributed according to the Boltzmann distribution (Equation 3). − ⃗) 𝑈(𝑟 𝑃 ∝ 𝑒 𝑘𝐵𝑇 (Equation 3) In other words, the probability of observing a mobile ion at some point outside the region filled by the protein is proportional to Euler’s number raised to negative power of the energy corresponding to the ion being at that location divided by Boltzmann’s constant multiplied by the absolute temperature. Put simply, the mobile ions present will be found in areas that correspond to low energies more often than those that correspond to high energies. This is in essence the main idea behind the PBE. There are complications that arise with determining where the first and second regions end and begin, how to treat dielectrics, and factors such as polarization—but those will not be discussed here. This leaves us with the question of how to connect the results of these calculations with experimentally observed phenomena. For this we have the following reasoning. It is well known that the energy of an electric dipole (U) in an external electric field is given by the equation: 𝑈 = −𝑝 ∙ 𝐸⃗ (Equation 4) Where 𝑝 is the polarization of the dipole and 𝐸⃗ is the external electric field. Since polar chemical bonds are electric dipoles, we can describe the energy (classically) by Equation 4. These bonds behave like springs, stretching and compressing. As the atoms involved in the bond move at the end of these “springs”, there is a resultant emission of electromagnetic waves with a frequency corresponding to the natural frequency of the bond. We can determine the energy of this light by Equation 5: 6 𝑈 = ℎ𝑓 (Equation 5) Where h is Planck’s constant and f is the frequency of the emitted light. It is possible to excite the electrons within a bond by irradiating the molecule with light of a similar frequency. In the case of PYP’s chromophore, this excitation results in an overall structural change in PYP and a change in the frequency of nitrile bond of our probe. By equating Equations 4 and 5 we can approximate how the polarization of the bond changes. This gives Equation 6: ℎ∆𝑓 = −∆(𝑝 ∙ 𝐸⃗ ) (Equation 6) We can make use of light’s particle-wave duality by the relation 𝑓 = 𝑐𝑘, where c is the speed of light and k is the wavenumber characteristic of the wave in question, to rewrite our equation as Equation 7: ℎ𝑐∆𝑘 = −∆(𝑝 ∙ 𝐸⃗ ) (Equation 7) In this work we treat the polarization to be static. Therefore, we can rewrite Equation 7 in terms of the Stark tuning rate (𝜇) by the relation (𝑝 = ℎ𝑐𝜇 ), and obtain Equation 8: ∆𝑘 = −𝜇 ∙ ∆𝐸⃗ (Equation 8) This final equation allows us to relate experimental results (i.e. ∆𝑘) to changes that we see in our simulations (i.e. ∆𝐸⃗). All that remains is how to determine ∆𝐸⃗. 7 METHODS In this work, the AMBER16 Molecular Dynamics Software Package was used in all simulations. The starting structures of the light and dark state were obtained from the RCSB online protein data bank (PDB). The structures chosen to represent the dark state came from the 3phy NMR solution data and 5hd3 X-ray diffraction data. The structures chosen to represent the light state came from the 2kx6 NMR solution data and 5hd5 Xray diffraction data. NMR and X-ray structures were both used in these simulations as prior to the investigation it was unclear which would be most representative of experimental results. Experimental data on vibrational peak shifts were obtained by PhD candidate Sarah Lefave (for full details of the processes of mutation, growth and measurements of PYP, see Lefave9). The motions of the N-terminal arm of PYP are hypothesized to play a key role in PYP’s interaction with its putative partner protein. Furthermore, dynamics at the Nterminus are thought to be essential to the transition between the ground and excited state. Initial simulations were conducted to determine the time required for this transition to take place. This was done by performing a simulation on the first eleven residues of PYP. Specifically, this simulation was conducted over a period of 100 ns at a temperature of 300 K, as determined by a Langevin Thermostat with a collision frequency of 5 ps -1, in an NPT ensemble. This system was treated with an implicit solvent using the modified generalized Born model10,11 at a constant pH of 5.0. The system was minimized using 1000 steps of maximum descent (steepest descent algorithm in AMBER) followed by 9000 gentle descent (conjugate gradient method in AMBER) steps with a 10 kcal/mol∙A2 restraint on the backbone of the chain. This minimization was followed by linearly 8 heating the system from 0 K to 300 K over the course of 500 ps with a 1 fs time step, and allowed to fluctuate with a restraint of 5 kcal/mol∙A2 on the backbone for an additional 500 ps. This heating step was followed by an equilibration stage for which the system was maintained at 300 K, had the backbone atoms restrained with a restraint of 0.1 kcal/mol∙A2, and conducted over a period of 1 ns using a 1 fs time step. Finally, the system was allowed to fluctuate for an additional 100 ns using a 1 fs time step. Further simulations were conducted on the N-terminal arm to determine at which pH the whole protein ought to be simulated at. This was done by making use of the constant pH functionality of the Amber software package. Simulations were conducted at pH values of 5, 6, 7, 8, and 9. Since the initial structure was obtained at a pH of 5.0, incremental simulations were performed to slowly increase the pH of the structure. This was done by using the final structure of a simulation as the starting structure for the next simulation that had its pH value 0.02 pH greater than the previous simulation. These incremental simulations were allowed to run for a period of 1 ns and the simulations at the integer values of pH were run for a period of 60 ns. Unlike the other simulations conducted on the arm, these were done in a truncated octahedron water box with a minimum distance of 8 Å from an atom in the arm to the edge of the box. All simulations performed on the N-terminal arm were analyzed by measuring the end to end distance of the chain and the solvent accessible surface area. Prior to determining the shifts in the electric fields of the genetically incorporated vibrational probes across simulations from many unique starting structures, preliminary results from the NMR and X-ray models were compared. Specifically, the first structure of the PDB file for each method was simulated and had its average electric field across 9 that trajectory compared to the average electric fields of the others (see the following paragraphs for how this was done). This was done to determine whether choosing an NMR solution model or an X-ray model had any effect on the calculated electric field shift, and if there was an effect which method—NMR or X-ray—provided more accurate results. It is possible that these comparisons may be dependent solely on the probe that is chosen to have its calculated field compared. Thus, it was necessary to perform this comparison for every probe and ensure that the information provided by one probe was consistent with the results of the other probes. To computationally determine the shifts in the electric fields of the genetically incorporated vibrational probes, the motions of ten starting structures of both the ground state PYP (3phy) and excited state PYP (2kx6) over a 24 ns period at a temperature of 300K, as determined by a Langevin Thermostat with a collision frequency of 5 ps -1, in an NPT ensemble. The overall system was composed of a TIP3P truncated octahedron water box, with a minimum distance of 8 Å from an atom in the protein to the edge of the box. Cl- and K+ ions were added to neutralize the charge of the system and produce a final salt concentration of 0.1M. Nitrile probes were added manually to residues Phe6, Phe28, Phe79, Phe96, and Phe121. The charges of these probes and the p-coumaric acid chromophore were defined at the HF/6-13G+(d,p) level of theory, and the Generalized Amber Force Field (GAFF) was used as their force constants. The force constants of the remainder of the protein were determined by the ff14SB force field. The system was minimized in two steps. The first minimization was used to minimize the locations of solvent molecules and remove bad bonds, specifically this minimization involved 10,000 steps of maximum descent followed by up to 90,000 steps of gentle 10 descent, all the while having a restraint of 900 kcal/mol∙A2 on the protein. The second minimization was used to minimize the protein and remove bad bonds present; this was done by a similar set up of steps but this time with a restraint of 10 kcal/mol∙A2 only on the backbone atoms. After this step, the system was treated as an NVT ensemble—to let the pressure of the system adjust so that the overall energy remained as minimized as possible— and linearly heated from 0K to 300K over a period of 0.67 ns. The system was then allowed to fluctuate at this temperature for another 0.33 ns prior to running the equilibration calculation. This equilibration calculation was run as an NPT ensemble using a restraint of 1 kcal/mol∙A2 on the backbone atoms for a total of 5 ns. It was after this that the simulation was run, the difference between the two being that the simulation step had no restraints on the backbone. The heating, equilibration and simulations steps all made use of a time step of 0.5 fs. From these simulations, intermediate structures were taken every 90 ps and had four calculations conducted on them to determine the electric field at the center of mass of the nitrile bond. The first two calculations performed by the Adaptive PoissonBoltzmann Solver (APBS) were done to determine the solvent field effects. This was done by calculating potentials at seven points, corresponding to chargeless, massless dummy atoms, along the bond in both calculations. Specifically, these calculations involved a two-stage focusing method; one course solution for the entire protein, and a finer solution centered on the center of mass of the nitrile bond. The boundary conditions of this setup were chosen by the single Debye-Hückel boundary conditions as the boundary of the course box was set to be roughly 50 Å away from the protein. Grid sizes 1 for the finer box were set to 30 Å to minimize numerical errors12. Once these boxes were 11 established, the linear Poisson-Boltzmann (PB) equation was numerically solved using a linear spline method (This was chosen to allow for the use of chargeless, massless, dummy atoms to determine the potential along the nitrile bond). The only differences between the two calculations were the dielectrics used. In the first, a solvent dielectric of 80 and a solute dielectric of 2 were used. In the second, the solvent and solute dielectrics were both set to 2. The potentials of the first minus the potentials of the second provided the effects of the solvent on the potential of the bond. To find the solvent’s effect on the electric field, the negative gradient of the potential along the bond was calculated using linear regression. This was done by taking the potential of the three dummy atoms nearest the center of mass of the nitrile bond, and performing a linear least-squares regression minimization of these potentials as a function of distance in angstroms. To determine the effects of the protein on the electric field at the center of mass of the nitrile bond, a direct Coulombic field calculation was performed using a simple Python script. These two results are then combined to form the total electric field at the center of the nitrile bond. The fourth calculation was a purely Coulombic field calculation for which the electric field along the nitrile bond was determined by the following: 𝑞 𝑟̂ 𝑞 𝑟̂ 𝐸⃗ = ∑𝑤𝑎𝑡𝑒𝑟𝑠 4𝜋𝜀 𝜀𝑖 𝑟 2 + ∑𝑝𝑟𝑜𝑡𝑒𝑖𝑛 4𝜋𝜀 𝜀 𝑗 𝑟 2 𝑜 𝑤𝑎𝑡 𝑖 𝑜 𝑝𝑟𝑜𝑡 𝑗 (Equation 9) This was done to check the efficiency of the method developed by Webb and coworkers and to determine whether it provides a higher level of accuracy in predicting experimental shifts. Once these calculations were run, it was possible to plot the difference in the electric fields of the excited and ground states versus the observed shift 12 in the bond’s vibrational peak to obtain the Stark tuning rate of our probe. In Figure 6, the average difference for each of the mutants is plotted with their respective errors. RESULTS The data obtained for the simulations of the N-terminal arm indicates that this particular region in PYP alternates between ‘globular’ and ‘elongated’ forms on the order of ~20 ns. This is best seen by Figure 1 in which the solvent accessible surface area is plotted as a function of time across the entirety of the 100 ns trajectory. This periodic motion is also partially seen in the plot of the end to end distance of the N-terminal arm across this same trajectory (Figure 2), though it is less distinct. This is discrepency is due to the fact that it is possible for this distance to be small and the surface area to be either large or small; whereas, for a large distance the surface area is necessarily large as well. For example, consider a length of string in the shape of a horseshoe and this same piece of string crumpled into a ball. The horseshoe structure has roughly the same end to end distance but a much larger surface area. Figure 1: Plot of solvent accessible surface area of the N-terminal arm across the 100 ns trajectory. It was determined from this plot that a period of at least 20 ns is necessary to properly capture the extent of motions of this portion of PYP. 13 Figure 2: Plot of end to end distance of the N-terminal arm across the 100 ns trajectory. This plot provides support for the interpretation of Figure 1. Discrepencies between the two plots arise from the geometrical properties being measured. Time is plotted along the x-axis in units of 0.1 ps. In addition to this, it was found in the pH simulations of the N-terminal arm that these transitions occur at longer periods of time as the pH of the system increases. The explicit water model further reduced the speed of this transition. This effect can be seen by the longer period (and apparent loss of periodicity) of transition in the plot of the solvent accessible surface area for the 5.0 pH system (Figure 3). This is most likely explained by the requirement that the water molecules present in the explicit model need to translationally diffuse out of the path of the arm; whereas, in the implicit model there is no diffusion requirement and the arm is free to move unimpeded. The increase in transition time as the pH of the system increases can be seen best in the plots for a pH value of 8 (Figure 4). This is most likely a result of the protonation states of the residues not explicitly being changed during these simulations. As the pH is increased these protonation states become less stable and other protonation states become more favorable. Therefore, to ensure that transitions occur in a timely fashion across future simulations a pH value of 5.0 was used. 14 Figure 3: Plot of solvent accessible surface area of the N-terminal arm across a 60 ns trajectory in an explicit water box at a constant pH value of 5.0. Time is plotted along the x-axis in units of 0.5 ps. Notice the difference between this plot and that of Figure 1 in which an implicit model was utilized. Figure 4: Plot of solvent accessible surface area of the N-terminal arm across a 60 ns trajectory in an explicit water box at a constant pH value of 8.0. Time is plotted along the x-axis in units of 0.5 ps. The sharp increase in surface area at the end of the trajectory was not studied further though it may be the result of a sudden transition to an ‘elongated’ form. This could potentially correspond to an opening of a horseshoe shape. From the comparisons of the calculated electric fields of the vibrational probes for the NMR and X-ray it was found that the NMR models were more representative of experimentally observed shifts. These comparisons revealed that the two X-ray models (both the light and dark) consistently had 95% confidence intervals that contained the average field of the dark state NMR. In other words, there is no reason to assume that these three models are providing unique information. This overlap is best represented by the results obtained for the Phe28 vibrational probe (shown in Table 1, for additional comparisons see SI1). In addition to these, an overlay of the average structure across the trajectories for these various models was made and found to support this conclusion (see 15 Figure 5, for RMSD plots that further support this see SI2). Thus, all the remaining simulations that were used to determine the shifts in electric field from the dark to light structure were done using the NMR solution models. Table 1: 95% confidence intervals of the difference in average calculated electric fields between various protein models for the 28F mutant. The NMR solution models (3phy and 2kx6, dark and light respectively) display a noticeable difference in average field. The X-ray diffraction models, on the other hand, display no difference between amongst both themselves and the 3phy model. Thus, to avoid performing unnecessary calculations, only the 2kx6 and 3phy models were used to determine changed in the electric field. Figure 5: Overlay of the average structures obtained from 24 ns trajectories of the 5hd5 (yellow), 5hd3 (red), and 3phy (blue) protein models. The plot of the calculated change in the electric field vs. experimentally observed shift (Figure 6) suggests that the method utilized is very effective (R2 = 0.91) at modeling probes located in regions that are not solvent-exposed. However, its applicability to solvent-exposed residues is low (as seen by the large deviation of the 6F mutant from the 16 linear-least squares regression line) due to its lack of accounting for polarization and nonlinear dielectric behavior. This is because the APBS calculation relies on an implicit, continuous model of the solvent with ions (of positive and negative one elementary charge in a 1:1 ratio) distributed throughout the solvent according to a Boltzmann distribution. From this setup, APBS determines a numerical solution to the resultant Poisson equation thereby only capturing the electrical components of the bond in question. This method, and other continuum model methods, are far from perfect and are subject to many flaws (for a more in-depth discussion see Gunner and Baker17). However, this method does obtain a higher level of accuracy to that obtained by a direct coulombic calculation (albeit at a rate ~1000x slower than the direct calculation). This result can be seen by the R2 value obtained by the latter in correlating calculations to shifts as shown in Figure 7. Figure 6: Plot of calculated electric field (using Webb’s method) at center of mass of the nitrile bond versus the experimental shift for each probe location. The regression lines (NMR/NMR blue, XTAL/XTAL yellow, XTAL(L)/NMR(D) red, and XTAL(D)/NMR(L) green) were produced using a weighted linear-least squares 17 regression method weighted on the calculated field error. Vertical error bars were determined by the standard error of the field's distribution for each mutant, and horizontal error bars were determined by fits produced for experimental data. Figure 7: Plot of calculated electric field (using a purely Coulombic calculation) at center of mass of the nitrile bond versus the experimental shift for each probe location. The regression line (seen in blue) was produced using a weighted linear-least squares regression method weighted on the calculated field error. Vertical error bars were determined by the standard error of the field's distribution for each mutant, and horizontal error bars were determined by fits produced for experimental data. Additionally, it was found that one must be careful about using only a few starting structures in their simulations. A sample of five starting structures obtained from the 3phy file produced results that agreed with the results obtained from a larger sample of ten such structures. In the case of the 2kx6 file this was not so. In fact, in two out of the five mutants the larger sample resulted in a drastic (greater than three significant figures) change to the calculated shift (the mutants affected most were the 6F and 79F with both originally not falling in line with their counterparts). One explanation is that the starting structures of the ground state (3phy) did not have enough energy to transition to the excited state and were thus ‘stuck’ in a handful of ground-state-like states. This would lead to convergence of all the ground state models to a narrow range of calculated fields throughout the simulation. The excited state (2kx6) models, on the other hand, started with greater energies, and could access more states which resulted in the trajectories 18 varying greatly from model to model. Thus, they converged to more disparate fields thereby requiring a larger sample size to capture the overall trend. CONCLUSION In this work, the shifts in IR absorbance spectra of five genetically incorporated vibrational probes in PYP were correlated with electric fields determined from MD simulations. There was a strong correlation between four out of the five probes which suggests that the calculations done here are representative of the observed properties of PYP. Thus, this work serves as a step towards better understanding of the dynamics not only in PYP but also in other proteins. The methods used here can be applied to other protein systems and provide insights into how to monitor protein dynamics using vibrational spectroscopy in conjunction with MD simulations. 19 REFERENCES 1. Vinson, V. J. “Proteins in Motion. Introduction.” Science 2009, 324, 197-197. doi:10.1126/science.324.5924.197 2. Kamerlin, S. C. L. and Warshel, A. “At the Dawn of the 21st Century: Is Dynamics the Missing Link for Understanding Enzyme Catalysis?” Protein 2010, NA-NA. doi:10.1002/prot.22654 3. Boxer, S. G. “Stark Realities†” J Phys Chem B 2009, 113, 2972-2983. doi:10.1021/jp8067393 4. Hendriks, J. (2004), Hellingwerf, Klass J., CRC Handbook of Organic Photochemistry and Photobiology. [ebook]. EBSCO Publishing 5. Imamoto, Y. Mihara, O. H., Kataoka, M. Tokunaga, F. Bojkova, N. Yoshihara, Evidence for Proton Transfer from Glu-46 to the Chromophore during the Photocycle of Photoactive Yellow Protein, Journal of Biological Chemistry, 1997, 272, 12905-12908 6. Náray-Szabó, Gábor; Oláh, Julianna; Krámos Quantum Mechanical Modeling: A Tool for the Understanding of Enzyme Reactions. Biomolecules. 2013 Sep, 3(3), 662-702. doi:10.3390/biom3030662 7. Ritchie, Andrew W.; Webb, Lauren J. Understanding and Manipulating Electrostatic Fields at the Protein-Protein Interface Using Vibrational Spectroscopy and Continuum Electrostatics Calculations. J. Phys. Chem. B. 2015, 119, 13945-13957 8. Holst, M. J. Multilevel Methods for the Poisson-Boltzmann Equation. PhD Thesis, Numerical Computing Group, University of Illinois at UrbanaChampaign, October 1993. (Published as Technical Report UIUCDCS-R-931821.) 9. Lefave, S., Zagorec-Marks, C., Lee, R., Graham, B., Brueggemeyer, M., Richards, W., Wisniewski, J., Kieber- Emmons, M. (2020). FTIR Spectroscopy Reveals Conformational Sampling as a Driver for Signaling in Photoactive Yellow Protein. Manuscript in preparation 10. Onufriev, A.; Bashford, D.; Case, D. Modification of the generalized Born model suitable for macro- molecules. J. Phys. Chem. B, 2000, 104, 3712–3720 11. Onufriev, A.; Bashford, D.; Case, D. Exploring protein native states and largescale conformational changes with a modified generalized Born model. Proteins, 2004, 55, 383–394. 20 12. Harris, R. C.; Boschitsch, A. H.; Fenley, M. O. Influence of Grid Spacing in Poisson-Boltzmann Equation Binding Energy Estimation. J. Chem. Theory Comput. 2013, 9, 3677-3685 13. Roe, Daniel R.; Cheatham III, Thomas E. PTRAJ and CPPTRAJ: Software for Processing and Analysis of Molecular Dynamics Trajectory Data. J. Chem. Theory Comput., 2013, 9 (7), 3084-3095 14. Joung, In Suk; Cheatham, Thomas E. Determination of Alkali and Halide Monovalent Ion Parameters for Use in Explicitly Solvated Biomolecular Simulations. J. Phys. Chem. B. 2008, 112 (30), 9020-9041 15. Jurrus, Elizabeth; Engel, Dave; Star, Keith; Monson, Kyle; Brandi, Jaun; Felberg, Lisa E.; Brookes, David H.; Wilson, Leighton; Chen, Jiahui; Liles, Karina; Chun, Minju; Li, Peter; Gohara, David W.; Dolinsky, Todd; Konecny, Robert; Koes, David R.; Nielsen, Jens Erik; Head-Gordon, Teresa; Geng, Weihua; Krasny, Robert; Wei, Guo-Wei; Holst, Michael J.; McCammon, J. Andrew; Baker, Nathan A. Improvements to the APBS Biomolecular Solvation Software Suite. Protein Science, 2018, 27, 112-128 16. Holst, M. Adaptive Numerical Treatment of Elliptic Systems on Manifolds. Advances in Computational Mathematics, 2001, 15, 139-191 17. Gunner, MR; Baker, Nathan Continuum Electrostatics Approaches to Calculating pKas and Ems in Proteins, Methods Enzymol., 2016, 578, 1-20. doi:10.1016/bs.mie.2016.05.052 21 SUPPLEMENTARY INFORMATION SI1: 95% confidence intervals of the difference in average calculated electric fields between various protein models for all mutants (28F provided in manuscript). The NMR solution models (3phy and 2kx6, dark and light respectively) display a noticeable difference in average field. The X-ray diffraction models, on the other hand, display no difference between amongst both themselves and the 3phy model. Thus, to avoid performing unnecessary calculations, only the 2kx6 and 3phy models were used to determine changed in the electric field. Note: 2kx6 and 2kx6F differ by two hydrogens corresponding to the protonation state of HIS3 (HIP3 for 2kx6F) and HIS108 (HIP108 for 2kx6F). 6F: Comparison 3phy - 2kx6 Experimental 3phy - 5hd5 5hd5 - 2kx6 5hd3 - 2kx6 5hd3 - 3phy 5hd3-5hd5 5hd3 -2kx6F 3phy -2kx6F 5hd5-2kx6F 95% conf int 1.7 4.5 -0.3 0.5 1.8 -1.2 -0.2 -0.02 -0.1 -1.3 96F: Comparison 3phy - 2kx6 Experimental 3phy - 5hd5 5hd5 - 2kx6 5hd3 - 2kx6 5hd3 - 3phy 5hd3-5hd5 5hd3 -2kx6F 3phy -2kx6F 5hd5-2kx6F 95% conf int 1.5 4.6 0.0 0.1 0.7 3.8 -0.6 2.4 1.9 4.7 -1.3 1.7 1.0 3.8 -0.16 2.8 -0.4 2.7 -2.6 0.4 4.4 6.6 2.5 3.4 4.7 1.6 2.7 2.9 2.6 1.6 79F: Comparison 3phy - 2kx6 Experimental 3phy - 5hd5 5hd5 - 2kx6 5hd5 - 5hd3 5hd5 - 2kx6F 3phy - 2kx6F 5hd3 - 2kx6F 5hd3 - 3phy 5hd3 - 2kx6 95% conf int 1.0 4.5 6.9 9.1 -1.0 2.2 0.6 3.7 -2.7 0.1 2.5 5.4 2.8 6.2 2.2 5.2 -0.9 2.3 1.9 5.1 121F: Comparison 3phy - 2kx6 Experimental 3phy - 5hd5 5hd5 - 2kx6 5hd3 - 2kx6 5hd3 - 3phy 5hd3-5hd5 5hd3 -2kx6F 3phy -2kx6F 5hd5-2kx6F 95% conf int 2.1 4.9 0.0 0.0 -2.8 -0.1 3.6 6.4 3.3 6.2 -0.2 2.6 -1.6 1.2 3.73 6.8 2.7 5.4 4.1 6.9 22 SI2: RMSD plots of backbone atoms for trajectories of the first models obtained from the RCSB databank. Each frame corresponds to 5 ps, notice the large RMSD values for the 2kx6 trajectories in comparison to the 5hd5, 5hd3 and 3phy trajectories. This is indicative of sampling a very different subspace of configurations that the other trajectories are not sampling. 6F: (Top to bottom, left to right: 5hd3 to 5hd3 average, 5hd5 to 3phy average, 2kx6 to 3phy average, 5hd3 to 3phy average) 23 28F: (Top to bottom, left to right: 5hd3 to 5hd3 average, 5hd5 to 3phy average, 2kx6 to 3phy average, 5hd3 to 3phy average) 24 79F: (Top to bottom, left to right: 5hd3 to 5hd3 average, 5hd5 to 3phy average, 2kx6 to 3phy average, 5hd3 to 3phy average) 25 96F: (Top to bottom, left to right: 5hd3 to 5hd3 average, 5hd5 to 3phy average, 2kx6 to 3phy average, 5hd3 to 3phy average) 26 121F: (Top to bottom, left to right: 5hd3 to 5hd3 average, 5hd5 to 3phy average, 2kx6 to 3phy average, 5hd3 to 3phy average) 27 SI3: Plot of experimental and calculated shifts in absorbance as functions of the shift in number of water molecules (from transitioning from the dark to light state), within spheres of various radii centered on the center of mass of the nitrile bond. Identities of the mutants are provided in the 10 Angstrom plot (since the absorbance shifts are invariant from one plot to the next, the order of the mutants is the same across each plot). The orange lines correspond to the linear regression fit for the experimentally observed shifts, and the blue lines to those of the calculated shifts. Note the large R2 values obtained for correlations between the experimental absorbance shifts and the shifts in number of waters near the nitrile bonds for the 3 Angstrom and 4 Angstrom cases, and the almost no-correlation for larger shells. This would suggest that the absorbance shifts observed experimentally are highly dependent on the water environment around the bonds and may be worth investigating as a way of explaining these shifts.
Reference URL	https://collections.lib.utah.edu/ark:/87278/s66kr83f