| Title | Genetic variation and the regulatory landscape governing the transcriptional response to endoplasmic reticulum (ER) stress |
| Publication Type | dissertation |
| School or College | School of Medicine |
| Department | Human Genetics |
| Author | Russell, Nicole Denise |
| Date | 2021 |
| Description | The endoplasmic reticulum (ER) stress response is the cellular response to misfolded proteins accumulated in the ER, which entails a large transcriptional response that attempts to return the cell to homeostasis called the unfolded protein response (UPR). Despite the UPR being a conserved cellular process, studies have demonstrated the UPR to be highly variable dependent on genetic background in multiple organisms. This dissertation aims to better understand the genetic mechanisms that underlie the variability within the ER stress response. Chapter 2 studies the cis- and trans- mechanisms involved in the ER stress response. We utilized different strains of an in vivo mouse model and their F1s to identify and quantify context-specific cis- and trans- regulatory mechanisms that underlie the variable transcriptional response to misfolded proteins. We found hundreds of genes that respond to ER stress in a tissue- and genotype-dependent manner. The majority of regulatory mechanisms that underlie these variable transcriptional responses are driven by cis- regulatory variation and are unique to a given tissue or ER stress state. Chapter 3 studies one potential cis- regulatory mechanism through which natural genetic variation can underlie the variable ER stress response. We created a pipeline to identify potential binding sites of the three main transcription factors of the UPR. We quantified how much genetic variation is within these sites known to influence gene expression in multiple human tissues. Finally, we utilized both expression and iv sequencing data from human cell lines experiencing ER stress to perform an eQTL analysis. With this, we identify thousands of SNPs that associate with variable changes in gene expression post-ER stress, with many of these falling within putative binding sites. The results of both of these studies push forward our understanding of how genetic variation can impact the ER stress response. The first study uses a mouse model to characterize the patterns of cis- and trans- mechanisms involved in the ER stress response. The second study further explores how one specific cis- mechanism, TF binding, can potentially underlie the ER stress variability seen in the human population. |
| Type | Text |
| Publisher | University of Utah |
| Dissertation Name | of Philosophy Department |
| Language | eng |
| Rights Management | © Denise Russell |
| Format | application/pdf |
| Format Medium | application/pdf |
| ARK | ark:/87278/s6hn0tvd |
| Setname | ir_etd |
| ID | 2100226 |
| OCR Text | Show GENETIC VARIATION AND THE REGULATORY LANDSCAPE GOVERNING THE TRANSCRIPTIONAL RESPONSE TO ENDOPLASMIC RETICULUM (ER) STRESS by Nicole Denise Russell A dissertation submitted to the faculty of The University of Utah in partial fulfillment of the requirements for the degree of Doctor of Philosophy Department of Human Genetics The University of Utah December 2021 Copyright © Nicole Denise Russell 2021 All Rights Reserved The University of Utah Graduate School STATEMENT OF DISSERTATION APPROVAL Nicole Denise Russell The dissertation of has been approved by the following supervisory committee members: Clement Chow , Chair 11/5/2021 Christopher T. Gregg , Member 11/5/2021 Julie Hollien , Member 11/5/2021 Charles L. Murtaugh , Member 11/5/2021 Aaron Quinlan , Member 11/8/2021 and by Lynn B. Jorde the Department/College/School of and by David B. Kieda, Dean of The Graduate School. Date Approved Date Approved Date Approved Date Approved Date Approved , Chair/Dean of Human Genetics ABSTRACT The endoplasmic reticulum (ER) stress response is the cellular response to misfolded proteins accumulated in the ER, which entails a large transcriptional response that attempts to return the cell to homeostasis called the unfolded protein response (UPR). Despite the UPR being a conserved cellular process, studies have demonstrated the UPR to be highly variable dependent on genetic background in multiple organisms. This dissertation aims to better understand the genetic mechanisms that underlie the variability within the ER stress response. Chapter 2 studies the cis- and transmechanisms involved in the ER stress response. We utilized different strains of an in vivo mouse model and their F1s to identify and quantify context-specific cis- and transregulatory mechanisms that underlie the variable transcriptional response to misfolded proteins. We found hundreds of genes that respond to ER stress in a tissue- and genotypedependent manner. The majority of regulatory mechanisms that underlie these variable transcriptional responses are driven by cis- regulatory variation and are unique to a given tissue or ER stress state. Chapter 3 studies one potential cis- regulatory mechanism through which natural genetic variation can underlie the variable ER stress response. We created a pipeline to identify potential binding sites of the three main transcription factors of the UPR. We quantified how much genetic variation is within these sites known to influence gene expression in multiple human tissues. Finally, we utilized both expression and sequencing data from human cell lines experiencing ER stress to perform an eQTL analysis. With this, we identify thousands of SNPs that associate with variable changes in gene expression post-ER stress, with many of these falling within putative binding sites. The results of both of these studies push forward our understanding of how genetic variation can impact the ER stress response. The first study uses a mouse model to characterize the patterns of cis- and trans- mechanisms involved in the ER stress response. The second study further explores how one specific cis- mechanism, TF binding, can potentially underlie the ER stress variability seen in the human population. iv This dissertation is dedicated to Kiera Jorgensen. Her kindness, compassion, laugh, and love of science gave me the strength I needed during my toughest years of undergraduate and graduate school. Your beautiful spirit will be missed. This dissertation is dedicated to Dr. Chow for taking a chance on me, believing in me, and mentoring me through the years in his lab. I dedicate this dissertation to my parents, Tom and Denise, for their love and support which allowed me to flourish and pursue my dreams in science. Finally, I dedicate this dissertation to my sister, Kristi, for being my best friend and support system throughout graduate school. I would not have made it through without her. TABLE OF CONTENTS ABSTRACT....................................................................................................................... iii LIST OF FIGURES ......................................................................................................... viii ACKNOWLEDGMENTS .................................................................................................. x Chapters 1. INTRODUCTION .......................................................................................................... 1 The Endoplasmic Reticulum Stress Response ........................................................... 1 ER Stress Response and Disease ............................................................................... 2 Utilizing Natural Genetic Variation and Modifier Genes .......................................... 4 Cis- and Trans- Regulatory Landscape...................................................................... 5 Transcription Factor Binding Variability................................................................... 7 References ................................................................................................................ 12 2. THE DYNAMIC EFFECT OF REGULATORY GENETIC VARIATION ON THE IN VIVO ER STRESS TRANSCRIPTIONAL RESPONSE ............................................ 18 Abstract .................................................................................................................... 18 Introduction .............................................................................................................. 19 Results ...................................................................................................................... 22 Discussion ................................................................................................................ 41 Methods.................................................................................................................... 43 References ................................................................................................................ 60 3. TRANSCRIPTION FACTOR BINDING SITE PREDICTION PAIRED WITH GTEX DATA AND HUMAN CIS- EQTL ANALYSIS .................................................. 68 Abstract .................................................................................................................... 68 Introduction .............................................................................................................. 69 Results ...................................................................................................................... 71 Discussion ................................................................................................................ 80 Methods.................................................................................................................... 85 References .............................................................................................................. 100 4. CONCLUSION ........................................................................................................... 106 References .............................................................................................................. 110 vii LIST OF FIGURES Figures 1.1. The unfolded protein response ..................................................................................... 9 1.2. Cis- and trans- regulation .......................................................................................... 10 1.3. Impacts of genetic variation on transcription factor binding ..................................... 11 2.1. Upregulation of canonical ER stress genes across genotypes and tissues ................. 50 2.2. Tissue-specific expression post ER stress response................................................... 51 2.3. Genotype-specific expression post ER stress ............................................................ 52 2.4. Variable expression of genes across genotypes post ER stress ................................. 53 2.5. Expression ratios of genes with cis- or trans- effects ................................................ 54 2.6. ER stress and tissue type reveal hidden regulatory variation .................................... 55 2.7. Impact of cis- and trans- effects on gene expression ................................................. 56 2.8. ASE and corresponding change in RNA transcript ................................................... 57 2.9. Change in ASE and transcript levels across stress conditions ................................... 58 2.10. Variable magnitude in ASE across tissues............................................................... 59 3.1. Pipeline for identifying potential transcription factor binding sites in the human genome .............................................................................................................................. 89 3.2. Letter probability matrices for each of the binding motifs used in our pipeline search ................................................................................................................................ 90 3.3. Amount of GTEx cis- eQTLs at each position for all four binding motifs................ 91 3.4. PhyloP conservation for each position within each binding motif ............................ 92 3.5. Majority of significant cis- eQTLs are unique to a given condition .......................... 94 3.6. Significant eQTLs that fall within a predicted TFBS ................................................ 95 ix ACKNOWLEDGMENTS I would like to thank my Ph.D. advisor, Dr. Clement Chow, for taking a chance on me and welcoming me in his lab. He has provided a collaborative, supportive, fun, and nurturing lab environment that has allowed me flourish as a student and scientist. His guidance allowed me to have greater confidence in myself as both a student and a scientist. I would like to thank all of the Chow lab members, both past and present, including Dr. Becky Palu, Dr. Dana Talsness, Dr. Kevin Hope, Dr. Hans Dalton, Katie Owings, Maddie Haller, Travis Tu’ifua, Holly Thorpe, Emily Coelho, Karen Peralta, and Shayna Scott for their helpful advice and friendship during my time in the lab. I want to thank the Chow lab members for creating a lab environment that is fun and supportive. I would like to give thanks to my committee members, Dr. Chris Gregg, Dr. Julie Hollien, Dr. Charles Murtaugh, and Dr. Aaron Quinlan for their amazing advice and mentorship throughout my time here. I would like to thank all researchers, students, and staff of the human genetics department that have helped me in my graduate career that are too numerous to list here. CHAPTER 1 INTRODUCTION The Endoplasmic Reticulum Stress Response The endoplasmic reticulum (ER) is a large eukaryotic organelle that occupies more than 10% of total cell volume (1). The ER plays a central role in lipid and protein biosynthesis, protein trafficking, protein folding, and calcium storage (1). The interior of the ER, the lumen, facilitates proper protein folding through a specialized oxidative environment and with the aid of molecular chaperones, folding catalysts, and posttranslational modifications (2, 3). The ER must be able to accommodate large amounts of diverse proteins. It is estimated that one-third of the mammalian genome encodes proteins that are targeted to the ER and subsequently trafficked to other organelles or released in the extracellular space (2). Correctly folding polypeptides is an error-prone process and it has been estimated that as many as one-third of all newly synthesized proteins may be subsequently degraded in some cell types, presumably due to problems during their synthesis and folding (4, 5). The inability of the cell to degrade these proteins can lead to an accumulation of misfolded proteins in the ER lumen. This accumulation is termed ER stress. Cells respond to ER stress with a large transcriptional response called the 2 unfolded protein response (UPR). The UPR is composed of three signaling pathways: IRE1, ATF6, and PERK. The UPR is initiated when a signal is transmitted across the ER membrane by at least one of the three transmembrane proteins of each signaling branch (Figure 1.1). IRE1, when activated by unfolded proteins, will cleave its only known substrate: an mRNA encoding a transcription factor that targets UPR genes named XBP1 (6, 7). Under ER stress conditions, ATF6 will be transported from the ER to the Golgi, where it will become cleaved by two different proteases. The cytosolic portion of ATF6 will then travel to the nucleus to activate gene expression (6, 8). PERK phosphorylates eIF2α, which results in lower levels of translation initiation, and a global reduction of newly synthesized proteins, many of which would have been destined for the already stressed ER lumen (9). In addition to this reduction, phosphorylated eIF2α selectively increases translation of the transcription factor ATF4 (10). The goal of the UPR is to establish and maintain protein homeostasis within the cell (11). When the UPR cannot restore ER homeostasis, apoptotic signals can then be activated (12). ER Stress Response and Disease Due to the essential role that the ER stress response has in cell homeostasis, its prevalent role in many diseases is not surprising. In fact, many studies have demonstrated that altering ER stress genes can alter disease outcomes (13–20). Sado et al. found that exogenous expression of the active form of XBP1 (XBP1s) protein had protective effects against cell death in a Parkinson’s Disease model (19). Contrary to this, Hetz et al. found that deletion of XBP1 in a different neurodegenerative disease model, ALS, led to an increased resistance to disease 3 development (20). These two studies demonstrate the ability to influence disease outcomes through modifying ER stress components and supports the hypothesis that the ER stress response may be an appropriate target for future disease therapies. However, they also demonstrate the complicated nature of the ER stress response by highlighting that both up- and down- regulation of the ER stress response, through the same gene, in different disease models can both have beneficial effects on disease outcomes. These two studies are extreme examples of modifiers of the ER stress response. In the human population, it is far more likely that the ER stress response is altered through slight expression changes of a multitude of genes. In fact, multiple studies have shown that the ER stress response, despite being a well conserved process, is highly variable across different genetic backgrounds within a population (21–24). One of these studies examined the individual response to ER stress across a range of unrelated individuals in the human population. This study found a strong correlation between genetic background and an altered transcriptional response to ER stress (22). There was extensive individual variability in gene expression in response to ER stress with a large genetic component to this variation. This study also found many of the variable ER stress responsive genes had previously been implicated in disease. The slight changes in expression of genes involved in the ER stress response can compound and result in the variable susceptibility and severity of many diseases with an ER stress component. This study also supports the idea of utilizing natural genetic variation within a population to identify genes involved in the variable ER stress response. 4 Utilizing Natural Genetic Variation and Modifier Genes Natural genetic variation within a population can be utilized to identify which genes tolerate gene expression variability and still result in a healthy organism. The ER stress response is an evolutionarily well conserved mechanism, which provides the opportunity to study this process in model organisms (25). In model organisms we can reliably induce ER stress by using the drug tunicamycin (TM), a well-established drug that induces ER stress in all tissues (26–28). TM prevents the initial step of glycoprotein biosynthesis in the ER, which blocks all N-linked glycosylation. N-linked glycosylation is critical for proper protein folding and quality control by chaperones in the ER (29). When TM is administered to a model organism, we can expect to reliably induce ER stress in all tissues by causing misfolded protein accumulation in the ER. The combination of a reliable inducer of ER stress and natural genetic variation within a model organism population provides a great opportunity to identify modifiers of the ER stress response. A modifier gene is a gene that can influence the transcription of a gene or alter a phenotype at the molecular, cellular, or organismal level (30). Modifier genes of the ER stress response are potential therapeutic targets for human diseases with a large ER stress component. One example of using genetic variation in a model organism and TM-induced ER stress is a 2013 study that used Drosophila (23). This study design was able to uncover previously unknown ER stress genes and even identify genetic variation within canonical ER stress genes. In 2015, there was a similar study in mouse embryonic fibroblasts (MEFs) that uncovered hundreds of genes that varied in their response to ER stress across the mouse strains utilized (24). These genes represent potential modifiers of the ER stress 5 response. Chapter 2 of this dissertation expands upon this idea by using genetic variation in an in vivo mouse model and an ER stress inducer to identify more modifiers of the ER stress response. Despite the inability to utilize humans as a model organism, there are still methods in which we can harness the power of natural genetic variation within the human population to identify modifiers of the ER stress response. First is through the use of cell lines derived from humans. We can reliably induce ER stress with TM, similar to the model organisms, and measure the response through gene expression profiling across different genetic backgrounds. While there are limitations with cell lines, such as the inability to measure phenotypic outcomes and multi-tissue responses, this provides a window into the variable response to ER stress in humans and its potential link with disease. Another method in which we can harness the power of natural genetic variation within the human population is to utilize large online databases that attempt to categorize variants present in the population. These two methods are explored in Chapter 3 of this dissertation. Cis- and Trans- Regulatory Landscape Discordant transcriptional responses can be driven by both genetic and nongenetic causes. Non-genetic causes include both environmental factors and imprinting. This study is specifically interested in the genetic causes of the variable transcriptional output. In order to study these genetic causes, non-genetic causes must be controlled for. Environmental causes in a mouse model can be controlled for by exposing each mouse to the same external conditions. Imprinting effects can be accounted for by maintaining the 6 same strain as the maternal or paternal source of genetic information for each cross. By controlling for these non-genetic causes, we can attribute any discordant transcriptional response to genetic causes. Genetic influence over differential transcription levels includes two classes of regulation: cis- and trans- (Figure 1.2). A cis- regulatory variant influences the expression of a gene it is physically linked to. One example of a cis- effect includes a polymorphism within a promoter of a gene which alters expression levels. A transregulatory variant influences an unlinked gene, often physically distant from the variant. One example of a trans- effect is a polymorphism in a transcription factor which impacts binding efficacy for a wide range of genes across the genome. Natural genetic variation within a population has been shown to act through both cis- and trans- regulatory mechanisms, which can impact both adaptation and divergence between species. However, it has been shown that cis- regulatory effects have quantitatively greater effects on gene expression than trans- regulatory effects (24, 31, 32). More than 80% of testable genes exhibit some kind of cis- regulatory variation (24, 33). While there is a majority of cis- effects contributing to gene expression differences, trans- effects can act in a dominant fashion, as opposed to cis- effects acting in an additive fashion (34). This dominant mechanism for trans- effects means that despite their low numbers they can still have a significant effect on phenotype. Identifying the regulatory mechanisms that impact gene expression is important as they have been shown to have a significant influence on variation within populations, with some having significant contributions to human disease (35–37). In fact, the 2015 study mentioned previously was able to identify regulatory mechanisms in MEFs that 7 influenced the variable ER stress response (24). This study found that many of the variable ER stress responsive genes with regulatory effects have been implicated in human disease. Chapter 2 of this dissertation also identifies cis- and trans- mechanisms affecting the ER stress response, identifying potential links to human disease. However, this dissertation expands on the 2015 study by performing the experiments in an in vivo mouse model. This provides a more comprehensive multi-tissue view on the cis- and trans- effects of the ER stress response. Transcription Factor Binding Variability Identifying a gene that is impacted by cis- regulatory effects still does not reveal the exact way in which a genetic variant is exerting its influence on the gene’s expression levels. One potential mechanism that is explored in this dissertation is that of variable transcription factor (TF) binding. Transcription factors have unique DNA sequences that they preferentially bind to known as a ‘binding motif’ (38, 39). Transcription factor binding sites (TFBSs) may be located near, within, or even far away from genes that they regulate (40). Genetic variation within a binding site of a TF on the DNA strand can have significant effects on the binding efficacy of said TF (41–43). Given that the binding of TFs to their TFBSs regulates both the timing and amount of gene transcription, the introduction of a SNP within the TFBS can have far reaching implications in terms of disease risk, onset, and severity. This provides a heritable genetic model for genetic variation within non-coding regions of the genome and how variation can impact human health. There has been previous work done that supports this hypothesis. It was shown 8 that TFBSs make up 31% of GWAS SNPs, yet only comprise 8% of the human genome (44, 45). Additionally, up to 21.6% of variants that affect gene expression overlap annotated TFBSs (43, 46). SNPs within TFBSs that lead to variable binding efficacy of TFs are highly enriched in a set of causal SNPs of various traits across multiple studies (43, 47). Functional experiments found there to be a significant correlation between variants within a TFBS and changes in gene expression (48). Finally, mutations in TFBSs not only have been found to alter gene expression, but to be associated with various human diseases such as type 2 diabetes, cancers, and increased cholesterol (45, 49–54). As mentioned previously, the UPR is composed of three main branches. Each branch, through distinctive steps, culminates in a large transcriptional response to misfolded proteins driven by three main transcription factors: ATF4, ATF6, and XBP1. The binding sites of these three transcription factors have been studied and well characterized in the literature (55–58). Genetic variants in the TFBSs of these three UPR TFs could potentially be some of the cis- regulatory effects discovered in previous studies and described in Chapter 2 of this dissertation. Cis- regulatory effects could be exerted through variable binding efficacy of the three TFs, variable gene expression of UPR target genes, and ultimately a variable ER stress response which could underlie human disease (Figure 1.3). Chapter 3 of this dissertation aims to identify these cis- regulatory effects that act through TF binding in the human population. Chapter 2 and Chapter 3 of this dissertation together are a step towards better understanding the regulatory landscape of the ER stress response. 9 Figure 1.1 The unfolded protein response. 10 Figure 1.2 Cis- and trans- regulation. 11 Figure 1.3 Impacts of genetic variation on transcription factor binding. A) A schematic representation of a transcription factor binding to its preferred binding site on DNA to upregulate a UPR target gene. B) An example of how a SNP within a transcription factor binding site might impact transcription factor binding and gene expression levels. 12 References 1. B. Alberts, A. Johnson, J. Lewis, M. Raff, K. Roberts, P. Walter, Molecular Biology of the Cell (Garland Science, ed. 4, 2002). 2. J. A. Olzmann, R. R. Kopito, J. C. Christianson, The mammalian endoplasmic reticulum-associated degradation system. Cold Spring Harb. Perspect. Biol. 5, a013185 (2013). 3. A. Helenius, M. Aebi, Roles of N-linked glycans in the endoplasmic reticulum. Annu. Rev. Biochem. 73, 1019–1049 (2004). 4. U. Schubert, L. C. Antón, J. Gibbs, C. C. Norbury, J. W. Yewdell, J. R. Bennink, Rapid degradation of a large fraction of newly synthesized proteins by proteasomes. Nature 404, 770–774 (2000). 5. C. J. Guerriero, J. L. Brodsky, The delicate balance between secreted protein folding and endoplasmic reticulum-associated degradation in human physiology. Physiol. Rev. 92, 537–576 (2012). 6. D. Ron, P. Walter, Signal integration in the endoplasmic reticulum unfolded protein response. Nat. Rev. Mol. Cell Biol. 8, 519–529 (2007). 7. H. Yoshida, T. Matsui, A. Yamamoto, T. Okada, K. Mori, XBP1 mRNA is induced by ATF6 and spliced by IRE1 in response to ER stress to produce a highly active transcription factor. Cell 107, 881–891 (2001). 8. K. Haze, H. Yoshida, H. Yanagi, T. Yura, K. Mori, Mammalian transcription factor ATF6 is synthesized as a transmembrane protein and activated by proteolysis in response to endoplasmic reticulum stress. Mol. Biol. Cell 10, 3787– 3799 (1999). 9. H. P. Harding, Y. Zhang, D. Ron, Protein translation and folding are coupled by an endoplasmic-reticulum-resident kinase. Nature 397, 271–274 (1999). 10. K. M. Vattem, R. C. Wek, Reinitiation involving upstream ORFs regulates ATF4 mRNA translation in mammalian cells. Proc. Natl. Acad. Sci. U.S.A. 101, 11269– 11274 (2004). 11. P. Walter, D. Ron, The unfolded protein response: from stress pathway to homeostatic regulation. Science 334, 1081–1086 (2011). 12. I. Tabas, D. Ron, Integrating the mechanisms of apoptosis induced by endoplasmic reticulum stress. Nat. Cell Biol. 13, 184–190 (2011). 13. L. Wang, B. Popko, R. P. Roos, The unfolded protein response in familial amyotrophic lateral sclerosis. Hum. Mol. Genet. 20, 1008–15 (2011). 13 14. L. Wang, B. Popko, R. P. Roos, An enhanced integrated stress response ameliorates mutant SOD1-induced ALS. Hum. Mol. Genet. 23, 2629–38 (2014). 15. B. Song, D. Scheuner, D. Ron, S. Pennathur, R. J. Kaufman, Chop deletion reduces oxidative stress, improves beta cell function, and promotes cell survival in multiple mouse models of diabetes. J. Clin. Invest. 118, 3378–89 (2008). 16. S. Oyadomari, A. Koizumi, K. Takeda, T. Gotoh, S. Akira, E. Araki, M. Mori, Targeted disruption of the Chop gene delays endoplasmic reticulum stressmediated diabetes. J. Clin. Invest. 109, 525–32 (2002). 17. S. Wang, R. J. Kaufman, The impact of the unfolded protein response on human disease. J. Cell Biol. 197, 857–867 (2012). 18. G. Auf, A. Jabouille, S. Guérit, R. Pineau, M. Delugin, M. Bouchecareilh, N. Magnin, A. Favereaux, M. Maitre, T. Gaiser, A. von Deimling, M. Czabanka, P. Vajkoczy, E. Chevet, A. Bikfalvi, M. Moenner, Inositol-requiring enzyme 1alpha is a key regulator of angiogenesis and invasion in malignant glioma. Proc. Natl. Acad. Sci. U.S.A. 107, 15553–8 (2010). 19. M. Sado, Y. Yamasaki, T. Iwanaga, Y. Onaka, T. Ibuki, S. Nishihara, H. Mizuguchi, H. Momota, R. Kishibuchi, T. Hashimoto, D. Wada, H. Kitagawa, T. K. Watanabe, Protective effect against Parkinson’s disease-related insults through the activation of XBP1. Brain Res. 1257, 16–24 (2009). 20. C. Hetz, P. Thielen, S. Matus, M. Nassif, F. Court, R. Kiffin, G. Martinez, A. Cuervo, R. Brown, L. Glimcher, XBP-1 deficiency in the nervous system protects against amyotrophic lateral sclerosis by increasing autophagy. Genes Dev. 23, 2294–2306 (2009). 21. R. R. Nayak, W. E. Bernal, J. W. Lee, M. J. Kearns, V. G. Cheung, Stress-induced changes in gene interactions in human cells. Nucleic Acids Res. 42, 1757–1771 (2014). 22. B. A. Dombroski, R. R. Nayak, K. G. Ewens, W. Ankener, V. G. Cheung, R. S. Spielman, Gene expression and genetic variation in response to endoplasmic reticulum stress in human cells. Am. J. Hum. Genet. 86, 719–729 (2010). 23. C. Y. Chow, M. F. Wolfner, A. G. Clark, Using natural variation in Drosophila to discover previously unknown endoplasmic reticulum stress genes. Proc. Natl. Acad. Sci. U.S.A. 110, 9013–8 (2013). 24. C. Y. Chow, X. Wang, D. Riccardi, M. F. Wolfner, A. G. Clark, The genetic architecture of the genome-wide transcriptional response to ER stress in the mouse. PLoS Genet. 11, e1004924 (2015). 25. R. Sano, J. C. Reed, ER stress-induced cell death mechanisms. Biochim. Biophys. Acta Mol. Cell Res. 1833, 3460–3470 (2013). 14 26. J. S. Tkacz, O. Lampen, Tunicamycin inhibition of polyisoprenyl Nacetylglucosaminyl pyrophosphate formation in calf-liver microsomes. Biochem. Biophys. Res. Commun. 65, 248–57 (1975). 27. B. Feng, X. Huang, D. Jiang, L. Hua, Y. Zhuo, D. Wu, Endoplasmic reticulum stress inducer Tunicamycin alters hepatic energy homeostasis in mice. Int. J. Mol. Sci. 18 (2017). 28. K. W. Choy, Y. S. Lau, D. Murugan, M. R. Mustafa, Chronic treatment with paeonol improves endothelial function in mice through inhibition of endoplasmic reticulum stressmediated oxidative stress. PLoS One. 12 (2017). 29. E. Bieberich, Synthesis, processing, and function of N-glycans in N-glycoproteins. Adv. Neurobiol. 9, 47–70 (2014). 30. J. H. Nadeau, Modifier genes in mice and humans. Nat. Rev. Genet. 2, 165 (2001). 31. R. B. Brem, G. Yvert, R. Clinton, L. Kruglyak, Genetic dissection of transcriptional regulation in budding yeast. Science 296, 752–755 (2002). 32. K. A. Hughes, J. F. Ayroles, M. M. Reedy, J. M. Drnevich, K. C. Rowe, E. A. Ruedi, C. E. Cáceres, K. N. Paige, Segregating variation in the transcriptome: cis regulation and additivity of effects. Genetics 173, 1347–1355 (2006). 33. J. J. Crowley, V. Zhabotynsky, W. Sun, S. Huang, I. K. Pakatci, Y. Kim, J. R. Wang, A. P. Morgan, J. D. Calaway, D. L. Aylor, Z. Yun, T. A. Bell, R. J. Buus, M. E. Calaway, J. P. Didion, T. J. Gooch, S. D. Hansen, N. N. Robinson, G. D. Shaw, J. S. Spence, C. R. Quackenbush, C. J. Barrick, R. J. Nonneman, K. Kim, J. Xenakis, Y. Xie, W. Valdar, A. B. Lenarcic, W. Wang, C. E. Welsh, C.P. Fu, Z. Zhang, J. Holt, Z. Guo, D. W. Threadgill, L. M. Tarantino, D. R. Miller, F. Zou, L. McMillan, P. F. Sullivan, F. Pardo-Manuel de Villena, Analyses of allele-specific gene expression in highly divergent mouse crosses identifies pervasive allelic imbalance. Nat. Genet. 47, 353–360 (2015). 34. B. Lemos, L. O. Araripe, P. Fontanillas, D. L. Hartl, Dominance and the evolutionary accumulation of cis- and trans-effects on gene expression. Proc. Natl. Acad. Sci. U.S.A. 105, 14471–14476 (2008). 35. G. A. Wray, The evolutionary significance of cis-regulatory mutations. Nat. Rev. Genet. 8, 206–216 (2007). 36. G. M. Cooper, J. Shendure, Needles in stacks of needles: finding disease-causal variants in a wealth of genomic data. Nat. Rev. Genet. 2011 129. 12, 628–640 (2011). 37. F. W. Albert, L. Kruglyak, The role of regulatory variation in complex traits and disease. Nat. Rev. Genet. 16, 197–212 (2015). 15 38. V. Boeva, Analysis of genomic sequence motifs for deciphering transcription factor binding and transcriptional regulation in eukaryotic cells. Front. Genet. 7, 24 (2016). 39. D. M. Suter, Transcription factors and DNA play hide and seek. Trends Cell Biol. 30, 491–500 (2020). 40. C. Y. Lin, V. B. Vega, J. S. Thomsen, T. Zhang, L. K. Say, M. Xie, P. C. Kuo, L. Lipovich, D. H. Barnett, F. Stossi, A. Yeo, J. George, V. A. Kuznetsov, K. L. Yew, H. C. Tze, N. Palanisamy, L. D. Miller, E. Cheung, B. S. Katzenellenbogen, Y. Ruan, G. Bourque, C. L. Wei, E. T. Liu, Whole-genome cartography of estrogen receptor alpha binding sites. PLoS Genet. 3, 0867–0885 (2007). 41. A. Martin-Trujillo, N. Patel, F. Richter, B. Jadhav, P. Garg, S. U. Morton, D. M. McKean, S. R. DePalma, E. Goldmuntz, D. Gruber, R. Kim, J. W. Newburger, G. A. P. Jr., A. Giardini, D. Bernstein, M. Tristani-Firouzi, J. G. Seidman, C. E. Seidman, W. K. Chung, B. D. Gelb, A. J. Sharp, Rare genetic variation at transcription factor binding sites modulates local DNA methylation profiles. PLoS Genet. 16, e1009189 (2020). 42. A. D. Johnston, C. A. Simões-Pires, T. V. Thompson, M. Suzuki, J. M. Greally, Functional genetic variants can mediate their regulatory effects through alteration of transcription factor binding. Nat. Commun. 10, 1–16 (2019). 43. C.C. Tseng, M.C. Wong, W.T. Liao, C.J. Chen, S.C. Lee, J.H. Yen, S.J. Chang, Genetic variants in transcription factor binding sites in humans: triggered by natural selection and triggers of diseases. Int. J. Mol. Sci. 22, 22 (2021). 44. ENCODE Project Consortium, An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012). 45. S. S. Nishizaki, N. Ng, S. Dong, R. S. Porter, C. Morterud, C. Williams, C. Asman, J. A. Switzenberg, A. P. Boyle, Predicting the effects of SNPs on transcription factor binding affinity. Bioinformatics 36, 364 (2020). 46. 1000 Genome Project Consortium, A. Auton, L. Brooks, R. Durbin, E. Garrison, H. Kang, J. Korbel, J. Marchini, S. McCarthy, G. McVean, G. Abecasis, A global reference for human genetic variation. Nature 526, 68–74 (2015). 47. J. Yan, Y. Qiu, A. M. Ribeiro dos Santos, Y. Yin, Y. E. Li, N. Vinckier, N. Nariai, P. Benaglio, A. Raman, X. Li, S. Fan, J. Chiou, F. Chen, K. A. Frazer, K. J. Gaulton, M. Sander, J. Taipale, B. Ren, Systematic analysis of binding of transcription factors to noncoding variants. Nature 591, 147–151 (2021). 48. M. Kasowski, F. Grubert, C. Heffelfinger, M. Hariharan, A. Asabere, S. M. Waszak, L. Habegger, J. Rozowsky, M. Shi, A. E. Urban, M.Y. Hong, K. J. Karczewski, W. Huber, S. M. Weissman, M. B. Gerstein, J. O. Korbel, M. Snyder, Variation in transcription factor binding among humans. Science 328, 232 (2010). 16 49. M. P. Fogarty, M. E. Cannon, S. Vadlamudi, K. J. Gaulton, K. L. Mohlke, Identification of a regulatory variant that binds FOXA1 and FOXA2 at the CDC123/CAMK1D type 2 diabetes GWAS locus. PLoS Genet. 10 (2014). 50. K. J. Gaulton, T. Nammo, L. Pasquali, J. M. Simon, P. G. Giresi, M. P. Fogarty, T. M. Panhuis, P. Mieczkowski, A. Secchi, D. Bosco, T. Berney, E. Montanya, K. L. Mohlke, J. D. Lieb, J. Ferrer, A map of open chromatin in human pancreatic islets. Nat. Genet. 42, 255–259 (2010). 51. K. Musunuru, A. Strong, M. Frank-Kamenetsky, N. E. Lee, T. Ahfeldt, K. V. Sachs, X. Li, H. Li, N. Kuperwasser, V. M. Ruda, J. P. Pirruccello, B. Muchmore, L. Prokunina-Olsson, J. L. Hall, E. E. Schadt, C. R. Morales, S. Lund-Katz, M. C. Phillips, J. Wong, W. Cantley, T. Racie, K. G. Ejebe, M. Orho-Melander, O. Melander, V. Koteliansky, K. Fitzgerald, R. M. Krauss, C. A. Cowan, S. Kathiresan, D. J. Rader, From noncoding variant to phenotype via SORT1 at the 1p13 cholesterol locus. Nature 466, 714–719 (2010). 52. M. M. Pomerantz, N. Ahmadiyeh, L. Jia, P. Herman, M. P. Verzi, H. Doddapaneni, C. A. Beckwith, J. A. Chan, A. Hills, M. Davis, K. Yao, S. M. Kehoe, H. J. Lenz, C. A. Haiman, C. Yan, B. E. Henderson, B. Frenkel, J. Barretina, A. Bass, J. Tabernero, J. Baselga, M. M. Regan, J. R. Manak, R. Shivdasani, G. A. Coetzee, M. L. Freedman, The 8q24 cancer risk variant rs6983267 shows long-range interaction with MYC in colorectal cancer. Nat. Genet. 41, 882–884 (2009). 53. D. Savic, H. Ye, I. Aneas, S. Y. Park, G. I. Bell, M. A. Nobrega, Alterations in TCF7L2 expression define its role as a key regulator of glucose metabolism. Genome Res. 21, 1417–1425 (2011). 54. M. L. Stitzel, P. Sethupathy, D. S. Pearson, P. S. Chines, L. Song, M. R. Erdos, R. Welch, S. C. J. Parker, A. P. Boyle, L. J. Scott, E. H. Margulies, M. Boehnke, T. S. Furey, G. E. Crawford, F. S. Collins, Global epigenomic analysis of primary human pancreatic islets provides insights into type 2 diabetes susceptibility loci. Cell Metab. 12, 443–455 (2010). 55. F. Siu, P. J. Bain, R. Leblanc-Chaffin, H. Chen, M. S. Kilberg, ATF4 Is a mediator of the nutrient-sensing response pathway that activates the human asparagine synthetase gene. J. Biol. Chem. 277, 24120–24127 (2002). 56. K. Mori, T. Kawahara, H. Yoshida, H. Yanagi, T. Yura, Signalling from endoplasmic reticulum to nucleus: transcription factor with a basic-leucine zipper motif is required for the unfolded protein-response pathway. Genes Cells 1, 803– 817 (1996). 57. K. Mori, A. Sant, K. Kohno, K. Normington, M. J. Gething, J. F. Sambrook, A 22 bp cis-acting element is necessary and sufficient for the induction of the yeast KAR2 (BiP) gene by unfolded proteins. EMBO J. 11, 2583 (1992). 17 58. H. Yoshida, K. Haze, H. Yanagi, T. Yura, K. Mori, Identification of the cis-acting endoplasmic reticulum stress response element responsible for transcriptional induction of mammalian glucose-regulated proteins: involvement of basic leucine zipper transcription factors. J. Biol. Chem. 273, 33741–33749 (1998). CHAPTER 2 THE DYNAMIC EFFECT OF REGULATORY GENETIC VARIATION ON THE IN VIVO ER STRESS TRANSCRIPTIONAL RESPONSE The following chapter is a preprint manuscript co-authored by Clement Y. Chow. An older version of the preprint and supplementary material are available at bioRxiv: https://www.biorxiv.org/content/10.1101/2021.03.12.435172v1 Abstract Genotype x Environment (GxE) interactions occur when environmental conditions change the effect of a genetic variant. In order to better understand the effect of genetic variation, we need to incorporate multiple environments into our analyses. Many variants, under control conditions, may be silent or even have the opposite effect under stress conditions. This study uses an in vivo mouse model to investigate how the effect of genetic variation changes with tissue type and cellular stress. Endoplasmic reticulum (ER) stress occurs when misfolded proteins accumulate in the ER. This triggers the unfolded protein response (UPR), a large transcriptional response which attempts to restore homeostasis. This transcriptional response, despite being a conserved, basic cellular process, is highly variable across different genetic backgrounds, making it an 19 ideal system to study GxE effects. In this study, we sought to better understand how genetic variation alters expression across tissues, in the presence and absence of ER stress. The use of different mouse strains and their F1s allow us to also identify context specific cis- and trans- regulatory variation underlying variable transcriptional responses. We found hundreds of genes that respond to ER stress in a tissue- and/or genotypedependent manner. The majority of regulatory variation underlying these variable transcriptional responses derive from cis- regulatory variation and are unique to a given tissue or ER stress state. This study demonstrates the need for incorporating multiple environments in future studies to better elucidate the effect of any particular genetic factor in basic biological pathways, like the ER stress response. Introduction Genetic variation rarely acts in isolation and changes in environmental conditions can drastically alter the effect of a particular variant (1–4). This genotype x environment (GxE) effect is pervasive in biology, where the environment can be broadly defined as anything that is not genetic, including external conditions, tissue type, and cellular stressors (5). A GxE effect is defined as an interaction between multiple genotypes and environmental factors, where genotypes respond to different environments in a nonadditive way (6). GxE effects are particularly evident in RNA levels where contextspecific changes in expression are the norm (5, 7–9). GxE studies are most commonly performed with genetically diverse individuals and a single change in context. For example, the GTEx consortium has made significant progress in understanding the genetic architecture underlying gene expression across tissues in healthy human 20 individuals (10). The use of mice to study GxE interactions allow for the control of environment but rarely include multiple environmental conditions, such as tissue type and disease conditions. Without this complexity, we risk missing important elements of these interactions. The endoplasmic reticulum (ER) stress response provides an opportunity to evaluate how GxE interactions alter gene expression across tissue type and stress conditions. The ER is often the largest cellular organelle, in terms of cellular membrane, and is a major site of protein and lipid synthesis, protein folding, and calcium storage (11). ER stress occurs when misfolded proteins accumulate in the ER lumen because of overwhelming protein folding demands or improper protein folding (12). Cells respond to ER stress with the well conserved unfolded protein response (UPR), a coordinated series of cellular processes that reduces ER protein load and increases the ability of the ER to clear misfolded proteins (13, 14). The UPR consists of changes in gene expression, reduction of protein translation, and degradation of misfolded proteins (ER associated degradation or ERAD). Gene expression changes are initiated through the three main signaling branches of the UPR: IRE1, ATF6, and PERK (15). If the UPR is unable to restore ER homeostasis, it will activate apoptosis pathways. The response to misfolded proteins is essential for maintenance of basal cellular conditions and critical when there are any changes in the cellular environment, such as cellular differentiation and increased protein secretion (15). Despite being a conserved, basic cellular function, the ER stress response is subject to inter-individual variation in Drosophila, mouse, and humans (16–19). It is only recently that we are beginning to appreciate the impact of genetic variation on the ER 21 stress response. In Drosophila, we used natural genetic variation to demonstrate that susceptibility to ER stress is highly variable across strains and is associated with SNPs in both canonical and novel ER stress genes (16, 17). We also used mouse embryonic fibroblasts (MEFs) from inbred mouse strains to uncover a complex genetic architecture underlying the variable transcriptional response to ER stress among different genetic backgrounds (18). Another study demonstrated that the ER stress response is variable among immortalized human B cells from diverse individuals (19). These studies nicely demonstrate a GxE effect, where different genotypes show vastly different transcriptional responses, dependent on the presence or absence of ER stress. We previously showed that the highly variable ER stress transcriptional response in MEFs, derived from different genetic backgrounds, is driven by cis- and transregulatory variation (18). There is an entire layer of genetic variation that remains silent under healthy conditions, but alters expression under ER stress. We also found that ER stress can have a profound effect on allele-specific expression (ASE) in both magnitude and direction. However, all past mouse and human studies examining the GxE interactions under ER stress utilized in vitro cell culture. Extensive GxE studies in other contexts have shown that genetic architecture changes drastically across different tissue types (10, 20, 21). In this study, we addressed these limitations by mapping the genetic architecture underlying GxE interactions during ER stress, in vivo, across tissues in the mouse. Here we report that tissue type and ER stress have strong effects on how genetic variation impacts transcript levels. We identified hundreds of genes that showed variability in their response to ER stress that were dependent on tissue type and genetic 22 background in mouse. These genes are enriched for processes, such as metabolism, inflammation, and gene expression. Strikingly, in contrast to previous studies where noncanonical ER stress genes were found to be variable (18), we found that genotypedependent ER stress response genes in vivo are enriched for terms with clear roles in the ER stress response, indicating that at least some variability in response is derived from canonical ER stress genes. Our study design employed F1 mouse crosses to uncover the cis- and trans- regulatory variation that underlies the variable ER stress transcriptional response. We found a complex and context-specific regulatory landscape that underlies variability observed in the in vivo mouse. These cis- and trans- regulatory effects are context-specific in both number and strength. This study expands the understanding of the complex ER stress transcriptional response and the cis-/trans- regulatory variation that impacts this network in different environmental contexts. Together, these findings have implications for identifying ER stress response modifiers, better interpretation of regulatory variation, and more comprehensive knowledge of the elements that make up the highly variable ER stress response. Results In vivo ER stress induced by Tunicamycin To evaluate the extent to which ER stress and tissue type affects gene expression in diverse genetic backgrounds, we induced ER stress in different strains of mice. We administered Tunicamycin (TM) or DMSO (control) with an intraperitoneal injection to male mice from strains C57BL/6J (B6), CAST/EiJ (CAST), and their F1 progeny. TM induces ER stress by inhibiting protein N-glycosylation in the ER, causing an 23 accumulation of misfolded proteins (22). For this study, we focused on liver and kidney. The liver and kidney are tissues that rely heavily on protein transport and secretion. Therefore, proper ER function and response to ER stress plays a large part in their function. In fact, ER stress and aberrant protein trafficking is pathogenic in a large number of liver and kidney diseases (23, 24). An XBP1 splicing assay and RT-qPCR of BiP, a canonical ER stress gene, show that TM injection induced a strong ER stress response (25). To determine the full transcriptional response to TM-induced ER stress, kidney and liver samples were analyzed by RNA-seq. ER stress-induced gene expression changes were identified by comparing control and TM samples in a tissue- and genotypespecific manner. At a cutoff of 1.5-fold (5% FDR) change in transcript level, Gene Ontology (GO) enrichment analyses revealed enrichment for canonical ER stress response genes in all tissues and genotypes. Many canonical ER stress response genes, including Hyou1, Hspa5 (BiP), Ddit3 (Chop), and Herpud1 (26–29) (Figure 2.1), were significantly upregulated in both tissues and all three genotypes, indicating a strong UPR. As expected, injection of TM induces a robust in vivo ER stress-response. ER stress-induced fold-change in different tissues We first characterized how gene expression is affected by ER stress in liver and kidney in each strain independently. In B6, there were 3,071 (1,310 upregulated; 1,761 downregulated) and 3,433 genes (1,944 upregulated; 1,489 downregulated) that showed a significant change in expression post ER stress, in liver and kidney, respectively. As expected, in both tissues, pathways like the ER unfolded protein response (GO: GO:0030968; liver: q=6.5x10-09; kidney: q=6.6x10-12) were enriched. We next compared 24 genes and pathways uniquely affected in each tissue when experiencing ER stress. In B6, of the 5,077 genes affected by ER stress in either tissue, there were 1,751 genes (1751/5077; 34%) whose ER stress-induced expression was different between the two tissues (cutoff of q <0.05). More than a third of the response genes in B6 show tissue effects emphasizing the tissue-specificity of this response. Four of the top six functional enrichment terms for the tissue-effect genes all relate to metabolism (lipid metabolism, GO:0006629, q=3.7x10-19; fatty acid metabolism, GO:0006631, q=3.9x10-08; metabolic process, GO:0008152, q=5.2x10-06; cholesterol metabolic process, GO:0008203, q=2.2x10-04), indicating a tissue-specific metabolic response to ER stress. Functional enrichment of genes whose ER stress response was not dependent on tissue type reveals many canonical ER stress categories such as endoplasmic reticulum unfolded protein response (GO:0030968, q=4.0x10-06), Golgi to ER retrograde vesicle-mediated transport (GO:0006890, q=1.1x10-04), and ER-associated ubiquitin-dependent protein catabolic process (GO:0030433, q=2.4x10-03). Next, we quantified how tissue type impacts the transcriptional response to ER stress in the distantly related CAST strain. In CAST, there were 3,749 (1,900 upregulated; 1,849 downregulated) and 2,841 genes (1,738 upregulated; 1,103 downregulated) in liver and kidney, respectively, that showed an ER stress-induced change in expression. As in B6, pathways related to the ER stress response (GO: GO:0030968; liver: q=1.5x10-08; kidney: q=2.4x10-13) were significantly enriched in both tissues. Of the 5,115 genes affected in either tissue, 1,054 genes showed tissue-dependent responses to ER stress (1054/5115; 21%). Functional enrichment revealed seven significant categories, three of these relate to metabolism, similar to what was observed 25 with B6, again indicating a tissue-specific metabolic response after ER stress induction (lipid metabolic process, GO:0006629, q=1.4x10-03; fatty acid metabolic process, GO:0006631, q=2.2x10-03; metabolic process, GO:0008152, q=3.9x10-03). Functional enrichment of tissue-independent genes reveals enrichment for ER stress terms (ER unfolded protein response, GO:0030968, q=3.9x10-08; response to unfolded protein, GO:0006986, q=9.4x10-07). Finally, we asked how tissue type impacts gene expression levels in response to ER stress in the B6/CAST F1 hybrid mouse. In the F1, there were 3,056 genes in liver (1,330 upregulated; 1,726 downregulated) and 2,806 genes in kidney (1,936 upregulated; 870 downregulated) that showed a significant change in expression post ER stress. As expected, these genes are enriched for the ER stress response (GO: GO:0030968; liver: q=2.2x10-09; kidney: q=4.6x10-12). Of the 4,738 genes affected in either tissue, 1,130 genes showed variable ER stress-induced expression between the two tissues in the F1 hybrid (1130/4738; 24%). We observed enrichment patterns similar to B6 and CAST. There was enrichment for metabolism in the tissue-dependent genes (metabolic process, GO:0008152, q=1.9x10-24; lipid metabolic process, GO:0006629, q=2.9x10-17; fatty acid metabolic process, GO:0006631, q=1.1x10-14; glutathione metabolic process, GO:0006749, q=9.5x10-07) and enrichment for ER stress (GO:0030968, q=1.3x10-07) in the tissue-independent genes. We found 1,623 genes, in all three genotypes, whose ER stress response was not affected by tissue type. As expected, functional enrichment of these genes includes many functions directly related to the ER stress response (ER unfolded protein response, GO:0030968, q=1.8x10-09; response to unfolded protein, GO:0006986, q=1.3x10-06). The 26 top significant functional category was ribosome biogenesis (GO:0042254, q=3.2x10-11) and the top cellular compartment enrichment category was for the nucleolus (GO:0005730, q=1.6x10-33). The nucleolus is integral to ribosome biogenesis. The nucleolus has been shown to play an active role in regulating cellular stress, with potential links to the ER stress response (30–32). This work not only supports the connection between the nucleolar stress response and the ER stress response but finds this pathway to be independent of tissue-type and genetic background, potentially indicating an essential role of ribosome biogenesis through the nucleolus in the ER stress response. Further functional studies with the addition of more tissues and genetic backgrounds can better elucidate the role and function of the nucleolar stress response in the context of ER stress. Many genes affected by ER stress exhibit a tissue-effect (B6: 34%; CAST: 21%; F1: 24%). Of these genes, this tissue-effect is observed in only one genotype (B6: 964/1751, 55%; CAST: 405/1064, 38%; F1: 444/1130, 39%). This genotype-effect on expression is explored further in the next section. When comparing the genes with tissuespecific effects on post ER stress expression, B6 was the only genotype to have significant functional enrichment for the ER stress response (GO:0034976; q=2.1x10-02), suggesting that B6 may harbor variation that affects how its tissues differentially utilize canonical ER stress response genes. For example, one gene in this category is Wfs1. Wfs1 is involved in calcium homeostasis in the ER and impacts the ER stress response (33, 34). Wfs1 has similar control levels in kidney and liver, but there is a Log2FC of 1.8 in kidney and a Log2FC of 3.2 in liver (Figure 2.2A). Wfs1 shows no tissue effect in CAST or the F1. Response to nutrients (q=3.2x10-03) was the only functional enrichment unique to the 27 CAST genotype. Many of these nutrient-sensing genes are involved in lipid transport, cholesterol transport, and lipid localization. This indicates a CAST- and tissue-specific response to ER stress involving lipid metabolism and localization. One of the top significant genes in CAST with a tissue-effect was Tsc22d1 (P-adj=6.64x10-24) which is significantly upregulated in kidney (Log2FC=1.077) and downregulated in liver (Log2FC=-2.7) (Figure 2.2B). Tsc22d1 is a transcription factor responsive to TGFβ and is thought to be a tumor suppressor gene (35). Functional enrichment in genes with a tissueeffect in the F1 revealed terms such as cellular amino acid biosynthetic process (q=4.6x10-02), protein homotetramerization (q=8.1x10-03), and fatty acid beta-oxidation (q=1.8x10-07). One of the top genes with tissue-dependent expression unique to F1 is Acot1 (q=2.02x10-06) (Figure 2.2C). Acot1 impacts the lipid composition of the cell, which can alter membrane components (36). This implicates, similar to CAST, a tissueand genotype-specific response to ER stress which involves membrane composition. ER stress-induced fold-change varies by genetic background To determine how genetic background impacts variation in ER stress-induced gene expression in each tissue, we examined genotype-effects of ER stress on expression in liver and kidney. In either tissue, based on a cutoff (1.5-fold cutoff; FDR 5%), the majority of genes are upregulated post-ER stress in only one or two of the genotypes (Figure 2.3). In liver, 2,330 (of 20,131, 11.5%) genes are significantly upregulated postER stress in at least one of the genotypes. Of these genes, 950 (41%) are uniquely upregulated in only one genotype, 550 (23%) are upregulated in two genotypes, and 830 (36%) are upregulated in all three genotypes (Figure 2.3). In kidney, 2,718 (of 20,661, 28 13.2%) genes are significantly upregulated post-ER stress in at least one of the genotypes. Of these genes, 971 (36%) are uniquely upregulated in only one genotype, 594 (22%) are upregulated in two genotypes, and 1153 (42%) are upregulated in all three genotypes (Figure 2.3). To identify genes with genotype-dependent ER stress-induced gene expression, we performed a likelihood-ratio test in each tissue. Ninety-one ER stress response genes in the liver were dependent on genetic background (91/5,143; 1.8% of ER stress regulated genes) (q<0.05). There was no functional enrichment among these genotypedependent ER stress response genes. Nevertheless, there were some striking observations. For example, many of these genes are involved in immunity, such as Rsad2, Tlr12, Zc3hav1, and Nlrp6. Other genes point to important genotype differences that may define how a genotype responds to stress. For example, Mrs2, which is involved in magnesium transportation into the mitochondria, shows one of the strongest genotype-dependent expressions post-ER stress (p-adj=7.93x10-06) (Figure 2.4A). Under control conditions, each genotype has similarly low levels of Mrs2. However, under ER stress, each genotype responds very differently (Log2FC: B6: -0.145; F1: 0.49; CAST: 1.23). Mrs2 is required for the normal function of the mitochondrial respiratory complex, which has been linked to ER stress outcomes (37–39). Mrs2 or any one of these immunity genes could play a role in the variable ER stress response given their genotype-dependent expression and their connection to the ER stress response (40, 41). In kidney, 117 genes showed a genetic background-dependent response to ER stress (117/4,813; 2.4% of ER stress regulated genes) (q<0.05). There was no functional enrichment among these genotype-dependent genes in the kidney. However, many of 29 these genes have functions that can affect the ER stress response. Some of these processes include amino acid transport (Slc7a5), apoptosis (Cib1, Egln3), regulation of transcription (Jdp2), and nucleotide exchange factors (Sil1) (Figure 2.4B-F). Further functional validation will be required to confirm the impact these genes have on the variable ER stress response. Similar to liver, there are a number of genes that are involved in the immune response in kidney. However, the exact immunity genes with genotype-effects are unique to kidney, such as Cdd36, Alcam, and Tapbpl. For the 208 genes that display genotype-dependent expression in either liver or kidney, we asked which genotype had the highest or lowest expression of the three genotypes. We found F1 to be the least likely (23%, 95/416) to have the outlying expression level, indicating that in most genes that show a genotype-effect (77%, 321/416) there is a potential additive effect between the three genotypes where F1 expression is intermediate to B6 and CAST expression. There were a similar number of genes in both tissues that displayed genotype-dependent expression (liver: 91; kidney: 117). Between the two tissues, nine genes showed a genotype effect in both tissues, which is greater than expected by random chance (p=1.08x10-05). Of these nine genes, some maintain similar expression patterns between the two tissues across the three genotypes, while some trend in different directions. For example, Ces1f displays genotype-dependent expression in both kidney and liver (kidney q=0.034; liver: q=7.18x10-08). However, the expression pattern is different between the two tissues. F1 Ces1f expression is the lowest of the three genotypes in liver, while F1 Ces1f expression is highest in kidney (Liver Log2FC: B6: 2.99, F1: -4.71, CAST: -3.22; Kidney: B6: -1.72, F1: -0.94, CAST: -2.32). 30 Identification of cis- and trans- regulatory variation The identification of a highly variable ER stress response across different genetic backgrounds and tissues implicates a network of regulatory variation that impacts the expression levels of a wide range of genes, through either cis- or trans- regulatory variation. A cis- regulatory variant influences the expression of a gene it is physically linked to. Due to this, allele-specific expression in a diploid, heterozygous animal (e.g. F1 hybrid mouse) is strong evidence for a cis- acting genetic variant in or near that expressed gene. An example of a cis- effect is a promoter polymorphism impacting the gene’s expression levels. A trans- regulatory variant influences an unlinked gene, often physically distant from the variant. As opposed to cis- acting variants, trans- acting variants affect both alleles equally, and consequently, differential expression of a gene between two inbred strains that cannot be explained by ASE in the F1 hybrid is most likely a result from a trans- acting variant. An example of a trans- effect is a polymorphism impacting a transcription factor that can then alter the expression of a wide range of genes across the genome. To quantify allele-specific expression and partition the effects of genetic variation on gene expression into cis- and trans- effects, we used F1 hybrid mice. Classification of cis- and trans- effects was performed using previously published methodology (18, 42). In order to determine if a gene is impacted by a cis- or trans- effect, we generated F1 hybrid mice by crossing the highly divergent parental strains; female B6 to male CAST. Transcripts from F1 mice can be assigned to a parental chromosome based on parental SNPs in the spliced transcript. ASE cannot be performed in genes that lack variants in the spliced transcript. cis- and trans- effects for a particular transcript are assigned by 31 comparing the ratio of allelic expression in the F1 to the ratio of total expression between the parental strains. In the F1 hybrid mouse, both parental alleles are exposed to the same trans- factors. Therefore, the ratio of allelic expression is a measure of cis- regulatory variation between the two parental strains. If the allelic ratio matches the parental expression ratio, the expression difference is attributed to cis- regulatory variation. If the allelic ratio differs from the parental expression ratio, the expression difference is attributed to trans- regulatory variation. With this allelic analysis in the F1 mouse, we can determine how many expression differences between B6 and CAST can be attributed to cis- effects, transeffects, or a combination of the two. We performed cis-/trans- analyses on liver and kidney under both control and TM conditions. Genes were assigned a cis- or transexpression pattern or a combination of both effects (FDR = 0.1%). For the remaining analyses, we focused on genes that displayed only a cis- or trans- effect. In liver, under control conditions, 580 transcripts displayed a cis- effect and 392 transcripts displayed a trans- effect (Figure 2.5A). Under TM conditions, 617 transcripts displayed a cis- effect and 449 transcripts displayed a trans- effect (Figure 2.5B). In kidney, 710 transcripts displayed a cis- effect and 288 transcripts displayed a trans- effect under control conditions (Figure 2.5C). Under TM conditions, 825 transcripts displayed a cis- effect and 230 transcripts displayed a trans- effect (Figure 2.5D). The majority of the regulatory variation is due to cis- regulatory variation. 32 ER stress reveals hidden regulatory variation unique to stress To determine whether ER stress alters the contribution of cis- and trans- effects to regulatory variation, we compared the proportion of transcripts displaying a cis- or transeffect in each tissue, under control and TM conditions. In liver, ER stress does not significantly alter the proportion of genes displaying a cis- effect (control: 0.59; TM: 0.58; P=0.413) (Figure 2.6A). However, in kidney, there is a small, but significant increase in the proportion of genes with a cis- effect under ER stress conditions (control: 0.71; TM: 0.78; P=0.00024) (Figure 2.6B). Because the ER stress transcriptional response involves hundreds of transcripts, it is likely that the actual genes showing cis- and trans- patterns are different under stress. To address this, the overlap under control and TM conditions, in both the liver and kidney were analyzed for genes showing cis- and trans- regulatory variation. We observed both cis- and trans- regulatory variation that was unique to control or stress conditions or present under both conditions. In liver, of the cis- effects detected under control or TM conditions, 37% (357/974) are unique to control, 40% (394/974) are unique to stress, and 23% (223/974) are common to both (Figure 2.6C). Of the trans- effects detected under control or TM conditions, 39% (288/737) are unique to control, 47% (345/737) are unique to stress, and 14% (104/737) are common to both. The magnitude of the common cis- or trans- effects observed in both conditions were highly correlated (r2=0.86, p<2.2x10-16), suggesting that this common, overlapping regulatory variation is not impacted by ER stress. In kidney, 30% (359/1183) of cis- effects are unique to control, 40% (474/1183) are unique to stress, and 30% (351/1183) are common to both (Figure 2.6C). Of the 33 trans- effects detected in kidney, 49% (224/454) are unique to control, 37% (166/454) are unique to stress, and 14% (64/454) are common to both. The magnitude of the common cis- or trans- effects that were observed in both conditions were highly correlated (r2=0.91, p<2.2x10-16) and likely not impacted by ER stress. The majority of genes that display a regulatory difference depend on the presence or absence of ER stress and those observed only in stress conditions may reveal critical components that might be responsible for the genotype-specific differences in the ER stress response. In some cases, canonical UPR genes display regulatory variation only under stress conditions. For example, Ire1α, one of the main signal transducers of ER stress, displayed a strong cis- regulatory effect in the mouse liver that is only detectable under stress conditions (Figure 2.6D). Under control conditions, Ire1α is expressed at similar levels by the B6 and CAST allele. Once ER stress is induced, the B6 allele is expressed 1.5-fold higher than the CAST allele. There are 691 SNPs within a ±2kb window of the Ire1α gene that differs between the B6 and CAST genotype. Any one or a combination of these SNPs could be contributing to this cis- regulatory difference. Genes that have not been implicated in the ER stress response, but show a strong, stress-specific regulatory difference, might represent novel UPR genes and pathways. For example, in the mouse kidney, the gene Nuak2 displayed a strong cis- effect seen only under stress conditions (Figure 2.6E). While under control conditions the Nuak2 alleles are equally expressed, but in the stressed F1 mouse kidney, the CAST allele is expressed 3-fold higher than the B6 allele. Nuak2 belongs to the AMPK protein kinase family and has mainly been linked to cancer (43, 44). Understanding Nuak2 and its role in AMPK regulation and ER stress can provide insight into the role of Nuak2 in human disease such as cancer. 34 ER stress reveals hidden regulatory variation unique to tissue type We next asked whether there was tissue-specific regulatory variation. We investigated the overlap of genes that displayed cis- or trans- regulatory variation between liver and kidney in control and TM conditions. Of the genes that display a ciseffect under control conditions, 38% (436/1,146) are unique to liver, 49% (566/1,146) are unique to kidney, and 13% (144/1,146) are common to both (Figure 2.6F). Of the genes that display a cis- effect under TM conditions, 36% (458/1,283) are unique to liver, 52% (666/1,283) are unique to kidney, and 12% (159/1,283) are common to both (Figure 2.6F). The magnitude of the cis- effects observed in both tissues were moderately correlated (Control: r2=0.425, p<2.2x10-16; TM: r2=0.408, p<2.2x10-16). Of the genes that displayed a trans- regulatory effect under control conditions, 55% (357/644) are unique to liver, 39% (252/644) are unique to kidney, and 6% (35/644) are common to both (Figure 2.6F). Of the genes that display a trans- regulatory effect under TM conditions, 65% (422/652) are unique to liver, 31% (203/652) are unique to kidney, and 4% (27/652) are common to both (Figure 2.6F). The magnitude of the transeffects observed in both tissues showed a small correlation only in control conditions (Control: r2=0.186, p=0.009; TM: r2=0.126, p=0.069). Under control and TM conditions, we found that more genes with a cis- effect were common between the two tissues than genes with a trans- effect (Control: χ2 p < 0.00001; TM: χ2 p < 0.00001). Under TM conditions, the majority of genes that displayed cis- or transregulatory variation were unique to either liver or kidney. Any one of these genes with a tissue- and stress-specific regulatory effect could be a gene involved in inter-individual variation in tissue-specific ER stress responses. For example, the genes Lama5, Hnf4a, 35 Scnn1b, and Pkd2 all display strong cis- regulatory variation under stress conditions in kidney that are not observed in the mouse liver. Each of these genes were previously implicated in kidney diseases, such as Liddle’s syndrome and Polycystic kidney disease (45–48). The kidney is a tissue that relies heavily on protein transport and secretion. Proper ER function and response to ER stress plays a large part in kidney function. In fact, ER stress and aberrant protein trafficking is pathogenic in a large number of kidney diseases (23). In liver, for genes such as Sidt2 and Adk, they display cis- regulatory variation that is unique to the stressed liver and have been associated with human fatty liver disease (49, 50). These genes with tissue-specific cis- regulatory variation are clear examples of how genetic variation has a differential impact across tissues. Magnitude of effect of regulatory variation on gene expression To determine the magnitude of the effect of cis- and trans- regulatory variation on gene expression levels, we compared the ratio of the absolute fold change of the parental B6 expression to the parental CAST expression for each gene displaying cis- or transregulatory variation. In liver, under control and TM conditions, cis- regulatory differences have a stronger effect on gene expression levels than trans- regulatory differences (control: P<0.003; TM: P<1.6x10-5) (Figure 2.7A). There was no difference in the magnitude of effect when comparing cis- or trans- regulatory variation across control and TM conditions in liver (cis-: P=0.26; trans-: P=0.93) (Figure 2.7A). We found a similar pattern in kidney. cis- regulatory differences have a stronger effect on gene expression levels than trans- regulatory differences in control and TM conditions (control: P<9.0x10-7; TM: P<7.06x10-4) (Figure 2.7B). Again, there was no difference in 36 the effect when comparing cis- or trans- regulatory variation across conditions in kidney (cis- P=0.25; trans- P=0.98) (Figure 2.7B). We compared the effects of cis- and trans- regulatory variation between liver and kidney to better understand the impact of tissue type on the strength of a regulatory effect. Under control conditions, there is no difference between the strength of cis- effects between the two tissues (P=0.170). However, under TM conditions, cis- effects in liver have a stronger effect on gene expression than in kidney (P<1.0x10-7) (Figure 2.7C). trans- effects, under control conditions, also showed no difference between tissues, but were stronger in liver than kidney under TM conditions (control:P=0.20; TM:P<1.6x10-3) (Figure 2.7D). Within a condition, cis- effects on average are stronger than trans- effects, but tissue type and ER stress have different effects on the strength of regulatory effects. Genetic variation within the stressed liver has a stronger effect on transcription than in any other environmental combinations. ER stress induced change in allele-specific expression Next, we tested whether ER stress changes the proportion of ASE – that is, do two alleles respond differently to stress. To do this, we compared the allelic ratio in the F1 under stress and control conditions (Fisher’s exact test; 5% FDR). A significant change in ASE post-ER stress was observed in 17% and 13% of expressed transcripts in liver (970/5669) and kidney (1010/7764), respectively (Figure 2.8A and C). The liver displayed a higher proportion of genes with a change in ASE under stress (χ2; p=0.00001). Change in ASE in both tissues was driven equally by CAST and B6 alleles, indicating that there is no unexpected hybrid effect. 37 The majority of genes that show an ER stress-induced change in ASE do not exhibit a change in their total transcript abundance (liver: 522/970 or 54%; kidney: 745/1010 or 74%) (Figure 2.8B and D). Genes that do exhibit a change in ASE and transcript level post ER stress fall in both up- and downregulated categories (Liver: 35% upregulated 156/448, 65% downregulated 292/448; Kidney: 78% upregulated 206/265, 22% downregulated 59/265) (Figure 2.8B and D) In all cases, the B6 and CAST alleles contributed equally to changes in ASE. Genes that show a significant change in ASE and in transcript levels post-ER stress are of particular interest, as this suggests differential allelic response to stress. Sesn2, which is involved in a variety of different stress responses (51), showed one of the most significant changes in ASE (Fisher’s exact; q<0.00001). Under control conditions, the B6 and CAST allele in the F1 hybrid mouse are expressed at equal levels (B6: 0.53; CAST: 0.47). Total Sesn2 transcript responded to TM conditions with a 13-fold increase. At the allelic level, the CAST allele is increased 24-fold, while the B6 allele is only increased 3-fold (TM: B6: 0.13, CAST: 0.87) (Figure 2.9A). The large increase of the CAST allele is driving the Sesn2 transcriptional response to TM-induced ER stress in the F1. In the parental strains, there is a similar bias in terms of Sesn2 upregulation post-ER stress. The B6 parental strain has a 6-fold increase in Sesn2 expression while the CAST parental strain has a 22-fold increase. This strong parental and allelic response indicates that Sesn2 contains a strain- and ER stress-specific cis- element that drives differential transcriptional response to ER stress and potentially other stress stimuli such as hypoxia and reactive oxygen species (51). We see similar patterns for genes that are downregulated post-ER stress. Presenilin 2 (Psen2), which cleaves proteins such as APP 38 (amyloid-beta precursor protein) (52) and has been shown to cause Alzheimer’s Disease, displays a change in ASE (Fisher’s exact; q=0.00001) and a 2.7-fold decrease in RNA transcript levels (Figure 2.9B). Under control conditions, the B6 allele is more highly expressed (B6: 0.72; CAST: 0.28). Under stress conditions, the Psen2 B6 allele decreases 4.2-fold in expression while the CAST allele only decreases 1.6-fold. The greater reduction of the B6 allele results in near equal expression levels of the two alleles under stress conditions (B6: 0.47; CAST: 0.53). The is mirrored in the parental strains, B6 has a 2.2-fold decrease in Psen2 expression while CAST has only a 1.1-fold decrease. This again indicates a strain- and ER stress-specific regulatory element that drives this differential response of Psen2 to ER stress in different genetic backgrounds. As mentioned above, the majority of genes that have a change in ASE post-ER stress do not have significant changes in transcript levels. Phosphatidylcholine transfer protein (Pctp) shows a significant change in ASE, but no change in RNA transcript levels (Figure 2.9C). Under control conditions, the B6 allele accounts for 63% of allelic expression, while under TM conditions, the B6 allele accounts for only 30% of allelic expression (Fisher’s exact; q=0.00001). While the ratio of expression between alleles is significantly changed under TM conditions, the net result is no significant change in total RNA transcript levels, suggesting a possible compensatory mechanism between the two alleles. Similar patterns emerge in kidney with a wide range of genes (Figure 2.9D-F). The majority of ASE post ER stress is tissue-specific. Only 210 transcripts display a change in ASE post-ER stress common to both tissues (Liver: 210/970, 22%; Kidney: 210/1010, 21%). For these common genes, tissue type had a strong effect on the magnitude of the ASE changes post ER stress, in line with what we observed with tissue- 39 specific changes in magnitude of cis- and trans- regulatory variation. For example, in Cathepsin L (Ctsl) (Figure 2.10A), the CAST allele is more ER stress responsive in kidney, while in liver, the B6 allele is more responsive. Ctsl, which is involved in lysosomal protein degradation, is upregulated in both liver (FC= 4.3) and kidney (FC= 3.2) post-ER stress. Ctsl also shows a change in ASE in liver (q=0.017) and kidney (q=0.0002). However, in liver, the CAST allele was responsible for only 55% of the increase in expression levels, but in kidney, the CAST allele was responsible for 71% of the increase in expression levels. This pattern was also observed in downregulated genes. Flavin containing dimethylaniline monoxygenase 1 (Fmo1) (Figure 2.10B), which is involved in the oxidation reduction process, is downregulated in both liver (FC=-3.54) and kidney (FC=-1.5) and shows a change in ASE in liver (q=7x10-5) and kidney (q<0.00001). The CAST allele in liver accounts for 80% of the expression downregulation, but only 28% of the downregulation in kidney. Transcription factor binding site enrichment in common and variable ER stress genes One potential way a cis- regulatory variant could impact ASE differences is by affecting transcription factor binding. The large transcriptional response to ER stress is partly driven by three major transcription factors, XBP1, ATF6, and ATF4, binding in cis- near a gene (53). While we cannot identify where these transcription factors are binding with this study design, we hypothesized that at least some of the cis- regulatory variation sits within these known transcription factor binding sites, thus leading to ASE. We searched for enrichment of these known TF binding sites near ER stress responsive 40 genes from this study. First, we identified TF binding site enrichment in the ER stress responsive genes common to all genotypes and tissues. Some transcription factor binding sites are associated with canonical ER stress response genes. Two of these sites, both bound in conjunction by NFYA and ATF6, are called the ER Stress Response Elements I and II (ERSEI and ERSEII) (54, 55). Within these common genes, NFYA binding sites were most significantly enriched (Z-score = 54.8). There was also enrichment for CEBPA binding sites (Z-score = 3.4), which can bind in conjunction with CHOP (56), a downstream component of the UPR involved in ER stress induced apoptosis signaling (57). CHOP has also been implicated in regulation of metabolism, especially in the mouse liver, and inflammation (56–59). We also observed enrichment for KLF4 binding sites, a transcription factor involved in proliferation and apoptosis (Z-score = 24.3) (60). HIF1A binding sites were also highly enriched (Z-score = 18.5). HIF1A is a hypoxia induced transcription factor that is involved in oxygen homeostasis (61). There is significant overlap in genes between the hypoxic response pathways and the ER stress response (62). We also see significant enrichment for Nf-kappaB which is involved in immunity and inflammation (Z-score = 3.2) (63). Next, we analyzed the set of variable ER stress responsive genes that showed genotype-dependent expression post-ER stress. For genes variable in liver, we found enrichment for TFs involved in diseases such as cancer (Evi1, Z-score=9.9) and diabetes (Foxa2, Z-score=8.5) (64–66). The most significant enrichment in kidney was for Nr3c1 (Z-score=11.9). Nr3c1 binding sites are associated with11 genes that show genotypeeffect on ER stress response in kidney. Nr3c1 is involved in many processes, including inflammation, mRNA decay, and chromatin remodeling (67–69). Similar to the common 41 ER stress responsive genes, there is also enrichment for genes involved in immunity and inflammation, like RELA and Nf-kappaB (Liver: Z-score = 4.9 and 10.7; Kidney: Zscore = 8.3 and 6.5), mirroring the results found in MEFs (18). Discussion The ER stress and large UPR transcriptional response provides a unique opportunity to study how GxE interactions can alter gene expression levels. We took advantage of two genetically diverse mouse strains, B6 and CAST, and their F1, and induced a strong in vivo ER stress transcriptional response. This provided the opportunity to study how stress and tissue type impacts the effect of genetic variation. We uncovered genes that showed variable transcript levels in a genotype x tissue x stress manner. These genes implicated networks and pathways that could contribute to the variable ER stress response. Additionally, the F1 hybrid gave us the ability to uncover the cis- and transregulatory variation that is impacted by stress and tissue type. We discovered that most cis- and trans- regulatory variation is context specific, with most unique to only one context. Altogether our results provide a better understanding for how genetic background and tissue type impacts the genetic architecture of the inter-individual transcriptional response to ER stress in mouse and how different genotypes respond to different environments. We previously used mouse embryonic fibroblasts (MEFs) to assay how a complex genetic architecture influences the transcriptional response to ER stress across different genetic backgrounds (18). Here, we utilized an in vivo mouse model to identify how these patterns change when comparing across different tissues. In the MEF study, upregulated 42 genes most significantly influenced by genetic background were enriched for roles in inflammation (18). However, in this current study, we find no enrichment for any particular function in the genotype-dependent genes, in either tissue. In contrast to the MEF study, these genes are involved in such a wide array of functions, that there is no enrichment. Much of the variability in the ER stress response likely stems from these disparate genes. However, we did find commonalities in these genes, such as many being involved in immunity displaying a genotype-effect. Additionally, we find genes with roles clearly linked to the ER stress response such as apoptosis, protein transport, regulation of transcription, and amino acid transport. This demonstrates the strong impact that genotype has on how different genes respond to ER stress in different tissues. Additionally, this highlights the strength of an in vivo study which utilizes different tissues to uncover greater depth to the variable ER stress response. This study emphasizes the strong effect that tissue type has on how genetic variation impacts the transcriptional response to ER stress. These differences are observed in the number of regulatory effects and the strength of this regulatory effect across tissues. The ER stress response is implicated in many different diseases such as Alzheimer’s disease, type II diabetes, ALS, atherosclerosis, and cancer (70–75). Each of these diseases are unique in their tissue of origin. Additionally, each individual diagnosed with an ER stress-related disease will have different genetic backgrounds. The tissuedependent ER stress response observed in this study illustrates how future studies involving ER stress and disease should investigate the disease-relevant tissue in the context of different genetic backgrounds, potentially uncovering novel tissue-specific effects and mechanisms. 43 This study utilized genetic variation present in different strains of mice to demonstrate the strong impact that genetic background and tissue type have on cellular processes such as the ER stress response. We detected numerous ER stress- and tissuespecific responses in expression levels, regulatory variation, regulatory variation strengths, and allele-specific effects. The majority of these findings would have been missed if only studying one genotype, condition, or tissue. Future studies can reveal even more complex interactions that affect transcriptional levels by incorporating more environmental variables, such as cell type, additional tissues, and other cellular stressors. This type of analysis provides better predictive power for how a variant will impact expression across different environments. Methods Mice C57BL/6J (B6) and CAST/EiJ (CAST) mice were obtained from Jackson Laboratories (Bar Harbor, ME). F1s were generated by crossing female B6 mice to male CAST mice. 6 male mice of each genotype (B6, CAST, and F1) of approximately 15-23 weeks of age were used for the experiment. All experiments involving mice were performed according to institutional IACUC and NIH guidelines. Tunicamycin injection and RNA extraction We administered Tunicamycin (TM) or DMSO (control) (Sigma) with an intraperitoneal injection. TM was dissolved in DMSO (Sigma) to achieve a 2.5x10-4 mg/uL concentration. To induce ER stress in the mouse model, we injected mice with a 44 final concentration of 1mg of TM per 1kg mouse weight (same concentration for DMSO control mice) (76, 77). The final concentration of DMSO in the control injection was well below the maximum amount that has been reported to be well tolerated in mice through IP injection (78). After injection, mice were allowed to recover for 8 hr. Mice were then euthanized and organs were harvested and stored at -80°C. RNA was isolated by Trizol (ambion) and Direct-zol RNA MiniPrep Kit (Zymo Research) RNA column extraction protocol. Illumina mRNA sequencing and alignment mRNA sequencing was performed on 18 samples (3 genotypes X 3 replicates X 2 treatments) for liver and 18 samples for kidney, for a total of 36 samples. Samples were prepared and sequenced by the Huntsman Cancer Institute High-Throughput Genomics Core. Library prep was performed using Illumina TruSeq Stranded Total RNA Library Prep Ribo-Zero Gold. The 36 samples were then sequenced on the NovaSeq 2 x 50 bp Sequencing, for a total of approximately 25 million paired reads per sample. Fastq files were trimmed by using seqtk v1.2 software. Parental RNA-seq reads were aligned to strain-specific reference genomes using Bowtie2 v2.2.9 software (79). Genomes were obtained from Ensembl (http://ftp.ensembl.org/pub/release-103/fasta/). F1 reads were aligned to masked genomes using STAR v2.6 software, to allow for a more variant aware alignment (80). This did not adversely affect the overall alignment rates compared to when aligning parental transcripts to their respective genomes using Bowtie2. Masked reference genomes were created using bedtools v2.28.0 (81). Within the B6 reference genome, known CAST variants are replaced with ambiguous N nucleotides. Alignment 45 files were sorted and converted using samtools v1.12 (82). Quantification of expression levels The Deseq2 default normalization method (median of ratios) was used to normalize counts (83). For each genotype, condition, and tissue type, principal component analysis was used to identify outlying samples. For a given tissue, within a genotype, we required the TM samples to be clustered together and the control samples to be clustered together. If a given replicate was not with the appropriate cluster, it was removed from the analysis. At least two replicates remained after removing outliers for each combination of tissue, genotype, and condition. Remaining samples were reanalyzed using Deseq2 v1.28.1. A gene was considered “expressed” if the Deseq2 value of base mean was ≥5. A gene was considered significantly altered by ER stress if it met a 1.5fold (5% FDR) change cutoff. Effect of genetic background and tissue type on fold-change levels To identify genes that were significantly impacted by tissue-type in each genotype, we used the same Deseq2 pipeline described above, but incorporated tissue and condition as interaction terms. Small p-values from this study design indicates that the log fold change due to treatment (ER stress) is significantly different for the two tissues. We used a p-adj cutoff of 0.05. This was performed for each of the three genotypes. To identify genes in a given tissues that is impacted by genetic background, we used a likelihood ratio test (LRT) that is provided in the Deseq2 software. An LRT is used to identify genes that show change in fold-change across the three different genetic 46 backgrounds after the induction of ER stress. Significant p-values indicate a gene that has a change in fold-change, across the three genotypes, in any combination, determined solely by the difference in deviance between the full and reduced model formula. To test overlap of these sets of genes from a common pool of genes, we used a hypergeometric distribution test. Allele-specific expression quantification in the F1 mouse Allele-specific expression was quantified using GATK ASEReadcounter v3.8 (84). SNP information was obtained from the Sanger Mouse Genomes Project (https://www.sanger.ac.uk/data/mouse-genomes-project/). Counts for all replicates were combined to increase coverage and reduce variability. To increase the reliability of counts, we only included genes in our analyses that had at least 2 informative SNPs between B6 and CAST and at least 20 counts in at least one of the conditions. This resulted in 5,669 genes liver and 7,764 genes in kidney. Genes were considered to have a significant change in ASE post-ER stress if the ratio of the expression of the two alleles (B6 and CAST) under control conditions was significantly different than the ratio of expression under TM conditions determined by the Fisher’s exact test followed by a 5% FDR correction. Determination of regulatory effect To determine whether the variable expression of a gene was due to cis- regulatory effect, trans- regulatory effect, or some other effect, we used a method that was previously described in McManus et al., 2010 and Chow et al., 2015 (18, 42). 47 This pipeline was run on counts for genes in each of the parental strains (B6 and CAST) as well as each of the alleles in the F1 mouse (B6 allele and CAST allele) for both control and TM conditions in liver and kidney. First, to determine if expression was significantly different between the two parental counts for a given gene, we performed a binomial exact test followed by a 0.1% FDR correction. Next, to determine if expression was significantly different between the two parental alleles in the F1, we performed a binomial exact test followed by a 0.1% FDR correction. Finally, to compare the ratio of parental expression to the ratio of allelic expression in the F1, we performed a Fisher’s exact test, followed by a 0.1% FDR correction. This results in two corrected binomial exact test p-values, and one corrected Fisher’s exact test p-value for each gene in the F1 for the control liver, control kidney, TM liver, and TM kidney. Based on these values, we categorized genes into: cis-, trans-, cis-+trans-, cis-xtrans-, compensatory, conserved, and ambiguous based on the guidelines listed below. • cis-: Parental strains are different, alleles in F1 are different, but the ratio of parental expression compared to allelic expression is not different. • trans-: Parental strains are different, but the alleles in the F1 are not different, making the ratio of parental expression different than the ratio of allelic expression. Next, for genes that displayed both cis- and trans- regulatory differences, they were divided into three categories: cis-+trans-, cis-xtrans-, and compensatory. Genes with cis+trans- indicates both cis- and trans- effects that favor the same allele. cis-xtrans- 48 indicates both cis- and trans- effects that favor opposite alleles. Finally, compensatory indicates evidence of both cis- and trans- effects, but have no significant expression differences in the parental strains. • cis-+trans-: Parental strains are different, and parental alleles in the F1 are different. However, the ratio of parental expression and ratio of allelic expression is also different. The log of the ratio of parental expression was performed. A positive number indicates the B6 parental strain has a higher expression. Negative indicates the CAST parental strain has a higher expression level. Similarly, the log of the ratio of allelic expression was performed. Positive indicates the B6 allele has a higher expression level while negative indicates the CAST allele has a higher expression level. Genes where the higher expressed parental strain and higher expressed F1 allele matched, they were considered cis-+trans- (Ex: B6 parental strain & B6 allele had higher expression in each respective ratio). This indicates that both cis- and trans- effects favor the same allele. • cis-xtrans-: Same criteria as cis-+trans-, except for the log of the parental expression and log of the allelic expression have opposite signs. This indicates that the cis- and trans- effects favor expression of opposite alleles. • Compensatory: Parental strains are not different, but the alleles in the F1 are different, resulting in a difference in ratio of parental strains and ratio of alleles. This reveals that there is no significant difference in expression between the two parental strains despite evidence of both cis- and trans- regulatory effects. • Conserved: There is no significant differences between parental strains, alleles, or 49 ratios of both. • Ambiguous: Genes that do not fall into any of the previous categories. Only genes that displayed cis- or trans- effects are talked about and analyzed in the text. Enrichment analyses All Gene Ontology analysis was performed with DAVID v6.8 (85). We used the Benjamini-Hochberg method for adjusting for multiple testing. We used adjusted pvalues to determine significance of enrichment terms and use a p-adjusted value cutoff of 0.05. Transcription factor binding site enrichments were identified by using oPOSSUM v3.0 (86). We used the mouse single site analysis (SSA) tool with a cutoff of 2000 base pairs up- and downstream of the transcription start site. 50 Figure 2.1 Upregulation of canonical ER stress genes across genotypes and tissues. Log2(TM/Control) is plotted for genes with known functions in the ER stress response. Dotted line indicates a 1.5-fold change in gene expression. Each point represents a biological replicate. 51 Figure 2.2 Tissue-specific expression post ER stress response. Normalized counts are plotted for genes that display expression with a tissue-effect post ER stress in B6 (A), CAST (B), and F1 (C). Each point represents a biological replicate. All plots show a significant tissue effect (p-adj<0.05). 52 Figure 2.3 Genotype-specific expression post ER stress. Proportion of ER stressupregulated genes that are shared in all three genotypes, shared in two genotypes, or unique to one genotype. 53 Figure 2.4 Variable expression of genes across genotypes post ER stress. Normalized counts for each genotype are plotted for control and TM conditions for a subset of genes that display a significant variable expression across the genotypes in (A) liver or (B-F) kidney. Each points represents a biological replicate. All plots show a significant genotype effect (p-adj<0.05). 54 Figure 2.5 Expression ratios of genes with cis- or trans- effects. Log2(B6/CAST) plotted for either alleles in the F1 hybrid or parental expression in control liver (A), TM liver (B), control kidney (C) and TM kidney (D). Each point represents a gene displaying either a cis- or trans- regulatory effect. Numbers in parenthesis are the number of genes that display that particular regulatory effect. 55 Figure 2.6 ER stress and tissue type reveal hidden regulatory variation. Proportion of genes displaying a cis- or trans- effect in liver (A) and kidney (B). Proportion of genes with either a cis- or trans- effect in only stress conditions, only control conditions or both, for liver and kidney (C). Examples of genes displaying cis- regulatory effects only under stress conditions in F1 liver (D) and F1 kidney (E). Proportion of genes displaying a cis- or trans- effect seen in only liver, only kidney, or both (F). *P<0.00024 56 Figure 2.7 Impact of cis- and trans- effects on gene expression. Absolute log2 fold change of parental expression for genes displaying a regulatory effect. Comparing genes with a cis- or trans- effect in either control or stress conditions in liver (A) or kidney (B). Comparing impact of cis- effects (C) or trans- effects (D) on gene expression in control or stress conditions. Liver: control cis-: mean=1.04, SD=1.03 | control trans-: mean=0.82, SD=0.91 | TM cis-: mean=1.14, SD=1.09 | TM trans-: mean=0.86, SD=0.74; Kidney: control cis-: mean=0.94, SD=0.89 | control trans-: 0 66 S 0 44 | i 08 S 0 4| 06 57 Figure 2.8 ASE and corresponding change in RNA transcript. Proportion of genes in the F1 displaying significant change in ASE in liver (A) and kidney (C). Proportion of genes with a significant change in ASE showing ER stress-induced increase in RNA transcripts, decrease, or no change in liver (B) and kidney (D). 58 Figure 2.9 Change in ASE and transcript levels across stress conditions. ASE and total RNA expression levels are plotted. Genes that display ER stress-induced change in ASE in liver (A-C) or kidney (D-F). (A) and (D) show an increase in total transcript levels, (B) and (E) show a decrease in total transcript level, and (C) and (F) show no change in transcript levels. 59 Figure 2.10 Variable magnitude in ASE across tissues. Examples of genes that show ASE in both tissues, but differ in the magnitude of that ASE in a tissue-dependent manner. An example of an upregulated gene (A) and a downregulated gene (B). ASE and total RNA expression levels are plotted. 60 References 1. E. E. Eichler, J. Flint, G. Gibson, A. Kong, S. M. Leal, J. H. Moore, J. H. Nadeau, Missing heritability and strategies for finding the underlying causes of complex disease. Nat. Rev. Genet. 11 (2010), pp. 446–450. 2. S. Nickels, T. Truong, R. Hein, K. Stevens, K. Buck, S. Behrens, U. Eilber, M. Schmidt, L. Häberle, A. Vrieling, M. Gaudet, J. Figueroa, N. Schoof, A. B. Spurdle, A. Rudolph, P. A. Fasching, J. L. Hopper, E. Makalic, D. F. Schmidt, M. C. Southey, M. W. Beckmann, A. B. Ekici, O. Fletcher, L. Gibson, I. dos Santos Silva, J. Peto, M. K. Humphreys, J. Wang, E. Cordina-Duverger, F. Menegaux, B. G. Nordestgaard, S. E. Bojesen, C. Lanng, H. Anton-Culver, A. Ziogas, L. Bernstein, C. A. Clarke, H. Brenner, H. Müller, V. Arndt, C. Stegmaier, H. Brauch, T. Brüning, V. Harth, The GENICA Network, A. Mannermaa, V. Kataja, V.-M. Kosma, J. M. Hartikainen, kConFab, A. M. Group, D. Lambrechts, D. Smeets, P. Neven, R. Paridaens, D. Flesch-Janys, N. Obi, S. Wang-Gohrke, F. J. Couch, J. E. Olson, C. M. Vachon, G. G. Giles, G. Severi, L. Baglietto, K. Offit, E. M. John, A. Miron, I. L. Andrulis, J. A. Knight, G. Glendon, A. M. Mulligan, S. J. Chanock, J. Lissowska, J. Liu, A. Cox, H. Cramp, D. Connley, S. Balasubramanian, A. M. Dunning, M. Shah, A. Trentham-Dietz, P. Newcomb, L. Titus, K. Egan, E. K. Cahoon, P. Rajaraman, A. J. Sigurdson, M. M. Doody, P. Guénel, P. D. P. Pharoah, M. K. Schmidt, P. Hall, D. F. Easton, M. Garcia-Closas, R. L. Milne, J. Chang-Claude, Evidence of gene-environment interactions between common breast cancer susceptibility loci and established environmental risk factors. PLoS Genet. 9, e1003284 (2013). 3. M. Rask-Andersen, T. Karlsson, W. E. Ek, Å. Johansson, Gene-environment interaction study for BMI reveals interactions between genetic factors and physical activity, alcohol consumption and socioeconomic status. PLoS Genet. 13 (2017). 4. T. Matsui, I. M. Ehrenreich, Gene-environment interactions in stress response contribute additively to a genotype-environment interaction. PLoS Genet. 12 (2016). 5. A. Hodgins-Davis, J. P. Townsend, Evolving gene expression: from G to E to G × E. Trends Ecol. Evol. 24, 649–658 (2009). 6. V. Grishkevich, I. Yanai, The genomic determinants of genotype × environment interactions in gene expression. Trends Genet. 29, 479–487 (2013). 7. Y. Li, O. A. Álvarez, E. W. Gutteling, M. Tijsterman, J. Fu, J. A. G. Riksen, E. Hazendonk, P. Prins, R. H. A. Plasterk, R. C. Jansen, R. Breitling, J. E. Kammenga, Mapping determinants of gene expression plasticity by genetical genomics in C. elegans. PLoS Genet. 2, e222 (2006). 8. D. Glass, A. Viñuela, M. N. Davies, A. Ramasamy, L. Parts, D. Knowles, A. A. Brown, Å. K. Hedman, K. S. Small, A. Buil, E. Grundberg, A. C. Nica, P. Di 61 Meglio, F. O. Nestle, M. Ryten, R. Durbin, M. I. McCarthy, P. Deloukas, E. T. Dermitzakis, M. E. Weale, V. Bataille, T. D. Spector, Gene expression changes with age in skin, adipose tissue, blood and brain. Genome Biol. 14, R75 (2013). 9. V. Grishkevich, S. Ben-Elazar, T. Hashimshony, D. H. Schott, C. P. Hunter, I. Yanai, A genomic bias for genotype-environment interactions in C. elegans. Mol. Syst. Biol. 8 (2012). 10. GTEx Consortium, Genetic effects on gene expression across human tissues. Nature. 550, 204–213 (2017). 11. B. Alberts, A. Johnson, J. Lewis, M. Raff, K. Roberts, P. Walter, Molecular Biology of the Cell (Garland Science, ed.4, 2002). 12. J. H. Lin, P. Walter, T. S. B. Yen, Endoplasmic reticulum stress in disease pathogenesis. Annu. Rev. Pathol. Mech. Dis. 3, 399–425 (2008). 13. M. C. Jonikas, S. R. Collins, V. Denic, E. Oh, E. M. Quan, V. Schmid, J. Weibezahn, B. Schwappach, P. Walter, J. S. Weissman, M. Schuldiner, Comprehensive characterization of genes required for protein folding in the endoplasmic reticulum. Science 323, 1693–1697 (2009). 14. K. J. Travers, C. K. Patil, L. Wodicka, D. J. Lockhart, J. S. Weissman, P. Walter, Functional and genomic analyses reveal an essential coordination between the unfolded protein response and ER-associated degradation. Cell 101, 249–258 (2000). 15. D. Ron, P. Walter, Signal integration in the endoplasmic reticulum unfolded protein response. Nat. Rev. Mol. Cell Biol. 8, 519–529 (2007). 16. C. Y. Chow, M. F. Wolfner, A. G. Clark, Using natural variation in Drosophila to discover previously unknown endoplasmic reticulum stress genes. Proc. Natl. Acad. Sci. U.S.A. 110, 9013–8 (2013). 17. C. Y. Chow, K. J. P. Kelsey, M. F. Wolfner, A. G. Clark, Candidate genetic modifiers of retinitis pigmentosa identified by exploiting natural variation in Drosophila. Hum. Mol. Genet. 25, 651–659 (2016). 18. C. Y. Chow, X. Wang, D. Riccardi, M. F. Wolfner, A. G. Clark, The genetic architecture of the genome-wide transcriptional response to ER stress in the mouse. PLoS Genet. 11, e1004924 (2015). 19. B. A. Dombroski, R. R. Nayak, K. G. Ewens, W. Ankener, V. G. Cheung, R. S. Spielman, Gene expression and genetic variation in response to endoplasmic reticulum stress in human cells. Am. J. Hum. Genet. 86, 719–729 (2010). 20. D. L. Taylor, D. A. Knowles, L. J. Scott, A. H. Ramirez, F. P. Casale, B. N. Wolford, L. Guan, A. Varshney, R. D. Albanus, S. C. J. Parker, N. Narisu, P. S. 62 Chines, M. R. Erdos, R. P. Welch, L. Kinnunen, J. Saramies, J. Sundvall, T. A. Lakka, M. Laakso, J. Tuomilehto, H. A. Koistinen, O. Stegle, M. Boehnke, E. Birney, F. S. Collins, Interactions between genetic variation and cellular environment in skeletal muscle gene expression. PLoS One. 13, e0195788 (2018). 21. A. R. Marderstein, E. R. Davenport, S. Kulm, C. V. Van Hout, O. Elemento, A. G. Clark, Leveraging phenotypic variability to identify genetic interactions in human phenotypes. Am. J. Hum. Genet. 108, 49–67 (2021). 22. J. S. Tkacz, O. Lampen, Tunicamycin inhibition of polyisoprenyl Nacetylglucosaminyl pyrophosphate formation in calf-liver microsomes. Biochem. Biophys. Res. Commun. 65, 248–57 (1975). 23. C. Schaeffer, A. Creatore, L. Rampoldi., Protein trafficking defects in inherited kidney diseases. Nephrol. Dial. Transplant. 29, iv33–iv44 (2014). 24. X. Liu, R. M. Green, Endoplasmic reticulum stress and liver diseases. Liver Res. 3 (2019), pp. 55–64. 25. C. M. Oslowski, F. Urano, Measuring ER stress and the unfolded protein response using mammalian tissue culture system. Methods Enzymol. 490, 71–92 (2011). 26. K. Kokame, K. L. Agarwal, H. Kato, T. Miyata, Herp, a new ubiquitin-like membrane protein induced by endoplasmic reticulum stress. J. Biol. Chem. 275, 32846–32853 (2000). 27. Y. Wang, Z. Wu, D. Li, D. Wang, X. Wang, X. Feng, M. Xia, Involvement of oxygen-regulated protein 150 in AMP-activated protein kinase-mediated alleviation of lipid-induced endoplasmic reticulum stress. J. Biol. Chem. 286, 11119–11131 (2011). 28. X. Z. Wang, B. Lawson, J. W. Brewer, H. Zinszner, A. Sanjay, L. J. Mi, R. Boorstein, G. Kreibich, L. M. Hendershot, D. Ron, Signals from the stressed endoplasmic reticulum induce C/EBP-homologous protein (CHOP/GADD153). Mol. Cell. Biol. 16, 4273–4280 (1996). 29. Y. Kozutsumi, M. Segal, K. Normington, M. J. Gething, J. Sambrook, The presence of malfolded proteins in the endoplasmic reticulum signals the induction of glucose-regulated proteins. Nature 332, 462–464 (1988). 30. J. Chen, L. A. Stark, Insights into the relationship between nucleolar stress and the NF-κB pathway. Trends Genet. 35 (2019), pp. 768–780. 31. A. Pecoraro, M. Pagano, G. Russo, A. Russo, Role of autophagy in cancer cell response to nucleolar and endoplasmic reticulum stress. Int. J. Mol. Sci. 21 (2020), pp. 1–21. 32. K. Yang, J. Yang, J. Yi, Nucleolar stress: hallmarks, sensing mechanism and 63 diseases. Cell Stress 2, 125–140 (2015). 33. T. Yamada, H. Ishihara, A. Tamura, R. Takahashi, S. Yamaguchi, D. Takei, A. Tokita, C. Satake, F. Tashiro, H. Katagiri, H. Aburatani, J. I. Miyazaki, Y. Oka, WFS1-deficiency increases endoplasmic reticulum stress, impairs cell cycle progression and triggers the apoptotic pathway specifically in pancreatic β-cells. Hum. Mol. Genet. 15, 1600–1609 (2006). 34. S. G. Fonseca, S. Ishigaki, C. M. Oslowski, S. Lu, K. L. Lipson, R. Ghosh, E. Hayashi, H. Ishihara, Y. Oka, M. A. Permutt, F. Urano, Wolfram syndrome 1 gene negatively regulates ER stress signaling in rodent and human cells. J. Clin. Invest. 120, 744–755 (2010). 35. C. Hömig-Hölzel, R. Van Doorn, C. Vogel, M. Germann, M. G. Cecchini, E. Verdegaal, D. S. Peeper, Antagonistic TSC22D1 variants control BRAFE600induced senescence. EMBO J. 30, 1753–1765 (2011). 36. Y. Liu, L. Zeng, Y. Yang, C. Chen, D. Wang, H. Wang, Acyl-CoA thioesterase 1 prevents cardiomyocytes from Doxorubicin-induced ferroptosis via shaping the lipid composition. Cell Death Dis. 11, 1–14 (2020). 37. J. Knupp, P. Arvan, A. Chang, Increased mitochondrial respiration promotes survival from endoplasmic reticulum stress. Cell Death Differ. 26, 487–501 (2019). 38. E. Balsa, M. S. Soustek, A. Thomas, S. Cogliati, C. García-Poyatos, E. MartínGarcía, M. Jedrychowski, S. P. Gygi, J. A. Enriquez, P. Puigserver, ER and nutrient stress promote assembly of respiratory chain supercomplexes through the PERK-eIF2α axis. Mol. Cell. 74, 877-890.e6 (2019). 39. M. Piskacek, L. Zotova, G. Zsurka, R. J. Schweyen, Conditional knockdown of hMRS2 results in loss of mitochondrial Mg + uptake and cell death. J. Cell. Mol. Med. 13, 693–700 (2009). 40. D. J. Todd, A. H. Lee, L. H. Glimcher, The endoplasmic reticulum stress response in immunity and autoimmunity. Nat. Rev. Immunol. 8, 663–674 (2008). 41. S. E. Bettigole, L. H. Glimcher, Endoplasmic reticulum stress in immunity. Annu. Rev. Immunol. 33, 107–138 (2015). 42. C. J. McManus, J. D. Coolon, M. O. Duff, J. Eipper-Mains, B. R. Graveley, P. J. Wittkopp, Regulatory divergence in Drosophila revealed by mRNA-seq. Genome Res. 20, 816–825 (2010). 43. W. C. Yuan, B. Pepe-Mooney, G. G. Galli, M. T. Dill, H. T. Huang, M. Hao, Y. Wang, H. Liang, R. A. Calogero, F. D. Camargo, NUAK2 is a critical YAP target in liver cancer. Nat. Commun. 9, 1–12 (2018). 64 44. X. Sun, L. Gao, H. Y. Chien, W. C. Li, J. Zhao, The regulation and function of the NUAK family. J. Mol. Endocrinol. 51, R15-22 (2013). 45. W. Wang, W. Zhou, L. Jiang, B. Cui, L. Ye, T. Su, J. Wang, X. Li, G. Ning, Mutation analysis of SCNN1B in a family with Liddle’s syndrome. Endocrine 29, 385–390 (2006). 46. E. Cornec-Le Gall, M. P. Audrézet, E. Renaudineau, M. Hourmant, C. Charasse, E. Michez, T. Frouget, C. Vigneau, J. Dantal, P. Siohan, H. Longuet, P. Gatault, L. Ecotière, F. Bridoux, L. Mandart, C. Hanrotel-Saliou, C. Stanescu, P. Depraetre, S. Gie, M. Massad, A. Kersalé, G. Séret, J. F. Augusto, P. Saliou, S. Maestri, J. M. Chen, P. C. Harris, C. Férec, Y. Le Meur, PKD2-related autosomal dominant polycystic kidney disease: prevalence, clinical presentation, mutation spectrum, and prognosis. Am. J. Kidney Dis. 70, 476–485 (2017). 47. S. S. Marable, E. Chung, M. Adam, S. S. Potter, J. S. Park, Hnf4a deletion in the mouse kidney phenocopies Fanconi renotubular syndrome. JCI Insight 3 (2018). 48. K. Voskarides, G. Papagregoriou, D. Hadjipanagi, I. Petrou, I. Savva, A. Elia, Y. Athanasiou, A. Pastelli, M. Kkolou, M. Hadjigavriel, C. Stavrou, A. Pierides, C. Deltas, COL4A5 and LAMA5 variants co-inherited in familial hematuria: digenic inheritance or genetic modifier effect? BMC Nephrol. 19, 114 (2018). 49. J. Gao, Y. Zhang, C. Yu, F. Tan, L. Wang, Spontaneous nonalcoholic fatty liver disease and ER stress in Sidt2 deficiency mice. Biochem. Biophys. Res. Commun. 476, 326–332 (2016). 50. M. K. Bjursell, H. J. Blom, J. A. Cayuela, M. L. Engvall, N. Lesko, S. Balasubramaniam, G. Brandberg, M. Halldin, M. Falkenberg, C. Jakobs, D. Smith, E. Struys, U. Von Döbeln, C. M. Gustafsson, J. Lundeberg, A. Wedell, Adenosine kinase deficiency disrupts the methionine cycle and causes hypermethioninemia, encephalopathy, and abnormal liver function. Am. J. Hum. Genet. 89, 507–515 (2011). 51. J. H. Lee, R. Bodmer, E. Bier, M. Karin, Sestrins at the crossroad between stress and aging. Aging (Albany NY) 2, 369–374 (2010). 52. B. De Strooper, T. Iwatsubo, M. S. Wolfe, Presenilins and γ-secretase: structure, function, and role in Alzheimer disease. Cold Spring Harb. Perspect. Med. 2, a006304 (2012). 53. A. M. Arensdorf, D. Diedrichs, D. T. Rutkowski, Regulation of the transcriptome by ER stress: Non-canonical mechanisms and physiological consequences. Front. Genet. 4, 256 (2013). 54. K. Yamamoto, H. Yoshida, K. Kokame, R. J. Kaufman, K. Mori, Differential contributions of ATF6 and XBP1 to the activation of endoplasmic reticulum stress-responsive cis-acting elements ERSE, UPRE and ERSE-II. J. Biochem. 136, 65 343–50 (2004). 55. H. Yoshida, T. Okada, K. Haze, H. Yanagi, T. Yura, M. Negishi, K. Mori, ATF6 activated by proteolysis binds in the presence of NF-Y (CBF) directly to the cisacting element responsible for the mammalian unfolded protein response. Mol. Cell. Biol. 20, 6755–6767 (2000). 56. M. R. Chikka, D. D. McCabe, H. M. Tyra, D. T. Rutkowski, C/EBP homologous protein (CHOP) contributes to suppression of metabolic genes during endoplasmic reticulum stress in the liver. J. Biol. Chem. 288, 4405–4415 (2013). 57. H. Nishitoh, CHOP is a multifunctional transcription factor in the ER stress response. J. Biochem. 151, 217–9 (2012). 58. T. Gotoh, M. Endo, Y. Oike, Endoplasmic reticulum stress-related inflammation and cardiovascular diseases. Int. J. Inflam. 2011, 259462 (2011). 59. M. Endo, M. Mori, S. Akira, T. Gotoh, C/EBP homologous protein (CHOP) is crucial for the induction of caspase-11 and the pathogenesis of lipopolysaccharideinduced inflammation. J. Immunol. 176, 6245–6253 (2006). 60. B. B. McConnell, V. W. Yang, Mammalian Krüppel-like factors in health and diseases. Physiol. Rev. 90 (2010), pp. 1337–1381. 61. G. L. Wang, B. H. Jiang, E. A. Rue, G. L. Semenza, Hypoxia-inducible factor 1 is a basic-helix-loop-helix-PAS heterodimer regulated by cellular O2 tension. Proc. Natl. Acad. Sci. U.S.A. 92, 5510–5514 (1995). 62. E. R. Pereira, K. Frudd, W. Awad, L. M. Hendershot, Endoplasmic Reticulum (ER) stress and Hypoxia response pathways interact to Potentiate Hypoxiainducible Factor 1 (HIF-1) Transcriptional activity on targets like Vascular Endothelial Growth Factor (VEGF). J. Biol. Chem. 289, 3352–3364 (2014). 63. T. Lawrence, The nuclear factor NF-kappaB pathway in inflammation. Cold Spring Harb. Perspect. Biol. 1, a001651 (2009). 64. S. Goyama, M. Kurokawa, Pathogenetic significance of ecotropic viral integration site-1 in hematological malignancies. Cancer Sci. 100, 990–995 (2009). 65. S. Lugthart, E. Van Drunen, Y. Van Norden, A. Van Hoven, C. A. J. Erpelinck, P. J. M. Valk, H. B. Beverloo, B. Löwenberg, R. Delwel, High EVI1 levels predict adverse outcome in acute myeloid leukemia: prevalence of EVI1 overexpression and chromosome 3q26 abnormalities underestimated. Blood 111, 4329–4337 (2008). 66. C. Wolfrum, E. Asilmaz, E. Luca, J. M. Friedman, M. Stoffel, Foxa2 regulates lipid metabolism and ketogenesis in the liver during fasting and in diabetes. Nature 432, 1027–1032 (2004). 66 67. M. Kadmiel, J. A. Cidlowski, Glucocorticoid receptor signaling in health and disease. Trends Pharmacol. Sci. 34, 518–530 (2013). 68. C. J. Fryer, T. K. Archer, Chromatin remodelling by the glucocorticoid receptor requires the BRG1 complex. Nature 393, 88–91 (1998). 69. H. Cho, O. H. Park, J. Park, I. Ryu, J. Kim, J. Ko, Y. K. Kim, Glucocorticoid receptor interacts with PNRC2 in a ligand-dependent manner to recruit UPF1 for rapid mRNA degradation. Proc. Natl. Acad. Sci. U.S.A. 112, E1540–E1549 (2015). 70. L. Wang, B. Popko, R. P. Roos, The unfolded protein response in familial amyotrophic lateral sclerosis. Hum. Mol. Genet. 20, 1008–15 (2011). 71. L. Wang, B. Popko, R. P. Roos, An enhanced integrated stress response ameliorates mutant SOD1-induced ALS. Hum. Mol. Genet. 23, 2629–38 (2014). 72. B. Song, D. Scheuner, D. Ron, S. Pennathur, R. J. Kaufman, Chop deletion reduces oxidative stress, improves beta cell function, and promotes cell survival in multiple mouse models of diabetes. J. Clin. Invest. 118, 3378–89 (2008). 73. S. Oyadomari, A. Koizumi, K. Takeda, T. Gotoh, S. Akira, E. Araki, M. Mori, Targeted disruption of the Chop gene delays endoplasmic reticulum stressmediated diabetes. J. Clin. Invest. 109, 525–32 (2002). 74. G. Auf, A. Jabouille, S. Guérit, R. Pineau, M. Delugin, M. Bouchecareilh, N. Magnin, A. Favereaux, M. Maitre, T. Gaiser, A. von Deimling, M. Czabanka, P. Vajkoczy, E. Chevet, A. Bikfalvi, M. Moenner, Inositol-requiring enzyme 1alpha is a key regulator of angiogenesis and invasion in malignant glioma. Proc. Natl. Acad. Sci. U.S.A. 107, 15553–8 (2010). 75. L. Ozcan, I. Tabas, Role of endoplasmic reticulum stress in metabolic disease and other disorders. Annu. Rev. Med. 63, 317–28 (2012). 76. K. W. Choy, Y. S. Lau, D. Murugan, M. R. Mustafa, Chronic treatment with paeonol improves endothelial function in mice through inhibition of endoplasmic reticulum stress-mediated oxidative stress. PLoS One. 12, e0178365 (2017). 77. B. Feng, X. Huang, D. Jiang, L. Hua, Y. Zhuo, D. Wu, Endoplasmic reticulum stress inducer Tunicamycin alters hepatic energy homeostasis in mice. Int. J. Mol. Sci. 18, 1710 (2017). 78. S. C. Gad, C. D. Cassidy, N. Aubert, B. Spainhour, H. Robbe, Nonclinical vehicle use in studies by multiple routes in multiple species. Int. J. Toxicol. 25, 499–521 (2006). 79. B. Langmead, S. L. Salzberg, Fast gapped-read alignment with Bowtie 2. Nat. Methods. 9, 357–359 (2012). 67 80. A. Dobin, C. A. Davis, F. Schlesinger, J. Drenkow, C. Zaleski, S. Jha, P. Batut, M. Chaisson, T. R. Gingeras, STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 29, 15–21 (2013). 81. A. R. Quinlan, I. M. Hall, BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 26, 841–842 (2010). 82. H. Li, B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, R. Durbin, The Sequence Alignment/Map format and SAMtools. Bioinformatics. 25, 2078–2079 (2009). 83. M. I. Love, S. Anders, V. Kim, W. Huber, RNA-Seq workflow: gene-level exploratory analysis and differential expression. F1000 Res. 4, 1070 (2015). 84. A. McKenna, M. Hanna, E. Banks, A. Sivachenko, K. Cibulskis, A. Kernytsky, K. Garimella, D. Altshuler, S. Gabriel, M. Daly, M. A. DePristo, The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010). 85. D. W. Huang, B. T. Sherman, R. A. Lempicki, Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4, 44–57 (2009). 86. S. J. Ho Sui, J. R. Mortimer, D. J. Arenillas, J. Brumm, C. J. Walsh, B. P. Kennedy, W. W. Wasserman, oPOSSUM: identification of over-represented transcription factor binding sites in co-expressed genes. Nucleic Acids Res. 33, 3154–3164 (2005). CHAPTER 3 TRANSCRIPTION FACTOR BINDING SITE PREDICTION PAIRED WITH GTEX DATA AND HUMAN CIS- EQTL ANALYSIS Abstract The ER stress response elicits a large transcriptional response to misfolded proteins that is largely driven by three well characterized transcription factors. This transcriptional response is variable across different genetic backgrounds. One mechanism in which genetic variation can lead to transcriptional variability is through altered binding and activity of the three main transcription factors. This work attempts to better understand this potential mechanism by first creating a computational pipeline to identify potential binding sites throughout the human genome. We then utilize GTEx datasets to identify cis- eQTLs in various tissues that fall within predicted TFBS as well as analyze patterns that arise from this comparison. Finally, we perform an eQTL analysis on human cell lines experiencing ER stress to identify cis- eQTLs that regulate the variable ER stress response. The majority of these cis- eQTLs are unique to a given condition: control or ER stress. Some of these stress specific cis- eQTLs fall within putative binding sites of the three main ER stress response transcription factors, providing a hypothesis by which these cis- eQTLs might be impacting gene expression 69 under ER stress conditions through altered TF binding. This study also represents the first eQTL analysis done on human samples experiencing ER stress and a vital step towards identifying the genetic components responsible for the variable ER stress response. Introduction The endoplasmic reticulum (ER) is a large eukaryotic organelle that is a major site of protein and lipid synthesis, protein folding, and calcium storage (1). ER stress occurs when misfolded proteins accumulate in the lumen of the ER (2). The cell responds to these misfolded proteins with a conserved cellular process called the unfolded protein response (UPR), which aims to reduce the protein load that enters the ER and increase the capacity of the ER to handle unfolded proteins (3). These adaptations entail a significant transcriptional response that activates a large number of UPR target genes. Three different branches of the UPR have been identified: ATF6, IRE1 & PERK, each with their own unique target genes and roles in the UPR (4). Despite being a well-conserved process vital to basic cellular function, the ER stress response has demonstrated transcriptional variability dependent on genetic background in Drosophila, mouse, and humans (5–7). Further studies are required to determine the exact genetic and environmental causes of this variability in the UPR. Due to the UPR being a large transcriptional response to misfolded proteins, one potential mechanism is any genetic variant that impacts transcript levels. This could have a significant influence on the cellular outcomes post-ER stress. One of the main mechanisms in which a genetic variant can affect transcriptional levels is through 70 regulatory mechanisms, either cis- or trans-. In fact, genetic mutations impacting cisregulatory mechanisms, as opposed to trans- mechanisms, are now thought to have the most impact on phenotypic variability (8). There are many types of cis- regulatory mechanisms that can impact gene expression, such as promoters, enhancers, silencers, and insulators, with many of these containing binding sites for transcription factors. Transcription factors (TFs) are proteins that bind to DNA and are involved in the regulation of transcription. Each TF has a unique DNA sequence binding preference that it will bind to known as a ‘binding motif’ (9). A genetic variant within the specific DNA sequence that the TF recognizes could impact binding specificity, binding dynamics, and even DNA methylation profiles (10, 11). Natural genetic variation within multiple TFBSs, and subsequent altered TF binding, can lead to expression profiles widely different between individuals. This concept can be applied to the large transcriptional response of the UPR and the main TFs that regulate it. The three main branches of the UPR (PERK, IRE1, & ATF6) are part of a signaling cascade that culminates in three main TFs entering the nucleus to elicit the large transcriptional response to misfolded proteins. These three TFs are ATF4, XBP1, and ATF6 (3). ATF6 binds in conjunction with another transcription factor, NFY, to two different binding motifs that differ in spacing and orientation. These two sites are known as ER stress response element I (ERSEI) & ERSEII (12, 13). The binding motif for XBP1 is known as the unfolded protein response element (UPRE) (14, 15). Natural genetic variation present within the binding sites of these three transcription factors could lead to variable expression patterns and could partially underlie the variable response to ER stress that has been seen in multiple different organisms (5–7). 71 In this work, we create a computational pipeline that aims to identify TFBSs genome-wide for each of the three main TFs involved in the UPR. After identifying these putative TFBSs, we utilized GTEx, a large online database that attempts to identify genetic variants causing changes in gene expression across multiple tissues. By comparing the two datasets, we identify mutational hotspots in each of the motifs assayed. Finally, we use both expression data after ER stress induction and sequencing data for individuals in an unrelated population to perform an eQTL analysis. This analysis revealed hundreds of SNPs that associated with variable expression levels of genes after ER stress induction. Many of these fell within putative TFBSs, which provides a starting hypothesis as to how these SNPs may be regulating the variable expression patterns under ER stress conditions. Together, these findings provide evidence for one potential mechanism in which natural genetic variation in the human population can lead to a variable ER stress response. Knowing where these binding sites are in the genome, how prevalent genetic variation is within these sites, and how this variation can alter gene expression is vital towards better understanding how TF binding activity can underlie variable susceptibility in human disease with an ER stress component and to potentially identify therapeutic targets. Results Pipeline to identify putative TF binding sites in the human genome To identify predicted binding sites of UPR transcription factors, we created a pipeline that searches genome-wide for literature-supported motifs. Our method entails searching for ERSEI (CCAAT-N9-CCACG), ERSEII (ATTGG-N-CCACG), UPRE 72 ((C/A)CACGTCA), and ATF4 ((A/C)TGA(T/C)GCAA(T/C)) across the human genome (GRCh37) (Figure 3.1). To accommodate for base-pair variability within a binding motif, we utilized a letter probability matrix (LPM) search instead of a consensus-sequence (Figure 3.2). A potential filter to further refine our search pipeline for legitimate TF binding sites is based on the proximity of a motif to a gene. ATF6, ATF4, and XBP1 bind in cisnear known ER stress genes (12, 14–16). Based on this, we filtered motifs that had a protein-coding gene within a 2kb window (Figure 3.1). Applying this distance criteria narrowed down results from 224,244 motifs to 109,345 motifs, corresponding to 16,767 unique protein-coding genes. Included within this dataset are genes with known roles in the ER stress response and TF binding sites in cis-. Some of these known ER stress response genes include HERP, BIP, ERO1L, and CHOP (13, 17–19). One other filter that we can use to focus on potentially real transcription factor binding sites is conservation. The use of conservation scores as a filter does not introduce any bias into our pipeline due to conservation being the same no matter the cell type, tissue type, developmental stage, or environmental stressors. Due to this measurement being of a region of DNA, we used PhastCons score as our measure of conservation. PhastCons measures the probability that each nucleotide belongs to a conserved element (0-1 score) (20, 21). We used a cutoff of 0.5 (50% chance that a nucleotide belongs to a conserved element) and only included motifs that were covered by PhastCons scores. This further narrowed down our list of motifs from 109,345 to 9,231 (Figure 3.1). This corresponds to conserved motifs being within a 2kb window of 6,434 protein coding genes. 73 GO enrichment of these 6,434 genes revealed the most significant enrichment category was for positive regulation of transcription from RNA polymerase II promoter (GO:0045944, q=2.9x10-11). This enrichment for genes that increase transcription perhaps indicates a signaling cascade that starts with the three main UPR TFs targeting other transcriptional activators to eventually lead to the large transcriptional response of the UPR. These 6,434 genes are the culmination of an unbiased computational pipeline to identify potential binding sites of the three main TFs of the UPR and each is a potentially novel UPR target gene that could warrant future functional study. Many cis- eQTLs are found in putative ER stress TF binding sites Next, we utilized the online dataset of the Genotype-Tissue Expression (GTEx) to characterize human genetic variation within putative TF binding sites and the subsequent impact on gene expression (22). The GTEx dataset entails the gene expression from 54 non-diseased tissues across nearly 1,000 individuals. Using the expression and sequencing data available provides eQTL data for 48 different tissues. Each eQTL within the GTEx database represents a genomic position that has been shown to have an impact on the expression of a gene. We hypothesize that these eQTL regions could be impacting gene expression through the differential binding or activity of our predicted TF binding motifs. To investigate this, we first identified how many significant eQTLs within the GTEx dataset fell within a predicted TF binding site. Due to the GTEx dataset already having a distance threshold for proximity to a gene, and wanting to investigate motifs with variability, we used no distance or conservation cutoff in our motif dataset. We 74 found as many as 2,938 overlaps in the tibial artery and as little as 204 overlaps in the substantia nigra (Table 3.1). To better comprehend how enriched this overlap between eQTLs and predicted motifs is, we compared to a random distribution across the human genome. In theory, this will allow us to say if eQTLs fall within a predicted motif more, equal to, or less than random chance, across the 48 different tissues. We created a dataset of positions across the human genome that correspond to the same length and number of our predicted motifs. We then intersected that with eQTLs across the 49 different tissues and quantified the overlaps. To create a better representation of a random overlap, we did this analysis 1000 times. When comparing these overlaps, we see that for every binding motif, ERSEI, ERSEII, UPRE, and ATF4, there is a significantly higher overlap for GTEx eQTLs with predicted motifs than would be expected by random chance across all tissues (Table 3.2). Next, we identified which specific position in each binding motif overlapped with a cis- eQTL. For this analysis, we pooled together the data from each of the 49 different tissues. For each of the four binding motifs, we see that certain positions within the motif harbor more cis- eQTLs than other positions (ERSEI: Position 18=20%, Position 19=18%; ERSEII: Position 10=39%, Position 11=19%; UPRE: Position 4=35%, Position 5=39%; ATF4: Position 5=18%, Position 6=21%) (Figure 3.3). To investigate whether the higher mutation rate of these hot-spots were unique to GTEx or seen throughout evolution, we calculated the conservation score of each position. We found that most of the positions that were hot-spots for GTEx cis- eQTLs were in fact far less conserved than the other positions (ERSEI: Position 18=-0.68, 75 Position 19=-0.58; ERSEII: Position 10=-0.71, Position 11=-0.55; UPRE: Position 4=0.58, Position 5=-0.53; ATF4: Position 5=-0.02, Position 6=-0.19) (Figure 3.4). This indicates a pattern that is not specific to GTEx and is in fact one that is maintained through evolution. Closer examination of these less-conserved cis- eQTL hotspots reveals them to be CpG dinucleotides in each motif. CpG dinucleotides are rare throughout the genome, making up <1% of all dinucleotides in the genome, compared to the expected 4% frequency (23, 24). Despite this underrepresentation, they comprise ~25% of all mutations in humans (25). The mutation rate of CpGs is elevated by ~30-fold relative to the average rate of mutation in great apes (26). This high mutation rate is due to methylation of cytosines, rendering them unstable, a subsequent deamination to a T, and a resulting C→T transition (27). The high number of GTEx cis- eQTLs and lower conservation scores at these positions are most likely due to the high mutation rates seen at CpG dinucleotides. Despite the rarity of CpG dinucleotides throughout the genome, a CpG dinucleotide is present within each of the binding motifs assayed. This perhaps is a mechanism by which there can be greater mutational flexibility at these sites leading to greater variability in binding ability of TFs and subsequent variability in expression levels of target genes. This can lead to greater adaptation to different environments. This theory is in part supported by our data showing high levels of cis- eQTLs at these CpG dinucleotides which GTEx has shown to alter gene expression levels. CpGs are also highly enriched in ~1kb regions of the mammalian genome known as CpG islands (28). These CpG islands are normally associated with the starts of both house-keeping 76 genes and tissue-specific genes, with ~70% of promoters located near the TSS of a gene containing a CpG island (29). These regions are typically unmethylated (28). CpG dinucleotides can only have a C→T mutation after methylation. Both proximity of the TFBS to an active gene and methylation state provide mechanisms by which the mutation of CpG dinucleotides within these binding motifs can be controlled. eQTL analysis on a large human population dataset experiencing ER stress As mentioned previously, an eQTL is a region of DNA that controls the expression of a gene. The overlap of eQTLs and our predicted motifs is a start towards investigating the regulatory mechanisms controlling gene expression. However, the caveat with the previous analysis is that GTEx data is derived from healthy, nonstressed tissue. Based on previous research in our lab, we have shown that the majority of regulatory mechanisms act in a condition-dependent manner. Chapter 2 of this dissertation and previous studies have shown that the majority of cis- and trans- effects in mice are seen in control only or ER stress only conditions (6). Based on this, we would expect the GTEx data to be lacking many eQTLs that are specific to an ER stressed state. To address this, we performed our own eQTL analysis on human samples experiencing ER stress. The human population used for this analysis is from The Centre d'Etude du Polymorphisme Humain (CEPH; Human Polymorphism Study Center) (30). Cell lines from these fully sequenced individuals have been used for many different studies. We utilized the available gene expression data from a group that applied ER stress to CEPH 77 cell lines (131 cell lines in total) (7, 31). This dataset gives us the opportunity to investigate eQTLs that are present in control conditions, similar to GTEx, as well as eQTLs present under ER stress conditions. Based on sequencing and expression data availability, eQTL analysis was performed on 110 samples, using a one Mb window upand down-stream of a TSS for each gene tested in the microarray studies (7,31), for control and TM conditions. Under control conditions, we found 836,115 eQTLs with a cutoff of padj<0.001. Under TM conditions, we found 823,507 eQTLs with a cutoff of padj<0.001. The majority of these significant eQTLs are unique to a given condition (370,895 shared; 465,220 unique to control; 452,612 unique to TM) (Figure 3.5). To the best of our knowledge, this is the first eQTL analysis performed on human samples after the induction of the ER stress response with TM. This type of analysis provides us with a critical understanding of the genes and regulatory mechanisms that govern the large ER stress response in humans. The large number of eQTLs that are unique to TM conditions highlights just how large and complex the ER stress response is. However, based solely on the eQTL data, we have no indication as to how these SNPs are regulating their target gene’s expression levels. As previously discussed, one way in which this could be happening is through the SNP altering TF binding efficacy and ultimately impacting expression levels. To explore this theory, we next evaluated these eQTLs in the context of our previously discovered binding motifs. We intersected these two datasets and investigated the overlaps. The most significant hit for proteincoding genes was an eQTL within an ATF4 binding motif that was significantly associated with variable TM expression of the gene Ubiquitin Like Modifier Activating 78 Enzyme 6 (UBA6) (p=1.12x10-22) (Figure 3.6A). UBA6 is a ubiquitin activating enzyme that initiates ubiquitination by ATP-dependent activation of ubiquitin. The mammalian proteome encodes two ubiquitin activating enzymes, the canonical E1 enzyme UBA1 and the non-canonical E1 enzyme UBA6 (32, 33). Ubiquitination plays a very critical role in the regulation of the cell’s proteome (34). Therefore, it is fitting for UBA6 to be a UPR target gene, given its role in protein regulation. Individuals who are homozygous for the alternate allele of the significant eQTL SNP displayed, on average, lower levels of UBA6 expression when exposed to TM (Figure 3.6A). This decrease could be due to decreased affinity of ATF4 to this mutated binding site and subsequent decreased expression levels under ER stress conditions. Another of our top significant hits is for the protein coding-gene RABEP1 (Rabaptin, RAB GTPase Binding Effector Protein 1) (p=1.26x10-15) (Figure 3.6B). A SNP within the ATF6 binding portion of an ERSEII motif that is located downstream of RABEP1 is associated with changes in RABEP1 gene expression. RABEP1 acts as linker protein that is involved in endocytic membrane fusion and membrane trafficking of recycling endosomes (35). The ER is in contact with endosomes with crosstalk between the two organelles having impacts on protein trafficking (36). Individuals in the CEPH population who are homozygous for the alternate allele have significantly lower expression levels of RABEP1 under ER stress conditions than individuals who are homozygous for the reference allele (Figure 3.6B). Our data represents a hypothesis of how natural genetic variation within a potential ERSEII binding site alters the transcriptional regulation of RABEP1 under ER stress conditions which may impact protein homeostasis within the cells via endosomal trafficking and ER contact sites. 79 One of our top significant hits with a different expression pattern from our overlap analysis with eQTLs impacting gene expression after TM exposure is a variant within a UPRE motif which impacts the gene Phosphoserine Phosphatase homolog (PSPHL) (p=3.7x10-20) (Figure 3.6C). PSPH is a cytosolic protein that is involved in serine biosynthesis. PSPHL is an uncharacterized homolog that contains a region of exact homology. While there is no functional data in the literature for PSPHL, there have been multiple independent studies that have noted a strong correlation of this homolog to cancer (37–40). The link between the UPR and cancer has long been noted (41). We hypothesize that PSPHL expression is regulated through a UPRE located upstream. This provides a working hypothesis for the interplay of PSPHL in the UPR and cancer. At first glance, a SNP within the binding motif of XBP1 leading to higher expression may seem counterintuitive. However, we have no evidence that this binding motif directly regulates PSPHL expression. If this binding motif encodes a repressor of PSPHL, disruption of the binding motif would lead to lower repression and result in the expression pattern we see. Further functional validation would be required to demonstrate if the putative binding motif nearby the gene of interest is a true binding site of the relevant TF and how the SNP from our eQTL analysis may be impacting binding efficacy and gene expression. However, this is a great start towards identifying novel TF binding sites and ER stress genes that may underlie the variable ER stress response in the human population. 80 Discussion The ER stress response is a large transcriptional response to misfolded proteins with substantial characterization of the transcription factors involved. We took advantage of this characterization and the subsequently validated binding site sequences of these transcription factors to create a computational pipeline to identify more binding sites. ChIP-seq experiments are the most direct way in which TFBSs have been identified in the past (42–44). While this is still a great method to identify TFBSs, one of the major downfalls is the introduction of experimental bias. The binding of a TF to DNA is highly subject to many different factors, such as cell type, developmental stage, chromatin accessibility, and stress state. To get a complete picture of every possible TFBS within the human genome, one would have to perform ChIP-seq on an astronomical number of samples ranging from hundreds of different cell types to hundreds of different developmental stages or cellular stressors. However, one constant within every cell type and every condition is genomic data. We took advantage of this fact by identifying experimentally validated binding sequences throughout the human genome to create an un-biased list of potential TFBSs. We used two filtering steps including distance to nearest protein-coding gene and conservation rates as these introduce no experimental bias. With this un-biased list of potential binding sites, we utilized the GTEx dataset to explore how often experimentally validated cis- eQTLs fell within our predicted TFBSs. This provided us with a list of SNPs that fell within a putative TFBS that has been shown to effect gene expression in specific tissues. Each of these cis- eQTLs that fell within a predicted TFBS had the potential to regulate gene expression through 81 altered TF binding across different tissue types and individuals. Due to the nature of the TFs themselves, this inter-individual variation within these TFBSs could underlie some of the transcriptional variability of the ER stress response and ultimately underlie or modify disease susceptibility. For each of the four binding motifs assayed, we found more genetic variation within these sites than expected by random chance. This raises the question as to why there are so many TFBS genetic variants. One of the leading theories is based on the concept that genetic variants within a TFBS are ultimately selected for in order to adapt to the local environment. This is due to the fact that genetic variants within TFBSs have minimal functional tradeoffs associated with evolutionary changes. This is based on the concept of TFBSs primarily impacting the expression of a single gene in one specific cell- or tissue-type, in contrast to protein-coding variations tending to have broader effects (45, 46). This is further corroborated by studies demonstrating cis- regulatory effects in general having a greater effect on evolution and adaptation (47, 48). Based on previous observations, it can be concluded that genetic variation within TFBSs is a major target for evolutionary forces and a major contributor to different phenotypes within the human population. Genetic variation within TFBSs has been shown to affect diseases and traits such as gout (49), obesity (50), chronic kidney disease, (51), type 2 diabetes (52), dyslipidemia (53), heart disease (54), prostate cancer (55), and many more. Given this, it is not unreasonable to conclude that the increased genetic variation within our TFBSs can also have great effects on disease severity and susceptibility in the human population. If a cis- eQTL falling within a predicted TFBS was completely random, we 82 would expect each position to harbor approximately equal numbers of cis- eQTLs. However, in our study we found mutational “hotspots”. These are positions within a predicted TFBS that had more cis- eQTLs than any of the other positions. Upon closer examination, we discovered the hotspots in each binding motif assayed was a CpG dinucleotide. The presence of CpG dinucleotides in each of the four motifs, despite the rarity of CpG dinucleotides throughout the human genome, might imply a functionality of these sites in terms of increased mutational rates. These increased mutation rates at these sites correspond to mutations that have a greater effect on gene expression, as demonstrated by the GTEx dataset. To further explore this, expression levels will need to be assayed after a CpG site is mutated in each of the four motifs. If expression levels of the affected gene is altered when the C of the CpG dinucleotide is mutated to a T, this will provide support for the theory that the methylation state and subsequent C→T mutation can be used as a regulatory control over TF binding, TF activity, and gene expression. One of the main caveats of the GTEx dataset is that each sample is under control conditions. Chapter 2 of this dissertation and previous studies have shown that the majority of regulatory mechanisms are unique to either a control or stressed state (6). To address this, we utilized a dataset that exposed human cell lines to ER stress and the whole-genome sequencing data of the humans the cell lines were derived from. With this data, we were able to perform an eQTL analysis on humans after the induction of ER stress. This eQTL dataset alone represents a significant step towards understanding the regulatory mechanisms governing the variable ER stress response in humans. This study is also significant because this is the first eQTL analysis performed on human cell 83 lines after the induction of ER stress using TM. With this dataset, we highlighted two genes, UBA6 and RABEP1, that followed a pattern of lower expression under ER stress conditions in individuals with a SNP within the predicted binding site of a UPR transcription factor due to decreased TF binding efficacy. This provides a potential mechanism in which the associated eQTL SNP is exerting its effect on the expression levels of the given gene. This also provides a good basis for which the gene highlighted may be a target of the UPR and a modifier of the ER stress response. We also highlighted one gene, PSPHL, that did not follow the expected expression pattern given a SNP present within a predicted TFBS. We provided a hypothesis as to what may be causing this expression pattern, but further experimental validation will be required to elucidate the mechanism. Overall, these three genes represent a small subset of the dataset generated through our pipeline. Our results are a great starting point towards identifying novel TFBSs and genes involved in the UPR that are affected by genetic variation. The significant eQTLs that alter expression in these individuals that do not fall within one of our TFBS may still have an impact on TF binding and activity. Variants outside these motifs may influence TF binding through altering the local shape of the DNA, influencing the binding of other TFs that can form complexes with the assayed TF, or by altering distances to other TF binding sites (56, 57). While it is outside the scope of this study to investigate this further, it is something to note that the eQTLs outside the TFBSs we assayed may still impact TF binding in other ways. This study uses a computational pipeline to create a list of unbiased potential TFBSs. We combine this list with datasets such as GTEx, microarray expression data, 84 and sequencing data. However, we realize that there are other available datasets which can be utilized. ClinVar is an online database for recording variants that have been shown to be clinically pathogenic for a number of different disorders (58). This dataset can be utilized to investigate how often a pathogenic variant falls within a predicted TFBS, which base, and which disease it is known to cause. GWAS investigate genetic associations with diseases and other traits. These genetic associations are often within non-coding regions and could be within predicted TFBSs that lead to increased susceptibility of the disease studied. However, the caveat with this is that many GWAS studies only identify SNPs that are linked to the causative SNP, and are not the causative SNP themselves. Finally, we can utilize datasets that investigate the amount of genetic variation present in a healthy population, such as 1000 Genomes Project and gnomAD (59, 60). This would allow us to verify the hypothesis that if these sites were randomly generated across the human genome, so too would natural genetic variation be randomly distributed across each of the positions of the motif. However, if our dataset represents true potential TFBSs, we might expect to see either less genetic variation within these sites or genetic variation restricted to certain positions. Further analysis can be done with this dataset that are not outlined here. This dataset and subsequent analysis represent a step towards utilizing both computational and experimental data to identify and analyze transcription factor binding sites, specifically in the context of the ER stress response. 85 Methods Motif scan across the human genome We used PWMScan (https://ccg.epfl.ch/pwmtools/pwmscan.php) to identify positions of the four predicted binding sites across the human genome (GRCh37/hg19) (61). Overlapping matches were included. The p-value cut-off used for each motif were: ERSEI: p≥1.6x10-05; ERSEII: p≥1.6x10-05; UPRE: p≥1.2x10-04; and ATF4: p≥1.9x10-05. These p-values allow one ‘wobble’ base for the next most likely base at that position. The input for PWMScan for each motif was a letter-probability matrix (LPM) generated with data from JASPAR (http://jaspar.genereg.net/). LPM matrices listed below: >letter-probability matrix: ERSEI 0.004767 0.965181 0.007047 0.023005 0.006425 0.976788 0.005803 0.010984 0.970570 0.006010 0.003523 0.019896 0.950052 0.019482 0.025699 0.004767 0.007668 0.027358 0.002280 0.962694 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.143906 0.850220 0.000734 0.005140 0.012521 0.966611 0.000000 0.020868 0.997416 0.000000 0.002584 0.000000 0.000000 0.998276 0.000000 0.001724 0.000863 0.000000 0.999137 0.000000 >letter-probability matrix: ERSEII 0.962694 0.002280 0.027358 0.007668 0.004767 0.025699 0.019482 0.950052 0.019896 0.003523 0.006010 0.970570 0.010984 0.005803 0.976788 0.006425 0.023005 0.007047 0.965181 0.004767 86 0.25 0.25 0.25 0.25 0.143906 0.850220 0.000734 0.012521 0.966611 0.000000 0.997416 0.000000 0.002584 0.000000 0.998276 0.000000 0.000863 0.000000 0.999137 0.005140 0.020868 0.000000 0.001724 0.000000 >letter-probability matrix: UPRE 0.473381 0.487458 0.007551 0.031610 0.020875 0.896125 0.027750 0.055250 0.950188 0.002060 0.033450 0.014301 0.001220 0.997153 0.000271 0.001356 0.004555 0.000000 0.994569 0.000876 0.000000 0.001752 0.000876 0.997372 0.049185 0.940202 0.006083 0.004530 0.962366 0.005771 0.020801 0.011062 >letter-probability matrix: ATF4 0.658084 0.208033 0.130793 0.003090 0.000000 0.000000 0.007085 0.992915 0.016964 0.000000 0.973214 0.009821 0.972618 0.000000 0.000000 0.027382 0.002345 0.436108 0.010551 0.550996 0.000902 0.000000 0.995491 0.003607 0.039722 0.832175 0.015889 0.112214 0.973941 0.022801 0.000000 0.003257 0.991071 0.003348 0.001116 0.004464 0.000000 0.255278 0.059501 0.685221 The results of this search resulted in 224,244 motifs (ERSEI: 20,565; ERSEII: 18,410; UPRE: 60,202; ATF4: 125,067). GTEx dataset We used data provided by GTEx (https://gtexportal.org/home/) to investigate tissue-specific genetic variants that impact expression levels. We used GTEx v7 data that was aligned to the hg19 human genome. Within this dataset there was cis- eQTL 87 data for 48 tissues. For each tissue, we overlapped significant cis- eQTLs with our putative TFs across the human genome. We did this using bedtools intersect v2.28.0 (62). Random motif dataset generation To have a comparison for overlap between human datasets and predicted motifs, we created a dataset representing randomly generated regions of the genome. These randomly generated regions represent the same number of predicted motifs (224,244) across the genome. These regions are also the same length and number as their respective motif that they are simulating (ERSEI: length 10, count 20,565; ERSEII: length 10, count 18,410; UPRE: length 10, count 60,202; ATF4: length 8, count 125,067). ERSEI & II are length 10 given that the middle nucleotides (nine for ERSEI; one for ERSEII) have no impact on TF binding and were not included in the overlap comparison of predicted motifs. Only the first 5 and last 5 nucleotides were included in the overlap comparison, thus a length of 10 was used. To generate these random regions across the human genome (GRCh37/hg19), we used the random command of bedtools v2.28.0 (62). Then used the intersect command of bedtools to assess overlap between the randomly generated regions and the dataset of interest. This generation of random regions across the genome and subsequent overlap with a dataset was ran 1000 times to generate a better representation of what overlap would be expected by random chance. 88 Human conservation scores To determine the conservation of a region of DNA, namely the predicted binding site, we utilized PhastCons scores (20, 21). To calculate this, we used the bigWigAverageOverBed tool from Kent utilities (63). To determine the average conservation of each base of each motif, we utilized PhyloP scores (64). To calculate the PhyloP score of each position, we similarly used the bigWigAverageOverBed tool from Kent utilities (63). eQTL analysis on the CEPH database Expression data for cell lines derived from CEPH individuals were obtained from GEO (https://www.ncbi.nlm.nih.gov/geo/) (Accession: GSE36911 & Accession: GSE36910). These datasets were generated from two previous studies that looked at gene expression of these cell lines after the induction of ER stress(7, 31). CEPH DNA samples were whole-genome sequenced at an average depth of 30X using Illumina paired-end technology (65). Variant-call information (VCFs) is available at dbGaP under controlled access, with accession phs001872.v1.p1. With both expression and sequencing data for a subset of individuals within the CEPH database, we ran an eQTL analysis using FastQTL (66). We defined the mapping window as 1 Mb up- and downstream of the TSS of a gene with a p-value cutoff of p<0.001. We then identified which significant cis- eQTLs overlapped putative binding sites by using bedtools intersect as described previously. 89 Figure 3.1 Pipeline for identifying potential transcription factor binding sites in the human genome. 90 Figure 3.2 Letter probability matrices for each of the binding motifs used in our pipeline search. 91 Figure 3.3 Amount of GTEx cis- eQTLs at each position for all four binding motifs. 92 Figure 3.4 PhyloP conservation for each position within each binding motif. The PhyloP score for each position is plotted for ERSEI (A), ERSEII (B), URPE (C), and ATF4 (D). 93 Figure 3.4 Continued. 94 Figure 3.5 The majority of significant cis- eQTLs found are unique to a given condition, either control or TM. 95 Figure 3.6 Significant eQTLs that fall within a predicted TFBS. The Log2Intensity under TM conditions is plotted for each genotype for Uba6 (A), Rabep1 (B), and Psphl (C). 96 Table 3.1 Number of overlaps between predicted TFBSs and GTEx cis- eQTLs 97 Table 3.1 Continued. 98 Table 3.2 Number of overlaps expected between randomly generated regions across the genome and GTEx cis- eQTLs 99 Table 3.2 Continued. 100 References 1. B. Alberts, A. Johnson, J. Lewis, M. Raff, K. Roberts, P. Walter, Molecular Biology of the Cell (Garland Science, ed. 4, 2002). 2. J. H. Lin, P. Walter, T. S. B. Yen, Endoplasmic reticulum stress in disease pathogenesis. Annu. Rev. Pathol. Mech. Dis. 3, 399–425 (2008). 3. D. Ron, P. Walter, Signal integration in the endoplasmic reticulum unfolded protein response. Nat. Rev. Mol. Cell Biol. 8, 519–529 (2007). 4. P. Walter, D. Ron, The unfolded protein response: from stress pathway to homeostatic regulation. Science 334, 1081–1086 (2011). 5. C. Y. Chow, M. F. Wolfner, A. G. Clark, Using natural variation in Drosophila to discover previously unknown endoplasmic reticulum stress genes. Proc. Natl. Acad. Sci. U.S.A. 110, 9013–8 (2013). 6. C. Y. Chow, X. Wang, D. Riccardi, M. F. Wolfner, A. G. Clark, The genetic architecture of the genome-wide transcriptional response to ER stress in the mouse. PLoS Genet. 11, e1004924 (2015). 7. B. A. Dombroski, R. R. Nayak, K. G. Ewens, W. Ankener, V. G. Cheung, R. S. Spielman, Gene expression and genetic variation in response to endoplasmic reticulum stress in human cells. Am. J. Hum. Genet. 86, 719–729 (2010). 8. P. J. Wittkopp, G. Kalay, Cis-regulatory elements: molecular mechanisms and evolutionary processes underlying divergence. Nat. Rev. Genet. 13, 59–69 (2011). 9. V. Boeva, Analysis of genomic sequence motifs for deciphering transcription factor binding and transcriptional regulation in eukaryotic cells. Front. Genet. 7, 24 (2016). 10. A. Martin-Trujillo, N. Patel, F. Richter, B. Jadhav, P. Garg, S. U. Morton, D. M. McKean, S. R. DePalma, E. Goldmuntz, D. Gruber, R. Kim, J. W. Newburger, G. A. P. Jr., A. Giardini, D. Bernstein, M. Tristani-Firouzi, J. G. Seidman, C. E. Seidman, W. K. Chung, B. D. Gelb, A. J. Sharp, Rare genetic variation at transcription factor binding sites modulates local DNA methylation profiles. PLOS Genet. 16, e1009189 (2020). 11. A. D. Johnston, C. A. Simões-Pires, T. V. Thompson, M. Suzuki, J. M. Greally, Functional genetic variants can mediate their regulatory effects through alteration of transcription factor binding. Nat. Commun. 10, 1–16 (2019). 12. H. Yoshida, K. Haze, H. Yanagi, T. Yura, K. Mori, Identification of the cisacting endoplasmic reticulum stress response element responsible for transcriptional induction of mammalian glucose-regulated proteins: involvement 101 of basic leucine zipper transcription factors. J. Biol. Chem. 273, 33741–33749 (1998). 13. K. Kokame, H. Kato, T. Miyata, Identification of ERSE-II, a new cis-acting element responsible for the ATF6-dependent mammalian unfolded protein response. J. Biol. Chem. 276, 9199–9205 (2001). 14. K. Mori, T. Kawahara, H. Yoshida, H. Yanagi, T. Yura, Signalling from endoplasmic reticulum to nucleus: transcription factor with a basic-leucine zipper motif is required for the unfolded protein-response pathway. Genes Cells 1, 803– 817 (1996). 15. K. Mori, A. Sant, K. Kohno, K. Normington, M. J. Gething, J. F. Sambrook, A 22 bp cis-acting element is necessary and sufficient for the induction of the yeast KAR2 (BiP) gene by unfolded proteins. EMBO J. 11, 2583 (1992). 16. F. Siu, P. J. Bain, R. Leblanc-Chaffin, H. Chen, M. S. Kilberg, ATF4 is a mediator of the nutrient-sensing response pathway that activates the human asparagine synthetase gene. J. Biol. Chem. 277, 24120–24127 (2002). 17. P. Baumeister, S. Luo, W. C. Skarnes, G. Sui, E. Seto, Y. Shi, A. S. Lee, Endoplasmic reticulum stress induction of the Grp78/BiP promoter: activating mechanisms mediated by YY1 and its interactive chromatin modifiers. Mol. Cell. Biol. 25, 4529 (2005). 18. F.M. Wang, H.J. Ouyang, Regulation of unfolded protein response modulator XBP1s by acetylation and deacetylation. Biochem. J. 433, 245 (2011). 19. Y. Ma, J. W. Brewer, J. Alan Diehl, L. M. Hendershot, Two distinct stress signaling pathways converge upon the CHOP promoter during the mammalian unfolded protein response. J. Mol. Biol. 318, 1351–1365 (2002). 20. J. Felsenstein, G. A. Churchill, A Hidden Markov Model approach to variation among sites in rate of evolution. Mol. Biol. Evol. 13, 93–104 (1996). 21. A. Siepel, G. Bejerano, J. S. Pedersen, A. S. Hinrichs, M. Hou, K. Rosenbloom, H. Clawson, J. Spieth, L. W. Hillier, S. Richards, G. M. Weinstock, R. K. Wilson, R. A. Gibbs, W. J. Kent, W. Miller, D. Haussler, Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034 (2005). 22. GTEx Consortium, Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017). 23. International Human Genome Sequencing Consortium, Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001). 24. A. Hodgkinson, A. Eyre-Walker, Variation in the mutation rate across 102 mammalian genomes. Nat. Rev. Genet. 12, 756–766 (2011). 25. K. J. Fryxell, W.J. Moon, CpG mutation rates in the human genome are highly dependent on local GC content. Mol. Biol. Evol. 22, 650–658 (2005). 26. D. G. Hwang, P. Green, Bayesian Markov chain Monte Carlo sequence analysis reveals varying neutral substitution patterns in mammalian evolution. Proc. Natl. Acad. Sci. U.S.A. 101, 13994–14001 (2004). 27. C. Coulondre, J. H. Miller, P. J. Farabaugh, W. Gilbert, Molecular basis of base substitution hotspots in Escherichia coli. Nature 274, 775–780 (1978). 28. A. P. Bird, CpG-rich islands and the function of DNA methylation. Nature 321, 209–213 (1986). 29. S. Saxonov, P. Berg, D. L. Brutlag, A genome-wide analysis of CpG dinucleotides in the human genome distinguishes two distinct classes of promoters. Proc. Natl. Acad. Sci. U.S.A. 103, 1412–1417 (2006). 30. J. Dausset, H. Cann, D. Cohen, M. Lathrop, J. M. Lalouel, R. White, Centre d’Etude du polymorphisme humain (CEPH): collaborative genetic mapping of the human genome. Genomics. 6, 575–577 (1990). 31. R. R. Nayak, W. E. Bernal, J. W. Lee, M. J. Kearns, V. G. Cheung, Stressinduced changes in gene interactions in human cells. Nucleic Acids Res. 42, 1757–1771 (2014). 32. X. Liu, L. Sun, D. B. Gursel, C. Cheng, S. Huang, A. W. Rademaker, S. A. Khan, J. Yin, H. Kiyokawa, The non-canonical ubiquitin activating enzyme UBA6 suppresses epithelial-mesenchymal transition of mammary epithelial cells. Oncotarget 8, 87480 (2017). 33. B. A. Schulman, J. W. Harper, Ubiquitin-like protein activation by E1 enzymes: the apex for downstream signalling pathways. Nat. Rev. Mol. Cell Biol. 10, 319 (2009). 34. K. N. Swatek, D. Komander, Ubiquitin modifications. Cell Res. 2016 264. 26, 399–422 (2016). 35. H. Stenmark, G. Vitale, O. Ullrich, M. Zerial, Rabaptin-5 is a direct effector of the small GTPase Rab5 in endocytic membrane fusion. Cell 83, 423–432 (1995). 36. A. A. Rowland, P. J. Chitwood, M. J. Phillips, G. K. Voeltz, ER contact sites define the position and timing of endosome fission. Cell 159, 1027–1041 (2014). 37. D. N. Martin, B. J. Boersma, M. Yi, M. Reimers, T. M. Howe, H. G. Yfantis, Y. C. Tsai, E. H. Williams, D. H. Lee, R. M. Stephens, A. M. Weissman, S. Ambs, Differences in the tumor microenvironment between African-American and 103 European-American breast cancer patients. PLoS One. 4, e4531 (2009). 38. T. A. Wallace, R. L. Prueitt, M. Yi, T. M. Howe, J. W. Gillespie, H. G. Yfantis, R. M. Stephens, N. E. Caporaso, C. A. Loffredo, S. Ambs, Tumor Immunobiological differences in prostate cancer between African-American and European-American Men. Cancer Res. 68, 927–936 (2008). 39. P. Wei, L. Milbauer, J. Enenstein, J. Nguyen, W. Pan, R. Hebbel, Differential endothelial cell gene expression by African Americans versus Caucasian Americans: a possible contribution to health disparity in vascular disease and cancer. BMC Med. 2011 91. 9, 1–18 (2011). 40. J. E. Allard, G. V. R. Chandramouli, K. Stagliano, B. L. Hood, T. Litzi, Y. Shoji, J. Boyd, A. Berchuck, T. P. Conrads, G. L. Maxwell, J. I. Risinger, Analysis of PSPHL as a candidate gene influencing the racial disparity in endometrial cancer. Front. Oncol. 2, 65 (2012). 41. R. K. Yadav, S.-W. Chae, H.R. Kim, H. J. Chae, Endoplasmic reticulum stress and cancer. J. Cancer Prev. 19, 75 (2014). 42. P. J. Farnham, Insights from genomic profiling of transcription factors. Nat. Rev. Genet. 10, 605–616 (2009). 43. P. J. Park, ChIP–seq: advantages and challenges of a maturing technology. Nat. Rev. Genet. 10, 669–680 (2009). 44. T. S. Furey, ChIP–seq and beyond: new and improved methodologies to detect and characterize protein–DNA interactions. Nat. Rev. Genet. 2012 1312. 13, 840–852 (2012). 45. L. Arbiza, I. Gronau, B. A. Aksoy, M. J. Hubisz, B. Gulko, A. Keinan, A. Siepel, Genome-wide inference of natural selection on human transcription factor binding sites. Nat. Genet. 45, 723–729 (2013). 46. C. C. Tseng, M. C. Wong, W. T. Liao, C. J. Chen, S. C. Lee, J. H. Yen, S. J. Chang, Genetic variants in transcription factor binding sites in humans: triggered by natural selection and triggers of diseases. Int. J. Mol. Sci. 22, 4187 (2021). 47. C. D. Meiklejohn, J. D. Coolon, D. L. Hartl, P. J. Wittkopp, The roles of cis- and trans-regulation in the evolution of regulatory incompatibilities and sexually dimorphic gene expression. Genome Res. 24, 84–95 (2014). 48. P. J. Wittkopp, B. K. Haerum, A. G. Clark, Regulatory changes underlying expression differences within and between Drosophila species. Nat. Genet. 40, 346–350 (2008). 49. C. McKinney, L. K. Stamp, N. Dalbeth, R. K. Topless, R. O. Day, D. R. Kannangara, K. M. Williams, M. Janssen, T. L. Jansen, L. A. Joosten, T. R. 104 Radstake, P. L. Riches, A.K. Tausche, F. Lioté, A. So, T. R. Merriman, Multiplicative interaction of functional inflammasome genetic variants in determining the risk of gout. Arthritis Res. Ther. 17, 288 (2015). 50. M. Claussnitzer, S. N. Dankel, K.H. Kim, G. Quon, W. Meuleman, C. Haugen, V. Glunk, I. S. Sousa, J. L. Beaudry, V. Puviindran, N. A. Abdennur, J. Liu, P.A. Svensson, Y.H. Hsu, D. J. Drucker, G. Mellgren, C.C. Hui, H. Hauner, M. Kellis, FTO obesity variant circuitry and adipocyte browning in humans. N. Engl. J. Med. 373, 895 (2015). 51. J. W. Prokop, N. C. Yeo, C. Ottmann, S. B. Chhetri, K. L. Florus, E. J. Ross, N. Sosonkina, B. A. Link, B. I. Freedman, C. J. Coppola, C. McDermott-Roe, S. Leysen, L.G. Milroy, F. A. Meijer, A. M. Geurts, F. J. Rauscher, III, R. Ramaker, M. J. Flister, H. J. Jacob, E. M. Mendenhall, J. Lazar, Characterization of coding/noncoding variants for SHROOM3 in patients with CKD. J. Am. Soc. Nephrol. 29, 1525 (2018). 52. M. P. Fogarty, M. E. Cannon, S. Vadlamudi, K. J. Gaulton, K. L. Mohlke, Identification of a regulatory variant that binds FOXA1 and FOXA2 at the CDC123/CAMK1D type 2 diabetes GWAS locus. PLoS Genet. 10, 1004633 (2014). 53. G. Cui, M. Tian, S. Hu, Y. Wang, D. W. Wang, Identifying functional noncoding variants in APOA5/A4/C3/A1 gene cluster associated with coronary heart disease. J. Mol. Cell. Cardiol. 144, 54–62 (2020). 54. M. E. Reschen, K. J. Gaulton, D. Lin, E. J. Soilleux, A. J. Morris, S. S. Smyth, C. A. O’Callaghan, Lipid-induced epigenomic changes in human macrophages identify a coronary artery disease-associated variant that regulates PPAP2B expression through altered C/EBP-beta binding. PLoS Genet. 11 (2015). 55. Q. Huang, T. Whitington, P. Gao, J. F. Lindberg, Y. Yang, J. Sun, M. R. Väisänen, R. Szulkin, M. Annala, J. Yan, L. A. Egevad, K. Zhang, R. Lin, A. Jolma, M. Nykter, A. Manninen, F. Wiklund, M. H. Vaarala, T. Visakorpi, J. Xu, J. Taipale, G. H. Wei, A prostate cancer susceptibility allele at 6q22 increases RFX6 expression by modulating HOXB13 chromatin binding. Nat. Genet. 46, 126–135 (2014). 56. J. Chang, Y. Zhou, X. Hu, L. Lam, C. Henry, E. M. Green, R. Kita, M. S. Kobor, H. B. Fraser, The molecular mechanism of a cis-regulatory adaptation in yeast. PLOS Genet. 9, e1003813 (2013). 57. F. W. Albert, L. Kruglyak, The role of regulatory variation in complex traits and disease. Nat. Rev. Genet. 16, 197–212 (2015). 58. M. J. Landrum, J. M. Lee, M. Benson, G. R. Brown, C. Chao, S. Chitipiralla, B. Gu, J. Hart, D. Hoffman, W. Jang, K. Karapetyan, K. Katz, C. Liu, Z. Maddipatla, A. Malheiro, K. McDaniel, M. Ovetsky, G. Riley, G. Zhou, J. B. 105 Holmes, B. L. Kattman, D. R. Maglott, ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 46, D1062–D1067 (2018). 59. K. J. Karczewski, L. C. Francioli, G. Tiao, B. B. Cummings, J. Alföldi, Q. Wang, R. L. Collins, K. M. Laricchia, A. Ganna, D. P. Birnbaum, L. D. Gauthier, H. Brand, M. Solomonson, N. A. Watts, D. Rhodes, M. Singer-Berk, E. M. England, E. G. Seaby, J. A. Kosmicki, R. K. Walters, K. Tashman, Y. Farjoun, E. Banks, T. Poterba, A. Wang, C. Seed, N. Whiffin, J. X. Chong, K. E. Samocha, E. Pierce-Hoffman, Z. Zappala, A. H. O’Donnell-Luria, E. V. Minikel, B. Weisburd, M. Lek, J. S. Ware, C. Vittal, I. M. Armean, L. Bergelson, K. Cibulskis, K. M. Connolly, M. Covarrubias, S. Donnelly, S. Ferriera, S. Gabriel, J. Gentry, N. Gupta, T. Jeandet, D. Kaplan, C. Llanwarne, R. Munshi, S. Novod, N. Petrillo, D. Roazen, V. Ruano-Rubio, A. Saltzman, M. Schleicher, J. Soto, K. Tibbetts, C. Tolonen, G. Wade, M. E. Talkowski, B. M. Neale, M. J. Daly, D. G. MacArthur, The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020). 60. L. Clarke, S. Fairley, X. Zheng-Bradley, I. Streeter, E. Perry, E. Lowy, A.M. Tassé, P. Flicek, The international genome sample resource (IGSR): a worldwide collection of genome variation incorporating the 1000 Genomes Project data. Nucleic Acids Res. 45, D854–D859 (2017). 61. G. Ambrosini, R. Groux, P. Bucher, PWMScan: a fast tool for scanning entire genomes with a position-specific weight matrix. Bioinformatics 34, 2483–2484 (2018). 62. A. R. Quinlan, I. M. Hall, BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 26, 841–842 (2010). 63. W. J. Kent, A. S. Zweig, G. Barber, A. S. Hinrichs, D. Karolchik, BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics 26, 2204–2207 (2010). 64. K. S. Pollard, M. J. Hubisz, K. R. Rosenbloom, A. Siepel, Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res. 20, 110– 121 (2010). 65. T. A. Sasani, B. S. Pedersen, Z. Gao, L. Baird, M. Przeworski, L. B. Jorde, A. R. Quinlan, Large, three-generation human families reveal post-zygotic mosaicism and variability in germline mutation accumulation. Elife. 8, e46922 (2019). 66. H. Ongen, A. Buil, A. A. Brown, E. T. Dermitzakis, O. Delaneau, Fast and efficient QTL mapper for thousands of molecular phenotypes. Bioinformatics 32, 1479–1485 (2016). CHAPTER 4 CONCLUSION The ER stress response has long been studied for its role in disease pathogenesis (1). This is due to its pervasive role in a wide variety of human diseases such as diabetes (2–4), viral infections (5), multiple types of neurodegeneration (6–8), and multiple types of cancer (9–11). Genes that modify the ER stress response could therefore be modifiers of disease severity and potentially be used as therapeutic targets. One method in which we can uncover these modifier genes is by utilizing natural genetic variation. Natural genetic variation is comprised of genetic variants that have arisen in a population, many of which affect gene expression levels, and these changes are tolerated in a healthy organism. Natural genetic variation can also help us to identify the regulatory effects that lead to these variable expression patterns. Instead of experimentally mutating a gene and the region surrounding it to increase or decrease its expression levels, we can simply identify the regulatory effects already present in the population that controls the variable levels of the gene. Given the conservation of the ER stress response, we can even utilize natural genetic variation in multiple different populations, such as Drosophila, mouse, and human. The results of Chapter 2 identify pathways and genes that exhibit transcriptional variability dependent on tissue and genetic background. This represents genes and 107 pathways that can tolerate variability in terms of their response to ER stress and still results in a viable organism. This is especially important given that it has been shown that variability in vital ER stress components can lead to severe disease and death. For example, both missense and nonsense mutations in PERK, a vital component of the UPR, causes Wolcott-Rallison syndrome. This disease causes infancy-onset diabetes, mental retardation, developmental delay, retinal dysfunction, and abnormal bone growth (2). Most patients do not survive to adulthood. This supports the hypothesis that it is not one or two mutations in canonical essential ER stress genes, but multiple variants of small effect size that in combination contribute to variable susceptibility in the human population. These small effect genes could be good therapeutic targets for diseases with a large ER stress component. In addition to identifying the genes that are variable in response to ER stress, chapter 2 of this dissertation also identifies the cis- and trans- mechanisms that underlie this variability. While previous studies have investigated the cis- and trans- mechanisms underlying the variable ER stress response, this study is the first to do so in an in vivo mouse model (12). This provides the ability to study this in a more relevant, physiological model as well as investigate the effects of tissue-type on these patterns. Tissue-type has a large effect on gene expression and the genetic variants underlying gene expression patterns can be highly tissue specific, as demonstrated by both large databases such as GTEx and chapter 2 of this dissertation (13). We found there to be striking differences in terms of the strength of these regulatory effects dependent on both tissue-type and whether it acts in cis- or trans-. An essential next step will be to include more tissue-types and more genetic backgrounds to gain a more comprehensive view of 108 the regulatory mechanisms governing the ER stress response. The results of Chapter 3 demonstrate a computational approach to identifying binding sites of the three main UPR transcription factors. This provides an un-biased approach that aims to uncover all potential binding sites in the human genome and the genes it may regulate. To determine binding sites that are more likely to be true binding sites while not introducing any bias, we filtered on genomic position relative to a proteincoding gene and conservation. By using these filters, we greatly narrowed down the candidate list of TFBSs. This represents a good starting point towards functionally validating the binding sites themselves and the role of the gene it regulates in the ER stress response. Further work done in Chapter 3 of this dissertation utilized the GTEx dataset to identify which of our putative TFBSs overlapped with experimentally derived eQTLs. This provided us with a list of eQTLs from each tissue that could be regulating gene expression by impacting TF binding dynamics. With this dataset, we were able to identify positions within each binding motif that harbored more eQTLs than any other position. These mutational hotspots were CpG dinucleotides. We hypothesize that these CpG dinucleotides serve as a mechanism by which mutations can easily be introduced into each binding motif which can subsequently alter TF binding and gene expression. This mechanism would be extremely useful in adapting to different environments during evolution. In the human population, this could also serve to contribute to variable disease severity and susceptibility. Chapter 3 also represents the first eQTL analysis done on human samples experiencing high levels of ER stress through TM-induction. eQTL analysis is a great 109 first step towards identifying the genetic components that control expression of ER stress response genes, specifically seen only under ER stress conditions. We found that the majority of eQTLs were unique to a given condition, control or ER stress. Additionally, we found many of these fell within putative TF binding sites. This provides us with a working hypothesis as to how the eQTL SNPs exert their influence on gene expression levels. This dissertation represents a utilization of multiple different tools, such as natural genetic variation, mouse models, RNA-sequencing, computational pipelines, and online databases, to investigate the genetics of the variable ER stress response. This dissertation also lays a strong foundation towards identifying the genes responsible. Further functional validation of the genes nominated in this study will begin to increase our knowledge of the genes and modifiers of the UPR. 110 References 1. J. H. Lin, P. Walter, T. S. B. Yen, Endoplasmic reticulum stress in disease pathogenesis. Annu. Rev. Pathol. Mech. Dis. 3, 399–425 (2008). 2. C. Julier, M. Nicolino, Wolcott-Rallison syndrome. Orphanet J. Rare Dis. 5, 29 (2010). 3. S. Oyadomari, A. Koizumi, K. Takeda, T. Gotoh, S. Akira, E. Araki, M. Mori, Targeted disruption of the Chop gene delays endoplasmic reticulum stressmediated diabetes. J. Clin. Invest. 109, 525–32 (2002). 4. D. R. Laybutt, A. M. Preston, M. C. Åkerfeldt, J. G. Kench, A. K. Busch, A. V. Biankin, T. J. Biden, Endoplasmic reticulum stress contributes to beta cell apoptosis in type 2 diabetes. Diabetologia. 50, 752–763 (2007). 5. D. Baltzis, L.K. Qu, S. Papadopoulou, J. D. Blais, J. C. Bell, N. Sonenberg, A. E. Koromilas, Resistance to vesicular stomatitis virus infection requires a functional cross talk between the eukaryotic translation initiation factor 2α kinases PERK and PKR. J. Virol. 78, 12747–12761 (2004). 6. E. J. Ryu, H. P. Harding, J. M. Angelastro, O. V. Vitolo, D. Ron, L. A. Greene, Endoplasmic reticulum stress and the unfolded protein response in cellular models of Parkinson’s disease. J. Neurosci. 22, 10690–10698 (2002). 7. J. J. M. Hoozemans, R. Veerhuis, E. S. Van Haastert, J. M. Rozemuller, F. Baas, P. Eikelenboom, W. Scheper, The unfolded protein response is activated in Alzheimer’s disease. Acta Neuropathol. 110, 165–172 (2005). 8. H. D. Ryoo, P. M. Domingos, M. J. Kang, H. Steller, Unfolded protein response in a Drosophila model for retinal degeneration. EMBO J. 26, 242–252 (2007). 9. O. Pluquet, L.K. Qu, D. Baltzis, A. E. Koromilas, Endoplasmic reticulum stress accelerates p53 degradation by the cooperative actions of Hdm2 and glycogen synthase kinase 3β. Mol. Cell. Biol. 25, 9392–9405 (2005). 10. A. Forus, V. A. Flørenes, G. M. Maelandsmo, Ø. Fodstad, O. Myklebost, The protooncogene CHOP/GADD153, involved in growth arrest and DNA damage response, is amplified in a subset of human sarcomas. Cancer Genet. Cytogenet. 78, 165–171 (1994). 11. M. Bi, C. Naczki, M. Koritzinsky, D. Fels, J. Blais, N. Hu, H. Harding, I. Novoa, M. Varia, J. Raleigh, D. Scheuner, R. J. Kaufman, J. Bell, D. Ron, B. G. Wouters, C. Koumenis, ER stress-regulated translation increases tolerance to extreme hypoxia and promotes tumor growth. EMBO J. 24, 3470–3481 (2005). 12. C. Y. Chow, X. Wang, D. Riccardi, M. F. Wolfner, A. G. Clark, The genetic architecture of the genome-wide transcriptional response to ER stress in the 111 mouse. PLoS Genet. 11, e1004924 (2015). 13. GTEx Consortium, Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017). |
| Reference URL | https://collections.lib.utah.edu/ark:/87278/s6hn0tvd |



