| Title | Data-driven methods in earthquake monitoring, detection, and catalog building |
| Publication Type | dissertation |
| School or College | College of Mines & Earth Sciences |
| Department | Geology & Geophysics |
| Author | Linville, Lisa Mae |
| Date | 2018 |
| Description | Seismic catalogs are one of the most important products of seismic network operations, allowing quantitative assessment of event statistics for characterizing stress state, stress transfer, and stress release in the crust. As seismic networks change in scope, and the problems addressed through the use of catalogs expand, new strategies for event detection and catalog building are needed. Here we present new methods for detecting and locating small earthquakes from seismic networks with large station spacing (50-70 km). We use our newly developed method to demonstrate some of the limitations in recovering complete faulting histories with cataloged earthquakes. Our work suggests that small earthquakes, even when they have waveforms similar to those of larger earthquakes, contain valuable information for interpreting local fault structures. We also develop automated strategies for source discrimination. Long-term earthquake catalogs are manually curated to discriminate tectonic events from other event types. The separation of events by source type is important because different sources reflect different physical processes, some of which are anthropogenically driven. We demonstrate that various deep learning architectures are able to replicate analyst decisions (with above 99% accuracy) and can identify analyst errors in existing catalogs. |
| Type | Text |
| Publisher | University of Utah |
| Subject | Geophysics |
| Dissertation Name | Doctor of Philosophy |
| Language | eng |
| Rights Management | © Lisa Mae Linville |
| Format | application/pdf |
| Format Medium | application/pdf |
| ARK | ark:/87278/s6fr4zqn |
| Setname | ir_etd |
| ID | 1525829 |
| OCR Text | DATA-DRIVEN METHODS IN EARTHQUAKE MONITORING, DETECTION, AND CATALOG BUILDING
by Lisa Mae Linville
A dissertation submitted to the faculty of The University of Utah in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Geophysics. Department of Geology and Geophysics, The University of Utah, May 2018. Copyright © Lisa Mae Linville 2018. All Rights Reserved.

The University of Utah Graduate School
STATEMENT OF DISSERTATION APPROVAL
The dissertation of Lisa Mae Linville has been approved by the following supervisory committee members: Kristine L. Pankow, Chair (date approved 12/18/2017); Keith D. Koper, Member (12/18/2017); Jeffrey R. Moore, Member (12/18/2017); Fan-Chi Lin, Member (12/18/2017); and Debi L. Kilb, Member (12/18/2017); and by Thure Cerling, Chair of the Department of Geology and Geophysics, and by David B. Kieda, Dean of The Graduate School.

ABSTRACT
Seismic catalogs are one of the most important products of seismic network operations, allowing quantitative assessment of event statistics for characterizing stress state, stress transfer, and stress release in the crust. As seismic networks change in scope, and the problems addressed through the use of catalogs expand, new strategies for event detection and catalog building are needed. Here we present new methods for detecting and locating small earthquakes from seismic networks with large station spacing (50-70 km). We use our newly developed method to demonstrate some of the limitations in recovering complete faulting histories with cataloged earthquakes. Our work suggests that small earthquakes, even when they have waveforms similar to those of larger earthquakes, contain valuable information for interpreting local fault structures. We also develop automated strategies for source discrimination. Long-term earthquake catalogs are manually curated to discriminate tectonic events from other event types. The separation of events by source type is important because different sources reflect different physical processes, some of which are anthropogenically driven. We demonstrate that various deep learning architectures are able to replicate analyst decisions (with above 99% accuracy) and can identify analyst errors in existing catalogs.

This work is dedicated to my strong and beautiful aunt, Amy.

TABLE OF CONTENTS
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGEMENTS
Chapters
1 INTRODUCTION
1.1 Overview
1.2 References
2 CONTOUR-BASED EARTHQUAKE DETECTION
2.1 Abstract
2.2 Introduction
2.3 Data
2.3.1 Transportable Array Data
2.3.2 Yellowstone Nodal Data
2.3.3 Seismic Catalog Data
2.4 Method
2.4.1 Block 1
2.4.2 Block 2
2.4.3 Block 3
2.4.4 Subroutines: Magnitudes and Linking Detections
2.4.5 Manual Review
2.5 Results
2.5.1 Denver-Julesburg Basin
2.5.2 Permian Basin
2.5.3 Williston Basin
2.5.4 Yellowstone National Park, Geyser Basin
2.6 Discussion
2.7 Conclusions
2.8 Additional Resources and Acknowledgements
2.9 References
3 DETECTION METHOD BIAS IN INDUCED SEISMICITY STUDIES
3.1 Abstract
3.2 Introduction
3.3 Methods and Data
3.4 Results
3.4.1 Denver-Julesburg Basin
3.4.2 Williston Basin
3.4.3 Permian Basin
3.5 Discussion
3.6 Conclusions
3.7 Acknowledgements
3.8 References
4 LEVERAGING LONG-TERM SEISMIC CATALOGS FOR AUTOMATED EVENT CLASSIFICATION
4.1 Abstract
4.2 Introduction
4.3 Data
4.4 Method
4.5 Results
4.6 Discussion
4.7 Conclusions
4.8 Acknowledgements
4.9 References

LIST OF FIGURES
2.1 Map of sedimentary basins in the central United States
2.2 Example frame from contour algorithm on TA
2.3 Enhanced catalog for the Denver-Julesburg basin
2.4 Event densities between Denver-Julesburg catalog and ComCat
2.5 Day and night detections from Yellowstone National Park
2.6 Station mesh and location centroid for incipient source
3.1 Map of event in the Denver-Julesburg basin
3.2 Event families for the Hugo sequence
3.3 Event histograms for the enhanced Hugo catalog
3.4 Permian basin event catalog and subspaces for Dagger Draw and Snyder
3.5 Hugo event space
4.1 Map of station locations and travel paths for events used
4.2 Permutations of 3-channel spectrograms for model input
4.3 Example LSTM cell
4.4 Accuracy vs. station count
4.5 Spatial density of model uncertainty
4.6 Bulk event characteristics
4.7 Eastern U.S. event classification
4.8 Violin plots of misclassified distributions for time, magnitude, and distance showing that no single characteristic is responsible for a majority of model errors
LIST OF TABLES
2.1 Parameter overview
4.1 Model accuracy

ACKNOWLEDGEMENTS
This work covers a diverse range of topics and has been greatly informed and improved by collaboration with a variety of individuals. Chapters 2-3 were a collaboration with Debi Kilb from the Scripps Institution of Oceanography and Justin Rubinstein from the United States Geological Survey. Chapter 2 also benefitted from early reviews by Keith Koper and Jamie Farrell from the University of Utah, and from early discussions with Chris Hayward from Southern Methodist University. Chapter 4 was a collaboration with Timothy Draelos and Christopher Young from Sandia National Laboratories. A number of individuals from the University of Utah Seismograph Stations also helped inform numerous areas of the work presented here, including James Pechmann, Relu Burlacu, Mark Hale, Paul Robinson, and Katherine Whidden. I would like to especially acknowledge the direction of my advisor, Kristine Pankow, whose insightful guidance made this work possible.

CHAPTER 1
INTRODUCTION

1.1. Overview
Researchers are responding to the rapid evolution of sensor technology, computational resources, and storage capacity with new algorithmic advances that are fundamentally changing the field of observational seismology. Likewise, event detection and analysis is evolving to accommodate an expanding dictionary of source types, behaviors, and data types. Although studies in seismology are just beginning to approach data volumes on the scale that other fields, such as atmospheric modeling or speech recognition, have long dealt with, examples that presage the transition toward methods reliant on the statistical power of large data are increasingly abundant. For example, seismicity linked to the oil and gas industry is driven by a complex coupling of both natural and anthropogenic factors. Local-scale investigation can elucidate which parameters may be the most influential for single operations, but it required a continent-scale database to resolve which of the human-controlled parameters most heavily influenced the occurrence of seismicity related to oil and gas in the U.S. (Weingarten et al., 2015). Similarly, earthquake triggering caused by transient stressing from distant sources is difficult to prove because event increases are often hard to statistically contextualize within highly variable ambient event rates. This is why researchers have turned to alternate approaches to quantify event rate increases that do not rely directly on event catalogs (van der Elst et al., 2013; Velasco et al., 2008).

Here, we focus on several basic issues related to increasing data size and type that are inherent to seismic catalog building. The first subject we approach in Chapter 2 is the detection and location of finite sources with large station spacing on moderate- to large-scale datasets. Traditional event processing, while good at source generalization, struggles to capture abnormal events (with emergent arrivals, for example) and often fails completely for nonearthquake sources (tremor or landslides, for example). An additional limitation of traditional processing is that as thresholds are lowered to capture smaller events, noise signals increasingly dominate results.
When datasets become large, or sensors are located in high surface-noise environments, manual review of potential detections can quickly become intractable. In recognition of these drawbacks, we abandon the notion of seismic event detection at the sensor level, moving instead to network-level processing. We reframe seismic sources as events that are expressed over certain crustal regions, and detections occur by thresholding the areal extent of affected regions. This allows better generalization across source types, while suppressing surface noise sources and maintaining low false detection rates. We explore the portability and agility of our detection framework using data outside the domain for which it was initially developed, and across nontraditional signal types.

Beyond algorithm development, we assess the impact different methods can have on conclusions. Chapter 3 is dedicated to highlighting the information differential between event catalogs of varying resolution. Many of the events and sequences we discuss occur in areas of active oil and gas exploration, where observations in the past have been rare because of poor station coverage. In these areas, the identification of new source zones increases our understanding of the potential for induced events, the associated changes in seismic hazard, and the limitations of using only larger sources to recover complete faulting histories for sequences related to migrating fluids.

In Chapter 4, we shift from catalog building to catalog curation, or the labeling of events within a seismic catalog. We follow the same network-centric approach as in the previous chapters, using network-scale event data to build source-path-agnostic models capable of binary event classification over large areas. As in the previous chapters, we leverage advances in computational methods and algorithms to extract insight from seismic data, but here we repurpose machine learning approaches developed in the domains of speech and image recognition. This chapter covers some of the constraints of model building and data scale and provides a model for automated event classification in real time for future events in Utah. Source discrimination at the local scale is also an expanding area of research for the nuclear monitoring community. Our approach to discrimination for network data may provide valuable new strategies useful to regulatory agencies. Overall, it is the vulnerability of our built environment to seismic hazards, some of which we have come to recognize are of our own making, that drives much of this work.

1.2. References
van der Elst, N. J., Savage, H. M., Keranen, K. M., & Abers, G. A. (2013). Enhanced remote earthquake triggering at fluid-injection sites in the Midwestern United States. Science, 341(6142), 164-167.
Velasco, A. A., Hernandez, S., Parsons, T., & Pankow, K. (2008). Global ubiquity of dynamic earthquake triggering. Nature Geoscience, 1(6), 375-379.
Weingarten, M., Ge, S., Godt, J. W., Bekins, B. A., & Rubinstein, J. L. (2015). High-rate injection is associated with the increase in US mid-continent seismicity. Science, 348(6241), 1336-1340.

CHAPTER 2
CONTOUR-BASED FREQUENCY-DOMAIN EARTHQUAKE DETECTION USING TRANSPORTABLE ARRAY DATA

2.1. Abstract
We develop a frequency-domain, array-based detection algorithm that exploits the gridded station configuration (interstation spacing of 70 km) of the EarthScope Transportable Array to detect and locate small-magnitude (M < 2.5) earthquakes.
We first apply it to three sedimentary basins in the Central United States and find a total of 484 new events, with low false detection rates compared to standard time-domain processing. The method is able to determine a location for small earthquakes (with accuracy linked to station spacing) where phase picks are difficult to obtain on more than one station. Our results identify new seismicity in all sedimentary basins investigated, but with disproportionate increases in seismicity between areas (Williston Basin: 50%, Permian: 300%, Denver: 1000%). A majority of the newly detected seismicity in the Permian and Denver-Julesburg basins may be linked to active wells, while there continues to be a lack of evidence for induced seismicity in the Williston Basin. We also apply this method to a dataset from Yellowstone National Park with average interstation distances of ~80 m. Results from the Yellowstone data demonstrate that when array station spacing remains regular, events from nonearthquake sources, such as hydrothermal features, can be successfully detected and located without the need to tune parameters specifically for these sources. The capability to generalize across source types makes this algorithm potentially useful for signal exploration when signal characteristics are unknown.

2.2. Introduction
Long-term seismic monitoring networks typically deploy seismic stations primarily in locations with known seismic hazards. By comparison, EarthScope's Transportable Array (TA) stations are deployed in a regular, uniform geometry independent of the location of previously known seismicity (Astiz et al., 2014). The unbiased station grid of the TA therefore offers a unique opportunity to recover events from large crustal regions without the spatial aliasing inherent to regional seismic catalogs. It is currently unclear how significantly spatial aliasing may distort observations of seismicity distributions in a region. However, it has become clear in the last several years that anthropogenic activity can induce earthquakes in locations where seismic hazard, and consequently station coverage, have historically been low (Ellsworth, 2013). Earthquakes in regions of active oil and gas exploitation that occur in locations not previously known to generate seismicity are vital for interpreting the relationship between anthropogenic activity and subsurface response in these areas. Although earthquakes triggered by anthropogenic activity occur across a broad range of magnitudes, the global HiQuake database of anthropogenically induced events suggests that small events may be underreported in the current literature by up to 90% for magnitudes below M3 (Wilson et al., 2017). The increasing attenuation of seismic signals with distance can make it difficult for small earthquakes to be recognized, because the detection and location of seismic events typically requires high station densities and amplitude excursions large enough and long enough to trigger detectors at multiple stations. With open questions regarding how many small earthquakes go undetected, and with variability in the spatially mapped patterns of event distributions, using the TA to enhance seismicity catalogs can be both valuable and difficult. Previous research has shown that using several TA stations to enhance seismicity catalogs results in the addition of new and typically smaller events (Frohlich et al., 2011; Frohlich et al., 2015). Nakai et al.
(2017) demonstrate that, using standard network processing tools and TA data, catalog enhancement can occur on scales that span several U.S. states. In this work, we approach earthquake catalog building through an array-based, frequency-domain detection algorithm developed specifically to exploit TA station grids to detect small-magnitude seismicity while maintaining tractable false detection rates. We apply the algorithm to continuous TA data from the Williston, Denver-Julesburg, and Permian Basins in the Central United States (CUS) and find new events in each basin. In recognition of the rapidly expanding dictionary of known seismic source types, and the increasing need for versatile algorithms to solve research problems on large datasets, we explore the agility and performance of our algorithm on a secondary dataset from Yellowstone National Park (YNP). The main signals of interest from this small-aperture, large-N seismic deployment are related to hydrothermal circulation in the Upper Geyser Basin within YNP. The detection and location of these sources demonstrates an ability to source generalize between earthquakes and tremor-like or emergent signals, which can be valuable when a dataset contains both. Our goal is to present the reader with a clear understanding of applications where our method may be useful, as well as outlining its limitations. We also present new seismicity catalogs for three regions in the CUS near areas with active injection wells, and focus discussion on the differences between new and existing catalogs.

2.3. Data
2.3.1. Transportable Array Data
We use continuous, vertical-channel, 40 Hz data recorded by the TA and retrieved from the Data Management Center of the Incorporated Research Institutions for Seismology (IRIS DMC). For each of the three basins (Figure 2.1), we use approximately 18 months of continuous waveform data for analysis (2009-2011).

2.3.2. Yellowstone Nodal Data
In November of 2015, a large-N geophone array was deployed to image the hydrothermal system in the Upper Geyser Basin of Yellowstone National Park, WY. Researchers reported the presence of several dominant, likely hydrothermally driven signals. The first is a hydrothermal tremor signal related to the Old Faithful geyser system (Wu et al., 2017). The second is likely related to Doublet Pool on Geyser Hill, based on back projection, temporally coincident surface agitation, and audible emissions (Farrell et al., 2016). We use 24 hours of data from 133 stations to evaluate the agility of our algorithm with this new dataset and signal type.

2.3.3. Seismic Catalog Data
The seismic event catalog produced by the Array Network Facility (ANF) is the highest-resolution catalog available for our study areas (i.e., it has the lowest magnitude of completeness, ~M2.5; Astiz et al., 2014). The authors of the ANF catalog acknowledge the presence of, and do not exclude, quarry blasts and mining events. The inclusion of quarry and mine blasts is advantageous for comparison because our prefiltered catalog also contains these source types. For catalogs of longer duration, we utilize the Comprehensive Catalog (ComCat), a product of the Advanced National Seismic System (ANSS), which was accessed through the web interface provided by the United States Geological Survey.
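For readers reproducing the data pull, retrieval of this kind maps onto ObsPy's FDSN client, which Section 2.8 notes was used for all downloads. The sketch below is illustrative only: station Q26A and channel BHZ are plausible example values, not a station list from the dissertation.

```python
# Minimal sketch: one serialized day-long data call against the IRIS DMC,
# followed by the demean/taper/high-pass preconditioning described in Block 1.
from obspy import UTCDateTime
from obspy.clients.fdsn import Client

client = Client("IRIS")
day = UTCDateTime(2010, 2, 10)
st = client.get_waveforms(network="TA", station="Q26A", location="*",
                          channel="BHZ", starttime=day, endtime=day + 86400)
st.detrend("demean")
st.taper(max_percentage=0.05)
st.filter("highpass", freq=1.0)  # 1 Hz high-pass, as in Section 2.4.1
```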
2.4. Method
The basis of our method is introduced by Linville et al. (2015), where it was used to visually survey TA data to identify cases of dynamic earthquake triggering. The primary adaptation of the Linville et al. (2015) method in this work is that event picking is automated by projecting the processed data array (described below) onto a station mesh and setting area and amplitude thresholds. The basic workflow involves three main blocks, which we summarize here and describe in detail below. In the first block, we obtain the data, filter, and compute the Fast Fourier Transform for a sliding window across all stations. Next, we project the summed amplitude stacks, normalized by frequency bin, onto a station mesh, linking areas of equal amplitude to create contours. In the last block, detections are declared when contours at the amplitude threshold level exceed the area threshold level. Table 2.1 summarizes the optimized parameter sets for blocks 1-3.

2.4.1. Block 1
To download the CUS data, we serialize data calls in day-long increments. Subroutines subsequently operate on binned 2-hour data segments (optimized for a 24-core Intel Xeon processor with 128 GB of memory). Gaps in returned station data are filled with zeros, and stations with less than 90% data return are omitted. We then demean, taper, and high-pass filter all data at 1 Hz. The spectral amplitudes are computed using a sliding window (256 samples, Hamming window). We normalize the data by the absolute maximum, which we smooth using an acausal running-mean filter of ~5 min. The ~5-min duration is chosen to be just longer than the expected signal duration for local earthquakes. We then sum the normalized frequency contributions from 2-12 Hz for each 2-hour data segment at each station. Values in this amplitude time series are saturated at 4 times the median absolute deviation (MAD) over the entire array (all stations for a 2-hour duration), which dampens spurious or very large amplitudes. Block 1 processing returns an array of shape \(V_{i,j}\), where i is the location index and j is the time index.

2.4.2. Block 2
We generate a station mesh by computing the Delaunay triangulation for the unstructured latitude/longitude point set (Barber et al., 1996). This gives a unique and repeatable way to generate station grids that minimizes the link distance between stations. The basic requirement of a Delaunay triangulation is that the circumcircle of each triangle (the circle through all three vertices) does not contain any other point. Whether this criterion is satisfied can be evaluated from the determinant of the matrix holding the triangle vertices, sorted counterclockwise, as coordinate pairs. For example, for a triangle with vertices A, B, C and a point D, if the determinant

\[
\begin{vmatrix}
A_x - D_x & A_y - D_y & (A_x - D_x)^2 + (A_y - D_y)^2 \\
B_x - D_x & B_y - D_y & (B_x - D_x)^2 + (B_y - D_y)^2 \\
C_x - D_x & C_y - D_y & (C_x - D_x)^2 + (C_y - D_y)^2
\end{vmatrix}
\tag{1.1}
\]

evaluates as negative, then point D is not within the circumcircle of A, B, C. We rely on the implementation from the Matplotlib API, which follows Bernadou and Hassan (1981), to bring this calculation to convergence for all available points. To avoid precompiled station lists, we pull metadata from IRIS for all stations within the contributing network. For the TA network, some stations lie outside the main network of active stations for the rolling array. Likewise, for the Yellowstone network, some interstation distances far exceed the median interstation distance. Our triangulation is performed using all available stations, but triangles with side lengths that exceed the average interstation distance by a factor of 1.9 are masked. Masked stations are no longer available to participate in the subsequent projection. Frame-wise (for each j in \(V_{i,j}\)), \(V_i\) (for all i) from Block 1 is projected onto the station mesh, where contours are determined by linear interpolation between vertices. We use a contour value of three times the MAD (just below the saturation value) to determine an area for each closed contour in frame j; values above and below the three-MAD amplitude threshold participate indirectly through interpolation (Figure 2.2).
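A minimal sketch of Blocks 1 and 2 follows, in the Python/NumPy/SciPy/Matplotlib stack named in Section 2.8. The per-bin normalization, array-wide MAD saturation, and degree-based edge lengths are one plausible reading of the description above, not the dissertation's code, and all function names are hypothetical.

```python
import numpy as np
from matplotlib.tri import Triangulation
from scipy.signal import spectrogram

def block1_amplitude(trace, fmin=2.0, fmax=12.0, smooth_s=300.0):
    """Per-station Block 1 series: 256-sample Hamming FFT, 2-12 Hz sum, ~5-min mean."""
    fs = trace.stats.sampling_rate
    f, t, sxx = spectrogram(trace.data, fs=fs, window="hamming", nperseg=256)
    sxx = sxx / (sxx.max(axis=1, keepdims=True) + 1e-12)  # normalize each frequency bin
    v = sxx[(f >= fmin) & (f <= fmax)].sum(axis=0)        # sum 2-12 Hz contributions
    n = max(1, int(round(smooth_s / (t[1] - t[0]))))      # ~5-min acausal running mean
    return np.convolve(v, np.ones(n) / n, mode="same")

def saturate(V, k=4.0):
    """Clip the (stations x frames) array at k times its array-wide MAD."""
    med = np.median(V)
    mad = np.median(np.abs(V - med))
    return np.minimum(V, med + k * mad)

def in_circumcircle(A, B, C, D):
    """Eq. (1.1): for counterclockwise A, B, C, det > 0 iff D is inside."""
    m = np.array([[*(P - D), np.dot(P - D, P - D)] for P in (A, B, C)])
    return np.linalg.det(m) > 0

def station_mesh(lons, lats, factor=1.9):
    """Block 2 mesh: Delaunay triangulation with overlong triangles masked."""
    tri = Triangulation(lons, lats)
    pts = np.column_stack([lons, lats])
    tr = tri.triangles
    edges = np.stack([np.linalg.norm(pts[tr[:, i]] - pts[tr[:, (i + 1) % 3]], axis=1)
                      for i in range(3)])                 # edge lengths in degrees
    tri.set_mask((edges > factor * edges.mean()).any(axis=0))
    return tri
```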
2.4.3. Block 3
Event detection occurs when the contour at the amplitude threshold exceeds a minimum area requirement. The minimum area requirement (\(A_{\min}\)), using a spherical earth assumption, is the product of an input scalar (K) and the average triad area (\(A_j\)) (using Heron's formula; Alperin, 1987):

\[
A_{\min} = K \, \frac{\sum_{j=1}^{J} A_j}{J}
\tag{1.2}
\]

\[
A_j = \frac{\sqrt{4\left(a_j^2 b_j^2 + a_j^2 c_j^2 + b_j^2 c_j^2\right) - \left(a_j^2 + b_j^2 + c_j^2\right)^2}}{4}
\tag{1.3}
\]

where a, b, and c are the three side lengths of each triangle (j) in the station mesh, calculated as the degree distance (Vincenty distance, which assumes an oblate-spheroid earth; Karney & Deakin, 2010) between each pair of vertices. We tune our input value for the area threshold to capture events below the ANF catalog level, which results in a value of K = 1.7. Each frame (time step) is treated independently, meaning detections are made frame-wise without consideration of detections made in the previous or following time steps. Radiated energy from fault rupture generally lasts longer than the window length (~2.5 sec) from Block 1; therefore, a single earthquake typically generates a point cloud of detections, where each point has time and location attributes. We associate point clouds into discrete events based on temporal and spatial attributes (< 30 sec, ≤ 70 km). The final detection time is cataloged as the earliest detection in the point cloud, and the geometric centroid of the contour at the earliest detection time is used for the detection location. We use the first location rather than the median because locations from detections made later in the coda are more likely to be heavily influenced by crustal heterogeneity and/or path effects.
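The area test (equations 1.2-1.3) and the point-cloud association step could be sketched as below; the greedy chaining strategy and the helper names are assumptions for illustration, not the published implementation. The geodesic call uses the geopy library credited in Section 2.8.

```python
import numpy as np
from geopy.distance import geodesic

def triad_area(a, b, c):
    """Eq. (1.3): triangle area from side lengths (degree distances)."""
    s = 4.0 * (a*a*b*b + a*a*c*c + b*b*c*c) - (a*a + b*b + c*c) ** 2
    return np.sqrt(np.maximum(s, 0.0)) / 4.0

def min_area(side_lengths, K=1.7):
    """Eq. (1.2): A_min as K times the average triad area over the mesh."""
    areas = np.array([triad_area(a, b, c) for a, b, c in side_lengths])
    return K * areas.mean()

def associate(detections, dt_max=30.0, dx_max_km=70.0):
    """Greedily chain frame-wise detections (time_s, lat, lon) into events."""
    events = []
    for t, lat, lon in sorted(detections):
        for ev in events:
            t0, lat0, lon0 = ev[-1]
            if t - t0 < dt_max and geodesic((lat, lon), (lat0, lon0)).km <= dx_max_km:
                ev.append((t, lat, lon))
                break
        else:
            events.append([(t, lat, lon)])
    return events  # each event's first entry is its earliest detection
```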
2.4.4. Subroutines: Magnitudes and Detection Linking
We do not use phase picks for location, which usually rely on short-term average/long-term average (STA/LTA) filters for identification. However, when possible, an STA/LTA filter is used to identify event duration, which we use in magnitude estimates. We determine a signal onset time using data from the nearest six stations and applying a 1/10-sec STA/LTA over a 5-min window around the detection time. The time series generated by the STA/LTA calculation is subsequently used to determine the event duration at each station. If a station does not successfully return an onset time (nonimpulsive arrival or insufficient signal-to-noise ratio), the original detection time is used for the event start time; in this case, the station does not participate in determining the average for the final magnitude. We use 1 Hz high-pass filtered data and the coefficients from Pechmann et al. (2007), developed for the Montana/Utah regions, to define a coda magnitude:

\[
M_C = -2.25 + 2.23\log\tau + 0.0023\,\Delta
\tag{1.4}
\]

In equation 1.4, τ is the duration of the signal in seconds, and Δ is the distance in km between the determined centroid and the current station. The end of an event is determined to be the time after the event onset when the STA/LTA function value drops below a ratio of 0.1. The elapsed time between onset and end of the signal is used as τ, the duration term, in equation 1.4. The recorded event magnitude is the arithmetic mean of the calculated magnitudes from participating stations. We find that the differences between the ANF magnitudes and magnitudes determined by the method outlined here are dominantly controlled by duration (τ) rather than by the equation constants (-2.25 and 2.23). For this reason, we use the coefficients in equation 1.4 for all CUS magnitude calculations.

In an effort to exclude false detections from events located outside the footprint of recording stations (off-network sources), we leverage ANF catalog information as part of our processing pipeline. To do this, we identify ANF-cataloged events that occurred outside the active array during each 2-hour data window and use the TauP method of Crotwell et al. (1999) to compute predicted P-arrival times at the locations of our detections. Detections that match predicted arrival times from off-network sources within 30 sec are considered to be related to off-network sources. The window of time over which we allow the connection to be made is large compared to the expected error of the arrival calculation because these detections often lack impulsive arrivals and therefore default to the detection time, which can be later in the coda. This limits contributions from off-array events to sources large enough to generate energy at detection levels, but too small to have been previously cataloged. Similarly, we use the ANF catalog to link our local (on-network) detected events with previously cataloged events.
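A sketch of these two subroutines is given below, assuming a 1-sec/10-sec STA/LTA window pair (one reading of "1/10-sec") and ObsPy's TauP implementation; the iasp91 model choice and the function names are assumptions.

```python
import numpy as np
from obspy.signal.trigger import classic_sta_lta
from obspy.taup import TauPyModel

def coda_magnitude(tr, onset_idx, dist_km, sta_s=1.0, lta_s=10.0):
    """Eq. (1.4): duration-based magnitude from a 1 Hz high-passed Trace."""
    fs = tr.stats.sampling_rate
    cft = classic_sta_lta(tr.data, int(sta_s * fs), int(lta_s * fs))
    below = np.nonzero(cft[onset_idx:] < 0.1)[0]    # end: ratio drops below 0.1
    tau = below[0] / fs if below.size else np.nan   # duration tau in seconds
    return -2.25 + 2.23 * np.log10(tau) + 0.0023 * dist_km

def predicted_p_seconds(depth_km, dist_deg, model=None):
    """Predicted P travel time for the off-network screening step."""
    model = model or TauPyModel(model="iasp91")
    arrivals = model.get_travel_times(source_depth_in_km=depth_km,
                                      distance_in_degree=dist_deg,
                                      phase_list=["P", "p"])
    return arrivals[0].time if arrivals else np.nan
```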
2.4.5. Manual Review
To ensure that newly cataloged events have waveform characteristics consistent with the signature of local earthquakes, each detection is manually reviewed. A detection must demonstrate clear and identifiable phases (P-wave, S-wave) on at least one of the five nearest stations to the determined location to be considered an event. In our approach, detection contours track the spatial reach of the highest amplitudes from each event (usually the S-wave), allowing amplitude excursions to be tracked even as the P-wave drops into the noise. This allows us to recover locations when station spacing and event magnitude make a solution inaccessible through typical location methods. Mine and quarry blasts are often the most abundant seismic signals recorded by arrays that span or are near active extraction sites, making the exclusion of these sources a common challenge. In some cases, extraction practices impart a signature to event waveforms that makes it straightforward to identify events as anthropogenic in origin. However, because site conditions, mine practices, and propagation paths vary, the resulting source-path convolution may not reliably appear anthropogenic. Here, we rely on location, time of day, and waveform correlation to filter these events from final catalogs. Where ambiguity persists, events remain in the catalog. Full earthquake catalogs are available at quake.utah.edu/catalogs/contour-detection-catalogs.

2.5. Results
2.5.1. Denver-Julesburg Basin
Automated processing returned 1774 detections between 01 January 2009 and 31 December 2010, with an average detection rate of ~2.4 detections/day. After manual review of the automated detections, the resulting catalog contained 64 events. Six events in the catalog were previously reported in the Array Network Facility (ANF) catalog; the remaining 58 events are newly detected earthquakes. The median difference between our locations and the ANF-determined locations is 10 km (sample size: 6). We detect all earthquakes reported by ANF within the footprint of the active array (on-network) during this time period, and one off-network event. Considering only on-network events reduces the median location difference between our epicentral locations and ANF locations to 6.6 km (sample size: 5).

To explore why some of the newly detected seismicity in the Denver-Julesburg basin went undetected by ANF, we perform a simple test meant to mimic a standard catalog-processing workflow. We use a short-term average/long-term average (STA/LTA) filter to process 1 day of data for 10 February 2010, during which our contour method identified two new events. The first new event (magnitude 1.4 MC, 10 February 2010 10:10:54 UTC) is used as the test event. Using 1-10 Hz bandpass-filtered data from the station nearest the test event (Q26A), we tune the STA/LTA parameter space (sta, lta, and threshold) to maximize the values while still successfully capturing the test event. We then use the optimized STA/LTA parameters (sta = 0.5 sec, lta = 1.0 sec, threshold = 3.5) to process data for the day of 10 February from 63 stations that span the Denver-Julesburg basin. The STA/LTA filter returns a total of 10,661 triggers. A trigger is a time and a set of one or more stations where the STA/LTA ratio exceeds a value of 3.5 within a 120-sec window. We count triggers instead of individual detections (which occur at the station level for STA/LTA processing) to avoid double counting events seen on multiple stations. Based on tuning, this is the minimum number of triggers returned that also returns the test event. The large number of triggers, which are primarily surface noise (traffic, cattle, noise glitches, etc.), is the result of short-duration window parameters and the low threshold necessary to capture the small test event. If we require at least three stations to participate in a trigger, the total trigger count drops to 2,029. If we require at least three of the stations participating in each trigger to be adjacent, this number drops to 21. If the station count is increased to the level typical of real-time network processing (on the order of 5-7 stations), the number of potential triggers drops to between 5 and 7. All of the subfilters applied to the initial 10,661 triggers are effective at reducing the number of false triggers, but none of them successfully return the test event. We report these numbers to demonstrate that spatial and temporal filtering in the time domain can have a drastic effect on the number of real and false triggers returned, just as it does in the frequency domain. However, for this test case, requiring even one additional station makes the test event unrecoverable using STA/LTA methods.
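The three-station variant of this test maps naturally onto ObsPy's coincidence_trigger, as sketched below. Here `st` stands for the day-long, 1-10 Hz bandpassed Stream from the 63 basin stations; the off-threshold of 1.0 is an assumption (the text specifies only the on-threshold), and the station-adjacency subfilter described above has no direct ObsPy equivalent.

```python
from obspy.signal.trigger import coincidence_trigger

# Network-coincidence pass with the tuned parameters from the text
triggers = coincidence_trigger("classicstalta", 3.5, 1.0, st,
                               thr_coincidence_sum=3,   # require >= 3 stations
                               max_trigger_length=120.0,
                               sta=0.5, lta=1.0)        # tuned windows (sec)
print(len(triggers))
```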
The test event above was part of a group of earthquakes in which the P-arrivals were mostly identifiable only at station Q26A. The sole exception was the largest event in the group (M2.2), which had similar P-S separation but was detected and located by ANF. The ANF event occurred less than 2 km from an active injection well (Figure 2.3), based on the well data provided by Weingarten et al. (2015). Our method determines locations for these small events, but the location resolution is poor (> 10 km in some cases) compared to the typical distances over which spatial correlation between seismicity and injection is posited (Davis & Frohlich, 1993). However, if we assume the largest event in the sequence offers a representative location for the group of events near Hugo, Colorado, then these events would be well within the distances reported for other induced sequences (Keranen et al., 2014). In support of this conclusion, in Chapter 3 we show that waveforms from this sequence are highly similar (0.60 based on waveform correlation). Further evidence that the Hugo sequence may be related to nearby injection comes from the event magnitude information. The Hugo sequence meets the criteria for swarminess proposed by Vidale and Shearer (2006) and Holtkamp and Brudzinski (2015). Because induced sequences are typically swarm-like, swarminess has been suggested as an additional criterion for identifying induced seismicity (Skoumal et al., 2015).

Beyond the sequence mentioned above, most of the newly cataloged seismicity occurred in the southern basin, while the northern basin remained mostly aseismic during the time of the catalog. The National Earthquake Information Center Comprehensive Earthquake Catalog (ComCat) gives a longer context for seismicity in the Denver-Julesburg basin, with a total of 43 events spanning 1978 through 2011. The highest density of earthquakes in the ComCat catalog occurs in the west-central portion of the basin, near the Denver metropolitan area. In contrast, our catalog shows a concentration of dense seismic activity in the southern basin, where a diffuse high-density sector contains embedded zones of extremely high-density seismicity near TA stations Q26A and R27A. This shows that there can be vast differences in seismic event density distributions based on the cataloging method and catalog duration, highlighting that even short-duration, high-resolution catalogs can produce appreciably different mapped spatial density distributions of seismicity within a region (Figure 2.4).

2.5.2. Permian Basin, West Texas, USA
Automated processing returned 4449 detections between January 2009 and January 2011 (743 days), giving an average detection rate of ~6 events per day in the Permian Basin. The manually reviewed earthquake catalog for this basin contains 562 earthquakes. Of the 562 events, 140 were also cataloged by ANF. Events cataloged by ANF that are not represented in the new catalog either did not occur within the footprint of the active array or were excluded from our catalog because they evaluated as quarry blasts based on location and time of day. There is one event in the ANF catalog unaccounted for in our detection inventory; the missed event occurred ~44 sec after, and in close proximity to, a previous event. With little seismicity prior to the opening of oil fields in the Permian basin (Rogers & Malkiel, 1979), a majority of earthquakes in this region are considered induced (Doser et al., 1992; Frohlich et al., 2016). In agreement with past studies, a majority of seismicity within our catalog occurs either near Snyder, Texas, or Dagger Draw, New Mexico, regions where induced seismicity has been the subject of study for over a decade (Snyder: Davis & Pennington, 1989; Gan & Frohlich, 2013; Dagger Draw: Herzog, 2014; Sanford et al., 2006; Zhang et al., 2016).

2.5.3. Williston Basin, North Dakota, USA
A long history of petroleum production (since the 1920s; Frohlich et al., 2015; Pollastro et al., 2013), combined with sparse seismicity and sparse seismic instrument coverage, makes the Williston Basin a compelling location to probe for uncatalogued seismicity.
To this end, Frohlich et al. (2015) reprocessed several years of TA data using STA/LTA filters. The Frohlich et al. (2015) catalog (hereafter, the Frohlich Bakken catalog) covers an area that spans from just north of the Powder River Basin, Wyoming, to the 49th parallel. We use a more conservative boundary that follows well-site locations to restrict our dataset to events pertinent to induced seismicity. However, our processing recovered 100% of the events reported in the Frohlich Bakken catalog that occurred within the active array, independent of where boundaries are drawn. Using detections located within 70 km of a known well, we recovered 194 detections over a 521-day period (~0.37 events/day). After manual review, 7 of the 194 detections were cataloged as local earthquakes within the network. Three of the seven were previously identified in the Frohlich Bakken catalog. Although none of the Frohlich events are reported in the ComCat earthquake catalog, two events were detected and reported in the ANF catalog. For these two events, the differences between our locations and the ANF (and Frohlich) locations are 13.39 km and 25.94 km, or between 20-40% of the interstation spacing.

2.5.4. Upper Geyser Basin, Yellowstone National Park
The Yellowstone region data used here include data from 133 stations for the day of 6 November 2015. We keep the frequency and duration parameterization used in the TA basins (Table 2.1) but use a threshold of 5.5, because the average interstation distance is on the order of 85 m, rather than 70 km for the TA. Using this parameterization, we recover 180 detections for the 24-hour period. During daylight hours, a majority of these detections migrate along local roads with velocities similar to vehicle speeds rather than at seismic wave propagation velocities. Although the park was closed to visitors at the time of deployment, these signals are most likely false detections related to the movement of maintenance or park staff (Figure 2.5). The two repeating source regions near Doublet Pool and Old Faithful described by Wu et al. (2017) and Farrell et al. (2016) are among the detected events and comprise a majority of all detections during nondaylight hours. Of the 180 detections, we estimate that 26 relate to either Doublet Pool or the Old Faithful geyser system based on location, tremor-like signal character, and timing. Although Wu et al. (2017) report a pre-eruptive signal related to hydrothermal activity at Old Faithful geyser, the signal we capture at this location is short-duration (~1 min), concurrent with eruption, and likely generated by the surface impact of the erupted water rather than by subsurface fluid migration.

2.6. Discussion
The use of spatial coherence (for example, the practice of associating coincident P- and S-picks into events) and spectral coherence filters (e.g., Lomax et al., 2012) are not new concepts in seismic event processing. Our processing relies on these filters but uses alternate implementations, such as frequency-domain transformation, frequency normalization, and spatial coherence embedded at Earth's surface. Using these adaptations, we were able to lower the detection threshold, increase the algorithm's ability to source generalize (recovering both earthquakes and hydrothermal tremor, for example), recover locations for small events, and maintain a sufficiently low return of false detections.
To concretely demonstrate this, we optimize a commonly used power detector (STA/LTA) and show that the detection of a single small event over a day-long period returns more false events than our processing scheme returns over a 2-year period on the same data. In this case, the false detection rate is intractably high because the event is small and can only be 'seen' by the time-domain algorithm on a single station. The same limitations that make events such as this example difficult to detect in the time domain also make them difficult to locate using methods that rely on phase picks. By comparison, our method tracks amplitude excursions from specific spectral bands at low levels, allowing them a larger area of expression regardless of the suddenness of signal onset. This gives the method presented here an additional advantage over traditional detection strategies for similar datasets.

When we apply our method to data used to create prior catalogs, we find that we are able to increase the number of events for all the CUS basins. However, the event increases are highly variable. For example, the catalogs for the Williston and Permian basins are increased by between 50% and 300%, while the increase for the Denver-Julesburg basin is near 1000%. Despite the small increase of events within the Williston basin, the low overall event count is consistent with previous work that finds minimal evidence of seismicity (tectonic and induced) in the Williston basin (Frohlich et al., 2015). By comparison, increased event counts in the Permian and Denver-Julesburg basins, in locations where the cause of earthquakes has previously been attributed to anthropogenic activity (Dagger Draw, Cogdell/Snyder), and where events occur within 1-5 km of active injection wells (the Hugo sequence and most Permian Basin events), suggest that a majority of newly cataloged events are probably induced by human activity. In the Denver basin, we also observe a significant shift in event density patterns compared to long-duration catalogs such as ComCat. Part of this may reflect a latent response to anthropogenic activity. For example, future maps of the seismicity patterns in the Denver basin may change significantly when the maps are updated with recent sequences that are likely related to oil and gas production, such as the Greeley events (Yeck et al., 2016). The other aspect of the changing density patterns might be related to aliasing and magnitude bias from a lack of adequate station coverage. The newly discovered seismic sequences may require additional study to determine seismic potential or to meet regulatory obligations, especially if located near critical facilities.

Our locations rely on the first detection in a chained set of detections that are related by distance and time. This rule follows the observation that earthquake location accuracy is reduced when amplitude contours contain a high degree of spatial anisotropy caused by local crustal structure. For hydrothermal or other seismic signals generated by shallow incipient sources, the near-source wave field, and consequently the amplitude contours, are more likely to be source-dominated than path-dominated. To survey how these potential differences may influence locations based on the contour method we describe here, we use an example for one detection near Old Faithful (Figure 2.6). The migration of the event centroid over time is shown by the colored trace in Figure 2.6b.
For the Old Faithful signal, although the determined locations are within the standard deviation of the point cloud distribution, an alternate approach using the median of the detection cloud centroids (Figure 2.6c) may provide a more appropriate final location for these incipient sources.

Seismic catalogs contain information foundational for understanding and mitigating the effects of seismogenic zones in the crust, including those associated with fluid injection. Current methods of linking cataloged seismicity to injection activity often use template events from known catalogs, relying on the assumption that source volumes typically generate detectable seismicity from the same general locations with the same mechanism over a given time interval. The detectable level is dominantly controlled by station coverage, and consequently the fraction of detectable seismicity and the represented source diversity can be heavily biased by station coverage. Here, we demonstrate that the template space in existing catalogs may not provide adequate information to assess the occurrence of induced seismicity in an area if the related seismicity remains small and swarm-like, as in the newly detected southern Colorado seismicity.

2.7. Conclusions
We present a method capable of processing ~14,000,000 station-hours of data (~400 stations, 4 years) at a processing-time/real-time ratio of 0.0017 (max: ~2.5 min/day on 12 cores; min: ~20 min/day on a single core). This method is most suitable for detection tasks on regularly spaced, high-return seismic arrays where acceptable location error is within the average station spacing or where subsequent relocation methods are feasible. The advantage gained over typical time-domain processing (STA/LTA) increases with decreasing event magnitude and increasing interstation distance. While slower than typical time-domain processing, the method results in fewer false positives, allowing a lower magnitude of completeness for resulting catalogs. It is also capable of returning location estimates for small events where station spacing makes locations unrecoverable through travel-time-based methods. We use this method to survey sedimentary basins in the CUS to identify new areas of seismicity previously unidentified due to sampling bias and/or event magnitude. Without exception, we add new sources to existing catalogs. Our findings support previous studies that report significant variation in seismic responsiveness to fluid injection between and within basins. We highlight sequences from several basins suggesting that methods capable of recovering events below current catalog levels (~M2.5) aid, and potentially alter, interpretations based on observational data.

2.8. Additional Resources and Acknowledgements
We thank Keith Koper and Jamie Farrell for early critical reviews of this work. All data were downloaded from IRIS using ObsPy and the FDSN web services client. The well database was generously provided by Matthew Weingarten and is discussed in detail in Weingarten et al. (2015). We operate within the Python framework and rely heavily on existing numerical libraries such as NumPy and SciPy, as well as the community-driven seismic package ObsPy. Our distance calculations use the geopy library. This work was funded by NSF EAR grant #1460360, "Collaborative Research: Capitalizing on EarthScope Transportable Array Data to Better Characterize Induced Seismic Sequences."

Table 2.1. Parameter overview. Overview of algorithm parameters and general values for processing blocks 1-3.
| Block | Parameter | Default Value | Rational Range |
| 1 | FFT Window Size | 256 samples | > 64 samples |
| 1 | Running Mean Filter | 5 minutes | > FFT window |
| 1 | Bandwidth | 2-12 Hz | prefilter < x < Nyquist |
| 1 | Saturation | 4 MAD | amplitude threshold < x < 5 MAD |
| 2 | Amplitude Threshold | 3 MAD | 2.5 MAD < x < saturation |
| 3 | Area Threshold | 1.7 | x > 1 |

Figure 2.1. Study locations in the Central US include three basins (Williston, Denver-Julesburg, and Permian), which are outlined and labeled. Also mapped are wells (black dots) from the database compiled and provided by Weingarten et al. (2015), which delineate the basin boundaries.

Figure 2.2. Example of our contouring algorithm, which uses a 3-sec frame for an earthquake on March 20, 2010. The data array is projected onto the station mesh, where contours can be visualized in space. The red contour represents the detection amplitude threshold. When the red contours grow to sufficient size (1.7x the average triad area), a potential detection is declared. For this event, the location based on the contour centroid (red dot) is ~32 km from the ANF-determined location (cyan dot). In this case, the large centroid error is likely due to missing data from TA station B24A.

Figure 2.3. Earthquake catalog for the Denver-Julesburg basin. Of the 64 new events, 6 were also in the ANF catalog (open cyan circles). The majority of events occurred in the southern basin. The group of events described in the text located near active injection wells (red open circles) are near and surrounding station Q26A.

Figure 2.4. Event densities for the new 2-year catalog (left) and ComCat's 33-year catalog (right). Note that events (<10) near seismic station Q24A were detected in the new catalog but reviewed as quarry blasts. These events coincide with the event cluster from ComCat in the same region.

Figure 2.5. Daytime (red) and night-time (yellow) detections from one day of data in Yellowstone National Park near Old Faithful. The red boxes outline daytime-only detections of assumed anthropogenic origin. On the right are characteristic signals from each source region described in the text.

Figure 2.6. Contours for a detection in YNP. (a) The station mesh from the Yellowstone array showing the detection point cloud for one event within the amplitude contours above and below the detection amplitude threshold for a single frame of the data. On the right, we highlight the differences between using the earliest onset (b) and the cloud centroid (c). For the earliest onset (b), the colored line traces the changes in location over time, with earliest in yellow and latest in dark purple.

2.9. References
Alperin, R. (1987). Heron's area formula. The College Mathematics Journal, 18(2), 137-138. doi:10.2307/2686503
Astiz, L., Eakins, J. A., Martynov, V. G., Cox, T. A., Tytell, J., Reyes, J. C., ... Davis, G. A. (2014). The array network facility seismic bulletin: Products and an unbiased view of United States seismicity. Seismological Research Letters, 85(3), 576-593.
Barber, C. B., Dobkin, D. P., & Huhdanpaa, H. (1996). The quickhull algorithm for convex hulls. ACM Transactions on Mathematical Software (TOMS), 22(4), 469-483.
Bernadou, M., & Hassan, K. (1981). Basis functions for general Hsieh-Clough-Tocher triangles, complete or reduced. International Journal for Numerical Methods in Engineering, 17(5), 784-789.
Crotwell, H. P., T. J. Owens, & J. Ritsema (1999). The TauP Toolkit: Flexible seismic travel-time and ray-path utilities. Seismological Research Letters, 70, 154-160.

Davis, S. D., & W. D. Pennington (1989). Induced seismic deformation in the Cogdell oil field of west Texas. Bulletin of the Seismological Society of America, 79, 1477-1495.

Davis, S. D., & Frohlich, C. (1993). Did (or will) fluid injection cause earthquakes? Criteria for a rational assessment. Seismological Research Letters, 64(3-4), 207-224.

Doser, D. I., Baker, M. R., Luo, M., Marroquin, P., Ballesteros, L., Kingwell, J., ... Kaip, G. (1992). The not so simple relationship between seismicity and oil production in the Permian Basin, west Texas. Pure and Applied Geophysics, 139(3-4), 481-506.

Ellsworth, W. L. (2013). Injection-induced earthquakes. Science, 341(6142), 1225942.

Farrell, J., Lin, F. C., Allam, A. A., Smith, R. B., & Karplus, M. S. (2016, February). Using a large N geophone array to identify hydrothermal seismic sources in the Upper Geyser Basin of Yellowstone National Park. In AGU Fall Meeting Abstracts. Available at https://agu.confex.com/agu/fm16/meetingapp.cgi/Paper/186346

Frohlich, C., Hayward, C., Stump, B., & Potter, E. (2011). The Dallas-Fort Worth earthquake sequence: October 2008 through May 2009. Bulletin of the Seismological Society of America, 101(1), 327-340.

Frohlich, C., Walter, J., & Gale, J. (2015). Analysis of transportable array (USArray) data shows earthquakes are scarce near injection wells in the Williston Basin, 2008-2011. Seismological Research Letters, 86(2), 492-499.

Frohlich, C., Walter, J., DeShon, H., Stump, B., Hayward, C., & Hornbach, M. (2016). A historical review of induced earthquakes in Texas. Seismological Research Letters, 87(4), 1022-1038.

Gan, W., & Frohlich, C. (2013). Gas injection may have triggered earthquakes in the Cogdell oil field, Texas. Proceedings of the National Academy of Sciences, 110(47), 18786-18791.

Herzog, M. (2014). Investigation of possible induced seismicity due to wastewater disposal in the Delaware Basin, Dagger Draw Field, New Mexico-Texas, USA (B.S. Honors Thesis). Retrieved from https://scholar.colorado.edu/cgi/viewcontent.cgi?article=1117&context=honr_theses

Holtkamp, S. G., Brudzinski, M. R., & Currie, B. S. (2015). Regional detection and monitoring of injection-induced seismicity: Application to the 2010-2012 Youngstown, Ohio, seismic sequence. AAPG Bulletin, 99(9), 1671-1688.

Karney, C. F. F., & Deakin, R. E. (2010). F. W. Bessel (1825): The calculation of longitude and latitude from geodesic measurements. Astronomische Nachrichten, 331(8), 852-861.

Keranen, K. M., Weingarten, M., Abers, G. A., Bekins, B. A., & Ge, S. (2014). Sharp increase in central Oklahoma seismicity since 2008 induced by massive wastewater injection. Science, 345(6195), 448-451.

Linville, L., Pankow, K., Kilb, D., & Velasco, A. (2015). Exploring remote earthquake triggering potential across EarthScope's Transportable Array through frequency domain array visualization. Journal of Geophysical Research: Solid Earth, 119(12), 8950-8963.

Lomax, A., C. Satriano, & M. Vassallo (2012). Automatic picker developments and optimization: FilterPicker - a robust, broadband picker for real-time seismic monitoring and earthquake early-warning. Seismological Research Letters, 83, 531-540. doi:10.1785/gssrl.83.3.531

Nakai, J. S., Sheehan, A. F., & Bilek, S. L. (2017). Seismicity of the Rocky Mountains and Rio Grande Rift from the EarthScope Transportable Array and CREST temporary seismic networks, 2008-2010. Journal of Geophysical Research: Solid Earth, 122(3), 2173-2192.
Pechmann, J. C., Nava, S. J., Terra, F. M., & Bernier, J. C. (2007). Local magnitude determinations for intermountain seismic belt earthquakes from broadband digital data. Bulletin of the Seismological Society of America, 97(2), 557-574.

Pollastro, R. M., L. N. R. Roberts, & T. A. Cook (2013). Geologic assessment of technically recoverable oil in the Devonian and Mississippian Bakken formation. In Assessment of Undiscovered Oil and Gas Resources of the Williston Basin Province of North Dakota, Montana, and South Dakota, 2010, Chapter 5, U.S. Geological Survey Digital Data Series DDS-69-W.

Rogers, A. M., & Malkiel, A. (1979). A study of earthquakes in the Permian Basin of Texas-New Mexico. Bulletin of the Seismological Society of America, 69(3), 843-865.

Sanford, A. R., T. M. Mayeau, J. W. Schlue, R. C. Aster, & L. H. Jaksha (2006). Earthquake catalogs for New Mexico and bordering areas II: 1999-2004. New Mexico Geology, 28(4), 99-109.

Skoumal, R. J., Brudzinski, M. R., & Currie, B. S. (2015). Distinguishing induced seismicity from natural seismicity in Ohio: Demonstrating the utility of waveform template matching. Journal of Geophysical Research: Solid Earth, 120(9), 6284-6296.

Vidale, J. E., & Shearer, P. M. (2006). A survey of 71 earthquake bursts across southern California: Exploring the role of pore fluid pressure fluctuations and aseismic slip as drivers. Journal of Geophysical Research: Solid Earth, 111, B05312. doi:10.1029/2005JB004034

Weingarten, M., Ge, S., Godt, J. W., Bekins, B. A., & Rubinstein, J. L. (2015). High-rate injection is associated with the increase in US mid-continent seismicity. Science, 348(6241), 1336-1340.

Wilson, M. P., Foulger, G. R., Gluyas, J. G., Davies, R. J., & Julian, B. R. (2017). HiQuake: The human-induced earthquake database. Seismological Research Letters, 88(6), 1560-1565.

Wu, S. M., Ward, K. M., Farrell, J., Lin, F. C., Karplus, M., & Smith, R. B. (2017). Anatomy of Old Faithful from subsurface seismic imaging of the Yellowstone Upper Geyser Basin. Geophysical Research Letters, 44(20), 10-240.

Yeck, W. L., Sheehan, A. F., Benz, H. M., Weingarten, M., & Nakai, J. (2016). Rapid response, monitoring, and mitigation of induced seismicity near Greeley, Colorado. Seismological Research Letters, 87(4), 837-847.

Zhang, Y., Edel, S. S., Pepin, J., Person, M., Broadhead, R., Ortiz, J. P., ... Evans, J. P. (2016). Exploring the potential linkages between oil-field brine reinjection, crystalline basement permeability, and triggered seismicity for the Dagger Draw Oil field, southeastern New Mexico, USA, using hydrologic modeling. Geofluids, 16(5), 971-987.

CHAPTER 3

DETECTION METHOD BIAS IN INDUCED SEISMICITY STUDIES

3.1 Abstract

A common method for enhancing earthquake catalogs uses earthquakes in existing regional catalogs as templates to search continuous data for smaller, previously uncataloged events. This approach relies on the underlying assumptions that larger earthquakes in existing catalogs represent all source zones in the study region, and that the larger cataloged events from each source zone contain sufficient information for recovering complete faulting histories. Here, we test these assumptions by exploring interevent relationships in enhanced catalogs using single-link clustering. We find that a majority of newly cataloged events come from source zones not previously represented in existing catalogs.
In the Denver-Julesburg and Permian basins, the most prolific new source zones are likely related to oil and gas production. We use subspace detection to further explore the role of small events in recovering faulting histories for a newly identified induced sequence in the Denver-Julesburg basin. Using traditional methods that rely on (larger) cataloged templates, we recover only 40% of the catalog recovered using small and large templates that are highly similar. These results suggest that modest increases in the diversity of the source representation, evidenced by subtle changes to the waveforms, can alter the resolution and complexity of event time-histories subsequently recovered through waveform correlation methods. While the largest events in an area are more likely to be included in regional seismic event catalogs, we observe that these sources are not necessarily indicative of the most active sections of the crust and may not represent the complexity of the source region or fully define the initiation and duration of induced sequences.

3.2 Introduction

Earthquakes in the Central United States (CUS) were relatively rare (~21 events >= M3/year) until oil and gas sector activities began driving widespread seismic rate increases starting in 2001 (Ellsworth, 2013). Prolonged activity, emerging technologies, and by-product activities such as wastewater injection have since inflated midcontinent earthquake rates to the point that states such as California, where earthquakes are largely driven by tectonic processes, are no longer among the regions that generate the largest number of magnitude three and larger earthquakes annually in the United States. Instead, Oklahoma currently produces the largest number of felt earthquakes, a majority of which are thought to be induced by wastewater injection (Keranen et al., 2014). In recognition of the growing hazard posed by the recent surge in non-natural earthquakes across the CUS, in 2016 the United States Geological Survey (USGS) incorporated induced seismicity into the national hazard model, with implications for an estimated 7 million people (Petersen et al., 2016).

Despite a basic physical understanding of the factors influencing induced seismicity, it remains difficult to constrain in situ subsurface parameters with enough precision for a theoretical framework to guide earthquake predictions. Hence, induced seismicity forecasts are presently computed only using past seismicity rates as a guide to future seismicity (Petersen et al., 2016, 2017). Consequently, complete earthquake catalogs are critical in assessing anthropogenically driven changes in local stress states in the CUS.

The intraplate tectonic environment and historic seismicity rates in the CUS have resulted in sparse regional seismic instrument coverage throughout this region. Generating seismicity catalogs with the resolution needed to link human activities and earthquakes often requires dense temporary station deployments (e.g., Justinic et al., 2013; Yeck et al., 2016) or catalog enhancement (e.g., Frohlich et al., 2011; Nakai et al., 2017). One common way to enhance existing seismicity catalogs is through waveform correlation methods. These methods aim to increase the time-resolution of event histories for known sources (e.g., Holland, 2013; Holtkamp et al., 2015; Kim, 2013). An alternative approach is to reprocess raw seismic data in search of new seismogenic zones in regions of active industrial activity (Frohlich et al., 2011; Frohlich et al., 2015).
The seismicity catalogs in this study were built using the detection strategy described in Chapter 2 and rely on the second approach of reprocessing raw seismic data. We will refer to these catalogs as CDC (contour detection catalogs) throughout this paper. As with all detection strategies, methods for seismic catalog enhancement contain inherent biases that influence their output. The main bias in correlation-based detection is related to the earthquake source: correlation methods are only able to recover sources that are already known to exist or, in the case of autocorrelation, are repeating. In contrast, generalized, or source-agnostic, detectors such as commonly used short-term to long-term average methods (STA/LTA) often bias by magnitude, requiring filters that reduce the number of false detections while also filtering out smaller events (Velasco et al., 2016).

The aim of our analysis is to highlight how bias from different event catalog building strategies can alter or influence our understanding of local seismicity. In Chapter 2, we found that with higher levels of scrutiny, generalized or source-agnostic detection methods can increase the number of cataloged events within a given area. Typically, increases in event numbers are the result of adding smaller magnitude events to existing catalogs. These small magnitude additions may stem primarily from lowering detection magnitudes for known sources, from expanding our knowledge of source diversity within an area, or from a combination of both. Here, we explore interevent relationships using waveform similarity and single-link clustering (Frohlich & Davis, 1990; Harris, 2006) to determine how small magnitude earthquakes contribute to our understanding of the faulting structure and history within a given area. For select sequences, we enhance existing catalogs using subspace detection (Barrett & Beroza, 2014; Chambers et al., 2015; Harris & Dodge, 2006). The key question we address is whether a complete faulting history can be recovered using only larger, cataloged events as templates.

3.3 Methods and Data

We use the seismicity catalogs generated by the methods presented in Chapter 2 (CDC catalogs) for three sedimentary basins in the CUS (quake.utah.edu/catalogs/contour-detection-catalogs). These catalogs were enhanced (compared to existing regional catalogs produced by the Array Network Facility, or ANF) using a detection strategy developed specifically to recover small magnitude events from gridded arrays with large station spacing. The resulting catalogs increased the event counts in each study area by as little as 50% in the Williston Basin and up to an order of magnitude in the Denver-Julesburg Basin.

In this work, we use waveform character (degree of similarity) to cluster events using single-link clustering (Frohlich & Davis, 1990; Harris, 2006) in order to explore the relationship between previously cataloged seismicity and newly detected events. In single-link clustering, not all events will correlate at the chosen threshold value, but each event must correlate with at least one other event in the cluster at or above the threshold correlation value.
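To make the clustering rule concrete, the sketch below groups events from a pairwise correlation matrix using off-the-shelf SciPy routines. It is a minimal illustration of single-link clustering at a correlation threshold, not the exact processing used here.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def single_link_families(corr, threshold=0.6):
    """Group events by single-link clustering on waveform correlations.

    corr: (n_events, n_events) symmetric matrix of maximum cross-correlation
    values (diagonal of 1); threshold: minimum correlation for a link.
    A sketch of the clustering step; the actual processing may differ.
    """
    # Convert similarity to a condensed distance matrix (1 - correlation).
    dist = squareform(1.0 - corr, checks=False)
    tree = linkage(dist, method="single")
    # Cutting the tree at distance 1 - threshold returns families in which
    # every member is chained to the group by at least one pairwise
    # correlation at or above the threshold.
    return fcluster(tree, t=1.0 - threshold, criterion="distance")
```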
We select two subcatalogs for further enhancement using subspace detection: one from the Denver-Julesburg Basin and one from the Williston Basin. In subspace detection, template events are combined and decomposed into basis vectors. A linear combination of the basis vectors is correlated across continuous data to detect more events that match the event family (Chambers et al., 2015; Harris, 2006; Harris & Paik, 2006).
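A minimal sketch of the subspace idea follows: aligned templates are decomposed with an SVD, and the energy captured by the retained basis vectors is scanned across continuous data (Harris, 2006). The dimension choice, alignment, and normalization details here are simplifying assumptions rather than the tuned DETEX configuration.

```python
import numpy as np

def build_subspace(templates, dims=4):
    """Form an orthonormal signal subspace from aligned template waveforms.

    templates: (n_templates, n_samples) array, each row demeaned and
    normalized. Keeps the first `dims` right-singular vectors as the
    event-family representation.
    """
    _, _, vt = np.linalg.svd(templates, full_matrices=False)
    return vt[:dims]  # (dims, n_samples)

def detection_statistic(basis, data):
    """Slide the subspace across continuous data.

    Returns, per window, the fraction of window energy captured by the
    subspace (between 0 and 1); peaks above an empirical noise threshold
    would be declared detections.
    """
    n = basis.shape[1]
    stats = np.empty(len(data) - n + 1)
    for i in range(len(stats)):
        w = data[i:i + n] - np.mean(data[i:i + n])
        energy = np.dot(w, w)
        proj = basis @ w  # projection coefficients onto each basis vector
        stats[i] = np.dot(proj, proj) / energy if energy > 0 else 0.0
    return stats
```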
The first subcatalog is a cluster of seismicity (cluster threshold of 0.60, 10-sec window) from the Denver-Julesburg basin located within 2 km of an injection well near Hugo, Colorado. The CDC Hugo catalog contains 28 events, of which only the largest (M 2.2) was also cataloged by ANF. We use subspace detection to increase the sequence time-resolution using all 28 templates. We then repeat this processing using only a single representative template, the largest (magnitude 2.2) event, from the ANF catalog. We generate two catalogs in order to identify the limitations of using a single template to recover small magnitude events within a complex or dynamic (potentially migrating) source region.

The second subcatalog we enhance using subspace detection is the 7-event catalog from the Williston Basin. Previous work in the Williston Basin has provided minimal evidence of induced seismicity (Frohlich et al., 2015; Skoumal et al., 2017). We use subspace detection to further test basin quiescence for smaller magnitudes related to known sources (Barrett & Beroza, 2014; Chambers et al., 2015; Harris & Dodge, 2006).

In a separate analysis, we examine waveform characteristics using single-link clustering for the CDC Permian Basin catalog. This is a well-studied area with a larger template catalog provided by ANF (140 events) compared to catalogs from the Williston basin (2 events) or Denver-Julesburg basin (6 events). In this analysis, we explore what new information the additional 422 CDC-detected events provide for understanding the faulting history of the two most prolific regions in the Permian Basin: Cogdell, Texas, and Dagger Draw, near Carlsbad, New Mexico.

3.4 Results

3.4.1. Denver-Julesburg

There are 64 events in the CDC Denver catalog with magnitudes ranging from 0.2 to 5.1, of which 6 were previously cataloged by ANF (M 1.65-5.06). From single-link clustering, we find that 56% of the events in the Denver catalog do not link with any other events in the catalog. The remaining 44% of events that link with at least one other event do so in 1 of 3 families, with the majority belonging to a single group. This single group is located in southeastern Colorado near the town of Hugo, and accounts for 41% of the total seismicity (28 events out of 64) in the CDC Denver catalog. Members of the Hugo group are visually similar and correlate at a value of 0.60 using a 10-sec window (Figure 3.1). For the Hugo sequence, of the 28 detected events, only 1 is in the ANF catalog (M 2.1; February 20, 2010; 10:13 UTC).

For this sequence, we build two separate enhanced catalogs using different template sets. In the first catalog, we use the single event from the ANF catalog recorded at the nearest station, Q26A, as a template for waveform correlation. The ANF template offers access to some of the complexity represented in the 28-event Hugo sequence because it is highly similar (0.80 correlation value) to 10 other events in the sequence (SS0, Figure 3.2). In our second catalog, we again use data from station Q26A, but this time we enhance the catalog using the 27 CDC templates in addition to the ANF event. All 28 template waveforms recorded at station Q26A are similar and link at a threshold of 0.60. If we raise the similarity link threshold to 0.8, the events group into four clusters and one singleton; for this second method, the four clusters and one singleton are used in subspace event detection. We build the two enhanced catalogs separately to help resolve to what extent new events in the Hugo sequence contain redundant information, and to determine how increased template complexity can influence catalog outcomes.

Both enhanced Hugo catalogs contain new events. All new detections showed signal above background using a threshold calculated by evaluating the distribution of correlation values between the template and random samples of ambient noise, as in Harris and Paik (2006) and Chambers et al. (2015). Detections were reviewed and flagged as false when signals above the noise levels shared phase similarity but did not exhibit the clear earthquake character of the S/P-inclusive templates. For this sequence, the majority of false detections were monochromatic, high-amplitude signals of unknown source.

The enhanced catalog built using the four subspaces and one singleton identified 289 additional events, resulting in a total catalog of 317 events, including the 28 templates. For this catalog, the maximum magnitude is 2.1 and the median magnitude is 0.16. Magnitudes for the newly detected events are calculated relative to the template magnitudes (Chambers et al., 2015). The catalog determined using the single ANF template recovered 114 of the 289 detections found with the subspaces, and 13 of the 28 CDC templates, or 40% of the total catalog. The majority of new events detected using just the ANF template cluster in the group of which the ANF event is a member (67%; Figure 3.2, blue group). The remaining events (33%) belong to a neighboring subspace (Figure 3.2, red group). Note that the second largest subspace in the Hugo sequence (Figure 3.2, cyan group) has no representation in the ANF-based catalog. Event counts from each subspace indicate that the number of events detected by a subspace is not proportional to the number of templates in the subspace. This indicates that sources well represented in the template space are not necessarily the most prolific overall.

The ANF-based enhanced catalog implies that the Hugo sequence began in December 2009. However, the subspace-based enhanced catalog that includes 317 events suggests that the sequence started 10 months prior, in February 2009 (Figure 3.3), and continued until at least October 2010, when station Q26A was removed and detection was no longer possible.

In the Denver basin, we suggest there is a link between the Hugo sequence and injection. Following Frohlich and Davis (1990), this sequence of earthquakes is the first known for this area, the epicenters are in close proximity to injection wells, and there is potential evidence of slight modulation from cumulative monthly injection volumes (Figure 3.3). The newly detected events also exhibit swarm-type activity (Vidale & Shearer, 2006; Holtkamp et al., 2015; Skoumal et al., 2015), which has been suggested to be indicative of induced sequences (Skoumal et al., 2015b).

3.4.2. Williston Basin in North Dakota (USA)

Within the Williston basin, the CDC catalog identified 7 events, of which 3 are in the ANF catalog. Applying single-link clustering, we find that 1 of the CDC events formed a cluster with a previously cataloged ANF event. Combining these 2 events into a subspace recovered 4 additional events. The remaining 5 events gained no additional detections during the ~2-year time period.
Unlike other basins with large volumes of fluid injection, we do not see evidence for many small earthquakes, in agreement with Frohlich et al. (2015). The widespread lack of seismicity suggests a physical condition that persists basin-wide. Although depth-to-basement as a controlling factor for induced earthquakes has been given variable importance in the literature (potentially significant: Horton, 2012; Kim, 2013; Skoumal et al., 2015b; Zhang et al., 2016; not necessarily significant: Weingarten et al., 2015), Skoumal et al. (2017) conclude that for the Williston Basin, shallow injection relative to basement depth might explain the widespread lack of induced events.

3.4.3. Permian Basin, West Texas, USA

The CDC earthquake catalog for the Permian basin contains 562 earthquakes, of which only a subset (140 events; 25%) were in the ANF catalog (Figure 3.4). These earthquakes locate in two main areas, one near Dagger Draw, New Mexico, and the other near Snyder, Texas. Of these two regions, the one near Snyder, Texas, is more seismically prolific (a total of 336 events) and has a larger spatial footprint.

We first discuss the seismicity near Snyder, Texas, which contains 68% of the Permian Basin seismicity. An event is selected for the Snyder catalog if the nearest station to the event is TA station Z30A or 130A. The number of subgroups in the catalog is determined using single-link clustering. For the Snyder catalog, we qualitatively assign a link threshold of 0.65. Using the 0.65 threshold, the 336-member catalog contains 40 subgroups and 68 events that do not correlate with any other events (singletons). In the following analysis, we consider all events that link at a given threshold to represent a specific source type. With this as a definition, the CDC catalog contains 108 source types (40 subgroups plus 68 singletons). This is a subjective definition of source type, with dependence on link threshold values as well as the parameters inherent in waveform correlation methods. While acknowledging this is an oversimplification, we adhere to this definition of source for simplicity. Of the 108 distinct source classes, 64 (11 subgroups, 53 singletons) contain no events found in the ANF catalog, indicating that the enhanced catalog increased the number of known sources in the Snyder, Texas, region by 45%. If we raise the link threshold, the number of single events increases appreciably. For example, a clustering threshold of 0.80 returns 284 source classes, of which 216 (76%) are new sources. In consideration of this instability, we maintain a threshold of 0.65. This provides a conservative estimate of the overlap between the information contained in the ANF catalog and the new information gained from catalog enhancement.

The second largest seismogenic area in the Permian basin is near Dagger Draw, New Mexico (135 of the 562 events in the CDC Permian basin catalog). In contrast to the Snyder catalog, a majority of the Dagger Draw events (85% when using templates recorded at TA station 125A) belong to a single subgroup (SS0, Figure 3.2; Figure 3.4b). Within this major subgroup, 26% of events are in the ANF catalog. The second most populous subgroup includes only 2% of the total CDC Dagger Draw catalog, with one ANF member. The remaining 11% of the Dagger Draw catalog are singletons, of which none are in the ANF catalog.

3.5 Discussion

In general, enhancing seismic catalogs with additional small events increases the descriptive power of the enhanced catalog.
New events in the CDC catalogs typically illuminate new sources. This is particularly true in the Williston basin, where 5 of the 7 events in the catalog were unrelated to each other and formed singletons. In the Denver Basin, over half (54%) of the catalog events were from distinct source classes (subspaces or singletons). In the Permian basin, the highest event counts occurred near Snyder, TX, where new sources were more abundant than known sources (64 and 44, respectively). Although identification of new sources is the most common outcome of catalog building in CUS basins, we also see sequences dominated by single clusters (e.g., a single source type), such as Dagger Draw, New Mexico, that are moderately well represented in existing catalogs (26% of events in the Dagger Draw catalog are also in the ANF catalog).

For the sequence near Hugo, Colorado, subspace detection demonstrated that the information available through the single ANF-cataloged event was insufficient to capture the basic characteristics of this sequence, such as onset and duration. At a more nuanced level, temporal deviations in the event rate changes were dependent on which templates were used for catalog enhancement. For example, the temporal evolution of the sequence determined using only the ANF catalog information indicates the sequence is short (< 3 months), but for the larger 317-event catalog, the sequence is reported to be ongoing over a 10-month period. Likewise, the timing of the 28 templates used to recover the Hugo sequence cannot be used to faithfully predict the timing of the highest event rates for the entire sequence (Figure 3.4). These results suggest that events that are similar in character but smaller in magnitude than known sources can contain unique information. The addition of new information derived using diverse templates can significantly extend our understanding of event sequences, especially those with source regions that evolve spatially and temporally but remain physically related through spatial location and driving processes.

The modified Venn diagram in Figure 3.5 qualitatively represents the scope of information recoverable with templates of varying complexity for the Hugo sequence. The entire spatial expanse of the ellipse represents the total 317-count event space for the Hugo sequence, and the white overlay represents the portion of each family (represented by separate colors) within the sequence that is recovered using just the ANF template. We expect the representation of a discrete source to be adept at recovering the time-history for sectors of the system to which it directly belongs (i.e., templates from each color should recover all or most detections from the immediate family of which they are members). Clustering by similarity gives us guidelines for what it means to be a discrete source. The surprising result from the Hugo family was that an event family, with a level of waveform dissimilarity that is nearly indistinguishable by eye (correlation >= 0.6), can contain complex branches that are not easily recovered even with direct access to a representation that is near exact (correlation values >= 0.8). By near exact, we mean that slight variations in source or location, something that might be expected from earthquakes induced by fluid effects, may cause original templates to become obsolete as fluid, pressure fronts, and poroelastic stresses migrate.

A similar style of clustering was found for induced earthquakes located in Fox Creek, Alberta (Schultz et al., 2015). There, the intercluster correlations exceed 0.6, while correlations for intrasequence events reached as high as 0.9. These subtle differences in waveforms were also interpreted to be the result of variations in the source: different fault patches, en echelon faulting, or subtle changes in the source mechanism.

Previous work suggests that correlation methods can recover events within distances of about one wavelength at the dominant frequency (Harris, 1991; Harris & Dodge, 2011; Hutchings & Wu, 1990). Small, high-frequency events (dominant frequencies of 5-10 Hz) and average shallow crustal velocities of ~6 km/s suggest that small templates (M 1-2) have at most just over a kilometer-scale reach.
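For concreteness, the one-wavelength rule of thumb with the values just stated gives a back-of-the-envelope bound rather than a formal limit:

```latex
\lambda = \frac{v}{f} \approx \frac{6\ \mathrm{km/s}}{5\ \mathrm{Hz}} = 1.2\ \mathrm{km},
\qquad
\lambda \approx \frac{6\ \mathrm{km/s}}{10\ \mathrm{Hz}} = 0.6\ \mathrm{km},
```

consistent with a template reach of roughly half a kilometer to just over a kilometer.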
If the effective range of fluids and the availability of simple structures are also confined within these limits, additional templates are less likely to add valuable information to event catalogs built with correlation detectors, because we expect additional templates to be redundant at these small distances. For the Hugo sequence, however, either source migration exceeds these limits, the range of source excitations is not fully represented in the template space, or both. Increasing the source-receiver distance may increase a template's ability to generalize across a sequence, but as in the case of Hugo, the advantage of increased generalization is far outweighed by the disadvantages inherent to signal detection with poor signal-to-noise ratios. Another strategy to capture gradual changes in waveform character when only a single representative template exists is to iteratively perform the correlation procedure as newly detected events are added to the template dictionary (Gibbons et al., 2016; Harris & Dodge, 2011). However, the ANF template was unable to recover events from certain subfamilies represented in the template dictionary, suggesting that for this sequence, iterative procedures that initiate with the single ANF template might increase the detection capability for some subgroups, while detections from other subgroups will remain unattainable using similar correlation thresholds.

3.6. Conclusions

We investigate the interevent relationships for new seismicity catalogs in three CUS basins to assess how small magnitude earthquakes can influence our understanding of subsurface systems in these regions. These basins were chosen for study because they each contain sedimentary layers where nontraditional oil and gas exploitation is ongoing. When small magnitude events that may be linked to oil and gas exploitation are added to existing catalogs, these additions can significantly alter seismic interpretations related to human activities. First, they can illuminate new seismogenic zones in the crust. For example, in the Denver-Julesburg basin the ANF catalog contained only 1 event near Hugo, but the CDC catalog contained 27 additional events, revealing a prolific ongoing sequence near an active injection well where there were previously no known earthquakes. In areas of the Permian Basin such as Snyder, Texas, source diversity increased by ~50% in a region where it is known that human activities can influence seismicity on complex subsurface faults. The addition of small events to the Snyder catalog did not illuminate new activity related to oil and gas sector operations, but instead greatly increased our knowledge of source diversity within this area.
A second way that small magnitude events can alter interpretations, even when they are related to previously known sources, is by adding clarity to sequence characteristics such as start-time, duration, and productivity. These characteristics allow researchers to investigate the correlation between pumping statistics, physical properties of fluid migration, and observed seismicity. For the Hugo sequence, the subspace-enhanced catalog demonstrates earlier initiation and longer duration for the sequence than was obtained using the single ANF event. In the Dagger Draw catalog, for example, a majority (85%) of the nearby seismicity belonged to a single cluster. Within this cluster, previously cataloged events account for 26% of the total (35 events out of 135) and therefore can only describe a fraction of the seismic activity in this region and within the seismogenic structures in the crust.

We find that in many cases, adding small magnitude events to existing catalogs increases our ability to correctly interpret seismic data. However, when existing catalogs contain information about most of the available sources, as for Dagger Draw, generalized detection may not be the most efficient approach to recovering small magnitude events. If the goal is to focus only on known sources, correlation detectors (or source-specific detectors) have clear advantages over source-agnostic detectors for building high-resolution event time-histories specific to that known event type. However, we caution that using only existing templates when looking for small sources can limit and potentially bias the results, because changes in a source type or location, such as migrating sources, can alter the source's waveform characteristics enough to significantly degrade detection capabilities. An example of this is our result from the Hugo sequence, where a single representative template is unable to recover even half (44%) of the total seismicity, in spite of being highly similar (0.80) to many other (10 of 28) template events. This suggests that source-agnostic detection may be important in the characterization of dynamic sources, in addition to playing an important role in identifying new seismogenic zones.

3.7 Additional Resources and Acknowledgements

We operate within the python framework and rely heavily on existing numerical libraries such as Numpy and Scipy as well as the community-driven seismic package, Obspy. For waveform correlation, we use DETEX, available at https://github.com/dchambers/Detex. All data were downloaded from IRIS using Obspy and the FDSN web services client. The well database was generously provided by Matthew Weingarten and is discussed in detail in Weingarten et al. (2015). Catalogs from this work are available for download at quake.utah.edu/catalogs/contour-detection-catalogs. This work was funded by NSF EAR grant #1460360 "Collaborative Research: Capitalizing on EarthScope Transportable Array Data to Better Characterize Induced Seismic Sequences."

Figure 3.1. Earthquake catalog for the Denver-Julesburg basin. Of the 64 new events, 6 were also in the ANF catalog (open cyan circles). The majority of events occurred in the southern basin. The group of events described in the text, located near active injection wells (red open circles), is near and surrounding station Q26A.

Figure 3.2. Event families for the Hugo sequence.
The waveforms for the 28 template events (a) are color-coded by family based on single-link clustering with a similarity threshold of 0.80 over a 10-sec window.

Figure 3.3. Event time-history for the Hugo sequence. Top: event magnitude vs. time for the CDC templates (filled black) and new CDC subspace detections (open black, or open blue if also detected by the ANF template catalog). Bottom: event count vs. time (top and bottom share the x-axis), with cumulative monthly injection in red.

Figure 3.4. The CDC catalog for the Permian basin (a). Events also cataloged by ANF are indicated in cyan (edge color). The two most active regions in the Permian are Dagger Draw, NM (near station 125A) and Snyder, TX (near stations Z30A and 130A). Pie charts of the Dagger Draw catalog (b) and Snyder catalog (c) divided by family based on single-link clustering and organized by family size (increasing clockwise). The fraction of previously cataloged events is indicated in cyan.

Figure 3.5. The event space for the Hugo sequence. The colors represent the fraction of the total contributed by each family. The grey circles are the template representation for each family. The white depicts the reach of the total domain that can be accessed using the single ANF template for event recovery.

3.8 References

Barrett, S. A., & Beroza, G. C. (2014). An empirical approach to subspace detection. Seismological Research Letters, 85(3), 594-600.

Chambers, D. J., Koper, K. D., Pankow, K. L., & McCarter, M. K. (2015). Detecting and characterizing coal mine related seismicity in the Western US using subspace methods. Geophysical Journal International, 203(2), 1388-1399.

Ellsworth, W. L. (2013). Injection-induced earthquakes. Science, 341(6142), 1225942.

Frohlich, C., & Davis, S. D. (1990). Single-link cluster analysis as a method to evaluate spatial and temporal properties of earthquake catalogues. Geophysical Journal International, 100(1), 19-32.

Frohlich, C., Hayward, C., Stump, B., & Potter, E. (2011). The Dallas-Fort Worth earthquake sequence: October 2008 through May 2009. Bulletin of the Seismological Society of America, 101(1), 327-340.

Frohlich, C., Walter, J., & Gale, J. (2015). Analysis of transportable array (USArray) data shows earthquakes are scarce near injection wells in the Williston Basin, 2008-2011. Seismological Research Letters, 86(2), 492-499.

Frohlich, C., Walter, J., DeShon, H., Stump, B., Hayward, C., & Hornbach, M. (2016). A historical review of induced earthquakes in Texas. Seismological Research Letters, 87(4), 1022-1038.

Gibbons, S. J., Kværna, T., Harris, D. B., & Dodge, D. A. (2016). Iterative strategies for aftershock classification in automatic seismic processing pipelines. Seismological Research Letters, 87(4), 919-929.

Harris, D. B. (1991). A waveform correlation method for identifying quarry explosions. Bulletin of the Seismological Society of America, 81(6), 2395-2418.

Harris, D. B. (2006). Subspace detectors: Theory (No. UCRL-TR-222758). Lawrence Livermore National Lab (LLNL), Livermore, CA (United States).

Harris, D. B., & Dodge, D. A. (2011). An autonomous system for grouping events in a developing aftershock sequence. Bulletin of the Seismological Society of America, 101(2), 763-774.

Harris, D. B., & Paik, T. (2006). Subspace detectors: Efficient implementation (No. UCRL-TR-223177). Lawrence Livermore National Lab (LLNL), Livermore, CA (United States).

Holland, A. A. (2013). Earthquakes triggered by hydraulic fracturing in south-central Oklahoma. Bulletin of the Seismological Society of America, 103(3), 1784-1792.
Holtkamp, S. G., Brudzinski, M. R., & Currie, B. S. (2015). Regional detection and monitoring of injection-induced seismicity: Application to the 2010-2012 Youngstown, Ohio, seismic sequence. AAPG Bulletin, 99(9), 1671-1688.

Horton, S. (2012). Disposal of hydrofracking waste fluid by injection into subsurface aquifers triggers earthquake swarm in central Arkansas with potential for damaging earthquake. Seismological Research Letters, 83(2), 250-260.

Hutchings, L., & Wu, F. (1990). Empirical Green's functions from small earthquakes: A waveform study of locally recorded aftershocks of the 1971 San Fernando earthquake. Journal of Geophysical Research: Solid Earth, 95(B2), 1187-1214.

Justinic, A. H., Stump, B., Hayward, C., & Frohlich, C. (2013). Analysis of the Cleburne, Texas, earthquake sequence from June 2009 to June 2010. Bulletin of the Seismological Society of America, 103(6), 3083-3093.

Keranen, K. M., Weingarten, M., Abers, G. A., Bekins, B. A., & Ge, S. (2014). Sharp increase in central Oklahoma seismicity since 2008 induced by massive wastewater injection. Science, 345(6195), 448-451.

Kim, W. Y. (2013). Induced seismicity associated with fluid injection into a deep well in Youngstown, Ohio. Journal of Geophysical Research: Solid Earth, 118(7), 3506-3518.

Nakai, J. S., Sheehan, A. F., & Bilek, S. L. (2017). Seismicity of the Rocky Mountains and Rio Grande Rift from the EarthScope Transportable Array and CREST temporary seismic networks, 2008-2010. Journal of Geophysical Research: Solid Earth, 122(3), 2173-2192.

Petersen, M. D., Mueller, C. S., Moschetti, M. P., Hoover, S. M., Llenos, A. L., Ellsworth, W. L., ... Rukstales, K. S. (2016). Seismic-hazard forecast for 2016 including induced and natural earthquakes in the central and eastern United States. Seismological Research Letters, 87(6), 1327-1341.

Petersen, M. D., Mueller, C. S., Moschetti, M. P., Hoover, S. M., Shumway, A. M., McNamara, D. E., ... Rubinstein, J. L. (2017). 2017 one-year seismic-hazard forecast for the central and eastern United States from induced and natural earthquakes. Seismological Research Letters, 88(3), 772-783.

Schultz, R., Stern, V., Novakovic, M., Atkinson, G., & Gu, Y. J. (2015). Hydraulic fracturing and the Crooked Lake Sequences: Insights gleaned from regional seismic networks. Geophysical Research Letters, 42(8), 2750-2758.

Skoumal, R. J., Brudzinski, M. R., & Currie, B. S. (2015). Earthquakes induced by hydraulic fracturing in Poland Township, Ohio. Bulletin of the Seismological Society of America, 105(1), 189-197.

Skoumal, R. J., Brudzinski, M. R., & Currie, B. S. (2015b). Distinguishing induced seismicity from natural seismicity in Ohio: Demonstrating the utility of waveform template matching. Journal of Geophysical Research: Solid Earth, 120(9), 6284-6296.

Skoumal, R. J., Brudzinski, M. R., & Currie, B. S. (2017, in review). Proximity of Precambrian basement affects the likelihood of induced seismicity in the Appalachian, Illinois, and Williston Basins.

Vidale, J. E., & Shearer, P. M. (2006). A survey of 71 earthquake bursts across southern California: Exploring the role of pore fluid pressure fluctuations and aseismic slip as drivers. Journal of Geophysical Research: Solid Earth, 111, B05312. doi:10.1029/2005JB004034

Weingarten, M., Ge, S., Godt, J. W., Bekins, B. A., & Rubinstein, J. L. (2015). High-rate injection is associated with the increase in US mid-continent seismicity. Science, 348(6241), 1336-1340.

Yeck, W. L., Sheehan, A. F., Benz, H. M., Weingarten, M., & Nakai, J. (2016). Rapid response, monitoring, and mitigation of induced seismicity near Greeley, Colorado. Seismological Research Letters, 87(4), 837-847.
CHAPTER 4

LEVERAGING LONG-TERM SEISMIC CATALOGS FOR AUTOMATED REAL-TIME EVENT CLASSIFICATION

4.1 Abstract

We investigate the use of labeled event types from seismic catalogs to produce automated event labels on new incoming data from the catalog region. Our model relies on 90-sec station-level spectrograms for events in Utah between October 2012 and June 2017 (13K events, 104K station-level examples). We explore the use of deep Convolutional and Recurrent Neural Networks to build a binary classifier for local earthquakes and quarry blast events and find that Recurrent Neural Networks perform best for our data. Our recurrent network uses Long-Short-Term-Memory and achieves station-level accuracies of 97.7% and network-level accuracy above 99%. We find no evidence that misclassification is driven in a significant way by characteristics such as depth, source-receiver distance, event magnitude, or physiographic province. Instead, we suggest classification confusion is primarily driven by 1) analyst-mislabeled events and 2) complex combinations of source-path properties. We suggest that the primary value of the models presented here is in near-real-time, highly accurate event classification for future Utah events. A secondary use of the models is for systematic surveys of analyst classifications to identify mislabeled events. Similar methods may also replace or complement existing source discrimination efforts for nuclear test monitoring at local scales where catalogs exist.

4.2. Introduction

Discrimination between local tectonic events and mining or quarry blasts is a problem common to seismic network operations and researchers working to build seismicity catalogs in regions where both tectonic seismicity and anthropogenic sources occur (Astiz et al., 2014). Within the University of Utah Seismograph Stations (UUSS) authoritative boundary (Lat: 37° to 42° N, Lon: 114° to 109° W), both earthquakes and quarry blasts occur in abundance, often in close spatial proximity. Discrimination or classification of events as either tectonic or anthropogenic is part of the daily operations performed by UUSS seismic analysts, because published seismic catalogs are typically intended to contain only events that pertain to the study of elastic strain partitioning and release in the crust, i.e., earthquakes. To accomplish discrimination, analysts rely on waveform patterns learned through years of experience, databases of known mine and quarry sites, event locations, historic patterns of seismicity, and time-of-day.

Although local earthquake monitoring network operations typically rely on analysts for event classification, many automated or semi-automated approaches to source discrimination have been developed for research purposes (Koper et al., 2016; O'Rourke & Baker, 2016). Prior to the widespread use of machine learning, the majority of developed strategies relied on amplitude and spectral ratios of various phases to probe incoming waveforms for information related to the source. For explosive sources, the first-arriving P-wave energy is preferentially excited compared to S (Stump et al., 2002). At frequencies around 1-2 Hz, the amplitudes of P waves are typically much smaller than those of later-arriving Lg phases (farther distances) or Rg phases (closer distances), and an abundance of strategies exploit these relationships (Lg: Baumgardt & Young, 1990; Dysart & Pulli, 1990; Hartse et al., 1997; Kim et al., 1993; Walter et al., 1995; Rg: O'Rourke & Baker, 2016; Tibi et al., 2018).
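A minimal sketch of such a ratio discriminant is given below; the band, window length, and pick times are illustrative assumptions, and production discriminants typically add distance corrections and calibration (e.g., Kim et al., 1993). Higher ratio values at these low frequencies would point toward an explosive source, consistent with the preferential P excitation described above.

```python
import numpy as np
from obspy.signal.filter import bandpass

def p_lg_ratio(data, dt, t_p, t_lg, win=5.0, fmin=1.0, fmax=2.0):
    """Peak-amplitude P/Lg ratio in a low-frequency band.

    data: waveform samples; dt: sample interval (s); t_p, t_lg: phase
    onset times (s) from the trace start. Windows and band are
    illustrative, not calibrated values.
    """
    x = bandpass(data, fmin, fmax, df=1.0 / dt, corners=4, zerophase=True)

    def peak(t0):
        i0 = int(t0 / dt)
        return np.max(np.abs(x[i0:i0 + int(win / dt)]))

    return peak(t_p) / peak(t_lg)
```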
Although these methods perform well when tuned for specific datasets, their use can be limited when ratios rely on wave phases that are unavailable (at larger distances, for example), difficult to isolate (at very short distances), or recorded in high-noise environments. More recent approaches to event discrimination avoid issues related to phase isolation through the use of coda waves. For example, Koper et al. (2016) show that the duration of coda waves can help discriminate surface events from tectonic earthquakes at local scales. However, methods that rely on waveform characteristics dominantly controlled by depth can have limited ability to resolve tectonic from anthropogenic sources in regions where tectonic events are shallow.

Methods that specifically target source characteristics of anthropogenic events have also been developed. Delayed-shot or ripple-fire techniques are widely used in blasting operations for their ability to efficiently fragment rock while minimizing ground motion that may be destructive to nearby infrastructure (Dowding, 1985). Spectral banding is often seen in the spectrograms of ripple-fired events, and although criteria based on spectral banding alone may not account for the complexity introduced by variation in delay times, methods that target source persistence specifically, such as cepstral analysis, have proven to be useful for discrimination (Dysart & Pulli, 1990; Shumway, 1996). However, methods that specifically target ripple-firing may be limited in their ability to discriminate anthropogenic from tectonic sources when anthropogenic events involve both ripple-fire and single-charge explosions. Wüster (1993) suggests that multiple approaches to discrimination may be necessary when significant diversity exists in source types, paths, and locations.

With the expanding use of machine learning, discrimination methods have advanced from simple targeting of a few key features into a highly nonlinear space where source discrimination is accomplished using complex nonlinear combinations of input features. Feature engineering, like the phase separation described above, is one way to restrict waveform data to the content most relevant for discrimination. In practice, features are informed by expectations about source physics and wave propagation and require hypothesis testing or secondary methods to resolve features that result in separable event classes (Beyreuther et al., 2012; Del Pezzo et al., 2003; Dowla et al., 1990; Hammer et al., 2012; Knapmeyer-Endrun & Hammer, 2015; Kortström et al., 2016; Langer et al., 2009; Ohrnberger, 2001; Scarpetta et al., 2005). Many of these approaches work well, with methods often achieving accuracies above 90%, but at the cost of considerable effort prior to event classification. The goal of this work is to use information from the entire waveform to automatically perform a binary classification for tectonic earthquakes and quarry blasts at the local scale.
Our Utah dataset contains a diversity of sources (thousands), from sub-km scale to 400 km source-receiver distances, and we do not attempt any segmentation (by phase), feature engineering, or path corrections. To accomplish this, we test three neural-network-based learning approaches using event spectrograms as model input. Networks often leverage increases in data and computational resources by increasing the number of inner layers and hence their capacity for representation (Hinton et al., 2012). More sophisticated architectures, in addition to depth, make assumptions about the input domain that make them more powerful general classifiers. We use two specific architecture adaptations to deep neural networks in the work presented here.

Our first architecture assumes that input data, on a sample basis, exhibit temporally dependent characteristics that change over the duration of the sample. Classifying seismic signals as temporally dependent sequences has been accomplished in the past primarily using hidden Markov models (HMM), which have been successfully applied to the discrimination and detection of a variety of seismic sources (icequakes: Hammer et al., 2015; moonquakes: Knapmeyer-Endrun & Hammer, 2015; landslides/rockfalls: Dammeier et al., 2016; volcanic signals: Beyreuther et al., 2012; Ohrnberger, 2001). Recent algorithmic advances in the domain of speech recognition, however, make it possible to learn time-dependence using neural networks that are trained using both current and past information (recurrent neural networks, or RNNs) (Williams & Peng, 1990). We use an RNN variation called Long-Short-Term-Memory (LSTM) (Hochreiter & Schmidhuber, 1997) to build a classifier for seismic signals with variable source-receiver distances to discriminate between tectonic and explosive sources in Utah.

The second model architecture we test frames seismic source discrimination as an image classification problem using convolutional neural networks (CNN) (LeCun et al., 1998). Most commonly used in image processing, CNNs are adept at identification of subjects that are not consistently centered between examples, i.e., they achieve a level of translational invariance. The CNN's ability to recognize patterns in the input, regardless of location within the image or input, may explain some of their recent success in seismology (Krizhevsky et al., 2012; Perol et al., 2017). The third architecture we explore uses a CNN to learn a representation of the data (i.e., 3-channel spectrograms), which is then used as input to subsequent LSTM layers for classification. This model follows from the hypothesis that directed partitioning of the model between representation learning and classification may result in more accurate classification. Recent advances in speech modeling by Google support this hypothesis (Sainath et al., 2015).

An important difference between the work presented here and previous efforts is that our models explore classification independent of source-path characteristics. Rather than tie classification to specific source-path combinations, we require our model to generalize over any source-path combination for many thousands of discrete sources and for distances from sub-km to 400 km. We accomplish this with minimal restriction on incoming information, using the full event spectrograms as model input.
4.3 Data

The UUSS, as a member of the Advanced National Seismic System (ANSS), began using the ANSS Quake Management System (AQMS) for seismic event processing in 2012 to generate a database of events for the Utah region. This study uses event solutions (phase arrival times, earthquake locations and magnitudes, etc.) from the UUSS database between October 2012 and July 2017 for events labeled as local earthquakes or quarry blasts by UUSS analysts. We limit our first-arrival time database to stations used to determine the event location that have a phase pick quality of 1 or higher for local earthquakes, and 2 or higher for quarry blasts. The pick quality is assigned based on the timing certainty of the phase pick. The numbers of earthquakes and blasts (referred to as classes) are roughly balanced, but there are more quarry blasts at intermediate distances and more local earthquakes at near-source distances (Figure 4.1).

Waveforms are downloaded for each event from the Incorporated Research Institutions for Seismology (IRIS) using web services for 10 sec of data prior to the first P-arrival plus 2 min following the first arrival. For all samples (events recorded on a single station with 1-3 channels), we resample the data to 100 Hz. We then detrend, taper (Hann window, 1%), correct for instrument sensitivity, and apply a high-pass filter at 1 Hz (4th-order Butterworth). For 3-component data, we rotate the recorded north-south and east-west horizontal channels to the radial (R) and tangential (T) orientations, respectively, using the azimuth from the receiver to the source. We then calculate a spectrogram using 2.56-sec windows (256 samples, 12% overlap), keeping only frequencies between 1 and 20 Hz for a 90-sec duration for all available channels. UUSS maintains a variety of sensor types, some of which are not triaxial. Examples for 3-component data (48,221 examples) constitute 53% of all examples. For single-channel (vertical, Z) stations (45,722 examples), we fill the empty horizontal channels R and T with zeros. Valid data channels are log-scaled and normalized by the maximum spectral value.

To test the capability of our model outside the Utah study region, we collect both earthquake and nonearthquake signals from the National Earthquake Information Center (NEIC) during a similar time window (2012 to 2017) from locations in the eastern United States. To access station-level information, we use data from stations within distances represented in our training data (less than 2 degrees) for which a clear first arrival can be identified using a simple short-term/long-term average (STA/LTA; 1 and 10 sec windows, ratio of 4). This gives us a catalog of nonearthquake sources (including explosions, mine collapses, sonic events, and quarry blasts) with phase pick-times for 68 individual nonearthquake sources. We limit the earthquake sources to 100 events, which we process in the same manner as the nonearthquake sources. We then similarly generate spectrograms for the eastern U.S. events.
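A condensed sketch of this conditioning and spectrogram pipeline, written with ObsPy and SciPy, is shown below. It follows the steps just listed, but the exact taper settings and the form of the final normalization are assumptions rather than the production code.

```python
import numpy as np
from scipy.signal import spectrogram

def preprocess(tr):
    """Condition an ObsPy Trace as described above (sketch only)."""
    tr.resample(100.0)
    tr.detrend("linear")
    tr.taper(max_percentage=0.01, type="hann")
    tr.remove_sensitivity()  # requires response metadata attached to the Trace
    tr.filter("highpass", freq=1.0, corners=4)
    return tr

def event_spectrogram(data, fs=100.0):
    """2.56-sec windows (256 samples, ~12% overlap), 1-20 Hz, log-scaled."""
    f, t, sxx = spectrogram(data, fs=fs, nperseg=256,
                            noverlap=int(0.12 * 256))
    keep = (f >= 1.0) & (f <= 20.0)
    sxx = np.log10(sxx[keep] + 1e-12)  # small offset guards against log(0)
    # Subtracting the log maximum equals dividing by the max spectral value
    # in the linear domain; one reading of the normalization described above.
    return sxx - sxx.max()
```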
4.4 Method

4.4.1 Long-Short-Term-Memory (LSTM)

The basic element of a NN is a neuron, which sums all incoming values (individually, the product of the output of every other neuron connected to it and a learned weight for each connection) and outputs a transformation of the sum using a nonlinear function such as tanh. Typically, neurons in a NN layer are fully connected to all neurons in the directly preceding layer, or in the case of the first layer, to the model input. The difference between a standard feed-forward NN and an RNN is that RNNs use both the current input and the previous network output. Additionally, LSTM cells (rather than basic neurons) control how new and previous information is passed through internal gates that determine the memory state (C) and the hidden state (h) at each time-step (t) (Figure 4.2). The four gate values (input (i), forget (f), candidate cell state (\tilde{c}), and output (o)) are determined according to the following equations:

f_t = \sigma(W_f X_t + U_f h_{t-1} + b_f)   (4.1)

i_t = \sigma(W_i X_t + U_i h_{t-1} + b_i)   (4.2)

o_t = \sigma(W_o X_t + U_o h_{t-1} + b_o)   (4.3)

\tilde{c}_t = \phi(W_c X_t + U_c h_{t-1} + b_c)   (4.4)

Equations 4.1-4.4 combine the weight matrices (W, U) for incoming data X at time t and the output from the previous hidden state (h_{t-1}) with a bias term (b) for each gate. Each gate uses the sigmoid function (\sigma) or hyperbolic tangent (\phi) to modulate gate values. The current memory state is then determined using:

C_t = f_t C_{t-1} + i_t \tilde{c}_t   (4.5)

and the current hidden state is calculated as:

h_t = o_t \phi(C_t)   (4.6)

Hochreiter and Schmidhuber (1997) first conceived of the LSTM architecture. This adaptation of the RNN architecture solves issues, introduced by back-propagating gradients used to train the network over many time-steps, that early RNN implementations suffered from (Hochreiter, 1998). The available literature (often from the field of speech recognition) is replete with details regarding variations of recurrent neural networks (Graves & Schmidhuber, 2005), bidirectional recurrent networks (Graves et al., 2005), and LSTM specifically (Hochreiter & Schmidhuber, 1997).

For the LSTM architecture in this study, each example input has 40 frames (the time, or x-axis, from the event spectrograms). Each frame is a frequency-amplitude vector of length 48 (the y-axis of the spectrogram for each time-step). We expect the LSTM layers to model temporal progression through changes in amplitude at certain frequencies through time.
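Equations 4.1-4.6 map directly to a few lines of code. A didactic NumPy sketch of one LSTM time-step follows; the parameter names and shapes are illustrative, and real implementations fuse these operations for speed.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time-step implementing equations 4.1-4.6.

    W, U, b are dicts of learned parameters keyed by gate name
    ('f', 'i', 'o', 'c'); a didactic sketch, not the trained model.
    """
    f = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])        # forget gate (4.1)
    i = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])        # input gate (4.2)
    o = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])        # output gate (4.3)
    c_tilde = np.tanh(W['c'] @ x_t + U['c'] @ h_prev + b['c'])  # candidate (4.4)
    c = f * c_prev + i * c_tilde                                # memory state (4.5)
    h = o * np.tanh(c)                                          # hidden state (4.6)
    return h, c
```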
Iterations of convolution and pooling add depth to the model, with each layer representing a different scale of abstraction of the input. The concept of the CNN was developed by LeCun et al. (1998), and we refer the reader to this reference and to Krizhevsky et al. (2012, and references therein) for further details.

The third architecture we use is a simple hybrid of the CNN and LSTM models. The CNN in this case is expected to learn a more compact and meaningful feature representation using a time series of power spectral density estimates over the 1-20 Hz frequency range for all 3 data channels. The output of the CNN is used as input into the LSTM layers (Figure 4.3).

Training, or learning, occurs for each architecture by computing the error between the predicted output and the expected output and adjusting the cell output closer to a value that results in the expected prediction. For model error, we compute the model loss ($J$) using cross-entropy:

$J = -\{\, l \log(p) + (1 - l)\log(1 - p) \,\}$  (4.7)

where $l$ and $p$ are the label and prediction, respectively. Cross-entropy loss, more intuitively, minimizes the divergence between the distribution predicted by the model and the true class distribution, and is a typical choice for classification tasks in machine learning (Shore & Johnson, 1981). The model parameters (weights and biases) are then adjusted via an optimization algorithm (often stochastic gradient descent) using the gradients of the loss function obtained through back-propagation (Werbos, 1990). We find no discernible differences in training speeds between optimizers, and based on common practice, we use ADAM (named for adaptive moment estimation) for the models presented here (Kingma & Ba, 2014; learning rate = 0.01, β1 = 0.9, β2 = 0.999, ε = 1E-08). For the final layer in each model, we interpret the output activations as a probability distribution over all the possible outputs using the softmax function (i.e., the probability of class j is computed by forcing the values of the last layer in the network to sum to 1 and lie between 0 and 1).

We perform a nonexhaustive grid search over the parameter space for each model using a random split (10% of data for testing, and the remaining 90% for training). For the final models, we use the parameters guided by trials on a single random split to train 10 independent models (10-fold cross-validation). We accomplish this by partitioning the data into 10 equally sized splits (10% each), and train each model using 9 of the 10 splits while testing on the one remaining. The final model performance is reported as the median of model accuracy and the standard deviation between all models. During training, we observed sensitivity to example order. For example, ordering by event type gives an even sampling of class examples over batch sizes of 1000 (1 batch = the number of training examples in one iteration through the network), but a high potential for single-class dominance in small batch sizes, which can make learning difficult (Keskar et al., 2016; Li et al., 2017). Each event is associated with examples from as few as one station up to 48 stations. In order to avoid class dominance within each batch, we randomize at the sample level rather than the event level, using a batch size of 16, after split partitioning at the event level. Several iterations through a dataset (epochs) are often required to arrive at optimally trained model weights.
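Putting these architecture and training choices together, a minimal Keras sketch (TensorFlow backend, consistent with the tools noted in Section 4.8) of a CNN+LSTM hybrid of the kind described above might look as follows. The layer sizes, filter counts, and dropout value here are illustrative assumptions, not the tuned values behind the reported results.

```python
from tensorflow.keras import layers, models, optimizers

model = models.Sequential([
    # CNN front end: local filters over the (time, frequency, channel) input
    layers.Conv2D(16, (3, 3), activation="relu", input_shape=(40, 48, 3)),
    layers.BatchNormalization(),      # batch normalization between CNN layers
    layers.MaxPooling2D((2, 2)),      # pooling between layers
    # Collapse frequency and filter axes so each time-step is one vector
    layers.Reshape((19, -1)),
    # LSTM back end: models the temporal progression of the CNN features
    layers.LSTM(64, dropout=0.3),
    # Softmax over the two classes (local earthquake vs. quarry blast)
    layers.Dense(2, activation="softmax"),
])
model.compile(
    optimizer=optimizers.Adam(learning_rate=0.01, beta_1=0.9,
                              beta_2=0.999, epsilon=1e-8),
    loss="categorical_crossentropy",  # equation 4.7 over the two classes
    metrics=["accuracy"],
)
model.summary()
```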
Our dataset required between 12 and 50 epochs before accuracy on test data began to stabilize, with the number of epochs seemingly related to architecture type and model depth. To assist training and reduce overfitting, we use batch normalization between CNN layers (Ioffe & Szegedy, 2015) and dropout (rates of 0.2-0.5) for LSTM layers (Srivastava et al., 2014).

4.5 Results

Table 4.1 presents cross-validation results for three models (LSTM, CNN, and CNN+LSTM) for all data, with accuracy reported at both the station and network levels. Station-level accuracy is derived using all examples, regardless of which event they associate with. Network-level accuracy combines all station-level examples from a single event, using the maximum of the summed class probabilities to determine one classification per event. Table 4.1 also reports results from two additional models (LSTMu and CNNu) that use the same architecture as the previous models, but are trained on data within Utah only. These additions were made following the observation that a majority of misclassified events from the previous models occurred outside of UUSS authoritative boundaries. We offer discussion of these observations below, and report accuracies for all five models for all data and for Utah-only data (~80% of all data; red columns in Table 4.1). The final column of Table 4.1 shows classification accuracy when we use only events with three or more station-level predictions. As we increase the required number of stations, some data combinations, or splits, report accuracy improvements on the order of fractions of a percent from adding additional station-level predictions, until eventually, for a subset of events, the models are able to match analyst labels exactly (100%). However, for many individual data splits, increasing the station-count requirement generally causes a decrease in accuracy beyond 10 stations (Figure 4.4).

The three models trained using all data show notable similarities, with all achieving ~94% on station-level classification, ~97% using more than 1 station, and additional gains on the order of fractions of a percent from adding additional stations. Restricting test data to events within Utah (with models trained on all data) increases station-level accuracies by 0.8%, 0.9%, and 0.7% for the LSTM, CNN, and CNN+LSTM models, respectively. Models trained on Utah-only data, however, begin to diverge significantly; LSTMu, with the same architecture as LSTM but trained only on Utah data, achieves the highest accuracy on both Utah-only data (by 3%) and all data (outperforming its all-data counterpart by 0.16%), while CNNu reports below 94% (station level) for all data and performs similarly to the LSTM trained on all data. Accuracy for the best-performing model (LSTMu) is 97.7% for station-level predictions and 99.2% for network-level predictions.

There are 104,311 examples available for model training and testing. 50,581 examples (48%) are from single-component (vertical) stations, with null horizontal-channel values. For our data, models trained on individual channels give roughly the same reported accuracies (vertical = 93.8%, radial = 93.9%, transverse = 93.3%), but when the channels are used together, higher accuracy is achieved (by about 1.5% on single-fold trials). While null channels do not appear to be a significant impediment to accurate predictions, misclassified examples disproportionately come from vertical-only examples (62% vs. 38% from triaxial sensors).
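For reference, the two network-level aggregation rules used in this chapter (the maximum of summed class probabilities, as defined above, and the median station prediction contrasted in Table 4.1 and the discussion) can be sketched as follows; the function names and the toy probabilities are illustrative, not taken from the study.

```python
import numpy as np

def network_class_by_sum(station_probs):
    """Event label from the maximum of summed class probabilities:
    each station vote is weighted by the model's certainty."""
    return int(np.argmax(np.asarray(station_probs).sum(axis=0)))

def network_class_by_median(station_probs):
    """Event label from the median of station-level predictions:
    every station classification carries equal weight."""
    return int(np.argmax(np.median(np.asarray(station_probs), axis=0)))

# Toy event: 3 stations, softmax outputs over (earthquake, quarry blast)
probs = [[0.95, 0.05], [0.60, 0.40], [0.45, 0.55]]
print(network_class_by_sum(probs))     # -> 0 (earthquake)
print(network_class_by_median(probs))  # -> 0 (earthquake)
```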
We validate our event duration selection (90 sec) using test data that include a 5-min window following the event onset. Training back-to-back models on 5-min and 1.5-min trailing windows yields equivalent accuracy. Using just the first 4 spectral windows following event onset (~9 sec), a trained CNN (2-4 layers) achieved station-level accuracy of 84.7 ± 0.7% and event-level accuracy of 94.4 ± 0.9% using 10-fold cross-validation.

We manually evaluated events where separate models' misclassifications agreed and found that a significant fraction of misclassified events had unanimous station votes in favor of the opposite class and event predictions made with high (90%) confidence. Further scrutiny of these misclassified events revealed that a majority of them were mislabeled by analysts. Manual scrutiny can be difficult because the discriminating evidence is often subtle, even for a trained analyst. For a more systematic view of mislabels, we examine the time-of-day dependence of quarry blast events. Blasting is illegal in Utah outside of daylight hours, and although nondaylight event times are not a guarantee that these events are mislabeled, they provide some evidence that they might be. Under any model, a majority of the labeled quarry blasts that occur during nighttime hours are classified as local earthquakes. Similarly, we test time-of-day dependence for the area with the largest density of misclassified earthquakes. These events spatially coincide with the location of known mines in southern Idaho and are interspersed within a large number of events labeled as quarry blasts. All (100%) of these events labeled as local earthquakes occur during daylight hours. We take this, along with spatial proximity to known quarry sites, as strong evidence that these events were mislabeled as earthquakes by human analysts. These events alone account for up to 30% of the mislabeled earthquakes for all data. Although we have no equivalent method to probe the local earthquake class for mislabels, the catalog we manually explored suggests mislabeled events exist for both classes.

Most (92%) station-level predictions are made with 90% certainty or greater when we interpret the model output as a probability over the available classes using softmax. Network-level predictions for events that agree and events that disagree with analyst labels also report high levels of certainty (96% are above a median class difference of 0.8, where 1 is completely sure and 0.5 means the prediction is equally split between classes). The best model (LSTMu) is more likely to be uncertain (softmax output of less than 90%) about local earthquakes (278/454 unsure events are local earthquakes), and most of the unsure local earthquake events occur in the Sufco mining area of central Utah (Figure 4.5), where tectonic seismicity and mining-induced seismicity occur together. The Bingham mine dominates the spatial event densities where the model incorrectly classifies quarry blast examples at the station level.

We investigate the bulk characteristics of each class in three source-receiver distance bins to evaluate the characteristics of misclassified station-level examples (Figure 4.6). We find no indication that misclassifications are driven by spurious, nonevent-related content. In general, misclassified examples for the local earthquake class (where the model predicts they are quarry blasts) more closely resemble the average quarry blast characteristics.
Similarly, disagreeing model predictions for quarry blasts closely match bulk earthquake characteristics.

To test the influence of local crustal structure on model classifications, we use data from the eastern United States and find that accuracy rates on data outside the Utah region are significantly lower for quarry blasts, but not for local earthquakes. Only a small number of nonearthquake sources were available for investigation, including sonic booms, quarry blasts, mining explosions, and mine collapses. We use all nonearthquake sources in our catalog to calculate a class accuracy of ~33%, which is well below the competency demonstrated for quarry blasts in the Utah area. A majority of the nonearthquake sources come from a similar location in northwestern Minnesota. For the 100 local earthquake sources, 90.2% of station-level predictions were correct, while 100% of network-level predictions were correct. Our models exhibit a clear inclination toward the earthquake class in the eastern U.S. (Figure 4.7). We evaluate event distributions for misclassified events and find that misclassified local earthquakes have, on average, 18 km longer source-receiver distances, magnitudes that are 0.46 magnitude units larger, and depths less than 2.5 km. Misclassified quarry blasts have average distances that are 3.8 km greater, magnitudes 0.2 units smaller, and depths 1.22 km shallower (Figure 4.8).

4.6 Discussion

There is a clear temporal progression expressed in the records of seismic sources captured by distant receivers. This is the result of a complex source coupled with wave propagation through a complex earth medium. In our data, examples sample discrete portions of the crust over various distances with at least 4 distinct velocity profiles (Keller et al., 1975; Loeb, 1986; Pechmann et al., 1984; Roller, 1965). Previous work has navigated some of the complexity in waveforms through feature engineering and data segmentation (e.g., using only information from specific wave phases that are precut from the data). When temporal characteristics are poorly modeled, network-level accuracy relies more on single-station examples from source-receiver distances that are well modeled. The likelihood of this occurring for a given event in our data is high, because for each event there are typically many station examples, and the distribution of the test data shares the same characteristics as the training data. Differences between network-level accuracy using the median of station predictions and the maximum class using the sum of all station predictions per class underline this point. When models use examples from various source-receiver distances equally well (i.e., when the last two columns in Table 4.1 converge), these models can be considered good at modeling travel-path variation. Conversely, the larger the spread in calculated accuracy between these two methods, the more the model depends on high-confidence 'average' examples for its reported accuracy. These complications describe one reason machine learning (ML) models, and non-ML discrimination techniques, often benefit from simplifying their input space. For our data, the LSTM architecture most adeptly models the data, with high station-level accuracy and minimal spread between the network-level accuracies calculated in the two separate ways. The CNN architecture also performs well on data within Utah, but performance drops considerably compared to LSTMu when test data extend beyond UUSS authoritative boundaries.
When mislabeled events exist in the test data, even in sparse abundance, they can obscure true estimates of model performance, because each event has on average 8, and up to 48, associated examples, and because events remain associated within each data split. In addition, even though correctly labeled events dominate the training space for data inside and outside Utah, we find that a small number of mislabeled events can significantly impede model learning. If we conservatively estimate a ~1% label error rate (LER) for all data (based on the difference between accuracy for all data vs. Utah-only data for models trained on all data), no model architecture is able to accomplish network-level classification above 98%, and model performance between architectures is equivalent. Decreasing the LER by fractions of a percent on training data produces significant increases in model performance for LSTM models. For our dataset, quarterly reviews by UUSS analysts offer secondary quality checks on examples within UUSS authoritative boundaries. This additional review may account for part of why events outside of Utah have a higher label error rate, as evidenced by increased model performance for Utah-only data.

The main objective of this work was to build a model capable of reproducing analyst classifications on new incoming data from the UUSS network in near real-time. To achieve this, one priority was to minimize the signal duration without compromising accuracy. We began with a conservative estimate of 5 min, and shortened it to 1.5 min without appreciable decrease in reported accuracy. If we allow a decrease of several percent (94.4% at the event level), and expect most events to occur in locations where multiple stations are available to make predictions, it is possible to reduce classification latency to a few seconds following the first arrival. The second consideration for real-time operations was an expectation that the model handle data from both single-component and 3-channel stations. While sufficient information to separate prediction boundaries exists in each channel individually (R, T, and Z demonstrate nearly equal performance), together the channels perform better (by ~1.5%). Allowing models to learn from triaxial data, but still make predictions on examples where only vertical data are available, extends their usage to events that occur outside of areas with broadband coverage (fractionally, about half of the network stations).

There is remarkable consistency between station-level classifications for events even when the model disagrees with the analyst label. This suggests that the dominant sensitivity of the model is related to source characteristics. On Utah data, station-level agreement that disagrees with analyst labels often identifies events that were mislabeled by analysts. For events in the eastern U.S., the decrease in model accuracy for nonearthquake events suggests that information about the underlying crustal structure still plays an important role in the encoded representation of the data. Large-scale structural differences in the eastern U.S. (e.g., Q values) and/or differences in blasting techniques for nonearthquake events (most of which come from a single mine) are likely responsible for the skew toward the earthquake class for eastern U.S. events. We know that a level of variation exists at the source level within each class. The mining industry employs a variety of blasting methods (Langefors & Kihlström, 1978; Dowding, 1985).
For local earthquakes, variation can come from fault and local stress conditions. We do not attempt to resolve dependence on these variations for any of the misclassified examples, but we note that model uncertainty for local earthquakes is most highly concentrated in areas where mining-induced seismicity and tectonic seismicity are both known to occur. For quarry blasts, the model uncertainty is concentrated around the Bingham copper mine.

4.7 Conclusions

The models presented here were developed to demonstrate how machine learning can be used to leverage years of analyst decisions to inform future decisions on new data. We show that a range of model types and input durations can successfully accomplish binary event classification when many stations are available for each event prediction, but that time-dependent models such as LSTM may be best suited when only a few or single stations are available, and when significant diversity is present in the data. Despite achieving a level of independence from some factors related to travel path, our models are not independent of local crustal structure; for example, they achieve lower performance on data from the eastern U.S., where crustal properties, such as attenuation, differ significantly. In addition, we demonstrate that events that may be ambiguous in the analyst domain are often outliers within the algorithm domain. Typically, these events highlight systematic discrepancies introduced into event catalogs by analyst errors. We also highlight that events outside of Utah have a higher label error rate (~1%) than events within Utah, which has a significant impact on model learning. Our best model, trained on Utah-only data, performs above 99% for network-level classification and above 98% for station-level classification when mislabeled events are excluded. Locations within Utah that experience the largest model uncertainty occur in regions where mining-induced seismicity and tectonic earthquakes or quarry blasts exist in close proximity. This illuminates a level of complexity that is not represented in event labels and that might be identifiable through deviations in model certainty within these areas. When labeled catalogs exist for training, the methods described in this study may be beneficial for source discrimination at various scales. These tools could also be repurposed for other goals, such as nuclear test monitoring. Although we do not perform an exhaustive comparison between our results and recently developed discrimination techniques at local scales, we suggest that ML methods such as LSTM may be valuable alone, or in tandem with existing techniques, because the source discrimination they perform is not controlled by depth.

4.8 Acknowledgements and Resources

We rely on the Center for High Performance Computing at the University of Utah for computational resources. We use TensorFlow and Theano for early implementations and iterate model parameters using Keras with the TensorFlow backend. Our figures were generated using Matplotlib, Basemap, and Seaborn. UUSS data were accessed from IRIS using web services and ObsPy. Data processing relied on the Python numeric libraries SciPy and NumPy.

Table 4.1. Model performance for 5 trained models. Models trained on data within Utah only, or accuracy measured on data within Utah only, are shown in red. We measure network accuracy by summing all station-level classifications and taking the maximum class (4th column). Using this approach, each station-level classification is weighted by model certainty.
An alternative approach uses the median of all the station-level predictions, which gives equal weight to each station-level classification (5th column).

Figure 4.1. Map of events (circles) and source-receiver paths (lines) from UUSS for quarry blasts (red) and local earthquakes (blue). Receivers (white circles) are labeled by station name.

Figure 4.2. Permutations of 3-channel spectrograms for model input. For each example, we use the spectrogram computed from raw waveform data as model input. In the LSTM architecture (top), the spectrogram enters the model as a 40x144 matrix (time-step, frequency feature), where the 144 dimension is a 3-channel frequency power spectral density vector for each time-step. The CNN architecture (middle) takes example spectrograms as 40x48x3 volumes (time, frequency, channel). The red box indicates the starting computation using a 2x2-pixel filter that shares weights along the depth axis of the volume. The combined CNN+LSTM (bottom) uses a similar permutation as the LSTM architecture, but instead of processing column slices (time), we process a flattened (40x144) image with 1D filters (red box, 1x3).

Figure 4.3. Example LSTM cell from the public domain gitbook of Leonardo Araujo dos Santos on Artificial Intelligence: https://www.gitbook.com/@leonardoaraujosantos.

Figure 4.4. Accuracy as a function of the number of stations required to make a network-level prediction. For events mislabeled by analysts, station-level predictions often unanimously agree in favor of the opposite class. This explains why, as we increase the number of station-level examples required for a network-level prediction (curves are colored by fold, 1-10, with station count increasing toward the right), overall accuracy decreases. This also demonstrates that misclassified events are not limited to small magnitudes, but persist through magnitudes up to around 3.0, where they may potentially represent felt events.

Figure 4.5. Model uncertainty (where the model is below 90% certain) for station-level quarry blast examples (reds) and local earthquakes (blues). Uncertainty for local earthquakes most commonly occurs in the underground mining areas of central Utah. For the quarry blast class, uncertainty is largely concentrated around the Bingham mine.

Figure 4.6. Bulk characteristics for each class (local earthquakes LE: a, e, i; quarry blasts QB: c, g, k), compared to station-level examples misclassified by the LSTM model. For each distance bin, we show the sum of all spectrograms for misclassified quarry blast events (where the model thinks they are local earthquakes; b, f, j) and misclassified local earthquakes (d, h, l). It is clear that misclassification is generally the result of individual examples having characteristics that most closely resemble the average character of the opposite class.

Figure 4.7. Station-level predictions for earthquakes (blue) and quarry blasts (red) in the eastern United States. Left: NEIC nonearthquake events including quarry blasts, mining explosions, and sonic booms from 2012-2017. The CNN+LSTM model has 33% accuracy for nonearthquake events. Accuracy for these events is dominated by misclassified mining events (labeled 'mining explosions' in the NEIC catalog) in Minnesota. For earthquakes (right), 100% of events were accurately classified, while 90.2% of individual stations had correct classifications. The differences in classifier performance for the eastern U.S. using a model from the western U.S.
are likely due to differences in crustal structure between the regions. A secondary factor leading to misclassifications for quarry blasts may be the diversity of sources, which were not well represented in the model training data.

Figure 4.8. Violin plots showing the distribution of misclassified events (colored by class label and normalized by area) and of correctly classified events as a function of distance, depth, magnitude, and time for all data. In the distance view (top left), earthquakes are slightly more likely to be misclassified if the receiver is farther from the source than the average for correctly predicted examples. In the depth view (top right), there is a clear skew toward shallow events for the earthquake class; these events predominantly occur outside of Utah and appear to have been mislabeled by analysts, based on time-of-day and location. Misclassified examples in magnitude space (bottom left) more closely resemble the opposite class for both local earthquakes and quarry blasts. In time space (bottom right), month-scale resolution indicates no clear temporal bias to misclassified events, although for the earthquake class, misclassified events disproportionately occur slightly earlier in the event catalog.

4.9 References

Beyreuther, M., C. Hammer, J. Wassermann, M. Ohrnberger, & T. Megies (2012). Constructing a Hidden Markov Model based earthquake detector: Application to induced seismicity. Geophysical Journal International, 189, 602-610, doi:10.1111/j.1365-246X.2012.05361.x

Baumgardt, D. R., & Young, G. B. (1990). Regional seismic waveform discriminants and case-based event identification using regional arrays. Bulletin of the Seismological Society of America, 80(6B), 1874-1892.

Dammeier, F., Moore, J. R., Hammer, C., Haslinger, F., & Loew, S. (2016). Automatic detection of alpine rockslides in continuous seismic data using hidden Markov models. Journal of Geophysical Research: Earth Surface, 121(2), 351-371.

Dowla, F. U., Taylor, S. R., & Anderson, R. W. (1990). Seismic discrimination with artificial neural networks: Preliminary results with regional spectral data. Bulletin of the Seismological Society of America, 80(5), 1346-1373.

Del Pezzo, E., Esposito, A., Giudicepietro, F., Marinaro, M., Martini, M., & Scarpetta, S. (2003). Discrimination of earthquakes and underwater explosions using neural networks. Bulletin of the Seismological Society of America, 93(1), 215-223.

Dowding, C. H. (1985). Blast vibration monitoring and control (Vol. 297). Englewood Cliffs, NJ: Prentice-Hall.

Dysart, P. S., & Pulli, J. J. (1990). Regional seismic event classification at the NORESS array: Seismological measurements and the use of trained neural networks. Bulletin of the Seismological Society of America, 80(6B), 1910-1933.

Graves, A., Fernández, S., & Schmidhuber, J. (2005). Bidirectional LSTM networks for improved phoneme classification and recognition. Artificial Neural Networks: Formal Models and Their Applications - ICANN 2005, 753-753.

Graves, A., & Schmidhuber, J. (2005). Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks, 18(5), 602-610.

Hammer, C., M. Ohrnberger, & D. Faeh (2012). Classifying seismic waveforms from scratch: A case study in the alpine environment. Geophysical Journal International, 192(1), doi:10.1093/gji/ggs036

Hammer, C., Ohrnberger, M., & Schlindwein, V. (2015). Pattern of cryospheric seismic events observed at Ekström ice shelf, Antarctica. Geophysical Research Letters, 42,
doi:10.1002/2015GL064029.

Hartse, H. E., Taylor, S. R., Phillips, W. S., & Randall, G. E. (1997). A preliminary study of regional seismic discrimination in central Asia with emphasis on western China. Bulletin of the Seismological Society of America, 87(3), 551-568.

Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A. R., Jaitly, N., ... Kingsbury, B. (2012). Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6), 82-97.

Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.

Hochreiter, S. (1998). The vanishing gradient problem during learning recurrent neural nets and problem solutions. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 6(02), 107-116.

Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167.

Keller, G. R., R. B. Smith, & L. R. Braile (1975). Crustal structure along the Great Basin - Colorado Plateau transition from seismic refraction studies. Journal of Geophysical Research, 80, 1093-1098.

Keskar, N. S., Mudigere, D., Nocedal, J., Smelyanskiy, M., & Tang, P. T. P. (2016). On large-batch training for deep learning: Generalization gap and sharp minima. arXiv preprint arXiv:1609.04836.

Kim, W. Y., Simpson, D. W., & Richards, P. G. (1993). Discrimination of earthquakes and explosions in the eastern United States using regional high-frequency data. Geophysical Research Letters, 20(14), 1507-1510.

Kingma, D., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

Knapmeyer-Endrun, B., & Hammer, C. (2015). Identification of new events in Apollo 16 lunar seismic data by Hidden Markov Model-based event detection and classification. Journal of Geophysical Research: Planets, 120(10), 1620-1645.

Koper, K. D., Pechmann, J. C., Burlacu, R., Pankow, K. L., Stein, J., Hale, J. M., ... McCarter, M. K. (2016). Magnitude-based discrimination of man-made seismic events from naturally occurring earthquakes in Utah, USA. Geophysical Research Letters, 43(20).

Kortström, J., Uski, M., & Tiira, T. (2016). Automatic classification of seismic events within a regional seismograph network. Computers & Geosciences, 87, 22-30.

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 1097-1105.

Langefors, U., & Kihlström, B. (1978). The modern technique of rock blasting. New York, NY: Wiley, 438 pp.

Langer, H., Falsaperla, S., Masotti, M., Campanini, R., Spampinato, S., & Messina, A. (2009). Synopsis of supervised and unsupervised pattern classification techniques applied to volcanic tremor data at Mt Etna, Italy. Geophysical Journal International, 178(2), 1132-1144.

LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324.

LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.

Li, C. J., Li, L., Qian, J., & Liu, J. G. (2017). Batch size matters: A diffusion approximation framework on nonconvex stochastic gradient descent. arXiv preprint arXiv:1705.07562.

Loeb, D. T. (1986). The P-wave velocity structure of the crust-mantle boundary beneath Utah (M.S. thesis), University of Utah, Salt Lake City, Utah, 126 pp.

O'Rourke, C. T., & Baker, G. E. (2016). A spectrogram-based method of Rg detection for explosion monitoring.
Bulletin of the Seismological Society of America, 107(1), 34-42.

Ohrnberger, M. (2001). Continuous automatic classification of seismic signals of volcanic origin at Mt. Merapi, Java, Indonesia (Ph.D. thesis), Institut für Geowissenschaften, Universität Potsdam.

Perol, T., Gharbi, M., & Denolle, M. (2017). Convolutional neural network for earthquake detection and location. arXiv preprint arXiv:1702.02073.

Pechmann, J. C., W. D. Richins, & R. B. Smith (1984). Evidence for a "double Moho" beneath the Wasatch Front, Utah. EOS, Transactions American Geophysical Union, 65, 988.

Roller, J. C. (1965). Crustal structure in the eastern Colorado Plateau province from seismic-refraction measurements. Bulletin of the Seismological Society of America, 55, 107-119.

Sainath, T. N., Vinyals, O., Senior, A., & Sak, H. (2015). Convolutional, long short-term memory, fully connected deep neural networks. In Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference (pp. 4580-4584). IEEE.

Scarpetta, S., Giudicepietro, F., Ezin, E. C., Petrosino, S., Del Pezzo, E., Martini, M., & Marinaro, M. (2005). Automatic classification of seismic signals at Mt. Vesuvius volcano, Italy, using neural networks. Bulletin of the Seismological Society of America, 95(1), 185-196.

Shore, J., & Johnson, R. (1981). Properties of cross-entropy minimization. IEEE Transactions on Information Theory, 27(4), 472-482.

Shumway, R. H. (1996). Array detection of ripple-fired signals: The cepstral F-statistic (Technical Report TR-326). Retrieved from http://www.dtic.mil/dtic/tr/fulltext/u2/a319833.pdf

Srivastava, N., Hinton, G. E., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1), 1929-1958.

Stump, B. W., Hedlin, M. A., Pearson, D. C., & Hsu, V. (2002). Characterization of mining explosions at regional distances: Implications with the international monitoring system. Reviews of Geophysics, 40(4).

Su, F., Aki, K., & Biswas, N. N. (1991). Discriminating quarry blasts from earthquakes using coda waves. Bulletin of the Seismological Society of America, 81(1), 162-178.

Tibi, R., K. D. Koper, K. L. Pankow, & C. Y. Young (2018). Depth discrimination using Rg-to-Lg amplitude ratios for seismic events in Utah recorded at local distances. Bulletin of the Seismological Society of America, accepted.

Walter, W. R., Mayeda, K. M., & Patton, H. J. (1995). Phase and spectral ratio discrimination between NTS earthquakes and explosions. Part I: Empirical observations. Bulletin of the Seismological Society of America, 85(4), 1050-1067.

Werbos, P. J. (1990). Backpropagation through time: What it does and how to do it. Proceedings of the IEEE, 78(10), 1550-1560.

Williams, R. J., & Peng, J. (1990). An efficient gradient-based algorithm for on-line training of recurrent network trajectories. Neural Computation, 2(4), 490-501.

Wüster, J. (1993). Discrimination of chemical explosions and earthquakes in central Europe - a case study. Bulletin of the Seismological Society of America, 83(4), 1184-1212. |
| Reference URL | https://collections.lib.utah.edu/ark:/87278/s6fr4zqn |



