| Title | Framework and model for interactive spatiotemporal data analysis and visualization systems |
| Publication Type | thesis |
| School or College | College of Engineering |
| Department | Computing |
| Author | Christensen, Cameron T. |
| Date | 2019 |
| Description | As spatiotemporal datasets grow, accessing and processing them for analysis and visualization are increasingly the primary bottlenecks for their use. Challenges include retrieving, resampling, and analyzing large and often disparately located data. Utilization of large-scale computing resources can be helpful, but may still incur delays due to extensive data transfers, job scheduling, and remote access. Furthermore, some applications, such as those for public safety, must remain interactive even as data sizes increase. To enable utilization of increasingly massive datasets, it is worthwhile to invest in the creation of workflows that guarantee interactivity, making the broadest set of inquiries possible at minimal cost. In this work, I present a framework that addresses several common pitfalls of interactive data analysis and visualization. It is comprised of an embedded domain-specific language (EDSL) and associated runtime specially designed for the interactive exploration of large, remote data ensembles. The EDSL is an extension of JavaScript, which allows users to express a wide range of analyses in a simple and abstract manner. The underlying runtime utilizes a streaming, multiresolution data layout, transparently resolving issues such as remote data access and resampling, and maintaining interactivity through progressive, interruptible computation. This framework enables interactive exploration of massive, remote datasets, such as the 3.5 petabyte 7km NASA GEOS-5 “Nature Run” simulation, which remote users have previously been able to analyze only offline or at reduced resolution. Most available climate data are stored using legacy file formats that prohibit incremental, multiresolution access. In order for the framework to automatically read these datasets, I developed an on-demand conversion module, currently deployed at Lawrence Livermore National Lab as part of the Earth System Grid Federation (ESGF) platform. 
Based on the techniques used for this framework, I also propose a general purpose model to aid creation and evaluation of other interactive workflows for large, remote data. I present the necessary components of such workflows along with important considerations regarding their design and integration, including comprehensive runtime management, effective communication, interruptibility, appropriate data formats, and programming models that facilitate progressive refinement of results. |
| Type | Text |
| Publisher | University of Utah |
| Dissertation Name | Master of Science |
| Language | eng |
| Rights Management | © Cameron T. Christensen |
| Format | application/pdf |
| Format Medium | application/pdf |
| ARK | ark:/87278/s67h7jws |
| Setname | ir_etd |
| ID | 1709789 |
| OCR Text | FRAMEWORK AND MODEL FOR INTERACTIVE SPATIOTEMPORAL DATA ANALYSIS AND VISUALIZATION SYSTEMS by Cameron T. Christensen A thesis submitted to the faculty of The University of Utah in partial fulfillment of the requirements for the degree of Master of Science in Computing School of Computing The University of Utah August 2019 Copyright © Cameron T. Christensen 2019 All Rights Reserved The University of Utah Graduate School STATEMENT OF THESIS APPROVAL The thesis of Cameron T. Christensen has been approved by the following supervisory committee members: Valerio Pascucci, Chair, May 31, 2019 (Date Approved); Robert M. Kirby, Member, May 31, 2019 (Date Approved); Feifei Li, Member, June 3, 2019 (Date Approved); and by Ross T. Whitaker, Chair/Dean of the Department/College/School of Computing, and by David B. Kieda, Dean of The Graduate School. ABSTRACT As spatiotemporal datasets grow, accessing and processing them for analysis and visualization are increasingly the primary bottlenecks for their use. Challenges include retrieving, resampling, and analyzing large and often disparately located data. Utilization of large-scale computing resources can be helpful, but may still incur delays due to extensive data transfers, job scheduling, and remote access. Furthermore, some applications, such as those for public safety, must remain interactive even as data sizes increase. To enable utilization of increasingly massive datasets, it is worthwhile to invest in the creation of workflows that guarantee interactivity, making the broadest set of inquiries possible at minimal cost. In this work, I present a framework that addresses several common pitfalls of interactive data analysis and visualization. It is comprised of an embedded domain-specific language (EDSL) and associated runtime specifically designed for the interactive exploration of large, remote data ensembles. 
The EDSL is an extension of JavaScript, which allows users to express a wide range of analyses in a simple and abstract manner. The underlying runtime utilizes a streaming, multiresolution data layout, transparently resolving issues such as remote data access and resampling, and maintaining interactivity through progressive, interruptible computation. This framework enables interactive exploration of massive, remote datasets, such as the 3.5 petabyte 7km NASA GEOS-5 “Nature Run” simulation, for which remote users have previously been able to analyze only offline or at reduced resolution. Most available climate data are stored using legacy file formats that prohibit incremental, multiresolution access. In order for the framework to automatically read these datasets, I developed an on-demand conversion module, currently deployed at Lawrence Livermore National Lab as part of the Earth System Grid Federation (ESGF) platform. Based on the techniques used for this framework, I also propose a general purpose model to aid creation and evaluation of other interactive workflows for large, remote data. I present the necessary components of such workflows along with important considerations regarding their design and integration, including comprehensive runtime management, effective communication, interruptibility, appropriate data formats, and programming models that facilitate progressive refinement of results.

To my children, Bryson and Brianne, and to Karah and Neikho.

CONTENTS

ABSTRACT . . . . . iii
LIST OF FIGURES . . . . . vii
ACKNOWLEDGMENTS . . . . . x

CHAPTERS

1. INTRODUCTION . . . . . 1
1.1 Thesis Contributions . . . . . 3
1.1.1 Primary Publications . . . . . 5
1.1.2 Other Publications . . . . . 6
1.1.3 Refereed Conference Posters . . . . . 6
1.1.4 Source Code . . . . . 7

2. RELATED WORK . . . . . 8
2.1 General Integrated Visualization Environments . . . . . 8
2.2 Domain-Specific Visualization Systems . . . . . 8
2.3 Remote Data Access . . . . . 9
2.4 Workflow Management Systems . . . . . 9
2.5 Domain-Specific Languages (DSLs) . . . . . 9
2.6 Runtime Loop Optimizations . . . . . 10

3. FRAMEWORK FOR INTERACTIVE ANALYSIS AND VISUALIZATION OF MASSIVE AND REMOTE SPATIOTEMPORAL DATA ENSEMBLES . . . . . 11
3.1 Workflow Transition . . . . . 12
3.2 Building Blocks . . . . . 15
3.3 Overall System . . . . . 17
3.4 Data Processing EDSL . . . . . 19
3.4.1 Example Script . . . . . 19
3.4.2 Abstract Data Type . . . . . 21
3.4.3 Explicit Data Publishing Hints . . . . . 21
3.4.4 Generalized Multidimensional Iterators . . . . . 22
3.5 Progressive Runtime System . . . . . 22
3.5.1 Built-in Multidimensional Scalar Class . . . . . 23
3.5.2 Multiresolution Streaming . . . . . 24
3.5.3 Incremental Results . . . . . 24
3.5.4 Loop Order and Parallelization . . . . . 24
3.5.5 Hybrid Client- and Server-Side Processing . . . . . 27
3.6 On-Demand Data Reordering . . . . . 29
3.6.1 Data Reordering . . . . . 31
3.6.2 Integration With OpenVisus Data Server . . . . . 31
3.7 Summary . . . . . 33

4. PROPOSED MODEL FOR INTERACTIVE LARGE DATA WORKFLOWS . . . . . 35
4.1 Overview of Workflows for Creation, Analysis, and Visualization of Structured Spatiotemporal Data . . . . . 36
4.2 Necessary Components for Interactive Workflows . . . . . 41
4.2.1 Distributed Interruptible Runtime System . . . . . 41
4.2.2 Streaming Data Layout and Distribution of Storage . . . . . 42
4.2.3 Suitable Analysis Language Features to Enable Flexible Interpretation . . . . . 42
4.3 Design Considerations for Interactive Workflows . . . . . 43
4.3.1 Incremental Advancement of Computation Results . . . . . 43
4.3.2 Progressive Resolution Refinement of Output . . . . . 44
4.3.3 Interruptibility to Interactively Guide Computation . . . . . 44
4.4 Component Integration . . . . . 45
4.4.1 Nonblocking Communication . . . . . 46
4.4.2 Direction of Data Movement and Effective Caching . . . . . 46
4.4.3 Minimal Shared State for Module Independence . . . . . 47
4.5 Workflow Assessment . . . . . 48
4.6 Summary . . . . . 50

5. EXAMPLE APPLICATIONS USING THE INTERACTIVE FRAMEWORK . . . . . 51
5.1 CFD Simulation . . . . . 52
5.1.1 Multifield Analysis . . . . . 53
5.1.2 Localized Computations . . . . . 53
5.1.3 Draft Computation Accuracy . . . . . 53
5.1.4 Postsimulation Computations . . . . . 55
5.2 Climate Simulation . . . . . 57
5.2.1 Multimodel Ensemble Comparison . . . . . 57
5.2.2 Annual Zonal Average . . . . . 58
5.2.3 Rank Correlation Analysis . . . . . 60
5.2.4 Using On-Demand for Climate Simulation Data . . . . . 61
5.2.5 Performance Assessment of On-Demand . . . . . 65
5.2.6 Application Scalability With Increasing Data Size . . . . . 66
5.3 Summary . . . . . 68

6. CONCLUSION AND FUTURE WORK . . . . . 69

APPENDICES
A. EDSL REFERENCE . . . . . 74
B. EXAMPLE SCRIPTS . . . . . 77

LIST OF FIGURES

3.1 Typical sequential pipeline for acquisition, analysis, and visualization. Each step requires the completion of all previous items, a linear process, the interactivity of which depends on the size and location of data, complexity of analyses, and availability of computational resources. . . . . . 13
3.2 Illustration of multiresolution data loading compared to loading from a “flat” row-major format. Using a multiresolution data format, coarse-resolution data can be loaded in much less time, providing quick preliminary results. . . . . . 16
3.3 Data layout obtained for a 2D matrix. The 1D array at the top of the diagram represents the disk distribution of the data. Each consecutive block in the 1D array corresponds to data of progressively finer resolution in the 2D matrix (from [?], used with permission). . . . . .
16 3.4 The system pipeline for our interactive analysis and visualization framework, which exploits progressive computation and seamless local or remote execution of EDSL scripts to provide a highly flexible platform for the exploration of large-scale, disparately located data. Note the ability to utilize the EDSL to specify data analyses from both the client and the server. . . . . . 18 3.5 The execution model used by the runtime system for the interactive execution of EDSL scripts that continuously process data requests, publish incremental results, and respond immediately to user input. . . . . . 18 3.6 Comparison of parallel unordered loop execution for increasing thread counts for two algorithms: maximum intensity projection and zonal rank correlation. Dashed lines indicate perfect scaling. Tests conducted on a 16-core Intel Xeon E7-8890 v3 @ 2.50GHz running openSUSE 13.1 using locally cached data. . . . . . 26 3.7 Results of a temporal average computation (Listing 3.1) via two orderings for the inner loop. The error (plotted as RMSE) between the precomputed result and the incremental result decreases quickly when utilizing the low-discrepancy van der Corput sequence of timesteps versus a simple linear sequence. The results shown are for the average of total aerosol scattering for the period February through March, 2007, using 1-hour data intervals from the 7-km Ganymed Nature Run simulation. . . . . . 26 3.8 Result of 100 iterations (of 1000 total) for calculating maximum intensity projection (MIP) along the Z axis of a 2-photon neuronal microscopy volume. Each iteration adds a 2D slice. (a) Linear order. (b) Low-discrepancy order. (c) Final MIP. (Data courtesy Angelucci Lab, University of Utah) . . . . . 28 3.9 Data server with on-demand conversion. Data movement is shown with thick arrows, requests with thin arrows. 
When data are requested (a), the data server first checks the cache (i), and if not cached the requested data are converted on the fly (ii) and sent to the client (b). . . . . . . . . . . . . . . . . . . . . . 30 3.10 The mechanism of data reordering utilized by the IDX format. (a) Traditional multiresolution image pyramid, which duplicates the image at each resolution level. (b), (c), (d), (e), and (f) The procession of resolution levels as they are stored in a single multiresolution IDX file, with the 2D image at the top and its layout on disk at the bottom. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 4.1 Nonsequential workflow for interactive acquisition, analysis, visualization, and processing. Data are ultimately the connecting point. This version of the workflow relies on a central data server to manage updates and facilitate production of incremental results, but the data server itself could be distributed across multiple centers or among the nodes of a cluster. . . . . . . . . . . . . . . . . . . 40 5.1 Exploring discrete regions of burning flame within a specific threshold of mixed fuel. (a) The original OH field. (b) The application of the mask to the original OH field where the mixture fraction of fuel and oxygen is between 36%-40%. (Data produced with the S3D application, courtesy Jackie Chen, Sandia National Laboratory.) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 5.2 CFD simulation of a BSF coal boiler: (a) shows the scripting interface along with the result of the average temperature computation; (b) shows the results of the computation of the average O2 moles around the injection ports of the coal boiler during the last second of a 5-second simulation. . . . . . . . . . . . . . . 54 5.3 The results of computing the average temperature of a CFD simulation during the last second of a 5-second boiler simulation. 
Each image from left to right shows the comparison of the computation with the original result when computed using increasingly fewer samples. Note that the original result is computed by the simulation using 30x more samples that are not available because saving so much data would not be possible due to storage limitations. 56 5.4 From left to right, this image shows the computation of the standard deviation of the temperature of a CFD simulation during the last second of a 5-second boiler simulation. Since this value was not computed originally by the simulation, only 1/30 of the timesteps is available for the analysis. Based on the results of the previous comparison, we believe this is still less than 1% error versus an inline analysis that uses data from every timestep. . . . . . 56 5.5 The comparison between climate simulation model ensembles. . . . . . 58 5.6 Annual zonal average of temperature and humidity. In (a), the daily spatial temperature average changes as we move along the temporal axis, which illustrates the change of seasons in a year. In (b), the duplication error in the humidity data is indicated by the bands along the temporal axis. . . . . . 59 5.7 Pearson rank correlation between hydrophilic and hydrophobic black carbon on the 7km GEOS-5 Nature Run dataset. (a) Coarse-resolution rank correlation. (b) Full-resolution rank correlation. . . . . . 62 5.8 Comparison of data size, computation time, and root-mean-square error (RMSE) for various resolution levels in the computation of the Pearson rank correlation. . . . . . 62 5.9 Overview of specialization of the on-demand conversion system for incremental access to remote climate datasets. . . . . . 64 5.10 Computation time when input data are converted on demand versus already cached on the server. 
Temporal average of daily data (90 timesteps) from NIMR HadGEM2-AO “Historic.” Local caching disabled. Each timestep is 32-bit floating point, resolution 192x143x8. Our progressive environment revealed serious and previously unnoticed errors in the original data. . . . . . 65 ACKNOWLEDGMENTS This work is supported in part by NSF:CGV Award:1314896, NSF CISE ACI-0904631, NSF:IIP Award:1602127, NSF:ACI Award 1649923, DOE/Codesign P01180734, DOE/SciDAC DE-SC0007446, CCMSC DE-NA0002375, and PIPER: ER26142 DE-SC0010498. This material is based upon work supported by the Department of Energy, National Nuclear Security Administration, under Award Number(s) DE-NA0002375. This work was performed under the auspices of the U.S. Department of Energy with Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. This work is also supported in part by the Earth System Grid Federation (ESGF), the Distributed Resources for ESGF Advanced Management (DOE DREAM) project, and the LLNL program on Analytics and Informatics Management Systems (AIMS). I wish to thank Brian Summa, Peer-Timo Bremer, and Attila Gyulassy for their feedback in the development of a model for interactive systems, Giorgio Scorzelli for his help modifying the OpenVisus framework, Eric Brugger for guiding me in the addition of the IDX reader to VisIt, Venkat Vishwanath and George Thiruvathukal for their help using cloud frameworks on HPC systems, Sasha Ames and Anthony Hoang for supporting the installation of the OnDemand Data Reordering service at LLNL, and Christine Pickett for editing this thesis. CHAPTER 1 INTRODUCTION Interactivity has long been a desirable trait for many scientific visualization and analysis applications. 
Rapid feedback in response to user input enables flexible data exploration and streamlines the hypothesis-to-evaluation loop, which is vital for data-driven scientific discovery, as well as for a variety of interactive tools used in other fields, such as public safety. Yet as our ability to generate and acquire large datasets and our reliance on distributed storage and computations grow, both data acquisition and processing time can severely limit users’ capability to conduct effective data-driven analysis and experimentation. Existing paradigms used to develop these systems have consisted largely of isolated descriptions of the variety of independent processes involved. Interconnections between these processes are typically shown as some form of sequential pipeline. As a result, many possible optimizations may not be considered, and notions of interactivity are typically limited to subsets of these overall pipelines, such as for visualization. As data size and distribution of storage and computational resources grow, automating and streamlining analysis workflows is increasingly difficult. Describing even comparatively simple workflows, such as averages or comparisons, can quickly become nontrivial if multiple data sources, remote locations, or different resolutions or data formats are involved. The resulting scripts and solutions are typically customized for the specific analysis, are difficult to adapt, and often contain manual steps, such as file transfers. In addition, the inherent processing latency can be prohibitive for a large dataset. Even assuming sufficient computational resources, operations on terabytes of simulation ensembles cannot be performed interactively using existing solutions, which makes any mistakes costly and can severely impede or even prevent comprehensive data exploration. Even the interactive portions of existing applications often rely on extensive, noninteractive preprocessing in order to reduce delays during use. 
Techniques such as building acceleration structures, prefetching large datasets, and precomputing certain common analyses are all used to facilitate interactivity at runtime. However, this preprocessing is itself becoming a serious performance bottleneck as data become larger and more disparately located. Furthermore, such interactive data exploration is limited in scope to the extent and type of preprocessing that is performed, and user queries outside these bounds cannot easily be addressed. In addition, consider three issues for which interactivity is highly desirable and increasingly difficult. 1) Modern, high-resolution, time-sensitive data are becoming so large that processing takes a significant portion of the time available before the results are needed. For example, high-resolution versions of typical 87-hour SREF weather simulations used for forecasting can require more time than available for postprocessing and analysis [?]. 2) Emergency work such as tracking the recent wildfires in California requires integrating data from several sources, including satellites and simulations [?]. Real-time workflows are essential because time is of the essence for such critical situations. 3) Huge, acquired datasets, such as those from microscopes or satellites, take so much time to process that a new method is desired to allow scientists to validate data integrity during acquisition, and more quickly begin analyzing the data. One solution to this data expansion is the use of server-side processing in which a large back-end server performs computations and the user sees only the final result. Such batch processing systems can be effectively utilized for a variety of different tasks involving large computations, especially when the processing to be performed is already clearly understood, and the length of time required to perform these computations offline is tolerable for the scientists. 
However, if the most appropriate computations have not yet been identified, interactive data exploration can be extremely useful, allowing the user to rapidly experiment with a variety of different analyses and visualizations. Batch processing can make such interactive exploration more difficult due to its clustered nature and the often significant delays between the initiation of computations and the delivery of results. To enable interactivity for such algorithmic experimentation and in order to efficiently narrow the focus of their efforts, scientists or other users need to receive cursory results as quickly as possible. For this work, our focus is on the creation of a comprehensively interactive analysis and visualization framework for data exploration. Existing techniques exhibit a variety of failings due to their lack of comprehensive design and cohesive integration. Some of the methods that have been developed to facilitate efficient computation and visualization include the introduction of “fast queries” such as FastBit [?] and interactive rendering for isosurface extraction [?], but these methods typically rely on preprocessed local data and focus on only a small part of the overall workflow that transforms raw data into visualized images or computational results. The full process for such transformations generally involves data movement, preprocessing, computation, caching, and visualization. Describing even comparatively simple workflows that encompass such processes (e.g., analysis tasks such as averages or comparisons) can quickly become nontrivial when these tasks involve multiple data sources, remote computation, and combining data of different formats or resolutions. Such workflows can be difficult to adapt, being highly customized for a specific analysis and often relying on manual steps such as file transfers. 
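The cursory results described above are exactly what a progressive runtime delivers; Figure 3.7 notes that visiting timesteps in a low-discrepancy van der Corput order makes the running estimate converge far faster than a linear sweep. A minimal JavaScript sketch of that idea (function names are illustrative only, not the thesis's actual EDSL or runtime API):

```javascript
// Base-2 van der Corput value for index n: reflect the binary digits of n
// about the binary point, producing a sequence that spreads evenly over [0, 1).
function vanDerCorput(n) {
  let v = 0;
  let denom = 1;
  while (n > 0) {
    denom *= 2;
    v += (n % 2) / denom;
    n = Math.floor(n / 2);
  }
  return v;
}

// Map the low-discrepancy sequence onto timestep indices 0..count-1,
// skipping duplicates so every timestep is visited exactly once.
function lowDiscrepancyOrder(count) {
  const seen = new Set();
  const order = [];
  for (let n = 0; order.length < count; n++) {
    const t = Math.floor(vanDerCorput(n) * count);
    if (!seen.has(t)) {
      seen.add(t);
      order.push(t);
    }
  }
  return order;
}

// Progressive temporal average: a running mean over timesteps, visited in
// low-discrepancy order, publishing an updated estimate after every step.
function progressiveAverage(fetchTimestep, count, publish) {
  let mean = 0;
  let seenCount = 0;
  for (const t of lowDiscrepancyOrder(count)) {
    mean += (fetchTimestep(t) - mean) / ++seenCount;
    publish(mean, seenCount); // draft result available immediately
  }
  return mean;
}
```

Because an updated estimate is published after every timestep, an interrupted run still leaves a representative draft result, which is the property that lets a user abandon or refine a query without waiting for the full computation.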
Furthermore, due to the potentially overwhelming size of the data, such analyses may need to be executed as noninteractive batch processes using high-performance computing resources. The inherent processing latencies can be prohibitive and thereby hinder data comprehension. Finally, mistakes at any point in these workflows can carry a heavy penalty, often requiring repetition of significant parts of these time-consuming processes. In essence, contemporary data analysis and visualization applications are designed by a process of trial and error, providing custom solutions to facilitate tolerable levels of interactivity, yet often allowing egregious times for startup, data ingestion, and computation. Furthermore, with increasing data size and distribution, these applications are often not designed to scale independently of computational resources. For large, interactive workflows, these issues are increasingly problematic. 1.1 Thesis Contributions By recognizing the importance of intermediate or partial results, we aim to address the challenges involved in providing them and aspire to create a truly interactive environment. We present the design and demonstration of an interactive framework for the analysis and visualization of massive, disparately located data ensembles. Interactivity is achieved through the creation of an embedded domain-specific language (EDSL) and complementary runtime to support incremental execution of arbitrary user-provided analyses, and both server- and client-side computation. The framework makes use of techniques such as incremental computation, multiresolution data layouts, and interruptible workflows, and is designed with ongoing consideration of the workflow as a whole. Our system relies in part on a multiresolution data layout to facilitate coarse-to-fine streaming of structured spatiotemporal data, but most available data are stored using legacy, row-major formats. 
Therefore, we also include an on-demand data reordering module that performs incremental data conversion suitable for the interactive nature of the workflows that use it. This on-demand data reordering system is an independent component of our application that can also be utilized as a part of other systems. In general, reusability is a significant and somewhat surprising advantage of systems developed with intercomponent communication considered as a significant priority. This system is currently in operation at Lawrence Livermore National Laboratory (LLNL), where it is used for the analysis and visualization of climate simulation ensembles as part of the Earth System Grid Federation (ESGF), facilitating, for the first time, interactive remote exploration of massive datasets such as the 7km NASA GEOS-5 Nature Run simulation, which previously have been remotely analyzed only offline or at reduced resolution. The on-demand module is independently utilized to enable transparent access to petabytes of existing climate simulation datasets. Considering the challenges of this system, we next present the first steps in the creation of a general purpose model to be used for the effective design and assessment of interactive workflows for the analysis and visualization of large, remote, spatiotemporal data ensembles. These steps involve articulating the components of such workflows, including flow of tasks, required data movement, and availability of computational resources; and in addition, considering the scaling behavior of each component with respect to increasing data size, data distribution, and the algorithms involved in its analysis. By incorporating global constraints on time and latency, workflow designers can use this description to ensure the desired level of interactivity can be achieved by appropriately limiting the allocation of resources to each component of the workflow in the context of the entire system. 
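At its core, the on-demand reordering module described above follows a convert-on-miss caching pattern in front of legacy-format data, matching the flow of Figure 3.9: check the cache, convert the requested region on the fly if it is absent, and return it to the client. A hedged JavaScript sketch of that pattern; the class and method names are hypothetical and do not reflect the deployed ESGF module's actual interface:

```javascript
// Sketch of on-demand conversion: a cache of already-converted regions,
// populated lazily as requests arrive, so only data that is actually
// accessed ever pays the conversion cost.
class OnDemandConverter {
  constructor(convertRegion) {
    // convertRegion stands in for the legacy-format reader plus the
    // multiresolution reordering step (details omitted in this sketch).
    this.convertRegion = convertRegion;
    this.cache = new Map(); // key: "dataset/region@level"
  }

  // Serve one request: a cache hit returns immediately; a miss converts
  // the region once and caches it, so repeat requests are fast.
  serve(dataset, region, level) {
    const key = `${dataset}/${region}@${level}`;
    if (!this.cache.has(key)) {
      this.cache.set(key, this.convertRegion(dataset, region, level));
    }
    return this.cache.get(key);
  }
}
```

The design choice worth noting is that conversion cost is amortized across users: the first request for a popular region is slow, but every subsequent request, from any client, is served from the converted cache.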
Although we focus on techniques for structured spatiotemporal datasets, we anticipate many of the considerations for such data will also generalize to unstructured data modalities. To summarize, this thesis presents the following contributions with regard to interactive analysis and visualization workflows for large, disparately located, structured spatiotemporal data ensembles: 1) an embedded domain-specific language (EDSL), based on JavaScript, that provides a simple and abstract description of sophisticated analysis workflows, along with a corresponding runtime system that executes a given workflow in an interruptible, progressive manner and enables dynamic selection of various computational parameters; 2) a complementary end-to-end pipeline for automatic conversion and caching that enables transparent multiresolution access to large, distributed datasets of different formats; and 3) the proposal of a general purpose model for the design and assessment of such interactive workflows based on comprehensively articulating the necessary components and considering them as a whole in order to develop appropriate constraints on time and latency for each portion of the system. We demonstrate the EDSL, interactive runtime, and on-demand data conversion framework using applications for the analysis and visualization of neuronal microscopy acquisitions, computational fluid dynamics (CFD) simulations, and petascale climate data ensembles. These highlight the strengths of the framework in regard to increasing scale and distribution of data and availability of computational resources, considerations that are all included as part of the proposed model. 1.1.1 Primary Publications The following research articles that have been published or accepted for publication have contributed to this thesis: 1. C. Christensen, S. Liu, G. Scorzelli, J. Lee, P-T. Bremer, and V. 
Pascucci, “Embedded Domain-Specific Language and Runtime System for Progressive Spatiotemporal Data Analysis and Visualization,” in IEEE 6th Symposium on Large Data Analysis and Visualization (LDAV), 2016.

2. C. Christensen, G. Scorzelli, P-T. Bremer, S. Liu, J. Lee, B. Summa, and V. Pascucci, “Interactive Progressive Streaming to Process, Analyze, and Visualize Distributed Data Ensembles of Arbitrary Size: Using a Progressive Runtime Server, On-Demand Data Conversion, and an Embedded Domain Specific Language Suitable for Incremental Computation,” in Earth System Grid Federation Conference, 2017.

3. W. Widanagamaachchi, C. Christensen, V. Pascucci, and P-T. Bremer, “Interactive Exploration of Large-scale Time-varying Data Using Dynamic Tracking Graphs,” in IEEE 2nd Symposium on Large Data Analysis and Visualization (LDAV), 2012.

4. J. McEnerney, S. Ames, C. Christensen, C. Doutriaux, T. Hoang, J. Painter, B. Smith, Z. Shaheen, and D. Williams, “Parallelization of Diagnostics for Climate Model Development,” in Journal of Software Engineering and Applications, 2016.

5. S. Kumar, J. Edwards, P-T. Bremer, A. Knoll, C. Christensen, V. Vishwanath, P. Carns, J. Schmidt, and V. Pascucci, “Efficient I/O and Storage of Adaptive-resolution Data,” in SC14: International Conference for High Performance Computing, Networking, Storage and Analysis, 2014.

6. S. Kumar, C. Christensen, J. Schmidt, P-T. Bremer, E. Brugger, V. Vishwanath, P. Carns, H. Kolla, R. Grout, J. Chen, M. Berzins, G. Scorzelli, and V. Pascucci, “Fast Multiresolution Reads of Massive Simulation Datasets,” in International Supercomputing Conference, 2014.

7. A. Venkat, C. Christensen, A. Gyulassy, B. Summa, F. Federer, A. Angelucci, and V. Pascucci, “A Scalable Cyberinfrastructure for Interactive Visualization of Terascale Microscopy Data,” in Scientific Data Summit (NYSDS), New York, 2016.

1.1.2 Other Publications

1. L. Hogrebe, A. Paiva, E. Jurrus, C. Christensen, M. Bridge, J. Korenberg, and T.
Tasdizen, “Trace Driven Registration of Neuron Confocal Microscopy Stacks,” in 2011 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 2012.

2. A.V. Grosset, M. Prasad, C. Christensen, A. Knoll, and C. Hansen, “TOD-tree: Task-overlapped Direct Send Tree Image Compositing for Hybrid MPI Parallelism,” in Proceedings of the 15th Eurographics Symposium on Parallel Graphics and Visualization, 2015.

3. G. Thiruvathukal, C. Christensen, J. Xiaoyong, T. Francois, and V. Venkatram, “A Benchmarking Study to Evaluate Apache Spark on Large-Scale Supercomputers,” in IEEE Cloud (in submission), 2019.

4. B. Summa, S. Kumar, V. Pascucci, P-T. Bremer, and C. Christensen, “Scalable Visualization and Interactive Analysis using Massive Data Streams,” in Cloud Computing and Big Data, editors C. Catlett, W. Gentzsch, L. Grandinetti, IOS Press, 2013.

5. V. Pascucci, G. Scorzelli, B. Summa, P-T. Bremer, A. Gyulassy, C. Christensen, S. Philip, and S. Kumar, “The ViSUS Visualization Framework,” in High Performance Visualization—Enabling Extreme-Scale Scientific Insight, editors Wes Bethel, Hank Childs, Charles Hansen, CRC Press, 2012.

1.1.3 Refereed Conference Posters

1. C. Christensen, S. Liu, G. Scorzelli, P-T. Bremer, J. Lee, B. Summa, and V. Pascucci, “A Nonlinear Model for Interactive Data Analysis and Visualization and an Implementation Using Progressive Computation for Massive Remote Climate Data Ensembles,” in American Geophysical Union Fall Meeting, 2017.

2. B. Summa, C. Christensen, G. Scorzelli, J. Lee, A. Venkat, P-T. Bremer, and V. Pascucci, “Interactive Scripting for Analysis and Visualization of Arbitrarily Large, Disparately Located Climate Data Ensembles Using a Progressive Runtime Server,” in American Geophysical Union Fall Meeting, 2017.

3. C. Christensen, F. Federer, A. Gooch, S. Merlin, V. Pascucci, and A.
Angelucci, “Large scale imaging and 3D visualization of long-range circuits in CLARITY-treated primate visual cortex,” in Society for Neuroscience, 2015.

1.1.4 Source Code

1. G. Scorzelli, C. Christensen, A. Gooch, B. Summa, S. Petruzza, A. Venkat, and V. Pascucci, The OpenVisus Framework, https://github.com/sci-visus/OpenVisus GitHub repository, 2010-2019.

2. C. Christensen, S. Petruzza, and A. Venkat, On-Demand Data Reordering, https://github.com/sci-visus/ondemand GitHub repository, 2014-2019.

3. S. Petruzza and C. Christensen, OpenVisus Data Portal, https://github.com/sci-visus/visus-dataportal GitHub repository, 2017-2019.

4. C. Christensen, G. Scorzelli, and S. Petruzza, OpenVisus JavaScript Library, https://github.com/sci-visus/OpenVisusJS GitHub repository, 2018-2019.

5. C. Christensen, OpenVisus Web Viewer, https://github.com/cchriste/webviewer GitHub repository, 2015-2018.

6. C. Christensen, G. Thiruvathukal, and J. Xiaoyong, ANL Dataflow Analysis, https://github.com/hpc-dataflows/dataflow GitHub repository, 2015-2019.

CHAPTER 2
RELATED WORK

This work proposes a novel model to describe and assess dataflow visualization and analysis systems for spatiotemporal data. In this chapter, we examine related work and discuss how it compares with our efforts.

2.1 General Integrated Visualization Environments

To lower the access barriers for complex visualization techniques, integrated visualization systems, such as VisIt [?] and ParaView [?], have been introduced to allow domain scientists to easily visualize their datasets using different algorithms, such as isosurfaces, volume rendering, and streamlines. However, even though these integrated systems provide extensive visualization capabilities and customized scripting, it is necessary to manually specify data types and explicitly define the exact data structures that will be produced by the built-in scripts. Even simply combining data of different resolutions is nontrivial.
Furthermore, these applications are not capable of displaying the incremental updates necessary to maintain interactivity, and therefore entail workflows that involve scripts and processes with many of the same characteristics as the offline workflow. Essentially, the exploratory analysis process suffers from high “latencies” in the sense that parameter modifications or other changes require potentially lengthy reevaluations.

2.2 Domain-Specific Visualization Systems

Besides general integrated visualization environments, many systems focus on a specific domain, such as the Ultra-scale Visualization Climate Data Analysis Tools [?] (UV-CDAT) and DataViewer 3D [?] (DV3D). By concentrating on a more specific application, these systems usually have fewer but more specialized capabilities. For example, UV-CDAT is designed for climate data visualization. By incorporating many standard analysis and visualization techniques for climate data, it gives scientists an easy-to-use tool that is adequate for most visualization needs. However, for a modified workflow, scientists are often required to write customized code to fill in missing features in these domain-specific visualization systems.

2.3 Remote Data Access

Scientific analysis tools such as VisIt and ParaView enable complex workflows but struggle with remote data, and setting up the workflows can be difficult. Local data analysis tools can benefit from protocols such as OPeNDAP [?] that provide local access to remote data, but these protocols inherit the limitations of the underlying fixed-resolution data formats they serve and do nothing to facilitate the hierarchical access needed to scale interactive systems to extremely large data sizes.

2.4 Workflow Management Systems

Sophisticated systems exist for distributed workflow management, such as Pegasus [?] and Kepler [?], but they are developed largely for offline use.
Robustness to failures, data provenance, workflow abstraction, and reliability are their key concerns, and their use is not amenable to the requirements of an interactive system. HPC asynchronous many-task (AMT) dataflow frameworks, such as Legion [?] or Charm++ [?], provide generic runtimes across numerous platforms, and, similarly to BabelFlow [?], the work presented here could be built on top of these in order to utilize them transparently on various HPC systems. Some workflows are designed for critical applications such as crisis management [?], but these workflows may fail to remain interactive as the size and distribution of data increase.

2.5 Domain-Specific Languages (DSLs)

Languages such as Diderot [?] and ViSlang [?] are specialized DSLs designed for visualization and do not handle remote data. Our work is intended for the processing of possibly remote data often used for the analysis and comparison of scientific datasets, rather than being focused purely on visualization-specific tasks. Other DSLs, such as Ebb [?] and Simit [?], are designed for physical simulation while abstracting execution environments to enable CPU, GPU, and parallel execution of common code. Others, such as Vivaldi [?], combine a specialized DSL for visualization with a mixed-execution model. Our DSL and associated runtime enable interactive exploration through progressive remote data access and interruptible analyses rather than reducing total computation time by utilizing such hybrid execution backends. The results of our processing nodes could be used as input for visualization-specific DSLs such as Vivaldi or Diderot, enabling these languages to be used for the visualization of a wider range of local and remote data. Languages such as Ebb or Simit could be useful for performing more efficient server-side computation, for which interruptibility may be less desirable than fast computation.
2.6 Runtime Loop Optimizations

Portability and optimization of analysis programs have been addressed with the use of directives such as those provided by OpenACC [?] and OpenMP [?], cross compilers that create optimized versions of a program [?], [?], and wrappers that provide a specialized set of portable, optimized functions. Thrust [?], RAJA [?], and Kokkos [?] provide vector libraries to manage multidimensional arrays with polymorphic layouts and map operations on those arrays to fast manycore implementations. Overall, these works focus on providing specific optimizations of existing code rather than enabling a simple semantic for scientists to express iterative computations.

CHAPTER 3
FRAMEWORK FOR INTERACTIVE ANALYSIS AND VISUALIZATION OF MASSIVE AND REMOTE SPATIOTEMPORAL DATA ENSEMBLES

As our ability to generate large and complex datasets grows, accessing and processing these massive data collections are increasingly the primary bottlenecks in scientific analysis. Challenges include retrieving, converting, resampling, and combining large and often disparately located data ensembles. Existing tools for these tasks do not typically support efficient handling of large, remote data. In particular, existing solutions rely predominantly on extensive data transfers or large-scale remote computing resources, both of which are inherently offline processes with long delays and substantial repercussions in the form of lengthy recomputation or additional data transfers when any portion of the computation must be repeated. Such workflows severely limit the flexible exploration and rapid evaluation of new hypotheses that are crucial to the scientific process and thereby impede scientific discovery. Furthermore, applications designed for crisis management and public safety often involve time-sensitive computations and require workflows that remain interactive even as the size and distribution of data increase.
In this chapter, we address these challenges by presenting a framework that utilizes progressive algorithms and multiresolution data formats in recognition of the utility of intermediate or partial results for the realization of a genuinely interactive data analysis and visualization environment. The key to streamlining data access and aggregation lies in allowing the user to focus on high-level logic while automating low-level data operations. To this end, we introduce an EDSL to hide such low-level complexity from the user and to allow the runtime sufficient flexibility in the evaluation of arbitrary user-created scripts. For this application, our primary aim is to address the challenges of interactive analysis and visualization of massive, disparately located data ensembles. The fundamental tools, however, can be used in other contexts, such as offline batch-style processing. For data analysis workflows created using the proposed EDSL, the user can focus on work that is directly associated with the analysis, such as statistical operations and comparison, whereas details such as the source location, data transfer, file formats, and grid resolutions are automatically handled by the underlying runtime system. To speed up data processing, the system accesses and transfers the least amount of data possible for the given computation. The generality of the EDSL allows great flexibility in its interpretation, enabling a suitable runtime system to exploit task parallelism appropriate for large, dispersed data. Progressive algorithms are adopted in order to provide incremental computation results, and the order of these computations is modified if the resulting analyses can thereby converge more quickly. Finally, the runtime system utilizes a multiresolution storage scheme such that preliminary results can be obtained without significant delay, followed by progressive refinement for increased accuracy.
In this chapter, we present a novel framework that can be utilized for a wide variety of interactive purposes. The key contributions include the following: 1) an EDSL built into JavaScript that provides a simple and abstract description of sophisticated analysis and visualization workflows; 2) the corresponding runtime system that executes a given workflow in an interruptible, progressive manner and enables dynamic selection of various computational parameters; and 3) an end-to-end pipeline for automatic conversion and caching that enables transparent multiresolution access to distributed datasets of different formats.

3.1 Workflow Transition

The first step to facilitate interactivity is to transition the overall workflow from sequential to concurrent. To illustrate the difference between the sequential and concurrent paradigms, let us first consider Fig. 3.1, which illustrates a typical pipeline for data acquisition, processing, and visualization. The process begins with data acquired from an instrument (in this case, a microscope). The data must be copied to a server and then converted to a common format to be shared with downstream users. Note that each stage of the process must be completed prior to beginning the next (e.g., an acquisition is completed prior to beginning the copy; data are completely downloaded before beginning analysis; and preprocessing to build an acceleration structure must be completed before interactive visualization).

Fig. 3.1 Typical sequential pipeline for acquisition, analysis, and visualization. Each step requires the completion of all previous items, a linear process, the interactivity of which depends on the size and location of data, the complexity of analyses, and the availability of computational resources.

This sequential progression of data through the entire pipeline is not conducive to interactive use.
However, if the work were performed in small, incremental pieces (i.e., streaming or pipelining), some results could be provided quickly because each step of the pipeline would be executed with some degree of concurrency, with data buffering between stages. While a portion of the final result is made available to the user, the rest of the pipeline could be processing successive data to quickly refine those results. Even if a large amount of input data is available, the user could start to get results after processing only a small portion of the data. Performing the work in this incremental fashion also enables interruptions to the pipeline that could facilitate interactive exploration. In summary, we will utilize two primary methods to create workflows that facilitate generic user-driven analyses of massive, remote datasets: 1) an efficient multiresolution data representation that enables fast access to subsets of varying resolutions and automatic resampling to a common grid; and 2) the ability for the user to identify points in the execution of incremental algorithms at which partial results can be “published” to downstream nodes. Throughout this chapter, we will be working to transform the following simple but unscalable workflow for spatiotemporal data analysis and visualization: 1) acquisition: download all the data; 2) processing: perform analyses on that data; and 3) visualization: view the results of the analyses. Fig. 3.1 shows the simplest version of this analysis and visualization dataflow, which consists of three primary steps. First, the data to be analyzed are downloaded from their original sources for access by local computational resources. Since datasets commonly range up to several terabytes in size, this step might require significant time to complete. Next, the desired analyses are performed using serial batch processing.
Each analysis depends on the size of the data, and any downstream analyses or visualization must wait for the completion of preceding analyses before they can begin. Finally, the results are visualized using a variety of means depending on the size of the resultant data. Depending on the input data size and the available network and computational resources, the dataflow described here could be an offline process with significant delays. Such analyses are costly and time-consuming to perform, and therefore scientists tend to carefully select only the most pertinent computations rather than experimenting with more creative choices. One strategy often used to cope with such costly postprocessing is to simply precompute certain analyses during the simulation itself, but this scheme increases the simulation time and output size while still suffering from an overly conservative selection of analyses. In addition, mistakes or errors that occur during any step of these workflows are inherently costly because a potentially large portion of the work must be repeated. Such errors range from data integrity issues, such as failed downloads, to mistakes in the analysis scripts. In the design of this interactive framework, we aim to address these challenges by enabling lightweight exploration of a broader selection of analyses through the use of multiresolution techniques and incremental results. The trade-offs required can be described in terms of accuracy and completeness, the resulting errors of which we will both characterize and reduce through judicious input selection. Rather than limit analyses to a set of predefined processes, our system allows scientists to perform any sort of computation by accepting generic scripts that might reference both local and remote data and can be interpreted in such a way that cursory results are produced in interactive timeframes.
The products of these exploratory analyses can be used to create more formal, comprehensive analyses, or even used directly for time-critical applications. The motivation for this process is that the performance of existing applications does not scale as data sizes and computational costs increase. We will concentrate on the 3.5 petabyte NASA GEOS-5 “Nature Run” simulation as a target use case that quite clearly does not scale to most systems and is practically inaccessible to most users, who do not happen to have petabytes of scratch space to store downloaded data.

3.2 Building Blocks

We will utilize the following three techniques in our framework to facilitate interactive, incremental analysis and visualization of massive, remote datasets: 1) download only the portion of data used for the desired computation; 2) compute results incrementally as more data are received; and 3) utilize a multiresolution format to enable fast computation of preliminary results. By receiving only the subregion of data being used for a given computation, its results can be presented more quickly. Each of the framework’s components thus requires the ability to request and receive the desired portion of the dataset. Incremental results are produced using only a subset of the data required for a given computation, but can provide the user with a preliminary version of the final analysis. The effectiveness of these incremental results depends on the order in which the data are received, since different orders can enable faster convergence. Finally, as illustrated in Fig. 3.2, by utilizing a multiresolution format, data can be loaded and visualized at coarse resolution, and then successively refined as more data are streamed into the pipeline, facilitating fast cursory computations regardless of the full size of the data being requested.
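The coarse-to-fine behavior of the third technique can be sketched in plain JavaScript by subsampling a 1D field at progressively finer strides and recomputing a statistic over each pass. This is only an illustrative stand-in for multiresolution reads, not the actual IDX access path:

```javascript
// Sketch: progressively refine the mean of a field by halving the stride.
// At stride 8 only 1/8 of the samples are touched; at stride 1 the mean is exact.
function progressiveMeans(field, coarsestStride) {
  const results = [];
  for (let stride = coarsestStride; stride >= 1; stride = Math.floor(stride / 2)) {
    let sum = 0, n = 0;
    for (let i = 0; i < field.length; i += stride) { sum += field[i]; n++; }
    results.push({ stride, samples: n, mean: sum / n });
  }
  return results;
}

const field = Array.from({ length: 64 }, (_, i) => i);  // toy data: 0..63
const passes = progressiveMeans(field, 8);
```

The coarsest pass already yields a usable estimate from a fraction of the samples, which is exactly the property that lets a user judge an analysis before the full-resolution data arrive.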
Each of these techniques relies on the same underlying foundation: utilization of a streaming, multiresolution data format that can quickly provide subregions of the stored data at varying resolutions in an incremental fashion. As a result, we build the proposed workflow on top of an existing multiresolution data format and its associated framework. Multiresolution data formats range from simple octrees to more complex or distributed schemes such as [?], [?]. We elected to utilize the IDX format, an efficient multiresolution data reordering based on the hierarchical Morton Z-order space-filling curve [?]. This reordering enables localized queries to be optimized using the Morton data order, the layout of which naturally favors queries of rectilinear subregions, and also facilitates rapid reading of lower resolution levels of the data, as shown in Fig. 3.3. We chose IDX for this manner of data access in part due to our familiarity with the Visus Framework [?], [?], which is built on IDX to enable streaming access to arbitrarily high-resolution imagery. However, the proposed EDSL, runtime, and on-demand data conversion systems presented in this work are logically separated from the underlying multiresolution data format used by the framework. Therefore, the data format could be replaced by one of these other multiresolution approaches, and the work would still retain most of the benefits provided by utilizing the IDX format.

Fig. 3.2 Illustration of multiresolution data loading compared to loading from a “flat” row-major format. Using a multiresolution data format, coarse-resolution data can be loaded in much less time, providing quick preliminary results.

Fig. 3.3 Data layout obtained for a 2D matrix. The 1D array at the top of the diagram represents the disk distribution of the data. Each consecutive block in the 1D array corresponds to data of progressively finer resolution in the 2D matrix (from [?], used with permission).
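To make the Z-order idea concrete, the following sketch computes a plain 2D Morton index by interleaving the bits of the x and y coordinates; consecutive Morton indices trace the recursive “Z” pattern that keeps spatially nearby samples nearby on disk. (The actual IDX layout adds a hierarchical permutation on top of this ordering; the sketch covers only plain Morton order.)

```javascript
// Sketch: 2D Morton (Z-order) index by bit interleaving.
// Bit k of x goes to bit 2k of the result; bit k of y goes to bit 2k+1.
function morton2(x, y, bits = 16) {
  let code = 0;
  for (let k = 0; k < bits; k++) {
    code |= ((x >> k) & 1) << (2 * k);
    code |= ((y >> k) & 1) << (2 * k + 1);
  }
  return code >>> 0;
}
```

For example, the four cells of a 2x2 block map to the consecutive indices 0..3, which is why rectilinear subregions and coarse resolution levels can both be read as a small number of contiguous ranges.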
Additional tools already make use of IDX, including the extremely efficient parallel PIDX library [?]. For this work, we extend the Visus Framework with the necessary modifications to facilitate incremental computation on disparately located spatiotemporal datasets. In addition, we provide an on-demand conversion utility so that datasets not already in the IDX format can be converted upon request to this convenient multiresolution representation. Details of the multiresolution and data reordering algorithms are outside the scope of this work, and readers are encouraged to explore the references above for more information. Similar to other integrated visualization applications (e.g., VisIt or ParaView), the Visus Framework includes a set of common visualization algorithms, such as volume rendering and isosurface extraction. The framework is multithreaded and implements a message-based dataflow pipeline using a directed acyclic graph, such that messages can be “published” by a given node to connected nodes. The multithreaded implementation enables visualization and computation tasks to be carried out simultaneously.

3.3 Overall System

The overall system is illustrated in Fig. 3.4. This concurrent pipeline works as follows: An EDSL script is executed incrementally on the visualization client. When data are needed by the script, the client requests them from the multiresolution server, which first checks its local cache and, if the data are found, immediately fulfills the request. If cached data are not found, the server asks the on-demand data reordering service to produce a multiresolution version of the data, which is then cached and sent to the client. The visualization client produces results incrementally as they are computed.
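The cache-or-convert behavior of the multiresolution server reduces to a memoizing lookup around the conversion service. The sketch below is illustrative only: the `convert` callback stands in for the on-demand data reordering service, and none of the names reflect the framework's actual API:

```javascript
// Sketch: serve a request from cache, running on-demand conversion on a miss.
function makeServer(convert) {
  const cache = new Map();
  let misses = 0;
  return {
    fetch(key) {
      if (!cache.has(key)) {        // cache miss: convert on demand, then cache
        cache.set(key, convert(key));
        misses++;
      }
      return cache.get(key);        // hit, or the freshly converted result
    },
    stats: () => ({ misses }),
  };
}

// Illustrative "conversion": pretend reordering is an uppercase transform.
const server = makeServer(key => key.toUpperCase());
const a = server.fetch("geos5/temp/t0");  // converted, then cached
const b = server.fetch("geos5/temp/t0");  // served from cache
```

The important property is that conversion cost is paid at most once per requested region, so repeated interactive queries over the same region stay fast.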
What differentiates the proposed framework from existing progressive visualization techniques is the ability to utilize an EDSL to specify data analysis workflows that hide the complexity of combining multiple input sources and spatial resolutions, and an interruptible script processing engine that facilitates progressive computation. Such a design, illustrated in Fig. 3.5, provides the user the expressive power to write custom, reusable analysis workflows suitable for rapid data exploration. The EDSL is designed to permit the types of interpretation necessary for an interactive workflow without compromising expressiveness or accuracy, and the runtime system and scripting engine enable interactive execution of these scripts.

Fig. 3.4 The system pipeline for our interactive analysis and visualization framework, which exploits progressive computation and seamless local or remote execution of EDSL scripts to provide a highly flexible platform for the exploration of large-scale, disparately located data. Note the ability to utilize the EDSL to specify data analyses from both the client and the server.

Fig. 3.5 The execution model used by the runtime system for the interactive execution of EDSL scripts that continuously process data requests, publish incremental results, and respond immediately to user input.

3.4 Data Processing EDSL

Our goal is to provide a simple and abstract language for describing rich data processing tasks that relieves users from having to deal with mundane tasks such as data import and resampling (also called “regridding”) and allows for incremental execution suitable for an interactive environment.
We assert that the necessary modifications to the host language can be limited to three aspects, discussed in the following sections, which are sufficient to facilitate interactive evaluation of generic data processing scripts: 1) a new built-in data type that abstracts the common modalities of scientific data (e.g., scalar or vector field data) and can be used directly as a first-class citizen of the language without regard to format, resolution, or location of the underlying data; 2) a hinting mechanism to facilitate incremental production of the results of ongoing computations (i.e., long-running scripts) by indicating to the runtime system appropriate opportunities at which the current state of the computation can be shown; and 3) a generic multidimensional iterator for loops that can be performed in any order (e.g., for computing an average), which permits nonlinear evaluation of the loop body by the runtime system such that incremental results potentially converge faster toward the final result, and allows for parallelization of these loops. In the remainder of this section, we will explain each language addition in detail, beginning with a simple example script that illustrates their use.

3.4.1 Example Script

Listing 3.1 shows an example of a basic incremental computation using the proposed EDSL. The script makes use of Welford’s method [?], [?] to compute a monthly average of hourly temporal climate data.

Listing 3.1 EDSL script for incremental computation of a temporal average, the variable named output, using hourly data from the 7km GEOS-5 Nature Run simulation. Notice the ability to succinctly express a significant operation without explicitly addressing input format, resolution, dimension, or output type.

// Computes running average using Welford's method
field = 'TOTSCATAU';                  // aerosol scattering
start = query_time;                   // current time
width = 720;                          // 720 hours (30 days)
output = Array.New();                 // initialize output
var i = 0;                            // current count
unordered (t, [start, start+width]) { // 1d iterator, index t
  f = input[field + "?time=" + t];    // read field at time t
  // critical section for running average:
  // average and count must be updated atomically
  {{
    output += (f - output) / (i + 1); // update running avg
    i++;                              // increment count
  }}
  doPublish();                        // output current result
}

The script is able to tersely express a significant operation without the user needing to explicitly specify input formats, data resolution, or output type. Notice the use of the overloaded arithmetic operators +, -, and += in the statement output += (f - output) / (i + 1). In this expression, output and f are members of our new abstract data type, described next, representing the current state of the incremental average computation and the field at the current timestep of the iteration, respectively. The unordered loop could be interpreted just like a normal for loop, but using this facility enables the underlying runtime system to utilize other execution methods, such as parallel execution or varying loop orderings, as described in Section 3.5. The double brackets surrounding the lines that update the current output (the temporal average) and the running count designate a critical section, ensuring the runtime is able to perform correct parallel execution by atomically updating these values. The call to doPublish allows for an incremental display of the result. For comparison, a similar computation in the Python-based VisIt expressions EDSL would require creating a specific class template structure in which the user must explicitly define the output type and dimensions and manually create the VTK arrays to be computed by the script.
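The incremental update at the heart of Listing 3.1 can be mirrored in plain JavaScript: the step avg += (x - avg) / (i + 1) maintains the running mean without storing the full history, which is what makes the loop safe to interrupt and publish at any iteration. This is an illustrative host-language version over scalar samples, not EDSL code:

```javascript
// Sketch: Welford-style running mean over a stream of scalar samples.
// The publish callback plays the role of doPublish() in the EDSL listing.
function runningMean(samples, publish = () => {}) {
  let avg = 0;
  for (let i = 0; i < samples.length; i++) {
    avg += (samples[i] - avg) / (i + 1);  // incremental mean update
    publish(avg);                          // expose the partial result
  }
  return avg;
}

const partials = [];
const mean = runningMean([10, 20, 30, 40], v => partials.push(v));
```

Each published partial is itself the exact mean of the samples seen so far, so an interrupted run still leaves the user with a meaningful intermediate answer.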
The VisIt EDSL contains a specific function for computing temporal averages, average_over_time, but this type of specialization does not facilitate the progressive production of in-progress results that is a focus of the proposed system in order to provide rapid preliminary visualization. Please refer to the appendices of this work for a comprehensive specification of the EDSL.

3.4.2 Abstract Data Type

An abstract data type is necessary in order to enable spatiotemporal data manipulation using a uniform and generic interface without regard to format, resolution, or location. The use of this type avoids embedding details in the data processing scripts concerning the management of the underlying data. The runtime system will handle data loading, resampling, and conversion to a common format. We chose to make this a built-in type of the EDSL to enable features such as operator overloading that otherwise might not be feasible in the host language. The specific methods provided by our EDSL include statistical summary operations such as mean and variance, multifield operations that perform element-wise amalgamation such as average and maximum, and operations such as convolve that involve some degree of global dataset-wide access. A complete listing of the abstract data type methods is provided in Appendix A. Operator overloading is provided to enable natural expression of element-wise operations between fields of the new data type or with scalars. These methods are sufficient for constructing arbitrarily sophisticated scripts for the computation of temporal averages, rank correlations, image segmentations, maximum intensity projections, and other types of output used in scientific data analyses.

3.4.3 Explicit Data Publishing Hints

Streaming algorithms provide incremental results based on incoming data that represent the best possible computation for the currently available input.
These snapshots of ongoing computations present the user with an approximation of the final results of long-running operations, enabling errors to be caught and addressed much sooner. Feedback is particularly desirable for users of an interactive system, but for script-driven analysis the best times to show these incremental results are not always apparent. Attempting automatic determination could result in showing incorrect or undesirable results, such as when a script utilizes an output variable as a temporary. In order to show progressive results for streaming computations while avoiding output at the wrong time, we introduce the doPublish primitive operation, which indicates appropriate times for the scripting engine to send the current computation results to the rest of the workflow. The current results are stored in the variable named output, as shown in Listing 3.1. Using this primitive enables the corresponding workflows to be progressive, with partial results computed and updated continuously. The doPublish primitive has no effect on the computation itself and can be safely ignored, enabling the runtime system to refresh output presented to the user at intervals suitable to maintain interactivity.

3.4.4 Generalized Multidimensional Iterators

To complement the progressive asynchronous updates enabled by doPublish, we introduce an iterator for order-independent loops called unordered. This generalized facility allows a variety of beneficial execution methods to be utilized by the runtime system, and provides an expression of multidimensional loops that is both elegant and flexible. As we shall see in the examples later in this work, astute selection of the order in which input data are presented to an online algorithm [?] can lead to results that converge much more quickly toward a final result.
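One concrete example of such an ordering, the low-discrepancy van der Corput sequence used later in Section 3.5.4, can be sketched in a few lines of Python (an illustration only, with hypothetical names; this is not the framework's implementation):

```python
def van_der_corput(n, base=2):
    """First n terms of the base-b van der Corput sequence: values in
    [0, 1) whose every prefix covers the unit interval evenly."""
    seq = []
    for i in range(n):
        x, denom, k = 0.0, 1.0, i
        while k > 0:
            denom *= base
            k, digit = divmod(k, base)  # reverse the digits of k about
            x += digit / denom          # the radix point
        seq.append(x)
    return seq

# Map the fractions onto timestep indices [start, start + width):
start, width = 0, 8
order = [start + int(v * width) for v in van_der_corput(width)]
print(order)  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Any prefix of the permuted index list samples the time range roughly uniformly, which is why a computation terminated mid-sequence still reflects the whole range.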
Given the constraint that users must make decisions without complete information when utilizing these incremental computations, such improved convergence is highly desirable. The unordered primitive accepts as parameters the name of the variable to be used as an index inside the loop and the extents of the loop iterator. Loop indices are considered constant within the body of the computation. The result of the loop should be the same regardless of the order of execution (except for floating point differences that would be expected to occur anyway), and it is considered a bug for the user to construct an unordered loop body that depends on a particular order of execution. In addition to parallelization, other useful interpretations of unordered loops are described in detail in Section 3.5. The proposed EDSL described in this section is currently built on top of JavaScript, which is extended with these carefully chosen primitives and a new built-in data type for scientific data. This new EDSL allows users to express common workflows in an abstract manner suitable for interactive execution. In the next section, we introduce a runtime scripting engine designed for progressive, interactive execution of these EDSL scripts for computations over arbitrarily large, disparately located datasets.

3.5 Progressive Runtime System

We now present the complementary runtime system for analyses written using this EDSL. It incorporates an interruptible script processing engine that evaluates these scripts interactively, tuning any necessary parameters so that computations are performed quickly and incrementally. Through the genericity of the EDSL, the runtime system can also enable a direct and transparent transition from local execution to a distributed workflow, including server-side execution and caching.
To demonstrate the scripting engine presented here, we wrote our own JavaScript interpreter, used by the engine to directly execute scripts without any compilation to byte code or significant optimization. Type checking is performed at runtime using exceptions, which display the problematic line of the script and a detailed error message to the user to enable debugging. The presented runtime system utilizes the techniques discussed earlier, such as multiresolution streaming and low-discrepancy sampling, to produce progressive results from input data as they are read. This facilitates achieving the goals of the overall framework by minimizing the trade-offs between accuracy and speed while continuously providing useful results during interactive data exploration. The following subsections describe the features of the runtime system that enable practical data exploration through interactive interpretation of EDSL scripts, including implementation of the built-in scientific data type, design of the progressive scripting engine, and making effective use of order-independent loops, parallelization, and distributed processing.

3.5.1 Built-in Multidimensional Scalar Class

Because datasets can range up to petabytes in size and are located across many institutions worldwide, it is imperative that data transfer costs remain minimal. Therefore, an interactive workflow might rely more heavily on server-side processing and request the transfer of only the minimal subsets of the data necessary for a given computation. Yet this restriction means the user must either sacrifice comprehensiveness by requesting subregions of a spatial domain or spend tremendous time waiting for full-resolution data to be transferred.
In order to provide the best of both efficient data transfer and high-resolution computation, we further utilize a multiresolution data layout that enables reading coarse-resolution versions of the full spatial domain and efficiently refining any desired subregion to a finer resolution. Although there remains a sacrifice in accuracy when performing a computation using coarse-resolution data, it provides a cursory estimate of the result that facilitates more interactive exploration of data and analyses. Moreover, for large, full-resolution data, this type of computation is impossible to perform quickly when using full-resolution row- or column-major data layouts (even if the data are further divided into smaller blocks). The runtime scripting engine utilizes a fast C++ multidimensional Array type to provide efficient implementations of the operations defined for the EDSL built-in data type, similar to the ndarray class of the numpy package for Python [?]. Since datasets can be manipulated without regard to their location, the runtime system uses additional metadata associated with a script to map its inputs to their corresponding local or remote data locations. Data read from remote locations may be automatically cached on the local system, and the results of a given script execution can be cached as well, allowing comparison of new results with previously computed data. Finally, the EDSL specifies element-wise operations that can be performed independent of the resolution of the operands. Variables of this type must be implicitly resampled to the same resolution to be combined or compared. By default, the scripting engine upsamples to the largest resolution in each dimension of the given operands and uses linear interpolation for resampling. These methods, however, can be changed by the user at any time without modification of the original script.
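The default policy described above, upsampling both operands to the larger resolution with linear interpolation, can be illustrated in one dimension (a Python sketch with hypothetical helper names, standing in for the framework's multidimensional C++ implementation):

```python
def resample_linear(a, m):
    """Linearly resample list a to length m, preserving the endpoints."""
    n = len(a)
    if m == 1 or n == 1:
        return [a[0]] * m
    out = []
    for j in range(m):
        t = j * (n - 1) / (m - 1)  # fractional position in the source
        i = min(int(t), n - 2)
        frac = t - i
        out.append(a[i] * (1 - frac) + a[i + 1] * frac)
    return out

def combine(a, b, op):
    """Element-wise op after upsampling both operands to the larger
    size, mirroring the runtime's default resampling policy."""
    m = max(len(a), len(b))
    a2, b2 = resample_linear(a, m), resample_linear(b, m)
    return [op(x, y) for x, y in zip(a2, b2)]

# A coarse field (3 samples) added to a finer one (5 samples):
coarse, fine = [0.0, 2.0, 4.0], [0.0, 1.0, 2.0, 3.0, 4.0]
print(combine(coarse, fine, lambda x, y: x + y))  # [0.0, 2.0, 4.0, 6.0, 8.0]
```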
Resampling data in order to perform computations among different models is a serious impediment for scientists, and we present a powerful case study in Chapter 5 that demonstrates our framework's ability to effortlessly facilitate comparison of multiple climate ensembles of differing resolution.

3.5.2 Multiresolution Streaming

As described in Section 3.2, as a building block for interactivity, our runtime system reads and caches input data using a lossless multiresolution format that provides efficient coarse-to-fine data loading and much faster access to local regions of interest compared to traditional row- or column-major order data [?]. In order to provide transparent access to multiresolution versions of data stored in other formats, an on-demand reordering facility is presented in Section 3.6. Multiresolution data can be used to provide fast, cursory computations by displaying the result of an initial coarse-resolution execution while refining it to provide more detail when needed. The results of computation using coarse-resolution data can also be surprisingly accurate, as demonstrated in the computational fluid dynamics case study in Chapter 5.

3.5.3 Incremental Results

When the runtime script interpreter encounters doPublish in a script, it can produce, or "publish," the current state of an ongoing computation to provide the user with important and timely feedback. Such a call can be safely ignored by the downstream visualization without adverse effects on the computation, enabling results to be displayed at intervals suitable to maintain interactivity. The scripting engine implements doPublish as an asynchronous callback that creates a copy of the current computation output to be displayed by the visualization client.
If a previously published result has not yet been displayed by the visualization system, that result is simply replaced with the new output, ensuring smooth performance in the rest of the workflow while allowing script execution to continue uninterrupted.

3.5.4 Loop Order and Parallelization

Iterative calculations over a sequence often end with the same result regardless of the order in which the sequence is read, and with an alternate ordering, the incremental results of a calculation may converge more quickly. Furthermore, parallelization of such loops can facilitate faster evaluation. The EDSL presented in Section 3.4 introduced the unordered primitive to allow declaration of order-independent, multidimensional loops. Because parallel execution may require more delicacy in the implementation of an analysis script, the double bracket {{ ... }} notation was incorporated into the EDSL to denote a critical section. Code within these sections is executed atomically with respect to the other threads, ensuring correct results without unduly complicating the scripts. Note the use of double brackets in the monthly average computation in Listing 3.1: the running average is computed locally, and the current result and its count are set within a critical section in order to ensure nothing is overwritten by another thread during that part of the computation. The number of threads used to execute such analyses is selected by the runtime, so they can be scaled to the available system resources without modification. Parallel loop execution is implemented in the scripting engine by creating a thread pool for each unordered loop and assigning the work of one iteration to each thread, with a shared context of global variables and a thread-local context for variables introduced in the iteration block. Critical sections are implemented using a shared lock per loop.
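The combination of a per-loop shared lock and asynchronous publishing can be sketched in Python (a hypothetical class standing in for the engine's internals, not the framework's code): several threads fold values into one running mean inside the critical section, while do_publish overwrites a single-slot snapshot that a viewer may sample or skip.

```python
import threading

class RunningAverage:
    """Sketch of the critical-section pattern from Listing 3.1."""
    def __init__(self):
        self.lock = threading.Lock()
        self.output, self.count = 0.0, 0
        self.latest = None  # last published snapshot; stale ones are dropped

    def accumulate(self, values):
        for f in values:
            with self.lock:  # the {{ ... }} critical section
                self.output += (f - self.output) / (self.count + 1)
                self.count += 1
            self.do_publish()

    def do_publish(self):
        # Replace any unread snapshot; the computation never blocks on it.
        self.latest = self.output

avg = RunningAverage()
chunks = [[1.0, 2.0], [3.0, 4.0]]  # one chunk per parallel iteration
threads = [threading.Thread(target=avg.accumulate, args=(c,)) for c in chunks]
for t in threads: t.start()
for t in threads: t.join()
print(avg.output)  # the mean of 1..4, close to 2.5 up to float rounding
```

Because the mean update is order independent, any interleaving of the threads yields the same result up to floating point rounding, which is exactly the contract the unordered primitive asks of the script author.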
This strategy enables nested unordered loops, but one should beware of the potential explosion of tasks and consider rewriting such a loop to instead utilize a multidimensional version of unordered. As shown in Fig. 3.6, parallelizing the execution of order-independent loops can provide a modest speed-up for even relatively naive algorithms.

Fig. 3.6 Comparison of parallel unordered loop execution for increasing thread counts for two algorithms: maximum intensity projection and zonal rank correlation. Dashed lines indicate perfect scaling. Tests conducted on a 16-core Intel Xeon E7-8890 v3 @ 2.50GHz running openSUSE 13.1 using locally cached data.

For unordered loops, any evaluation order can be selected, allowing the runtime to maximize the use of cached data or to choose a sequence that could enable faster-converging results, such as the low-discrepancy sequence introduced by van der Corput [?]. The desirable qualities of a low-discrepancy ordering are uniformity and incrementality: samples are evenly distributed over the given range, so equal coverage will have been achieved even if processing is terminated mid-sequence [?]. Consider again the script from Listing 3.1. It could simply use a for loop, but since the final result of the computation does not depend on the order in which the loop progresses, we can choose an ordering for which the results converge significantly faster. Fig. 3.7 illustrates the difference between a linear ordering and a low-discrepancy sequence as the computation proceeds. For higher dimensional iterators, another low-discrepancy ordering, the Halton sequence [?], can be used to achieve similarly superior convergence.

Fig. 3.7 Results of a temporal average computation (Listing 3.1) via two orderings for the inner loop. The error (plotted as RMSE) between the precomputed result and the incremental result decreases quickly when utilizing the low-discrepancy van der Corput sequence of timesteps versus a simple linear sequence. The results shown are for the average of total aerosol scattering for the period February through March, 2007, using 1-hour data intervals from the 7-km Ganymed Nature Run simulation.

Another example, shown in Listing 3.2, computes the maximum intensity projection (MIP) along a given axis in a 3D volume by accumulating the largest intensity value of each voxel and presenting a 2D image of the result. This technique is often utilized by neuroscientists to aid their study of the connectivity of imaged neurons within a high-resolution microscopy volume of the cortical tissue of the brain. The unordered loop specified in this script enables the runtime to select the most appropriate order to facilitate rapid convergence of incremental results of the computation. Fig. 3.8 shows the first 10% of the computation of this projection, along with the final result, using both linear and low-discrepancy orderings. Note how much closer the results of the incremental computation are to the final result when using the low-discrepancy sequence. This enables faster, more dynamic comprehension of interactively selected regions of interest within these massive microscopy volumes.

Fig. 3.8 Result of 100 iterations (of 1000 total) for calculating the maximum intensity projection (MIP) along the Z axis of a 2-photon neuronal microscopy volume. Each iteration adds a 2D slice. (a) Linear order. (b) Low-discrepancy order. (c) Final MIP. (Data courtesy Angelucci Lab, University of Utah)

Listing 3.2 Parallel, incremental computation of a maximum intensity projection for a 3D volume. num_threads is a built-in variable.

//
// Maximum intensity projection of microscopy volume
//
neurons = input.volume.neurons;
output = Visus.Array.New();
// Computations partitioned into subvolumes along the z axis.
width = neurons.dims[2] / num_threads;
unordered(P, [0, num_threads]) {
    var I = P[0];
    for (var i = 0; i < width; i++) {
        var slice = Visus.Array.crop(neurons,
            [[0, 0, width * I + i],
             [neurons.dims[0] - 1, neurons.dims[1] - 1, width * I + i]]);
        outputP = Visus.Array.max([outputP, slice]);
    }
    {{
        output = Visus.Array.max([output, outputP]);
    }}
    doPublish();
}

3.5.5 Hybrid Client- and Server-Side Processing

The multiresolution data server contains an identical version of the scripting engine used by the visualization client. Server-side processing can be utilized to perform computations using remote resources and thereby reduce data transmission. For example, when combining many ensemble members into a single average, the amount of data to be sent to the client can be dramatically reduced by first combining the inputs on the server and then sending only the result to the client. On the other hand, if server-side resources are scarce or in high demand, it may be more efficient to transmit data directly to the client, perhaps at lower resolution to reduce network bandwidth. The runtime system specifies whether or not to perform a computation remotely without requiring any modification of the input script, enabling a single script to be executed on either the client or the server. Multiple scripts can be incorporated within larger dataflows mixed between clients and servers for processing. The location in which to execute a computation is currently specified by the user on a per-script basis, but future work will aim to address automatic selection based on available resources.
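The transfer savings from server-side reduction follow from simple arithmetic; the numbers below are hypothetical, chosen only to illustrate the trade-off:

```python
# Back-of-envelope estimate of transfer volume for an ensemble average.
members = 30       # hypothetical number of ensemble members to combine
field_mb = 250     # hypothetical size of one member's field, in MB

client_side = members * field_mb  # ship every member, combine locally
server_side = 1 * field_mb        # combine remotely, ship only the result

print(client_side, server_side)   # 7500 250
```

Here combining on the server cuts the transfer by a factor equal to the number of ensemble members, which is why the choice of execution site matters so much when server resources permit it.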
For the implementation of the runtime system, we extended the Visus Framework mentioned in Section 3.2 to include a new scripting engine that enables execution of generic EDSL scripts in a manner that is both progressive and accurate by making effective use of multiresolution data, asynchronous output, flexible iterator orderings, remote computation resources, and parallelization. A novel data ingest module was also added to automatically resample the various input datasets specified in a script to a common domain during I/O. High-level support was added to the UI for the selection of the various runtime parameters, such as the default order used for multidimensional iterators.

3.6 On-Demand Data Reordering

Our framework utilizes a multiresolution data format to enable processing and visualization of data in a coarse-to-fine fashion, providing the user with very fast access to preliminary results, which can then be progressively refined. Although some simulation frameworks have adopted multiresolution formats as their default output [?], many existing datasets are not stored in this fashion and must be converted prior to being used in a streaming fashion. For our implementation, we selected the hierarchical IDX data format, which offers a trade-off between efficient multiresolution access and fast data creation. It enables a wide variety of applications and at the same time provides for streamlined production in large parallel computation environments [?], [?]. In order to use our framework with other data formats, in this chapter we present a data reordering service that converts requested data on the fly to the multiresolution format utilized by our runtime system. This service operates transparently to the client, enabling access to data from other formats without requiring explicit preprocessing. Fig. 3.9 shows an overview of the on-demand conversion module, in which a particular dataset is converted on the fly if it is not already in the streaming multiresolution format.

Fig. 3.9 Data server with on-demand conversion. Data movement is shown with thick arrows, requests with thin arrows. When data are requested (a), the data server first checks the cache (i); if the data are not cached, they are converted on the fly (ii) and sent to the client (b).

Data requests from the client to the server, respectively labeled "client viewer" and "IDX data server" in this figure, work as follows. When a request for data is made by the client application to the data server along the path labeled a, the server first checks its cache, labeled "reordered data," for the requested data (a query along the path labeled i); if found, those data are read by the server, and the request is fulfilled by sending the data to the client via b. If the data have not already been reordered and cached, the server next makes a call to the on-demand service, labeled "converter," along path ii. The converter service reads the full-resolution data and writes the multiresolution version to the reordered data cache, after which this lossless reordering of the original data can be read by the server and sent to the requester via b. Because they were written to the cache by the converter, the multiresolution data are now available for future requests by other potential users. In our implementation, the cache size is a global configuration option, maintained by periodically removing least recently used data when the size grows beyond the specified maximum. Our on-demand data converter is designed to provide streaming hierarchical versions of data volumes stored in flat formats, such as NetCDF. It operates in a user-directed manner such that specific fields at a given timestep are converted just in time, upon request, without needing to convert an entire dataset.
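The request path just described, a cache check, on-demand conversion on a miss, and least-recently-used eviction, can be sketched as follows (hypothetical Python names; the actual service is part of the Apache-based data server):

```python
from collections import OrderedDict

class OnDemandCache:
    """Sketch of the Fig. 3.9 request path: check the reordered-data
    cache first; on a miss, convert the requested field just in time,
    cache it, and evict least-recently-used entries over budget."""
    def __init__(self, convert, max_entries=4):
        self.convert = convert        # field request -> reordered data
        self.cache = OrderedDict()    # insertion/recency-ordered entries
        self.max_entries = max_entries

    def request(self, key):
        if key in self.cache:             # path (i): cache hit
            self.cache.move_to_end(key)   # mark as recently used
            return self.cache[key]
        data = self.convert(key)          # path (ii): convert on the fly
        self.cache[key] = data
        while len(self.cache) > self.max_entries:
            self.cache.popitem(last=False)  # drop least recently used
        return data                       # path (b): send to the client

conversions = []
srv = OnDemandCache(lambda k: conversions.append(k) or k.upper())
srv.request("tas?time=0")
srv.request("tas?time=0")           # second request is served from cache
assert conversions == ["tas?time=0"]
```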
Data reordering is a computationally light task, and the time required to convert a given volume is dominated by the time to read the original data and write the reordered version. Since initial conversions are cached, the cost of a conversion is also amortized across future requests. Once data are reordered into the IDX data format, they can be streamed in a coarse-to-fine fashion to users with IDX-compatible clients. The Visus IDX Data Server [?] is an Apache plugin that responds to requests for given regions of data at a specified resolution level. The client connection to the server is stateless: individual HTTP requests contain all the necessary information to describe the dataset and the desired region of interest to be retrieved. As mentioned in Chapter 1, the on-demand data reordering module is an independent component capable of being utilized as part of other applications as well. Once a dataset has been converted, it can be opened from any IDX-compatible client. This section describes the on-demand module in detail, beginning with the motivation of data reordering as an efficient mechanism to facilitate coarse-to-fine streaming access, and continuing with pertinent implementation details and an overview of its use within our overall application.

3.6.1 Data Reordering

As implied by its name, data reordering simply changes the layout of data on disk (or in memory) to efficiently facilitate operations such as downsampling and subregion queries. Thus, the size of the data on disk remains similar to that of the original data. Unlike a traditional multiresolution image pyramid, which creates a downsampled copy of the data at each resolution level, IDX reorders the original data such that individual samples are not repeated and the coarse resolution levels come first on disk, making them very fast to load.
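A one-dimensional sketch may help build intuition for this layout (the real IDX format uses a multidimensional hierarchical Z-order; this illustration only shows the coarse-levels-first principle, with hypothetical names):

```python
def coarse_to_fine_order(n):
    """Order indices 0..n-1 (n a power of two) so that coarser levels
    come first: the first entries form a stride-n/2 downsampling, the
    next ones fill in midpoints, and so on. No sample is repeated."""
    order, stride = [0], n
    while stride > 1:
        order += list(range(stride // 2, n, stride))  # new midpoints
        stride //= 2
    return order

print(coarse_to_fine_order(8))  # [0, 4, 2, 6, 1, 3, 5, 7]
```

Reading a prefix of this layout yields a uniform downsampling of the whole domain, and reading to the end recovers the original data exactly, which is the lossless, coarse-to-fine property the text describes.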
Data reordering facilitates more rapid, dynamic analysis and visualization by enabling coarse-to-fine loading of view-dependent regions of large multidimensional datasets. Reordering allows the size of data reads to remain constant while providing fast access to coarse-resolution levels and subregions of interest. Reordering can be done on the fly or during data generation. The finest resolution is identical to the original data, so there is no loss of data fidelity. Furthermore, coarse-resolution levels support both spatial and temporal filtering, so certain guarantees can be added, such as max/min/avg for a given coarse-resolution sample. The mechanism of data reordering utilized by the IDX format is shown in Fig. 3.10.

Fig. 3.10 The mechanism of data reordering utilized by the IDX format. (a) Traditional multiresolution image pyramid, which duplicates the image at each resolution level. (b)-(f) The procession of IDX resolution levels 1-5 as they are stored in a single multiresolution IDX file, with the 2D image at the top and its layout on disk at the bottom.

3.6.2 Integration With OpenVisus Data Server

The implementation of the on-demand service uses Visus' ability to cache retrieved data, so that as data are received the first time, they are saved in a local cache, accelerating future reads. This caching is implemented by describing the dataset as a multiplex of data access sources that are tried in order. The multiplex returns data from the first access that succeeds. The first access of the multiplex checks for the requested data in a local cache; if that fails, the second access requests data from the on-demand data converter, which converts the requested data at the desired region and resolution level.
However, the second access always simply returns a failed status, so the third access, identical to the first, is then tried, and this time there are data to be returned. A Visus data server multiplex access is a general-purpose mechanism that can be used to specify any arbitrary hierarchy of data access types and locations. In order to designate a dataset that will be converted or computed on demand, a bookmark is added to the data server that specifies the multiplex to be used when this dataset is requested. This multiplex includes the local cache location, the method to convert the requested data on demand, and any additional static parameters required for the conversion. An example bookmark entry for the Visus data server can be found in Listing 3.3.

Listing 3.3 Example bookmark entry in the Visus data server to specify a multiplex of the cache location and on-demand path for a given dataset.

<dataset name="DatasetName" permissions="public">
  <access type="multiplex">
    <access type="disk" chmod="r" url="file:///SSD/Dataset.idx" />
    <access type="disk" chmod="rw" url="file:///HDD/Dataset.idx" />
    <access type="network" url="http://$(serverA)/Dataset" />
    <access type="network" url="http://$(serverB)/Dataset" />
  </access>
</dataset>

Reordered data facilitate interactive analysis and visualization that would, in many cases, be impossible if the data remained in their original format. We demonstrate the specialization of this transparent, on-demand data reordering service to provide interactive access to large, remote climate data ensembles in Chapter 5.

3.7 Summary

In this chapter, we described the design and implementation of a framework to facilitate interactive processing and visualization of large, disparately located data ensembles.
Every aspect of these workflows can vary dramatically, such as data size and location, transfer speed, available computational resources, and complexity of the analyses. Our framework utilizes a multiresolution, streaming data format and an EDSL executed by a suitable runtime system to facilitate incremental production and refinement of results. In addition, we created an on-demand data conversion system to enable the framework to be used with datasets not already stored in the streaming, multiresolution format. The EDSL provides a means for scientists to specify the points in their computations at which incremental results can be produced, and the runtime uses these to safely publish those results as they become available during execution. The arbitrarily ordered loops enable analyses to be performed in the most effective order possible, such that incremental results converge most efficiently to the final result. Critical sections ensure that overlapping computations can be performed without incurring errors. These additions were embedded into JavaScript, but they can be added to other languages without much difficulty, and they are currently being incorporated into Python for the latest version of this framework. To enable interactive evaluation of a given workflow, the associated runtime utilizes the multiresolution data format to first perform the work at a coarse resolution and then progressively refine the results, allowing users to see the initial results of their calculations quickly and to interactively specify where to further investigate the data being explored. A variety of decisions are made automatically by the runtime, such as where to get the data, what resolution to retrieve, and which computational resources to utilize, enabling users to focus on the high-level aspects of their work rather than being slowed by manually addressing these issues.
The on-demand conversion system can be specialized for use with a variety of different data formats, enabling streaming access to both local and remote nonstreaming data. Its use for climate simulation ensembles will be demonstrated in Chapter 5. Proper use of caching by the system can help facilitate convenient, interactive access to various types of scientific data, enabling users from fields with many different data types to utilize the framework presented in this work for interactive access and processing of their data.

CHAPTER 4

PROPOSED MODEL FOR INTERACTIVE LARGE DATA WORKFLOWS

The considerations and challenges we encountered during the design and implementation of the interactive workflow for analysis and visualization of large, remote datasets presented in Chapter 3 highlight the lack of a generic model that could be used to construct and compare such systems. To assist in their design and facilitate comparison with other applications, we next propose a generic model to describe and assess end-to-end data processing frameworks from data inception through analysis and visualization. Our aim is to identify the components and considerations necessary to encapsulate a workflow into a generic model so that it can be used to identify bottlenecks and throughput for a given dataflow as data increase in size and become more disparately located. The model described in this chapter provides a comprehensive, high-level description of the requirements for data processing workflows. We hope it will help facilitate the future development and comparison of comprehensive frameworks designed for interactive analysis and visualization of arbitrarily large, disparately located spatiotemporal data ensembles. It can also be used to assess the value of integrating these new systems with existing pipelines, with the goal of enabling more methodical transitions to interactive workflows.
In order to keep the overall description of this model tractable, we break interactive workflows down into three component tasks: acquisition, processing, and visualization. Depending on the dataflow, these tasks may be performed simultaneously or serially. For certain applications (e.g., visualization), it may seem useful to limit design considerations to only a subset of the operations of a given workflow. In our model, however, we strive to include every part of the process, enabling comprehensive evaluation of the potential delays of each member of the entire set of operations (e.g., preprocessing) when considering the overall performance of the workflow.

4.1 Overview of Workflows for Creation, Analysis, and Visualization of Structured Spatiotemporal Data

Our focus for this model is structured, spatiotemporal data that exist on a multidimensional grid, or data that can be converted to such a format. This modality is common in a variety of fields, ranging from climate analysis to combustion simulation to microscopy volumes. Furthermore, the trend for this type of data is toward larger volumes that are increasingly costly to move and store. Such data often reside on a central server, and scientists will perform their analyses remotely or download a portion of the data to be utilized locally. We must first determine the common components of large data analysis and visualization pipelines for these data. Although each stage of the pipeline entails trade-offs in terms of generality and speed, our goal in presenting such an overview is to provide a means to objectively reason about each aspect of the process with regard to its placement within the overall workflow. This abstract data management pipeline can be used to identify and measure trade-offs in terms of quality and interactivity. The following description includes all portions of the process from data creation through final visualization.
Although the framework we presented earlier does not explicitly touch on every stage of this process, presenting all of the steps here provides the structural context required to systematically design and assess other types of workflows as well. It is understood that the time required for each operation can vary dramatically. Therefore, the designers of an application must carefully balance the temporal budget in order to meet the performance requirements of their particular application. The demonstration presented later in this work identifies several opportunities to balance the accuracy and performance of these tasks in order to maintain interactivity when datasets are extremely large and stored in remote locations. We present these operations in a fairly typical order, but this order may vary somewhat for a given pipeline. For example, preprocessing could be performed on the server before the data are downloaded to the client, rather than after the download as shown below. We begin at the beginning, with data creation.

• Creation. Complex simulations may be performed on massive supercomputers, requiring hundreds of millions of hours of computation to produce desired results. These systems are costly to operate, and efficient utilization is of paramount importance. Therefore, data are often organized specifically for the parallel system, whereas the analysis and visualization of the data are often performed later using more modest computational resources. The first step required to perform these downstream tasks is to transfer the data to the analysis system.

• Preparation. Analysis and visualization can take a variety of forms, and they often require data to be presented in the particular format considered most suitable to the application. For example, large spatial data may need to be thresholded or organized into a hierarchical structure prior to interactive visualization [?].
Conversely, small data blocks most suitable as output from a parallel computing system may need to be combined for more efficient ingest by other systems. For some analyses, the bulk of computation could be performed during this step, and only a superficial representation would then be sent to another portion of the pipeline. This task might sometimes be performed on the server prior to transferring data to a client. In the case of the example application presented in this work, we rely on serverside data preparation in order to format the data for efficient remote access, even for clients who do not have access to a high-speed network. Supercomputers utilize data preparation for output by aggregating blocks across multiple cores, rather than writing a single block per core, which would be extremely inefficient. In general, designers of interactive workflows should aim to provide lightweight data preparation. Even acquisition devices such as microscopes might read data into a raw format that must then be prepared for interactive use by conversion to a streaming image format. This preparation could be made more efficient by overlapping the conversion with the acquisition itself.

• Transfer. Often datasets are produced or stored in a location that is impossible or inefficient to access, and therefore they must be moved to a location where they will be accessible. For simplicity, we will refer to the original location of the data as the server and the destination of a transfer as the client (on occasion, the client could itself be an intermediate server). We will often discuss serverside and clientside operations, which also refer to these locations. Note that data might be transferred and processed continuously in small portions, a process known as streaming, or transferred and processed as a whole, which we will call batch. For large data, batch processing is typically an offline operation.

• Preprocessing.
There is also “true” preprocessing, which involves more specialized and heavyweight types of data preparation, such as generating a 3D octree from a stack of 2D images to make data reading faster for interactive rendering; this requires processing the data in their entirety. Such preprocessing can also include creation of metadata that facilitate search and retrieval of the data. In general, efficiently searching using raw data is difficult. Even textual data such as Twitter feeds might be downloaded and preprocessed into a hash that facilitates fast queries. Systems such as NoSQL databases build search structures on the fly as data are being searched and created. As data sizes increase, interactive workflows cannot afford traditional simple, blocking, and potentially time-consuming preprocessing; what they require instead is efficient data preparation that facilitates interactive access to massive data. These types of heavyweight tasks are considered preprocessing by the workflow, yet for the producers of such data, the term used for this work might be postprocessing; this disparity in nomenclature exemplifies the problem that each side is pushing off significant work to the other. A comprehensive design must therefore consider all phases of the workflow.

• Ingest. Finally, the data are ready to be utilized by an application. Data must be loaded into memory before they can be analyzed or visualized. This step is distinct from the data transfer step in that the data are now in the form specifically necessary for the application, and will be expanded or otherwise manipulated in such a way as to be of optimal use for possibly interactive operations. For example, an application might expect to read a compressed data file, but once it has been loaded, it is then expanded to its uncompressed form in order to make use of the data for visualization or analysis.
Data ingest is an operation that is commonly performed while a user waits, and therefore it must be considered for optimization to reduce the delay to the user. If the data to be read are very large, it might be useful to consider strategies that enable some form of subsequent processing and visualization after only a portion of the data have been read, while continuing to read the rest in the background. An online algorithm [?] such as this can provide updated results as more data are ingested, and the competitive ratio of the amount of work to be performed by the online algorithm versus an ideal offline algorithm should be considered. Finally, when data are extremely large, sophisticated caching strategies might become necessary in order for the application to remain responsive and still enable exploration of large amounts of data.

• Runtime Processing. Once loaded, it may be desirable to perform a variety of additional processing operations, such as those required for desired analyses or the final preparations for visualization. These operations could become quite complex and further tax the patience of a user with undesirable delays. Therefore, designers should consider methods for the application to remain responsive even during such potentially lengthy processing.

• Visualization. In order to look at the results of the processing, or simply the dataset itself, further computations or internal data transfers may be necessary, such as sending data to a graphics processing unit (GPU). In addition, an application commonly allows user interaction during visualization in order to permit the user to explore various portions of the data or to change the parameters used to display the data, such as how the data are mapped to the screen in terms of color, texture, or opacity.

• Postprocessing.
Finally, we include this step of the data analysis and visualization pipeline because even after data are analyzed and visualized, it may be desirable to further manipulate the results for the purposes of publication or secondary analysis. This task may even be carried out by a separate application or a different user.

The stages described here are often combined, such as data ingest and runtime processing in the case of an online algorithm. Each operation requires some amount of time, and can be performed to some degree of completeness prior to its results being used in another part of the pipeline. It is rare, especially as data sizes increase, for a system to be able to perform every one of these steps in real time. One must reason about the design of data analysis and visualization workflows when interactivity is desired, and be explicit about which operations must be speedy. We believe such comprehensive reasoning will improve both the performance and the continued advancement of interactive exploratory analysis systems. The design of workflows tasked with comprehensively managing data from inception through insight is a significant challenge. Thinking of them as linear data pipelines implicitly restricts the forms these systems will take upon implementation. In contrast, if we consider the processes comprehensively, from data inception all the way to the insight we hope to gain from it, we can develop applications composed of these processes in a manner that facilitates cohesiveness between cooperating tasks. Consider the linear workflow shown in Fig. 3.1 compared to the nonsequential workflow shown in Fig. 4.1. The figures illustrate essentially the same processes, but the nonsequential workflow shows them without blocking dependencies between the acquisition, processing, and visualization operations.
Using this model, we are able to articulate task orders that would not otherwise seem possible with the sequential pipeline representation (e.g., requiring acquisition to precede visualization). Each aspect of the workflow operates independently using a push/pull mechanism to request data or update information on the server. Even the data server itself could be divided into independent processes and spread across many nodes. The individual modules can also communicate directly with each other.

Fig. 4.1. Nonsequential workflow for interactive acquisition, analysis, visualization, and processing. Data are ultimately the connecting point. This version of the workflow relies on a central data server to manage updates and facilitate production of incremental results, but the data server itself could be distributed across multiple centers or among the nodes of a cluster.

The model proposed in this chapter can be used to describe and assess the performance and interactivity of future interactive systems. Workflows involving the creation, analysis, and visualization of spatiotemporal data typically involve numerous processes, and the notion of interactivity when applied to these workflows might at first seem daunting. Interactivity implies a degree of cohesiveness between the various components of these systems that may not be achievable due to the disparity between these processes. For example, data might be created using a simulation running on a supercomputer, combined with acquisitions from a physical scanner, and eventually displayed on a mobile device. Unifying these disparate processes into an interactive workflow requires utilization of a design that considers the implementation of each component as well as the mechanisms utilized for their interconnection.
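The push/pull interaction just described can be sketched in a few lines. The following is an illustrative, single-process sketch; the class and method names (`DataServer`, `push`, `pull`) are hypothetical, not part of the framework. Independent modules push updates to a central server, and a consumer pulls the latest value whenever it is ready, without blocking on the producer.

```python
import threading

# Illustrative sketch of a push/pull mechanism; names are hypothetical.
# Modules push updates to a central server; consumers pull the latest
# value whenever they are ready, without blocking on producers.

class DataServer:
    """Central point that workflow modules push to and pull from."""

    def __init__(self):
        self._latest = {}
        self._lock = threading.Lock()   # modules may live on other threads

    def push(self, key, value):
        with self._lock:
            self._latest[key] = value

    def pull(self, key, default=None):
        with self._lock:
            return self._latest.get(key, default)

server = DataServer()
server.push("field/temperature", [21.0, 22.5, 23.1])   # producer side

# A visualization module pulls whatever is currently available.
snapshot = server.pull("field/temperature")
```

Because the server stores only the latest value per key, a slow consumer never forces the producer to wait, which matches the independent-module behavior described above.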
In the following sections, we describe the necessary components for the creation of an interactive model for spatiotemporal data workflows and discuss the considerations necessary for their effective utilization.

4.2 Necessary Components for Interactive Workflows

At a high level, designers of interactive workflows must consider several important components that are relevant for such systems. These components include data layouts and storage strategies, the features of the programming language used to describe data analyses, and the underlying runtime system used to manage the workflow and the distribution of computations among its various components. These three components are critical to create workflows suitable for interactive execution. Some degree of integration between computation, data storage, and network resources is required in order to facilitate the communication necessary for computations to proceed. For example, the scripting features used for analyses can help the runtime system clearly identify the subsets of input data to be computed, helping it to inform the requesting components when these data become available.

4.2.1 Distributed Interruptible Runtime System

A runtime system is needed to orchestrate the interactions of the other components. A runtime might be central or distributed, depending on the underlying infrastructure used to implement the workflow. For example, a distributed cluster would have a very different runtime compared to a single node in order to most appropriately manage the various interactions within the framework. Computations are commonly executed using resources with direct access to the necessary input data, and distributing these computations is an important task of the runtime. The runtime is not strictly necessary for every interaction between components, since many types of processing require only the components themselves.
However, central management is still required in order to handle interruptions for user interaction or to direct computations properly. In these cases, the runtime must be able to communicate with the individual components of the workflow, and the components themselves must be designed in such a way as to enable these types of interruptions or modifications even during processing.

4.2.2 Streaming Data Layout and Distribution of Storage

All operations tend to connect at a single point: the data. Therefore, how one manages data storage is of the utmost importance for developing an interactive workflow. Two main issues affect performance with respect to data: layout and storage. The strategy used to define the layout of data can dramatically affect the ability to read those data in an interactive fashion. Therefore, the desired workflow must be considered in order to select a file format with an appropriate data layout and distribution, avoiding a deleterious impact on performance. For example, for large spatiotemporal data, a naive format might require significant processing for such common operations as generating lower resolution versions of the data, whereas this operation would be trivial using a hierarchical data layout. Another major issue is that data may be stored in a distributed fashion, within a cluster environment, or even across nodes that vary widely in geographic location and quality of network connectivity. The latter situation might occur when datasets from two different institutions must be compared, a common requirement for domains such as the study of climate and weather data. The ability to retrieve data in an efficient and prioritized manner is crucial for maintaining the interactivity of the frameworks that use these data.
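As a concrete illustration of why a hierarchical layout makes reduced-resolution access trivial, the following toy sketch builds a simple averaging pyramid over a 1D array. This is a hypothetical stand-in, not the streaming multiresolution format actually used by the framework: the point is only that a coarse view is read directly from a precomputed level instead of being derived from the full-resolution data on demand.

```python
# Toy sketch of a hierarchical (multiresolution) layout: each level of the
# pyramid halves the previous one by averaging adjacent pairs, so a coarse
# view is read directly from a precomputed level rather than derived from
# the full-resolution data on demand. Hypothetical stand-in, not the
# framework's actual streaming format.

def build_pyramid(data, levels):
    """Return [full_res, half_res, quarter_res, ...] for a 1D sequence."""
    pyramid = [list(data)]
    for _ in range(levels):
        d = pyramid[-1]
        pyramid.append([(d[i] + d[i + 1]) / 2 for i in range(0, len(d), 2)])
    return pyramid

data = list(range(16))                 # pretend full-resolution field
pyramid = build_pyramid(data, levels=2)
coarse = pyramid[2]                    # 4 samples summarizing all 16
```

With a naive flat layout, producing `coarse` would require scanning all 16 samples at request time; here the reduced-resolution level is available for immediate, cheap reads.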
4.2.3 Suitable Analysis Language Features to Enable Flexible Interpretation

Finally, the scripting model used to describe the computations performed by users can dramatically affect the interactivity of a given workflow. Some form of interpreted scripting layer can be helpful to achieve interactivity, since requiring precompiled application libraries makes it more difficult to specify ad hoc analyses. However, the scripting must include features that can be flexibly interpreted in order to accurately produce results in a progressive manner. The mechanisms for evaluating these analyses must be designed to provide incremental results for long-running operations, and to allow these computations to be interrupted by the user. In addition to performing operations on small portions of data in a streaming fashion, progressivity and interruptibility can be facilitated using special language constructs and a flexible runtime. The scripting interpreter must work with the runtime in order to take into account the distribution of the computations and properly manage updates of the underlying data storage in order to prevent parallel computations from overwriting each other’s results. Strategies such as random selection of the input blocks of large datasets or of the indices of long-running loops can lead not only to reduced contention but also to potentially faster convergence of the incremental results. We utilized such strategies for the EDSL presented in Chapter 3. A scripting language is just as important for what it does not do as for what it does. For example, the EDSL we presented does not explicitly specify the order of loops, and in this way it is able to perform such computations more effectively using parallelization and out-of-order evaluation. This ability enables faster convergence of the incremental results produced by these loops, as will be shown in the examples presented in Chapter 5.
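The randomized-order strategy mentioned above can be sketched as follows. This hypothetical example (the function and variable names are ours, not the EDSL's) visits input blocks in shuffled order while maintaining a running mean; the intermediate estimates depend on the visit order, but the final result does not.

```python
import random

# Hypothetical sketch of out-of-order evaluation: input blocks are visited
# in shuffled order while a running mean is maintained. Intermediate
# estimates depend on the visit order; the final result does not.

def running_mean_over_blocks(blocks, seed=0):
    """Yield the current mean estimate after each randomly chosen block."""
    order = list(range(len(blocks)))
    random.Random(seed).shuffle(order)   # randomized block selection
    total, count = 0.0, 0
    for i in order:
        total += sum(blocks[i])
        count += len(blocks[i])
        yield total / count              # incremental result

blocks = [[1, 2], [3, 4], [5, 6], [7, 8]]
estimates = list(running_mean_over_blocks(blocks))
```

Because the loop body is order-independent, a runtime is free to parallelize it or reorder iterations, which is the property the EDSL exploits by not fixing loop order.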
4.3 Design Considerations for Interactive Workflows

Effective scalable interactive workflows depend on utilizing an appropriate runtime coupled with expressive features of the scripting language for analyses and a suitable streaming data layout, but the mere presence of these elements is not sufficient to enable interactivity. Because the interconnection between components is critical for enabling interactive workflows, three important strategies must be considered for their design: production of incremental analysis results, progressive refinement of output data, and interruptibility of the workflow at any time. These strategies are necessary at every level since any one component in the chain can cause the others to be delayed and therefore reduce the interactivity of the entire system. For example, the visualization component of an application must operate incrementally and progressively, and it should always be interruptible. As one of the primary interfaces through which users interact with their data and computations, it is imperative that such components remain interactive.

4.3.1 Incremental Advancement of Computation Results

Algorithms that produce the results of computations directly as the input data are being read are called online algorithms [?]. In contrast, offline algorithms have access to their complete input data. The competitiveness of an online algorithm is the ratio of its performance versus that of its optimal offline counterpart, if one exists. Workflows that incorporate progressivity might use online algorithms to show the results as they are being produced. These algorithms enable early detection of errors or anomalies, as well as possibly useful preliminary and intermediate results for long-running computations. Online algorithms can also be used to facilitate computational steering to modify inputs or direct a simulation or analysis toward a desired region of interest.
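As a minimal example of an online algorithm in this sense, the following sketch uses Welford's method to maintain a running mean and variance over a stream, yielding a usable intermediate result after every sample rather than waiting for the complete input.

```python
# Minimal sketch of an online algorithm: Welford's method maintains a
# running mean and (population) variance over a stream, producing a usable
# intermediate result after every sample instead of requiring the complete
# input first.

def online_mean_variance(stream):
    """Yield (count, mean, variance) after each incoming sample."""
    count, mean, m2 = 0, 0.0, 0.0
    for x in stream:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
        yield count, mean, (m2 / count if count > 1 else 0.0)

stats = list(online_mean_variance([2.0, 4.0, 6.0]))
final_count, final_mean, final_var = stats[-1]
```

Every intermediate `(count, mean, variance)` tuple is itself a valid summary of the data seen so far, which is what allows early anomaly detection and preliminary results for long-running computations.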
If necessary, offline analyses can be performed later once the desired specifics of a computation are determined using interactive exploration with online algorithms. Note that we may not have control over the input stream, such as data coming from an acquisition device like a microscope, but interactive workflows must accommodate this potentially unsteady pace. When we do have the ability to select the order of input processing, we can utilize this flexibility to enable faster convergence of incremental results to a final solution, as demonstrated in the EDSL presented in Chapter 3. Although the speed of convergence may change when we lack control over the input order, the result will be the same.

4.3.2 Progressive Resolution Refinement of Output

When input sizes are very large, even a single iteration can become unwieldy and interactivity can diminish. For such massive data, one strategy is to perform an initial computation using a reduced resolution version of the input data in order to quickly compute a cursory result and then use the higher resolution data to refine the computation. A variety of such iterative refinement techniques exist that can be used to rapidly produce cursory results and then successively improve them. In our example application, we demonstrate one such strategy by utilizing an efficient multiresolution input data format that can be quickly accessed at both full and reduced resolution. Workflows may also take into account algorithmic considerations such as computational accuracy and variable processing time. Algorithms can be dynamically adjusted to accommodate the amount of available time, data, and processing power. Using online error-bound analytics, long-running computations can be curtailed such that they will finish an iteration within interactive time frames.
This viewpoint of “good enough” might be contrary to the goals of a given workflow, in which case other strategies may be necessary, such as producing results in an iterative fashion, incrementally improving them or computing on subsets of data.

4.3.3 Interruptibility to Interactively Guide Computation

Perhaps the most obvious necessity for an interactive framework is the ability to interrupt computations while they are in progress rather than force the user to await their completion. Interruptibility facilitates interactive exploration by allowing the user to modify both the inputs to a workflow as well as the ongoing analyses. However, this feature affects the implementation of both the individual components as well as their interconnection. Therefore, interruptibility must be considered at every stage of the design. The degree to which a workflow can be utilized interactively depends on its ability to be interrupted by the user. Techniques such as background processing are necessary but not sufficient to facilitate such workflows. The individual processing components must, to some degree, be able to be paused or stopped during their operation. In graphical applications, the notion of a display loop incorporates user interruption in the form of interactively changing viewpoints within a virtual environment. There may be strict performance requirements for a given application depending on its intended usage. For example, 3D virtual reality applications typically require at least a 90 Hz refresh rate or else the user may experience disorientation or nausea.

4.4 Component Integration

Considering comprehensive, interactive workflows from data inception through visualization and analysis can make it difficult to distinguish the boundaries between the overall system and the individual algorithms being utilized.
The runtime must facilitate communication between disparate components, which requires some notion of their state, but too much transmission and storage of state could adversely affect the ability of the application to remain responsive. Individual components should be able to communicate directly but also continue to operate within the rest of the workflow. Effective integration of these components requires a high-level understanding of both the components themselves and the way they communicate with each other. This communication is critical within a given workflow, including between the components of the workflow, the user interface, and the runtime system. The most important considerations for maintaining interactivity when integrating a set of components within a workflow are communication, data movement, and state. Applications designed for interactive data exploration must be able to scale as data sizes increase. Decisions about the design of the overall workflow, such as the granularity of data transfers or the degree of statefulness of the individual components, can affect the required amount of communication as well as its tolerance to variations in underlying resources such as network latency and bandwidth. Here we describe several specific considerations pertinent to the implementation of any interactive workflow, such as the most appropriate mechanisms for communication, flexibility to changes in the underlying computation and communication resources, and the degree and type of caching appropriate for a given workflow. In regard to the users, interfaces must communicate the state of the workflow in an effective manner. Users should be aware of ongoing computations and updated results, even if a long computation is ongoing and cannot be interrupted. Uninterruptible computations may violate interactivity, but it is even more vital for the application to communicate that fact to the users in order to manage their expectations.
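The point about communicating workflow state to the user can be illustrated with a small sketch. Here a long-running (and, for simplicity, uninterruptible) computation invokes a hypothetical `report` callback at each step so that a user interface could display progress and manage expectations; the function and callback names are illustrative only.

```python
# Illustrative sketch: a long, uninterruptible computation reports its
# progress through a callback so the interface can keep the user informed.
# The function and callback names are hypothetical.

def long_computation(items, report):
    total = len(items)
    result = 0
    for i, x in enumerate(items, start=1):
        result += x * x                 # stand-in for real work
        report(done=i, total=total)     # keep the user informed
    return result

messages = []
result = long_computation(
    [1, 2, 3],
    report=lambda done, total: messages.append(f"{done}/{total}"),
)
```

Even when the work itself cannot be interrupted, the stream of status messages lets the interface show completeness, which is the expectation-management behavior described above.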
4.4.1 Nonblocking Communication

Even modern networks are susceptible to interruptions or fluctuations in bandwidth and latency, and such considerations are necessary when designing workflows that rely on network communication to operate. For example, a computation may be required to pause while other processing takes place if the inputs are not available quickly enough to maintain interactivity. User interfaces must be designed to provide feedback regarding the status of such ongoing computations, indicating their level of completeness as well as any measure of accuracy that can be determined. Individual components must be able to convey their state as well as their needs to other members of the workflow. The communication between individual components and the runtime should be sufficient to allow slow-running operations to complete in their own time without blocking the rest of the workflow, but not so overwhelming as to hold up the progression of computation due to too much synchronization.

4.4.2 Direction of Data Movement and Effective Caching

Effective design must be utilized to mitigate the inevitable delays involved in the transmission of data within a workflow. The direction of data movement can have a significant impact on interactivity due to the delays that might be introduced by the need to complete a read or write prior to continuing a given operation. Within the local workflow, many operations might be handled in a “push” style, which can be acceptable due to the locality of the receiving component. However, if a downstream receiver is busy with another operation, it may be responsible for an undesirable delay in the sending module. When this situation is likely, it may be more appropriate to select a “pull” model in which the receiving component requests data only when it is ready and able to process the data.
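A minimal sketch of the "pull" style just described, using Python's standard `queue` module: the producer deposits chunks, and the consumer takes the next one only when it is ready, with a nonblocking get so an empty queue never stalls it. The chunk names are illustrative.

```python
import queue

# Sketch of a "pull" model with nonblocking communication: the producer
# deposits chunks into a queue; the consumer pulls the next one only when
# it is ready, and a nonblocking get means an empty queue never stalls it.

inbox = queue.Queue()
for chunk in ("chunk-0", "chunk-1"):
    inbox.put(chunk)                     # producer side

processed = []
while True:
    try:
        item = inbox.get_nowait()        # consumer pulls when ready
    except queue.Empty:
        break                            # nothing waiting: go do other work
    processed.append(item.upper())       # stand-in for real processing
```

Because the consumer controls the pace, a busy receiver delays only itself, never the sender, which is the advantage of the pull model over push noted above.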
In addition, the selection of push versus pull strategies for data movement might require some form of caching on either end of the transaction, depending on the nature of the computation being performed. For example, some computations, such as those required for applying a color palette prior to visualization, may be relevant only for the most recent data, in which case newly arriving data need not be queued, and instead they can simply replace any data that already exist in a module’s cache buffers. However, computations that perform some comprehensive analysis of data, such as computing a temporal average, may be required to queue incoming or outgoing data. Network communications should be dynamically bundled according to the measured performance of the underlying transport layers, and caches should be utilized when they are available. The most effective strategies to select may not be obvious. For example, we might prefer to use a pull model for networking, but in a deep workflow pipeline we find that push is superior since it facilitates accumulation of intermediate results between components. Each application must consider the scope of its usage in order to develop the most appropriate design for the underlying framework. By storing some data locally, or near a given computing resource, we can improve the underlying performance of the framework and its ability to support offline modes of operation. However, additional logic is required to handle data updates to ensure that caches remain in sync as much as possible. Since caching issues can be complex and subtle, it is best for such modules to be automatic rather than manually specified, allowing the overall system to select the best form of caching for the computation at hand. Given the ubiquity and importance of caching both within and between components of an interactive workflow, this topic must be carefully and comprehensively considered for its design and assessment.
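The two cache behaviors described above can be sketched as follows; both classes are hypothetical illustrations. A latest-only buffer suits display-style consumers such as palette mapping, while a queueing cache suits comprehensive analyses such as a temporal average that must see every item.

```python
from collections import deque

# Hypothetical illustration of the two cache behaviors described above:
# a latest-only buffer for display-style consumers, and a queueing cache
# for analyses that must see every item.

class LatestOnlyCache:
    """New data simply replaces whatever is currently buffered."""
    def __init__(self):
        self.value = None
    def put(self, item):
        self.value = item

class QueueingCache:
    """Every item is retained until the consumer drains it."""
    def __init__(self):
        self.items = deque()
    def put(self, item):
        self.items.append(item)

display, averager = LatestOnlyCache(), QueueingCache()
for frame in (10, 20, 30):
    display.put(frame)       # visualization needs only the newest frame
    averager.put(frame)      # the temporal average must see all frames

temporal_average = sum(averager.items) / len(averager.items)
```

Choosing between the two policies per consumer is exactly the kind of decision the text argues should be made automatically by the system rather than manually specified.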
4.4.3 Minimal Shared State for Module Independence

The state of a workflow encompasses a complete description of its current operation, which ideally allows the workflow to be restarted at any given point. It is important to understand how the existence or omission of state influences the ability of a workflow to remain interactive. On the positive side, a shared state enables terse communication between components (e.g., “rotate 45 degrees”). However, the degree of this coupling can lead to synchronization errors, particularly for geographically distributed systems. Therefore, the designers of a workflow must take great care to define a minimal set of variables agreed upon as the shared state, and to thoroughly consider potential failures with respect to the state that is tracked. In order to assess the potential for errors due to a poorly synchronized shared state, designers must first comprehensively articulate the shared state of the workflow, then create methods for its components to assess how far off they might be from the global state, and finally identify the probability of components falling out of sync with one another (e.g., based on their geographic distribution, frequency of updates, etc.). By minimizing the amount of shared state, synchronization errors can be mitigated.

The central runtime controller might be an obvious component required to store some notion of the workflow state, but its individual components need to keep track of their own state as well. There are trade-offs between the amount of state required for the operation of an application and its interactivity. In general, storage of state requires some form of token-based transaction to mediate communication between modules, which may involve locking or other mechanisms that could reduce interactivity.
One significant challenge arises when workflows become distributed, which requires that each component keep track of its respective state; these various states must then be synchronized in some fashion to avoid errors. Because of the potential for problems, it is necessary to utilize objective, concise communication between components, and to avoid any assumptions about the overall state.

4.5 Workflow Assessment

Future workflows can be assessed using this proposed model, and the model itself could be refined to explicitly specify the timing of tasks, enabling more objective comparisons. The level of interactivity to be targeted can be determined by incorporating temporal thresholds in describing the requirements for a given task. For example, people who are fighting forest fires must assess the areas to which a fire is most likely to spread. They may need to quickly integrate weather, climate, geography, and satellite data. Since the computation is already dealing with probabilities, an approximation would constitute a suitable result, and therefore providing a quick and cursory answer is superior to a slow but more accurate solution to this problem. Considering the availability of computational resources, if the user has only a laptop and a wireless connection, then certain summary techniques will be invaluable for getting any result at all, whereas if the user has access to an idle supercomputer, then computing the full solution might be trivial and quick. We can assess the scaling behavior of a proposed workflow using knowledge about the location of data, the availability of computational resources, and the algorithmic complexity of each portion of the computation. Combined with identification of the desired level of interactivity, this assessment aids in the understanding of interactive workflows in order to provide the most appropriate design for their intended usage, whether it be for raw computation or interactive exploration.
One might wish to compare two workflows that ultimately produce the same result. An example might be a processing pipeline that computes a predetermined final result versus an application for interactive exploration that incrementally produces partial results. These two systems might be designed very differently, and although the first system might be able to calculate a final result in less time than its counterpart, its lack of incremental results could detract from its overall utility for interactivity. This metric of overall execution time is useful for comparing various implementations. However, the value of the application with the slower overall time to complete a given computation rests in its ability to facilitate interactive exploration. Thus, we augment the measurement of overall time to completion with two other metrics: update frequency (i.e., speed) and responsiveness. The proposed model could eventually be used to identify the following critical metrics for evaluating end-to-end data creation and analytics workflows: 1) time to completion, indicating the time required to perform a complete computation and produce a final result; once known, the time for the complete computation can be divided by the update frequency in order to determine the number of incremental results that will be produced prior to the final computation; 2) responsiveness, defined as the time between a user interruption at some point in the workflow and when the first updated computation result becomes available; it may be possible to interrupt the workflow early in the computation, such as for steering simulations, or much later, such as when changing viewpoints of a visualization, and therefore responsiveness can be described at every point where interruption is possible; startup costs are considered as an interruption at the very beginning; 3) update frequency, defined as the time it takes for some (possibly incomplete) result to be produced by the workflow.
Since a pipeline might amortize some processing and data transfer time between its nodes, we consider update frequency when the pipeline is completely full and operating at maximum throughput; 4) computational utilization, the ratio of the total computational capacity of a given platform to the portion in use by the data processing workflow; and 5) I/O utilization, the ratio of the total possible data throughput for a given system to the throughput achieved by the workflow.

In the following chapter, we demonstrate specific interactive data analysis and visualization applications built on the framework presented in Chapter 3. The general model just proposed could be used in the future to assess the strengths and weaknesses of these applications. Its utility and importance would increase with refinement by enabling more objective design and comparison of future interactive workflows.

4.6 Summary

In this chapter, we have presented a practical model that includes the components necessary for the design of interactive, comprehensive workflows for spatiotemporal data, from inception through analysis and visualization, empowering designers with the necessary considerations for the effective implementation of new workflows as well as integration with existing systems. The basic requirements for all such workflows include a flexible underlying runtime system, an appropriate streaming data layout, and scripting-language features that allow expressed analyses to be performed by the workflow in a flexible manner. For scaling to arbitrarily large data, designers must observe three critical techniques: incremental production of analysis results, progressive refinement of those results, and interruptible components within the workflow.
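These five metrics can be collected into a simple structure. The sketch below is a hypothetical container, not part of the proposed model's implementation; it also shows how time to completion divided by update frequency yields the number of incremental results:

```python
from dataclasses import dataclass

@dataclass
class WorkflowMetrics:
    """Hypothetical container for the five proposed metrics (seconds, except ratios)."""
    time_to_completion: float   # time to produce the final result
    responsiveness: float       # delay from interruption to first updated result
    update_frequency: float     # time between successive (partial) results
    compute_utilization: float  # fraction of platform capacity in use
    io_utilization: float       # fraction of possible throughput achieved

    def incremental_results(self) -> int:
        # number of partial results produced before the final one
        return int(self.time_to_completion / self.update_frequency)

m = WorkflowMetrics(600.0, 0.5, 5.0, 0.4, 0.7)
print(m.incremental_results())  # 120 partial results during a 10-minute computation
```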
Other critical factors that influence overall workflow performance include effective management of data movement and caching, lightweight statefulness of the system, and the methods of communication between components. Although individual components of a given application might be considered independently, evaluation of interactive workflows must comprehensively consider all aspects of the system. Such aspects include the rate and quality of progressive data delivered from both creation and analysis tasks, the workflow’s ability to remain interactive with respect to varying network or storage latencies, and the capability to supersede ongoing tasks in order to explore data without disruption.

We have presented the necessities for all interactive workflows. Designers might also consider features needed for specific domains or to meet particular goals; for example, aspects such as security could also affect the performance of an application. In the future, a more formal model could be developed to incorporate all the techniques illustrated here in a more robust fashion. This chapter identified the important components and considerations for a model to be used in the design and comparison of data processing workflows. A more formal model should explicitly consider the update frequency, degree of interruptibility, quality of incremental results, and the degree of scalability as both data and computational resources become larger or more disparately located.

CHAPTER 5

EXAMPLE APPLICATIONS USING THE INTERACTIVE FRAMEWORK

In this chapter, we demonstrate and assess the usability of the interactive framework presented in Chapter 3 for various analysis scenarios in real-world scientific applications. Some of the EDSL scripts are presented here; all other scripts for constructing these workflows can be found in Appendix B.
The specific use cases demonstrated here include combustion simulations and interactive analysis and visualization of large, disparately located climate and weather simulation ensembles. These types of datasets are created by numerous institutions around the world and can range in size up to several petabytes. Due to the importance of climate science for understanding and adapting to rapid changes in our planet’s climate, it is imperative that these results be disseminated and compared by scientists worldwide. However, the sheer volume of data significantly impedes distribution, even when utilizing very fast networks and formidable computational resources. Therefore, these data are often analyzed using offline processes, and the costly nature of these computations entails an overly conservative selection of analyses. The creation of a generic model for the design and comparison of arbitrary data processing workflows allows us to assess the use of our application for progressive, interactive spatiotemporal data analysis and visualization. For this assessment, we consider several of the techniques presented in Chapter 3 that facilitate interactivity in our application. In particular, in this work we utilize a simple reordered data format called IDX that enables coarse-to-fine access to arbitrary subregions of large, regular-grid spatiotemporal data. By utilizing this format for the remote data servers, we can perform interactive visualization without expensive preprocessing, obtain cursory results of computations involving massive data, and easily share data among many disparately located users. However, since existing climate data may not already be in this format, we also provide a module for on-demand conversion of these datasets to the streaming format. We also utilize techniques for incremental computations that provide ongoing results of long-running analyses.
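The coarse-to-fine access that a reordered layout like IDX provides can be illustrated in one dimension by a bit-reversed visiting order, in which any prefix of the sequence covers the domain roughly uniformly. This is only a sketch of the idea, not the actual IDX/HZ-order implementation:

```python
def bit_reversed_order(n_bits):
    """Visit indices 0..2**n_bits - 1 in bit-reversed order, so that any
    prefix of the sequence forms a roughly uniform (coarse) sampling."""
    def reverse_bits(i):
        r = 0
        for _ in range(n_bits):
            r = (r << 1) | (i & 1)
            i >>= 1
        return r
    return [reverse_bits(i) for i in range(1 << n_bits)]

order = bit_reversed_order(3)
print(order)  # [0, 4, 2, 6, 1, 5, 3, 7] -- each prefix spans the domain evenly
```

Streaming samples in such an order lets an incremental computation refine a full-domain estimate instead of sweeping the domain end to end.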
Finally, we include methods to integrate our framework with existing pipelines, showing that methodical transitions from existing workflows can be achieved. The generic model presented in Chapter 4 serves as a foundation upon which to evaluate the workflow of our application. It can also be utilized to identify potential bottlenecks as data increase in size, computational resources exhibit unexpected delays, or either data or computational resources become more distributed. The proposed model can also show how these workflows might be improved.

Evaluation of interactive workflows must consider several aspects of the overall system. For example, in contrast to a simple renderer, frames per second would be an insufficient metric, because rendering should be decoupled from data production, analysis, and caching. Although individual components of the application might be considered with respect to such metrics, we must consider aspects that affect interactivity in general. These aspects include the rate and quality of progressive data production for both creation and analysis tasks, the system’s ability to remain interactive with respect to varying network or storage latencies, and the capability to supersede ongoing tasks in order to explore data without disruption. Ultimately, a comprehensive assessment in terms of our proposed model will provide guidance to designers, helping them to use their resources most effectively to produce interactive systems for data acquisition, processing, and visualization. By emphasizing an overall consideration of such applications, it will be possible to identify the most salient high-level design choices and to discard less impactful solutions.

5.1 CFD Simulation

Computational fluid dynamics (CFD) simulations solve partial differential equations on structured or unstructured grids, making use of various computational techniques to simulate the underlying physical phenomena.
CFD simulations are crucial for modeling and analyzing complex chemical and physical processes, and may be used, for example, in the search for more efficient energy utilization. One such environment is S3D [?], which has been integrated with the PIDX library [?] to directly produce multiresolution output that can be utilized by our runtime system. These simulations can produce up to terabytes of data per timestep and involve extremely large domains with up to hundreds of fields. However, by utilizing multiresolution data and progressive refinement, scientists can rapidly explore preliminary results of a variety of computations before committing to a final static analysis.

5.1.1 Multifield Analysis

One standard analysis of CFD simulations that model efficient fuel combustion is to explore discrete regions of burning flame within a specific range of mixed fuel, since the optimal burning condition is usually achieved within such a range. Typically, scientists compute a derived field offline by iterating through the full-resolution volumes of both the mixture fraction and OH fields and masking the OH field by mixture fraction thresholds. Despite being a simple operation, this process is computationally intensive due to the sheer size of the data. As a result, scientists often cannot repeatedly experiment with different thresholds to select the best one. Compared to the commonly used workflow, our application enables scientists to interactively explore different threshold values using a simple script. Fig. 5.1 illustrates the process of applying the threshold. The OH field is shown in Fig. 5.1(a), and the masked OH field where the mixture fraction of fuel and oxygen is between 36% and 40% is shown in Fig. 5.1(b). Utilizing coarse-resolution data allows very rapid cursory exploration that can be refined as necessary. The interactive exploration facilitated by this work ensures that important data are neither missed nor resources wasted on unnecessary computation.
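The thresholding described above amounts to a per-sample mask. A minimal pure-Python sketch follows; the field values and thresholds here are illustrative, and this is not the EDSL itself:

```python
def mask_by_threshold(oh, mixture_fraction, lo=0.36, hi=0.40):
    """Return OH values where the mixture fraction lies in [lo, hi]; zero elsewhere.
    Both inputs are flat lists of samples at matching grid positions."""
    return [o if lo <= m <= hi else 0.0
            for o, m in zip(oh, mixture_fraction)]

oh = [1.0, 2.0, 3.0, 4.0]
mix = [0.10, 0.37, 0.39, 0.55]
print(mask_by_threshold(oh, mix))  # [0.0, 2.0, 3.0, 0.0]
```

Because the operation is per-sample, it applies unchanged to data at any resolution level, which is what makes coarse-resolution threshold exploration cheap.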
5.1.2 Localized Computations

Certain analyses are intended to be carried out only in specific subregions of the domain being simulated. Furthermore, for some simulations it is desirable to perform certain analyses only near the end of the simulation time. Fig. 5.2 shows the average of the O2 moles field around the ports of the coal boiler being simulated during only the last second of the simulation. Forcing users to customize the simulation to produce such specialized output for a given model only complicates the process, which increases the possibility of error. Interactively exploiting the locality of analyses, such as the BSF temperature around the nozzles of a particular boiler model over the last second of a 5-second simulation, or working with various other fractions of the data, facilitates more exploration and discovery of unanticipated issues in the results. The ability of our framework to localize the results of a given set of analyses in both time and space enables considerable savings in terms of both computation and storage, so these resources can instead be utilized to further the progress of the underlying simulation in a variety of forms, ranging from increased precision to the saving of additional desired fields.

5.1.3 Draft Computation Accuracy

The utility of cursory computations can be demonstrated by comparing the accuracy of using various fractions of the data and the time required for their computation.

Fig. 5.1 Exploring discrete regions of burning flame within a specific threshold of mixed fuel. (a) The original OH field. (b) The application of the mask to the original OH field where the mixture fraction of fuel and oxygen is between 36% and 40%. (Data produced with the S3D application, courtesy Jackie Chen, Sandia National Laboratory.)

Fig. 5.2 CFD simulation of a BSF coal boiler: (a) shows the scripting interface along with the result of the average temperature computation; (b) shows the results of the computation of the average O2 moles around the injection ports of the coal boiler during the last second of a 5-second simulation.

In the case of the average BSF temperature, the simulation precomputed and saved this value as a specific field in each timestep. We utilized this precomputed result to verify that both the EDSL script and the associated runtime were computing the correct result for this analysis. In addition, we wanted to demonstrate the effectiveness of incremental results for this type of analysis. Fig. 5.3 shows the difference between increasingly less frequent sampling using our computation and the exact solution computed by the simulation. The result on the left is less than 1% different from the exact solution saved by the simulation, even though it uses only 3% of the original simulation data. Furthermore, the results of increasingly cursory computations of the same analysis are shown from left to right, where the computation on the right uses only 1/100 of the time and data and still provides a suitably useful initial result. As with the climate simulation analyses shown later, these results provide a strong argument for the utilization of incremental, multiresolution analyses for the exploration of spatiotemporal simulation data.

5.1.4 Postsimulation Computations

It is common practice in some simulation codes to precompute certain analyses expected to be utilized by scientists during their assessment of the results. Such precomputed results can be convenient (as well as more accurate for simulations that do not store output from every timestep), but they require additional computation and storage resources.
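The accuracy/cost tradeoff shown in Fig. 5.3 can be mimicked with a toy experiment: compute an average from progressively sparser subsamples and compare it with the full-data mean. The signal below is synthetic and purely illustrative:

```python
def subsampled_mean(values, stride):
    """Average using every `stride`-th sample, emulating a cursory draft computation."""
    subset = values[::stride]
    return sum(subset) / len(subset)

# synthetic, smoothly varying "temperature" samples
data = [1000.0 + 0.01 * i for i in range(30000)]
exact = subsampled_mean(data, 1)
for stride in (1, 10, 100):  # 100%, 10%, 1% of the data
    approx = subsampled_mean(data, stride)
    rel_err = abs(approx - exact) / exact
    print(f"1/{stride} of the data: relative error {rel_err:.2e}")
```

For smooth fields, even 1% of the samples yields a tiny relative error, which is the behavior the figure demonstrates on real simulation output.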
In addition, there are numerous cases where desired analyses are identified only after the simulation completes, such as restricting the timeframe over which such analyses are performed. The example shown in Fig. 5.3 demonstrated that one of these precomputed results could be reproduced with nearly the same accuracy using only the postsimulation output. Since these results can be computed to within some measurable degree of accuracy, in a tiny fraction of the time required for the original simulation to produce them, a formal assessment of their computation and storage costs, as well as their necessity to the scientists, could be used to determine whether they are actually worth including in the original simulation or are better computed after the fact during its analysis. Fig. 5.4 demonstrates the computation of the standard deviation of the BSF temperature, an analysis not included in the original results of the aforementioned CFD simulation of a coal-fired boiler, computed for the last second of the 5-second simulation. The selected time range was identified by scientists only after looking at the final results of the completed simulation. As can be seen in the figure, cursory computations that use even a tiny fraction of the original data still provide an estimate of the overall result likely suitable for its analysis, using only a tiny fraction of the comparable resources.

Fig. 5.3 The results of computing the average temperature of a CFD simulation during the last second of a 5-second boiler simulation. Each image from left to right shows the comparison of the computation with the original result when computed using increasingly fewer samples. Note that the original result is computed by the simulation using 30x more samples, which are not available for postsimulation analysis because saving so much data would exceed storage limitations.

Fig. 5.4 From left to right, this image shows the computation of the standard deviation of the temperature of a CFD simulation during the last second of a 5-second boiler simulation. Since this value was not computed originally by the simulation, only 1/30 of the timesteps are available for the analysis. Based on the results of the previous comparison, we believe this is still less than 1% in error versus an inline analysis that uses data from every timestep.

5.2 Climate Simulation

Governments and organizations have undertaken global climate research to understand the primary causes of the unusual warming observed over the past several decades, as well as to determine the extent to which this warming can be mitigated by changes in human behavior, such as a reduction in carbon dioxide emissions. According to domain scientists, as computational capability increases, these models become more sophisticated, and the size of climate simulation output grows dramatically (up to petabytes for simulations with extremely high spatial and/or temporal resolution). The increasing size and complexity of climate datasets have placed a huge burden on scientists seeking to perform analysis and visualization tasks effectively. By utilizing the proposed workflow, scientists can streamline and automate large amounts of manual work such as downloading, converting, and resampling.

In this section, we demonstrate the use of our framework with an example application focused on enabling interactive exploration and analysis of massive, disparately located climate data ensembles. The application provides for arbitrary user analysis and visualization of these data, including an EDSL and associated runtime to support incremental processing, cursory analyses, decoupled interactive visualization, and on-demand data conversion. It is accessible through both web-based and desktop clients, enabling interactive remote exploration of massive data ensembles ranging up to petabytes in size.
Analysis tasks can be concisely expressed using our EDSL, and the convenience of the framework enables scientists to focus their energy on core analysis tasks. The time savings encourage them to experiment more, and this experimentation has already enabled the discovery of an error in a widely used public dataset.

5.2.1 Multimodel Ensemble Comparison

Since numerous climate models have been developed, and the results of each of them might differ significantly, one of the important tasks for climate researchers is to validate these models against historical observations as well as to compare them with each other [?]. These models can then be used in experiments that try to predict future climate under a variety of conditions, such as increased or decreased anthropogenic emissions. For each model (and a given experiment), a collection of runs is generated, each with different parameters and/or initial conditions; such a collection is often referred to as an ensemble. These models are created by different institutions with different computational resources, and therefore the grid resolutions of the output data usually differ. As a result, resampling is necessary for comparison. Compared to the tedious manual workflow that is often adopted by domain scientists, our application can utilize the remote data directly, streaming even very large data interactively at reduced resolutions, refining the data as necessary, and implicitly resampling the requested datasets to a common resolution for proper comparison. In Fig. 5.5(a), we visualize the temperature average from an ensemble of the FGOALS model (12-run ensemble). The average of the same experiment for the MIROC5 model (12-run ensemble) is shown in Fig. 5.5(b). The average and difference of these two models are illustrated in Fig. 5.5(c) and (d). As we can easily see from Fig.
5.5(d), these two models demonstrate the greatest divergence in the area between the Tibetan Plateau and the Indian subcontinent. By using our application, such observations can be obtained on the fly without tedious data conversion and grid resampling.

5.2.2 Annual Zonal Average

Another interesting analysis that can be applied to climate data is called a zonal average. The temperature field zonal average in Fig. 5.6(a) summarizes the daily data for a whole year in one figure. At each latitude, the average over the entire line of longitude is computed for each day's data. In the plot, each vertical line along the x-axis corresponds to one day's planetary average. As we can see in Fig. 5.6(a), the temperature corresponding to each latitude changes over time, indicating seasonal variation.

Fig. 5.5 The comparison between climate simulation model ensembles.

Fig. 5.6 Annual zonal average of (a) the temperature field and (b) the humidity field, plotted as latitude versus days in a year. In (a), the daily spatial temperature average changes as we move along the temporal axis, which illustrates the change of seasons in a year. In (b), the duplication error in the humidity data is indicated by the 30-day bands along the temporal axis.

By utilizing our application for cursory exploratory analysis, the scientist Jiwoo Lee also discovered a serious set of errors in the daily 3D data from NIMR (the Korean National Institute of Meteorological Research). As illustrated in Fig. 5.6(b), the zonal average shows unnatural bands in the horizontal (temporal) direction, which indicates unchanging daily data for each 30-day period. For each month in this particular ensemble, a single day's data were erroneously duplicated for the entire month. Once we observed the flaw in one field, it was trivial to check the other 3D fields that also exhibited the error by simply changing the variable name in the script.
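A zonal average is simply a per-latitude mean over longitude. A minimal sketch for a single day's 2D field follows (grid shape and values are illustrative):

```python
def zonal_average(field):
    """Average a 2D field over longitude, one value per latitude row.
    `field` is a list of rows; field[lat][lon] is the sample value."""
    return [sum(row) / len(row) for row in field]

# tiny 3-latitude x 4-longitude example field (values in Kelvin)
day = [
    [250.0, 252.0, 251.0, 249.0],   # polar latitude
    [288.0, 290.0, 289.0, 291.0],   # mid latitude
    [300.0, 301.0, 299.0, 300.0],   # equatorial latitude
]
print(zonal_average(day))  # [250.5, 289.5, 300.0]
```

Stacking one such column of per-latitude averages per day produces the latitude-versus-time images of Fig. 5.6.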
The on-demand data conversion module transparently converted these additional fields, which would otherwise have required manual download to be examined. In addition, the low-discrepancy ordering of the unordered loops used to generate the zonal averages ensured that incoming data provided the best possible incremental calculation for the entire zone. What might have required significant manual effort and hours of computation was achieved in minutes and at a glance.

5.2.3 Rank Correlation Analysis

Correlation can be used to measure the relevance between different fields, to validate a model by comparing its output to corresponding observations, and even to compare different regions within a single model. Rank correlation [?], [?], [?] is a widely used but relatively new technique for climate analysis that is rarely implemented in domain-specific analysis tools; instead, scientists must manually write code to compute these correlations. Listing 5.1 shows an EDSL script that incrementally computes rank correlation. Notice the use of overloaded operators for summation (+=) and scaling (/=) of the 3D fields. The algorithm is based on Welford's method for computing the running variances of the two variables, which are necessary to calculate the rank correlation.

Listing 5.1 EDSL script for incremental computation of Pearson's rank correlation using hourly 3D data from the 7km GEOS-5 Nature Run simulation.

    //
    // Pearson's rank correlation of two variables over time
    //
    var i = 0;
    unordered (t, [start, start+width]) {
      f = dataset1[field1 + "?time=" + t[0]];
      g = dataset2[field2 + "?time=" + t[0]];
      // critical section:
      // update running average, variance, and correlation
      // w.r.t. their current values and the given index
      {{
        var oldMf = Mf;
        var oldMg = Mg;
        // running average
        Mf += (f - Mf) / (i + 1);
        Mg += (g - Mg) / (i + 1);
        // running variance
        Vf += (f - Mf) * (f - oldMf);
        Vg += (g - Mg) * (g - oldMg);
        // running correlation
        Vfg += ((oldMf - f) * (oldMg - g)) * ((i + 0.0) / (i + 1.0));
        var Sf = Array.sqrt(Vf / i);
        var Sg = Array.sqrt(Vg / i);
        output = Vfg / (Sf * Sg * i);
        i++;
      }}
      doPublish();   // display incremental result
    }

The 2-year nonhydrostatic 7-km global mesoscale simulation created by NASA, known as the “Nature Run” [?], is one of the largest climate simulation datasets to date and an example of the future of global climate modeling. In this example, we try to understand the relationship between hydrophilic and hydrophobic black carbon (both important environmental pollutants [?]). Hydrophobic black carbon is believed to transform into its hydrophilic sibling shortly after emission from various sources, especially industrial ones. To quickly evaluate this theory, we apply rank correlation between these two fields using remote data cached at LLNL. Each timestep of the 3D fields is approximately 1.5 GB, and our cursory analysis considered 744 timesteps, more than a terabyte of remote data. As illustrated in Fig. 5.7(a), a coarse resolution of the data was selected by the user to rapidly identify the preliminary result. The final result using the full dataset is illustrated in Fig. 5.7(b). The results of computation using coarse-resolution data can be surprisingly accurate. Fig. 5.8 shows the root mean squared error (RMSE) between the full-resolution and several partial-resolution calculations of a 2D rank correlation within the NASA Nature Run simulation. The graph shows the relationship among error metrics, total computation size, and total computation time for each resolution level.
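The same incremental update can be sketched in plain Python for scalar streams; variable names mirror the listing, but this is an illustration rather than the runtime's implementation. Each step refines the correlation estimate without revisiting earlier samples:

```python
import math

def running_correlation(xs, ys):
    """Incremental Pearson correlation via Welford-style updates.
    Yields the estimate after each new (x, y) pair, as in the EDSL listing."""
    Mf = Mg = Vf = Vg = Vfg = 0.0
    for i, (f, g) in enumerate(zip(xs, ys)):
        oldMf, oldMg = Mf, Mg
        Mf += (f - Mf) / (i + 1)          # running means
        Mg += (g - Mg) / (i + 1)
        Vf += (f - Mf) * (f - oldMf)      # running sums of squared deviations
        Vg += (g - Mg) * (g - oldMg)
        Vfg += (oldMf - f) * (oldMg - g) * (i / (i + 1.0))  # running co-deviation
        if i > 0:
            Sf = math.sqrt(Vf / i)
            Sg = math.sqrt(Vg / i)
            yield Vfg / (Sf * Sg * i)

# perfectly correlated streams converge to r = 1.0
estimates = list(running_correlation([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))
print(estimates[-1])
```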
The full-resolution computation requires nearly 100 GB of data, but at low resolutions the error is still quite reasonable and the computation is dramatically faster. Regarding data access, the full-resolution raw data are available in the standard NetCDF4 format, but the time and space required to download them locally seriously inhibit access and analysis. We specialized the on-demand conversion system described in Section 3.6 for climate simulation data. It handles the complexity of data loading and format conversion without requiring any manual effort by the user. The details of our use of on-demand conversion for climate data and an assessment of its performance are described next.

Fig. 5.7 Pearson rank correlation between hydrophilic and hydrophobic black carbon in the 7km GEOS-5 Nature Run dataset. (a) Coarse-resolution rank correlation. (b) Full-resolution rank correlation.

Fig. 5.8 Comparison of data size, computation time, and root-mean-square error (RMSE) for various resolution levels in the computation of the Pearson rank correlation.

5.2.4 Using On-Demand for Climate Simulation Data

Petabytes of climate data are spread throughout the world, and most are stored in the NetCDF format. One particular issue common to this and many other “flat” data layouts is the lack of a hierarchical multiresolution representation. In order to facilitate dynamic remote analysis, climate scientists can use the language additions introduced in Section 3.4 to perform ad hoc user-directed analysis. However, converting all existing NetCDF data to IDX is not realistic. Therefore, we introduced a conversion service that converts the data to IDX when requested, using the finest granularity (i.e., converting the smallest portion possible for the given request) and caching the results on the server to speed up successive accesses.
This on-the-fly conversion provides the advantage of hierarchical access for analysis and visualization, especially useful for large spatial domains. Furthermore, once converted, the fields can be accessed interactively at different resolutions by any future users. The on-demand module for climate simulation data utilizes the OPeNDAP protocol to load requested components, obviating the need for an actual installation to be present at the institutions that host the data. The converted multiresolution data ensembles are stored in the cache at LLNL where the data reordering module is currently deployed. Fig. 5.9 shows the design of the overall system. Beginning at the ESGF search page (top left of the diagram), the user can download an XML configuration file describing how to load the selected climate dataset. When the user selects a dataset, its corresponding IDX metadata are created and registered with an associated Visus data server. This configuration will contain references to the multiple volumes that are part of the same climate model. The data can be loaded in a Visus client or compatible component. Available datasets can also be listed directly from the associated server. Once the user has the URL, the dataset can be opened from any IDX-compatible client, such as UV-CDAT (top right). Its hierarchical nature allows coarse-resolution data to be streamed very quickly, providing a preview of the final data and facilitating interactivity.

When data are requested, a remote query is made to the Visus server, which checks whether the data already exist in the cache. If so, the server sends the cached data immediately. Otherwise, it calls the climate data converter service, which converts the data and returns them. After conversion, the data are available in the cache, so successive attempts to access the data will succeed without incurring additional conversion costs.
The implementation of this module consists of three pieces: 1) a native service to create an empty IDX volume and read the associated metadata; 2) a method to identify the smallest portion of a dataset that can be converted; and 3) associated scripts to convert this portion of the data. When a particular field at a given timestep is requested, the service determines which file to open and creates it if necessary. In order to create the preliminary empty volume and associated metadata, the cdscan utility from the Climate Data Analysis Toolkit (CDAT) suite and its associated Python library, cdms2, are used to collect all the variables and domains for a given climate simulation. From this description, we create an empty IDX volume to hold the dataset as it is converted. The on-demand access utilizes the requested field name and timestep, as well as the desired spatial region and resolution level, in order to generate or retrieve the desired dataset. If the dataset has already been converted, the cached data are streamed to the user. Otherwise, the on-demand converter reads the data, uses the provided scripts to convert the requested portion to IDX (i.e., to reorder the data), saves the results to the cache, and notifies the requester that the data are now ready.

Fig. 5.9 Overview of the specialization of the on-demand conversion system for incremental access to remote climate datasets.

We have integrated this on-demand data reordering module as part of the Earth System Grid Federation (ESGF), a worldwide association for sharing data between climate scientists across many institutions. The service provides for converting both local and remote climate datasets to the multiresolution IDX format. The reordering service is implemented as a Python-based web service with read-only access to hundreds of terabytes of (possibly remote) climate data stored in the NetCDF format [?].
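The request path just described (check the cache, otherwise convert, cache, and return) can be sketched as follows; the function and path names are hypothetical, not the deployed service's API:

```python
import os
import shutil

def fetch_idx(field, timestep, cache_dir, convert):
    """Return the path of the IDX-converted chunk for (field, timestep),
    converting and caching it on first request (hypothetical sketch)."""
    key = f"{field}_t{timestep}.idx"
    cached = os.path.join(cache_dir, key)
    if os.path.exists(cached):            # cache hit: stream immediately
        return cached
    converted = convert(field, timestep)  # on-demand reordering (expensive)
    os.makedirs(cache_dir, exist_ok=True)
    shutil.copy(converted, cached)        # successive accesses skip conversion
    return cached
```

A real deployment would also coordinate concurrent requests for the same chunk and bound the size of the cache.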
The typical method by which a user of climate data federated by ESGF acquires new data is to first search for the desired dataset using the ESGF search page and then to manually select and download the datasets to be studied. These data may be very large and contain many fields not needed for the desired experiment, wasting time and local storage space. The multiresolution datasets provided by our service incorporate all fields and the entire time span of a given dataset, but no actual data are converted until they are specifically requested. This efficiency makes it simple for scientists to add or remove a field from their computations without converting unnecessary data. Fig. 5.10 shows the time required to compute a seasonal temporal average (see Listing 3.1) when data are converted on demand versus when they already exist in the server cache. Note that client-side caching was disabled for this test.

Fig. 5.10 Computation time when input data are converted on demand versus already cached on the server. Temporal average of daily data (90 timesteps) from NIMR HadGEM2-AO “Historic.” Local caching disabled. Each timestep is 32-bit floating point, resolution 192x143x8.

5.2.5 Performance Assessment of On-Demand

For any component of an interactive workflow, the analysis of its use should be considered with respect to the comprehensive system. To avoid huge delays like those of an offline preprocess, we strive to ensure that data are converted to the desired streaming format on the fly using the smallest chunks possible, reordering the minimal amount of data for a given request. In order to balance the cost of such data reordering with the necessary reads, our application utilizes caching on both the client and server sides. The on-demand data reordering component introduced in this section can and should be considered another component of the overall workflow and is therefore also capable of maintaining its own caches of streamed and/or converted data.
Our progressive environment revealed serious and previously unnoticed errors in the original data.

Using on-demand data conversion is a useful way to enable access to large-scale datasets that are not already stored in a suitable multiresolution format. Once converted and cached, the data are accessible with the same efficiency as any other multiresolution dataset. However, the conversion itself can introduce noticeable delays for data access. These pauses have two causes: the time to access the data to be converted, and the minimal "chunk" of data that must be converted for a single request. Access time depends on the location of the data being read. If they are on the same server as the on-demand conversion utility, this may not be an issue, but remote data can be a bigger problem. Closely related is the minimum data size that can be converted. Consider a single timestep of a high-resolution simulation. Even if the initial request is for a coarse-resolution view, the requested portion of the data must be completely converted to the multiresolution format before it can be accessed in this coarse-to-fine fashion, since data are read in blocks, and each block is written in the appropriate order in the IDX volume. For the massive 7km GEOS-5 "Nature Run" simulation residing at NASA, the large fields of this dataset require more time to convert, but a full field is also the minimum amount of data that can be downloaded in a single request, so there is no way to reduce the conversion time: that time would already be spent downloading the data. Our system balances minimal conversion against avoidance of redundant work by converting only the single requested field of the given timestep, but doing so completely rather than reading all the data multiple times by converting only one resolution level at a time.
The system also maintains an internal cache of any downloaded data in case large conglomerations of fields and/or timesteps must be retrieved at one time. Although these delays are generally outside the control of the installation, understanding them is still vital to predicting and assessing the performance of the interactive workflow as a whole.

5.2.6 Application Scalability With Increasing Data Size

The combined power of the EDSL and progressive runtime system enables interactive visualization and analysis of extremely massive climate simulation data. For example, the rank correlation analysis described in Section 5.2.3 shows that the application can interactively compute the results of a calculation involving more than a terabyte of data, whereas other data analysis applications would simply be unable to load a volume of data this large, and such calculations would therefore necessarily be performed offline. By specifying a diminished resolution and utilizing an incremental algorithm, our system was able to show preliminary results of the calculation almost immediately and to incrementally complete it for an interactively selected subregion over just a few minutes. Considering the current capabilities of the system, our proposed model can help show the scalability of the workflow as datasets become larger. Bill Putman at NASA recently presented the latest 1.5km simulation data. The resolution is 26414 x 13445 per 2D slice, which is 355M pixels and requires 1.42GB per 2D image. By comparison, the very large 7km data we have been showing are much smaller: only 16M pixels per 2D slice (64MB per image). For the new data, a 3D field with 100 pressure levels means that one 30-minute timestep is 142GB, around 6.8TB per day for a single 3D field. We consider the following two scenarios to evaluate the ability of our application to continue to incrementally produce results as data sizes increase in scale.
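Before turning to the two scenarios, the sizes quoted above can be checked with simple arithmetic, assuming 4 bytes per value (32-bit floats) and 48 half-hour timesteps per day:

```javascript
// Reproduce the size arithmetic for the 1.5km run quoted above, assuming
// 4 bytes per value (32-bit float) and 48 half-hour timesteps per day.
const nx = 26414, ny = 13445;      // 2D slice resolution
const bytesPerValue = 4;           // 32-bit floating point
const levels = 100;                // pressure levels in one 3D field
const timestepsPerDay = 48;        // one timestep every 30 minutes

const pixelsPerSlice = nx * ny;                         // ~355M pixels
const bytesPerSlice = pixelsPerSlice * bytesPerValue;   // ~1.42GB
const bytesPerTimestep = bytesPerSlice * levels;        // ~142GB
const bytesPerDay = bytesPerTimestep * timestepsPerDay; // ~6.8TB

console.log((pixelsPerSlice / 1e6).toFixed(0) + "M pixels per slice"); // 355M
console.log((bytesPerSlice / 1e9).toFixed(2) + " GB per slice");       // 1.42 GB
console.log((bytesPerTimestep / 1e9).toFixed(0) + " GB per timestep"); // 142 GB
console.log((bytesPerDay / 1e12).toFixed(1) + " TB per day");          // 6.8 TB
```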
The first scenario imagines the data are already stored in a multiresolution data format. In this case, the streaming computations can be performed using the same amount of data per step as the much smaller datasets we have been demonstrating. This type of scalability is highly desired by scientists in order to facilitate continued analysis of ever more detailed simulations, and the continued interactivity is possible even for remotely located data. For the second scenario, we imagine the data remain in the legacy flat file format and must be converted before the interactive analysis and visualization framework can use them. As explained in Chapter 4, the components of interactive workflows must be considered comprehensively. In particular, Section 4.4 describes the importance of communication and nonblocking data movement for maintaining interactivity. The module we presented for on-the-fly data conversion reorders the requested data into a multiresolution format, but as explained above, there may be a restriction on the minimum amount of data to be converted, and the ordering of the original data can bound the time required to read any desired region, such that each request might require reading and writing gigabytes of data. Furthermore, the location of the converter relative to the location of the data can dramatically affect the time required to reorder a given region: regardless of the size of the requested region, a converter located remotely from the data can still download data only in the sizes offered by the server that hosts them. In the dataset mentioned above, even a single timestep of a 3D field is 142GB, which is the smallest amount that could possibly be retrieved when data are stored in such formats. In the future, formalized versions of the proposed model can be used to more objectively identify similar bottlenecks in an overall workflow.
5.3 Summary

In this chapter, we have demonstrated the use of our interactive analysis and visualization framework for applications ranging from climate analysis to combustion simulation. For the former, the on-demand data conversion system was required in order to access the petabytes of disparately located climate simulation data. The utility of this module remains even as data increase in size, but using the model presented in Chapter 4, we were able to identify the increasingly challenging bottlenecks in the framework caused by legacy data formats that store simulation results in a flat file layout. The model proposed in this work enables both comparison of implementations and suggestion of improvements by identifying bottlenecks. Its formalization and use could aid the design of future applications for interactive spatiotemporal data analysis and visualization, and enable more formal analyses of these systems in order to make the development and assessment of interactive workflows more objective.

CHAPTER 6

CONCLUSION AND FUTURE WORK

In this work, we developed a framework upon which to build workflows for interactive analysis and visualization of arbitrarily large, disparately located spatiotemporal data ensembles. The framework includes domain-specific additions for the specification of analyses; an underlying runtime that executes workflows in a manner that preserves interactivity even as their tasks increase in complexity or the data become larger or more remote; and a module to read data on demand if they are not already in the multiresolution input format required by the framework. The implementation of this framework inspired the creation of a generalized model to be used in the design and assessment of similar interactive systems.
This model enumerates the required components and describes the design considerations that facilitate responsiveness to the user, as well as the necessary aspects of integrating these components. Finally, we demonstrated our framework in a variety of realistic applications ranging from microscopy to petascale climate data analysis. To support the community, we built the climate data analysis application using open-source components and enabled its general distribution using public frameworks, including Docker and Anaconda. It is currently in operation as part of the Earth System Grid Federation (ESGF) server running at Lawrence Livermore National Lab. For the interactive framework, we introduced a simple yet expressive embedded domain-specific language that abstracts the location and resolution of input data volumes, enabling robust incremental updates of results and facilitating the fastest possible convergence of those results. The associated runtime system provides dynamic performance tuning and loop ordering for faster convergence of incremental, in-progress results; multiresolution streaming for rapid cursory computations; and the ability to execute analyses locally or remotely depending on the location of the data and computational resources. The internal data format used by the runtime system enables efficient multiresolution data loading, fast access to regions of interest, multilevel architecture-independent caching, and transparent on-demand data conversion. As a whole, the framework enables truly interactive analysis and visualization workflows for massive simulation ensembles, closing a gap in the existing technology. Although our work is focused on structured spatiotemporal datasets, similar concepts and language extensions could be applied to interactive analysis and visualization of unstructured data modalities.
We utilized the following ideas in the design and implementation of our framework to enable interactive analysis and visualization of arbitrarily large, disparately located spatiotemporal data ensembles: 1) choosing a granularity of access and computation that strikes a balance between processing speed and interruptible progressivity; 2) modifying the order of processing to improve the rate of convergence of the results, such as by utilizing low-discrepancy sequences; 3) utilizing multiresolution data formats to facilitate interactive visualizations and cursory analyses; and 4) providing a mechanism to transparently load nonstreaming data in the multiresolution format, reordering the minimal amount possible for each request in order to maintain the interactivity of the rest of the workflow. These ideas can be used to guide and assess the design of other interactive spatiotemporal data analysis and visualization workflows. Reflecting on the implementation of this framework, we generalized the underlying ideas to propose a more comprehensive abstract model that we hope will continue to be refined and formalized for use in the design and assessment of similar systems. This model articulates the necessary components and considerations for the design and analysis of interactive workflows. The three components required for such systems are an interactive runtime, an appropriate multiresolution data layout and distribution, and a suitable programming model. The three most important design considerations for user interactivity are the enablement of progressive results, the refinement of those results, and the interruptibility of the workflow at any given time. The three necessities for proper integration are the communication abilities of the components, the mechanisms of data movement, and the mechanisms used to encapsulate the state of the workflow.
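Idea 2 above (reordering iterations with a low-discrepancy sequence so that any partial result samples the whole domain evenly) can be illustrated with the standard base-2 van der Corput sequence. This sketch is only illustrative; it is not the framework's actual iterator:

```javascript
// Base-2 van der Corput sequence: reflects the binary digits of i about
// the radix point, so every prefix of the sequence covers [0,1) evenly.
function vanDerCorput(i) {
  let result = 0;
  let f = 0.5;
  for (let n = i; n > 0; n = Math.floor(n / 2)) {
    result += (n % 2) * f; // next bit of n scales the next inverse power of 2
    f /= 2;
  }
  return result;
}

// Visit indices [0, count) so that any interrupted prefix is a roughly
// uniform sample of the whole range (e.g., of a time span).
function progressiveOrder(count) {
  return [...Array(count).keys()]
    .sort((a, b) => vanDerCorput(a) - vanDerCorput(b));
}

console.log(progressiveOrder(8).join(" ")); // "0 4 2 6 1 5 3 7"
```

Note that the first half of the visit order (0, 4, 2, 6) already spans the full range, so an interrupted average over those iterations is an unbiased cursory estimate rather than an average of only the earliest timesteps.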
We demonstrated the effectiveness of the interactive framework using several realistic examples, including integration with systems that already produced multiresolution data and those that required access to massive, remote data ensembles stored in a legacy row-major format. These examples highlighted the strengths of the framework with respect to increasing data scale and availability of computational resources, as well as the utility of multiresolution data access and of dividing computation between client- and server-side systems, considerations that are all included in the proposed model. Because our system is intended to facilitate cursory data exploration, the primary focus has been on interruptible workflows and incremental production of results rather than optimization of any single computation. In evaluating the effectiveness of our interactive applications, we took into account all aspects of the data analysis and visualization processes, which often involve manual steps for data access and conversion and the use of multiple applications in addition to the actual computations. The work provides many time-saving advantages over existing applications, such as transparent multiresolution data access, automatic resampling, and remote computation. Performing similar analyses with existing techniques typically requires users to manually download and resample specific variables of interest to the system that will perform the computation, and then to manually construct and execute the various scripts used for the analysis. Each step can be tedious and time-consuming, and this cumbersome process curtails dynamic exploration of the full space of analyses. In contrast, our lightweight system enables hypotheses to be tested more easily, and even allowed for the rapid discovery and validation of errors in a particular data ensemble, as described in the Annual Zonal Average case study in Section 5.2.2 of Chapter 5.
The transparent data access enabled by the on-demand data reordering system greatly simplifies multiresolution data access, but this system is intended as a measure to ease the transition for users of legacy data. Although computationally efficient, reordering large datasets that are not already provided in a multiresolution format still requires substantial time to read and write the data. Our on-demand data reordering module converts the minimum amount of data per request, but we hope and anticipate that multiresolution data formats will become more common, and our demonstration of using this type of data to facilitate unprecedented interactive access to massive datasets, such as the 7km NASA GEOS-5 Nature Run simulation, adds to the growing body of evidence supporting the adoption of such formats. The IDX format utilized for this work supports variable-size blocks and different data orderings within the blocks themselves (either row-major or Hz order), each of which provides trade-offs in disk usage, access time, and compressibility. Data storage is an ongoing area of research, and the methods demonstrated by this work can utilize any similar data format to equal advantage. The runtime and EDSL we presented could be augmented to utilize additional multiresolution formats. The EDSL and runtime presented in this work can execute a script using local or remote resources. This flexibility can be used to construct dataflows that utilize a combination of resources, including remote servers, local systems, and GPU hardware. For future work, we would like to explore a more dynamic selection of computational resources in order to make the best use of such distributed systems.
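As background for the block orderings mentioned above: IDX's Hz order is a hierarchical construction distinct from, but in the same family as, plain Z-order (Morton) indexing. The following sketch shows only the simpler Z-order idea of bit interleaving, not the actual Hz computation, to illustrate why such orderings keep spatial regions-of-interest contiguous on disk:

```javascript
// Z-order (Morton) index: interleave the bits of x and y. IDX's Hz order
// is a related but distinct hierarchical construction; this sketch only
// illustrates how bit interleaving maps 2D locality onto a 1D index.
function mortonIndex(x, y, bits) {
  let index = 0;
  for (let b = 0; b < bits; b++) {
    index |= ((x >> b) & 1) << (2 * b);     // bits of x land at even positions
    index |= ((y >> b) & 1) << (2 * b + 1); // bits of y land at odd positions
  }
  return index;
}

// The four cells of a 2x2 block map to four consecutive indices, so small
// spatial neighborhoods stay contiguous in the 1D layout.
console.log([mortonIndex(0, 0, 2), mortonIndex(1, 0, 2),
             mortonIndex(0, 1, 2), mortonIndex(1, 1, 2)].join(",")); // "0,1,2,3"
```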
We adopted JavaScript as our host language because it was simple to write our own interpreter, but in the future we intend to explore alternatives such as Python, which we have already integrated with our framework and into which we have begun incorporating the language embeddings presented in this work. Finally, we hope to extend the runtime with the ability to cache derived fields, such as averages, so that they can be shared like any other data in the workflow. Our system could then be utilized with other applications, such as UV-CDAT or VisIt, as a preliminary data processing facility that updates local data to be visualized by those applications. A variety of choices could improve the system we created, including a more deliberate selection of where to perform all aspects of the computations and on-demand data conversions (e.g., server-side, client-side, or elsewhere in the cloud). These choices might become increasingly complex as systems grow and become more distributed. Utilizing runtimes such as Legion or Charm++ might be a good path toward improving performance, but incorporating the assessment of these utilities themselves would be an additional challenge for future iterations of the abstract model we proposed. In addition, there are factors that remain outside our control, such as the format in which simulation or acquisition data are stored, and we hope this demonstration of a comprehensively interactive system, in particular when used with simulation software that already produces multiresolution output, such as S3D (see Section 5.1 in Chapter 5), will encourage the incorporation of this type of data format in future applications. Adaptive mesh refinement (AMR) versions of streaming data formats are also already available, including PIDX [?]. Future versions of the framework could utilize these to enable streaming access and incremental computation of modern AMR data.
This work has demonstrated the creation of a flexible, interactive, distributed computational framework, including the required language embeddings that facilitate incremental computation; a runtime foundation that enables the flexible execution of such programs, including whether computations are performed on the client or on the server; and a complementary on-demand data conversion service with caching to enable streaming access to datasets that have not already been converted to a streaming, multiresolution format. The creation of this framework prompted articulation of the necessary components and considerations of these types of systems in general, so that they can be more easily constructed and objectively compared in the future. We used this model to identify potential bottlenecks in our workflow and to facilitate its future improvements. Data analysis and visualization workflows are not typically expressed comprehensively, but instead as limited portions of the workflow based on the focus of a particular work, such as computation or visualization. Considered as a whole, however, a complete workflow might be modeled as a linear pipeline with stages that retrieve data from remote sources, extract subsets of those data, resample multiple inputs to a common resolution (i.e., regridding), perform computations, and visualize the results. The visualization step itself often requires further preprocessing in order to view the data interactively. In the future, we anticipate the formalization of such a model representing the flow of tasks for a given workflow as a generic graph in the spirit of the Petri nets used in [?] and [?] and of ConcurTaskTrees [?], [?], augmented with additional information including algorithmic scalability, computational dependencies, and data movement.
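One step in that direction can be sketched immediately: if each stage of the linear pipeline described above is annotated with a throughput, the sustained rate of the whole workflow is bounded by its slowest stage. The stage names and rates below are invented purely for illustration, not measured values:

```javascript
// Toy model of the linear workflow pipeline: the sustained throughput of
// the chain is that of its slowest stage. Stage names and rates here are
// invented for illustration, not measurements.
function bottleneck(stages) {
  return stages.reduce((worst, s) =>
    s.throughputMBps < worst.throughputMBps ? s : worst);
}

const workflow = [
  { name: "retrieve",  throughputMBps: 120 }, // remote data access
  { name: "resample",  throughputMBps: 800 }, // regridding
  { name: "compute",   throughputMBps: 450 },
  { name: "visualize", throughputMBps: 950 },
];

console.log(bottleneck(workflow).name); // "retrieve"
```

In this toy instance the retrieval stage dominates, mirroring the observation throughout this work that data access, rather than computation, is the primary bottleneck of interactive workflows.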
We can assess this representation to identify the primary bottlenecks that affect performance and thereby facilitate the design of more effective interactive systems for use by scientists and data analysts. Eventually, we hope to utilize these models in a manner similar to [?] to enable the systematic design and optimization of these workflows.

APPENDIX A

EDSL REFERENCE

In this appendix, we provide a full listing of the additional JavaScript functions added to create the EDSL implemented in the scripting engine described in this work.

A.1 Built-in EDSL Functions Implemented in the Runtime

//embedded loops of arbitrary order
unordered(index,[start,end]) { ... }

//embedded critical sections
{{ ... }}

//embedded implicit script output
output;

//push embedded 'output' to the workflow
doPublish();

//foundation framework functions made available
Visus.Log(arg);
Visus.Timer();
Visus.Elapsed(arg);
Visus.Assert(arg);
Visus.StringTree.New(arg);
Visus.StringTreeEncoder.encode(arg);
Visus.StringTreeEncoder.decode(arg);
Visus.Array.New(dims,arg);
Visus.Array.getAt(input,point);
Visus.Array.setAt(input,point,val);
Visus.Array.Clone(input);
Visus.Array.interleave(list);
Visus.Array.innerRange(input);
Visus.Array.innerAvg(input);
Visus.Array.innerSdv(input);
Visus.Array.innerMed(input);
Visus.Array.convolve(input,kernel);
Visus.Array.medianHybrid(input,kernel);
Visus.Array.median(input,kernel,percent);
Visus.Array.cast(input,dtype);
Visus.Array.convertTo(input,dtype);
Visus.Array.sqrt(input);
Visus.Array.add(inputs);
Visus.Array.sub(inputs);
Visus.Array.mul(inputs);
Visus.Array.div(inputs);
Visus.Array.min(inputs);
Visus.Array.max(inputs);
Visus.Array.avg(inputs);
Visus.Array.sdv(inputs);
Visus.Array.med(inputs);
Visus.Array.shrink(input,npixels);
Visus.Array.crop(input,box);
Visus.Array.paste(dst,src,loc);
Visus.Array.brightnessContrast(input,brightness,contrast);
Visus.Array.threshold(input,level);
Visus.Array.invert(input);
Visus.Array.levels(input,gamma,in_min,in_max,out_min,out_max);
Visus.Array.hueSaturationBrightness(input,hue,saturation,brightness);

//standard JavaScript built-in functions
function eval(jsCode);
function trace();
function charToInt(ch);
JSON.stringify(obj, replacer);
JSON.encode(obj);
JSON.decode(src);
Object.dump();
Object.clone();
String.indexOf(search);
String.substring(lo,hi);
String.charAt(pos);
String.charCodeAt(pos);
String.fromCharCode(char);
String.split(separator);
Integer.parseInt(str);
Integer.valueOf(str);
Array.contains(obj);
Array.remove(obj);
Array.join(separator);
Array.range();
Math.rand();
Math.randInt(min, max);
Math.abs(a);
Math.round(a);
Math.min(a,b);
Math.max(a,b);
Math.clamp(x,a,b);
Math.sign(a);
Math.PI();
Math.toDegrees(a);
Math.toRadians(a);
Math.sin(a);
Math.asin(a);
Math.cos(a);
Math.acos(a);
Math.tan(a);
Math.atan(a);
Math.sinh(a);
Math.asinh(a);
Math.cosh(a);
Math.acosh(a);
Math.tanh(a);
Math.atanh(a);
Math.E();
Math.log(a);
Math.log10(a);
Math.exp(a);
Math.pow(a,b);
Math.sqr(a);
Math.sqrt(a);
Math.floor(a);
Math.ceil(a);

APPENDIX B

EXAMPLE SCRIPTS

In this appendix, we provide examples for each of the analysis workflows discussed in this thesis.

B.1 Running Average

Listing B.1 Incremental computation of a running average of 3D climate data

//
// Running average temporal data #1
//
dataset=input.day1950;
field='hus';   //specific humidity (3D)
width=12;      //12 months

//query_time is a built-in variable
start=Math.floor(query_time/30);
Visus.Log(start);

//compute running average
output=Visus.Array.New();
var i=0;
unordered(t,[start,start+width]) {
  var I=t[0]*30;
  Visus.Log(I);
  f=dataset[field+"?time="+I];
  {{
    //running average (Welford's method)
    output += (f-output)/(i+1);
    i++;
  }}
  doPublish(); //show incremental result
}

Listing B.2 Incremental computation of a running average of climate data

//
// Running average temporal data #2
// Notes:
//   Resolution need not be specified in EDSL.
dataset=input.day1860A;
field='tas';   //temperature at surface (2D)
width=30;      //one month
start=query_time;

//compute running average
output=Visus.Array.New();
var i=0;
unordered(t,[start,start+width]) {
  Visus.Log(t[0]);
  f=dataset[field+"?time="+t[0]];
  {{
    //running average (Welford's method)
    output += (f-output)/(i+1);
    i++;
  }}
  doPublish(); //show incremental result
}

//display average
avg=Visus.Array.innerAvg(output);
Visus.Log("Average:");
Visus.Log(avg);

//Compare our result with the actual monthly data,
//validating both simulation and computation
mon=Math.floor(query_time/30);
t0=input.mon1860A[field+'?time='+mon];
diff=output-t0;
rmse=Visus.Array.sqrt(diff*diff);
Visus.Log("RMSE:");
Visus.Log(rmse);

B.2 Ensemble Comparison

Listing B.3 Comparison of multiple climate ensembles

//multi-model ensemble comparison
// Notes:
//   The script includes a helper function.
//   It uses a 2D multidimensional unordered iterator.
//   Different dimension data fields are used.
function month_in_season(season,month) {
  if (season==0) {
    if (month==11 || month==0 || month==1) return true;
  } else if (season==1) {
    if (month==2 || month==3 || month==4) return true;
  } else if (season==2) {
    if (month==5 || month==6 || month==7) return true;
  } else if (season==3) {
    if (month==8 || month==9 || month==10) return true;
  }
  return false;
}

FGOALS=[input.FGOALS_r1,input.FGOALS_r2,input.FGOALS_r3,input.FGOALS_r4,
        input.FGOALS_r5,input.FGOALS_r6,input.FGOALS_r7,input.FGOALS_r8,
        input.FGOALS_r9,input.FGOALS_r10,input.FGOALS_r11,input.FGOALS_r12];
MIROC5=[input.MIROC5_r1,input.MIROC5_r2,input.MIROC5_r3,input.MIROC5_r4,
        input.MIROC5_r5,input.MIROC5_r6,input.MIROC5_r7,input.MIROC5_r8,
        input.MIROC5_r9,input.MIROC5_r10,input.MIROC5_r11,input.MIROC5_r12];
DATASETS=[FGOALS,MIROC5];
field='tas';

//decadal seasonal average
start=Math.floor(query_time/120);
width=120;  //months in a decade
season=1;   //spring: 2,3,4

//compute running average of each ensemble
avg=[Visus.Array.New(),Visus.Array.New()];
I=[0,0];
avgall=Visus.Array.New();
Ai=0;
output=Visus.Array.New();
for (var x=0;x<2;x++) {
  //use of multidimensional unordered iterator
  //120 months, 12 ensemble members
  unordered(T,[[start,start+width],[0,12]]) {
    var M=T[0];
    var R=T[1];
    var calendar_month = M % 12;
    if (month_in_season(season,calendar_month)) {
      f = DATASETS[x][R][field+"?time="+M];
      {{
        avg[x] += (f - avg[x]) / (I[x]+1);
        I[x]++;
      }}
      output=avg[x];
      doPublish();
    }
  }
  {{
    avgall+=(avg[x]-avgall)/(Ai+1);
    Ai++;
  }}
}

//Compare the averages of the two models.
// NOTE: avg[0].dims != avg[1].dims
//   diff.dims is determined by runtime setting;
//   by default it will be the largest of the
//   dimensions of the two averages.
diff=Visus.Array.sub([avg[0],avg[1]]);
output=diff;

B.3 Rank Correlation

Listing B.4 Rank correlation computation of two fields in a climate simulation

//Pearson's rank correlation of two variables over time (3D example)
dataset1=input.BCPHILIC;
dataset2=input.BCPHOBIC;
field1='BCPHILIC'; //hydrophilic black carbon
field2='BCPHOBIC'; //hydrophobic black carbon

//initialize global arrays and output
Sf=Visus.Array.New();
Sg=Visus.Array.New();
Sfg=Visus.Array.New();
Mf=Visus.Array.New();
Mg=Visus.Array.New();
output=Visus.Array.New();

start=query_time;
width=2190; //three months in hours

//incremental computation of average, std dev,
//and correlation
var i=0;
unordered(t,[start,start+width]) {
  f=dataset1[field1+"?time="+t[0]];
  g=dataset2[field2+"?time="+t[0]];
  //critical section: update running average,
  //std dev, and correlation w.r.t.
  //their current values and the given index
  {{
    var oldMf=Mf;
    var oldMg=Mg;

    //running average
    Mf=Mf+(f-Mf)/(i+1);
    Mg=Mg+(g-Mg)/(i+1);

    //running standard deviation
    Sf=Sf+(f-Mf)*(f-oldMf);
    Sg=Sg+(g-Mg)*(g-oldMg);

    //running correlation
    Sfg+=((oldMf-f)*(oldMg-g))*((i+0.0)/(i+1.0));

    var Sdvf=Visus.Array.sqrt(Sf/i);
    var Sdvg=Visus.Array.sqrt(Sg/i);
    output=Sfg/(Sdvf*Sdvg*i);
    i++;
  }}
  doPublish(); //display incremental result
}

B.4 Masking Field Based on Value Range

Listing B.5 Creation of a 3D mask based on a specific range of the field

//create mask from specific range of mixfrac field
mf_min=0.30;
mf_max=0.40;
mf=input.TJ.mixfrac;
mask=Visus.Array.threshold(mf,mf_min);    // (mf_min,max]
{
  tmp=Visus.Array.threshold(mf,mf_max);   // (mf_max,max]
  tmp=Visus.Array.invert(tmp);            // [min,mf_max]
  mask=Visus.Array.mul([mask,tmp]);       // (mf_min,mf_max]
}
Y_OH=input.TJ.Y_OH;

// apply mask to Y_OH field
output=Visus.Array.mul([mask,Y_OH]);

B.5 Zonal Average

Listing B.6 Computation of a zonal average in a 3D climate dataset

//zonal annual mean (by latitude, y-t)
var dataset=input.day1860A;
var field='tas';   //temperature at surface
var width=360;     //one year
var start=query_time/width;
var DIMS=dataset[field].dims;
var output=Visus.Array.New([width,DIMS[1]],"float64");
unordered(t,[start,start+width]) {
  var f=dataset[field+"?time="+t[0]];
  Visus.Assert(f.dims[1]==DIMS[1]);
  for (var I=0;I<DIMS[1];I++) {
    var fc=Visus.Array.crop(f,[[0,I],[DIMS[0]-1,I]]);
    var avg=Visus.Array.innerAvg(fc);
    var P=[t[0],I];
    Visus.Array.setAt(output,P,avg[0]);
  }
  doPublish(); //show incremental result
}

//display average
avg=Visus.Array.innerAvg(output);
Visus.Log("Annual Average:");
Visus.Log(avg);

Listing B.7 Hourly rank correlation of two 3D fields in a climate simulation

//zonal hourly rank correlation of two fields over latitude (y-t)
dataset=input.aer1;
field1='SO2CMASS'; //SO2 column density
field2='SO4CMASS'; //SO4 column density
width=744;         //one month
var nlon=dataset[field1].dims[0];
var nlat=dataset[field1].dims[1];

// domain is time, range is latitude
output=Visus.Array.New([width,nlat],"Float64");
start=query_time;

//each iteration computes correlation between latitudes
//of the two fields at the given time (a vertical line)
unordered(t,[start,start+width]) {
  f=dataset[field1+"?time="+t[0]];
  g=dataset[field2+"?time="+t[0]];
  //for each latitude, compute correlation
  for (var I=0;I<nlat;I++) {
    var fc=Visus.Array.crop(f,[[0,I],[nlon-1,I]]);
    var gc=Visus.Array.crop(g,[[0,I],[nlon-1,I]]);
    var favg=Visus.Array.innerAvg(fc)[0];
    var gavg=Visus.Array.innerAvg(gc)[0];
    var fsdv=Visus.Array.innerSdv(fc)[0];
    var gsdv=Visus.Array.innerSdv(gc)[0];
    var zf=(fc-favg)/fsdv;
    var zg=(gc-gavg)/gsdv;
    var zfg=Visus.Array.mul([zf,zg]);
    // corr=sum(i=[1,n], zfg_i) / n-1
    // NOTE: We use the fast Array.innerAvg function,
    //   then scale by n/(n-1)
    var corr=nlon*Visus.Array.innerAvg(zfg)[0]/(nlon-1);
    var P=[t[0],I];
    Visus.Array.setAt(output,P,corr);
  }
  doPublish(); //display incremental result (output)
}

//print overall correlation
correlation=Visus.Array.innerAvg(output);
Visus.Log("Correlation:");
Visus.Log(correlation);

[12] L. Hogrebe, A. R. Paiva, E. Jurrus, C. Christensen, M. Bridge, J. R. Korenberg, and T. Tasdizen, "Trace driven registration of neuron confocal microscopy stacks," presented at 2011 IEEE Int. Symp. on Biomed. Imaging: From Nano to Macro, pp. 1345-1348.
[13] A. Grosset, M. Prasad, C. Christensen, A. Knoll, and C. Hansen, "TOD-tree: Task overlapped direct send tree image compositing for hybrid MPI parallelism," in Proc. 15th Eurographics Symp. Parallel Graph. and Vis., Eurographics Association, 2015, pp. 67-76.
[14] G. Thiruvathukal, C. Christensen, and V. Vishwanath, "A benchmarking study to evaluate Apache Spark on large-scale supercomputers," IEEE Cloud, in submission, 2019.
[15] V. Pascucci, G. Scorzelli, B. Summa, P.-T. Bremer, A. Gyulassy, C. Christensen, and S. Kumar, "Scalable visualization and interactive analysis using massive data streams," Advances in Parallel Comput.: Cloud Comput. and Big Data, vol. 23, pp. 212-230, 2013.
[16] V. Pascucci, G. Scorzelli, B. Summa, P.-T. Bremer, A. Gyulassy, C. Christensen, S. Philip, and S. Kumar, The ViSUS Visualization Framework. Boca Raton, FL: Chapman & Hall/CRC Computational Science, 2012, ch. 19, pp. 401-414.
[17] H. Childs, E. Brugger, B. Whitlock, J. Meredith, S. Ahern, D. Pugmire, K. Biagas, M. Miller, C. Harrison, G. H. Weber, H. Krishnan, T. Fogal, A. Sanderson, C. Garth, E. W. Bethel, D. Camp, O. Rübel, M. Durant, J. M. Favre, and P. Navrátil, "VisIt: An end-user tool for visualizing and analyzing very large data," in High Performance Visualization: Enabling Extreme-Scale Scientific Insight, Oct. 2012, pp. 357-372.
[18] J. Ahrens, B. Geveci, and C. Law, "ParaView: An end-user tool for large data visualization," in The Visualization Handbook, 2005.
[19] E. Santos, J. Poco, Y. Wei, S. Liu, B. Cook, D. N. Williams, and C. T. Silva, "UV-CDAT: Analyzing climate datasets from a user's perspective," Computing in Science and Engineering, vol. 15, no. 1, pp. 94-103, Jan./Feb. 2013.
[20] T. Maxwell, "Exploratory climate data visualization and analysis using DV3D and UV-CDAT," presented at High Performance Comput., Networking, Storage and Anal. (SCC), 2012 SC Companion, Nov. 2012, pp. 483-487.
[21] OPeNDAP Data Access Protocol, http://www.opendap.org/.
[22] E. Deelman, K. Vahi, G. Juve, M. Rynge, S. Callaghan, P. J. Maechling, R. Mayani, W. Chen, R. Ferreira da Silva, M. Livny, and K. Wenger, "Pegasus: A workflow management system for science automation," Future Generation Comput. Syst., vol. 46, pp. 17-35, 2015.
[23] B. Ludäscher, I. Altintas, C. Berkley, D. Higgins, E. Jaeger, M. Jones, E. A. Lee, J. Tao, and Y. Zhao, "Scientific workflow management and the Kepler system," Concurrency and Comput.: Practice and Experience, vol. 18, no. 10, pp. 1039-1065, 2006.
[24] M. Bauer, S. Treichler, E. Slaughter, and A. Aiken, "Legion: Expressing locality and independence with logical regions," in Proc. Int. Conf. High Performance Comput., Networking, Storage and Anal., ser. SC '12. Los Alamitos, CA, USA: IEEE Computer Society Press, 2012, pp. 66:1-66:11.
[25] L. V. Kale and S. Krishnan, "Charm++: A portable concurrent object oriented system based on C++," in Proc. 8th Annu. Conf. Object-Oriented Programming Syst., Languages, and Appl., ser. OOPSLA '93. New York, NY, USA: ACM, 1993, pp. 91-108. [Online]. Available: http://doi.acm.org/10.1145/165854.165874
[26] S. Petruzza, A. Gyulassy, V. Pascucci, and P.-T. Bremer, "A task-based abstraction layer for user productivity and performance portability in post-Moore's era supercomputing," presented at 3rd Int. Workshop on Post-Moore's Era Supercomputing (PMES), 2018.
[27] J. van Diggelen, R.-J. Beun, R. M. van Eijk, and P. J. Werkhoven, "Information supply mechanisms in ubiquitous computing, crisis management and workflow modelling," in TAMODIA/HCSE, 2008.
[28] G. Kindlmann, C. Chiw, N. Seltzer, L. Samuels, and J. Reppy, "Diderot: A domain-specific language for portable parallel scientific visualization and image analysis," IEEE Trans. Vis. Comput. Graphics, vol. 22, no. 1, pp. 867-876, Jan. 2016.
[29] P. Rautek, S. Bruckner, M. E. Gröller, and M. Hadwiger, "ViSlang: A system for interpreted domain-specific languages for scientific visualization," IEEE Trans. Vis. Comput. Graphics, vol. 20, no. 12, pp. 2388-2396, Dec. 2014.
[30] G. L. Bernstein, C. Shah, C. Lemire, Z. DeVito, M. Fisher, P. Levis, and P. Hanrahan, "Ebb: A DSL for physical simulation on CPUs and GPUs," CoRR, vol. abs/1506.07577, 2015. [Online]. Available: http://arxiv.org/abs/1506.07577
[31] F. Kjolstad, S. Kamil, J. Ragan-Kelley, D. I. W. Levin, S. Sueda, D. Chen, E. Vouga, D. M. Kaufman, G. Kanwar, W. Matusik, and S. Amarasinghe, "Simit: A language for physical simulation," ACM Trans. Graph., vol. 35, no. 2, pp. 20:1-20:21, Mar. 2016.
[Online]. Available: http:/ldoi.acm.org/10.1145/2866569 [32] H. Choi, W. Choi, T. M. Quan, D. G. C. Hildebrand, H. Pfister, and W. K. Jeong, ''Vivaldi: A domain-specific language for volume processing and visualization on distributed heterogeneous systems," IEEE Trans . Vis . Comput. Graphics , vol. 20, no. 12, pp. 2407-2416, Dec 2014. [33] CAPS Enterprise, Cray Inc., NVIDIA, and the Portland Group, "The OpenACC Application Programming Interface vl.0," Nov. 2011. [34] L. Dagum and R. Menon, "Openmp: An industry standard API for shared-memory programming," IEEE Comput. Sci. & Eng., IEEE, vol. 5, no. 1, pp. 46--55, 1998. [35] G. Martinez, M. Gardner, and W. C. Feng, "Cu2cl: A CUDA-to-opencl translator for multi- and many-core architectures," presented at IEEE 17th Int. Conf. Parallel and Distributed Syst. (ICPADS), Dec. 2011, pp. 300-307. [36] G. F. Diamos, A. R. Kerr, S. Yalamanchili, and N. Clark, "Ocelot: A dynamic optimization framework for bulk-synchronous applications in heterogeneous systems," in Proc. 19th Int. Conf. Parallel Architectures and Compilation Techniques, ser. PACT '10. New York, NY, USA: ACM, 2010, pp. 353--364. [Online]. Available: http:/ldoi.acm.org/10.1145/1854273.1854318 [37] N. B. J. Hoberock, Thrust: A Productivity-Oriented Library for CUDA. Burlington, MA: Morgan Kaufmann,2012,ch. 26,pp. 359--371. [38] J. K. R. Hornung, "The raja portability layer: Overview and status," in Technical Report LLNL-TR-661403. Lawrence Livermore National Laboratory, 2014. [39] H. C. Edwards, and D. Sunderland, "Kokkos array performance-portable manycore programming model," in Proc. Int. Workshop on Programming Models and Appl. for Multicores and Manycores (PMAM 12),pp. 1-10,2012. [40] H.-C. Hege, A. Hutanu, R. K ahler, A. Merzky, T. Radke, E. Seidel, and B. Ullmer, "Progressive retrieval and hierarchical visualization of large remote data," Scalable Comput.: Practice and Experience, vol. 6,no. 3,2001. [41] Y. Tian,S. Klasky,W. Yu,B. Wang,H. Abbasi,N. 
Podhorszki,and R. Grout,"Dynam: Dynamic multiresolution data representation for large-scale scientific analysis," presented at IEEE 8th Int. Conf. Networking,Architecture and Storage (NAS),Jul. 2013, pp. 115-124. [42] V. Pascucci and R. J. Frank,"Global static indexing for real-time exploration of very large regular grids," in Proc. 2001 ACM/IEEE Conf. Supercomput., Denver, CO, USA, November 10-16, 2001, CD-ROM, G. Johnson, Ed. ACM, 2001, p. 2. [Online]. Available: http://doi.acm.org/10.1145/582034. 582036 [43] S. Kumar, V. Vishwanath,P. Carns,B. Summa,G. Scorzelli, V. Pascucci,R. Ross,J. Chen, H. Kolla, and R. Grout, "PIDX: Efficient parallel 1/0 for multi-resolution multi dimensional scientific datasets," in Proc. IEEE Int. Conf. Cluster Comput., Sep. 2011,pp. 103-111. [Online]. Available: http://www.scj.utah.edu/pubhcabons/kumarl 1/Kumar ICCC2011.pdf [44] A. B. P. Welford and B. P. Welford,"Note on a method for calculating corrected sums of squares and products," Technometrics, pp. 419--420,1962. [45] D. E. Knuth, The Art of Computer Programming: Seminumerical Algorithms Vol. 2 (3rd Ed.). Boston,MA, USA: Addison-Wesley Longman Publishing Co.,Inc.,1997. [46] R. M. Karp, "On-line algorithms versus off-line algorithms: How much is it worth to know the future?" in IFIP Congress, vol. 12,1992,pp. 416--429. [47] S. Van Der Walt, S. C. Colbert, and G. Varoquaux, "The numpy array: A structure for efficient numerical computation," Comput. Sci. & Eng., vol. 13,no. 2,pp. 22-30,2011. [48] J. V. der Corput, ''Verteilungsfunktionen. i. mitt," in Proc. Akad. Wet. Amsterdam, 38,1935,pp. 813-821. [49] S. R. Lindemann and S. M. LaValle, "Incremental low-discrepancy lattice methods for motion planning," in Proc. ICRA '03 IEEE Int. Conf. on Robotics and Automation, Sep. 2003,vol. 3, pp. 2920--2927. [50] J. Halton, "On the efficiency of certain quasi- random sequences of points in evaluating multi- dimensional integrals," Numer. Math., p. 2:8490,1960. |
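The inner loop of the listing computes a Pearson-style correlation at each latitude: z-score both fields along longitude, average the product of z-scores, then scale by n/(n-1). A minimal NumPy sketch of that per-latitude computation is shown below; the function name, argument shapes, and array layout are my own illustrative assumptions, not part of the ViSUS API.

```python
import numpy as np

def latitude_correlations(f, g):
    """Correlation between fields f and g at each latitude.

    f, g: 2-D arrays of shape (nlat, nlon), one time slice per field.
    Returns a 1-D array of length nlat, mirroring the EDSL listing:
    average of z-score products over longitude, scaled by n/(n-1).
    """
    nlon = f.shape[1]
    # z-score each latitude row along the longitude axis
    zf = (f - f.mean(axis=1, keepdims=True)) / f.std(axis=1, keepdims=True)
    zg = (g - g.mean(axis=1, keepdims=True)) / g.std(axis=1, keepdims=True)
    # mean of z-score products, with the listing's n/(n-1) correction
    return nlon * (zf * zg).mean(axis=1) / (nlon - 1)
```

Note that, as in the listing, the n/(n-1) factor compensates for averaging over n samples while the usual sample correlation divides by n-1; with population standard deviations this slightly exceeds 1 for perfectly correlated rows.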
| Reference URL | https://collections.lib.utah.edu/ark:/87278/s67h7jws |



