Description
At the forefront of research in many scientific fields, experts rely on experimentation and simulation to better understand phenomena occurring in nature and society. The goal of such studies is typically to understand the effect of one or more stimuli on the condition under study. The number of effective "stimuli," or inputs, can vary from a single parameter to hundreds of parameters in modern-day experiments, and these inputs often represent a superset of the minimal set needed to predict an outcome. It is therefore important to identify the most crucial inputs and to understand the behavior of the input space in an efficient and reliable manner, since obtaining data under such circumstances can be expensive, dangerous, difficult, and/or time-consuming. In this work, we explore marrying existing methods for regression analysis and data mining, grounded in probability, statistics, and geometry, with visualization and a branch of topology known as Morse theory. Specifically, we focus on obtaining more informative data from fewer samples through topology-aware adaptive sampling in three different settings, and on extracting the maximum amount of information from preliminary data through structured sensitivity analysis. Lastly, we comment on existing limitations of the approximate topological model used for the aforementioned multidimensional data generation and subsequent analysis, and we explore improvements that augment the underlying geometric graph model. Namely, we improve scalability by exploiting the GPU to perform topological analysis at the scale of hundreds of millions of data points in up to ten dimensions, and we change the underlying graph model to allow for more robust downstream representations of the data for classification and topological analysis. The techniques provided are demonstrated on applications arising from the field of nuclear engineering safety analysis.
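For readers unfamiliar with the approximate topological model referenced above, the sketch below illustrates the general idea behind a graph-based Morse-Smale decomposition: a k-nearest-neighbor graph stands in for the continuous domain, each sample is walked uphill and downhill along graph edges, and the (maximum, minimum) pair it reaches labels its cell. This is a minimal illustration under assumed choices (NumPy/SciPy, k = 10, and the hypothetical names approximate_morse_smale and flow), not the implementation described in this work.

```python
import numpy as np
from scipy.spatial import cKDTree

def approximate_morse_smale(X, f, k=10):
    """Label each sample by the (max, min) pair it flows to on the kNN graph.

    X : (n, d) array of sample locations
    f : (n,)   array of scalar responses
    """
    n = len(X)
    # k + 1 neighbors because each point is returned as its own nearest neighbor
    _, nbrs = cKDTree(X).query(X, k=k + 1)

    def flow(ascending):
        dest = np.full(n, -1, dtype=int)
        for i in range(n):
            j = i
            while True:
                neigh = nbrs[j]
                # Step to the neighbor with the largest (or smallest) response
                nxt = neigh[np.argmax(f[neigh])] if ascending else neigh[np.argmin(f[neigh])]
                if nxt == j:  # j is a local extremum of the graph; stop walking
                    break
                j = nxt
            dest[i] = j
        return dest

    maxima, minima = flow(True), flow(False)
    # Each distinct (max, min) pair identifies one Morse-Smale cell
    return list(zip(maxima, minima))

# Toy usage: two Gaussian bumps yield two ascending manifolds
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))
f = (np.exp(-8 * ((X[:, 0] - 0.5) ** 2 + X[:, 1] ** 2))
     + np.exp(-8 * ((X[:, 0] + 0.5) ** 2 + X[:, 1] ** 2)))
cells = approximate_morse_smale(X, f)
print(len(set(cells)), "Morse-Smale cells found")
```

In a topology-aware adaptive sampling setting, a decomposition of this kind can be recomputed as samples arrive, and new samples placed where the cell structure is least certain; the quality of the result depends on the neighborhood graph, which is one motivation for the graph-model improvements discussed above.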