| Title | Convex optimization-based structure-preserving filtering for polynomial-based numerical methods |
| Publication Type | dissertation |
| School or College | College of Engineering |
| Department | Computing |
| Author | Zala, Vidhi |
| Date | 2021 |
| Description | In the simulation sciences, capturing real-world problem features as accurately as possible is desirable. Methods popular for scientific simulations such as the finite element method (FEM) and the finite volume method (FVM) use piecewise polynomials to approximate various characteristics of the problem, such as the concentration profile of chemicals and the temperature distribution across the domain. Polynomial-based approximations of functions with finite data often do not respect certain structural properties of the functions. "Structure" in our context refers to fairly general types of linear inequality constraints, such as positivity, monotonicity, maximum principle, flux, and integral conservation, etc. In addition, polynomials are prone to creating artifacts such as Gibbs oscillations while capturing a complex profile. An efficient and accurate approach must be applied to deal with such inconsistencies in order to obtain accurate simulations that match the real-world expectations of the data. This often entails dealing with the occurrence of negative values while simulating the concentration of chemicals, a percentage value over 100, and other such issues. We consider these inconsistencies in the context of partial differential equations (PDEs). The solution we propose is the application of an innovative filter based on a convex optimization approach to deal with the structure-preservation problems observed in polynomial-based methods. We provide a variety of tests on various multivariate function approximations and time-dependent PDEs that demonstrate the efficacy of the method. We empirically show that rates of convergence and solution accuracy are unaffected by the inclusion of the structural constraints. |
| Type | Text |
| Publisher | University of Utah |
| Subject | convex optimization; finite element methods; partial differential equations; simulations; structure-preservation |
| Dissertation Name | Doctor of Philosophy |
| Language | eng |
| Rights Management | © Vidhi Zala |
| Format | application/pdf |
| Format Medium | application/pdf |
| ARK | ark:/87278/s64zrm6q |
| Setname | ir_etd |
| ID | 2100248 |
| OCR Text | CONVEX OPTIMIZATION-BASED STRUCTURE-PRESERVING FILTERING FOR POLYNOMIAL-BASED NUMERICAL METHODS

by Vidhi Zala

A dissertation submitted to the faculty of The University of Utah in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science.

School of Computing
The University of Utah
December 2021

Copyright © Vidhi Zala 2021. All Rights Reserved.

The University of Utah Graduate School
STATEMENT OF DISSERTATION APPROVAL

The dissertation of Vidhi Zala has been approved by the following supervisory committee members:

Robert Michael Kirby II, Chair (approved 12/3/2021)
Akil Narayan, Member (approved 12/3/2021)
Christopher R. Johnson, Member (approved 12/3/2021)
Varun Shankar, Member (approved 12/3/2021)
Aaron L. Fogelson, Member

and by Mary W. Hall, Chair/Dean of the Department/College/School of Computing, and by David B. Kieda, Dean of The Graduate School.

ABSTRACT

In the simulation sciences, capturing real-world problem features as accurately as possible is desirable. Methods popular for scientific simulations such as the finite element method (FEM) and the finite volume method (FVM) use piecewise polynomials to approximate various characteristics of the problem, such as the concentration profile of chemicals and the temperature distribution across the domain. Polynomial-based approximations of functions with finite data often do not respect certain structural properties of the functions. "Structure" in our context refers to fairly general types of linear inequality constraints, such as positivity, monotonicity, maximum principles, and flux and integral conservation. In addition, polynomials are prone to creating artifacts such as Gibbs oscillations while capturing a complex profile. An efficient and accurate approach must be applied to deal with such inconsistencies in order to obtain accurate simulations that match the real-world expectations of the data.
This often entails dealing with the occurrence of negative values while simulating the concentration of chemicals, a percentage value over 100, and other such issues. We consider these inconsistencies in the context of partial differential equations (PDEs). The solution we propose is the application of an innovative filter based on a convex optimization approach to deal with the structure-preservation problems observed in polynomial-based methods. We provide a variety of tests on various multivariate function approximations and time-dependent PDEs that demonstrate the efficacy of the method. We empirically show that rates of convergence and solution accuracy are unaffected by the inclusion of the structural constraints.

To Loki and Sir Nick Furry. And all the doggos in the world. They make the world a better place!

CONTENTS

ABSTRACT
LIST OF FIGURES
LIST OF TABLES
NOTATION AND SYMBOLS
ACKNOWLEDGEMENTS

CHAPTERS

1. INTRODUCTION
   1.1 Statement
   1.2 Motivation
   1.3 State-of-the-art

2. MATHEMATICAL BACKGROUND
   2.1 Structure-preserving function approximation
   2.2 Riesz representors
   2.3 Least squares problems
   2.4 Constraints
   2.5 Problem discretization
   2.6 Geometry of sets

3. SETUP
   3.1 Filter setup
       3.1.1 Constrained optimization problem
       3.1.2 Solutions to the constrained optimization problem
   3.2 PDE applications
       3.2.1 Problem discretization
       3.2.2 Degrees of freedom
       3.2.3 Mass conservation
   3.3 Setup for dG experiments
       3.3.1 Elementwise boundary value conservation
   3.4 Setup for cG experiments

4. ALGORITHM
   4.1 Algorithmic ingredients
       4.1.1 Convex feasibility
       4.1.2 Greedy projections
       4.1.3 Averaged projections
       4.1.4 Hybrid algorithms
       4.1.5 Algorithms for polynomial subspaces
       4.1.6 Nonidentity matrices A
       4.1.7 Finding global minimum
   4.2 Filtering algorithm and its placement in time-dependent PDE
       4.2.1 dG specializations

5. NUMERICAL RESULTS
   5.1 Function approximations
       5.1.1 Algorithm comparison
       5.1.2 More complicated constraints
       5.1.3 Function approximation filtering examples
       5.1.4 Convergence rates
       5.1.5 Constrained approximation as a nonlinear filter
   5.2 PDE applications
       5.2.1 Time-dependent PDE using dG formulation
       5.2.2 1D time-dependent PDE using cG formulation
   5.3 dG FEM implementation of the PAC model with filtering

6. CONCLUSION AND FUTURE WORK

APPENDICES
A. ALGORITHMS FOR UNIVARIATE POLYNOMIAL SUBSPACES
B. PARAMETER SELECTION FOR LINE SEARCH USED IN GD-BASED MINIMIZATION

REFERENCES

LIST OF FIGURES

1.1 Example application: platelet aggregation and blood coagulation (PAC) model.
1.2 Correction functions used to enforce positivity in a univariate polynomial approximation.
3.1 The steps in the procedure that find the distance between ṽ and the hyperplane boundary representing each violating constraint.
3.2 Demonstration of filter application to the ADR problem using the dG formulation.
4.1 A geometric depiction of one iteration of the filtering procedure.
4.2 Correction functions for degree-5 polynomial approximation.
4.3 Correction functions for degree-30 polynomial approximation.
4.4 A graphical view of the correction process of the filter during one iteration.
4.5 Example lattice used for detecting structural nonconformity in a mesh.
5.1 Algorithm results from unusual constraints for f(x) = |x|.
5.2 Greedy algorithm results.
5.3 Test function f₂ for different polynomial spaces V and ambient spaces H.
5.4 Galerkin projection of example functions on a quadrilateral and a hexahedron element, respectively.
5.5 H = H⁰ convergence results for projection of test functions f = f₀ (top row) and f = f₂ (bottom row).
5.6 A p-convergence study of the filtered and unfiltered Galerkin projections from Figure 5.4.
5.7 Corresponding to the experiment shown in Figure 5.2, bar plot showing unconstrained projection coefficient magnitudes |ṽⱼ| vs. various constrained projection coefficient magnitudes |w̃ⱼ|.
5.8 Corresponding to Figure 5.3, bar plot showing unconstrained projection coefficient magnitudes |ṽⱼ| vs. various constrained projection coefficient magnitudes |w̃ⱼ|.
5.9 Convergence study for the filtered and unfiltered dG solution to the 1D advection PDE applied to u(x, t) = 0.5 sin(2πx - 2πt - 0.5π) + 1 for a time period of T = 1 second.
5.10 The initial state f for the convergence study of the filtered 1D dG solution to (5.5).
5.11 Convergence study for the filtered and unfiltered dG solution to the 1D advection PDE with Figure 5.10 as the initial state, for a simulation run to T = 1.
5.12 Average number of elements flagged per timestep during the experiments shown in Figure 5.11.
5.13 Percentage of total simulation clock time spent inside the filtering procedure during the convergence experiments shown in Figure 5.11.
5.14 A 2D composite mesh used for solving (5.6); it contains 16 triangles and 8 quadrilaterals.
5.15 Different views of the state in the 2D advection experiment.
5.16 A summary of the advection experiment using polynomial order 5 and Δt = 10⁻³ on the 3D cube domains, meshed using different canonical elements individually.
5.17 L² errors per timestep and p-convergence of the advection experiment on a homogeneous cubical mesh with 24 tetrahedra.
5.18 Initial and final state of the function u for ε = 25, c = 20, D = 1, and µ = 1.
5.19 Convergence study for the filtered and unfiltered solution to the 1D cG diffusion-reaction PDE applied to the function f shown in Figure 5.18, with constant timestep Δt = 10⁻⁴.
5.20 Solution for the filtered and unfiltered PAC model at different timesteps.

LIST OF TABLES

5.1 Performance summary of the three proposed algorithms on the test function f = f₂ for different values of ε, where ε is as described in Section 4.1.4.
B.1 Results for a 2D experiment on a quadrilateral domain to find the optimal ranges of the gradient descent line-search parameters c and γ.

NOTATION AND SYMBOLS

PDE   Partial Differential Equation
FEM   Finite Element Method
FVM   Finite Volume Method
FDM   Finite Difference Method
cG    continuous Galerkin
dG    discontinuous Galerkin
RK-4  classical Runge-Kutta of order 4
ADR   Advection-Diffusion-Reaction
PAC   Platelet Aggregation and blood Coagulation

ACKNOWLEDGEMENTS

This work could not have been possible without the enormous trust and support from my advisors, especially Mike and Akil. I would like to specially acknowledge everything they taught me and thank them for everything they did for me, academically and otherwise! I thank my momma dearest for the (intermittent) support, the many fights and disapprovals from the best DeeDee in the world, my sweetest Jiju, the kindness of all of my family for (mostly) not disowning a black sheep like me, the patience of all my teachers with my attention span and their tolerance for my poor jokes, a very dynamic list of friends, the literary proficiency and motivation derived from coffee and beer, my therapists' will-power, etc. I am grateful for the support from the following grants that backed this research: sponsorship by ARL under cooperative agreement number W911NF-12-2-0023, NSF DMS-1848508, National Science Foundation under Grant No. DMS-1439786 and the Simons Foundation Institute Grant Award ID 507536, National Science Foundation under DMS-1521748, and the Army Research Office under ARO W911NF-15-1-0222. Lastly, my deepest gratitude to the University of Utah community for always looking out for the students and staff, enabling us to excel. Thank you for making the campus and my time here absolutely enjoyable!

CHAPTER 1

INTRODUCTION

This chapter states the problem addressed by this dissertation and motivates the need to find a solution. We provide a brief survey of existing techniques in the literature and the key points of our proposed solution.
1.1 Statement

The goal of this research is to develop and implement convex optimization-based, structure-preserving filtering algorithms for polynomial-based numerical methods.

1.2 Motivation

Since the advent of numerical methods such as the finite element method (FEM) and the finite volume method (FVM) for solving partial differential equations (PDEs), the scientific computing community has advanced these methods with the goal of computing solutions that emulate real-world phenomena. Many such numerical methods rely on piecewise polynomial approximation, a linear combination of basis functions, which is a foundational technique in numerical analysis and scientific computing. In many cases, the theoretical and practical feasibility of numerical methods depends on how closely the computed approximation follows the requisite physical structure. "Structure" in our context refers to fairly general types of linear inequality constraints, such as positivity, monotonicity, maximum principles, and flux and integral conservation. A violation of structure may arise from seemingly benign approximation properties; for example, polynomial approximations that yield Gibbs oscillations often still converge in the mean-square sense, but these oscillations can cause a violation of positivity. The phenomenon of invalid solutions occurs frequently in many fields that employ polynomial-based methods for simulations. PDE models in which the importance of structure preservation can be observed include combustion problems, fluid flow problems, numerical weather prediction, and the modeling of biochemical processes. Figure 1.1 shows examples depicting modeling issues resulting from a nontrivial application. If the solution at any point during the PDE simulation violates the implicit structure, the resulting computation may produce unphysical predictions and may cause solvability issues in numerical schemes.
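The Gibbs-driven positivity violation just described is easy to reproduce. The following minimal sketch (our own illustration, not code from the dissertation) least-squares fits a nonnegative step-like "concentration" profile with a degree-15 polynomial; the fit is accurate in the mean-square sense, yet its minimum value is negative.

```python
import numpy as np

# Sketch (ours, illustrative only): project a nonnegative step profile onto
# polynomials of degree 15 via least squares on a fine grid, then inspect the
# minimum of the approximant. Gibbs oscillations drive it below zero.
x = np.linspace(-1.0, 1.0, 2001)
f = np.where(x < 0.0, 0.0, 1.0)            # nonnegative "concentration" profile

# Legendre-Vandermonde least-squares fit of degree 15.
V = np.polynomial.legendre.legvander(x, 15)
coeffs, *_ = np.linalg.lstsq(V, f, rcond=None)
p = V @ coeffs

rms_error = np.sqrt(np.mean((p - f) ** 2))
print(f"RMS error: {rms_error:.3f}")        # small: accurate on average
print(f"min p(x):  {p.min():.3f}")          # negative: Gibbs undershoot near x = 0
```

The undershoot persists for any finite polynomial degree; raising the degree shrinks the mean-square error but not the sign violation, which is exactly the failure mode the filter is designed to repair.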
These simulations are prone to instability or failure as a result of the feedback of an invalid solution from one timestep to another. Therefore, it is important to address the structure-preservation issue carefully so that the underlying properties, including convergence and accuracy, are not affected. In this dissertation, we present a general framework for preserving structure in polynomial-based methods. The solution we propose for maintaining the structure is designed in the form of a postprocessing filter that is applied to fix structural inconsistencies. The filter is based on a constrained optimization approach, and therefore we use the terms filter and optimizer interchangeably throughout the subsequent discussions.

Figure 1.1: Example application: platelet aggregation and blood coagulation (PAC) model. Left: snapshot in time of a FEM solution for the platelet aggregation and blood coagulation model [26] showing the evolution of the fluid-phase chemical thrombin (FPe2) near the injury site of the vessel. The value of the concentration of FPe2 at a particular point in time is observed to be negative because of the concentration profile near the injury site, which, in turn, causes the simulation to fail. Right: a simple implementation of the sharp change in concentration in a combustion-like scenario using polynomial projection on a 1D domain. The inset shows that the concentration at the point x = 0.6 is negative (not physically meaningful), which can lead to failure of the simulation.

In summary, the contributions of this research are as follows:

• We formalize a new model for preserving structure in polynomial-based numerical methods in [43].
This model can successfully be applied to function approximations and PDE solutions to not only preserve the canonical structure such as positivity and/or monotonicity as shown in [44], but also embed much richer, nontrivial structure. The structure is preserved throughout the entire domain and not just on a finite subset of points in the domain. A particular advantage of our approach is that the formalism provides an abstraction encompassing many different forms of structure; e.g., the procedure for preserving positivity versus monotonicity is fundamentally the same.

• We show that this model corresponds to a finite-dimensional convex optimization problem. We subsequently characterize the feasible set as an intersection of conic sets (see Theorem 3.1.1), and show that the optimization problem, and hence our structure-preserving approximation model, has a unique solution (see Theorem 3.1.2).

• Our convex optimization problem can be cast as a problem of projecting onto a convex set (the feasible set). Unfortunately, the feasible set is not, in general, a polytopic region in coefficient space. Hence, a finite number of linear inequality constraints cannot characterize the feasible set. We instead characterize the convex feasible set as one with an (uncountably) infinite number of supporting hyperplanes. We use this characterization to develop two types of algorithms for computing the solution to the optimization problem. We also combine these two algorithms into a hybrid approach that is more efficient than either algorithm alone.

• We explore additional algorithmic advances in [44] that are of interest to some PDE solutions: conservation of mass and preservation of boundary conditions and interelement fluxes by incorporating extra constraints into the filter. We also investigate an efficient gradient-descent-based approach to find the global minimum in multidimensional cases.
• We demonstrate with numerical results that the algorithm satisfies the desired constraints and preserves the structure of the solution in one (1D), two (2D), and three (3D) dimensions in [43–45]. We also show that, for our examples, the rates of convergence of multidimensional polynomial approximation and of PDE solutions using discontinuous and continuous Galerkin (dG and cG) formulations are unchanged compared to the unconstrained case. We further provide empirical evidence that this optimization procedure has a negligible impact on absolute accuracy for these solvers.

All our investigations in this dissertation are limited to time-dependent PDEs. We explore a combination of implicit and explicit approaches to solving the PDEs. In order to compute solutions to the constrained optimization problem, the algorithms iteratively "correct" an unconstrained initial guess. For one of our algorithms, these corrections are essentially Dirichlet kernels for the approximation space. We visualize some correction functions for enforcing positivity in polynomial approximation in Figure 1.2. Although the solution we propose corresponds to a conceptually simple convex feasibility problem, the convex feasible set is "complicated"; in particular, the feasible set is not a finite intersection of simple convex sets, such as Euclidean halfspaces. Therefore, many of the standard convex optimization algorithms cannot be used to directly solve this problem.

Figure 1.2: Correction functions used to enforce positivity in a univariate polynomial approximation. Our algorithm adds scaled/combined versions of these functions to enforce constraints. Shown are corrections targeted to enforce positivity at x = 0.5. Correction functions are shown for polynomials of degrees 5 (top) and 30 (bottom).
The columns correspond to corrections in different ambient Hilbert spaces. Left: L²([-1, 1]). Center: H¹([-1, 1]). Right: H²([-1, 1]).

1.3 State-of-the-art

This section presents a survey of the existing solutions to the problem considered in this dissertation and briefly discusses their limitations. The problem of structure preservation is of interest in many numerical applications. For something simple, such as enforcing positivity at a finite number of points in the domain, the feasible set is much easier to characterize, and several off-the-shelf algorithms become applicable [11]. However, these approaches do not guarantee positivity on the entire domain, which our structure-preserving model does enforce. Another class of techniques uses mapping methods. For example, if we approximate √f and square the resulting approximation, then the squared approximation is guaranteed to be positive. Finally, there are more sophisticated successful approaches to construct positive approximations [14]. Although these approaches are attractive, such mapping functions are not easy to construct for more complicated constraints. Yet another approach is to adapt the basis; for example, by expanding a function in Bernstein polynomials that are positive on some domain, we can ensure the positivity of the approximation on that domain if all the expansion coefficients are non-negative. Therefore, one forms an approximation subject to the positivity of the coefficients. However, this approach does not yield polynomial reproduction even in simple cases. Consider the following basis for quadratic polynomials in one dimension:

    v₁(x) = 1 - x²,   v₂(x) = (1 - x)(x + 3),   and   v₃(x) = (x + 1)(3 - x).

Note that on [-1, 1], these three functions are all non-negative. However, the (unique) representation of f ≡ 1 (which is also non-negative) in this basis is

    f(x) = -(1/2) v₁(x) + (1/4) v₂(x) + (1/4) v₃(x),

which clearly does not have positive expansion coefficients.
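The expansion coefficients above can be verified numerically. The following sketch (our own check, not code from the dissertation) writes each basis function in monomial coordinates and solves the 3×3 change-of-basis system for f ≡ 1.

```python
import numpy as np

# Our own check (not dissertation code): expand f(x) = 1 in the quadratic basis
# v1(x) = 1 - x^2, v2(x) = (1 - x)(x + 3), v3(x) = (x + 1)(3 - x), each of which
# is non-negative on [-1, 1], and observe a negative expansion coefficient.

# Monomial coefficients [constant, x, x^2] of each basis function.
v1 = np.array([1.0, 0.0, -1.0])            # 1 - x^2
v2 = np.array([3.0, -2.0, -1.0])           # (1 - x)(x + 3) = 3 - 2x - x^2
v3 = np.array([3.0, 2.0, -1.0])            # (x + 1)(3 - x) = 3 + 2x - x^2

B = np.column_stack([v1, v2, v3])          # change-of-basis matrix
target = np.array([1.0, 0.0, 0.0])         # monomial coefficients of f = 1
c = np.linalg.solve(B, target)
print(c)                                   # close to [-0.5, 0.25, 0.25]
```

The first coefficient is negative, so a coefficient-positivity constraint in this basis cannot even reproduce the constant function 1, which is the reproduction failure the text describes.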
Alternative approaches use an adaptive construction scheme for certain kinds of constraints [9]; our framework allows much more general constraints and is not restricted by dimension, although we consider only univariate examples. For narrative purposes, we partition the existing methods into two broad categories: nonintrusive and intrusive.

Methods are considered nonintrusive if they constrain the solution obtained from the solver with minimal, often superficial, change to the numerical scheme. Prominent nonintrusive methods include limiters applied at each timestep [30, 32, 37, 38, 46, 47]. These limiters typically affect only the design of numerical fluxes and not the underlying scheme. However, they must be specially designed for different kinds of constraints and approximation spaces. These limiters therefore lack flexibility with respect to discretization, spatial dimension, and type of structure/constraint. We also note that many different types of structure-preserving constraints can be considered. For example, the technique from [23, 47] can successfully impose maximum principle-based constraints on a scalar or vector field. The Mass-Variable Mass-Target Optimization-Based Remap (MVMR-OBR) approach described in [2] successfully prescribes mass and maximum principle conservation on a discrete set of points in the domain. However, some applications require that, although two fields u and v need not individually obey a maximum principle, the sum u + v must obey such a principle [23]. In such scenarios, standard maximum principle approaches cannot be employed. The solution we propose belongs to this nonintrusive category but attempts to mitigate the previously mentioned flexibility issues. We consider a continuous version of the constraint satisfaction problem that ensures the constraints are satisfied at all points in the domain, not just on a discrete set of points.
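The distinction between discrete-point and whole-domain constraint satisfaction can be made concrete with a small numerical sketch (ours, purely illustrative): a quadratic that is non-negative at every point of a coarse check grid yet dips below zero between two grid points.

```python
import numpy as np

# Illustration (ours, not from the dissertation): a polynomial can pass a
# positivity check on a discrete point set while violating positivity on the
# continuum. p(x) = (x - 0.5)^2 - 0.0025 is negative only on (0.45, 0.55),
# an interval whose interior a grid of spacing 0.2 never samples.
p = np.polynomial.Polynomial([0.2475, -1.0, 1.0])   # (x - 0.5)^2 - 0.0025

coarse = np.linspace(-1.0, 1.0, 11)                 # spacing 0.2
fine = np.linspace(-1.0, 1.0, 2001)

print(p(coarse).min())   # > 0: the discrete check passes
print(p(fine).min())     # ~ -0.0025: positivity fails near x = 0.5
```

This is why enforcing constraints only at quadrature or collocation points, as many off-the-shelf approaches do, is weaker than the continuous formulation pursued here.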
Furthermore, the proposed method can be used with arbitrarily high polynomial orders without changes to the algorithm.

The second class of methods are those that are intrusive. An approach is considered intrusive if it substantially changes the underlying numerical scheme or the properties of the solution domain. The intrusive methods view the structure-preservation problem as a PDE-constrained optimization problem. Some of these methods modify the spatial discretization [42], and others use limiters derived from Karush-Kuhn-Tucker (KKT) optimality conditions [40]. Another example of an intrusive approach [31] proposes an operator-splitting positivity-preservation method based on the energy dissipation law. The strategies suggested for constraint satisfaction in [2] consider a version of the problem that is a subset of the one solved by the proposed solution. In [1] and its extensions [2, 33], the authors explore approaches for positivity preservation based on basis functions derived from Bernstein polynomials, chosen so that they are non-negative and possess the partition-of-unity property. Therefore, the interpolated solution respects the original bounds at any point in the domain.
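The bound-preservation argument for Bernstein-type bases rests on exactly two properties, non-negativity and partition of unity. A short sketch (our own, illustrative only) checks both for the degree-3 Bernstein basis on [0, 1] and confirms that any expansion with coefficients in [0, 1] stays in [0, 1] everywhere.

```python
import numpy as np
from math import comb

# Our illustrative check (not dissertation code): degree-n Bernstein polynomials
# B_{k,n}(x) = C(n,k) x^k (1-x)^(n-k) are non-negative on [0, 1] and sum to 1,
# so any combination with coefficients in [0, 1] is a pointwise convex
# combination of those coefficients and remains in [0, 1] on all of [0, 1].
n = 3
x = np.linspace(0.0, 1.0, 1001)
B = np.array([comb(n, k) * x**k * (1 - x) ** (n - k) for k in range(n + 1)])

assert (B >= 0).all()                      # non-negativity
assert np.allclose(B.sum(axis=0), 1.0)     # partition of unity

coeffs = np.array([0.2, 1.0, 0.0, 0.7])    # arbitrary control values in [0, 1]
p = coeffs @ B                             # p(x) = sum_k coeffs[k] * B_{k,n}(x)
print(p.min(), p.max())                    # bounds respected on the whole grid
```

The catch, as the basis example in the previous section shows, is that demanding non-negative coefficients sacrifices polynomial reproduction.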
To summarize, existing strategies in the literature to preserve structure typically come in the form of intrusive methods, requiring nontrivial modification to numerical schemes, or nonintrusive methods, which typically affect existing numerical implementations in benign ways. The procedure we propose in this dissertation falls into the latter category, and our framework handles very general constraints. Furthermore, the mathematical formulation is agnostic to the type of (linear, convex) constraint and the spatial dimension of the problem. The price we pay for this generality is that some nontrivial (but convex) optimization Chapter 3 must be performed. We note the prior theoretical investigation of error estimates for best structure-preserving approximation [5, 6, 21, 35]. The remainder of this thesis is structured as follows: Chapter 2 establishes a mathematical background to build and analyze the feasible set of the model and shows that a unique solution exists. In the first part of Chapter 3, we introduce notation, describe the types of constraints considered, and present the structure-preserving model. It details the application to PDEs and some additional discussion of the filtering procedure using geometrical interpretations. The second part of Chapter 3 describes the setup for the numerical experiments for filter application. Chapter 4 presents our proposed algorithms for computing solutions. Chapter 5 contains numerical results and demonstrations. Finally, Chapter 6 concludes the findings in this thesis and sets ground for future work. CHAPTER 2 MATHEMATICAL BACKGROUND In this section, we define our notations and present the concepts central to the filter’s design. 2.1 Let W ⇢ Structure-preserving function approximation d be a spatial domain. Whereas our setup and theoretical results are valid for general W and d 1, we use d = 1 (unless otherwise specified) for the simplicity of discussion in this section. 
This restriction affects only the algorithms and not the model or the mathematical properties of our discussion. Consider a Hilbert space formed from scalar-valued functions over $\Omega$:

$$H = H(\Omega) := \left\{ f : \Omega \to \mathbb{R} \;\middle|\; \|f\|^2 := \langle f, f \rangle < \infty \right\},$$

with $\langle \cdot, \cdot \rangle = \langle \cdot, \cdot \rangle_H$ the inner product on $H$. We are mostly concerned with "standard" function spaces such as $L^2(\Omega)$ or Sobolev spaces.¹ Let $V$ be an $N$-dimensional subspace of $H$, with $\{v_n\}_{n=1}^N$ a collection of orthonormal basis functions,

$$V = \mathrm{span}\{v_1, \ldots, v_N\}, \qquad \langle v_j, v_k \rangle = \delta_{jk}, \quad j, k = 1, \ldots, N,$$

where $\delta_{jk}$ is the Kronecker delta. For example, if $H$ is $L^2(\Omega)$ with $\Omega = [-1, 1]$ and $V$ is spanned by polynomials up to degree $N - 1$ (i.e., order $N$), then one choice for the $v_j$ basis functions is the orthonormal Legendre polynomials.

2.2 Riesz representors

We consider the dual $V^*$ of $V$, i.e., the space of all bounded linear functionals mapping $V$ to $\mathbb{R}$. The Riesz representation theorem guarantees that a functional $L \in V^*$ can be associated with a unique $V$-representor $\ell \in V$ satisfying

$$L(u) = \langle u, \ell \rangle \qquad \forall\, u \in V.$$

Furthermore, this $L \leftrightarrow \ell$ identification is an isometry. We will use these facts in what follows. Given $L$ that identifies $\ell$, we consider the coordinates $\hat{\ell}_j$ of $\ell$ in a $V$-orthonormal basis,

$$\ell(x) = \sum_{j=1}^N \hat{\ell}_j v_j(x), \qquad \hat{\ell}_j = \langle \ell, v_j \rangle = L(v_j).$$

Correspondingly, we have the relations

$$\|L\|_{V^*} = \|\ell\|_V = \|\hat{\ell}\|_2, \qquad \hat{\ell} = \left(\hat{\ell}_1, \hat{\ell}_2, \ldots, \hat{\ell}_N\right)^T \in \mathbb{R}^N,$$

where $\|\cdot\|_2$ is the Euclidean norm on vectors in $\mathbb{R}^N$.

2.3 Least squares problems

We are interested in a common least squares-type approximation problem. Suppose that $u \in H$ is an unknown function for which we have $M$ pieces of data. We wish to construct an approximation $p \in V$ to $u$ that best matches these data. We now formulate this abstractly. Let $\phi_1, \ldots, \phi_M$ be $M$ linear functionals on $H$ that are bounded on $V$.

¹ We formally define $L^2$ and some Sobolev spaces in Section 5.1.
We assume that the observations $\{u_j\}_{j=1}^M = \{\phi_j(u)\}_{j=1}^M$ are available to us (and also bounded), and we seek to solve the optimization problem

$$p = \operatorname*{argmin}_{v \in V} \sum_{j=1}^M \left( \phi_j(v) - u_j \right)^2.$$

For example, if $\phi_j$ is a point evaluation (the Dirac mass) at some location $x_j \in \Omega$ for all $j = 1, \ldots, M$, then the problem above is equivalent to

$$p = \operatorname*{argmin}_{v \in V} \sum_{j=1}^M \left( v(x_j) - u(x_j) \right)^2.$$

This problem has a unique solution if the matrix $A \in \mathbb{R}^{M \times N}$ with entries

$$(A)_{m,n} = \phi_m(v_n), \qquad 1 \leq n \leq N, \quad 1 \leq m \leq M,$$

has rank equal to $\dim V = N$; otherwise, infinitely many solutions exist. This least squares problem is well understood, and computational algorithms to solve it given data $\phi_j(u)$ are ubiquitous [12]. For an overdetermined system, where $M > N$, the method of ordinary least squares can be used to find an approximate solution. Details for this case are discussed in Section 2.5 after the discrete formulation of this problem.

2.4 Constraints

The previous section explains how a function $p$ can be constructed from data. However, we are interested in a particular kind of constrained approximation. Our investigation can be motivated by the following examples of types of constraints:

• Positivity: $p(x) \geq 0$ for all $x \in \Omega$
• Monotonicity: $p'(x) \geq 0$ for all $x \in \Omega$
• Boundedness: $0 \leq p(x) \leq 1$ for all $x \in \Omega$. We consider the bounds $[0, 1]$ for the purpose of exposition; however, the formulation extends seamlessly to general bounds $[a, b]$.

The central focus here is solving a linearly constrained least squares problem, where constraints of the above type are imposed. We now give the abstract setup of our constraints, which specializes to the examples above. Our abstraction defines $K$ families of linear constraints; for fixed $k \in \{1, 2, \ldots, K\}$, family $k$ is prescribed by the tuple $(L_k, r_k, \omega_k)$:

• $\omega_k$: a subset of $\Omega$.
• $r_k$: an element of $V$.
• $L_k$: for fixed $y \in \omega_k$, $L_k(\cdot, y)$ is a $y$-parameterized unit-norm element of $V^*$.

Our $k$th family of constraints on $v$ is

$$L_k(v, y) \leq L_k(r_k, y), \qquad y \in \omega_k. \tag{2.1}$$
The subset of $V$ that satisfies constraint family $k$ is

$$E_k := \left\{ v \in V \;\middle|\; L_k(v, y) \leq L_k(r_k, y) \text{ for all } y \in \omega_k \right\}. \tag{2.2}$$

The elements in $V$ that satisfy all $K$ families of constraints simultaneously are

$$E := \bigcap_{k=1}^K E_k. \tag{2.3}$$

We assume that $E$ is nonempty, i.e., that the constraints are consistent. Constraints can be inconsistent, e.g., simultaneously enforcing $f(x) \leq 0$ and $f(x) \geq 1$. However, one can create more subtle inconsistencies in more complicated settings. Our procedure does not provide a means to detect inconsistent constraints, in which case the algorithm will simply not converge; thus, we rely on the user to ensure consistent constraints. (Note that the corresponding constrained problem has no solution if inconsistent constraints are prescribed.) Particularly important later will be the formula for the $\{v_j\}_{j=1}^N$-coordinates of the Riesz representor of $L_k$. As in Section 2.2, $L_k(\cdot, y)$ for fixed $(k, y)$ can be identified with its Riesz representor $\ell_k(\cdot, y) \in V$ and its corresponding expansion coefficient vector $\hat{\ell}_k(y)$. The unit-norm condition on $L_k$ then implies

$$\|L_k(\cdot, y)\|_{V^*}^2 = \left\| \hat{\ell}_k(y) \right\|_2^2 = \sum_{n=1}^N \left( L_k(v_n, y) \right)^2 = 1. \tag{2.4}$$

Example 2.4.1 (Positivity). Consider $\Omega = [-1, 1]$, and let $V$ be any $N$-dimensional subspace of $L^2(\Omega) \cap L^\infty(\Omega)$. We seek to impose $p(x) \geq 0$ for all $x \in \Omega$. Thus, we have $K = 1$, and the linear operator $L_1$ should be point evaluation, appropriately normalized. Note that point evaluation is not a bounded functional on $L^2$, but it is on the finite-dimensional space $V$. Formally, this is

$$L_1(v, y) := -\lambda(y)\, v(y), \qquad v \in V,$$

where $\lambda(y) > 0$ is chosen so that $L_1$ has unit norm for every $y \in \omega_1$; the negative sign is chosen so that we can reverse the inequality in (2.1). We set $\omega_1 = \Omega$ and choose $r_1 \equiv 0 \in V$. Then, the constraint (2.1) is equivalent to $v(y) \geq 0$ for every $y \in \Omega$. The constraint set $E_1$ defined in (2.2) is

$$E_1 = \left\{ u \in V \;\middle|\; u(y) \geq 0 \text{ for all } y \in \omega_1 \right\}.$$

It will be useful here to also demonstrate how $\hat{\ell}_1(y)$ can be computed.
For fixed $y$, we can identify $L_1(\cdot, y)$ via its Riesz representor $\ell_1(\cdot, y)$:

$$\ell_1(\cdot, y) := -\lambda(y) \sum_{j=1}^N v_j(y)\, v_j(\cdot) \in V, \qquad \lambda(y) = \left[ \sum_{j=1}^N v_j^2(y) \right]^{-1/2}, \tag{2.5}$$

so that $\left\{ -\lambda(y)\, v_j(y) \right\}_{j=1}^N$ are the entries of $\hat{\ell}_1(y)$. The formula for $\lambda$ results from the normalization condition (2.4). Thus, the coefficient vector $\hat{\ell}_1(y) \in \mathbb{R}^N$ has explicit entries in terms of $y$ and the orthonormal basis $\{v_j\}_{j=1}^N$.

Example 2.4.2 (Monotonicity). With the same setup as the previous example, we take $V$ to be any $N$-dimensional subspace of $L^2_\mu(\Omega) \cap W^{1,\infty}(\Omega)$, where $W^{1,\infty}(\Omega)$ is the Sobolev space of functions that are in $L^\infty(\Omega)$ and whose derivatives are also in $L^\infty(\Omega)$. Again with $K = 1$, we define $L_1$ and its corresponding Riesz representor as

$$L_1(v, y) := -\tau(y)\, v'(y), \qquad v \in V,$$
$$\ell_1(\cdot, y) := -\sum_{n=1}^N \tau(y)\, v_n'(y)\, v_n \in V, \qquad \tau(y) = \left[ \sum_{j=1}^N \left( v_j'(y) \right)^2 \right]^{-1/2},$$

where again $\tau$ is determined by the normalization condition (2.4). With $r_1 \equiv 0$, (2.2) then enforces $v'(y) \geq 0$ for all $y \in \Omega$.

Example 2.4.3 (Boundedness). With the same setup as Example 2.4.1, we take $V$ to be any $N$-dimensional subspace of $L^2_\mu \cap L^\infty$, and we further assume that $V$ contains constant functions. Let $K = 2$, and define the operators $L_1$ and $L_2$ as

$$L_1(v, y) := -\lambda(y)\, v(y), \qquad L_2(v, y) := \lambda(y)\, v(y), \qquad v \in V,$$

with $\lambda(y)$ as in Example 2.4.1, for each $y \in \omega_1 = \omega_2 = [-1, 1]$. Then, with constraint functions $r_1 \equiv 0$ and $r_2 \equiv 1 \in V$, the sets $E_k$, $k = 1, 2$, are

$$E_1 = \left\{ u \in V \;\middle|\; u(y) \geq 0 \;\forall\, y \in [-1, 1] \right\}, \qquad E_2 = \left\{ u \in V \;\middle|\; u(y) \leq 1 \;\forall\, y \in [-1, 1] \right\},$$

so that their intersection $E$ in (2.3) is the set of elements $u \in V$ such that $0 \leq u(x) \leq 1$ for each $x \in \Omega$.

Example 2.4.4. We can also form constraints on different subsets of $\Omega$. With all the notation of the previous example, we change only

$$\omega_1 = [-1, 0), \qquad \omega_2 = (0, 1],$$

so that $E$ contains functions $u$ satisfying $u(x) \geq 0$ for $x \in [-1, 0)$ and $u(x) \leq 1$ for $x \in (0, 1]$. The above examples illustrate the generality of our notation and the intuitive simplicity of the types of constraints that we consider.
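The coefficient formula (2.5) is explicit enough to compute directly. The following is a minimal numerical sketch (not code from the dissertation) that evaluates the entries $-\lambda(y) v_j(y)$ of $\hat{\ell}_1(y)$ for the orthonormal Legendre basis on $[-1,1]$; the helper name `representor_coeffs` is hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre as leg

def representor_coeffs(y, N):
    """Entries -lambda(y) v_j(y) of the positivity representor in (2.5),
    with lambda(y) = (sum_j v_j(y)^2)^(-1/2); v_j are the L2([-1,1])
    orthonormal Legendre polynomials, v_j = sqrt((2j+1)/2) P_j."""
    vj = np.array([np.sqrt((2 * j + 1) / 2.0) * leg.legval(y, np.eye(N)[j])
                   for j in range(N)])
    lam = 1.0 / np.sqrt(np.sum(vj ** 2))
    return -lam * vj

b1 = representor_coeffs(0.3, 6)  # should have unit norm by condition (2.4)
```

The unit-norm condition (2.4) can be verified numerically: the Euclidean norm of the returned vector is 1 for any evaluation point $y$.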
A constrained version of the least squares problem is thus formulated as

$$p = \operatorname*{argmin}_{v \in E} \sum_{j=1}^M \left( \phi_j(v) - u_j \right)^2. \tag{2.6}$$

For example, if $\phi_j$ is a point evaluation (the Dirac mass) at some location $x_j \in \Omega$ for all $j = 1, \ldots, M$, then the problem above is equivalent to

$$p = \operatorname*{argmin}_{v \in E} \sum_{j=1}^M \left( v(x_j) - u(x_j) \right)^2.$$

As before, the unconstrained version of this problem has a unique solution if the matrix $A \in \mathbb{R}^{M \times N}$ with entries $(A)_{m,n} = \phi_m(v_n)$, $1 \leq n \leq N$, $1 \leq m \leq M$, has rank equal to $\dim V = N$; otherwise, infinitely many solutions exist. This least squares problem is well understood, and computational algorithms to solve it given data $\phi_j(u)$ are ubiquitous [12]. For an overdetermined system, where $M > N$, the method of ordinary least squares can be used to find an approximate solution. Details for this case are discussed in Section 2.5 after the discrete formulation of this problem.

2.5 Problem discretization

We now formulate the constrained problem (2.6) via coordinates in the basis $\{v_j\}_{j=1}^N$, which results in a discrete form amenable to numerical computation. Any $v \in V$ has the expansion

$$v(x) = \sum_{j=1}^N \hat{v}_j v_j(x),$$

and the expansion coefficient vector $\hat{v} := (\hat{v}_1, \ldots, \hat{v}_N)^T \in \mathbb{R}^N$ uniquely identifies the corresponding element $v \in V$. This identification defines subsets of $\mathbb{R}^N$ corresponding to the sets $E_k$:

$$C_k := \left\{ c \in \mathbb{R}^N \;\middle|\; \sum_{j=1}^N c_j v_j \in E_k \right\} \subset \mathbb{R}^N, \qquad C := \bigcap_{k=1}^K C_k. \tag{2.7}$$

Then, the optimization problem (2.6) is equivalent to

$$c = \operatorname*{argmin}_{\hat{v} \in C} \|A\hat{v} - b\|_2^2, \qquad b_j = \phi_j(u). \tag{2.8}$$

This problem is again a least squares problem and is therefore easily solved in principle, but unfortunately the set $C$ is in practice a quite complicated subset of $\mathbb{R}^N$. Nevertheless, $C$ is convex, which is a fact we exploit.

2.6 Geometry of sets

We recall some basic properties of cones and of convex sets and functions that we utilize. In all the discussion below, the ambient space is $\mathbb{R}^N$. A set $C$ is convex if, for every $x, y \in C$,

$$\lambda x + (1 - \lambda) y \in C \qquad \forall\, \lambda \in (0, 1).$$

A set $C$ is a convex cone if, for every $x, y \in C$,

$$a x + b y \in C \qquad \forall\, a, b \geq 0.$$
The set $C$ is an affine convex cone if it is the rigid translate of a convex cone, i.e., if $C = D + z$, where $z \in \mathbb{R}^N$ and $D$ is a convex cone. In this case, we call $z$ the vertex of the cone. The convex sets we consider are generated by an uncountably infinite number of supporting hyperplanes. Given $y \in \mathbb{R}^N$ and $a \in \mathbb{R}$, a hyperplane $H_0$ is a set given by

$$H_0 = \left\{ x \in \mathbb{R}^N \;\middle|\; \langle x, y \rangle = a \right\}.$$

The hyperplane $H_0$ separates $\mathbb{R}^N$ into two halfspaces, one of which is

$$H(y, a) := \left\{ x \in \mathbb{R}^N \;\middle|\; \langle x, y \rangle \leq a \right\}.$$

Note that $H(y, a)$ is a closed set in $\mathbb{R}^N$. A hyperplane $H_0(y, a)$ with an associated halfspace $H(y, a)$ is a supporting hyperplane for a closed convex set $C$ if $C \subset H(y, a)$ and $H_0(y, a) \cap \partial C \neq \emptyset$.

CHAPTER 3
SETUP

This chapter presents the setup for the filter application to function approximations and time-dependent PDEs, based on the mathematical background and notation from Chapter 2. The focus is on the following kinds of time-dependent PDEs: diffusion-reaction and advection. We use the FEM with method-of-lines discretizations to solve the PDEs using continuous Galerkin (cG) and discontinuous Galerkin (dG) formulations, respectively. However, the setup remains the same for any time-dependent PDE solved using a polynomial-based method. The filter behaves as a postprocessing step that preserves the desired structure at each timestep¹ and is agnostic to the numerical method used to obtain the solutions. Chapter 5 shows numerical examples using the same setup on element types such as triangles, quadrilaterals, hexahedra, tetrahedra, and prisms, and on meshes comprising these element types.

3.1 Filter setup

We formalize the filter by breaking it down into the components necessary for our approach. The focus of this section is solving the optimization problem (2.8).

3.1.1 Constrained optimization problem

The optimization problem defined by (2.8) appears simple since it features a quadratic objective, and the feasible set $C$ is convex (which we show in the next section).
The main difficulty is that $C$ is not a computationally simple convex set in $\mathbb{R}^N$, and hence computing, e.g., projections onto this set is difficult. To begin, we establish that the sets $C_k$ and $C$ are convex cones in $\mathbb{R}^N$. These properties will be used in the construction of algorithms for solving (2.8). Before proceeding, we note that each inequality function $r_k \in V$, $k = 1, \ldots, K$, can be translated into its vector of expansion coefficients:

$$r_k(x) = \sum_{j=1}^N \hat{r}_{k,j} v_j(x), \qquad \hat{r}_k = \left( \hat{r}_{k,1}, \ldots, \hat{r}_{k,N} \right)^T. \tag{3.1}$$

The definitions of $C$ and $C_k$ immediately yield convexity and conic properties of these sets.

Theorem 3.1.1. The set $C$ is a closed convex set in $\mathbb{R}^N$, and each $C_k$, $k = 1, \ldots, K$, is a closed, affine convex cone in $\mathbb{R}^N$ with vertex located at $\hat{r}_k$.

Proof. Convexity, closure, and conic structure are preserved under isometries. Due to the isometric relation between $V$ and $\mathbb{R}^N$, we can thus prove properties in one space, and they extend to the other space. We first show that $C_k$ is closed directly in $\mathbb{R}^N$. Rewriting (2.7) using the definition of $E_k$, we have

$$C_k = \bigcap_{y \in \omega_k} \left\{ c \in \mathbb{R}^N \;\middle|\; L_k\!\left( \sum_{j=1}^N c_j v_j,\; y \right) \leq L_k(r_k, y) \right\} =: \bigcap_{y \in \omega_k} c_k(y).$$

By definition, $c_k(y)$ is actually a halfspace in $\mathbb{R}^N$,

$$c_k(y) = H\!\left( \hat{\ell}_k(y),\; L_k(r_k, y) \right),$$

and hence $c_k(y)$ is a closed set. Therefore, $C_k = \cap_y\, c_k(y)$ is also a closed set, and thus $C = \cap_k\, C_k$ is a closed set. We now show the convexity and conic properties in $V$: fix $k \in \{1, \ldots, K\}$ and $y \in \omega_k$. Let $v, w \in V$ be two elements of $E_k$. For any $\lambda \in [0, 1]$,

$$L_k(\lambda v + (1 - \lambda) w,\; y) = \lambda L_k(v, y) + (1 - \lambda) L_k(w, y) \leq L_k(r_k, y),$$

where the inequality holds since $v, w \in E_k$. Therefore $E_k$, and hence $C_k$, is convex. Thus $C$ is also convex, since it is an intersection of convex sets.

¹ Within a defined tolerance. For example, in all our positivity-preservation examples, we consider the tolerance to be numerical zero, $10^{-7}$.
We next show that $E_k$ is a cone with vertex at $r_k$; i.e., we must show that for any $t \geq 0$ and $v \in E_k$, we have $L_k(r_k + t(v - r_k)) \leq L_k(r_k)$. This is true since

$$L_k(r_k + t(v - r_k)) = L_k(r_k) + t \left[ L_k(v) - L_k(r_k) \right] \leq L_k(r_k),$$

so indeed, $E_k$ is a convex cone with vertex at $r_k$, and hence $C_k$ is a convex cone with vertex at $\hat{r}_k$.

Despite their conic convexity, the sets $C_k$ are not polyhedral in general and are hence "complicated" to encode computationally. Consider the setup of Example 2.4.1. If we change the definition of $\omega_1$ to

$$\widetilde{\omega}_1 = \{x_1, \ldots, x_P\} \subset \Omega = [-1, 1]$$

for any arbitrary $P < \infty$, the new constraint set $\widetilde{C}_1 = C(L_1, r_1, \widetilde{\omega}_1)$ is strictly larger than the constraint set $C_1$ in Example 2.4.1. In particular, $p \in V$ satisfying $p(x_j) \geq 0$ for $j = 1, \ldots, P$ does not imply that $p(x) \geq 0$ for all $x \in [-1, 1]$ unless $V$ has very special properties (for example, if $V$ contains only certain piecewise constant functions). Note that the supporting hyperplanes of the constraint set $\widetilde{C}_1$ are $P < \infty$ halfspaces in $\mathbb{R}^N$, and hence $\widetilde{C}_1$ is polyhedral (if nonempty). However, if $V$ contains polynomials, it is easy to construct a polynomial that is non-negative on $\widetilde{\omega}_1$ but not non-negative on $\Omega$. Hence, the constraint set $C_1$ defined by $(L_1, r_1, \omega_1)$ in Example 2.4.1 is strictly smaller than $\widetilde{C}_1$, here defined by $(L_1, r_1, \widetilde{\omega}_1)$. Nevertheless, such discretization approaches, i.e., approaches that use a finite set $\widetilde{\omega}_1$ as a surrogate for the infinite set $\Omega$, are common and frequently effective for solving (2.8), as is commonly done in semi-infinite programming. However, in this thesis we present algorithms that insist on global satisfaction of the constraints, and hence we adopt alternative approaches. Thus, the main computational difficulty of our optimization problem is that the set $C$ cannot in general be exactly represented as a polyhedron, and in particular that projections onto $C$ are in general difficult to compute.
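The gap between pointwise and global non-negativity noted above is easy to exhibit numerically. The following small sketch (hypothetical coefficients, not from the dissertation) constructs a quadratic that is positive on 11 equispaced sample points yet dips negative on a tiny subinterval of $[-1, 1]$:

```python
import numpy as np
from numpy.polynomial import polynomial as poly

# p(x) = (x - 0.5)^2 - 0.001: negative only on a tiny neighborhood of x = 0.5
p = np.array([0.25 - 0.001, -1.0, 1.0])      # coefficients in the basis 1, x, x^2

samples = np.linspace(-1.0, 1.0, 11)         # finite surrogate omega~ with P = 11
vals_samples = poly.polyval(samples, p)      # all strictly positive
vals_dense = poly.polyval(np.linspace(-1.0, 1.0, 2001), p)  # attains p(0.5) < 0
```

The nearest sample points to the dip are $x = 0.4$ and $x = 0.6$, where $p = 0.009 > 0$, so the polynomial lies in the polyhedral set $\widetilde{C}_1$ while violating the true constraint set $C_1$.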
3.1.2 Solutions to the constrained optimization problem

Our main goal in this section is to demonstrate that our constrained optimization problem (2.8) has a unique solution. The result follows directly from the closed convexity of the constraint set and the strict convexity of the objective function.

Theorem 3.1.2. Assume that the design matrix $A$ has rank $N$ and the feasible set $C$ is nonempty. Then the constrained optimization problem (2.8) has a unique solution.

Proof. The first step is to observe that, since $A$ has full column rank, we can write the problem in transformed coordinates as a convex feasibility problem (specifically, as a projection problem). Let $A = U \Sigma V^*$ be the reduced singular value decomposition of $A$. Since $\mathrm{rank}(A) = N \leq M$, $\Sigma$ is $N \times N$, diagonal, and invertible; $V$ is $N \times N$ and orthogonal; and $U$ is $M \times N$ with orthonormal columns. With $P_W$ the $\mathbb{R}^N$-orthogonal projector onto a subspace $W$, and $R(A)$ the range of $A$, (2.8) can be written as

$$\operatorname*{argmin}_{\hat{v} \in C} \|A\hat{v} - b\|_2^2 = \operatorname*{argmin}_{\hat{v} \in C} \left( \left\| P_{R(A)^\perp}\, b \right\|_2^2 + \left\| A\hat{v} - P_{R(A)}\, b \right\|_2^2 \right) = \operatorname*{argmin}_{\hat{v} \in C} \left\| \Sigma V^* \hat{v} - U^* b \right\|_2^2 = V \Sigma^{-1} \operatorname*{argmin}_{z \in \Sigma V^* C} \left\| z - U^* b \right\|_2^2, \tag{3.2}$$

where $\Sigma V^* C := \left\{ \Sigma V^* y \;\middle|\; y \in C \right\}$. Thus, (2.8) has a unique solution if and only if

$$\operatorname*{argmin}_{z \in \Sigma V^* C} \|z - U^* b\|_2^2 \tag{3.3}$$

has a unique solution. Theorem 3.1.1 establishes that $C$ is closed and convex; thus $\Sigma V^* C$, the image of a closed convex set under an invertible linear transformation, is also closed and convex. Therefore, (3.3) seeks the $\ell^2(\mathbb{R}^N)$-closest point to $U^* b$ from a nonempty, closed, convex set, and the Hilbert projection theorem guarantees the existence and uniqueness of such a point.

The study of existence and uniqueness of approximations under convex constraints is not new [29, 36]. Indeed, our result is a corollary of these earlier results, but we have presented a brief proof above in order to be self-contained.
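The coordinate transformation (3.2) can be checked numerically in the unconstrained case $C = \mathbb{R}^N$, where the inner minimizer is simply $z = U^* b$ and the chain of equalities must recover the ordinary least squares solution. A minimal sketch (random data, assumptions noted in comments):

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 30, 6
A = rng.standard_normal((M, N))   # full column rank with probability 1
b = rng.standard_normal(M)

# reduced SVD A = U S V^T; with C = R^N the inner argmin in (3.2) is
# attained at z = U^T b, and V S^{-1} z is the least squares solution
U, s, Vt = np.linalg.svd(A, full_matrices=False)
z = U.T @ b
vhat = Vt.T @ (z / s)
```

With a nontrivial constraint set $C$, the inner problem instead becomes a projection of $U^* b$ onto $\Sigma V^* C$, which is what the algorithms of Chapter 4 address.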
The filtering approach is a map $F$ from a given function $u$ (which may or may not satisfy the structural constraints) in a finite-dimensional space $V$ to a unique function $F(u) \in V$ that does satisfy these constraints. We define a general class of constraints, corresponding to a feasible set in $V$ that is an affine convex cone, and show that $F(u)$ is the projection of $u$ onto this feasible set. Unfortunately, this cone is not a polytope, and convenient parameterizations of $V$ result in its representation as an intersection of an (uncountably) infinite number of halfspaces. This formulation and parameterization does not easily lend itself to existing algorithms, so we develop some novel algorithms based on seminal convex feasibility algorithms [41]. We now briefly discuss the types of constraints we consider and some algorithms to implement them. We first define additional notation to describe the algorithm. Suppose $V$ is an $N$-dimensional subspace of a Hilbert space $H$, with $\{\psi_j\}_{j=0}^{N-1}$ a collection of orthonormal basis functions,

$$V = \mathrm{span}\{\psi_0, \ldots, \psi_{N-1}\}, \qquad \langle \psi_i, \psi_j \rangle = \delta_{ij}, \quad i, j = 0, \ldots, N-1,$$

where $\langle \cdot, \cdot \rangle$ is the inner product on $V$ and $\delta_{ij}$ is the Kronecker delta. For a particular constraint $k \in \{1, \ldots, K\}$, we can represent $u \in V$ by its coordinates $\{\hat{v}_j\}_{j=0}^{N-1}$ collected in a vector $v \in \mathbb{R}^N$. Any $u \in V$ that does not satisfy the desired constraints can be represented as

$$u = \sum_{j=0}^{N-1} \tilde{v}_j \psi_j = \tilde{v}^T \psi,$$

where the expansion coefficients are collected in a vector $\tilde{v} \in \mathbb{R}^N$. The optimization problem (3.7) is therefore equivalent to

$$\operatorname*{argmin}_{v \in C} \|\tilde{v} - v\|_2, \tag{3.4}$$

where $v$ is the filtered version of $\tilde{v}$ and obeys the constraints, $\|\cdot\|_2$ is the Euclidean 2-norm on vectors, and $C$ is the affine conic region in $\mathbb{R}^N$ corresponding to $E \subset V$. Whereas $\psi_j$ here denotes an orthonormal basis function, use of any basis is possible in general, as long as a transformation to the orthonormal basis is performed prior to the application of the filter.
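The change of basis just mentioned is a small linear-algebra step. The following sketch (hypothetical coefficients; monomials standing in for a generic nonorthogonal basis) maps an expansion in $\{1, x, x^2\}$ to coordinates in the orthonormal Legendre basis $\{\psi_j\}$ via quadrature:

```python
import numpy as np
from numpy.polynomial import legendre as leg

N = 3
x, w = leg.leggauss(8)                                # exact for these integrands
psi = np.stack([np.sqrt((2 * j + 1) / 2.0) * leg.legval(x, np.eye(N)[j])
                for j in range(N)], axis=0)           # psi_j at quadrature nodes
mono = np.stack([x ** k for k in range(N)], axis=0)   # monomials at the nodes

# T_{jk} = <psi_j, x^k>: orthonormal coordinates of each monomial
T = psi @ (w[:, None] * mono.T)

c_mono = np.array([0.5, -1.0, 2.0])                   # u(x) = 0.5 - x + 2 x^2
v_tilde = T @ c_mono                                  # orthonormal coordinates of u
```

Since $u$ lies in the span of both bases, $\sum_j \tilde{v}_j \psi_j$ reproduces $u$ exactly, and the filter can operate on $\tilde{v}$ directly.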
Note that the filter operates on the expansion coefficients $\tilde{v}$ while ensuring that the filtered $v$ can be mapped back to the baseline set of basis functions, thus maintaining constraint satisfaction at all points of the domain. If $K$ contains a single family of the form (2.2), then the set $C$ can be written as

$$C = \bigcap_{x \in \Omega} H_x,$$

where the $H_x$ are halfspaces in $\mathbb{R}^N$. In other words, for fixed $x$, the boundary of $H_x$ is an $(N-1)$-dimensional planar surface in $\mathbb{R}^N$ defined by the single linear constraint (2.2). The algorithms in Chapter 4 proceed by computationally inspecting the signed distance function,

$$s(x) := \mathrm{sdist}(\tilde{v}, C_x) = \begin{cases} -\,\mathrm{dist}(\tilde{v}, H_x), & \tilde{v} \notin C_x, \\ +\,\mathrm{dist}(\tilde{v}, H_x), & \tilde{v} \in C_x. \end{cases} \tag{3.5}$$

Here, "inspection" means, for example, the ability to compute the global minimum of $s(x)$ and/or to determine regions where $s$ is negative. Based on this inspection, the algorithms project the current iterate $\tilde{v}$ onto $H_y$ for some $y \in \Omega$, or perform relaxed/averaged versions of these projections. Geometrically, the projection operation corresponds to projecting $\tilde{v}$ onto a supporting hyperplane $H_x$ of $C$, which we depict in Figure 3.1.

Figure 3.1: The steps in the procedure that find the distance between $\tilde{v}$ and the hyperplane boundary representing each violated constraint; subsequently, the algorithm greedily computes the correction to $\tilde{v}$. Left: a geometrical visual of the distance calculation from $\tilde{v}$ to the hyperplanes defining the boundaries of the violated constraints. Right: selection of $H_{y_4}$ over $H_{y_5}$, since it defines the hyperplane farther away than those of the other violated constraints, and projection of $\tilde{v}$ onto $H_{y_4}$.

The process is repeated until numerical convergence up to a tolerance (i.e., until the minimum value of the signed distance function is numerically 0). Thus, this algorithm is a type of generalized iterative cyclic/alternating projection algorithm applied to the case of an infinite number of convex sets (halfspaces).
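The greedy inspect-and-project loop just described can be sketched compactly. The toy below (assumed setup: positivity with $r \equiv 0$, orthonormal Legendre coordinates, and, purely for illustration, the domain sampled at a finite set `ys`; the dissertation's algorithms instead enforce the constraints globally) repeatedly projects the iterate onto the hyperplane of the most-violated constraint:

```python
import numpy as np
from numpy.polynomial import legendre as leg

def basis_vals(ys, N):
    """Orthonormal Legendre values v_j(y), stacked as rows j = 0, ..., N-1."""
    return np.array([np.sqrt((2 * j + 1) / 2.0) * leg.legval(ys, np.eye(N)[j])
                     for j in range(N)])

def positivity_filter(v_tilde, N, ys, tol=1e-8, max_iter=1000):
    """Greedy alternating-projection sketch: at each step, find the sampled
    point with the most negative value (most-violated halfspace) and project
    the iterate onto the corresponding hyperplane through the origin."""
    v = v_tilde.astype(float).copy()
    B = basis_vals(ys, N)                  # shape (N, len(ys))
    for _ in range(max_iter):
        vals = B.T @ v                     # p(y) at the candidate points
        i = int(np.argmin(vals))
        if vals[i] >= -tol:
            break                          # no sampled constraint is violated
        b = B[:, i]
        v = v - (vals[i] / (b @ b)) * b    # project onto the violated hyperplane
    return v

ys = np.linspace(-1.0, 1.0, 101)
v_tilde = np.array([0.2, 1.0])             # a linear polynomial with p(-1) < 0
v = positivity_filter(v_tilde, 2, ys)
```

Because every hyperplane here passes through the origin, each projection is nonexpansive toward the feasible cone, so the filtered coefficients never grow in norm, consistent with the norm-contractivity discussed later.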
The incorporation of different structures of interest into one optimization problem generalizes to an arbitrary number of linear constraints. However, increasing the number of structures reduces the degrees of freedom available to the filter accordingly. For example, positivity and mass-conservation structural constraints can be imposed simultaneously using the same filter design, with restrictions as detailed in the subsequent sections. The algorithm that solves the optimization problem (3.7) is the key ingredient in our approach to preserving structure in time-dependent PDE simulations.

3.2 PDE applications

We now discuss the application of the filter to time-dependent PDEs in multiple dimensions. We primarily focus on method-of-lines formulations with a Galerkin-type spatial discretization. The setup for filter application to preserve structure for PDEs uses the main algorithmic ideas from the previous section, which are a major ingredient of our approach. Let $\Omega \subset \mathbb{R}^d$ be a physical domain, let $V$ be a finite-dimensional Hilbert space of real-valued functions on $\Omega$ (for example, polynomials up to some fixed, finite degree), and suppose $u \in V$ is a given function. The approach in Section 2.1 considers families of constraints, each of the form

$$L_x(u) \leq \ell(x), \qquad x \in \Omega, \tag{3.6}$$

where $L_x$ is a linear operator that is bounded on $V$, and $\ell$ is a function on $\Omega$. The feasible set in $V$ adhering to such a constraint family is convex. Note that the framework in Section 2.1 allows a finite number of such families to be considered simultaneously, so that boundedness, e.g., enforcing $0 \leq u(x) \leq 1$, is also a valid constraint. The families of constraints correspond to a feasible set $K \subset V$ comprising the elements of $V$ that satisfy the constraints. We solve the convex feasibility problem

$$F(u) := \operatorname*{argmin}_{k \in K} \|u - k\|_V, \tag{3.7}$$

which is well posed. If $0 \in K$, then this optimization problem is norm-contractive (Proposition 1) and can therefore be interpreted as a filter.

Proposition 1.
Let $E \subset V$ be a nonempty, closed, convex set in $H$. Given some $v \in V$, let $\tilde{v}$ be the solution to (2.6) (i.e., also the solution to (2.8)). If $0 \in E$, then $\|\tilde{v}\| \leq \|v\|$.

Proof. Projections onto closed convex sets in Hilbert spaces are nonexpansive [16], i.e., $\|\tilde{v} - P(0)\| \leq \|v - 0\|$, where $P : V \to E$ is the projection operator from $V$ to $E$. Since $0 \in E$, we have $P(0) = 0$.

3.2.1 Problem discretization

In this section, we construct a discretized problem using 1D as an example for simplicity. However, the formulation does not change for any number of dimensions. Consider an advection-diffusion-reaction system in 1D defined as

$$u_t(x, t) + a \cdot u_x(x, t) = D\, u_{xx}(x, t) + r(u(x, t)), \tag{3.8}$$

where $x \in \Omega$, $a$ is the advection velocity, $D > 0$ is the diffusivity coefficient, and $r$ is a nonlinear function representing the reaction term. For simplicity, assume $a$ and $D$ are constant. Assuming that an appropriate initial condition $u_0(x) = u(x, t = 0)$ and boundary conditions are defined for (3.8), we can formulate its semidiscrete form as follows. Galerkin-type methods assume an ansatz for $u$ as a time-varying element of a fixed $N$-dimensional linear subspace $V$, where frequently $V \subset L^2(\Omega)$:

$$u(x, t) \approx u_N(x, t) := \sum_{i=0}^{N-1} \tilde{v}_i(t)\, \phi_i(x), \qquad V = \mathrm{span}\{\phi_0, \ldots, \phi_{N-1}\}. \tag{3.9}$$

Note that the basis functions $\phi_i$ used in (3.9) represent traditional FEM basis functions that span all of $\Omega$ and are not necessarily orthogonal. We denote the orthonormal basis functions by $\psi_j$. We assume that both basis sets span the same space, so that a transformation between them exists. For example, FEMs partition $\Omega$ into $E$ nonoverlapping subintervals $\{e_0, \cdots, e_{E-1}\}$. As a next step, the method assumes that the continuous functions in $V$ are polynomials of a fixed degree $N$ on each element $e_i \subset \Omega$, $i = 0, \cdots, E-1$, with discontinuities in derivatives allowed only at partition boundaries. dG FEMs define $V$ similarly, except that functions are also allowed to be discontinuous at partition boundaries.
The semidiscrete form of (3.8) is derived in the standard Galerkin way, by using the ansatz (3.9) and forcing the residual to be $L^2$-orthogonal to $V$. Usually, integration by parts is performed in the residual-orthogonalization step, and often, depending on the equation and spatial discretization, a numerical flux and/or stabilization terms are included in the resulting weak formulation. The result is a system of ordinary differential equations prescribing the time evolution of the discrete degrees of freedom represented by the vector $\tilde{v} = (\tilde{v}_0, \ldots, \tilde{v}_{N-1})$:

$$M \frac{\partial \tilde{v}}{\partial t} + A \tilde{v} = -D L \tilde{v} + r + F(\tilde{v}), \tag{3.10}$$

where $M$, $A$, and $L$ are the $N \times N$ mass, advection, and Laplacian (stiffness) matrices, respectively, defined as

$$(M)_{i,j} = \langle \phi_i, \phi_j \rangle, \qquad (A)_{i,j} = \left\langle \phi_i,\; a \frac{\partial \phi_j}{\partial x} \right\rangle, \qquad (L)_{i,j} = \left\langle \frac{\partial \phi_i}{\partial x}, \frac{\partial \phi_j}{\partial x} \right\rangle.$$

The $N$-vector $r$ has entries that are numerical approximations to the integral of the nonlinear reaction term,

$$\hat{r}_i(\tilde{v}, t) \approx \int_\Omega r(u_N, t)\, \phi_i(x)\, dx,$$

which we compute with a collocation-based approach. Finally, the term $F(\tilde{v})$ is a generic term for any numerical fluxes or stabilization terms. For example, in an advection-dominated problem with a dG formulation, $F$ might be the upwind flux corresponding to the continuous advection term $a u_x$. A fully discrete scheme is then derived from (3.10) using appropriate time discretizations. Let $v^n$ represent the solution of (3.10) at timestep $n$, and call $\tilde{v}^{n+1}$ the solution at timestep $n+1$. For simplicity, assume that the solutions $v^n$ and $\tilde{v}^{n+1}$ are transformed into the orthonormal basis $\psi_j$ and back to the original basis $\phi_i$ inside the filter as the first and last steps. This discrete scheme does not enforce the structural properties that we desire. As discussed in Section 2.1, given a constraint set $C \subset \mathbb{R}^N$, if $v^n \in C$, then it is not necessarily true that $\tilde{v}^{n+1} \in C$. To rectify this situation, we employ the optimization outlined in Section 2.1 as a postprocessing step.
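The matrix definitions above can be realized by quadrature. The following is an illustrative single-element assembly sketch (not the dissertation's implementation; a modal Legendre basis $\phi_i = P_i$ on the reference element $[-1,1]$ is assumed for concreteness):

```python
import numpy as np
from numpy.polynomial import legendre as leg

n = 4                                      # local polynomial modes
a = 1.0                                    # constant advection speed (assumed)
x, w = leg.leggauss(12)                    # 12-point Gauss-Legendre rule

# basis values and derivatives at the quadrature nodes
P = np.stack([leg.legval(x, np.eye(n)[i]) for i in range(n)], axis=1)
dP = np.stack([leg.legval(x, leg.legder(np.eye(n)[i])) for i in range(n)], axis=1)

M_mat = P.T @ (w[:, None] * P)             # (M)_{ij} = <phi_i, phi_j>
A_mat = P.T @ (w[:, None] * (a * dP))      # (A)_{ij} = <phi_i, a dphi_j/dx>
L_mat = dP.T @ (w[:, None] * dP)           # (L)_{ij} = <dphi_i/dx, dphi_j/dx>
```

With this modal choice the mass matrix is diagonal with entries $2/(2i+1)$, which is one reason orthogonal bases are convenient inside the filter.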
We hereafter refer to this optimization as a nonlinear filter, due to its norm-contractivity (Proposition 1) for the types of constraints we consider. Thus, our proposed procedure is a simple, nonintrusive augmentation of the standard fully discrete scheme (3.10):

$$v^n \;\xrightarrow{\;\text{Timestepper for (3.10)}\;}\; \tilde{v}^{n+1} \;\xrightarrow{\;\text{Filter}\;}\; v^{n+1}. \tag{3.11}$$

Thus, we obtain the filtered solution $v^{n+1} = \sum_{j=0}^{N-1} \hat{v}_j(t)\, \phi_j(x)$ and introduce it as the state in the next step of the fully discrete scheme. The effect of the filter for an example constraint, positivity preservation for $\tilde{v}^n$, is demonstrated in Figure 3.2, which shows a snapshot in time of an advecting wave. The property of interest here is positivity, and in this particular example a dG formulation is used as the PDE solver. The process of enforcing positivity (see inset) leads to changes in solution properties; in particular, elementwise boundary values and the mean of the discrete solution are not preserved. A change in element boundary values also changes the corresponding (say, upwind) fluxes at the next timestep. Hence, in this particular case, the filter changes physical behavior (numerical fluxes and mean value). We will assume hereafter that numerical fluxes are computed as explicit functions of left- and right-boundary values.

Figure 3.2: Demonstration of filter application to the advection-diffusion-reaction (ADR) problem using a dG formulation, with and without application of the positivity-constraining filter. An IMEX method (CNAB-2) has been used to implement the solution, with RK-4 timestepping. Shown are the exact solution, the unconstrained solution (i.e., the solution via the scheme (3.10)), and the constrained solution (the solution via (3.11)).

Hence, enforcing element boundary value conservation
for a dG solver becomes necessary to ensure that numerical fluxes at future times are unchanged by this procedure. One remedy for this particular issue is to impose additional equality constraints (i.e., on function values at element boundaries) in the filter. The preservation of the mean value (e.g., total mass) can also be enforced in a similar way. Later in this section we elaborate on the construction of the filter for these additional constraints. Note that, between cG and dG discretizations, element boundary value constraints are relevant (for the purpose of conserving numerical fluxes) only for dG-type discretizations.

3.2.2 Degrees of freedom

In order to describe our enforcement of equality constraints, we need to introduce some new notation that is specific to the physical discretization. Since dG discretizations have degrees of freedom that are decoupled across elements, the $N$ degrees of freedom in the optimization described in Section 2.1 need only range over element-local degrees of freedom. As an example, the relationship between the polynomial order of discretization in 1D ($N$) and the degrees of freedom for a quadrilateral is

$$\text{Degrees of freedom of the filter} = N^2. \tag{3.12}$$

In contrast, standard degrees of freedom in cG simulations (i.e., coefficients of "hat" and "bubble" functions) are coupled in $L^2$ across elements.

3.2.3 Mass conservation

A similar approach for mass (integral) conservation can be used for both cG and dG discretizations, via a procedure that is essentially identical to the one described in the previous section. For example, in a cG discretization with orthonormal coordinates $\tilde{w}$, we might wish to perform the optimization (3.4) for $w$ subject to the equality constraint

$$\int_\Omega \sum_{i=0}^{N-1} w_i\, \psi_i(x)\, dx = M := \int_\Omega \sum_{i=0}^{N-1} \tilde{w}_i\, \psi_i(x)\, dx,$$

where we recall that $\{\psi_i\}_{i=0}^{N-1}$ are the orthonormal basis functions.
This is equivalent to the equality constraint

$$q^T w = M, \qquad q_j := \int_\Omega \psi_j(x)\, dx,$$

and therefore the procedure of Section 3.3.1 can be leveraged (with only one equality constraint and an $N$-dimensional state vector). This process leads to global conservation of mass. In many cases, local mass conservation and constraint satisfaction across individual elements are desirable. This type of problem can be formulated by careful reconstruction of the constraints, without any changes to the algorithm. The constraint on the entire domain can be broken down per element, resulting in $E$ constraints, where $E$ is the number of elements in the domain; the constraint set then looks like the one described in (3.15). The only difference in the problem formulation is the definition of the constraint set, which becomes a matrix of size $N \times E$, with each column representing a mass-conservation condition for one element. Since we use a linear programming solver in the algorithm, the complexity of the solver grows proportionally with the number of constraints; the complexity can be estimated by following the constraint-set formulation and its properties in Theorem 3.1.1. For dG discretizations, the same idea can be applied, except that straightforward mass conservation couples all elements. To retain the elementwise-decoupling efficiency, we impose the stronger condition that the mass per element remain unchanged. This condition allows us to use the same procedure as in Section 3.3.1, performing $E$ size-$P$ optimizations. In particular, we can simultaneously enforce both element boundary preservation and elementwise mass conservation with three equality constraints. Note that these procedures generalize to arbitrary linear constraints. In particular, if we have $K$ linear constraints, then the dimension of the optimization problem can be reduced to a $(P - K)$-dimensional problem using the procedure described in Section 3.3.1.
This reduction, in turn, means that the optimization is meaningful only if the original size of the problem is greater than $K$. For example, for dG simulations we can accommodate only $P - 1$ linear constraints per element if there are $P$ degrees of freedom per element.

3.3 Setup for dG experiments

With a partition consisting of $E$ subintervals of $\Omega$, a dG discretization allows us to convert the optimization (3.4) in an $N$-dimensional space into a much more efficient and parallelizable set of $E$ independent optimizations, each in $P := N/E \ll N$ dimensions. First, we recognize that a dG expansion of the form (3.9) can be written as
$$u_N(x,t) = \sum_{e=1}^{E} \sum_{k=0}^{P-1} \tilde{v}_{e,k}\, \psi_{e,k}(x), \qquad N = EP,$$
where $\{\psi_{e,k}(\cdot)\}_{k=0}^{P-1}$ for a fixed $e$ are polynomials whose support is only on element $e$. In particular, these element-local polynomials can be chosen as $L^2(\Omega)$-orthonormal (e.g., mapped Legendre) polynomials. Therefore,
$$\left\langle \psi_{e,k}, \psi_{f,\ell} \right\rangle = 0, \qquad e \neq f, \quad k, \ell \in \{0, \ldots, n-1\}. \tag{3.13a}$$
The second observation we make is that the linear constraint operator $L_x$ defined in (3.6) also obeys this type of decoupling for the common examples we consider. If $\{\Omega_e\}_{e=1}^{E}$ is the partition of $\Omega$, then we assume that
$$L_x\left( \psi_{e,k}(x) \right) = 0, \qquad x \notin \Omega_e. \tag{3.13b}$$
This assumption is satisfied for point and derivative evaluation, i.e., for $L_x(u) = u(x)$ and $L_x(u) = u'(x)$. Under conditions (3.13), both the constraint and the $L^2$ norm decouple across elements, and so we have the following result:

Proposition 2. Assume that conditions (3.13) hold. Given any $\tilde{v} \in \mathbb{R}^N$, partition it into $E$ $n$-dimensional subvectors, each containing element-local degrees of freedom:
$$\tilde{v} = \begin{pmatrix} \tilde{v}_1 \\ \tilde{v}_2 \\ \vdots \\ \tilde{v}_E \end{pmatrix}, \qquad \tilde{v}_e = \begin{pmatrix} \tilde{v}_{e,0} \\ \vdots \\ \tilde{v}_{e,n-1} \end{pmatrix}. \tag{3.14}$$
Then the solution to (3.4) is given by
$$\operatorname*{argmin}_{v \in C} \|v - \tilde{v}\|^2 = \begin{pmatrix} \operatorname*{argmin}_{w \in C^1} \|w - \tilde{v}_1\|^2 \\ \operatorname*{argmin}_{w \in C^2} \|w - \tilde{v}_2\|^2 \\ \vdots \\ \operatorname*{argmin}_{w \in C^E} \|w - \tilde{v}_E\|^2 \end{pmatrix},$$
where the $n$-dimensional sets $C^e$ are defined as
$$C^e := \left\{ w = (w_0, \ldots, w_{n-1})^T \;\middle|\; \sum_{k=0}^{n-1} w_k\, \psi_{e,k}(x) \geq r(x) \;\; \forall\, x \in \Omega_e \right\}.$$
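The structure of Proposition 2 can be sketched in a few lines of Python. The following is a minimal illustration, not the dissertation's implementation: the global projection splits into $E$ independent block projections, with a hypothetical local projection onto the nonnegative orthant (nonnegativity of nodal values) standing in for the true per-element optimization over $C^e$.

```python
import numpy as np

def project_global(v_tilde, proj_local, E):
    """Elementwise-decoupled projection (cf. Proposition 2): split the global
    length-N coefficient vector into E element-local blocks and project each
    block independently."""
    blocks = np.split(np.asarray(v_tilde, dtype=float), E)
    return np.concatenate([proj_local(b) for b in blocks])

# Hypothetical local projection: Euclidean projection onto the nonnegative
# orthant, a stand-in for the per-element constrained optimization.
proj_local = lambda b: np.maximum(b, 0.0)

v_tilde = np.array([-1.0, 2.0, 0.5, -0.25, 3.0, -4.0])  # N = 6, E = 3
filtered = project_global(v_tilde, proj_local, E=3)
```

Because the blocks are independent, the $E$ local projections can be dispatched in parallel, which is precisely the efficiency gain the proposition formalizes.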
We emphasize again that dG discretizations satisfy (3.13), so Proposition 2 allows us to conclude that our optimization (for example, constraining positivity and/or monotonicity) can be decoupled to operate on individual elements. The decoupling can result in substantial computational savings, since one $N$-dimensional optimization is frequently much more expensive than $E$ optimizations of dimension $N/E$.

Remark 1. The elementwise decoupling conclusion of Proposition 2 holds under slightly more general conditions. For example, in so-called p-adaptive simulations, the number of local degrees of freedom may depend on the element index, i.e., $P$ equals the order of the polynomial used on that element. However, we do not pursue p-adaptive simulations in this work, so this level of generality is not needed.

3.3.1 Elementwise boundary value conservation

The goal in this section is to enforce element boundary constraints in order to address the changes in physical behavior (numerical fluxes and mean value) caused by filter application to dG solutions. For simplicity of exposition, we assume in what follows that $\{\psi_{e,k}\}_{k=0}^{n-1}$ are $L^2$-orthonormal; the procedure we describe can be extended to the case when this assumption is not satisfied. With $\{\Omega_e\}_{e=1}^{E}$ the subinterval partition of $\Omega$, we seek to solve the equality-constrained optimization problem,
$$\min_{v \in C} \|v - \tilde{v}\|^2 \quad \text{such that} \quad \sum_{k=0}^{n-1} \tilde{v}_{e,k}\, \psi_{e,k}(x_e^\pm) = \sum_{k=0}^{n-1} v_{e,k}\, \psi_{e,k}(x_e^\pm) \quad \forall\, e \in \{1, \ldots, E\}, \tag{3.15}$$
where $\tilde{v}_{e,k}$ and $v_{e,k}$ are components of the elementwise subvector partitions of $\tilde{v}$ and $v$, respectively, introduced in (3.14). The point values $x_e^\pm$ are the left- and right-hand endpoints of subinterval $e$, $\Omega_e = [x_e^-, x_e^+]$. As described in Proposition 2, the (non-equality-constrained) dG optimization problem can be decoupled into elementwise operations.
Adding in the elementwise constraints described above does not change this decoupling property (since the boundary values on element $e$ are independent of the degrees of freedom on any other element). The resulting optimization on element $e$ has the form
$$\min_{w \in C^e} \|w - \tilde{v}_e\|^2 \quad \text{such that} \quad Q^T (w - \tilde{v}_e) = 0, \tag{3.16}$$
where $Q$ is an $n \times 2$ matrix with orthonormal columns $q_1, q_2$ satisfying
$$\operatorname{span}\{q_1, q_2\} = \operatorname{span}\left\{ \psi_e(x_e^-),\, \psi_e(x_e^+) \right\}, \qquad \psi_e(x) := \left( \psi_{e,0}(x), \cdots, \psi_{e,n-1}(x) \right)^T.$$
In order to solve this $n$-dimensional linear-equality-constrained optimization problem, we reduce it to an $(n-2)$-dimensional optimization problem of the standard form (3.4) by working in the $n-2$ coordinates corresponding to $R(Q)^\perp$. To set up this procedure, let $\{p_1, p_2, \ldots, p_{n-2}\}$ be an(y) orthonormal completion of $\{q_1, q_2\}$ in $\mathbb{R}^n$, and introduce
$$P := \left( p_1 \;\; p_2 \;\; \cdots \;\; p_{n-2} \right) \in \mathbb{R}^{n \times (n-2)}.$$
Then, our equality-constrained optimization problem (3.16) is equivalent to the non-equality-constrained problem
$$\min_{z \in \widehat{C}} \left\| z - P^T \tilde{v}_e \right\|^2, \tag{3.17}$$
where $\widehat{C}$ is the $(n-2)$-dimensional set,
$$\widehat{C} := \left\{ z \;\middle|\; \sum_{j=0}^{n-3} z_j\, \widehat{\psi}_j(x) \geq \widehat{\ell}(x) \right\},$$
where
$$\widehat{\ell}(x) := \ell(x) - \psi_e(x)^T Q Q^T \tilde{v}_e,$$
and $\{\widehat{\psi}_j\}_{j=0}^{n-3}$ are the $P$-projected functions,
$$\left( \widehat{\psi}_0(x) \;\; \cdots \;\; \widehat{\psi}_{n-3}(x) \right) := \psi_e(x)^T P.$$
Therefore, the reduced problem (3.17) is precisely a special case of our problem (3.4), and all the same tools described in Section 2.1 are applicable. The signed distance function in (4.10) requires knowledge only of the appropriate $n-2$ projected orthonormal functions. The precise formula is
$$s(x) = \lambda(x)\left( L_x(z) - \widehat{\ell}(x) \right), \qquad \lambda^2(x) = \frac{1}{\sum_{j=0}^{n-3} \left( L_x(\widehat{\psi}_j(x)) \right)^2},$$
with $z = \sum_{j=0}^{n-3} z_j\, \widehat{\psi}_j(x)$. Once this solution, say $z^*$, is computed, the solution to (3.15) is the $n$-dimensional vector $Q Q^T \tilde{v}_e + P z^*$.

To summarize, we wish to solve the $N$-dimensional problem (3.15), which simultaneously enforces inequality constraints (such as positivity) and preserves element boundary values (which we take as a proxy for preservation of numerical fluxes).
This problem is equivalent to solving (3.16) for every element index $e$. For a fixed $e$, this latter problem is in turn equivalent to the $(n-2)$-dimensional problem (3.17), which can be solved with the techniques outlined in Section 2.1. Note in particular that all the matrices involved in this section can be precomputed and stored for use in online time-dependent simulations. Furthermore, if all physical elements are templated on a standard (reference) element, as is common in FEM codes, then only one copy of these size-$n$ matrices is required for the entire simulation.

3.4 Setup for cG experiments

In contrast to the dG framework, in cG discretizations typically both the constraint operator (e.g., the point evaluation operator) and the $L^2$ norm are coupled across elements. Thus, there is no straightforward decoupling procedure that can be leveraged. Instead, the full optimization problem (3.4) in $N$-dimensional space must be solved. An additional difficulty with this optimization in cG simulations is that the discussion and algorithms presented earlier essentially require an expansion in orthonormal basis functions, i.e., (3.9) must be expressed as
$$u_N(x,t) = \sum_{j=0}^{N-1} \tilde{v}_j(t)\, \phi_j(x) = \sum_{j=0}^{N-1} \tilde{w}_j(t)\, \psi_j(x), \qquad \left\langle \psi_j, \psi_k \right\rangle = \delta_{k,j}. \tag{3.18}$$
Typically, cG simulations utilize nonorthonormal hat and bubble functions as degrees of freedom [24], and therefore the use of this optimizer requires transforming the hat-bubble coordinates into an orthonormal set of coordinates. To accomplish this, (a Cholesky factor of) the mass matrix $M$ must be inverted. Typically $M$ is sparse, but its inverse Cholesky factor is usually not, which poses a challenge for large-$N$ simulations when size-$N$ dense matrix linear algebra is computationally infeasible. In this dissertation, we consider cG simulations only in one spatial dimension, where $N$ is small enough that direct inversion of $M$ is feasible.
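The change of coordinates just described can be sketched as follows. This is an illustrative sketch with a small hypothetical SPD mass matrix, not the dissertation's code: with $M = L L^T$ and $\psi = L^{-1}\phi$, the coefficients transform as $\tilde{w} = L^T \tilde{v}$, and the $L^2$ norm becomes the Euclidean norm of $\tilde{w}$.

```python
import numpy as np

def orthonormal_coords(M, v):
    """Map coefficients v in a nonorthonormal basis phi (Gram/mass matrix M)
    to coefficients w in an L2-orthonormal basis psi = L^{-1} phi, where
    M = L L^T is a Cholesky factorization; then u = v^T phi = (L^T v)^T psi."""
    L = np.linalg.cholesky(M)
    return L.T @ v

# Hypothetical 3x3 SPD mass matrix (e.g., from overlapping hat functions).
M = np.array([[2.0, 0.5, 0.0],
              [0.5, 2.0, 0.5],
              [0.0, 0.5, 2.0]])
v = np.array([1.0, -2.0, 0.5])
w = orthonormal_coords(M, v)
# Norm preservation: v^T M v == w^T w, i.e., the L2 norm of u is unchanged.
```

The dense inverse Cholesky factor enters when mapping back, which is exactly the cost obstruction described above for large $N$.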
However, more sophisticated procedures would be needed to extend this approach to two or three spatial dimensions.

CHAPTER 4
ALGORITHM

This section describes some algorithmic considerations of our approach. We emphasize that although in previous sections we have described details differently for cG versus dG discretizations, the fundamental tool, i.e., the algorithms in Section 2.1, is identical. The differences manifest only when we need to fit these types of discretizations into the general framework of that section.

4.1 Algorithmic ingredients

This section will also highlight some algorithmic differences between the cG and dG discretizations that surface in the use of our optimization filter. Note that, regardless of the discretization used, this filter preserves $L^2$ stability properties since it is norm-contractive. For example, if a CFL condition for $L^2$ stability of an unconstrained solver is used to determine a stable timestep value, then a filtered version of this solver will not require a change in timestep value to maintain this stability property.

4.1.1 Convex feasibility

We now concentrate on solving the problem defined by (2.8), equivalently (3.3). To simplify the presentation, we will first assume that $A = I$, so that both (3.3) and (2.8) reduce to
$$\operatorname*{argmin}_{c \in C} \|c - b\|_2^2, \tag{4.1}$$
i.e., a standard problem of projecting $b$ onto a convex set $C$. The main bottleneck to applying standard optimization tools is that the feasible set $C$ is not easily defined in terms of a finite number of conditions on $c$. The difficulty in our problem is not in minimizing the objective function, but rather in the convex feasibility problem, i.e., identifying points in the convex feasible set. Some of the most successful algorithms for solving the convex feasibility problem are alternating- or splitting-type algorithms. If $C_1, \ldots$
, $C_r$ are convex sets with nonempty intersection $C$, these algorithms assume that projection onto any one of these sets is computationally feasible. A solution to (4.1) can then be computed by alternating these individual projections. The original projection-onto-convex-sets iteration is due to von Neumann [41], and much work has proceeded from it [4, 13, 19, 20, 22, 28]. When $r > 2$, the alternating algorithm becomes a cyclic one, and these cyclic projection algorithms have substantial theoretical underpinning, including convergence guarantees. The difficulty in applying these algorithms to our situation is that they characterize the feasible region with a finite number of convex sets. Although our collection of sets $\{C_j\}_{j=1}^{K}$ is finite, we do not know how to project onto any of them individually. However, we have
$$C = \bigcap_{k=1}^{K} C_k = \bigcap_{k=1}^{K} \bigcap_{y \in \omega_k} H_k(y), \qquad H_k(y) := H\left( \ell_k(y),\, L_k(r_k, y) \right), \tag{4.2}$$
so that $C$ comprises an (in general uncountably) infinite intersection of halfspaces, each of which is straightforward to project onto; see Figure 4.1 for a geometric visual. Our strategy here is to generalize certain types of cyclic/alternating algorithms to the case of an infinite number of convex sets (halfspaces). We broadly employ two strategies: greedy projection and averaged projection.

Figure 4.1: A geometric depiction of one iteration of the filtering procedure. Left: The hatched volume represents the closed convex cone $C_1$. Middle: Geometric depiction of intersecting halfspaces $H_1(y)$ and their respective boundaries defined by hyperplanes parameterized by $y \in \Omega$. Also shown is the distance calculation corresponding to (4.10). Right: A scenario that demonstrates the greedy strategy to select the direction in which $y$ moves in the next step of the algorithm: $H_1(y_4)$ is farther away from $c$ than $H_1(y_5)$. The optimization (4.5) seeks the hyperplane that is farthest away from $c$.
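The elementary operation underlying both strategies, projection onto a single halfspace, can be sketched as follows. This is a generic illustration with a unit-norm normal vector, not tied to any particular constraint family of the dissertation.

```python
import numpy as np

def sdist(c, ell_hat, r):
    """Signed distance from c to the halfspace {x : <ell_hat, x> <= r},
    where ell_hat has unit Euclidean norm; positive values mean c is feasible."""
    return r - ell_hat @ c

def project_halfspace(c, ell_hat, r):
    """Nearest-point projection onto the halfspace: c moves along ell_hat
    only when the constraint is violated."""
    return c + ell_hat * min(0.0, sdist(c, ell_hat, r))

ell_hat = np.array([1.0, 0.0])                            # halfspace x_1 <= 0
c_out = project_halfspace(np.array([2.0, 1.0]), ell_hat, 0.0)   # violated
c_in = project_halfspace(np.array([-1.0, 3.0]), ell_hat, 0.0)   # feasible
```

A feasible point is left unchanged; a violating point is moved the minimal Euclidean distance onto the bounding hyperplane.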
The major ingredient in our approaches is the ability to project onto any halfspace $H_k(y)$. Since the functionals $L_k(\cdot, y)$ are unit norm, a computation shows that the signed distance between some point $c \in \mathbb{R}^N$ and $H_k(y)$ is
$$\operatorname{sdist}(c, H_k(y)) = L_k(r_k, y) - \left\langle \widehat{\ell}_k(y), c \right\rangle, \tag{4.3}$$
which is positive if $c \in H_k(y)$ and negative otherwise. Thus, the nearest-distance projection of $c$ onto $H_k(y)$ is
$$P_{H_k(y)}\, c = c + \widehat{\ell}_k(y) \min\left\{ 0,\, \operatorname{sdist}(c, H_k(y)) \right\}.$$
We consider an example to illustrate that these projections are easily computable.

Example 4.1.1. Consider the positivity constraint setup of Example 2.4.1. The constraint functional $L_1(\cdot, y)$ is a (normalized, negative) point evaluation at $y$, and $\{v_n\}_{n=1}^{N}$ are the first $N$ orthonormal Legendre polynomials on $[-1,1]$. Then, the Riesz representor $\ell_1(y) \in V$ and its coordinates $\{\widehat{\ell}_{1,j}(y)\}_{j=1}^{N}$ are explicit in terms of the Legendre polynomials via (2.5). In the context of harmonic analysis, $\ell_1(y)$ is the $y$-centered, negative, normalized Dirichlet kernel for $V$. The function $r_1$ describing the constraint is $r_1 \equiv 0$, so that $\widehat{r}_1 = 0$ and $L_1(r_1, y) = 0$. Now let $v \in V$ be any element with coordinates $c \in \mathbb{R}^N$ in the orthonormal Legendre polynomials. Then,
$$\operatorname{sdist}(c, H_1(y)) = -\left\langle \widehat{\ell}_1(y), c \right\rangle = \lambda(y)\, v(y). \tag{4.4}$$
Thus, the signed distance at $y \in \Omega$ is simply a scaled evaluation of the original function $v$. The projection of $c$ onto the halfspace defined by $H_k(y)$ is therefore
$$P_{H_k(y)}\, c = c + \widehat{\ell}_1(y) \min\left\{ 0,\, v(y)\, \lambda(y) \right\}.$$
Note that since $\lambda(y) > 0$, this projection equals $c$ if $v(y) \geq 0$, as expected.

4.1.2 Greedy projections

Since projections onto individual halfspaces defined by $H_k(y)$ are relatively simple to compute, we can devise one algorithm for computing the solution to (4.1) as a modification of cyclic projections. Although cyclic projection-type algorithms proceed by cycling through the enumerable constraint sets, our (uncountably) infinite collection of sets prevents such a simple cycling.
Instead, we can project onto the farthest, or most violated, constraint: with
$$(y^*, k^*) := \operatorname*{argmin}_{y \in \omega_k,\; k \in [K]} \operatorname{sdist}(c, H_k(y)), \tag{4.5}$$
we can update $c$ via
$$c \;\leftarrow\; c + \widehat{\ell}_{k^*}(y^*) \min\left\{ 0,\, \operatorname{sdist}(c, H_{k^*}(y^*)) \right\}, \tag{4.6}$$
where $\widehat{\ell}_{k^*}(y^*)$, readily computable from the orthonormal basis, is the normal vector of the hyperplane bounding $H_{k^*}(y^*)$ that points toward $C_{k^*}$. This procedure is repeated until $\operatorname{sdist}(c, H_{k^*}(y^*))$ vanishes to within a numerical tolerance or another termination criterion is triggered. Upon termination of the iterations, the output of the filtering procedure is the (updated) $c$, the set of coefficients that satisfies the constraints. The geometric picture associated with (4.5) and the greedy selection of the hyperplane $H$ is shown in the right panel of Figure 4.1. The update process (4.6) can be repeated, resulting in an iterative algorithm. We summarize this procedure in Algorithm 1. This algorithm proceeds by iteratively "correcting" the vector $c$ in (4.6). The associated operation in the function space $V$ is that an unconstrained function is additively augmented by the Riesz representor correction function $\ell_{k^*}(y^*) \in V$. These corrections are visualized in Figure 1.2 for polynomials. A more detailed understanding of these functions is provided in Figures 4.2 and 4.3, where we show $\ell_k(y)(x)$ as a function of $(x, y)$ for polynomials.

Algorithm 1 Iterative greedy projection algorithm to compute the solution to (4.1). The unspecified "extra termination criteria" can be standard metrics, such as the number of iterations, improvement in the objective function, etc.
1: Input: constraints $(L_k, r_k, \omega_k)_{k=1}^{K}$
2: Input: coordinates $c \in \mathbb{R}^N$ of a function $v \in V$
3: while True do
4:   Compute $(y^*, k^*)$ via (4.5).
5:   if $\operatorname{sdist}(c, H_{k^*}(y^*)) \geq 0$ or extra termination criteria triggered then
6:     Break
7:   end if
8:   Update $c$ via (4.6).
9: end while
10: return $c$

Figure 4.2: Correction functions for degree-5 polynomial approximation. Plots of $\ell_k(y)(x)$ are shown as functions of $(x, y)$ for various constraints enforcing positivity of the $k$th derivative (rows) and ambient Hilbert spaces (columns). Top: $k = 0$, positivity; middle: $k = 1$, monotonicity; bottom: $k = 2$, convexity. Left: $L^2([-1,1])$; middle: $H^1([-1,1])$; right: $H^2([-1,1])$.

Figure 4.3: Correction functions for degree-30 polynomial approximation. Plots of $\ell_k(y)(x)$ are shown as functions of $(x, y)$ for various constraints enforcing positivity of the $k$th derivative (rows) and ambient Hilbert spaces (columns). Top: $k = 0$, positivity; middle: $k = 1$, monotonicity; bottom: $k = 2$, convexity. Left: $L^2([-1,1])$; middle: $H^1([-1,1])$; right: $H^2([-1,1])$.

A significant portion of the work in the iterative correction stage (step 8 in Algorithm 1) consists of evaluating the solution at the critical points in the domain. This evaluation is done by basis interpolation, an expensive operation that involves processing and storage of large interpolation matrices. To make this processing computationally efficient, we use the barycentric interpolation method from [7], expanded upon by [25]. This approach reduces the number of calculations and the storage required to evaluate the basis functions at arbitrary points in the domain, thereby reducing the cost of the iterative correction stage.
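As a generic illustration of the idea (a textbook sketch of barycentric Lagrange interpolation, not the implementation used in our experiments):

```python
import numpy as np

def bary_weights(x):
    """Barycentric weights w_j = 1 / prod_{k != j}(x_j - x_k) for nodes x."""
    n = len(x)
    return np.array([1.0 / np.prod(np.delete(x[j] - x, j)) for j in range(n)])

def bary_eval(x, fvals, w, t):
    """Evaluate the polynomial interpolant of (x, fvals) at points t via the
    second barycentric formula: O(n) work per point after a one-time O(n^2)
    weight computation, with no interpolation matrix stored."""
    out = np.empty(len(t))
    for i, ti in enumerate(t):
        d = ti - x
        hit = np.isclose(d, 0.0)
        if hit.any():
            out[i] = np.asarray(fvals)[hit][0]   # ti coincides with a node
        else:
            q = w / d
            out[i] = (q @ fvals) / q.sum()
    return out

nodes = np.array([-1.0, 0.0, 1.0])
vals = nodes**2                      # interpolate f(x) = x^2
wts = bary_weights(nodes)
y = bary_eval(nodes, vals, wts, [0.5, 0.0])
```

The weights depend only on the nodes, so they can be precomputed once per (reference) element and reused at every arbitrary evaluation point.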
Although the implementation details of the barycentric interpolation method are beyond the scope of this dissertation, for the minimization part of the numerical experiments in Chapter 5 we observe a significant speed-up from using the barycentric approach. Note that the bulk of the computational effort in Algorithm 1 corresponds to line 4, where the $\Omega$-global optimization problem (4.5) must be solved; this can be of considerable expense at each iteration. We explain in Appendix A how we accomplish this optimization for univariate polynomial spaces $V$. It is straightforward to establish that, under a special kind of termination (see step 8 in Algorithm 1), we obtain the solution to (4.1).

Proposition 3. If Algorithm 1, without any extra termination criteria, terminates after only one iteration of line 8, then the output $c$ is the solution to (4.1).

Proof. Assume without loss of generality that the input $c$ to Algorithm 1 is not in $C$. By (4.2), we have
$$\operatorname{dist}(c, C) \geq \operatorname{dist}(c, H_k(y)),$$
for any $(y, k)$. Let $(y^*, k^*)$ be the solution to (4.5), and note that since $c \notin C$,
$$\operatorname{dist}(c, H_{k^*}(y^*)) = -\operatorname{sdist}(c, H_{k^*}(y^*)) > 0.$$
The assumption that Algorithm 1 terminates after one iteration implies that
$$d := c + \widehat{\ell}_{k^*}(y^*)\, \operatorname{sdist}(c, H_{k^*}(y^*)) \in C.$$
Note that $d$ is returned by the algorithm. The facts $c \notin C$, $d \in C$, $\|\widehat{\ell}_{k^*}(y^*)\|_2 = 1$, and $\|d - c\|_2 = -\operatorname{sdist}(c, H_{k^*}(y^*))$ all imply that the above inequality is actually an equality, and thus $d$ solves (4.1).

In standard cyclic projection algorithms, it is well known that directly projecting onto each set in each iteration produces a suboptimal trajectory for the iterates. The greedy algorithm described in this section suffers from this as well, as we show in the numerical results section. An improvement that somewhat ameliorates this deficiency is accomplished by averaging these projections.

4.1.3 Averaged projections

A simple strategy to mitigate the oscillatory iteration trajectory produced by iterative greedy projections is averaging.
Precisely, given a current iterate $c$, we identify the subset of $\Omega$ where our constraints are violated:
$$\omega_k := \left\{ y \in \omega_k \;\middle|\; \operatorname{sdist}(c, H_k(y)) < 0 \right\}. \tag{4.7}$$
Under mild assumptions on $V$, e.g., that it contains only piecewise continuous functions, $\omega_k$ is either trivial (empty) or of positive Lebesgue measure. (In other words, it cannot be a discrete or nontrivial measure-zero set.) Assume for simplicity that $\omega_k$ has positive Lebesgue measure for each $k$. We then produce an update by a normalized average of corrections corresponding to values of $y$ in $\omega_k$:
$$c \;\leftarrow\; c + \frac{1}{K} \sum_{k=1}^{K} \frac{1}{|\omega_k|} \int_{\omega_k} \widehat{\ell}_k(y)\, \operatorname{sdist}(c, H_k(y))\, dy. \tag{4.8}$$
Above, $|\omega_k|$ is the measure of $\omega_k \subset \Omega$. We again illustrate with an example that these quantities are computable.

Example 4.1.2. Consider the positivity constraint setup of Example 2.4.1. As we saw in Example 4.1.1, the signed distance for our single constraint is given by (4.4). Note that in this 1D setup with finite-degree polynomials, the set $\omega_k$ is a finite union of subintervals of $[-1,1]$, and hence the measure $|\omega_k|$ is just the sum of the lengths of these subintervals. Then, the correction term on the right-hand side of the update scheme (4.8) is
$$\frac{1}{|\omega_k|} \int_{\omega_k} \widehat{\ell}_1(y)\, \lambda(y)\, v(y)\, dy = -\frac{1}{|\omega_k|} \sum_{j=1}^{N} e_j \int_{\omega_k} \lambda^2(y)\, v(y)\, v_j(y)\, dy,$$
where $e_j$, $j \in [N]$, are the cardinal unit vectors in $\mathbb{R}^N$. Thus, the integrals that must be computed have smooth integrands and can be efficiently approximated by standard quadrature rules, assuming the endpoints of the subintervals defining $\omega_k$ can be identified. A variation of Algorithm 1 that uses this averaging approach is nearly identical: the only change required is that the update of the coefficient vector $c$ in line 8 be replaced by the update in (4.8). Figure 4.4 visually depicts both the greedy and averaged projection ideas when $V$ is a univariate space of polynomials and the constraint is positivity (i.e., Example 2.4.1).
In particular, the value $y^*$ that solves the greedy optimization problem (4.5) is shown, along with the averaging set $\omega_1$ identified in (4.7).

4.1.4 Hybrid algorithms

In experimentation, we have found that hybrid combinations of the greedy approach of Section 4.1.2 and the averaged approach of Section 4.1.3 work better than either algorithm alone. In particular, the greedy algorithm works well when $c$ is "close" to the solution, but the averaged algorithm works better for an iterate that is "far" away. Thus, we utilize a switching procedure, standard in optimization, depending on the proximity to a basin of attraction. Through experimentation, we have found that the following switching mechanism works well: we perform averaged projections until the norm of the correction (4.8) reaches a certain tolerance; once a condition is met, we switch to greedy projections. The switching condition is the following: if $i$ is the iteration index, consider the ratio
$$a_i = \frac{\operatorname{sdist}\left( c_i,\, H_{k_i^*}(y_i^*) \right)}{\operatorname{sdist}\left( c_{i-1},\, H_{k_{i-1}^*}(y_{i-1}^*) \right)}.$$
Our switching condition is triggered when $\left| a_{i-1}/a_i - 1 \right| \leq \epsilon$, for a user-specified $\epsilon$. At this point, we perform one more averaged update of the form (4.8), but multiply the right-hand side correction by $1/a_i$. Subsequently, greedy projections as in (4.6) are performed. While this procedure is quite ad hoc, we have observed that it consistently performs better than the other hybrid variants we have tried.

Figure 4.4: A graphical view of the correction process of the filter during one iteration. $v$ is the unconstrained $L^2([-1,1])$ projection of the step function $f(x)$ onto the space of degree-7 polynomials. For the positivity setup of Example 2.4.1, the greedy point $y^*$ defined in (4.5) is shown, along with the averaging set $\omega_1 \subset [-1,1]$ defined in (4.7). Also plotted is the signed distance $\lambda(y)v(y)$ of $v$ to $H_1(y)$.
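The switching test can be sketched as follows. This is a minimal illustration under an assumed reading of the stabilization criterion on consecutive ratios; the final $1/a_i$-scaled averaged update is left as a comment.

```python
def switch_to_greedy(sdist_hist, eps):
    """Hybrid switching test: given a history of greedy signed distances
    sdist(c_i, H_{k_i*}(y_i*)), form ratios a_i = sdist_i / sdist_{i-1} and
    trigger the switch when consecutive ratios stabilize, here assuming the
    criterion |a_{i-1}/a_i - 1| <= eps. After triggering, one would perform
    one more averaged update scaled by 1/a_i, then greedy projections."""
    if len(sdist_hist) < 3:
        return False
    a_prev = sdist_hist[-2] / sdist_hist[-3]
    a_curr = sdist_hist[-1] / sdist_hist[-2]
    return abs(a_prev / a_curr - 1.0) <= eps

stabilized = switch_to_greedy([-1.0, -0.5, -0.25], 1e-6)   # geometric decay
erratic = switch_to_greedy([-1.0, -0.9, -0.2], 1e-2)       # not stabilized
```

A geometrically decaying violation (constant ratio) triggers the switch, while an erratic trajectory keeps the averaged updates running.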
4.1.5 Algorithms for polynomial subspaces

As described in previous sections, the main computational expense in our convex optimization algorithm is the minimization of the signed distance function in (4.5) (for the greedy and hybrid algorithms) and the identification of, and integration over, the set $\omega_k$ in (4.7) (for the averaged and hybrid algorithms). Such problems are difficult to solve for general function spaces, and efficient algorithms will likely depend on what kinds of functions the subspace $V$ contains. When $V$ contains univariate polynomials, all the tasks in the algorithm can be reduced to the problem of computing the roots of polynomials, and hence are feasible in principle. We accomplish this computationally by computing the spectrum of a confederate matrix, although more sophisticated and practically effective methods are known. We describe this formulation and the details of the approach in Appendix A.

4.1.6 Nonidentity matrices A

The optimization problem we seek to solve is (2.8); the algorithms in this section have proceeded under the assumption that $A = I$. When this is not the case, we must first solve (3.3), so that the full solution is (3.2). Thus, we focus on the problem
$$\operatorname*{argmin}_{z \in S V^* C} \left\| z - U^* b \right\|_2. \tag{4.9}$$
Note that the only difference between this optimization and the simplified version (4.1) is that the feasible set is $S V^* C$ instead of $C$, so we need only address the presence of the linear map $S V^*$. Since $C$ is closed and convex, $S V^* C$ is also closed and convex, and in particular is defined as the intersection of closed, conic, convex sets $\widetilde{C}_k$:
$$\widetilde{C} := S V^* C = \bigcap_{k=1}^{K} \widetilde{C}_k, \qquad \widetilde{C}_k := S V^* C_k.$$
Thus, all our previous algorithms apply, except that we need to transform $(L_k, r_k, \omega_k)$ for $C_k$ into the appropriate quantities for $\widetilde{C}_k$. These transformations are straightforward but technical, so we omit showing them explicitly.
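The root-finding reduction mentioned in Section 4.1.5 can be sketched with a monomial-basis companion matrix, the monomial special case of a confederate matrix; `numpy.roots` computes roots as eigenvalues of exactly such a matrix. This is an illustrative stand-in for the Legendre-basis confederate approach of Appendix A, not the dissertation's implementation.

```python
import numpy as np

def poly_min_on_interval(coeffs, a=-1.0, b=1.0):
    """Global minimum of a polynomial (monomial coefficients, highest degree
    first) on [a, b]: the critical points are roots of the derivative,
    computed by numpy.roots via companion-matrix eigenvalues; the minimum is
    then attained at a real critical point in [a, b] or at an endpoint."""
    p = np.poly1d(coeffs)
    crit = [r.real for r in p.deriv().roots
            if abs(r.imag) < 1e-12 and a <= r.real <= b]
    xs = np.array([a, b] + crit)
    vals = p(xs)
    i = int(np.argmin(vals))
    return xs[i], vals[i]

x_min, s_min = poly_min_on_interval([1.0, 0.0, -0.5])   # s(x) = x^2 - 1/2
```

Because every candidate minimizer is checked exactly, this strategy cannot miss a constraint violation, which is the property lost when moving to gradient descent in higher dimensions.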
4.1.7 Finding global minimum

The major computational work in the proposed filter is the manipulation/minimization of $s(x)$ at each iteration. This signed distance function has the form
$$s(x) = \lambda(x)\left( L_x(u) - \ell(x) \right), \qquad \lambda^2(x) := \frac{1}{\sum_{j=0}^{N-1} L_x(\psi_j)^2}, \tag{4.10}$$
where $\ell \in W$, and $s$ is a $\lambda$-weighted version of $u$ and is therefore easy to evaluate. However, computing a global minimum of $s$ may be difficult in general. We have already shown that if $V$ is a univariate polynomial space of degree $N - 1$, then computing the minimum of $s$ can be accomplished by computing the roots of a polynomial of degree $3N$. We accomplish this by computing the spectrum of a confederate matrix associated with a Legendre orthogonal polynomial basis. Relative to 1D, minimization in higher dimensions is more complex and expensive. To the best of our knowledge, no confederate matrix approach exists for dimensions greater than one. Correspondingly, many variants of gradient descent-based (GD-based) methods have been proposed to perform this operation efficiently and accurately. These methods operate only up to a certain tolerance and, like all GD-based methods, have the potential for stagnation. In the 2D and 3D cases, the problem of finding the points of the domain that violate the structural constraints is nonconvex, so we cannot ensure that all such points will be flagged for correction by any particular minimization algorithm. From our analysis, a suitable GD approach for this job is one with an adaptive step size and the ability to retrace its steps. For this reason, we choose the method of steepest descent with a backtracking line-search strategy; for all the algorithms in this section, this strategy is referred to as GD linesearch. The backtracking line search starts with a relatively large step size and repeatedly shrinks it until the Armijo-Goldstein condition (4.11) is fulfilled.
This widely used algorithm makes an informed choice about step size and direction at each iteration to efficiently arrive at the optimal value. It avoids stagnation by retracing its steps if the difference between descent values falls below a fixed tolerance. An essential part of this algorithm is choosing the parameters $c \in (0,1)$ and $\gamma \in (0,1)$ that determine the step size and the backtracking performed by the algorithm. For an objective function $f$ with a starting position $x$, given parameters $c$ and $\gamma$, the Armijo-Goldstein condition is defined as
$$f(x + \gamma_j\, p) \leq f(x) + \gamma_j\, c\, \nabla f(x)^T p \qquad \forall\, j \text{ until convergence}, \tag{4.11}$$
where $\nabla f(x)^T p$ is the slope in the direction of descent $p$ and $\gamma_j$ is calculated as
$$\gamma_j = \begin{cases} \gamma, & \text{if } j = 0, \\ c\, \gamma_{j-1}, & \text{otherwise.} \end{cases}$$
The parameters $c$ and $\gamma$ play a vital role in testing the condition [3], which determines whether a stepwise movement from a current position to a modified position achieves an adequate corresponding decrease in the objective function. Although many recommendations exist [3, 8, 11, 17, 18] for an optimal way to choose $c$ and $\gamma$, these parameters must be tuned for maximum efficiency. In Appendix B we investigate the values of $c$ and $\gamma$ to use for our numerical experiments. For higher dimensions, we follow the general idea that if the polynomial projections lose structural conformity, they are likely to do so at or between the quadrature points. We hope to capture these structural inconsistencies by checking the values on a lattice formed by the quadrature points and the centroids of adjacent quadrature points. A quadrilateral is used as a sample element to describe the setup; however, the same applies to any canonical element type in 2D and 3D. Let us call such a lattice of points $X_L$. For a quadrilateral, let $Q$ represent the number of quadrature points in one dimension.
$X_L$ is defined as the combination of the quadrature grid on the quadrilateral (a total of $Q^2$ points) and the $(Q-1)^2$ centroids formed by midpoints of the quadrature grid, as in (4.12):
$$X_L = \left\{ Q^2 \text{ points of the quadrature grid} \right\} \cup \left\{ (Q-1)^2 \text{ centroids of the quadrature grid} \right\}. \tag{4.12}$$
For a triangle, we can follow the same idea of combining the quadrature points and the centroids to construct the lattice. An example lattice on a mesh with quadrilaterals and a triangle is shown in Figure 4.5.

4.2 Filtering algorithm and its placement in time-dependent PDEs

For PDE applications, the structure of the solution is preserved by applying the filter as a postprocessing step that uses iterative optimization to correct the state vector of coefficients $\widehat{v}$ obtained from the timestepping algorithm. Each iteration of the filter involves the following crucial computational processes:
1. Conversion of an input coefficient vector $\widehat{v}$ to the corresponding coefficients $\tilde{v}$ in an orthonormal basis.
2. A minimization process to solve (4.5) as part of a global search for structural violations.
3. Calculation of the correction to $\widehat{v}$ using (4.6).
Algorithm 1 summarizes the steps taken by the filter to solve the optimization problem. The constrained PDE timestepping procedure shown in Algorithm 2 details the additional steps introduced into the time-dependent PDE solver. For multiple dimensions, the minimization in step 4 of Algorithm 1 changes to the GD linesearch approach, as described in line 3 of Algorithm 2.

Figure 4.5: Example lattice used for detecting structural nonconformity in a mesh.

Algorithm 2 Constrained PDE timestepping
1: if dimension > 1 then
2:   if $\exists\, x \in X_L$ with $\operatorname{sdist}(x) < 0$, with $X_L$ as in (4.12), then
3:     Compute $(y^*, k^*)$ via GD linesearch (see Section 4.1.7).
4:   end if
5: else
6:   Compute $(y^*, k^*)$ via (4.5).
7: end if

4.2.1 dG specializations

As the first step of the optimization procedure, we must transform any given basis $\{\phi_j\}_{j=0}^{N-1}$ assumed by the PDE solver into an orthonormal basis $\{\psi_j\}_{j=0}^{N-1}$. We use the standard procedure, which requires applying the inverse of a Cholesky factor of the mass matrix. In dG simulations, we can leverage elementwise decoupling as described in Proposition 2, so that the appropriate matrix algebra is performed locally on the local mass matrix associated with a single element. However, when a continuous Galerkin formulation is used, the global mass matrix must be inverted, and this cost significantly slows down the optimization process. In contrast to dG, the structure of the global mass matrix $M$ in cG using standard hat and bubble basis functions is "nearly" diagonal, but the inverse of $M$ (or of its Cholesky factor) is a dense matrix. In our simulations, we precompute the required global matrices for orthonormalization of the basis elements and coordinates. This procedure is feasible for our problems in one spatial dimension. However, it is no longer feasible if the global number of degrees of freedom $N$ becomes too large, as would happen with problems in two or three spatial dimensions; in general, for spatial dimensions two and three, the degrees of freedom of the optimizer are $N^2$ and $N^3$, respectively. Additionally, for dG discretizations, the global filtering operation is decoupled to act on individual elements. This element-by-element application lends itself to parallel implementation. In addition, we need not call the filter on elements where the constraints are already satisfied. For example, if positivity is the constraint, then at every timestep we can flag elements where positivity may be violated and run the optimizer only over those elements. In our 1D implementation, we use confederate linearization to check for the optima of function values on each element.
This approach guarantees that all the extrema are calculated accurately. We use the extrema to determine whether the positivity constraint is violated on that element. The problem is thereby converted from an optimization in an $N$-dimensional space into a far more efficient and parallelizable set of $E$ independent optimizations in a $P = N/E$-dimensional space, where $E$ is the number of elements in the domain. In higher dimensions, we can reduce the number of optimization problems by testing the solution for structural inconsistencies on the lattice of points defined by (4.12). We can filter the selected subintervals simultaneously, which further adds to the procedure's efficiency.

Flagging the elements for filter application carries a computational cost: in 1D, an eigenvalue solve on each element followed by evaluation of the function at the critical points; in higher dimensions, function evaluations on the lattice of points. However, the overall speed-up obtained by not having to evaluate the signed distance function described in (4.5) on each element certainly justifies this additional cost. With this flagging scheme, we observe that the computational time spent in the filter is a fraction of the cost of the unconstrained solver (cf. our numerical results in Chapter 5).

CHAPTER 5

NUMERICAL RESULTS

In this chapter, we numerically investigate the proposed filter for preserving convex constraints on function approximations and solutions to PDEs in multiple dimensions. The chapter is divided into the following sections:

1. Section 5.1 details the experiments applying the filter to function approximations. These results are derived from our works [43, 45].
2. Section 5.2 details the filter experiments on PDEs. These results are derived from our works [44, 45].
3. Section 5.3 is dedicated to the results for the platelet aggregation and blood coagulation FEM simulation, which is an advection-diffusion-reaction (ADR) model.
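The 1D flagging step can be sketched as follows. This is an illustration, not the dissertation's exact implementation: critical points of a Legendre expansion are found from the roots of its derivative (an eigenvalue solve on a companion-type matrix, performed inside `numpy`'s `legroots`), and the polynomial is evaluated there and at the endpoints.

```python
import numpy as np
from numpy.polynomial import legendre as L

def flag_element(coeffs, tol=0.0):
    """Flag a 1D element whose Legendre expansion may violate positivity:
    evaluate the polynomial at the endpoints and at the real critical
    points inside (-1, 1), and flag if the minimum dips below tol."""
    crit = L.legroots(L.legder(coeffs))
    crit = crit[np.isreal(crit)].real
    pts = np.concatenate([[-1.0, 1.0], crit[(crit > -1) & (crit < 1)]])
    return bool(np.min(L.legval(pts, coeffs)) < tol)

# x^2 - 0.25 in the Legendre basis (x^2 = (2 P_2 + 1)/3): positive at the
# endpoints but negative at its interior minimum, so it must be flagged.
assert flag_element([1/3 - 0.25, 0.0, 2/3]) is True
# x^2 + 0.5 is strictly positive and is not flagged.
assert flag_element([1/3 + 0.5, 0.0, 2/3]) is False
```

Only flagged elements are handed to the optimizer, which is what keeps the filter's share of the runtime small.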
It shows the simulation results with and without filter application and provides an analysis of them.

5.1 Function approximations

In all that follows, $f$ is a given function in a Hilbert space $H$. Given a finite-dimensional space $V \subset H$, the function $v$ is the $H$-best projection of $f$ onto $V$, which in general does not satisfy any structural constraints. (Note from the discussion in Section 4.1.6 that extensions to, e.g., collocation-based approximations are straightforward.) The function $\tilde{v}$ is the output of the constrained optimization procedure. In 1D, the polynomial order equals the number of degrees of freedom of the filter. In 2D and 3D, however, if the polynomial order of the projection is $N$ in each direction, the filter has $(N+1)^2$ and $(N+1)^3$ degrees of freedom, respectively.

With the univariate Sobolev spaces

$$H^q([-1,1]) := \left\{ f : [-1,1] \to \mathbb{R} \;:\; \|f\|_{H^q} < \infty \right\}, \qquad \|f\|_{H^q}^2 := \sum_{j=0}^{q} \int_{-1}^{1} \left[ f^{(j)}(x) \right]^2 dx,$$

our examples consider the ambient Hilbert space $H$ as $H^0 (= L^2)$, $H^1$, or $H^2$. The subspace $V$ in all our experiments is the space of polynomials up to degree $N-1$,

$$V = \left\{ p : [-1,1] \to \mathbb{R} \;:\; \deg p \le N-1 \right\}.$$

Our test functions $f_j$ are defined iteratively for $j \ge 0$ as

$$f_{j+1}(x) = c_{j+1} \int_{-1}^{x} f_j(t)\,dt, \qquad f_0(x) = \begin{cases} 0, & x \le 0, \\ 1, & x > 0, \end{cases}$$

where the $c_{j+1}$ are normalization constants chosen so that $f_{j+1}(1) = 1$. Thus, $f_j$ has $j$ weak $L^2$ derivatives. Finally, most of our results consider intersections of the following four types of constraint sets in $V$:

• (Positivity) $F_0 := \{ f \in H \;:\; f(x) \ge 0 \;\; \forall x \in [-1,1] \}$
• (Boundedness) $G_0 := \{ f \in H \;:\; f(x) \le 1 \;\; \forall x \in [-1,1] \}$
• (Monotonicity) $F_1 := \{ f \in H \;:\; f'(x) \ge 0 \;\; \forall x \in [-1,1] \}$
• (Convexity) $F_2 := \{ f \in H \;:\; f''(x) \ge 0 \;\; \forall x \in [-1,1] \}$

Our final example considers a slightly more exotic set of constraints, which we discuss later. In order to understand how much our algorithms "change" the input $v$ when producing the constrained approximation $\tilde{v}$, we measure the quantity

$$\eta := \frac{\|v - \tilde{v}\|_H}{\|f - v\|_H}. \qquad (5.1)$$
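The test functions $f_j$ can be realized numerically from the recursion above. The sketch below builds $f_1$ and $f_2$ by cumulative trapezoidal integration of the step function $f_0$ (grid size and variable names are our own choices, not the dissertation's).

```python
import numpy as np

# f_0 is the unit step at 0; each f_{j+1} is the antiderivative of f_j,
# normalized so that f_{j+1}(1) = 1 (this fixes c_{j+1}).
x = np.linspace(-1.0, 1.0, 20001)
f = (x > 0).astype(float)                      # f_0
fs = [f]
for j in range(2):
    F = np.concatenate([[0.0],
                        np.cumsum(0.5 * (f[1:] + f[:-1]) * np.diff(x))])
    f = F / F[-1]                              # normalize: f_{j+1}(1) = 1
    fs.append(f)

# f_1 is the ramp max(x, 0); f_2 is the clamped parabola max(x, 0)^2.
assert np.isclose(fs[1][np.searchsorted(x, 0.5)], 0.5, atol=1e-3)
assert np.isclose(fs[2][np.searchsorted(x, 0.5)], 0.25, atol=1e-3)
```

Each integration adds one weak derivative, which is exactly how the family $f_0, f_1, f_2$ spans the smoothness classes used in the experiments below.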
Since $f - v$ is $H$-orthogonal to $V$,

$$\|f - \tilde{v}\|_H^2 = (1 + \eta^2)\, \|f - v\|_H^2.$$

Thus, $\sqrt{1 + \eta^2}$ measures the error in the constrained approximation relative to the (best) unconstrained approximation. Values of $\eta$ on the order of 1 imply that the optimization problem commits an additional error that is approximately the same as the error committed by the best (unconstrained) approximation.

Algorithm 1 is the greedy algorithm, but it is the template for the averaging and hybrid algorithms as well. For example, a hybrid algorithm needs to replace only line 8 in that algorithm by the update (4.8). However, we have left some details of the termination criterion in line 5 unexplained. Owing to finite precision, we do not actually enforce $\mathrm{sdist}(c, H_{k^*}(y^*)) \ge 0$ as stated. Instead, we enforce

$$\mathrm{sdist}(c, H_{k^*}(y^*)) \ge -\delta, \qquad \delta > 0, \qquad (5.2)$$

where we set $\delta = 10^{-10}$ and have implemented the procedure in double precision. In addition, we report $I$, the number of iterations required before termination.

5.1.1 Algorithm comparison

A short summary of all the experiments investigating the hybrid approaches and their comparison with the greedy and averaging methods is given in Table 5.1.

5.1.2 More complicated constraints

Finally, we show that our formalism allows for more complicated constraints than the ones we have previously shown. With $H = H^0$ and $V$ a space of degree-$(N-1)$ polynomials as before, we consider two new kinds of constraints:

• $J_1 = \{ f \in V \;:\; f(x) \ge |x| \;\; \forall x \in [-1,1] \}$
• $J_2 = \{ f \in V \;:\; f(x) \ge -x \;\; \forall x \in [-1,0], \;\; f(x) \le x \;\; \forall x \in [0,1] \}$

Constraint set $J_1$ can be defined as the intersection of two conic constraints: for $x \in [-1,0]$, we enforce $f(x) \ge -x$, and for $x \in [0,1]$ we enforce $f(x) \ge x$. Constraint set $J_2$ enforces $f(x) \ge -x$ for $x \in [-1,0]$ as before, but now enforces $f(x) \le x$ for $x \in [0,1]$. Note that $J_2$ implicitly enforces $f(0) = 0$, but we do not explicitly require this in our algorithm.
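The correction step behind a single greedy iteration can be sketched in coefficient space. This is a sketch of the halfspace-projection idea only, with an assumed orthonormalized Legendre basis $\psi_j = \sqrt{j + 1/2}\, P_j$, a grid search in place of the exact violation search, and hypothetical coefficients; it is not the dissertation's exact Algorithm 1.

```python
import numpy as np
from numpy.polynomial import legendre as L

def eval_basis(y, n):
    """psi_j(y) for the orthonormal Legendre basis psi_j = sqrt(j+1/2) P_j,
    returned with shape (n, len(y))."""
    return L.legval(y, np.eye(n)) * np.sqrt(np.arange(n) + 0.5)[:, None]

n = 6
c = np.array([0.2, 0.8, 0.0, 0.0, 0.0, 0.0])   # hypothetical coefficients
grid = np.linspace(-1.0, 1.0, 401)

vals = c @ eval_basis(grid, n)                 # evaluate v on a search grid
k = int(np.argmin(vals))
ystar, worst = grid[k], vals[k]                # worst positivity violation

if worst < 0:
    g = eval_basis(np.array([ystar]), n).ravel()
    c = c - worst * g / (g @ g)                # project onto {c : <c, psi(y*)> >= 0}

new_val = float(c @ eval_basis(np.array([ystar]), n).ravel())
```

After the projection, the value at $y^*$ is exactly zero up to roundoff; iterating this step (with the tolerance $\delta$ of (5.2) as stopping criterion) is the flavor of the greedy loop.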
Table 5.1: Performance summary of the three proposed algorithms on the test function $f = f_2$ for different values of $\epsilon$, where $\epsilon$ is as described in Section 4.1.4. The constraint set is $E = F_0$.

                 Polynomial order 6                Polynomial order 31
                 I              eta                I              eta
  eps:       1e-3   1e-5   1e-3     1e-5      1e-3   1e-5   1e-3    1e-5
  Greedy      20     20    1.147    1.147      23     23    0.986   0.986
  Averaging   36     36    1.148    1.148     383    383    0.985   0.985
  Hybrid       4     16    1.1464   1.148       2      3    1.142   1.054

Since the functions $\pm x$ belong to $V$ whenever $N \ge 2$, we can handle these constraints with our setup. For the test function $f(x) = |x|$, the optimization terminates successfully; the results are shown in Figure 5.1.

5.1.3 Function approximation filtering examples

We present two examples of structure-preserving function approximation in this section. The first example takes $H = H^0$ and the test function $f = f_0$, which is a step (discontinuous) function. We present results for different dimensions of $V$ and different constraints. Figure 5.2 illustrates the results of the greedy algorithm. We compare a medium-degree polynomial approximation, $N = 6$, with a high-degree polynomial approximation, $N = 31$. The three kinds of constraints are (a) positivity; (b) positivity and boundedness; and (c) positivity, boundedness, and monotonicity. We observe that the positivity and boundedness constraints accomplish what is desired: the approximation $\tilde{v}$ satisfies the desired constraints, but still features Gibbs-type oscillations. However, enforcing monotonicity as well results in a nonoscillatory approximation. All computed values of $\eta < 1$ show that the constrained approximation commits an error comparable to that of the $H$-best approximation.

Our second experiment uses the test function $f = f_2$, which has a piecewise-constant second derivative. We use a fixed constraint set: positivity, monotonicity, and convexity. Using again $N = 6$ and $N = 31$, we investigate the approximation for the different ambient spaces $H = H^0$, $H^1$, and $H^2$. Results are displayed in Figure 5.3.
We observe much larger values of $\eta$ in this experiment, but note that the values of $\eta$ decrease as the order of the Sobolev space increases.

Figure 5.1: Algorithm results for the unusual constraints applied to $f(x) = |x|$. Top: constraint set $J_1$. Bottom: constraint set $J_2$. Left: $N = 4$. Center: $N = 9$. Right: $N = 31$.

Figure 5.2: Greedy algorithm results for the test function $f_0$ with different constraint sets $E$ and polynomial spaces $V$. Top: $N = \dim V = 6$; bottom: $N = \dim V = 31$. Left: $E = F_0$. Center: $E = F_0 \cap G_0$. Right: $E = F_0 \cap G_0 \cap F_1$.

Figure 5.3: Test function $f_2$ for different polynomial spaces $V$ and ambient spaces $H$. The constraint is $E = F_0 \cap F_1 \cap F_2$. Top: $\dim V = 6$; bottom: $\dim V = 31$. Left: $H = H^0$. Center: $H = H^1$. Right: $H = H^2$.

We also observe that the visual discrepancy between the constrained approximation and the underlying function is considerably larger in this experiment. However, the approximation quality still appears good for the larger value of $N = 31$.

Extending to higher-dimensional function approximation problems, consider the functions defined in (5.3) and (5.4) in 2D and 3D, respectively, which are both discontinuous clamped versions of a smooth sinusoidal function.
The initial projections of $f(x,y)$ on a quadrilateral and of $f(x,y,z)$ on a hexahedron are shown in Figure 5.4. Upon projection, the discontinuity produces oscillations similar to the Gibbs phenomenon, which makes these functions interesting for the application and analysis of the filter.

Figure 5.4: Galerkin projection of the example functions on a quadrilateral and a hexahedron element, respectively. The example function for the quadrilateral is defined by (5.3) and for the hexahedron by (5.4). The polynomial order is 7 in 2D and 5 in 3D. Left: unfiltered version. Right: filtered version.

$$f(x,y) = n(x,y)\, \sin\big(\pi(0.2 - x)\big)\, \sin\big(\pi(y + 0.2)\big), \qquad (5.3)$$

where

$$n(x,y) = \begin{cases} 1, & \text{if } x \in [-0.8, 0.2] \text{ and } y \in [-0.2, 0.8], \\ 0, & \text{otherwise}; \end{cases}$$

$$f(x,y,z) = n(x,y,z)\, \sin\big(\pi(0.2 - x)\big)\, \sin\big(\pi(y + 0.2)\big)\, \sin\big(\pi(z + 0.2)\big), \qquad (5.4)$$

where

$$n(x,y,z) = \begin{cases} 1, & \text{if } x \in [-0.8, 0.2],\; y \in [-0.2, 0.8], \text{ and } z \in [-0.2, 0.8], \\ 0, & \text{otherwise}. \end{cases}$$

It is evident from the left subfigures of Figure 5.4 that projection of these discontinuous non-negative functions leads to negative values, as shown in the highlighted regions. As seen in the right subfigures of Figure 5.4, the application of the structure-preserving filter restores the non-negative structure of (5.3) and (5.4), respectively.

5.1.4 Convergence rates

Optimal Hilbert-space projections of smooth functions onto polynomial spaces converge at a rate commensurate with the function's smoothness. We investigate in this section whether the corresponding constrained projections have similar convergence rates. Figure 5.5 shows the convergence of $H = L^2$-optimal (unconstrained) polynomial projections versus the output from the constrained optimization procedure. Our constrained approximations are less accurate, but the convergence rates are unchanged. To analyze the $p$-convergence for the 2D and 3D experiments shown in Figure 5.4, we consider different polynomial orders to obtain the convergence plots in Figure 5.6.
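The 2D clamped sinusoid of (5.3) can be written down directly; the sketch below evaluates it and confirms its key structural property, non-negativity (inside the clamp window both sine arguments lie in $[0, \pi]$, and outside the window the indicator makes $f$ vanish).

```python
import numpy as np

# The clamped sinusoid (5.3): a sine bump times an indicator function,
# hence discontinuous but non-negative on [-1, 1]^2.
def f2d(x, y):
    n = ((-0.8 <= x) & (x <= 0.2) & (-0.2 <= y) & (y <= 0.8)).astype(float)
    return n * np.sin(np.pi * (0.2 - x)) * np.sin(np.pi * (y + 0.2))

xs = np.linspace(-1, 1, 101)
X, Y = np.meshgrid(xs, xs)
vals = f2d(X, Y)

assert vals.min() >= 0.0                              # f is non-negative
assert f2d(np.array(0.5), np.array(0.0)) == 0.0       # zero outside the window
```

It is exactly this non-negativity that the polynomial projection destroys (left panels of Figure 5.4) and the filter restores (right panels).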
As shown in Figure 5.6, the time taken by the filter scales with the size of the solution space. We also observe that the 2D filtering experiments take a fraction of the time of their 3D counterparts. This results from the cost of finding the global minimum in the domain, which is much higher in 3D than in 2D.

Figure 5.5: $H = H^0$ convergence results for projection of the test functions $f = f_0$ (top row) and $f = f_2$ (bottom row). Left: constraint $E = F_0$. Center: constraint $E = F_0 \cap G_0$. Right: constraint $E = F_0 \cap G_0 \cap F_1$.

Figure 5.6: A $p$-convergence study of the filtered and unfiltered Galerkin projections from Figure 5.4. The right $y$-axis represents the time spent inside the filtering routine. Left: $f(x,y)$ defined in (5.3). Right: $f(x,y,z)$ defined in (5.4).

Having analyzed the behavior and cost of the filter in a worst-case scenario, the projection of discontinuous (non-smooth) functions, we consider smooth continuous functions for the subsequent experiments in the multidimensional cases.

5.1.5 Constrained approximation as a nonlinear filter

The right-hand panels in Figure 5.2 show that the monotonicity constraint removes oscillations in the approximation. These empirical results suggest that the constrained optimization procedure is a type of spectral filter. There is a stronger theoretical motivation for this observation as well; see Proposition 1. In general, the assumption that $E$ is closed and convex is automatically satisfied by our apparatus in Chapter 3 and Chapter 4. The only nontrivial requirement is that $v = 0$ be a member of the constraint set $E$.
All the examples in Figures 5.2 and 5.3 satisfy $0 \in E$, and thus we expect the optimization problem to decrease the norm of the function, just as a standard linear filter would. Note, however, that our "filter" (optimization) is a nonlinear map. To illustrate this filter interpretation, we compare in Figures 5.7 and 5.8 the magnitudes of the expansion coefficients before and after optimization. These figures correspond to the experiments in Figures 5.2 and 5.3, respectively.

Figure 5.7: Corresponding to the experiment shown in Figure 5.2, bar plots of the unconstrained projection coefficient magnitudes $|\tilde{v}_j|$ vs. the constrained projection coefficient magnitudes $|\tilde{w}_j|$. Top: $N = 6$. Bottom: $N = 31$. Left: constraint $E = F_0$. Center: constraint $E = F_0 \cap G_0$. Right: constraint $E = F_0 \cap G_0 \cap F_1$.

Figure 5.8: Corresponding to Figure 5.3, bar plots of the unconstrained projection coefficient magnitudes $|\tilde{v}_j|$ vs. the constrained projection coefficient magnitudes $|\tilde{w}_j|$. Top: $N = \dim V = 6$; bottom: $N = \dim V = 31$. Left: $H = H^0$. Center: $H = H^1$. Right: $H = H^2$.

For the step-function example shown in Figure 5.7, we see that when monotonicity is enforced, the higher-order coefficients of the constrained approximation decay more steeply. Stronger coefficient decay is also observed when only positivity/boundedness is enforced, but the increase in decay is less pronounced. All these observations are qualitatively consistent with Figure 5.2.
We emphasize that this constrained optimization procedure is nonlinear, so our approximation cannot easily be written in coefficient space as a standard (linear) spectral filter.

5.2 PDE applications

We numerically investigate the proposed procedure for preserving convex constraints in solutions to PDEs. We consider a bounded 1D spatial interval $\Omega \subset \mathbb{R}$. We consider positivity constraints (i.e., $u(x) \ge 0$ over all of $\Omega$), and also investigate cell boundary value ("flux") preservation and mass conservation (for dG simulations). We are primarily interested in the effect that the filter has on convergence rates and in quantifying the computational efficiency of the procedure. All simulations perform the procedure summarized in Algorithm 1. We use the following machine to report all the performance and error numbers in this section: 256 Intel(R) Xeon(R) CPU E7-4850 v4 @ 2.10GHz cores (HT) with 1024 GB of RAM, running Red Hat Enterprise Linux 7.5 (Maipo).

5.2.1 Time-dependent PDE using the dG formulation

Consider the 1D advection equation

$$\frac{\partial u}{\partial t} + a \frac{\partial u}{\partial x} = 0, \qquad (5.5)$$

where $a$ is a fixed constant. For $a = 1$, we investigate this problem with the exact solution $u(x,t) = 0.5\sin(2\pi x - 2\pi t - 0.5\pi) + 0.5$. We use a dG formulation with periodic boundary conditions over the domain $\Omega = [-1,1]$, which results in a system of ordinary differential equations prescribing the evolution of the Galerkin coefficients. We employ a second-order Runge-Kutta (RK2) scheme to integrate in time up to a final time $T = 1$, using upwind flux calculations. Once fully discretized, the numerical solution can correspond to a function that is negative in some parts of the spatial domain, and hence we apply the filtering procedure in Algorithm 1 to enforce positivity on all of $\Omega$. This results in two numerical solutions:

• $\tilde{v}$ is the solution resulting from a standard dG solver, i.e., one that does not employ our filter.
• $v$ is the numerical solution resulting from application of the filter as specified in Algorithm 1.
As discussed earlier, this filtered solution does not respect mass conservation, and it in general changes values on the element boundaries, which changes the numerical fluxes. Therefore, we use Algorithm 1 to generate two more numerical solutions:

• $v_F$ is the positivity-constrained solution with added elementwise equality constraints that preserve inter-element boundary (flux) values.
• $v_{I+F}$ is the positivity-constrained solution that includes both inter-element flux preservation and elementwise mass conservation.

Figure 5.9 investigates both $h$- and $p$-convergence of the numerical solution to (5.5) for all four numerical solutions. We observe that, regardless of the constraint that we impose, the convergence rates are unchanged, and even the error values are nearly identical for all numerical solutions. Thus, for this example, our proposed procedure can guarantee positivity (and flux/mass conservation as well, if desired) without a notable impact on the accuracy of the solver.

Figure 5.9: Convergence study for the filtered and unfiltered dG solutions to the 1D advection PDE with exact solution $u(x,t) = 0.5\sin(2\pi x - 2\pi t - 0.5\pi) + 0.5$ over a time period of $T = 1$ second. $\tilde{v}$ refers to the unfiltered solution, $v$ to the positive solution, $v_F$ to the positive solution with flux (boundary values) conserved for each element, and $v_{I+F}$ to the positive solution with flux and mass conserved for each element. Left: $h$-convergence using constant polynomial order 3, $\Delta t = 10^{-5}$. Right: $p$-convergence at a constant number of elements $H = 3$, $\Delta t = 10^{-4}$.

We repeat the same convergence study for the same PDE, but now for a triangular hat function with non-negative values, shown in Figure 5.10; also shown in the figure is the unconstrained numerical solution at $T = 1$, which violates positivity.
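The equality constraints behind $v_F$ and $v_{I+F}$ can be sketched on one element. This is an illustration of the idea only, not the dissertation's exact formulation: given corrected Legendre coefficients $c$ on an element, we project back onto the affine set $\{c : Ac = b\}$ that fixes the element's two boundary values and (for $v_{I+F}$) its mass, using the minimal-norm update $c' = c - A^T (A A^T)^{-1} (Ac - b)$.

```python
import numpy as np
from numpy.polynomial import legendre as L

n = 5
c = np.array([0.9, 0.4, -0.1, 0.05, 0.02])   # hypothetical element coefficients

def functionals(n):
    """Rows: evaluation at x = -1, evaluation at x = +1, and element mass
    (int_{-1}^{1} P_j dx = 2 for j = 0 and 0 otherwise)."""
    eye = np.eye(n)
    left = L.legval(-1.0, eye)
    right = L.legval(1.0, eye)
    mass = np.array([2.0] + [0.0] * (n - 1))
    return np.vstack([left, right, mass])

A = functionals(n)
b = np.array([0.1, 1.2, 1.8])                # target boundary values and mass

# Minimal-norm projection onto {c : A c = b}.
c_proj = c - A.T @ np.linalg.solve(A @ A.T, A @ c - b)
assert np.allclose(A @ c_proj, b)
```

Dropping the mass row from `functionals` gives the flux-only variant corresponding to $v_F$.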
Following the same set of experiments, using different orders and numbers of elements, we obtain the $h$- and $p$-convergence results shown in Figure 5.11. From the results in Figures 5.9 and 5.11, we observe that the convergence of the filtered solution $v$ and its variants remains largely unchanged and comparable to that of the unfiltered counterpart.

Figure 5.10: The initial value $f$ for the convergence study of the filtered 1D dG solution to (5.5). Note that $f(x,0)$ and $\tilde{v}(T)$ overlap because of the periodic nature of $f$; however, $\tilde{v}(T)$ does not comply with the positivity structure of $f(x,0)$, as expected. After application of the filter, we obtain $v(T)$, which modifies $\tilde{v}(T)$ to preserve the positive structure of $f$.

Figure 5.11: Convergence study for the filtered and unfiltered dG solutions to the 1D advection PDE with Figure 5.10 as the initial state, for a simulation run to $T = 1$. The vector $\tilde{v}$ represents the unfiltered solution coefficients, $v$ refers to the positive solution coefficients at $T = 1$, $v_F$ refers to the coefficients of the positive solution with flux (boundary values) conserved for each element, and $v_{I+F}$ refers to the positive solution coefficients with flux and mass conserved for each element. Left: $h$-convergence at constant polynomial degree 3, $\Delta t = 10^{-5}$. Right: $p$-convergence at a constant number of elements $H = 51$, $\Delta t = 10^{-4}$. The choice of an odd number of elements for the $h$-convergence study ensures that the middle discontinuity of the function's derivative is located on an element boundary.

Regarding the cost of the filtering procedure, there is a one-time orthonormalization cost for the basis functions during the initial setup, but since we can employ the filter over each element individually, this cost is negligible. The per-timestep computational cost of the filtering procedure depends on the time taken by the global minimum-finding step of the filter, which we investigate next.
For each simulation involving the filter, we record the number of elements per timestep on which the filter is employed. (Recall that we perform a cheaper check to certify positivity before calling the filter optimization.) For each of the three filtered solutions $v$, $v_F$, and $v_{I+F}$, Figure 5.12 plots the average number of filtered elements per timestep during the convergence experiments run in Figure 5.11. As seen from Figure 5.12, the filter is not indiscriminately applied across all elements in the domain; instead, it is called only for the few flagged elements per timestep where constraint violations are observed. This can substantially reduce the required computational overhead compared to applying the filter across every element.

The percentage of total simulation time spent inside the filtering procedure for the experiments of Figure 5.11 is shown in Figure 5.13. For some experiment configurations, the filtering cost can be more than half the simulation time (e.g., $H = 11$ for the black line in Figure 5.13, left). However, in many cases, the filtering procedure requires less than 20% of the total computational effort. Note that for the $p$-convergence experiment in Figure 5.13, the percentage of filtering time is higher for some filters with fewer constraints. For example, in Figure 5.13, right, less simulation time is required to impose positivity and flux preservation than to impose positivity alone.

Figure 5.12: Average number of elements flagged per timestep during the experiments shown in Figure 5.11. Left: $h$-convergence experiment with $\Delta t = 10^{-5}$ and polynomial degree 3. Right: $p$-convergence experiment with $\Delta t = 10^{-4}$ and $H = 51$.

Figure 5.13: Percentage of total simulation clock time spent inside the filtering procedure during the convergence experiments shown in Figure 5.11. Left: $h$-convergence experiment with $\Delta t = 10^{-5}$ and polynomial degree 3. Right: $p$-convergence experiment with $\Delta t = 10^{-4}$ and $H = 51$.
One possible explanation for this phenomenon is that filters with a larger number of equality constraints optimize in a lower-dimensional space, so the optimization problem can be less expensive to solve. However, with additional constraints (e.g., the black line in Figure 5.13, right), even though the dimensionality of the optimizer decreases, the number of iterations required by the filter increases. Therefore, a suitable balance must be struck from application to application.

For applications to 2D and 3D PDE solutions, consider a dG formulation of the advection problem (5.6) on the 2D domain $\Omega = [-1,1] \times [-1,1]$, as shown in Figure 5.14:

$$u_t + \mathbf{a} \cdot \nabla u = 0. \qquad (5.6)$$

Let the function $f$ be the initial condition defined on $[-1,1] \times [-1,1]$ as (5.7):

$$f(x,y,t=0) = \cos\left(\frac{\pi x}{2}\right) \cos\left(\frac{\pi y}{2}\right). \qquad (5.7)$$

Given the velocities in the $x$- and $y$-directions, $\mathbf{a} = [1, 1]$, and periodic boundary conditions on all boundaries, the first step is to project $f$ onto a set of $E$ elements $\{e_0, e_1, \dots, e_{E-1}\} \in \Omega$ using the typical non-orthogonal (hats and bubbles) basis $\phi$. Locally, for an element $e$, we have

$$f_e(x,y) = \sum_{i=0}^{P-1} \hat{u}_i\, \phi_i(x,y),$$

where $\Omega$ is the mesh shown in Figure 5.14, consisting of a mix of quadrilaterals and triangles. As emphasized by the choice of a composite mesh, the filter steps do not change for different element types. Using the timestep $\Delta t = 10^{-3}$, an RK4 integration scheme, and upwind flux calculations, we obtain the results shown in Figure 5.15. Row 2 of the figure shows the simulation state at a particular timestep and highlights the negative values in the domain. We observe in row 3 that, after the application of the filter, the non-negative structure is restored. The filter is applied at each timestep to preserve the structure (positivity) of the solution on a lattice of points of interest, the set of points defined in (4.12). If a structure violation is found at any point of the lattice, the parent element is flagged for filtering.
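The structural property being defended in this experiment follows directly from (5.7): each cosine factor is non-negative on $[-1,1]$, and periodic transport merely shifts the profile. A small sketch (our own helper names; exact-solution shift, not the dG solver):

```python
import numpy as np

# Initial condition (5.7) and its exact evolution under (5.6) with
# a = [1, 1] and periodic boundaries: u(x, y, t) = f(x - t, y - t),
# with coordinates wrapped back into [-1, 1).
def f0(x, y):
    return np.cos(0.5 * np.pi * x) * np.cos(0.5 * np.pi * y)

def wrap(s):
    return (s + 1.0) % 2.0 - 1.0

def u_exact(x, y, t):
    return f0(wrap(x - t), wrap(y - t))

xs = np.linspace(-1, 1, 81)
X, Y = np.meshgrid(xs, xs)

assert f0(X, Y).min() >= 0.0               # non-negative initial data
assert u_exact(X, Y, 0.4).min() >= 0.0     # transport preserves the bound
```

The dG discretization does not inherit this bound automatically, which is why negative values appear in row 2 of Figure 5.15 and why the lattice check of (4.12) is run every timestep.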
In Figure 5.14: A 2D composite mesh used for solving (5.6). It contains 16 triangles and 8 quadrilaterals. 59 Figure 5.15: Different views of state in 2D advection experiment. Row 1 left: A side view representation of (5.7) on the composite mesh Figure 5.14. Row 1 right: Top view of the projected solution at t = 0 using polynomial order = 4. Row 2: Unfiltered solution at t = 0.4. Row 3: Filtered solution at t = 0.4. 3D, selective application of the filter becomes increasingly important as the cost of overall filtering per timestep depends on the number of elements upon which the filter operates. Let us consider a 3D advection problem (5.8), with the initial condition defined by the smooth even function (5.9) on the domain [ 1, 1] ⇥ [ 1, 1] ⇥ [ 1, 1] and all the boundaries are periodic. ut + a · ru = 0, ⇣ f ( x, y, z, t = 0) = 0.2 [1 q ( x2 + y2 )]2 + z2 (5.8) ⌘ (5.9) For a hexahedron, the experiment setup looks like a torus inside a cube at the initial state. The advection experiment is repeated on element types hexahedra, tetrahedra, and prisms individually, with Dt = 10 3 for 4000 timesteps and velocity a = [1, 1, 1] on 60 homogeneous structured meshes. Figure 5.16 shows the results before and after the filter application at a particular timestep for all these meshes. The analysis of the filter’s effects on the accuracy and convergence of the advection process on the meshes with hexahedron and prism elements reveals that the difference between the L2 errors is too small to warrant a convergence comparison. However, for the mesh with tetrahedral elements, the L2 errors perceptibly vary at each timestep, as shown by a longer run of the same experiment (total 4000 timesteps) in Figure 5.17. 5.2.2 1D time-dependent PDE using cG formulation We now present the results on the cG implementation of a diffusion-reaction equation, ∂ ∂2 u( x, t) = D 2 u( x, t) + r (u( x, t)). 
Figure 5.16: A summary of the advection experiment using polynomial order 5 and $\Delta t = 10^{-3}$ on 3D cube domains, each meshed with a different canonical element. Left to right: the mesh profile, the unfiltered solution, and the filtered solution at timestep 1000, respectively. Top row: cube mesh with 18 hexahedra. Middle row: cube mesh with 54 prisms. Bottom row: cube mesh with 162 tetrahedra.

Figure 5.17: $L^2$ errors per timestep and $p$-convergence of the advection experiment on a homogeneous cubical mesh with 24 tetrahedra. Left: per-timestep $L^2$ error for the advection problem with polynomial order 4 on a cube mesh of 24 tetrahedral elements; the timestep is $10^{-3}$ and the total number of steps is 4000. At each timestep, the $L^2$ error is computed with 10 quadrature points in each direction. Right: $p$-convergence of the filtered and unfiltered $L^2$ errors at timestep 2000 for the advection problem on a cube mesh of 24 tetrahedral elements.

We consider the problem on a bounded domain $\Omega \subset \mathbb{R}$ with $\Omega = [-1,1]$. Consider the following advecting smoothed Heaviside function as a solution to this problem:

$$u(x,t) = e^{-Dt} \left( \tanh\big(\epsilon(x + 0.4 - at)\big) + 1 \right),$$

where $D$ is the coefficient of diffusivity, $\epsilon$ is a shape parameter, and $a$ is the speed at which the exact solution moves in space over time. The nonlinear reaction term in this experiment is set to $r(u(x,t)) = \mu u(x,t)\big(1 - u^2(x,t)\big)$, where $\mu$ is a constant. Our Galerkin discretization of this problem integrates $r$ exactly, which is possible since $r$ depends polynomially on $u$. A diffusion-reaction PDE with a stiff quadratic reaction term is interesting because it represents a source term with the potential to blow up, and it becomes active only at the discontinuity [27]. Whereas numerical schemes may converge, they may converge with approximants that violate positivity.
Using the discrete form of the continuous problem (3.8), we choose the IMEX CNAB-2 scheme, which treats the diffusion term with Crank-Nicolson and uses a second-order Adams-Bashforth scheme for the reaction term. The unconstrained cG discrete time update is then

$$\tilde{v}^{n+1} = B^{-1} \left( A \tilde{v}^n + r^n \right),$$

where $A$ and $B$ are given by

$$A = M - 0.5\, D\, \Delta t\, L, \qquad B = M + 0.5\, D\, \Delta t\, L,$$

with $M$ and $L$ the mass and Laplacian (stiffness) matrices, respectively, and $r^n$ the projection coefficients of the reaction function $r(\tilde{v}^n)$.

Let us set $D = 1$ and $a = 1$. Figure 5.18 shows the initial and final states of $u(x,t)$ for a simulation up to $T = 1$. As described earlier, the cG formulation uses a non-orthonormal ("hat" and "bubble" function) basis, meaning that a spatially global mass matrix inversion must be performed for the filtered versions of this experiment. Advances that make this procedure efficient are currently being investigated, but Figure 5.19 demonstrates that, as in the dG formulation, error and convergence rates for this problem are largely unaffected by inclusion of the filter. If the constraints are instead to be satisfied locally in the cG formulation, the order of accuracy we observe in this example may not hold. This problem is widely known in the context of flux limiters, and many fixes have been proposed, e.g., [33]. This example also demonstrates the flexibility of our approach: although cG and dG discretizations can be very different mathematically and computationally, Algorithm 1 is the generic template for applying our procedure, independent of most PDE discretization details. Note that we consider global constraint satisfaction in the current example. In the case in which an additional advection term is present in the PDE, conserving the structure of the solution locally is often desirable.
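The Crank-Nicolson half of the update can be sketched on a small surrogate problem. This is a hedged illustration with assumed stand-ins, not the dissertation's FEM discretization: we take $M = I$ and let `L_mat` be the standard 1D finite-difference Laplacian with homogeneous Dirichlet ends (so `L_mat` approximates $\partial^2/\partial x^2$ and is negative definite; with the FEM stiffness-matrix sign convention the same update reads $A = M - 0.5 D \Delta t L$, $B = M + 0.5 D \Delta t L$).

```python
import numpy as np

n, dt, D = 50, 1e-3, 1.0
h = 2.0 / (n + 1)
# Second-difference Laplacian on interior nodes of [-1, 1], Dirichlet ends.
L_mat = (np.diag(-2.0 * np.ones(n))
         + np.diag(np.ones(n - 1), 1)
         + np.diag(np.ones(n - 1), -1)) / h**2

A = np.eye(n) + 0.5 * D * dt * L_mat   # explicit half-step
B = np.eye(n) - 0.5 * D * dt * L_mat   # implicit half-step

x = np.linspace(-1, 1, n + 2)[1:-1]
u = np.tanh(25.0 * (x + 0.4)) + 1.0    # smoothed-Heaviside initial state
u_new = np.linalg.solve(B, A @ u)      # one Crank-Nicolson diffusion step

# The CN diffusion map is non-expansive here: every eigenvalue of B^{-1} A
# has magnitude at most one, so the L2 norm cannot grow.
assert np.linalg.norm(u_new) <= np.linalg.norm(u) + 1e-12
```

The reaction contribution $r^n$ enters this update as an explicit Adams-Bashforth term on the right-hand side; it is precisely that stiff term that can push nodal values negative, triggering the filter.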
Such conservation can be achieved by the current method after reformulating the optimization problem so that $E$ constraints are in the constraint set, where $E$ is the number of elements in the domain; each constraint represents the property to be conserved in one element.

Figure 5.18: Initial and final state of the function $u$ for $\epsilon = 25$, $c = 20$, $D = 1$, and $\mu = 1$.

Figure 5.19: Convergence study for the filtered and unfiltered solutions to the 1D cG diffusion-reaction PDE applied to the function $f$ shown in Figure 5.18, with constant timestep $\Delta t = 10^{-4}$ and final time $T = 1$. Left: $h$-convergence at constant polynomial degree 7. Right: $p$-convergence at a constant number of elements $H = 100$.

5.3 dG FEM implementation of the PAC model with filtering

A prominent scenario in which the problem of structure preservation is encountered is the mathematical-biology domain. We consider the detailed mathematical model of platelet aggregation and blood coagulation (PAC) by [26] for the purpose of this experiment. The model describes the evolution, interaction, and decay of the chemical species involved in the process of thrombosis. Although the details of the individual species in the PAC process are beyond the scope of this dissertation, [26] has a detailed explanation of the nature and the evolution of the chemical species. The model can be summarized as a comprehensive collection of ODEs and PDEs that track all the chemical species in various phases during the process. The results in [26] use a finite difference method with a specified limiter that truncates to zero any concentration values that are negative. In order to implement this model using the FEM without loss of structure, an alternative approach to truncation, such as the one presented in this dissertation, is needed.
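A toy illustration of why plain truncation is problematic: clipping negative values to zero spuriously creates mass, whereas a conservative correction redistributes the clipped amount. The code below is a hypothetical nodal-value sketch with equal weights, not the PAC model's actual limiter or the dissertation's filter.

```python
import numpy as np

u = np.array([0.8, 0.3, -0.05, 0.6, -0.02, 0.4])   # oscillatory concentrations
mass = u.sum()

clipped = np.maximum(u, 0.0)          # truncation-style limiter
excess = clipped.sum() - mass         # mass created by clipping
assert excess > 0

# Conservative alternative: remove the excess proportionally from the
# positive values, preserving both positivity and total mass.
pos = clipped > 0
conserved = clipped.copy()
conserved[pos] -= excess * clipped[pos] / clipped[pos].sum()

assert np.isclose(conserved.sum(), mass)
assert conserved.min() >= 0.0
```

The convex-optimization filter achieves the same two properties (positivity and conservation) rigorously, as the minimal-norm correction subject to the constraint set.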
In this section, we implement the model using the dG method. Instead of a truncation limiter, we use the structure-preserving filter. For simplicity, the focus is on the system of equations that track the chemical species that advect, diffuse, and react. One such species is fluid-phase thrombin (FP e2), an essential component of the thrombosis process. For this section, the focus is to track the evolution, advection, diffusion, and decay of thrombin in the fluid phase. For detailed results on the evolution of all the chemical species, including thrombin, under various circumstances in the original model, refer to [26]. We set up a version of the PAC model that solves the species evolution problem by a combination of advection, diffusion, and reaction (ADR) PDEs. Unlike the original model, the velocity u of the fluid medium is not supplied by a modified Navier–Stokes solver. Instead, the velocity is constant, u = [u_x, u_y] = [5, 0]. To solve this system of PDEs on a sample blood vessel represented by the rectangular domain Ω = [0, 300] × [0, 20], we employ the dG FEM. The domain is tessellated using 2048 quadrilaterals. The bottom wall has an injury site spanning from x = 20 to x = 60. We use polynomial order 4 and a timestep of Δt = 10⁻² to solve the problem. The total number of species tracked in our implementation of the model is 56. Since the focus is on the behavior of thrombin, we present the PDE describing the evolution of FP e2 in (5.11). For details on the other chemical species involved in (5.11), refer to [26].
$$
\begin{aligned}
\frac{\partial e_2}{\partial t} ={} & \underbrace{-u \cdot \nabla e_2 + \nabla \cdot \left( D \nabla e_2 \right)}_{\text{Transport by advection and diffusion}} \\
& \underbrace{{}- k_{\mathrm{on}}^{e_2}\, e_2 \left( N_2 P^{b,a} + N_2 P^{se,a} - z_2^{mtot} - e_2^{mtot} \right) + k_{\mathrm{off}}^{e_2}\, e_2^m}_{\text{Binding to platelet receptor}} \\
& + \underbrace{\left( k_{\mathrm{cat}}^{z_5:e_2} + k_{-}^{z_5:e_2} \right) \left[ Z_5{:}E_2 \right] - k_{+}^{z_5:e_2}\, z_5 e_2}_{\text{Activation of V}} \\
& + \underbrace{\left( k_{\mathrm{cat}}^{z_7:e_2} + k_{-}^{z_7:e_2} \right) \left[ Z_7{:}E_2 \right] - k_{+}^{z_7:e_2}\, z_7 e_2}_{\text{Activation of VII}} \\
& + \underbrace{\left( k_{\mathrm{cat}}^{z_8:e_2} + k_{-}^{z_8:e_2} \right) \left[ Z_8{:}E_2 \right] - k_{+}^{z_8:e_2}\, z_8 e_2}_{\text{Activation of VIII}}
\end{aligned} \tag{5.11}
$$

The boundary conditions vary depending on the chemical being tracked. For FP e2, we have a zero Dirichlet boundary on the left and no-flux boundaries on the right and top. The bottom boundary has two kinds of conditions: a Robin boundary condition on the injury site and zero Dirichlet everywhere else. Figure 5.20 shows filtered and unfiltered runs of our implementation of this model and the resulting concentration profile of FP e2.

Figure 5.20: Solution for the filtered and unfiltered PAC model at different timesteps. Top left: the unconstrained simulation solution becomes invalid, as shown in the inset at timestep 2; this causes a blow-up at timestep 87, leading to the failure of the simulation. Top right: the constrained simulation runs for a total of 674 timesteps, until the depletion of chemicals causes the model to terminate. Bottom left: profile of fluid-phase thrombin in the unconstrained simulation at timestep 2 on a domain slice at y = 5. Bottom right: profile of fluid-phase thrombin in the constrained simulation at timestep 674 on a domain slice at y = 5.

At timestep 2, we get invalid (negative) values around the injury site because of the sharp concentration changes (see the slice at y = 5 in the bottom-left panel of Figure 5.20). Therefore, the fluid-phase thrombin concentration does not adhere to the desired structure (positivity), and the simulation does not succeed. If this experiment is allowed to continue, the effect of the invalid values compounds over time, rendering the simulation results nonphysical and useless.
At timestep 87, the simulation fails because of a blow-up resulting from the accumulation and propagation of negative values. At this point, the code for this simulation reports invalid values such as NaN (Not a Number). When the filter is applied, the simulation runs for significantly more timesteps (674 in total) and terminates naturally. The right side of Figure 5.20 shows the state of FP e2 at the point of depletion of the chemicals involved.

CHAPTER 6
CONCLUSION AND FUTURE WORK

In this dissertation, we have proposed a formalism for constrained approximation that addresses the structure-preservation problem, and we have applied it to polynomial-based applications as a postprocessing filter. The central idea of the filter is a convex optimization approach that can be cast as the problem of projecting onto a convex set (the feasible set). In [43] we present the mathematical foundations and ideas that form the backbone of the structure-preserving algorithm. The construction of the filter allows multiple structures to be preserved simultaneously, which can be canonical (such as positivity) as well as more involved (such as flux and mass conservation). The efficacy, utility, and robustness of the filter are demonstrated using numerical examples of function approximation in [43], PDE applications for cG and dG formulations in [44, 45], and the real-world problem of solving the platelet aggregation and blood coagulation mathematical model in [45]. Problems in higher dimensions are considered, and the filter is streamlined to preserve structure efficiently for those problems. Convergence analysis for time-dependent PDEs empirically demonstrates the negligible effect of filter application on the convergence and accuracy of the solution. The cost analysis suggests that the filter increases the computational cost of PDE solvers, but it often requires less than 50% of the total simulation time.
A precise study of the time taken by the filter for different elements and meshes in PDE applications is the subject of future work. The work in this dissertation can be extended to multidimensional cG formulations for different models. One of the difficulties in doing so is the handling and inversion of large global mass matrices. Ideas to restructure the global mass matrix using the H^{1/2} projection described in [24] can be used to reduce the cost and overhead of handling large matrices. There is also scope for extending the alternative hybrid approaches explored in [43] to multidimensional function approximations and PDEs.

APPENDIX A
ALGORITHMS FOR UNIVARIATE POLYNOMIAL SUBSPACES

We present procedures for solving the greedy and averaging optimization problems in Sections 4.1.2 and 4.1.3 under the assumption that V is a complete, univariate polynomial space. More formally, we make three specializing assumptions. The first assumption is that H is an L²-type space. A typical setup in one dimension is that Ω is an interval in (and possibly equal to) ℝ, and a weighted L² space is defined by a probability density function ρ:
$$\langle u, v \rangle_{L^2_\rho} := \int_\Omega u(x)\, v(x)\, \rho(x)\, dx. \tag{A.1}$$
The second specializing assumption in this section is that V is a complete polynomial space. For a finite N ∈ ℕ, the space V contains polynomials up to degree N − 1. Then {v_j}_{j=1}^N can be chosen as the first N orthonormal polynomials under the weight ρ on Ω. It is classical knowledge that such a family of polynomials satisfies the three-term recurrence
$$x\, v_n(x) = b_{n+1} v_{n+1}(x) + a_{n+1} v_n(x) + b_n v_{n-1}(x), \qquad n \geq 1,$$
with the starting conditions v_0 ≡ 1 and v_{−1} ≡ 0, where a_n = a_n(ρ) and b_n = b_n(ρ) are the recurrence coefficients [39]. The third specializing assumption is that we are in the setup of Example 2.4.1, where the constraints enforce positivity, v(x) ≥ 0 for every x ∈ Ω.
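As a concrete instance of the recurrence above, consider the uniform probability density ρ ≡ 1/2 on Ω = [−1, 1], for which the orthonormal family is the normalized Legendre polynomials with a_n = 0 and b_n = n/√(4n² − 1). This specific choice of ρ is an assumption made purely for illustration; the sketch below evaluates the first N orthonormal polynomials via the recurrence:

```python
import numpy as np

def orthonormal_legendre(x, N):
    """Evaluate the first N orthonormal Legendre polynomials at points x via
    the three-term recurrence x v_n = b_{n+1} v_{n+1} + a_{n+1} v_n + b_n v_{n-1}.

    For the uniform density rho = 1/2 on [-1, 1]: a_n = 0 and
    b_n = n / sqrt(4 n^2 - 1).
    """
    x = np.asarray(x, dtype=float)
    V = np.zeros((N, x.size))
    V[0] = 1.0                        # v_0 == 1 (normalized w.r.t. rho)
    if N > 1:
        b = lambda n: n / np.sqrt(4.0 * n * n - 1.0)
        V[1] = x / b(1)               # from x v_0 = b_1 v_1 (a_1 = 0)
        for n in range(1, N - 1):
            V[n + 1] = (x * V[n] - b(n) * V[n - 1]) / b(n + 1)
    return V
```

Orthonormality under ρ = 1/2 can be checked by a Gauss–Legendre quadrature of the pairwise products.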
We will see that the positivity assumption can be relaxed substantially; indeed, we make this assumption here only to clarify some computations. An important technique that we will need to exploit for this special setup is the ability to compute roots of polynomials from their expansion coefficients: if v ∈ V has expansion coefficients {v̂_j}_{j=1}^N, then the N − 1 (complex-valued) roots of v coincide with the spectrum of the (N − 1) × (N − 1) confederate matrix T = T(v):
$$T(v) = J - \frac{b_{N-1}}{\hat{v}_N}\, e_{N-1}\, \tilde{v}^T, \qquad \tilde{v}^T = \left( \hat{v}_1, \ldots, \hat{v}_{N-1} \right), \tag{A.2}$$
where
$$J = \begin{pmatrix} a_1 & b_1 & & & \\ b_1 & a_2 & b_2 & & \\ & b_2 & a_3 & b_3 & \\ & & \ddots & \ddots & \ddots \\ & & & b_{N-2} & a_{N-1} \end{pmatrix}$$
and e_{N−1} is the cardinal unit vector in the (N − 1)st direction. The matrix J is the Jacobi matrix and is independent of v. We use direct eigenvalue solvers to compute the spectrum of T(v), which is the root set v^{−1}(0). Note that there are backward stable approaches to the task of computing roots from the spectrum of related matrices [34]. An analogous approach that operates on expansion coefficients in a monomial basis uses the spectrum of the companion matrix. Note that our strategy is rather rudimentary compared to more sophisticated methods for computing roots of polynomials [10]; e.g., one can compute polynomial roots on subintervals and perform refinement. However, this consideration is not the main innovation of our algorithm, so we use the procedure above mainly for simplicity. We do perform a numerical stability check in which we switch between companion and confederate matrices depending on which has the smaller condition number. In all the examples we attempted for this dissertation, this check was sufficient to robustly and accurately compute roots of polynomials.

A.1 Greedy projections

With the setup of Example 2.4.1, the problem (4.5) requires us to compute (4.4)
$$y^* = \operatorname*{argmin}_{y \in \Omega} \operatorname{sdist}\left( \hat{v}, H_1(y) \right) = \operatorname*{argmin}_{y \in \Omega} v(y) \lambda(y).$$
To minimize the last expression, we can compute the critical points, which are the roots of the derivative.
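For the Legendre family, NumPy's polynomial module implements this style of root computation: `legroots` builds a companion-like matrix for the Legendre recurrence and returns its eigenvalues. The polynomial below is an arbitrary example chosen for illustration, not one from the text:

```python
import numpy as np
from numpy.polynomial import legendre as leg

# Expansion coefficients of v in the Legendre basis (an example polynomial):
# v(x) = P0(x) - 0.5*P1(x) - 0.8*P2(x)
c = np.array([1.0, -0.5, -0.8])

# legroots assembles the (N-1)x(N-1) companion-like matrix for the Legendre
# recurrence and returns its eigenvalues, i.e., the root set v^{-1}(0).
roots = leg.legroots(c)

# Keep only the real roots lying in the domain [-1, 1].
real_roots = roots.real[(np.abs(roots.imag) < 1e-10)
                        & (roots.real >= -1) & (roots.real <= 1)]

# Sanity check: v vanishes at the computed roots.
assert np.allclose(leg.legval(real_roots, c), 0.0, atol=1e-10)
```

For general weights ρ, one would form T(v) from the recurrence coefficients directly and call a dense eigenvalue solver, as described above.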
Using (2.5), we have
$$\frac{d}{dy}\left[ v(y) \lambda(y) \right] = \lambda^3(y) \left[ v'(y) \sum_{j=1}^N v_j^2(y) - v(y) \sum_{j=1}^N v_j(y) v_j'(y) \right].$$
Note that λ³ cannot vanish, so the critical points coincide with the roots of the bracketed expression above, which is a degree-(3N − 4) polynomial. Thus,
$$\frac{\frac{d}{dy}\left[ v(y) \lambda(y) \right]}{\lambda^3(y)} = \sum_{j=1}^{3N-3} \hat{g}_j v_j(y) =: g(y)$$
for some coefficients ĝ_j. The computation v̂_j ↦ ĝ_j can be accomplished using only the recurrence coefficients in O(N²) time without resorting to, e.g., quadrature. In summary, the global minimum in (4.5) can be computed by first computing the expansion coefficients ĝ_j defined above, and then computing the spectrum of the (3N − 4) × (3N − 4) matrix T(g). To compute the global minimizer, we then need only evaluate the discrete minimum of v(y)λ(y) over the eigenvalues located in Ω.

A.2 Averaged projections

The main task for the averaged projections procedure is to compute the integral in (4.8). In our specialized setup, this task reduces to computing
$$\frac{1}{|\omega_1|} \int_{\omega_1} \boldsymbol{v}(y)\, v(y) \lambda(y)\, dy,$$
which is an N-component vector, where component j of this vector has the entry
$$\frac{1}{|\omega_1|} \int_{\omega_1} v_j(y)\, v(y) \lambda(y)\, dy. \tag{A.3}$$
The first step is to identify the set ω₁ defined in (4.7), which in this special case is equivalent to
$$\omega_1 = \left\{ y \in [-1, 1] \mid v(y) < 0 \right\}.$$
Therefore, this set can be identified by examining the roots of v, which are the eigenvalues of T(v). Thus, we partition [−1, 1] into subintervals on which v is single-signed, after which the sign of v on an interval can be determined by evaluating v inside that interval. After ω₁ is identified as a disjoint collection of subintervals of [−1, 1], we compute the components of the update (A.3) by employing an M-point Gaussian quadrature rule; since the integrand v_j v λ is a smooth function on [−1, 1], this can be completed efficiently. We employ M = N + 1 quadrature points for this computation.
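The subinterval-plus-quadrature procedure for the averaged projection can be sketched as follows, using NumPy's Legendre utilities. This is a simplified stand-in: the polynomial is assumed to live in the Legendre basis, `integrand` is a generic callable standing in for v_j(y) v(y) λ(y), and the 1/|ω₁| normalization is omitted:

```python
import numpy as np
from numpy.polynomial import legendre as leg

def negative_set_quadrature(c, integrand, m):
    """Approximate the integral of `integrand` over the set {y : v(y) < 0},
    where v has Legendre expansion coefficients `c`.

    Partitions [-1, 1] at the real roots of v (so v is single-signed on each
    subinterval), keeps the subintervals where v < 0, and applies an m-point
    Gauss-Legendre rule mapped to each kept subinterval.
    """
    r = leg.legroots(c)
    r = np.sort(r.real[(np.abs(r.imag) < 1e-12) & (np.abs(r.real) < 1)])
    endpoints = np.concatenate(([-1.0], r, [1.0]))
    nodes, weights = leg.leggauss(m)
    total = 0.0
    for a, b in zip(endpoints[:-1], endpoints[1:]):
        mid = 0.5 * (a + b)
        if leg.legval(mid, c) < 0:           # sign of v on this subinterval
            y = mid + 0.5 * (b - a) * nodes  # map [-1, 1] -> [a, b]
            total += 0.5 * (b - a) * np.dot(weights, integrand(y))
    return total
```

For example, for v(y) = y the negative set is [−1, 0), and integrating v itself over it gives −1/2.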
APPENDIX B
PARAMETER SELECTION FOR LINE SEARCH USED IN GD-BASED MINIMIZATION

To fix the values of c and γ in (4.11) for the numerical results in Chapter 5, we first choose a set of functions. The choice of functions depends on factors such as the constraint types and the domain geometry. For example, for an application where positivity is the primary structural concern, a list of functions with values close to zero is a good choice; for applications with discontinuities in the function or the domain, discontinuous functions would be a good choice. The feasible space for both c and γ is (0, 1). A noncomprehensive sampling of (0, 1) is performed for both parameters, and the GD is called with the parameters set to permutations of the sampled values. GD_linesearch denotes the call to the GD with line search. As its output, the GD_linesearch routine provides the number of iterations (denoted niter) taken by the algorithm. The error refers to the difference between the minimum value found and the known minimum value. It is desirable to choose parameter values for which GD converges in the fewest iterations, as a large number of GD iterations per filter iteration makes the filtering process expensive. Another important quantity to consider is the error: it is desirable for GD to return a minimum value as close to the known (or golden) minimum value as possible. With these quantities (niter and err) as the metrics, we prescribe a procedure to select the ideal values of c and γ in Algorithm 3. The selection criteria narrow the samples to a range of c and γ with the minimal number of iterations and then pick, from that subset, those with the smallest error. In the case of a tie, all the tied c and γ values are included in the return variables. The procedure can be summarized as taking a list of functions that fairly represents the complexity of the space and returning two ranges of ideal values, one for each of the parameters c and γ.
Since the quadrilateral and hexahedron represent richer spaces than the other canonical 2D and 3D element shapes, Algorithm 3 is applied and analyzed on a quadrilateral and a hexahedron.

Algorithm 3 Experiment to determine ideal values of c and γ given a set of functions f_i, for i = 0, 1, ..., n
1: G ← P samples in (0, 1)
2: for each f_i, i = 0, 1, ..., n do
3:   H ← {}
4:   for each γ_j in G do
5:     for each c_k in G do
6:       (niter, err) ← GD_linesearch(f_i, c_k, γ_j)
7:       H ← H ∪ {c_k, γ_j, niter, err}
8:     end for
9:   end for
10: end for
11: H1 ← ranges of c and γ with the least niter, selected from H
12: H2 ← ranges of c and γ with the least err, selected from H1
13: return ranges for c and γ from H2

Future developments in the approaches for finding optima in multiple dimensions may improve the efficacy and efficiency of the filter. To this end, and to provide robustness, we modularize the minimization part of the filter following object-oriented principles. We consider example functions (B.1) on the 2D domain [−1, 1] × [−1, 1] and (B.2) on the 3D domain [−1, 1] × [−1, 1] × [−1, 1], which have varying complexities and projection orders, to set the GD parameters in a comprehensive and fair manner.
$$
\begin{aligned}
f_0(x, y) &= (x + 0.6)^2 + (y - 0.2)^2, \\
f_1(x, y) &= \sin\left( (x - 0.1) + 0.5\pi \right) \cos(y - 0.2), \\
f_2(x, y) &= \begin{cases} 1, & \text{if } x \geq 0 \text{ and } y \geq 0, \\ 0, & \text{otherwise}. \end{cases}
\end{aligned} \tag{B.1}
$$
$$
\begin{aligned}
f_3(x, y, z) &= (x + 0.6)^2 + (y - 0.2)^2 + (z + 0.1)^2, \\
f_4(x, y, z) &= \sin\left( (x - 0.1) + 0.5\pi \right) \cos(y - 0.2) \cos(z - 0.2), \\
f_5(x, y, z) &= \begin{cases} 1, & \text{if } x \geq 0,\ y \geq 0, \text{ and } z \geq 0, \\ 0, & \text{otherwise}. \end{cases}
\end{aligned} \tag{B.2}
$$
Using the orthogonal basis for (4.6), ṽ is obtained, which is then used by Algorithm 3 to find the values of the parameters c and γ. For functions that have an exact analytical minimum, it is straightforward to calculate the error between the exact minimum and the optimum returned by the GD.
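The sweep in Algorithm 3 can be sketched as a simple grid search. The version below is a simplification under stated assumptions: `gd_linesearch` is an assumed callable `(f, c, gamma) -> (niter, err)` standing in for the GD routine, and niter and err are aggregated across the test functions rather than recorded per function:

```python
import itertools
import numpy as np

def select_parameters(funcs, gd_linesearch, num_samples=9):
    """Grid search in the spirit of Algorithm 3 (a sketch).

    Samples c and gamma on a grid inside (0, 1), runs gd_linesearch for every
    (function, c, gamma) combination, and returns the (c, gamma) pairs with
    the fewest total iterations, ties broken by smallest total error.
    """
    grid = np.linspace(0.1, 0.9, num_samples)  # samples in (0, 1)
    records = []
    for c, gamma in itertools.product(grid, grid):
        niter = err = 0.0
        for f in funcs:
            n, e = gd_linesearch(f, c, gamma)
            niter += n
            err += e
        records.append((niter, err, c, gamma))
    best_niter = min(r[0] for r in records)
    tied = [r for r in records if r[0] == best_niter]
    best_err = min(r[1] for r in tied)
    return [(c, g) for (n, e, c, g) in tied if e == best_err]
```

Returning all tied pairs mirrors the tie-handling rule stated for Algorithm 3.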
For discontinuous functions such as f_2 and f_5, an approximate golden minimum is calculated by projecting the function using polynomial order N = 8 on a dense grid of points in Ω and taking the least value of the projected polynomial. In all our test cases, GD converges to the exact (or golden) minimum within acceptable tolerance (numerically zero). Therefore, the selection of the parameters is made on the basis of the fewest iterations (niter) taken by GD to converge. Consider k discrete equispaced samples in (0, 1). For k = 9, applying Algorithm 3 gives the results shown in Table B.1. Based on the results of these tests, we infer that for the numerical experiments the appropriate choices for c and γ in 2D are 0.7 and 0.7, respectively. Following a similar procedure and applying Algorithm 3 on a hexahedron to the functions defined in (B.2), the choices of c and γ for 3D are 0.2 and 0.7, respectively. These values are used for all the numerical experiments in this section. The C++ implementation of the filter in Nektar++ [15] supports changing c and γ as parameters in the configuration file.

Table B.1: Results of the 2D experiment on a quadrilateral domain to find the optimal ranges of the gradient descent line-search parameters c and γ. N denotes the polynomial order for the projected functions (B.1); niter denotes the number of iterations taken by GD. The number of quadrature points for projection is constant (Q = 11).

|    | f_0(x, y) |          |       | f_1(x, y) |          |       | f_2(x, y) |          |       |
| N  | Range c   | Range γ  | niter | Range c   | Range γ  | niter | Range c   | Range γ  | niter |
| 3  | [0.4,1)   | (0,1)    | 6     | [0.7,1)   | (0,1)    | 3     | (0,1)     | (0,1)    | 3     |
| 4  | [0.5,1)   | (0,1)    | 7     | (0,1)     | [0.3,1)  | 4     | [0.2,1)   | (0,1)    | 7     |
| 5  | [0.5,1)   | [0.3,1)  | 7     | [0.6,1)   | [0.2,1)  | 4     | (0,1)     | (0,1)    | 7     |
| 6  | (0,1)     | [0.5,1)  | 7     | [0.3,1)   | [0.5,0.8]| 4     | [0.4,0.9] | [0.7,1)  | 6     |
| 7  | [0.2,1)   | (0,1)    | 7     | [0.3,1)   | (0,1)    | 5     | [0.5,1)   | (0,1)    | 10    |
| 8  | [0.7,1)   | [0.5,1)  | 6     | (0,1)     | [0.7,1)  | 4     | [0.3,1)   | (0,1)    | 10    |
| 9  | [0.7,1)   | [0.7,1)  | 6     | [0.5,1)   | (0,0.2]  | 6     | [0.2,0.6] | (0,1)    | 7     |
| 10 | [0.5,0.7] | [0.5,1)  | 7     | [0.5,1)   | [0.3,1)  | 6     | [0.5,1)   | [0.8,1)  | 9     |

REFERENCES

[1] Anderson, R., Dobrev, V., Kolev, T., Kuzmin, D., de Luna, M. Q., Rieben, R., and Tomov, V. High-order local maximum principle preserving (MPP) discontinuous Galerkin finite element method for the transport equation. J. Comput. Phys. 334 (2017), 102–124.
[2] Anderson, R. W., Dobrev, V. A., Kolev, T. V., and Rieben, R. N. Monotonicity in high-order curvilinear finite element arbitrary Lagrangian–Eulerian remap. Internat. J. Numer. Methods Fluids 77, 5 (2015), 249–273.
[3] Armijo, L. Minimization of functions having Lipschitz continuous first partial derivatives. Pacific J. Math. 16, 1 (1966), 1–3.
[4] Bauschke, H., and Borwein, J. On projection algorithms for solving convex feasibility problems. SIAM Rev. 38, 3 (Sept. 1996), 367–426.
[5] Beatson, R. Restricted range approximation by splines and variational inequalities. SIAM J. Numer. Anal. 19, 2 (Apr. 1982), 372–380.
[6] Beatson, R. K. The degree of monotone approximation. Pacific J. Math. 74, 1 (1978), 5–14.
[7] Berrut, J. P., and Trefethen, L. N. Barycentric Lagrange interpolation. SIAM Rev. 46 (2004), 501–517.
[8] Bertsekas, D. P. Nonlinear programming. J. Oper. Res. Soc. 48, 3 (1997), 334–334.
[9] Berzins, M. Adaptive polynomial interpolation on evenly spaced meshes. SIAM Rev. 49, 4 (Jan. 2007), 604–627.
[10] Boyd, J. P. Computing zeros on a real interval through Chebyshev expansion and polynomial rootfinding.
SIAM J. Numer. Anal. 40, 5 (2003), 1666–1682.
[11] Boyd, S., Boyd, S. P., and Vandenberghe, L. Convex Optimization. Cambridge University Press, Cambridge, UK, 2004.
[12] Boyd, S., and Vandenberghe, L. Introduction to Applied Linear Algebra: Vectors, Matrices, and Least Squares. Cambridge University Press, Cambridge, UK, 2018.
[13] Bregman, L. M. The method of successive projection for finding a common point of convex sets. Sov. Math. Dokl. 162, 3 (1965), 688–692.
[14] Campos-Pinto, M., Charles, F., and Després, B. Algorithms for positive polynomial approximation. SIAM J. Numer. Anal. 57, 1 (Jan. 2019), 148–172.
[15] Cantwell, C. D., Moxey, D., Comerford, A., Bolis, A., et al. Nektar++: An open-source spectral/hp element framework. Comput. Phys. Commun. 192 (2015), 205–219.
[16] Cheney, W., and Goldstein, A. A. Proximity maps for convex sets. Proc. Amer. Math. Soc. 10, 3 (1959), 448–450.
[17] Crockett, J. B., and Chernoff, H. Gradient methods of maximization. Pacific J. Math. 5, 1 (1955), 33–50.
[18] Curry, H. B. The method of steepest descent for non-linear minimization problems. Quart. Appl. Math. 2, 3 (1944), 258–261.
[19] Deutsch, F., and Hundal, H. The rate of convergence for the cyclic projections algorithm I: Angles between convex sets. J. Approx. Theory 142, 1 (Sept. 2006), 36–55.
[20] Deutsch, F. R. Best Approximation in Inner Product Spaces. Springer, New York, 2012.
[21] DeVore, R. A. Degree of monotone approximation. In Linear Operators and Approximation II/Lineare Operatoren und Approximation II. Springer, New York, 1974, pp. 337–351.
[22] Gubin, L. G., Polyak, B. T., and Raik, E. V. The method of projections for finding the common point of convex sets. Comput. Math. Math. Phys. 7, 6 (Jan. 1967), 1–24.
[23] Guo, H., and Yang, Y. Bound-preserving discontinuous Galerkin method for compressible miscible displacement in porous media. SIAM J. Sci. Comput. 39, 5 (2017), A1969–A1990.
[24] Karniadakis, G., and Sherwin, S.
Spectral/hp Element Methods for Computational Fluid Dynamics. Oxford University Press, Oxford, UK, 2013.
[25] Laughton, E., Zala, V., Narayan, A., Kirby, R. M., and Moxey, D. Fast barycentric-based evaluation over spectral/hp elements. Accepted by SIAM J. Sci. Comput. (2021).
[26] Leiderman, K., and Fogelson, A. L. Grow with the flow: A spatial–temporal model of platelet deposition and blood coagulation under flow. Math. Med. Biol. 28, 1 (2011), 47–84.
[27] LeVeque, R. J., and Yee, H. C. A study of numerical methods for hyperbolic conservation laws with stiff source terms. J. Comput. Phys. 86, 1 (1990), 187–210.
[28] Lewis, A. S., Luke, D. R., and Malick, J. Local linear convergence for alternating and averaged nonconvex projections. Found. Comput. Math. 9, 4 (Aug. 2009), 485–513.
[29] Lewis, J. Approximation with convex constraints. SIAM Rev. 15, 1 (Jan. 1973), 193–217.
[30] Light, D., and Durran, D. Preserving nonnegativity in discontinuous Galerkin approximations to scalar transport via truncation and mass aware rescaling (TMAR). Mon. Weather Rev. 144, 12 (2016), 4771–4786.
[31] Liu, C., Wang, C., and Wang, Y. A structure-preserving, operator splitting scheme for reaction-diffusion equations with detailed balance. J. Comput. Phys. 436 (2021), 110253.
[32] Liu, X. D., and Osher, S. Nonoscillatory high order accurate self-similar maximum principle satisfying shock capturing schemes I. SIAM J. Numer. Anal. 33, 2 (1996), 760–779.
[33] Lohmann, C., Kuzmin, D., Shadid, J. N., and Mabuza, S. Flux-corrected transport algorithms for continuous Galerkin methods based on high order Bernstein finite elements. J. Comput. Phys. 344 (2017), 151–186.
[34] Nakatsukasa, Y., and Noferini, V. On the stability of computing polynomial roots via confederate linearizations. Math. Comp. 85, 301 (2016), 2391–2425.
[35] Nochetto, R., and Wahlbin, L. Positivity preserving finite element approximation. Math. Comp. 71, 240 (2002), 1405–1419.
[36] Rice, J.
Approximation with convex constraints. J. Soc. Ind. Appl. Math. 11, 1 (Mar. 1963), 15–32.
[37] Sanders, R. A third-order accurate variation nonexpansive difference scheme for single nonlinear conservation laws. Math. Comp. 51, 184 (1988), 535–558.
[38] Shu, C. W., and Osher, S. Efficient implementation of essentially non-oscillatory shock-capturing schemes. J. Comput. Phys. 77, 2 (1988), 439–471.
[39] Szegő, G. Orthogonal Polynomials, vol. 23. American Mathematical Society, Providence, RI, 1939.
[40] van der Vegt, J. J., Xia, Y., and Xu, Y. Positivity preserving limiters for time-implicit higher order accurate discontinuous Galerkin discretizations. SIAM J. Sci. Comput. 41, 3 (2019), A2037–A2063.
[41] Von Neumann, J. Functional Operators (AM-22), Volume II: The Geometry of Orthogonal Spaces. Princeton University Press, Princeton, NJ, 1951.
[42] Zahr, M. J., and Persson, P. O. An optimization based discontinuous Galerkin approach for high-order accurate shock tracking. In 2018 AIAA Aerospace Sciences Meeting, Kissimmee, FL, American Institute of Aeronautics and Astronautics (2018), p. 0063.
[43] Zala, V., Kirby, M., and Narayan, A. Structure-preserving function approximation via convex optimization. SIAM J. Sci. Comput. 42, 5 (2020), A3006–A3029.
[44] Zala, V., Kirby, M., and Narayan, A. Structure-preserving nonlinear filtering for continuous and discontinuous Galerkin spectral/hp element methods. Accepted by SIAM J. Sci. Comput. (2020).
[45] Zala, V., Kirby, R. M., and Narayan, A. Structure-preserving convex optimization for multidimensional finite element simulations. Under preparation for J. Comput. Phys. (2021).
[46] Zhang, X. On positivity-preserving high order discontinuous Galerkin schemes for compressible Navier-Stokes equations. J. Comput. Phys. 328 (2017), 301–343.
[47] Zhang, X., and Shu, C. W. On maximum-principle-satisfying high order schemes for scalar conservation laws. J. Comput. Phys. 229, 9 (2010), 3091–3120.
| Reference URL | https://collections.lib.utah.edu/ark:/87278/s64zrm6q |



