{"responseHeader":{"status":0,"QTime":10,"params":{"q":"{!q.op=AND}id:\"713032\"","hl":"true","hl.simple.post":"","hl.fragsize":"5000","fq":"!embargo_tdt:[NOW TO *]","hl.fl":"ocr_t","hl.method":"unified","wt":"json","hl.simple.pre":""}},"response":{"numFound":1,"start":0,"docs":[{"ark_t":"ark:/87278/s6jq494r","setname_s":"ir_uspace","restricted_i":0,"department_t":"Chemical Engineering","format_medium_t":"application/pdf","creator_t":"Sutherland, James Clayton","identifier_t":"uspace,19402","date_t":"2014-01-01","bibliographic_citation_t":"Isaac, B. J., Thornock, J. N., Sutherland, J., Smith, P. J., & Parente, A. (2014). Advanced regression methods for combustion modelling using principal components. Combustion and Flame, 1-30.","mass_i":1515011812,"publisher_t":"Elsevier","description_t":"Modelling the physics of combustion remains a challenge due to a large range of temporal and physical scales which are important in these systems. Detailed chemical kinetic mechanisms are used to describe the chemistry involved in the combustion process yielding highly coupled partial differential equations for each of the chemical species used in the mechanism. Recently, Principal Components Analysis (PCA) has shown promise in its ability to identify a low-dimensional manifold describing the reacting system. Several PCA-based models have been developed which may be well-suited for combustion problems; however, several challenging aspects of the model must be addressed. In this paper, the parameterization of state-space variables and PC-transport equation source terms are investigated. The ability to achieve highly accurate mapping through various nonlinear regression methods is shown. In addition, the effect of PCA-scaling on the ability to regress the surface is investigated. 
Finally, the present work demonstrates the capabilities of the model by solving a reduced system represented by several PC-transport equations for a perfectly stirred reactor (PSR) configuration.","first_page_t":"1","rights_management_t":"(c) Elsevier ; Authors manuscript from Isaac, B. J., Thornock, J. N., Sutherland, J., Smith, P. J., & Parente, A. (2014). Advanced regression methods for combustion modelling using principal components. Combustion and Flame. http://dx.doi.org/10.1016/j.combustflame.2015.03.008.","title_t":"Advanced regression methods for combustion modelling using principal components","id":713032,"publication_type_t":"pre-print","parent_i":0,"type_t":"Text","thumb_s":"/8e/20/8e20038b873a81d6ca31f452622f87d75bcc8090.jpg","last_page_t":"30","oldid_t":"uspace 11049","metadata_cataloger_t":"CLR","format_t":"application/pdf","modified_tdt":"2015-05-04T00:00:00Z","school_or_college_t":"College of Engineering","language_t":"eng","file_s":"/6a/ff/6aff7aa929598b3040689a927f5d36bab361c357.pdf","format_extent_t":"2,168,601 bytes","other_author_t":"Isaac, Benjamin J.; Thornock, Jeremy N.; Smith, Philip J.; Parente, Alessandro","created_tdt":"2015-05-04T00:00:00Z","_version_":1664094557860528128,"ocr_t":"Advanced regression methods for combustion modelling using principal components Benjamin J. Isaac (a, b, *), Jeremy N. Thornock (a), James Sutherland (a), Philip J. Smith (a), Alessandro Parente (b). (a) Department of Chemical Engineering, University of Utah, Salt Lake City, UT, 84112, USA. (b) Service d'Aéro-Thermo-Mécanique, Université Libre de Bruxelles, Bruxelles, Belgium. Abstract Modelling the physics of combustion remains a challenge due to a large range of temporal and physical scales which are important in these systems. Detailed chemical kinetic mechanisms are used to describe the chemistry involved in the combustion process, yielding highly coupled partial differential equations for each of the chemical species used in the mechanism. 
Recently, Principal Components Analysis (PCA) has shown promise in its ability to identify a low-dimensional manifold describing the reacting system. Several PCA-based models have been developed which may be well-suited for combustion problems; however, several challenging aspects of the model must be addressed. In this paper, the parameterization of state-space variables and of PC-transport equation source terms is investigated. The ability to achieve highly accurate mappings through various nonlinear regression methods is shown. In addition, the effect of PCA scaling on the ability to regress the surface is investigated. Finally, the present work demonstrates the capabilities of the model by solving a reduced system represented by several PC-transport equations for a perfectly stirred reactor (PSR) configuration. Keywords: Combustion; Nonlinear Regression; Low-dimensional manifolds; Principal Component Analysis; Reacting Flows; Reduced-order modelling. *Corresponding author. Address: 155 South 1452 East, Room 350, Salt Lake City, UT 84112, USA. Phone: 1 801 585 1456. Email addresses: Benjamin.J.Isaac@utah.edu (Benjamin J. Isaac), J.Thornock@utah.edu (Jeremy N. Thornock), James.Sutherland@utah.edu (James Sutherland), Philip.Smith@utah.edu (Philip J. Smith), Alessandro.Parente@ulb.ac.be (Alessandro Parente) Preprint submitted to Elsevier March 9, 2015 UU IR Author Manuscript UU IR Author Manuscript University of Utah Institutional Repository Author Manuscript 1. Introduction The ability to accurately model a turbulent combustion system remains challenging due to the complex nature of combustion systems. A simple fuel such as CH4 requires 53 species and 325 chemical reactions [1] to be accurately described. More complex fuels require increasingly complex chemical mechanisms. Each resolved chemical species requires a conservation equation, which is a coupled, nonlinear partial differential equation. 
Such systems can currently be solved only in very limited situations because of their computational cost. This cost creates a need for reduced models that can adequately describe the chemical reactions. Many methods attempt to reduce the complexity of the mechanism by splitting the system into slow and fast variables, using equilibrium assumptions for fast chemical processes, and devoting the computational resources to the more pertinent evolution of species within the reacting system [2, 3]. Indeed, in these complex combustion reaction mechanisms many of the species evolve on time-scales much shorter than the time-scales of interest, which allows fast and slow processes to be decoupled while maintaining accuracy. Low-dimensional manifolds exist in these systems that can describe the governing characteristics of the flames. Several models take advantage of this, including the steady laminar flamelet model (SLFM) [4, 5, 6], flamelet-generated manifolds (FGM) [7, 8], and flame prolongation of ILDM (FPI) [9, 10, 11], to name a few. As a fundamental example, the steady laminar flamelet model uses the mixture fraction and the mixture fraction variance to describe the flame as an ensemble of steady laminar diffusion flames subjected to various strain rates. In some cases, this provides a good representation of the entire system with a reduced number of variables. Recently, principal component analysis (PCA) has been investigated for use in combustion modelling. The advantages of PCA include: its ability to identify orthogonal variables that form the best linear representation of the system; its ability to reduce the dimensionality, so that fewer coordinates are required; and the possibility of performing the analysis on canonical systems, such as counterflow diffusion flames, or on empirical data-sets containing highly complex turbulence-chemistry interaction. Parente et al. 
[12, 13] used PCA to identify the low-dimensional manifold in one-dimensional turbulence and experimental data. Biglari and Sutherland [14] and Yang and Pope [15, 16] enhanced the capability of the PCA concept by combining the analysis with nonlinear regression, allowing a nonlinear mapping between the state-space variables and the linear PCA basis. The work of Biglari and Sutherland showed that, for the ODT data-set investigated in that study, the PC parameterization is superior to the standard flamelet parameterization. Mirgolbabaei and Echekki [17] extended the nonlinear mapping concept using artificial neural networks and investigated the potential of kernel PCA [18, 19], showing the high compression potential obtained by transforming the initial problem into a nonlinear feature space where linear PCA is carried out. In addition, several combustion models have been proposed based on concepts from PCA. Sutherland and Parente [20] derived transport equations for the principal components (PCs) and discussed the feasibility of a model where the PCs are used directly to construct the state-space variables. Biglari and Sutherland [14] extended the concept of transporting the PCs by suggesting nonlinear regression in order to increase the accuracy and reducibility of the model. Coussement et al. [21], Isaac et al. [22] and other groups [23] proposed transporting a reduced set of state-space variables and using the PC basis to reconstruct the variables that are not transported. Najafi-Yazdi et al. [24] used PCA to identify optimal progress variables for use in the flamelet-generated manifold framework. 
The present work seeks to advance the understanding and application of the PC-transport approach of Sutherland and Parente [20, 14] by first analyzing the effect of several scaling methods on the PC basis and on the resulting ability to regress the nonlinear state-space variables onto the PC basis. Various regression methods used in previous studies [14, 17], as well as several alternative methods, are analyzed for their ability to approximate the reacting state-space from the PCs. To demonstrate the accuracy of the method within a numerical solver, an unsteady perfectly stirred reactor (PSR) calculation is performed using the PC-transport approach. The PSR validates the approach by comparing the reduced model with the detailed simulation results. To the authors' knowledge, all published analyses of the PC-transport concept using nonlinear regression have been carried out a priori on various data-sets [14, 17, 19, 18]. Only recently has a posteriori work begun in this area; specifically, Mirgolbabaei [25] provides an a posteriori demonstration of the nonlinear PC-transport approach using one-dimensional turbulence (ODT) simulations. 2. Theory A principal component analysis starts from a data-set consisting of n observations of Q variables, organized as an n × Q matrix X. Each column of X is centered by subtracting its mean, $\bar{X}$, and scaled by the diagonal matrix D, which contains a scaling factor for each of the Q variables: $X_s = (X - \bar{X}) D^{-1}$ (1). For the sake of simplicity, $X_s$ will be denoted simply as X in the following. 
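The scaling matrix D in Eq. (1) is where the scaling options compared later in Table 1 (std, range, pareto, vast, level) enter. This section does not define them. The sketch below therefore assumes the definitions commonly used in the PCA-for-combustion literature: the standard deviation, the range, the square root of the standard deviation, the variance divided by the mean, and the mean, respectively. The function name scaling_factors is hypothetical.

```python
import numpy as np


def scaling_factors(X, method):
    # Return one scaling factor d_k per column (variable) of X.
    # The definitions below are assumed, not quoted from the paper.
    mu = X.mean(axis=0)
    sigma = X.std(axis=0, ddof=1)
    if method == 'std':     # auto-scaling: every variable gets unit variance
        return sigma
    if method == 'range':   # difference between maximum and minimum values
        return X.max(axis=0) - X.min(axis=0)
    if method == 'pareto':  # square root of the standard deviation
        return np.sqrt(sigma)
    if method == 'vast':    # sigma times the coefficient of variation sigma / mu
        return sigma ** 2 / mu
    if method == 'level':   # mean value; assumes strictly positive means
        return mu
    raise ValueError('unknown scaling method: ' + method)


X = np.array([[1.0, 2.0], [3.0, 6.0]])
for name in ('std', 'range', 'pareto', 'vast', 'level'):
    d = scaling_factors(X, name)
    Xs = (X - X.mean(axis=0)) / d  # Eq. (1) with D = diag(d)
    print(name, d)
```

Only the vector d changes between options. The centering step and the PCA that follows are identical for all of them.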
In a PC analysis, the principal components (Z) are identified by performing an eigenvalue decomposition of the covariance matrix of X: $\frac{1}{n-1} X^T X = A L A^T$ (2), where L is the diagonal matrix of eigenvalues and the columns of A are the corresponding orthonormal eigenvectors. The eigenvector matrix A (referred to here as the 'basis matrix') is then used to project the original state-space into PC space: $Z = X A$ (3). Given a subset of the basis matrix, $A_q$, containing its first q columns, and applying the previous equation, the original centered and scaled state-space can be approximated as $X \approx Z_q A_q^T$ (4). In the PC analysis, the largest eigenvalues correspond to the first columns of A, so the largest share of the variance in the original variables is described by the first PCs. Accordingly, when the basis matrix is truncated ($A_q$), the approximation from Equation 4 may be very accurate while representing the system with fewer variables. In the work of Sutherland and Parente [20], a combustion model is proposed in which conservation equations for the PCs are derived from the general species transport equation [26]: $\frac{\partial}{\partial t}(\rho Y_k) + \frac{\partial}{\partial x_i}(\rho u_i Y_k) = \frac{\partial}{\partial x_i}\left(\rho D_k \frac{\partial Y_k}{\partial x_i}\right) + R_k$ (5), where $R_k$ is the net production rate of species k. Given the basis matrix $A_q$, the scaling vector $d_k$ (the diagonal components of D), and the centering vector $\bar{Y}_k$, one can easily derive the transport equations for the PCs ($Z_q$): $\frac{\partial}{\partial t}(\rho Z_q) + \frac{\partial}{\partial x_i}(\rho u_i Z_q) = \frac{\partial}{\partial x_i}\left(\rho D_{Z_q} \frac{\partial Z_q}{\partial x_i}\right) + s_{Z_q}$ (6), with $s_{Z_q} = \sum_{k=1}^{Q} \frac{R_k}{d_k} A_{kq}$ (7). Here $s_{Z_q}$ is simply the net production rate of the principal component, and $\rho D_{Z_q} \partial Z_q / \partial x_i$ is its diffusive flux. For a more detailed discussion of the treatment of the PC diffusive flux when molecular diffusion is important, refer to [27]. According to the proposed formulation, one can in principle exploit PCA with its inherent advantages. 
These advantages include: the ability to represent the system with a reduced number of variables; the option to accept a predetermined amount of reconstruction error (dependent on q, the number of retained PCs); and possibly a reduction in stiffness, if the selected PCs are highly weighted towards species that change more slowly, such as the major species. To use PCA to its fullest potential, several of its aspects must be studied. One of these aspects is how the data are scaled (Equation 1). The effects of scaling have been studied previously in [14, 28, 22]. The same approach is followed here to find the best scaling option for the present application of PCA, using a data-set that exhibits the physics of interest. A one-dimensional turbulence (ODT) data-set of a non-premixed syngas/air jet is considered [29, 30]. The simulation includes 11 chemical species [31] (H2, O2, O, OH, H2O, H, HO2, CO, CO2, HCO, N2) and 21 chemical reactions. It is initialized at a temperature of 500 K, with air as the oxidizer (0.7241 N2 and 0.2759 O2 by mass) and a fuel stream containing 0.0078 H2, 0.5511 CO, and 0.4411 N2 by mass. The ODT realizations are saved on a uniform grid of 672 points spanning a 0.01 m domain. The velocity field is initialized with a Reynolds number of 2500. The ODT data-set is particularly interesting because of the turbulence/chemistry interaction observed in the data, including physical effects such as extinction and re-ignition. As in previous investigations [14, 28, 22], the a priori analysis showed that pareto scaling has a distinct advantage for the reconstruction of the major species and the source terms. 
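The chain of Eqs. (1) to (4), plus the source-term projection of Eq. (7), can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions. Observations are stored row-wise in an array X of shape (n, Q), the scaling factors in a length-Q array d, and the species source terms in an array R of the same shape as X. The function names are hypothetical and are not taken from the authors' code. The synthetic data below lie exactly on a two-dimensional linear subspace, so q = 2 reconstructs them to round-off. Real reacting data are nonlinear, which is why linear reconstruction needs many more PCs.

```python
import numpy as np


def pca_basis(X, d):
    # Eq. (1): center each column on its mean and divide by its scaling factor.
    x_mean = X.mean(axis=0)
    Xs = (X - x_mean) / d
    # Eq. (2): covariance of the scaled data, normalized by n - 1.
    C = Xs.T @ Xs / (Xs.shape[0] - 1)
    # eigh handles symmetric matrices and returns eigenvalues in ascending
    # order; reorder so the first column of A carries the largest variance.
    eigvals, A = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]
    return x_mean, Xs, eigvals[order], A[:, order]


def truncate_and_reconstruct(Xs, A, q):
    # Eq. (3) restricted to the first q columns, then Eq. (4).
    Aq = A[:, :q]
    Zq = Xs @ Aq
    return Zq, Zq @ Aq.T


def pc_source_terms(R, d, Aq):
    # Eq. (7): s_Zq = sum_k (R_k / d_k) A_kq, evaluated for every observation.
    return (R / d) @ Aq


# Usage on synthetic data lying on a 2-D linear subspace of a 5-D space.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 5)) + 3.0
x_mean, Xs, eigvals, A = pca_basis(X, X.std(axis=0))
Zq, Xs_approx = truncate_and_reconstruct(Xs, A, q=2)
print(np.max(np.abs(Xs_approx - Xs)))  # round-off level: the data are rank 2
```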
The a priori analyses showed, however, that at least 8 PCs were required to accurately reconstruct the ODT data-set and the corresponding source terms, owing to the linear nature of the PC-based model. Considering the original 11 degrees of freedom of the system (with differential diffusion, enthalpy and elemental mass fractions are not constant), q = 8 implies only a minor reduction of the problem. An alternative to the direct reconstruction of X is to use nonlinear regression functions, which map the nonlinear reaction rates or nonlinear species concentrations onto the lower-dimensional representation given by the PCs. Biglari and Sutherland [14] suggest applying a nonlinear mapping to the underlying linear basis by means of nonlinear regression. It has been shown [14, 17, 15, 18, 19] that nonlinear regression makes it possible to fully exploit the underlying manifold identified by the PCs. Note that the linear basis derived from the PCs is critical, as it allows simple transport equations to be derived; by applying nonlinear functions on top of this basis, however, the model can capture the nonlinearities present in combustion systems. 2.1. Regression models In this study, nonlinear regression is used to model the highly nonlinear state-space variables as a function of the principal components (Z). In place of Equation 4, the state-space variables and PC source terms ($s_Z$) are now mapped to the PC basis using the nonlinear regression function f: $\phi \approx f(Z_q)$ (8), where $\phi$ represents the state-space variables or, in regression terms, the dependent variables (i.e., $Y_i$, $T$, $\rho$, and $s_Z$). Until now, two nonlinear regression methods have been applied to map $\phi$ to Z. In the work of Biglari and Sutherland [14] and of Yang and Pope [15], multivariate adaptive regression splines are used. 
In the work of Mirgolbabaei and Echekki [17, 18], artificial neural networks are investigated. Here, in addition to the previously used regression techniques, several other methods are investigated, including support vector regression [32] and Gaussian process regression [33, 34]. In summary, the following regression techniques are investigated: Linear Regression Model (LIN). The linear model applied in multiple dimensions is of the form $\phi = Z a + v$ (9), where a is the regression coefficient vector and v is the intercept vector [35]. The implementation of the linear model in the statistical computing software R [36] was used for the regression analysis. Multivariate Adaptive Regression Splines (MARS). Multivariate adaptive regression splines build up the model from products of spline basis functions. The method creates a number of basis functions, automatically determines the knot locations, and places splines at the knot boundaries. The model is of the form $\phi = \sum_{m=1}^{M} a_m B_m(Z)$ (10), where $B_m$ are the basis functions and $a_m$ are the expansion coefficients [37]. The implementation of MARS in the mda package of R [36] was used for the regression analysis, with its default options. Given user settings such as degree (default 1, the interaction degree), threshold (default 0.001), and penalty (default 2, the cost charged per degree of freedom), the mda package determines the degree of the polynomials and the number of knots. Artificial Neural Networks (ANN). Artificial neural networks connect several layers of estimation, from which an output layer produces the prediction. 
Following the theory of Pao [38], the model works as follows. First, h hidden networks, $NET_t$ (t = 1, ..., h), are calculated as weighted ($w_t$) sums of the training data inputs ($k_i = [Z, \phi]$): $NET_t = \sum_{i=1}^{N} w_{ti} k_i + b_t$ (11). A sigmoid transfer function is then used to generate an output for each hidden network: $Z_t = [1 + \exp(-NET_t)]^{-1}$ (12). Next, the output network is calculated: $NET = \sum_{t=1}^{h} \beta_t Z_t + b_o$ (13). The output is again passed through the sigmoid, giving the prediction of $\phi$: $\phi = [1 + \exp(-NET)]^{-1}$ (14). In the present study, the ANN implementation (ANNGA) in R [36] was used. The design used one hidden layer with 20 neurons and one additional neuron in the output layer. The genetic algorithm used 1000 chromosomes in the population of each generation, a mutation rate of 0.2, and a crossover rate of 0.6. Support Vector Regression (SVR). Support vector regression is a subset of support vector machines (SVM). The idea behind SVR is again to create a model that predicts $\phi$ (e.g., $s_Z$) given Z, using learning machines that implement the structural risk minimization inductive principle. The basic model form is $\phi = \sum_{i=1}^{N} (\alpha_i - \alpha_i^*) K(Z_0, Z_i)$ (15), where $\alpha_i$ and $\alpha_i^*$ are Lagrange multipliers and $K(Z_0, Z_i)$ is the kernel operator. In the current study, a radial basis function kernel was used. The optimum kernel hyper-parameter and the parameters of the ε-insensitive loss function were determined by running calculations over a range of input parameters. The SVM implementation in the e1071 package for R was used for the SVR analysis. The kernel hyper-parameter gamma was optimized by running a series of SVM fits over a range of values (exp(-3) to exp(3)), a value of 1e-3 was used for epsilon, and the cost was optimized over the same range (exp(-3) to exp(3)). 
Gaussian Process Regression (GPR). Gaussian process regression is founded on the idea that the dependent variables can be described by a Gaussian distribution [33, 34]: $\phi \sim \mathcal{N}\left(0, K(Z, Z) + \sigma_n^2 I\right)$ (16). Here Z is the data matrix containing all sample points in PC space, and K(Z, Z) is the kernel function evaluated on Z. In the current study, the Gaussian kernel is used: $K(Z_p, Z_q) = \sigma_s^2 \exp\left(-\frac{1}{2}(Z_p - Z_q)^T W (Z_p - Z_q)\right)$ (17), where $Z_p$ and $Z_q$ denote two sample points, $\sigma_s^2$ is the signal variance, $\sigma_n^2$ is the noise variance, and W is the diagonal matrix of inverse squared characteristic length scales. Given query points $Z_*$, it can be shown that a prediction can be made using $\phi_* = K_*^T \left(K + \sigma_n^2 I\right)^{-1} \phi$ (18), where $K = K(Z, Z)$ and $K_* = K(Z, Z_*)$. A value of 1 was used as the initial guess for the kernel's hyper-parameters, the characteristic length scale and the signal variance. Their optimal values were then found by gradient-based optimization of the marginal likelihood. The GPR implementation from the MATLAB toolbox gpml [34] was used, including its gradient-based marginal likelihood functions for the hyper-parameter search. To map the highly nonlinear reaction rate surface (dependent variables) to PC space (independent variables), it is useful to understand how nonlinear the reaction rates and other state-space variables are with respect to the underlying manifold represented by the principal components. A simple way to do this in multiple dimensions is to divide the independent variable space into a coarse grid and assess the variation of the dependent variables within each grid cell. If a dependent variable varies strongly within a cell, regressing it locally will be more difficult, because of either its nonlinear nature or local scatter in the data. 
The following equation is used to calculate the locally normalized variance for the ith coarse grid cell: $\Lambda_i = \frac{\mathrm{var}(\phi(Z_q \in i))}{\mathrm{var}(\phi(Z_q))}$ (19), where $\mathrm{var}(x) = \langle (x - \langle x \rangle)^2 \rangle$ is the variance function. It is calculated either on the observations within the ith coarse grid cell, $\phi(Z_q \in i)$, or on all observations, $\phi(Z_q)$. Summing over all c coarse grid cells in PC space gives the overall manifold nonlinearity for the dependent variable $\phi$: $\Lambda = \sum_{i=1}^{c} \Lambda_i$ (20). Lower values indicate a dependent variable that varies smoothly on the manifold and is easier to regress. Table 1 shows the manifold nonlinearity for the various dependent variables in the ODT data-set described previously. The analysis shows that some scaling methods have distinct advantages for several of the dependent variables. In particular, pareto scaling gives the lowest nonlinearity for several major species (O2, CO, CO2, and N2), temperature, and density, with weaker performance for some of the radical species (OH, H). All methods indicate that the regression of $s_{Z_1}$ is challenging; however, the regression of $s_{Z_2}$ appears promising with pareto scaling.
Table 1: Manifold nonlinearity (Λ) of the state-space variables for different scaling methods.
Variable  std     range   pareto  vast    level
H2        5.7     11.4    10.8    12.3    3.5
O2        4.0     1.9     0.3     0.7     4.9
O         12.6    11.8    17.2    28.8    7.5
OH        16.6    17.3    21.5    41.5    6.8
H2O       6.1     5.1     4.9     5.3     7.0
H         14.6    22.3    30.0    46.1    5.2
HO2       7.1     9.6     6.2     3.3     7.2
CO        2.4     1.3     0.1     1.8     1.7
CO2       5.0     5.0     0.8     3.0     6.2
HCO       6.9     14.6    18.1    29.4    2.5
N2        1.7     0.7     0.1     0.7     1.4
T         7.0     6.5     2.0     4.0     9.2
ρ         7.8     6.9     2.5     5.0     9.6
s_Z1      256.5   292.2   300.5   404.0   210.1
s_Z2      150.0   172.7   25.8    143.7   95.9
Given the results for both the state-space reconstruction and the manifold nonlinearity, the pareto scaling method clearly has some unique advantages for this particular data-set dealing with syngas combustion. 
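Eqs. (19) and (20) translate directly into a binning procedure. The sketch below assumes a uniform grid with the same number of bins along each PC direction. The section does not state the grid resolution, so n_bins is a free, hypothetical parameter. Because Λ is a sum over cells, its magnitude grows with the number of occupied cells, so only values computed on the same grid should be compared.

```python
import numpy as np


def manifold_nonlinearity(Z, phi, n_bins=10):
    # Z: (n, q) PC scores; phi: (n,) one dependent variable.
    n, q = Z.shape
    total_var = np.var(phi)  # variance over all observations, <(x - <x>)^2>
    # Flatten the q-dimensional bin index of each observation into one integer.
    cell = np.zeros(n, dtype=np.int64)
    for j in range(q):
        edges = np.linspace(Z[:, j].min(), Z[:, j].max(), n_bins + 1)
        # Digitizing against the interior edges yields bins 0 .. n_bins - 1.
        cell = cell * n_bins + np.digitize(Z[:, j], edges[1:-1])
    # Eq. (19) for each occupied cell, summed as in Eq. (20).
    return sum(np.var(phi[cell == c]) / total_var for c in np.unique(cell))


rng = np.random.default_rng(2)
Z = rng.uniform(size=(20000, 2))
smooth = manifold_nonlinearity(Z, Z[:, 0] + Z[:, 1])        # varies slowly per cell
scatter = manifold_nonlinearity(Z, rng.normal(size=20000))  # pure scatter
print(smooth, scatter)
```

On a 10 × 10 grid the smooth variable scores about 1, while uncorrelated scatter scores about the number of cells. This mirrors the gap between the major species and the source terms in Table 1.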
Several other studies have reached similar conclusions about the pareto scaling method, as shown in [21], [22], [28]. With this observation in mind, the various regression models are now tested with pareto scaling. The nonlinear regression analysis is carried out using a combination of software packages, namely the statistical computing software R [36] and MATLAB [39], as described previously. The R implementations were used for LIN, MARS, ANN, and SVR, and the MATLAB toolbox gpml [34] was employed for GPR. The models are trained on n = 5000 sample points evenly distributed over Z space, with q = 2 or 3. They are then tested on another subset of points of the same size, with no training point reused as a testing point, so that the reported errors are not masked by over-fitting. Table 2 shows the regression results for $s_{Z_1}$ as a function of Z, with q = 2 and q = 3, using two metrics. The first is the normalized root mean squared error, $\mathrm{nrms}(x_p, x) = \mathrm{rms}(x_p, x) / \max(\sigma(x_p), \sigma(x))$. The second is the $R^2$ statistic, $R^2 = \sum_{i=1}^{N} (x_{p,i} - \bar{x})^2 / \sum_{i=1}^{N} (x_i - \bar{x})^2$. In both, $x_p$ denotes the predicted values and $\bar{x}$ the mean of the observed values. As expected, the linear regression method has difficulty mapping the highly nonlinear dependent variables. The more complex methods also struggle with the mapping when q = 2. Table 1 shows that $s_{Z_1}$ is highly nonlinear, so methods such as linear regression can be expected to fail. Piecewise-polynomial methods such as MARS may also struggle, given the degree of nonlinearity. Methods that use local tuning (ANN, SVR, GPR) may be better able to approximate the problematic regions of the manifold. When moving to q = 3, these latter three methods begin to show higher accuracy, with GPR producing the most accurate reconstruction in this particular case. The regression approach is a vast improvement over the direct computation (Equation 7), which reaches the same level of accuracy only with q = 8. 
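The train/test protocol and the two error metrics can be sketched together with the GPR predictive mean of Eq. (18) and a linear (LIN) baseline. In this minimal version the kernel hyper-parameters are fixed at 1, the initial guess quoted in the text. They are not optimized through the marginal likelihood as gpml does. The noise level sigma_n, the synthetic target function and all function names are hypothetical, and σ in nrms is taken to be the standard deviation.

```python
import numpy as np


def sq_exp_kernel(Za, Zb, length=1.0, sigma_s=1.0):
    # Eq. (17) with W = diag(1 / length^2); returns a (len(Za), len(Zb)) matrix.
    diff = (Za[:, None, :] - Zb[None, :, :]) / length
    return sigma_s ** 2 * np.exp(-0.5 * np.sum(diff ** 2, axis=-1))


def gpr_predict(Z_train, phi_train, Z_query, sigma_n=1e-3):
    # Eq. (18): phi_* = K_*^T (K + sigma_n^2 I)^-1 phi, zero prior mean (Eq. 16).
    K = sq_exp_kernel(Z_train, Z_train)
    K_star = sq_exp_kernel(Z_train, Z_query)
    alpha = np.linalg.solve(K + sigma_n ** 2 * np.eye(len(Z_train)), phi_train)
    return K_star.T @ alpha


def linear_predict(Z_train, phi_train, Z_query):
    # Eq. (9) with an intercept, fitted by ordinary least squares (LIN baseline).
    B = np.column_stack([Z_train, np.ones(len(Z_train))])
    coef, *_ = np.linalg.lstsq(B, phi_train, rcond=None)
    return np.column_stack([Z_query, np.ones(len(Z_query))]) @ coef


def nrms(pred, true):
    # Root mean squared error normalized by the larger standard deviation.
    rms = np.sqrt(np.mean((pred - true) ** 2))
    return rms / max(np.std(pred), np.std(true))


def r_squared(pred, true):
    # Explained-variance form of R^2 given in the text.
    m = np.mean(true)
    return np.sum((pred - m) ** 2) / np.sum((true - m) ** 2)


# Disjoint training and testing subsets drawn from one synthetic data-set.
rng = np.random.default_rng(3)
Z = rng.uniform(-2.0, 2.0, size=(400, 2))
phi = np.sin(Z[:, 0]) * np.cos(Z[:, 1])
idx = rng.permutation(len(Z))
train, test = idx[:200], idx[200:]
gpr_pred = gpr_predict(Z[train], phi[train], Z[test])
lin_pred = linear_predict(Z[train], phi[train], Z[test])
print(nrms(gpr_pred, phi[test]), r_squared(gpr_pred, phi[test]))
print(nrms(lin_pred, phi[test]), r_squared(lin_pred, phi[test]))
```

Both predictors see the same training points and are scored only on held-out points. This is the property the text relies on to rule out over-fitting.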
Table 2: Nrms error and R² statistics for the prediction of s_Z1 using pareto scaling with q = 2 and q = 3.
Method  nrms (q = 2)  R² (q = 2)  nrms (q = 3)  R² (q = 3)
LIN     0.99          0.02        0.67          0.55
MARS    0.30          0.91        0.26          0.93
ANN     0.22          0.95        0.20          0.96
SVR     0.23          0.95        0.19          0.97
GPR     0.22          0.95        0.18          0.97
The results in Table 2 depend on the specific implementation of each regression method and on any tuning or optimization performed for it. Indeed, the GPR results may be the best partly because of the robust hyper-parameter optimization that its implementation provides. The other regression methods may improve with more tuning or with different implementations. However, tuning the regression methods is not the purpose of the present study. The focus is to benchmark various nonlinear approaches using state-of-the-art implementations found in the literature. Ultimately, the PC-transport approach will be used within a CFD solver, and several factors are important in deciding which regression method to use. In addition to a method's accuracy, these include its ease of use, its applicability to different problems, its ability to optimize tuning parameters, and its expense within a CFD algorithm. Because of the numerous variations and implementations of the regression methods, general conclusions about the methods cannot be drawn. However, these factors can be assessed for the implementations used in the current study. Table 3 summarizes these factors for the various regression methods. Table 3: Summary of the relative accuracy, ease of use, applicability to problems of a certain size, optimization, and relative cost for the various regression methods. 
A scale ranging from 1 to 3 is used to rank the regression methods, with 1 representing poor performance and 3 excellent performance.
Method  Accuracy  Ease  Problem size  Optimization  Cost
LIN     1         3     3             -             3
MARS    2         2     3             2             3
ANN     3         1     3             2             2
SVR     3         1     1             2             1
GPR     3         1     1             3             1
While MARS and LIN are easy to use, the authors found the implementations of ANN, SVR and GPR the most difficult to use, owing to the complexity of the methods and the various inputs they require. Both SVR and GPR operate on n × n kernel matrices (n being the number of training observations), which GPR must invert; this makes both methods slow for larger data-sets. GPR often took the longest to run, but it required the least optimization work from the user thanks to its built-in minimization functions, which optimize the method's hyper-parameters. In terms of run-time cost, all methods except SVR and GPR may be suitable. It is, however, possible to tabulate the regression results and use a simple table look-up to reduce the run-time costs of the more expensive methods. 2.2. Subset PCA In the work of Mirgolbabaei and Echekki [17], the PCA is performed on a subset of species in order to recover sufficiently accurate source terms. This has the benefit of removing species that may contribute highly nonlinear source terms to $s_{Z_q}$. The drawback is that there is no guarantee that the manifold computed from the subset will adequately predict the species removed from the analysis. In the current study, the retained species are chosen among the variables associated with the slower chemical time-scales of the system, such as the major species. The following subset of species was selected for the present analysis: H2, O2, H2O, CO and CO2. With this subset, the PCA is repeated, again with pareto scaling. 
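The subset PCA amounts to building the basis from five columns only, then treating the remaining species as dependent variables to be regressed on the resulting PCs. The sketch below takes the species lists from the text. The data layout (one column per species), the pareto definition (square root of the standard deviation), the placeholder random data and the function name subset_pca are assumptions. The function also returns the cumulative explained-variance fractions, the quantity a scree plot displays.

```python
import numpy as np

SPECIES = ['H2', 'O2', 'O', 'OH', 'H2O', 'H', 'HO2', 'CO', 'CO2', 'HCO', 'N2']
SUBSET = ['H2', 'O2', 'H2O', 'CO', 'CO2']


def subset_pca(Y, species, subset, q):
    # Y: (n, len(species)) array of mass fractions, one column per species.
    cols = [species.index(s) for s in subset]
    Ysub = Y[:, cols]
    # Pareto scaling (square root of the standard deviation), assumed definition.
    d = np.sqrt(Ysub.std(axis=0, ddof=1))
    X = (Ysub - Ysub.mean(axis=0)) / d
    eigvals, A = np.linalg.eigh(X.T @ X / (X.shape[0] - 1))
    order = np.argsort(eigvals)[::-1]
    eigvals, A = eigvals[order], A[:, order]
    # Cumulative explained variance, the quantity shown in a scree plot.
    explained = np.cumsum(eigvals) / np.sum(eigvals)
    Zq = X @ A[:, :q]
    # Species left out of the PCA become dependent variables for regression on Zq.
    excluded = [s for s in species if s not in subset]
    targets = Y[:, [species.index(s) for s in excluded]]
    return Zq, A[:, :q], explained, excluded, targets


rng = np.random.default_rng(4)
Y = rng.uniform(0.01, 0.2, size=(1000, len(SPECIES)))  # placeholder data only
Zq, Aq, explained, excluded, targets = subset_pca(Y, SPECIES, SUBSET, q=2)
print(excluded, explained)
```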
Figure 1 shows the scree plot [40], which gives the percentage of variance accounted for when q PCs are retained. The figure compares the full PCA using 11 variables with the subset PCA using 5. The PCA based on the subset of variables captures a given fraction of the variance in the system with fewer PCs. Figure 1: Scree plot from the eigenvalue matrix, showing the fraction of explained variance (y-axis) as a function of the number of PCs (q) for the system containing a subset of the original species ('x' markers) and for the full system ('o' markers). Table 4 shows the error statistics for the entire set of state variables when using GPR and pareto scaling. Interestingly, several of these variables were not included in the analysis. Even so, the PCA basis computed from the major species, combined with nonlinear regression, is sufficient for mapping these highly nonlinear minor species.
Table 4: nrms error and R² statistics for the prediction of φ using pareto scaling and q = 2.
Variable  nrms   R²
H2        0.05   0.997
O2        0.04   0.999
O         0.06   0.996
OH        0.07   0.995
H2O       0.06   0.997
H         0.05   0.997
HO2       0.17   0.969
CO        0.05   0.998
CO2       0.05   0.997
HCO       0.03   0.999
N2        0.04   0.998
T         0.04   0.998
ρ         0.04   0.999
s_Z1      0.22   0.949
s_Z2      0.16   0.974
The subset PCA also makes it easier to give the PC structure a physical interpretation. Table 5 shows the basis matrix weights from the PCA on the major species. The weights of the first PC have large positive values for the carbon-containing variables (CO, CO2) and a large negative value for the oxidizer (O2). This appears very similar in nature to Bilger's mixture fraction [41], $\xi$. Figure 2 shows a plot of $Z_1$ against $\xi$, and the plot shows that $Z_1$ is clearly correlated with $\xi$. The weights for $Z_2$ are positive for H2, O2 and CO and negative for H2O and CO2. 
These weights appear to be related to the extent of reaction, in which reactants have negative stoichiometric coefficients and products positive ones. The signs of the $Z_2$ weights are reversed relative to this convention, which is immaterial because the sign of an eigenvector is arbitrary. The initial mass-based concentration of CO is larger than that of H2. As a result, a large amount of CO2 is produced while a much smaller amount of H2 is present, leading to a smaller positive weight for H2 and a smaller negative weight for the product H2O. Notably, the PC analysis identifies two important variables often used to characterize combustion systems without any prior understanding of, or assumptions about, the combustion system.
Table 5: Eigenvector matrix, A, from the PC analysis (weight of each species on each PC).
Species  Z1      Z2      Z3      Z4      Z5
H2       0.047   0.117   -0.302  0.900   0.288
O2       -0.627  0.119   -0.034  -0.230  0.734
H2O      0.176   -0.186  0.895   0.222   0.292
CO       0.624   0.656   -0.040  -0.243  0.348
CO2      0.431   -0.713  -0.325  -1.124  0.414
Figure 2: A scatter plot of mixture fraction (x-axis) versus Z1 (y-axis), illustrating the correlation between the variables. These results show that the linear PC model combined with nonlinear regression has the potential to deliver accurate state-space variables, as well as relatively accurate reaction rates, for the ODT data-set studied in this section. 3. Results and discussion As a first step in advancing PCA-based models, a perfectly stirred reactor is used, which contains complexity in reaction space without complexity from mixing. This system is ideal for demonstrating the approach, as it is simple to implement, compute, and validate. 3.1. Perfectly stirred reactor An implementation of the perfectly stirred reactor was made using MATLAB. 
The following governing equations were implemented and solved using the CVODE toolbox in MATLAB [42]:
$$\frac{dH}{dt} = \frac{H^0 - H}{\tau} \tag{21}$$
$$\frac{dY_i}{dt} = \frac{Y_i^0 - Y_i}{\tau} + \frac{R_i W_{s,i}}{\rho} \tag{22}$$
Here $H$ is the mixture enthalpy, $Y_i$ and $R_i$ are the mass fraction and molar reaction rate (kmol/m³/s) of the $i$th species, $\tau$ (s) is a constant residence time of the reactor, and $W_{s,i}$ is the molecular mass of the $i$th species. $\rho$ is the density (kg/m³), and the superscript 0 denotes inlet conditions. The equations are integrated in time using the BDF multi-step method with a Newton nonlinear solver. The problem is first solved for a stoichiometric syngas-air mixture using the same mechanism as for the ODT data-set [31], which includes 11 chemical species and 21 reactions. The inlet conditions of the reactor ($Y_i^0$) correspond to an equivalence ratio of 1 at a temperature of 300 K. The initial conditions of the reactor ($Y_i$ at $t = 0$) are set to chemical equilibrium, computed by Gibbs free-energy minimization at constant enthalpy and pressure. The elemental composition and enthalpy of the inlet mixture thus yield an equilibrium solution that is used as the initial condition for all PSR cases. The system is then integrated in time until a steady state is reached. This process is repeated for residence times between $10^{-5}$ and 10 s. Each PSR simulation assumes constant volume, residence time, and pressure. All PSR simulations, including the transient solutions, are then assembled into one data-set. The PCA process described in Section 2 is applied to these data to create the basis matrix $A_q$ and the regression functions f for the state-space variables. The approach is then tested with values of $\tau$ that were not used when creating the data-set.
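The time integration described above can be illustrated on a much smaller problem. The sketch below applies Eq. (22) to a single hypothetical species consumed by a made-up second-order reaction, $R = -k\rho Y^2/W$. It advances the state with the first-order BDF formula (backward Euler), solved by Newton iteration. CVODE uses variable-order BDF with error control, so this is only a simplified stand-in. Eq. (21) is omitted: when the initial enthalpy equals the inlet enthalpy, as for the equilibrium initial condition described above, it gives $dH/dt = 0$. The rate constant, density, molecular mass, time step and function names are all assumptions chosen for illustration. Because the steady state of this toy reactor has a closed form, the integration can be checked directly.

```python
import math

# Hypothetical parameters for a single-species toy reactor (not from the paper).
K = 1000.0    # rate constant [1/s]
RHO = 1.0     # density [kg/m^3], held constant
W = 28.0      # molecular mass [kg/kmol]

def molar_rate(y):
    # Made-up second-order consumption rate R(Y) in kmol/m^3/s.
    return -K * RHO * y * y / W

def rhs(y, y_in, tau):
    # Eq. (22) for one species: inflow/outflow term plus chemical source R*W/rho.
    return (y_in - y) / tau + molar_rate(y) * W / RHO

def drhs_dy(y, tau):
    # Analytic derivative of rhs with respect to Y (the Newton Jacobian).
    return -1.0 / tau - 2.0 * K * y

def backward_euler_step(y_old, dt, y_in, tau, tol=1e-12, max_iter=50):
    # Solve y - y_old - dt*rhs(y) = 0 for the new y with Newton's method.
    y = y_old
    for _ in range(max_iter):
        g = y - y_old - dt * rhs(y, y_in, tau)
        dg = 1.0 - dt * drhs_dy(y, tau)
        dy = -g / dg
        y += dy
        if abs(dy) < tol:
            break
    return y

def integrate_psr(y_init, y_in, tau, dt, n_steps):
    # March in time from the initial state until (approximately) steady state.
    y = y_init
    for _ in range(n_steps):
        y = backward_euler_step(y, dt, y_in, tau)
    return y

def steady_state_exact(y_in, tau):
    # Positive root of K*tau*Y**2 + Y - y_in = 0, i.e. rhs(Y) = 0.
    kt = K * tau
    return (-1.0 + math.sqrt(1.0 + 4.0 * kt * y_in)) / (2.0 * kt)

# Start from a fully consumed state (Y = 0) with inlet Y = 0.1 and tau = 1e-2 s.
y_ss = integrate_psr(0.0, 0.1, 1e-2, 1e-3, 200)
print(y_ss, steady_state_exact(0.1, 1e-2))
```

With these numbers the toy reactor settles at the analytic steady state $(\sqrt{5} - 1)/20 \approx 0.0618$. Repeating the call for several values of tau and collecting every intermediate state would mimic, on a tiny scale, how the PSR data-set above is assembled.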
The regression of the state-space variables is carried out using q = 2, giving R² of 0.9995 or higher for all variables, including the PC source terms $s_{Z_q}$. The simulations are then performed with 2 transport equations instead of 11, a significant reduction. Figures 3a-8b show the temperature and species mass fractions of the system. The markers show the steady-state solution for a given residence time obtained with the PC-transport model, and the underlying solid lines show the full solution calculated over a range of residence times. The top plots (a) show the results of the model using GPR for the nonlinear mapping with q = 2, and the bottom plots (b) show the standard model without the regression step for varying q. The model with regression is remarkably accurate over the whole range of residence times for the temperature and for both major and minor species. The model without regression does not reach a similar accuracy until q = 7. In the current system, enthalpy and elemental mass are conserved, leaving 7 degrees of freedom. The standard model therefore has to retain all of them, which implies virtually no reduction.
Figure 3: PSR temperature as a function of the residence time, with the solid line representing the full solution. The markers represent the results of the model with GPR regression using q = 2 PCs (a) and of the standard model without regression for varying q (b).
Figure 4: Major-species products as a function of the residence time, with the solid line representing the full solution. The markers represent the results of the model with GPR regression using q = 2 PCs (a) and of the standard model without regression for varying q (b).
Figure 5: Major-species reactants as a function of the residence time, with the solid line representing the full solution. The markers represent the results of the model with GPR regression using q = 2 PCs (a) and of the standard model without regression for varying q (b).
Figure 6: Minor species as a function of the residence time, with the solid line representing the full solution. The markers represent the results of the model with GPR regression using q = 2 PCs (a) and of the standard model without regression for varying q (b).
Figure 7: Minor species as a function of the residence time, with the solid line representing the full solution. The markers represent the results of the model with GPR regression using q = 2 PCs (a) and of the standard model without regression for varying q (b).
Figure 8: Minor species as a function of the residence time, with the solid line representing the full solution. The markers represent the results of the model with GPR regression using q = 2 PCs (a) and of the standard model without regression for varying q (b).
Although the previous figures have shown the accuracy of the models for the steady-state solution, an accurate representation of the transient solution is also essential. Figure 9 shows the transient solution for a reactor with a residence time of $10^{-4}$ s: Figure 9a shows the evolution of temperature and Figure 9b the evolution of the OH radical mass fraction.
The 'o' markers in the figures show the results of the regression method using only q = 2 PCs. An accurate transient solution is obtained despite the significant reduction provided by the method.
Accurate prediction of the PC-transport source terms is essential to the PCA-based model. To illustrate this, three cases with residence times of $10^{-5}$ s, $10^{-4}$ s and $10^{-3}$ s were selected. The PC source terms without approximation are computed from the training data set as $s_Z = R\, d\, A$. These source terms are then compared with those computed from the regression at run time. Figure 10 shows the transient evolution of the first and second PC source terms for the three cases. The regression gives a good approximation of the actual source terms (solid black line), and both the first and second source terms are accurately predicted over time. A single nonlinear regression accurately predicts the source terms for three different residence times. These results indicate that the PCs provide a good basis for regression, one that is able to parameterize the nonlinear source terms.
4. Conclusion
The current work has addressed the use of nonlinear regression methods to estimate source terms for the PC-transport combustion model. Several nonlinear regression methods have been analyzed and shown to produce accurate estimates, even with a small number of PCs Z. In particular, the SVM and GPR methods showed improved accuracy in estimating the state-space variables. A method for defining the regressibility of a manifold has been presented. In addition, the effect of the various PCA-scaling methods on the regressibility of the system has been assessed. Pareto scaling appears to achieve the greatest reduction with fewer components and produces the most regressible surface.
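The regression step can be illustrated with a bare-bones Gaussian process regressor. The sketch below is not the GPR implementation used in this work. It assumes a squared-exponential kernel with fixed, hand-picked hyperparameters and no marginal-likelihood optimisation. The target is a hypothetical smooth function of two PCs standing in for a state variable or PC source term, and the names SimpleGPR, r2_score and nrms_error are our own. The nrms error is taken here as the RMS error divided by the standard deviation of the data. This definition is our assumption, although it is consistent with Table 4, where nrms² ≈ 1 - R² row by row (e.g. 0.05² = 0.0025 against 1 - 0.997 = 0.003).

```python
import numpy as np

def rbf_kernel(P, Q, length_scale, signal_var):
    # Squared-exponential covariance between the rows of P and Q.
    d2 = (np.sum(P**2, axis=1)[:, None] + np.sum(Q**2, axis=1)[None, :]
          - 2.0 * P @ Q.T)
    return signal_var * np.exp(-0.5 * np.maximum(d2, 0.0) / length_scale**2)

class SimpleGPR:
    # Zero-mean GP (after removing the sample mean) with fixed hyperparameters.
    def __init__(self, length_scale=0.3, signal_var=1.0, noise_var=1e-6):
        self.length_scale = length_scale
        self.signal_var = signal_var
        self.noise_var = noise_var

    def fit(self, Z, y):
        self.Z = Z
        self.y_mean = y.mean()
        K = rbf_kernel(Z, Z, self.length_scale, self.signal_var)
        K += self.noise_var * np.eye(len(Z))
        L = np.linalg.cholesky(K)                 # K = L L^T
        # alpha = K^{-1} (y - mean), computed with two triangular solves
        self.alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - self.y_mean))
        return self

    def predict(self, Z_new):
        K_star = rbf_kernel(Z_new, self.Z, self.length_scale, self.signal_var)
        return K_star @ self.alpha + self.y_mean  # posterior mean

def r2_score(y, y_hat):
    return 1.0 - np.sum((y - y_hat)**2) / np.sum((y - y.mean())**2)

def nrms_error(y, y_hat):
    # Assumed definition: RMS error normalised by the standard deviation of y.
    return np.sqrt(np.mean((y - y_hat)**2)) / np.std(y)

def target(Z):
    # Hypothetical smooth function of two PCs (stands in for a state variable).
    return np.sin(3.0 * Z[:, 0]) * np.cos(2.0 * Z[:, 1]) + Z[:, 1]

# Train on a 15 x 15 grid of (Z1, Z2) values; test on random unseen points.
g = np.linspace(0.0, 1.0, 15)
Z_train = np.array([[a, b] for a in g for b in g])
rng = np.random.default_rng(1)
Z_test = rng.uniform(size=(200, 2))

gpr = SimpleGPR().fit(Z_train, target(Z_train))
y_test = target(Z_test)
y_pred = gpr.predict(Z_test)
print(r2_score(y_test, y_pred), nrms_error(y_test, y_pred))
```

Library implementations, for example scikit-learn's GaussianProcessRegressor, additionally optimise the kernel hyperparameters. Fixing them here keeps the example short and deterministic.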
Figure 9: Temperature [K] (a) and OH radical mass fraction (b) as a function of time for a residence time of $10^{-4}$ s, with the chemical-equilibrium solution (constant enthalpy and pressure) as the initial condition. The solid line represents the solution of the full system of equations. The markers represent the results of either model: 'o' markers for the solution using regression (q = 2 PCs) and '+' markers for the solution using the standard model (q = 7 PCs).
Figure 10: Regressed PC source terms as a function of time, with (a) and (b) showing the first and second PC source terms. Cases with the following residence times are shown: $10^{-5}$ s ('o' markers), $10^{-4}$ s ('x' markers), and $10^{-3}$ s (remaining markers). The solid lines are the actual PC source terms for each residence time.
The current work outlines an example of an a priori analysis that identifies the best regression and scaling method for a given turbulent combustion data-set. The work includes the first demonstration of the PC-transport model using nonlinear regression within a numerical solver. For the PSR, the model provided a computational reduction factor of 0.71, accurately representing the original system with q = 2 variables out of the 7 degrees of freedom in the system. Future work will include a validation study examining how the approach compares with experimental values and with other combustion models.
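The two numbers quoted for the PSR can be checked with a short count. This rests on three assumptions. First, the four conserved elements are C, H, O and N. Second, constant enthalpy and pressure then fix the temperature and density. Third, the reduction factor is read as the fraction of the degrees of freedom removed by transporting only q PCs; this is our interpretation, not a definition given here. Under those assumptions the 11 species leave 7 independent degrees of freedom, and the quoted factor follows directly:

```latex
N_{\mathrm{dof}} = N_s - N_e = 11 - 4 = 7,
\qquad
1 - \frac{q}{N_{\mathrm{dof}}} = 1 - \frac{2}{7} \approx 0.71
```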
Acknowledgements
We are grateful to our sponsor, which funded part of the present research: the National Nuclear Security Administration, under the Accelerating Development of Retrofittable CO2 Capture Technologies through Predictivity program, through DOE Cooperative Agreement DE NA 00 00 740.
References
[1] G. Smith, D. Golden, M. Frenklach, N. Moriarty, B. Eiteneer, M. Goldenberg, C. Bowman, R. Hanson, S. Song, W. Gardiner Jr, et al., GRI-Mech 3.0.
[2] R. Fox, Computational Models for Turbulent Reacting Flows, Cambridge University Press, 2003.
[3] W. Jones, S. Rigopoulos, Rate-controlled constrained equilibrium: Formulation and application to nonpremixed laminar flames, Combustion and Flame 142 (2005) 223-234.
[4] N. Peters, Laminar diffusion flamelet models in non-premixed turbulent combustion, Progress in Energy and Combustion Science 10 (1984) 319-339.
[5] N. Peters, Laminar flamelet concepts in turbulent combustion, Proceedings of the Combustion Institute 24 (1986) 1231-1250.
[6] H. Pitsch, N. Peters, A consistent flamelet formulation for non-premixed combustion considering differential diffusion effects, Combust. Flame 114 (1998) 26-40.
[7] J. v. Oijen, L. d. Goey, Modelling of premixed laminar flames using flamelet-generated manifolds, Combustion Science and Technology 161 (1) (2000) 113-137.
[8] J. Van Oijen, L. De Goey, Modelling of premixed counterflow flames using the flamelet-generated manifold method, Combustion Theory and Modelling 6 (3) (2002) 463-478.
[9] O. Gicquel, N. Darabiha, D. Thevenin, Laminar premixed hydrogen/air counterflow flame simulations using flame prolongation of ILDM with differential diffusion, Proceedings of the Combustion Institute 28 (2) (2000) 1901-1908.
[10] B. Fiorina, R. Baron, O. Gicquel, D. Thevenin, S. Carpentier, N. Darabiha, et al., Modelling non-adiabatic partially premixed flames using flame-prolongation of ILDM, Combustion Theory and Modelling 7 (3) (2003) 449-470.
[11] B. Fiorina, O. Gicquel, S. Carpentier, N. Darabiha, Validation of the FPI chemistry reduction method for diluted nonadiabatic premixed flames, Combustion Science and Technology 176 (5-6) (2004) 785-797.
[12] A. Parente, J. C. Sutherland, P. J. Smith, L. Tognotti, Identification of low-dimensional manifolds in turbulent flames, Proc. Combust. Inst. 32 (2009) 1579-1586.
[13] A. Parente, J. C. Sutherland, B. B. Dally, L. Tognotti, P. J. Smith, Investigation of the MILD combustion regime via principal component analysis, Proceedings of the Combustion Institute 33 (2) (2011) 3333-3341.
[14] A. Biglari, J. C. Sutherland, A filter-independent model identification technique for turbulent combustion modeling, Combustion and Flame.
[15] S. B. Pope, Small scales, many species and the manifold challenges of turbulent combustion, Proc. Combust. Inst. 34 (2013) 1-31.
[16] Y. Yang, S. B. Pope, J. H. Chen, Empirical low-dimensional manifolds in composition space, Combustion and Flame 160 (10) (2013) 1967-1980.
[17] H. Mirgolbabaei, T. Echekki, A novel principal component analysis-based acceleration scheme for LES-ODT: An a priori study, Combustion and Flame 160 (2013) 898-908.
[18] H. Mirgolbabaei, T. Echekki, Nonlinear reduction of combustion composition space with kernel principal component analysis, Combustion and Flame 161 (1) (2014) 118-126. doi:http://dx.doi.org/10.1016/j.combustflame.2013.08.016. URL http://www.sciencedirect.com/science/article/pii/S0010218013003209
[19] H. Mirgolbabaei, T. Echekki, N. Smaoui, A nonlinear principal component analysis approach for turbulent combustion composition space, International Journal of Hydrogen Energy 39 (9) (2014) 4622-4633. doi:http://dx.doi.org/10.1016/j.ijhydene.2013.12.195. URL http://www.sciencedirect.com/science/article/pii/S036031991303187X
[20] J. Sutherland, A. Parente, Combustion modeling using principal component analysis, Proc. Combust. Inst. 32 (2009) 1563-1570.
[21] A. Coussement, O. Gicquel, A. Parente, MG-local-PCA method for reduced order combustion modeling, Proc. Combust. Inst. 34 (2013) 1117-1123.
[22] B. Isaac, A. Coussement, O. Gicquel, P. Smith, A. Parente, Reduced-order PCA models for chemical reacting flows, Combustion and Flame 10.1016/j.combustflame.2014.05.011.
[23] Y. Yang, S. B. Pope, J. H. Chen, Empirical low-dimensional manifolds in composition space, Combustion and Flame 160 (2013) 1967-1980.
[24] A. Najafi-Yazdi, B. Cuenot, L. Mongeau, Systematic definition of progress variables and intrinsically low-dimensional, flamelet generated manifolds for chemistry tabulation, Combustion and Flame 159 (2012) 1197-1204.
[25] H. Mirgolbabaei, Low-dimensional manifold simulation of turbulent reacting flows using linear and nonlinear principal components analysis, Ph.D. thesis, North Carolina State University, http://www.lib.ncsu.edu/resolver/1840.16/9479 (2014).
[26] T. Poinsot, D. Veynante, Theoretical and Numerical Combustion, R.T. Edwards, Inc., 2001.
[27] H. Mirgolbabaei, T. Echekki, Nonlinear reduction of combustion composition space with kernel principal component analysis, Combustion and Flame (In Press).
[28] A. Parente, J. C. Sutherland, Principal component analysis of turbulent combustion data: Data pre-processing and manifold sensitivity, Combustion and Flame 160 (2013) 340-350.
[29] E. R. Hawkes, R. Sankaran, J. C. Sutherland, J. H. Chen, Scalar mixing in direct numerical simulations of temporally evolving plane jet flames with skeletal CO/H2 kinetics, Proceedings of the Combustion Institute 31 (1) (2007) 1633-1640.
[30] N. Punati, J. C. Sutherland, A. R. Kerstein, E. R. Hawkes, J. H. Chen, An evaluation of the one-dimensional turbulence model: Comparison with direct numerical simulations of CO/H2 jets with extinction and reignition, Proceedings of the Combustion Institute 33 (1) (2011) 1515-1522.
[31] S. G. Davis, A. V. Joshi, H. Wang, F. Egolfopoulos, An optimized kinetic model of H2/CO combustion, Proc. Combust. Inst. 30 (2005) 1283-1292.
[32] A. J. Smola, B. Schölkopf, A tutorial on support vector regression, Statistics and Computing 14 (3) (2004) 199-222.
[33] D. Nguyen-Tuong, M. Seeger, J. Peters, Model learning with local Gaussian process regression, Advanced Robotics 23 (15) (2009) 2015-2034.
[34] C. E. Rasmussen, Gaussian processes for machine learning.
[35] W. S. Cleveland, E. Grosse, W. M. Shyu, Local regression models, Statistical Models in S (1992) 309-376.
[36] R Development Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria (2011). URL http://www.R-project.org/
[37] J. H. Friedman, Multivariate adaptive regression splines, The Annals of Statistics (1991) 1-67.
[38] H.-T. Pao, A comparison of neural network and multiple regression analysis in modeling capital structure, Expert Systems with Applications 35 (3) (2008) 720-727.
[39] MATLAB, version 7.10.0 (R2010a), The MathWorks Inc., Natick, Massachusetts, 2010.
[40] I. T. Jolliffe, Principal Component Analysis, Springer, New York, NY, 1986.
[41] R. Bilger, The structure of turbulent nonpremixed flames, in: Symposium (International) on Combustion, Vol. 22, Elsevier, 1989, pp. 475-488.
[42] S. D. Cohen, A. C. Hindmarsh, CVODE, a stiff/nonstiff ODE solver in C, Computers in Physics 10 (2) (1996) 138-143.
"}]},"highlighting":{"713032":{"ocr_t":[]}}}