| Title | Parallel distributed, reciprocal monte carlo radiation in coupled, large eddy combustion simulations |
| Publication Type | dissertation |
| School or College | College of Engineering |
| Department | Chemical Engineering |
| Author | Hunsaker, Isaac L. |
| Date | 2015-12 |
| Description | Radiation is the dominant mode of heat transfer in high temperature combustion environments. Radiative heat transfer affects the gas and particle phases, including all the associated combustion chemistry. The radiative properties are in turn affected by the turbulent flow field. This bi-directional coupling of radiation turbulence interactions poses a major challenge in creating parallel-capable, high-fidelity combustion simulations. In this work, a new model was developed in which reciprocal monte carlo radiation was coupled with a turbulent, large-eddy simulation combustion model. A technique wherein domain patches are stitched together was implemented to allow for scalable parallelism. The combustion model runs in parallel on a decomposed domain. The radiation model runs in parallel on a recomposed domain. The recomposed domain is stored on each processor after information sharing of the decomposed domain is handled via the message passing interface. Verification and validation testing of the new radiation model were favorable. Strong scaling analyses were performed on the Ember cluster and the Titan cluster for the CPU-radiation model and GPU-radiation model, respectively. The model demonstrated strong scaling to over 1,700 and 16,000 processing cores on Ember and Titan, respectively. |
| Type | Text |
| Publisher | University of Utah |
| Subject | combustion; parallel computing; radiation; ray tracing |
| Dissertation Institution | University of Utah |
| Dissertation Name | Doctor of Philosophy |
| Language | eng |
| Rights Management | © Isaac L. Hunsaker |
| Format | application/pdf |
| Format Medium | application/pdf |
| Format Extent | 27,149 bytes |
| Identifier | etd3/id/4020 |
| ARK | ark:/87278/s6pc69qx |
| DOI | https://doi.org/doi:10.26053/0H-5D17-5700 |
| Setname | ir_etd |
| ID | 197570 |
| OCR Text | Show PARALLEL DISTRIBUTED, RECIPROCAL MONTE CARLO RADIATION IN COUPLED, LARGE EDDY COMBUSTION SIMULATIONS by Isaac L. Hunsaker A dissertation submitted to the faculty of The University of Utah in partial ful llment of the requirements for the degree of Doctor of Philosophy Department of Chemical Engineering The University of Utah December 2015 Copyright c Isaac L. Hunsaker 2015 All Rights Reserved Th e Uni v e r s i t y o f Ut a h Gr a dua t e S cho o l STATEMENT OF DISSERTATION APPROVAL The dissertation of Isaac L. Hunsaker has been approved by the following supervisory committee members: Philip J. Smith , Chair 02-02-2015 Date Approved Eric Eddings , Member 0--2015 Date Approved Geoff Silcox , Member 0--2015 Date Approved Jeremy Thornock , Member 02-4-2015 Date Approved Steven Parker , Member Date Approved and by Milind Deo , Chair/Dean of the Department/College/School of Chemical Engineering and by David B. Kieda, Dean of The Graduate School. ABSTRACT Radiation is the dominant mode of heat transfer in high temperature combustion envi- ronments. Radiative heat transfer a ects the gas and particle phases, including all the associated combustion chemistry. The radiative properties are in turn a ected by the turbulent ow eld. This bi-directional coupling of radiation turbulence interactions poses a major challenge in creating parallel-capable, high- delity combustion simulations. In this work, a new model was developed in which reciprocal monte carlo radiation was coupled with a turbulent, large-eddy simulation combustion model. A technique wherein domain patches are stitched together was implemented to allow for scalable parallelism. The combustion model runs in parallel on a decomposed domain. The radiation model runs in parallel on a recomposed domain. The recomposed domain is stored on each processor after information sharing of the decomposed domain is handled via the message-passing interface. Veri cation and validation testing of the new radiation model were favorable. Strong scaling analyses were performed on the Ember cluster and the Titan cluster for the CPU-radiation model and GPU-radiation model, respectively. The model demonstrated strong scaling to over 1,700 and 16,000 processing cores on Ember and Titan, respectively. CONTENTS ABSTRACT : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : iii LIST OF FIGURES : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : vii NOMENCLATURE : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : xi ACKNOWLEDGMENTS : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : xiv CHAPTERS 1. INTRODUCTION : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 1 1.1 General Radiation Transport . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.2 History of the Monte Carlo Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.3 Other Numerical Radiation Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.4 Parallel Ray Tracing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.5 Previous RMCRT Work By Paula Sun . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 1.6 Research Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 1.7 Expected Signi cance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 1.8 Broad Description of Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 1.9 Components and Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 2. MODEL DESCRIPTION : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 12 2.1 Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 2.2 Parallel RMCRT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 2.3 Adaptive Focus Mesh Re nement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 2.4 Function Abstraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 2.5 GPU Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 3. RAY TRACING : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 23 3.1 Cubic Cells . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 3.2 Random vs. Cell-Centered Origins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 3.3 Non-cubic Cells . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 3.4 Adaptive-focus Mesh . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.4.1 First Step in a New Level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 3.5 When a Ray Leaves the Domain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 3.6 Stopping Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 3.7 Intrusion Cells . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 3.7.1 Ray/Intrusion Boundary Interactions . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 3.7.2 Parallel Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 3.7.3 E ciency Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 3.8 Re ections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 3.9 Re ection Veri cation Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 3.10 Scattering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 3.10.1 Veri cation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 3.10.2 Properties of Pulverized Coal Particles . . . . . . . . . . . . . . . . . . . . . . . . . . 40 4. FLUX CALCULATIONS : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 51 4.1 Explanation of Boundary Fluxes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 4.2 Generating Rays on a Hemisphere . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 4.3 Rotating Rays Onto a Hemisphere . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 4.4 Shifting the Rays To a Cell Face . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 4.5 Flux Ray Tracing and Ray Weighting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 4.6 Ray Convergence Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 4.7 Storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 4.8 Volumetric Integral Vs Surface Integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 4.8.1 The Volumetric Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 4.8.2 The Surface Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 5. COUPLING THROUGH THE ENTHALPY EQUATION : : : : : : : : : : : 64 6. RADIATION PROPERTIES : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 66 6.1 Spectral Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 6.1.1 Experimental-Property Data Digitization . . . . . . . . . . . . . . . . . . . . . . . . 68 6.2 Hottel Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 6.3 Turbulent Radiation Interactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 6.4 Parallelism and Load Balancing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 6.5 Strong Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 7. VIRTUAL RADIOMETER MODEL: : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 80 7.1 Virtual Radiometer Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 7.2 Virtual Radiometer Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 7.3 Computational Modelling of Radiometers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 7.4 Governing Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82 7.5 User-speci ed View Angles and Orientations . . . . . . . . . . . . . . . . . . . . . . . . . 83 7.5.1 User-speci ed View Angle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 7.5.2 User-speci ed Orientation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 7.6 Ray Marching in Unstructured Meshes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 7.7 Veri cation and Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 7.7.1 Veri cation of Ray Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 7.7.2 Veri cation of Participating Media Physics . . . . . . . . . . . . . . . . . . . . . . 87 7.7.3 Veri cation of Ray Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 7.7.4 Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 7.8 E ciency Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 7.8.1 Uintah and Sandia Di erences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 8. IFRF CASE STUDY : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 98 8.1 IFRF Case Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 8.1.1 Fluxes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 8.1.2 Timing and Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 8.2 F85y4 Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 8.2.1 F85y4 Modeling Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 v 8.2.2 F85y4 Modeling Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 8.2.3 F85y4 Analysis and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 9. FUTURE WORK: : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 110 9.1 Adaptive Focus Mesh Re nement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 9.2 Runtime Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112 10. SUMMARY AND CONCLUSIONS : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 114 APPENDICES A. AMDAHL'S LAW : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 116 B. BENCHMARK CASES : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 117 C. NOTE TO THE USER : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 119 D. IFRF F85Y4 OXYFLAM1B UPS INPUT FILE : : : : : : : : : : : : : : : : : : : : : 120 E. DETERMINING WHICH CELLS HAVE BOUNDARY FACES : : : : : : 132 F. RAY DIRECTION REASSIGNMENT FOR FLUX RAYS : : : : : : : : : : : 133 REFERENCES : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 134 vi LIST OF FIGURES 3.1 Pseudo code for the ray marching algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . 41 3.2 First step of ray marching. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 3.3 Second step of ray marching. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 3.4 Third step of ray marching. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 3.5 Fourth step of ray marching. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 3.6 Fifth step of ray marching. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 3.7 Sixth step of ray marching. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 3.8 First step in a cubic cell. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 3.9 First step in a non-cubic cell with y x=2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 3.10 Multilevel mesh. The processor owns the ne mesh information indicated by the red square, and is passed coarsened versions of the mesh for regions outside of the local extents. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 3.11 The segment length of the rst step in a new level is a function of the location of the ne cell of interest relative to the coarser cell. . . . . . . . . . . . . . . . . . . . . 47 3.12 Specular re ection about the surface normal, N. Note that Ry = Iy: . . . . . . 48 3.13 Re ection ray marching. This gure demonstrates that the values of Tmax and TDelta need not be adjusted after a re ection. The x and y faces are breached in the same order even after a re ection. For example, after the ray has reached the rst nonblack boundary, indicated by point A, the following breach occurs at a y face as shown both at point B and point Br. Subsequently, there is another re ection at point Br followed by an x breach both at points C and Cr. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 3.14 Exact solution for the radiative- ux divergence compared to RMCRT with 1000 rays and 413 cells for Modest's benchmark case 13.2. . . . . . . . . . . . . . . . . 49 3.15 Exact solution for the radiative- ux divergence compared to RMCRT with 100,000 rays and 413 cells for Modest's benchmark case 13.2. . . . . . . . . . . . . . . 49 3.16 Convergence rate of the L2 error norm of RMCRT on benchmark 13.2 as a function of ray number from 1 to 1,00,000 rays. The blue circles represent the L2 error norms from RMCRT data with a ray threshold of 103 on a grid of 413 cells. The red line is a curve t of these norms. The pink circle represents the L2 error norm of RMCRT with N=1,000,000 rays and a threshold of 104. The red circle represents the L2 error norm of RMCRT with N=1,000,000 and a threshold of 104, but on a grid of 1013 cells. . . . . . . . . . . . . . . . . . . . . . . . . 50 3.17 RMCRT-computed (*) and analytical (-) surface uxes along the bottom plate, where the analytical solution is provided by the case described by Siegel at varying optical thicknesses, and scattering albedo. [1]. . . . . . . . . . . . . . . . . . . 50 4.1 A hexahedron with its six faces labeled. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 4.2 Ray convergence for boundary uxes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 4.3 RMCRT vs. Burns' converged solution at 10M rays. . . . . . . . . . . . . . . . . . . . . 60 4.4 RMCRT vs. Burns' converged solution at 100k rays. . . . . . . . . . . . . . . . . . . . . 60 4.5 RMCRT vs. Burns' converged solution at 1,000 rays. . . . . . . . . . . . . . . . . . . . . 61 4.6 Invariability of the L1 error norm of the uxes as a function of direction for the symmetric Burns and Christon case. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 4.7 Ordering of the 12 face normals of a hexahedron. . . . . . . . . . . . . . . . . . . . . . . . 62 4.8 Grid convergence of the 12- ux method of computing the ux divergence. Note the positive slope indicating growing error with ner mesh resolutions. . 62 4.9 Grid convergence analysis of the six- ux method of benchmark1. Grids of size 33, 93, 273,413; and 813 were analyzed. The L1 error norm decreases with mesh re nement at an approximately rst order rate. . . . . . . . . . . . . . . . . . . . 63 6.1 Digitized data of the imaginary component of the refractive index of ash with 5.47 percent hematite. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 6.2 Digitized data of the real component of the refractive index of ash with 5.47 percent hematite. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 6.3 Hottel-Saro m absorption coe cients produced through the RMCRT interface (blue) and DOM interface (green) at timestep 1. . . . . . . . . . . . . . . . . . . . . . . . 75 6.4 Hottel-Saro m absorption coe cients produced through the RMCRT interface (blue) and DOM interface (green) at timestep 10. . . . . . . . . . . . . . . . . . . . . . . 76 6.5 Hottel-Saro m absorption coe cients produced through the RMCRT interface (blue) and DOM interface (green) at timestep 100. . . . . . . . . . . . . . . . . . . . . . 77 6.6 Hottel-Saro m absorption coe cients produced through the RMCRT interface (blue) and DOM interface (green) at timestep 100, with values clipped at 0.5 to allow viewing of the smaller values. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 6.7 Strong scaling of the reciprocal monte carlo radiation model performed on the Titan GPU cluster. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 6.8 Strong scaling analysis of RMCRT on 8 to 1728 processors using 100 rays per cell on a domain of 1503 on the Updraft CPU cluster. . . . . . . . . . . . . . . . . . . 79 7.1 Simple schematic indicating . The view angle of the radiometer is 2 . . . . . 91 7.2 With current radiation solvers,the radiometer at location \a" would register an under-predicted ux, whereas the radiometer at location \b" would over- predict the incident ux. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91 7.3 The ray e ect is visible in this cutaway that shows the spatially varying radiative ux of a simulation of a propellant re. . . . . . . . . . . . . . . . . . . . . . . 92 viii 7.4 Random numbers that vary uniformly with the polar angle produce incorrectly distributed points clustered near the poles as shown in (a). Random numbers appropriately weighted by the polar angle produce points that are correctly distributed (b). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 7.5 Distributing random points on a sphere requires the scaling by arccosine of the random number R, where R = 2R2 1, such that R has a range of -1 to 1. 93 7.6 Schematic representation of the view factor of a circular disk as viewed by a point at the bottom of a cylinder. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 7.7 Absorption coe cient as speci ed in the case described by Burns and Christon. 94 7.8 Results of the participating media physics veri cation test. The virtual ra- diometer model demonstrates excellent agreement with the numerically-exact solution. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 7.9 L2 error norm as a function of ray number, using the converged solution of 1.4M rays to compute relative error. The solution converged at the expected order of approximately 1 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 7.10 Convergence of the virtual radiometer results relative to the quasi-exact solution 95 7.11 Validation results indicate good agreement between the experimental results (black bars) and model results at varying emissivities (dots). . . . . . . . . . . . . . . 95 7.12 Radiative- ux divergence along a center-line of Burn's benchmark case. Two runs were made using the Sandia virtual radiometer using a segment length of 1 cm (Sandia) and 1 mm (Sandia Fine). Four runs were made using the Arches radiometer using grid resolutions of 1013; 2013; 3013; and 4013: . . . . . 96 7.13 Grid convergence analysis for U of U virtual radiometer. . . . . . . . . . . . . . . . . 97 8.1 Radiative ux divergence from RMCRT (+) and DOM (line) on a z-line through an x-y slice in the center of the domain of an IFRF case. RMCRT used 50 rays per cell, DOM used SN8 in this case. . . . . . . . . . . . . . . . . . . . . . 104 8.2 Radiative ux divergence for several RMCRT cases and a DOM SN8 case (hollow circle) on a z-line through an x-y slice in the center of the domain of an IFRF case. The three RMCRT cases are, respectively, 1 ray per cell with re ections and an emissivity of 0.5 (line), 10 rays per cell with an emissivity of 0.5 without re ections (dot), and 10 rays with black walls (dash). . . . . . . . . 104 8.3 Radiative ux as calculated by RMCRT (lines) vs. DOM (+) for varying positions along the z direction of a center-line through the boiler. . . . . . . . . . 105 8.4 Filtered solution of the radiative ux as calculated by RMCRT(lines) vs. DOM(+) for varying positions along the z direction of a center-line through the boiler. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 8.5 Boiler con guration of the IFRF f85y4 oxy am 1 cases. . . . . . . . . . . . . . . . . . . 106 8.6 Burner geometry of the IFRF f85y4 oxy am 1B case. . . . . . . . . . . . . . . . . . . . . 106 8.7 Reduction in noise from a postprocessing ltering technique wherein the ra- diative ux values are ltered with each of the six adjacent cells. . . . . . . . . . . 107 ix 8.8 The radiative ux of ltered 10,000 ray-per-cell RMCRT uxes (blue) non l- tered 10,000 ray-per-cell RMCRT uxes (red), and experimental values (black). 107 8.9 Radiative ux at various x values for un ltered 10,000 ray-per-cell RMCRT (red) and experiment (black) at z = 15 cm. . . . . . . . . . . . . . . . . . . . . . . . . . . 108 8.10 Radiative uxes from six di erent timesteps shortly prior to the nal timestep of the simulation (colored lines), and experimental values (black circles). . . . . 108 8.11 Temperature eld values for modeled (line) and experimental results (dots) at various x locations for y = Y /2, and z = Z/2, where Y = domain width, and Z = domain length. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 9.1 Fine CFD mesh on which the uid/particle equations are solved (left). Single level mesh at a coarser level for the radiation physics (center). Multilevel, adaptive-focus mesh, a.k.a data onion (right). . . . . . . . . . . . . . . . . . . . . . . . . . . 113 9.2 RMCRT with adaptive focus mesh re nement compared with the Burns and Christon numerical solution. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113 x NOMENCLATURE E = Exact solution R = Uniformly distributed random number L = Length of one side of a cube = Statistical error 2 = Statistical variace = Solid angle Q3 = 3rd quadrant of a 2D Cartesian grid Q4 = 4th quadrant of a 2D Cartesian grid A = Rotation matrix ~b = x,y,z coordinates rotated into the appropriate orientation ~x = x,y,z coordinates prior to rotation into the appropriate orientation = Radiometer rotation angle about the y axis = Radiometer rotation angle about the x axis = Radiometer rotation angle about the z axis v = Radiometer view angle r = Ray polar angle r = Ray azimuthal angle ir = Ray index number l = A point along a ray T = Temperature N = Number of rays traced per cell q = Radiative ux I = Radiative intensity = Absorption coe cient ~n = Radiometer normal vector M = Number of discretized segment lengths of the ray f = Unity minus the fraction of intensity attenuated by all previous wall re ections = Optical thickness = Absorptivity G = Incident radiation function s = Scattering coe cient = Scattering phase function s = Pre-scattering direction vector s0 = Post-scattering direction vector i = Incident b = Black body o = Outgoing w = Wall n = Net r = Ray cv = Control volume sur = Surface TRI = Turbulent Radiation Interactions DivQ = Radiative Flux Divergence RMCRT = Reverse Monte Carlo Ray Tracing DOM = Discrete Ordinates Method LES = Large Eddy Simulation GPU = Graphics Processing Unit CPU = Central Processing Unit SN4 = 24-direction DOM SN8 = 80-direction DOM FSK = Full Spectrum k-Distribution Property Model RHS = Right hand side RNG = Random Number Generator IFRF = International Flame Research Foundation E = Exact solution R = Uniformly distributed random number L = Length of one side of a cube = Statistical error 2 = Statistical variace = Solid angle Q3 = 3rd quadrant of a 2D Cartesian grid Q4 = 4th quadrant of a 2D Cartesian grid A = Rotation matrix ~b = x,y,z coordinates rotated into the appropriate orientation ~x = x,y,z coordinates prior to rotation into the appropriate orientation = Radiometer rotation angle about the y axis = Radiometer rotation angle about the x axis = Radiometer rotation angle about the z axis v = Radiometer view angle r = Ray polar angle r = Ray azimuthal angle ir = Ray index number l = A point along a ray T = Temperature N = Number of rays traced per cell q = Radiative ux I = Radiative intensity = Absorption coe cient ~n = Radiometer normal vector M = Number of discretized segment lengths of the ray f = Unity minus the fraction of intensity attenuated by all previous wall re ections = Optical thickness = Absorptivity G = Incident radiation function s = Scattering coe cient = Scattering phase function s = Pre-scattering direction vector s0 = Post-scattering direction vector i = Incident b = Black body xii o = Outgoing w = Wall n = Net r = Ray cv = Control volume sur = Surface TRI = Turbulent Radiation Interactions DivQ = Radiative Flux Divergence RMCRT = Reverse Monte Carlo Ray Tracing DOM = Discrete Ordinates Method LES = Large Eddy Simulation GPU = Graphics Processing Unit CPU = Central Processing Unit SN4 = 24-direction DOM SN8 = 80-direction DOM FSK = Full Spectrum k-Distribution Property Model RHS = Right hand side RNG = Random Number Generator IFRF = International Flame Research Foundation xiii ACKNOWLEDGMENTS It is only because of the tremendous amount of help I have received from friends and colleagues that I was able to complete this work. I would rst like to thank my advisor, Professor Philip J. Smith, for his support and encouragement. Dr. Smith's energy and optimism created an environment that was condusive to creativity and productivity. His interest and concern for his students' welfare are indicative of his sincerity as a manager and friend. Dr. Jeremy Thornock was a true mentor to me, giving guidance in areas ranging from Unix tips to writing philosophy. During his stay in the United Kingdom, he still made time to collaborate with me, even at odd hours of the night. Dr. Todd Harman performed the work that made scaling of RMCRT possible. Dr. Tony Saad introduced me to the multi-platform program \Inkscape" which greatly enhanced the quality of my graphics. He was always willing to lend an eye to help track down code bugs. I thank Dr. Eddings and Dr. Silcox, two of my committe members, who took time out of their busy schedules to review and give feedback for my proposal and this dissertation. Dr. Charles Reid was active in helping me get up to speed in the world of high perfor- mance computing, and even provided a well-documented template for this dissertation. The work of this dissertation is built upon the work of many others, most notably, the prior graduate student Dr. Xiaoxing (Paula) Sun. Many of the concepts of ray tracing I learned from her thesis and from communication via telephone and email. Dr. Steven Parker's wisdom in ray tracing e ciency has been invaluable. His insight into parallel ray tracing schemes aided in solving the dilemmas inherent to extreme scaling. I owe thanks to Dr. Mathieu Francoeur who taught an advanced radiation course in a manner that helped elucidate many complex topics. Lyubima Simeonova performed the work that allowed for the discretization of spectral- radiation properties. I would like to thank Professor James Sutherland for his help in ensuring the e ciency of my model as well as his help in implementing the models developed by Ms. Simeonova. Dr. Sean Smith was always willing to lend an ear when I needed to sound out my thoughts. He was also instrumental in the formulation of statistical analyses for various tests. This work would not have been possible without the support of the Center for High Performance Computing at the University of Utah. This research was made possible by generous grants from the National Nuclear Security Administration under the Advanced Simulation and Computing program through DOE Research Grant # DE-NA0000740 and DE-NT0005015. Finally, I wish to thank my wife Emily, and our two sons, Jayden, and Aaron for their love and support. xv CHAPTER 1 INTRODUCTION All models are wrong. Some are useful. -Philip J. Smith. 2 1.1 General Radiation Transport Radiation is critical to heat transfer in high temperature combustion environments [2]. Radiative heat transfer in turbulent ames a ects the gas phase and all associated combustion chemistry. The turbulence-radiation interactions (TRI) have been shown to be of great importance in turbulent ames [3, 4, 5, 6, 7]. Modeling TRI is di cult due to the nonlinear coupling between temperature, species concentrations and radiative intensities [8, 5]. Further, coupling parallel simulations of combustion and radiation poses several numerical challenges. The uid mechanics of combustion are local phenomena, making them amenable to domain decomposition. Conversely, radiation is a long-distance, and potentially all-to-all phenomenon, creating di culties for domain decomposition. Further, accurate calculation of radiative transfer requires spatially resolved information regarding the temperature and species composition elds. Traditional modeling of turbulent systems has included Reynolds-averaged Navier Stokes (RANS) simulations. The RANS model provides, at a relatively low computational cost, temporally averaged values of the gas temperature and species elds. However, for highly nonlinear physics such as radiation, spatial averaging in this manner may introduce large errors [9]. Alternatively, direct numerical simulation (DNS) fully resolves the power spectrum of eddies, giving access to the full spatial distribution of the pertinent elds. Wu et al. [10] and Deshmukh [11] have coupled a monte carlo ray tracing method to solve the radiative transfer equation in a turbulent reacting ow modeled by DNS. Unfortunately, due to its high computational demand, DNS remains impractical for use in large-scale combustion simulations. In contrast, large eddy simulations (LES) resolve the largest uid motions, down to the Nyquist limit for a given turbulent eld and mesh resolution. Therefore, LES gives a better description of the uid mechanics than RANS, and does so without the computational cost of DNS [12]. The various levels of accuracy in which thermal radiation has been modeled in com- bustion simulations have been reviewed by Snegirev [13] and Sacadura [14]. The radiation models cited include the optically-thin approximation [15], the discrete ordinates method [16, 17] the discrete transfer method [18], and the nite volume method [19]. The optically thin model neglects the participation of media (absorption, emission, and scattering), and has been shown to introduce error even in small ames [20]. The remaining methods model radiative emission as energy emanating along a set of prede ned directions. Such angular discretization su ers from the ray e ect [21]. Conversely, monte carlo techniques that select randomly-distributed rays at each time step have low sensitivity to angular discretization and are applicable regardless of media optical thickness [13]. In his earlier 3 work, Snegirev presents a RANS model of buoyant turbulant di usion ames coupled with statistical modeling of thermal radiation transfer. Although Snegirev's earlier model used a robust formulation of thermal radiation via the monte carlo method, his turbulence model su ered from the lack of resolution of the sharply varying uctuations of temperature and species concentrations that are lost in RANS approximations. More recently, Snegirev coupled monte carlo radiation with large eddy simulations [22, 23]. These simulations operated on modest meshes of approximately 498,000 control volumes. Other examples of coupled LES monte carlo radiation models are rare, but include the work of Zhang et al., in which a larger mesh of 4.7 million cells were used [24]. In this emerging eld remain several unresolved issues. One such issue is how to deal with increasing mesh sizes that are run on increasingly parallelized highperformance computing systems. Modern super computers are composed of hundreds of thousands of computing cores, and are used to run simulations with meshes composed of billions of computational cells [25, 26, 27]. Strong scaling in massively-parallel computing is di cult to obtain due to load imbalancing and interprocessor communication demands. The strong scalability limit of a code is reached when an increase in the number of parallel processors used on a xed problem size does not result in a decrease in computational wall time [28]. Numerous examples of parallelized monte carlo radiation models were investigated by the author, most of which cease to scale beyond 100 processors [9, 24, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43]. An example of a coupled combustion and monte carlo radiation model that has a scalability limit above 200 processors was not found in the literature. In this work, a new numerical technique has been developed to perform large eddy simulations of largescale combustion ows coupled with a three-dimensional reciprocal monte carlo ray tracing radiation model. This model has been optimized for use on high-performance computing systems, runs on meshes composed of 8 million+ cells, and achieves nearly-ideal strong scaling to over 16,000 processors. As mentioned above, uid mechanics and most other phenomena in combustion physics are localized phenomena and are readily solved on domain-decomposed meshes. In this work, to represent the long-range e ects of radiation, the computational domain is recom- posed at the time of each radiation solve. This is accomplished over a message passage interface, through which each processor shares the temperature and radiative-properties elds (absorption coe cients, scattering coe cients, and boundary information) with all other processors. This reconstructed domain combined with the mutually exclusive nature of reciprocal monte carlo rays is amenable to massive parallelism. Radiative properties 4 are calculated via the Hottel and Saro m method [44]. For e ciency, these calculations are precomputed and tabulated in narrow increments of temperature and species mixture fraction values. 1.2 History of the Monte Carlo Method The monte carlo method was developed by Enrico Fermi, John von Neumann, and Nicholas Metropolis for the Manhattan Project during World War II [45]. The method modeled the behavior of neutrons and involved following the histories of these neutrons during ssion. In the early 1960s, Fleck, and later Howell and Perlmutter, applied this method to thermal radiation heat transfer [46, 47, 48, 49, 50]. The monte carlo method was later adopted and greatly enhanced by the computer graphics community [51, 52, 53]. Since then, the monte carlo technique has been widely applied to practical problems with participating media [46]. Additionally, this method has been used to produce semiexact solutions to problems that have no known analytical solution [54]. The monte carlo method is not without its drawbacks. Because it is a statistical technique, the variance of the error is inversely proportional to the number of rays used in the sampling. The standard deviation is therefore a function of the square root of the number of samples, resulting in slow convergence rates [46]. The monte carlo method is also computationally expensive when run in serial on a single processor. However, because of the uniqueness of the solutions to each of the rays, reverse monte carlo ray tracing (RMCRT) is amenable to massive parallelism. In certain cases shown in later sections, this attribute outweighs the low serial e ciency, making RMCRT an attractive option in such scenarios. 1.3 Other Numerical Radiation Methods There is a handful of other numerical models that perform approximations of the RTE. These include the spherical harmonics method, the discrete ordinates method (DOM), the zonal method, the discrete transfer method (DTM), and the nite volume method. Each of these models is given a cursory overview in the proceeding paragraphs. The spherical harmonics method approximates the RTE with a set of simultaneous partial di erential equations. This approach was developed by J.H. Jeans in the early 20th century as a way to model stellar radiative-heat transfer [55]. The set of partial di erential equations that represent the RTE is signi cantly simpler than the RTE itself, allowing radiation calculations to be carried out by hand. It was therefore quite popular prior to the advancements of electronic computing. The major downfall of this method is that an 5 increase in the accuracy of the solution comes at the price of higher order partial di erential equations, and subsequently much longer computation times [56]. Similar to the spherical harmonics method, the discrete ordinates method (DOM) trans- forms the RTE into a set of simultaneous partial di erential equations. This is accomplished by discretizing the angular domain into a well-de ned set of ordinate directions, and in- tegrating along the path lengths. It was rst proposed by Chandrasekhar for work in stellar radiation [57], and was later adopted by the neutron transport community [58]. Fiveland and Smith then optimized the DOM for use in general radiative heat transfer applications [59, 60]. Since its inception, this method has been used in many applications including furnaces, diesel engines, and composite materials [61, 62, 63]. Although the discrete ordinates method performs well in serial, an obvious path to GPU-based parallelism is not available [17, ?]. It also su ers from an angular discretization artifact known as the ray e ect, which can become particularly pronounced if surface uxes to small objects are of interest [21]. Further details of the discrete ordinates method can be found in the books by Kourgano , Davison, and Murray [64, 65, 66]. Another method exists that, unlike the previous two methods, discretizes the domain spatially, rather than angularly. This method is known as the Zonal Method, and was developed by Hottel and Cohen in 1958 [67]. Each subvolume, or zone, was treated as isothermal, whereby radiative exchange rates between the zones could be computed. An energy balance throughout the domain is then performed to solve for the unknown heat uxes. Initially the method could handle only nonscattering, gray gases with constant absorption coe cients, but in 1968, Hottel and Saro m extended the method's capabilities to include isotropically scattering media with nonconstant nongray absorption coe cients [44]. The discrete transfer method (DTM) was developed by Lockwood and Shaw in 1981 as an attempt to address some of the shortcomings of previous numerical radiation techniques. It was developed speci cally for general combustor prediction procedures. The DTM is somewhat of a chimera of several methods and includes features of the zonal, monte carlo, and ux model solution methods. In the words of Lockwood, this method is based on the solving of representatively directed beams of radiation within the enclosure between the known wall boundary conditions and on the subsequent computing of the radiation sources which arise within the nite di erence control volumes of the ow procedure due to the passage of the beams. It is fast, exact applicable to complex geometries, and it retains in evidence the physics of the problem by avoiding complex mathematics [18]. 6 Unfortunately, in the process of simplifying the mathematics, this model has been shown to represent poorly the e ects of anisotropic scattering, and similar to the DOM, still falls victim to the ray e ect [68, 69, 61, 70, 71]. To address the prevalence of the ray e ect in the DTM, Li later modi ed the model by further discretizing each angular direction into 9 rays with discrete quadrature. This mitigated the ray e ect without increasing the number of simultaneous partial di erential equations to be solved [72]. More recently, a method has been developed that is catered to unstructured meshes and is designed to accommodate simultaneously for heat conduction, convection and radiation. This method, known as the control volume nite element method, was developed by Rousse in 1999. True to its name, this method creates unique, nonoverlapping control volumes around each node in the domain. It then performs operations similar to the discrete ordinates method by discretizing the angular domain and solving a set of coupled partial di erential equations related to the RTE, which has been modi ed to account for convection and conduction. This method has similar advantages and disadvantages to the DOM, with the additional advantage of being more amenable to unstructured meshes [69]. 1.4 Parallel Ray Tracing Algorithmic parallelism involves dividing tasks among multiple processors simultane- ously to solve a given problem [73]. In theory, parallelism can lead to a wall-time speedup that is proportional to the number of processors used in parallel. Ideal speedup occurs when the time spent passing information between processors is negligible compared to the work done by each processor, and when no computers sit idle while others complete their tasks. The former constraint is met by e cient code writing that ensures that all or most of the information a given processor needs to complete its computations is available to that processor. The latter constraint is met via proper load balancing that distributes the work load equally between processors. The rst attempts to parallelize monte carlo methods did so on single-instruction multiple- data stream (SIMD) machines. On this architecture, vectors or groups of rays were dis- tributed to the processors. This, however, led to poor scalability as it necessitated the termination of all rays before generating a subsequent group. More recent algorithms generate new rays at the onset of termination of a ray to avoid creating idle time amid processors [74]. Load balancing may be accomplished in at least two general ways{dynamic load balanc- ing, and static load balancing. Static load balancing schemes distribute the load only once, at the onset of computation. However, because the computation times of di erent regions 7 of a domain are problem-dependent and rarely uniform, static load balancing often creates idle time amid processors. Dynamic load balancing begins with an initial load distribution that can then be modi ed if and when computers complete their original tasks. Heirich and Arvo have noted that when total computational time is of importance, static load balancing is insu cient for parallel ray tracing on massive high-performance computing systems [75]. Strong scaling in massively-parallel computing is di cult to obtain due to load im- balancing and interprocessor communication demands. The strong scalability limit of a code is reached when an increase in the number of parallel processors used on a xed problem size does not result in a decrease in computational wall time [28]. Some methods to parallelize ray tracing for radiation applications do so by passing between processors rays that have breeched the local grid extents. Wise and Abel of Princeton and Stanford Universities, respectively, have expended considerable e orts on their parallelization strat- egy of their ray casting scheme for the coupled hydryodynamics radiation code, ENZO. Unfortunately, strong scaling analysis of this algorithm showed no improvement of com- putational time for parallelism at 70 or more processing cores [29, 30]. This is perhaps due to the large amount of communication that resulted from the passing of rays between processors. Kuiper et al: developed a similar parallel ray tracing scheme for computing radiation transport in stellar formations, but to date, have not demonstrated strong scaling beyond 64 processors [31]. In 2009, Gentile successfully scaled ray tracing for radiation calculations to 128 processors. Any further increase in processors resulted in no further decrease in computational time [43]. Numerous examples of parallelized monte carlo ra- diation models were investigated, most of which ceased to scale beyond 100 processors [29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 9, 40, 24, 41, 42, 43]. An example of a coupled combustion and monte carlo radiation model that has a scalability limit above 200 processors was not found in the literature. As part of this dissertation, a parallel-capable radiative- ux solver that scales to 16,000 processors has been created. Such scaling is attained by avoiding the passing of rays by stitching together a decomposed domain into a global domain for each processor, and uti- lizing reverse monte carlo rays that traverse freely throughout the domain until extinction. The computer graphics community has taken advantage of graphics process Units (GPUs) to run parallel ray tracing. The shared memory of the GPU and breakdown of each GPU into hundreds of cores allows for the simultaneous tracing of hundreds of rays per GPU. GPUs are allowing for real-time volume rendering in the graphics community [76]. In 2008, Despres showed that a simpli ed monte carlo ray tracing algorithm performed 6 times faster 8 on an nVidia 7600 GS GPU than it did on a Xeon 2.4 GHz CPU. This speedup is further increased for larger numbers of computational cells, indicating that at ner meshes, the GPU will dominate in ray tracing e ciency [77]. Todd Harman and Alan Humphry of the University of Utah have translated the RMCRT model described in this dissertation into the CUDA language, allowing RMCRT to run on the GPU nodes of the supercomputer, Titan. The CUDA version of RMCRT attained ideal strong scaling to 16,000 cores. 1.5 Previous RMCRT Work By Paula Sun A former student of our laboratory, Paula Sun, made signi cant contributions to a reverse monte carlo ray tracing algorithm. She performed an extensive literature review on the topic, consulted a ray tracing expert, and designed a standalone radiation solver [45]. Her dissertation outlined the governing equations for a RMCRT scheme for an absorbing, emitting, scattering medium with gray, di use/specular boundaries. She also identi ed sev- eral benchmark cases that have proven useful in the veri cation of the algorithms described in this dissertation. Although her work was not compatible with the Uintah framework and was not designed for parallel performance, it paved the way for this work that has such capabilities. 1.6 Research Objectives Accurate radiative-heat transfer solving methods that handle complex physics are in- herently computationally expensive. The legacy radiation solver, the discrete ordinates method, which is used within the Arches component, is not amenable to GPU applications. It also su ers from the ray e ect inherent to xed-angular discretization techniques. To address the above issues, the following objective has been proposed and met: Develop a reverse monte carlo ray tracing scheme that computes the radiative- ux divergence for interior cells, and the net radiative ux for boundary cells of a computational domain. To accurately represent reality, the model incorporates the following physics Nonhomogeneous, absorbing, emitting media Homogeneous, isotropic, scattering media Black or gray wall absorption and emission Specular wall re ections for arbitrary re ectivities Complex domains which may include intrusion features 9 Fluxes with arbitrarily sized view angles, orientations, and locations This nal item comprises the \virtual radiometer" feature and allows the user to de ne the parameters of radiometers such as locations, orientations, and view angles. This will allow the user to perform validation and uncertainty quanti cation with ux measurements from experimental radiometers. From the onset of this research project, parallel capability of the radiation solver has been an area of focus. Several parallelism techniques were considered. These include 1. Parallelize by patch domain decomposition with global information, eliminating the need to hand-o rays 2. Parallelize by patch domain decomposition with local information only, and hand-o rays between the patches 3. Parallelize with a hybrid of (1) and (2). Each processor stores the ne information for a di erent region of the grid which becomes the focal region for that processor. Each processor is then passed a coarsened version of the rest of the domain, eliminating the need to hand o rays The details of these techniques as well as their respective advantages and disadvantages are given in Section 2.2. 1.7 Expected Signi cance High performance computing systems are being equipped with an ever-increasing number of graphics processing units (GPUs) [78]. Examples include the Keeneland initial delivery system (KIDS) and the conversion of the DOE Jaguar system to Titan [79, 25]. Radiation calculations, which represent a small fraction of the overall physics involved in a complex re simulation, can consume 20 to 80% of the total simulation time when run with existing radiation solvers such as the discrete ordinates method. For a given level of accuracy, this research suggests that RMCRT run on CPU clusters is faster than the DOM (see section 9.2). A GPU-compatible version of RMCRT has demonstrated an additional 4 to 5X improvement over the CPU version [?]. One of the key developments of this research has been the capability to stitch together a patch-decomposed domain into one that is uni ed. This has eliminated the need to hand o RMCRT rays between processors during run time, reducing interprocessor communication allowing for the parallel scalability demonstrated above. 10 1.8 Broad Description of Modeling An e cient reverse monte carlo ray tracing algorithm includes a handful of key compo- nents. Among them are a fast, reliable random number generator, an e cient ray marching algorithm, and generality to handle phenomena including re ecting rays, non-cubic cells, and scattering. The model ultimately solves for radiative uxes and radiative- ux diver- gences, so appropriate equations for those physics are also used. The developed model is one that gives the user the ability to dial-in the desired accuracy/speed through variable numbers of rays and/or mesh resolutions. Another primary objective has been to develop a radiation model that can scale in massive parallelism. The chosen route to parallelism has been to reconstruct a domain-decomposed mesh to allow for the uninterrupted tracing of rays to extinction. In summary, the modeling undertaken for this research has been two-fold: First, pro- duce in the University of Utah's ARCHES code, a high- delity RMCRT algorithm that incorporates the major physics of radiation, and second, create a scheme that allows for e cient massive parallelism of this algorithm. 1.9 Components and Architecture The Uintah framework was originally designed by the Center for the Simulation of Accidental Fires and Explosions (C-SAFE). This software suite was funded by the Depart- ment of Energys Accelerated Strategic Computing Initiatives (ASCI) Academic Strategic Alliance Program (ASAP). Uintah provides a software system in which complex multiscale, multiphysics chemistry and engineering simulations can be performed. To provide a means under which a variety of problems can be simulated, Uintah makes use of a component sys- tem under which large pieces of software (components) can be implemented independently [80, 81]. To the maximum extent possible, RMCRT was developed generically to allow the model to be used by any Uintah component. Because members of our research group have the greatest experience in the Arches component, validation and veri cation were performed primarily therein. The Arches component solves the conservative, nite volume, compressible, low-mach formulation of the Navier-Stokes equation [82]. It was designed to solve the mass, mo- mentum, mixture fraction, and thermal energy governing equations inherent to coupled turbulent reacting ows. Arches is a large eddy simulation component, and as such resolves the uid motion down to the Nyquist limit of the power spectrum. It has relied heavily on the legacy discrete ordinates method (DOM) for the radiative source term solve [83]. 11 Performance pro ling indicates that the DOM solve represents the most computationally intensive portion of an Arches combustion simulation. Although DOM can be made to scale [84], it is unknown whether this radiation solver is amenable to GPU computing and embarrassing parallelism. Conversely, RMCRT lends itself to scalable parallelism because the intensities of the rays are mutually exclusive. Therefore, multiple rays can be traced simultaneously from any location in the computational domain [85]. CHAPTER 2 MODEL DESCRIPTION 13 The robustness of monte carlo ray tracing to predict radiative uxes has previously been established by Snegirev and Modest among others [13, 86]. In a massively parallelized framework, where the computational domain is heavily decomposed, traditional forward monte carlo methods (FM) su er due to the large number of traced rays that never reach the subdomain of interest handled by a particular processor. Therefore, an emission-based reciprocity method (ERM) similar to that developed by Tesse etal. [87, 88] has been implemented. In this model, optical paths (i:e:, rays) propagate away from cells whose radiative-source terms are currently being solved, and the emission from the cells along the paths are attenuated in a reciprocal manner back to the origin cells. In this manner, rays are generated only from cells where results are expected [89]. The governing equation for reciprocal monte carlo ray tracing in nonhomogeneous, participating media was developed by Walters and Buckius [90]. Speci cally, Ii;k = Zlk 0 Ib;cv (l0)exp[ Zlk l0 (l00)dl00]dl0 + Io;sur(Tw)exp[ Z lk lw (l0)dl0]; (2.1) where Ii;k represents the incident intensity at location k, represents the absorption coe cient, and l0 represents the locations of the segment lengths along a ray. Subscript w indicates a wall property, subscript cv indicates a control volume value, and sur indicates a surface value. In a discretized domain, piecewise homogeneity is assumed and Eqn. 2.1 is posed in the following form, Ii;k = MX m=1 Ib;v(e R lk lm (l0)dl0 e R lk lm1 (l0)dl0 ) + Io;s(Tw)e R lk lw (l0)dl0 ; (2.2) where M represents the total number of discretized segment lengths of the ray. Thus, at a given location a distance l0 away from the starting point of the ray, R lk lm (l0)dl represents the optical thickness of the path from lk to l0 and R lk lm1 (l0)dl is the optical thickness of the path from lk to the previous l0. The intensities from each of the rays of a cell are then weighted according to the solid angle that each ray subtends [45]. Assuming uniform distribution of the rays, each ray subtends N steridians, where is 4 Sr. for ow cells, 2 for boundary cells, and N is the number of rays per cell used in the simulation. The radiative ux is then calculated from the intensities of the rays, weighted by the discretized solid angle, qi = N XN r=1 Ii(ir)cos( (ir)); (2.3) 14 where Ii(r) and (r) represent for a particular ray, the incoming intensity and angle from the cell boundary normal, respectively. The radiative ux divergence of ow cells is calculated as r q = (4 Ib Z 4 Iind ); (2.4) where R 4 Iind is represented by XN r=1 Ir 4 N : The origin locations of the rays are distributed randomly throughout the cell. In this model, the Mersenne Twister random number generator is used to select the origin locations and ray orientations [91]. In Cartesian meshes, randomly distributed ray location generation is trivial, and is accomplished by scaling three random numbers with the length, width, and height of the cell, respectively. Randomly-distributed ray orientation requires more treatment. Rays propagating from boundary surfaces are distributed over a hemisphere as follows, = 2 R1; = arccos(R2); ^x = sin( ) cos( ); ^y = sin( ) sin( ); ^z = cos( ); where, R1 and R2 are random numbers that vary between zero and one, and are the azimuthal and polar angles, respectively, and ^x, ^y, and ^z are the resulting components of the direction vector in Cartesian coordinates. The above formulation generates rays that are randomly distributed over a hemisphere with a normal vector in the positive z direction. The ray marching model adjusts the ray directions into the proper orientation based on the surface normal of the boundary cell at hand. This is accomplished by changing the order and sign of the three direction components. For ow cells, no re-orientation of the direction vector is necessary, as the rays are randomly distributed over the full 4 Sr. Direction assignment is as follows, 15 ^z = 2R1 1 r = p 1 z2 = 2 R2 ^x = r cos( ) ^y = r sin( ) Ray marching proceeds in a manner similar to that described by Amanatides and Woo [92]. The location and orientation of a ray are used to calculate the distances to each of the three potential exit faces of the cell in which the ray currently resides. The shortest of these three distances is used in determining through which face the ray will pass. This information is then used to calculate the next cell in which the ray will reside. Re ections are allowed to occur on nonblack boundary faces. The temperature and emissivity of the boundaries are referenced, and the intensity at the ray-boundary intersection is computed and attenuated to the target location. For nonblack surfaces, the ray re ects o the surface, and the subsequently referenced intensities are attenuated both by the total optical thickness and by the absorption of the boundary. Ray marching continues until the optical thickness of a ray exceeds a predetermined threshold value. In general, the threshold is met when fe < 0:01, where is the current optical thickness, and f is unity multiplied by one minus the absorptivity of each intersected boundary, (1 b). In other words when less than 1% of the intensity from a location in the domain will reach the target cell, ray tracing of the current ray ceases. 2.1 Scope The scope of this dissertation includes the demonstration of a parallel, reverse monte carlo ray tracing (RMCRT) model that incorporates the following physics. Nonhomogeneous, absorbing, emitting media Homogeneous, isotropic, scattering media Black or gray wall absorption and emission 16 Specular wall re ections for arbitrary re ectivities Complex domains which may include intrusion features Fluxes with arbitrarily sized view angles, orientations, and locations Each of these items has been discretized and modeled in the RMCRT algorithm. Some detail on the implementation of each of the above items is given below. To accommodate for item (1), nonhomogeneous, absorbing, emitting media, each cell in the domain traces N number of rays from each cell within the patch, and follows those rays throughout the domain until extinction. These rays pick up and attenuate intensity according to the radiative transfer equation (RTE) for an absorbing, emitting and scattering medium, according to the following expression, @I(s; ^ s) @s = Ib(s) ( + s)Ib + s 4 Z 4 I(^si) (^si; s)d i: (2.5) The intensity of the cells through which the rays pass are accumulated and attenuated along their respective paths back to the origin. The intensity contributions from all the rays for a given cell are summed, and scaled by N , where represents the solid angle, and is generally speci ed as 4 Sr:, a full sphere. This scaled value gives, for a speci c cell, the contribution from the intensities from all cells in the entire domain, and is frequently denoted as the incident radiation function, G. This value is then be used to yield the radiative- ux divergence as follows, r q = (4 Ib G): (2.6) These equations are solved and discretized for use in a Cartesian mesh. When item (2), homogeneous, scattering media, is introduced, the ray marching algo- rithm allows the ray direction to change at any cell boundary within the domain. This involves calculating scattering lengths to determine when a ray will scatter, as well as determining the new ray direction based on the scattering phase function. The distance between scattering events is determined from the cumulative probability function, l s = 1 s ln 1 R s ; (2.7) where s is the scattering coe cient, and R s is a random number with a range of zero to one. 17 The azimuthal and polar angles of the new direction are be determined from R = R 0 0 R 0 (s s0)sin 0d d 0 R 2 0 R pi 0 (s s0)sin 0d d 0 ; (2.8) and, R = R 0 (s s0)sin 0d R 0 (s s0)sin 0d ; (2.9) where is the scattering phase function and is a function of the particles in the media as well as the original direction vector, s, and the new direction s0. When isotropic scattering is assumed, as in this model, the following formulation is used to compute the components of the post-scattering direction vector. z = 2R 1 r = p (1 z2) = 2 R y = rsin( ) x = rcos( ) For item (3), wall absorption and emission, the intensity of a ray is augmented by the emission and absorption of the boundaries that the ray strikes. When a ray strikes a wall, the intensity that is accumulated at the ray origin is increased by the following amount, Io;sur(Tw)exp[ R lk lw (l0)dl0]; where Io;sur is the intensity of the wall at the surface location struck by the ray, and the exponential function represents the attenuation of that intensity on its path back to the ray origin. To incorporate these physics into the RMCRT model, Eqn. 2.2 is modi ed to include this additional term, Ii;k = MX m=1 0 @Ib;cv(Tm)(exp[ Zlk lm2 (l0)dl0] exp[ Zlk lm1 (l0)dl0]) 1 A+Io;sur(Tw)exp[ Zlk lw (l0)dl0]: (2.10) For item (4), specular wall re ections for arbitrary re ectivities, several modi cations were introduced. First, a subroutine was created that is called at the moment a ray enters a non ow cell. This new cell is referenced for item (3). Because the location of the ray is now outside of the ow domain, the ray is backed out into the ow region. The next step is to accurately determine the new direction of the ray. Assuming the domain is one in which the faces of all cells line up with the Cartesian directions, as is the case in Arches, then this step is simpli ed signi cantly. The calculations essentially reduce to ipping the sign of the component of the direction vector that corresponds to the boundary face that was struck. See section (3.8) for a more detailed derivation. 18 For item (5), complex domains which may include internal features, an additional test along each step of the ray marching algorithm is implemented. The test is simple and is essentially an if/else statement conditional on the boundary information of the current cell. If the boundary information corresponds to a ow cell, ray marching continues without interruption. Otherwise, the subroutines for items (3) and (4) are called. Item (6) has been labeled as the virtual radiometer model. Its implementation consists of allowing the user to specify in an input le a radiometer location, orientation, and view angle. These parameters would correspond to a true radiometer that was placed in a re during a combustion experiment. This has allowed validation e orts to be performed between experiments and simulations. In addition to the six items mentioned above, the developed RMCRT model o ers versatility in accuracy and speed. The user may select any positive integer number of rays to produce the desired accuracy/speed ratio. Further, the number of rays used for boundary uxes, ux divergences, and virtual radiometer uxes remain independent. This gives the user the ability to separate the ux solution from the r q solution. For example, if the user is most interested in the uxes at a handful of locations or a series of locations that lie in a single plain, as often occurs in validation cases, the user can run with relatively few rays to produce basic radiative e ects in the ame, while using extremely ne angular discretization at the locations of interest. 2.2 Parallel RMCRT RMCRT lends itself to massive parallelism because the intensity of each ray is indepen- dent of those from all other rays. Multiple rays are traced simultaneously at a given time step. In theory, one could simultaneously trace as many rays as one has processing cores. For this method to scale, however, inter-processor communication should be low relative to the work of ray tracing. This is challenging, considering that radiation is a globally-coupled phenomenon. Several parallelism techniques were considered. These include 1. Parallelize by patch domain decomposition with global information, eliminating the need to hand-o rays 2. Parallelize by patch domain decomposition with local information only, and hand-o rays between the patches 3. Parallelize with a hybrid of (1) and (2). Each processor stores the ne information for a di erent region of the grid which becomes the focal region for that processor. Each 19 processor is then passed a coarsened version of the rest of the domain, eliminating the need to hand-o rays Item (1) has the advantage of avoiding the message passing interface (MPI) for each ray. The disadvantage, however, is that this scheme would require information about the entire domain to be stored on each processor. Speci cally, the absorption coe cient, temperature, boundary information and scattering coe cients elds, each of which is a double precision variable, of O(n3) cells, where n is the number of cells in a single direction, would be stored on each processor. This is a strict requirement, and would limit the size of the domain to approximately 3503 for the current computers. Another advantage of item (1) is that it uses existing software in the Uintah framework that allows the domain to be decomposed by patches, or subsections of the domain. In this approach, a processor computes the ux divergence of each cell within its patch by tracing N rays from each cell and allowing them to march through the entire domain. Item (2) known as the data onion, or adaptive-focus mesh, is not limited by memory as item (1) is, as it does not store any global information. However, it is su ers from another factor{interprocessor communication. Because radiation is a global phenomenon, physically-accurate rays would travel through the entire domain. Yet for item (2), as soon as a ray leaves a local patch, the information necessary to compute an incoming intensity is not available to the processor that owns the origin cell of the ray. Therefore, the processor would either request the information of the adjacent patch, (and in optically thin domains, the adjacent to the adjacent, and so on), or it would hand-o the ray to an adjacent processor. This would amount to the handing o of millions of rays for a single time-step of a typical simulation, and would burden the MPI, likely degrading the parallel performance. Item (3) is somewhat of a hybrid of items (1) and (2) in that the domain is decomposed into patches, the ne information existing only locally, yet the global information still existing on the processor, but in a coarsened state. The simplest case would be a ne focal level that has the same resolution as the rest of the domain, i:e:, the whole domain is on the same re nement level, leading to the case described in situation (1). A slightly more advanced case is one where all but the local patch are coarsened to a single, coarse state, leading to a total of two levels. More advanced cases can be imagined where the patches adjacent to the focal patch are coarsened slightly, the ones adjacent to those are coarsened moderately, and the most distal patches are coarsened heavily. Such a multilevel approach allows the information most pertinent to the incoming intensity to be left relatively un-coarsened, yet information that is distal to be less accurate, and therefore less memory 20 intensive. Justi cation for coarsening the distal information is two fold. First, the cells that are not proximal to a given set of focal cells is separated by an optical thickness that will attenuate the intensity between them, thereby limiting the e ect on the focal cells. Second, the distal cells subtend a smaller solid angle than the proximal cells, again limiting the e ect on the focal cells. This second e ect is demonstrated numerically in that a distal cell will have a far smaller probability of being intersected by a given ray than would a proximal cell of comparable size. Using item (3), the most pertinent information of the domain is preserved, intra-timestep message passing is avoided, and the distal regions are coarsened to an extent that bal- ances the desired accuracy with the memory constraints of the computer at hand. The radiation solver has been designed to run with a coupled computational uid dynamics (CFD) component such as Arches or Wasatch, and as such, deal with information that is domain decomposed. To stitch together and coarsen the nonfocal region of the domain, the domain information that is scattered between the processors is requested by each processor. Therefore, at the beginning of each time-step, a considerable amount of information passes through the message passing interface. Shy of reverting to the use of (2), which has its own message passing requirements, this \start-up" message passing is unavoidable. Fortunately, the amount of start-up message passing decreases when multiple levels are used, as coarser and coarser versions of the domain are being stitched together. To optimize run-time e ciency, items (1) and (3) were selected for development. At present, item (1) has been successfully developed and implemented and favorable results have been obtained. Item (3) has also been developed, but results were less favorable. Both methods involve a mesh reconstruction technique that allows ray generation and propagation to occur on a each processor independently, negating the passing of rays, and minimizing inter-processor communication. At each radiation solve, the decomposed domain used for parallelism of the combustion model is recomposed and the radiation-speci c eld variables from each processor are shared with all other processors. Information sharing is accomplished through a message-passing interface. 2.3 Adaptive Focus Mesh Re nement The patch recomposition technique mentioned in item (1) of section 2.2 is viable only so long as each processor has su cient memory to store the temperature, absorption coe cient, boundary information, and scattering coe cient elds for the entire domain. However, for a typical processor commanding 4GB of RAM, segmentation faults begin appearing for 21 domains larger than approximately 3503 cells. A potential solution to this problem is to discretize the domain such that each processor is handed a subset of the domain, called a patch. For regions outside the patch, the processor is handed a coarsened version of the rest of the domain. This takes advantage of the existing framework common to parallelized LES codes such as the Arches code developed previously by the Institute for Clean and Secure Energy [93]. To create a mesh with multiple re nement levels, framework and data management adjustments have been made to the ARCHES component. The ray tracing algorithm has also been modi ed to run on this adaptive-focus mesh (a.k.a. Data Onion). Modi cations include adjustment of the step size once a re nement level boundary has been reached, as well as an algorithm to determine the new ray location upon entering a new level. At present, the Data Onion approach is producing unfavorable timing and accuracy results. Further investigation may reveal a aw in the programming or perhaps the method itself. Details of the implementation are given in section (9.1). Although the adaptive-focus mesh addresses the issue of global storage of radiation eld values, thus avoiding the passing of rays between processors, it does not completely avoid the MPI. Arches and other Uintah components perform parallelism via patch domain decomposition. In this paradigm, an intact version of the entire domain simply does not exist. Portions of the domain are stored amid the various processors, so during the creation an adaptive-focus mesh, each processor gathers the patches from all other processors. The more time that is spent on passing information between processors, the less e ciently the algorithm will scale. To mitigate the amount of data that is handled on the MPI, aggressive coarsening on regions of the domain that are distal to the focus region is used. Justi cation for this is two-fold. First, the physical distance between the distal and focus regions increases the optical thickness. Therefore, any contribution from the distal regions will be attenuated exponentially along that path length, thus decreasing the e ect of the distal region on the origin cells. Second, the distal regions subtend a smaller solid angle than do proximal regions, again limiting the impact on the focus cells. 2.4 Function Abstraction Included in RMCRT are four distinct physical phenomena that require information re- garding intensities along rays. Speci cally, these phenomena are boundary uxes, imaginary surface uxes, virtual radiometer uxes, and ux divergences. To accommodate for these various phenomena, the intensity solver was isolated into its own function that could be called independently from the other solvers. For instance, the virtual radiometer model uses 22 cell-centered rays that have direction vectors distributed across a small solid angle de ned by the view angle of the radiometer, while surface uxes use face-distributed rays that have direction vectors that span the 2 hemisphere of the surface, yet both of these methods ultimately require the intensity integrated along each ray. Therefore, it was most intuitive to have the ray-marching and intensity solver abstracted into its own function that takes as arguments the location and direction of a ray as speci ed by the solver at hand. There was a marginal increase in computational time (8%) that was introduced when the intensity solver was abstracted into a function. However, the advantages of this approach include the following, Reduced number of conditional statements that correspond to the physics required for various cases The avoidance of cloning portions of the code into multiple les Cleaner code that is easier to maintain and enhance. 2.5 GPU Implementation The author collaborated with Todd Harman and Alan Humphrey of the University of Utah as they developed a GPU version of RMCRT. The radiation model was translated from its original language of C++ into the GPU-speci c language, CUDA. This allowed the model to be run on the GPU processors of the super-computing cluster, Titan. Strong scaling was demonstrated up to 16,000 processors. Results and discussion can be found in section (6.4) CHAPTER 3 RAY TRACING 24 3.1 Cubic Cells This section demonstrates the procedure of the ray marching algorithm in determining the piecewise path taken by a ray as it moves through the computational domain. The algorithm depicted here is a modi ed version of that described by Amanatides and Woo [92]. Pseudo code is given in Figure 3.1 The description for the pseudo code is as follows. For simplicity the description will be carried out in two dimensions, although the same principles are applied in three-dimensional cases. Let us begin with a small computational domain with two rows of cells, and six columns of cells, indexed as (i,j), where i represents the row number starting from the bottom of the domain, and j represents the column number starting from the left of the domain. Let the vertical lines represent x planes, and the horizontal lines represent y planes. A ray is to be traced from cell (1,1) from the ray origin indicated by the blue circle in Figure 3.2. Assume that a ray direction has already been determined, and will be represented by the long dashed line in the following gures. The length of the ray segment from the origin to the rst cell wall that is breached by the ray is determined. At this point in the algorithm, it is unknown whether the x plane or a y plane will rst be breached by the ray. To determine which plane will be breached (and subsequently determine the next cell that the ray will enter) the distance from the origin to the rst x plane, TmaxX (green), is compared to the distance from the origin to the rst y plane, TmaxY (red) in the direction of the ray. In two dimensions, this is accomplished by a simple \if/else" statement (additional comparisons are necessary in three dimensions). Once the shortest of the two distances is determined, the current cell is updated by use of the step variable. In this case, the shortest direction is TmaxX, so the ray steps in the x direction. The distance traveled (in this case the green line above) is stored as disMin, and will be used later in an algorithm that determines ray attenuation. Because the x component of the direction vector is positive, the cell index is incremented byone in the x direction, and the current cell becomes (1,2). The ray has progressed, and the scenario is now represented by Figure 3.3. Note that the rst segment of the dashed ray has now become solid. The distance from the origin to the rst y plane has not changed, so the red line representing TmaxY, remains unchanged. However, the distance from the current location to the next x plane has changed, and its length has been increased by the distance TDeltaX. TDeltaX represents the distance required to traverse one cell length in the x direction. With the updated value for TmaxX, again the green line is compared to the red line. Because TmaxX is still shorter 25 than TmaxY, the ray again steps in the x direction, incrementing i. The current cell then becomes (1,3), and TDeltaX is stored as disMin for later use. Figure 3.4 demonstrates the current state of the ray. Now, the second segment of the dashed ray has become solid, and TmaxX has been increased by TDeltaX. Note that for a given ray in a uniform mesh, TDeltaX and TDeltaY do not change, as the distance required to traverse a cell in the x or y direction is independent of the current cell. Again TDeltaX is compared to TDeltaY. The green line is still shorter than the red, so the ray again steps in the positive x direction, TDeltaX is stored as disMin, and cell (1,4) is reached as shown in Figure 3.5. The third segment of the ray has now become solid. The value of TmaxX is increased by TDeltaX, and for the rst time in this example, TmaxX exceeds the length of TmaxY. Thus, the ray steps in the y direction and TDeltaY is stored as disMin. Because the y component of the direction vector is positive, j is incremented and the ray enters cell (2,4), as shown in Figure 3.6. The fourth segment of the ray is shown as solid, and TmaxY has been increased by the distance TDeltaY. It is visually apparent that TmaxX is much shorter than TmaxY, and therefore the comparison in the algorithm would lead to a subsequent step in the x direction into cell (2,5), as shown in Figure 3.7. The fth segment has become solid, and TmaxX has been increased by TDeltaX. At this time, the reader should be familiar enough with the algorithm to predict that the next two steps will be in the positive x direction, at which point the ray would either terminate if the wall is black, or re ect based on the re ection algorithm which will be discussed in later sections. Also in later sections, the reader will nd a discussion of the attenuation of radiation from each of the cells along the ray path back to the origin, and the importance of disMin will become apparent. 3.2 Random vs. Cell-Centered Origins Within RMCRT, the user has the option to specify either randomly-distributed, or cell- centered ray origins. The cell-centered rays have random direction vectors, but for a given cell, all share the cell's center as the origin location. Conversely, the randomly-distributed rays each have a unique origin location as well as a random direction. In this case, the rays are distributed equally throughout the cell, giving a better representation of the cell's incident intensity. At a 5% increase in computation time,the cost of the random method is marginal. However, when speed is of most importance, the cell-centered option may be used. 26 3.3 Non-cubic Cells For domains that contain cells with nonunity aspect ratios, (non-cubic cells), addi- tional considerations become necessary in the ray marching algorithm. Throughout the above described algorithm, distances are handled in units of cell width, which can then be converted to physical units simply by multiplying by the cell width of x. However, when x is not equal to y or z, this conversion becomes nontrivial, and requires additional computations. By normalizing the lengths of y and z by x, the number of additional computations is minimized, such that only six lines of code require modi cation. Explanation of this procedure is as follows. Take the distance x to be of unit length. Then y and z have normalized lengths of y x and zz x , respectively. The rst section of the algorithm that requires modi cation is then the determination of ray origins. Previously, for cubic cells, it was assumed that the origin was located at (i+rand(); j+rand(); k+rand()), where rand() represents a function call to the random number generator which returns a random number distributed between 0 and 1. For non-cubic cells, the random numbers for the y and z directions are scaled by y x and z x , such that the origin of a ray becomes (i + rand(); j + y xrand(); k + z x ; rand()). In this manner, the origins are randomly distributed throughout the cell, and not simply throughout a cube. The second portion of the algorithm in need of modi cation is the determination of the original TMax values. Recall that the initial TMax values represent the distance from the origin to each of the respective x,y, and z planes. For example, recall that for cubic cells, TMaxY is calculated as follows TMaxY = (j + sign[1] rayLocation[1]) invDirV ector[1]; (3.1) where sign[1] is a boolean with a value of 1 if the y component of the direction vector is positive, and zero otherwise. invDirV ector[1] represents 1 divided by the y component of the direction vector. For instance, in Figure 3.8, the origin is located at 17.343, 8.617. The direction vector has components of 0.7071, and 0.7071, such that the sum of their squares is equal to 1. Because the sign of the y component of the direction vector is positive, 1 is added to 17 to represent the location of a y breach, and that value is subtracted from the y value of the origin, then multiplied by the inverse of the y direction vector. Implementing Eqn. 3.1 TMaxY is computed as follows TmaxY = (8 + 1 8:617) 1 :7071 = 0:5416: (3.2) 27 This value is smaller than that of TmaxX, which is computed as follows, TmaxX = (17 + 1 17:343) 1 :7071 = 0:9291: (3.3) For non-cubic cells, however, when a given component of the direction vector is positive, the origin value is subtracted not from 1 + j, but from 1 y x +j. When the component of the direction vector is negative, this multiplication of y x becomes unnecessary. This is because the negative face value (8 in Figure 3.8) is independent of the skewness ratio. To elegantly handle the condition of multiplying by the ratio y x the following formulation is used for the more general case of non-cubic cells. TMaxY = (j + sign[1] y x rayLocation[1]) invDirV ector[1] (3.4) To illustrate, see Figure 3.9 where a cell with y x = 2 is superimposed onto the same set-up as illustrated in Figure 3.8. Here, TMaxX is still equal to 0.9291, but TMaxY is solved as follows TMaxY = (8 + sign[1] 2 1 8:617) 1 :7071 = 1:959: (3.5) Therefore, the ray will not breach the y face during the rst step, but will instead breach the x face and enter into the cell to the right. TDeltaY and TDeltaZ are solved in a similar manner, such that for non-cubic cells, the following formula holds. TDeltaY = invDirV ector[1] y x : (3.6) 3.4 Adaptive-focus Mesh To accommodate for memory and message-passing constraints, a mesh with multiple levels is passed to each processor. An example of one such mesh is shown in Figure 3.10. The ray tracing algorithm has been modi ed to accomodate such a mesh. Because the rst step of a ray on a new level is handled in a unique fashion, the general case of ray marching in the coarser regions is considered rst. Recall from Eqn. 3.6 that the values of TDelta are obtained from the direction vector and the normalized lengths of the cell in the x, y, and z direction. Therefore, for general ray marching in a coarsened domain, these values simply need to be scaled by the respective coarsening ratios in each of the Cartesian directions. Notice that this scheme can work with any arbitrary number of levels, by simply using the current level as the \coarse" level, and the previous level as the \ ne" level. 28 3.4.1 First Step in a New Level The rst step in a new level requires special attention because the values of Tmax cannot simply be incremented by TDelta of the previous level, nor can it simply be incremented by TDelta of the current level. This is demonstrated by the four di erent segment lengths through cell L1:01 of Figure 3.11. The four rays have equivalent direction vectors, but each enters the coarser cell through a di erent ne cell. With careful attention to the location of the ne cell relative to the coarser cell, the four segment lengths of the rays can be deduced. The cell indices of Figure 3.11 are written as if the left and bottom edges represent the boundary of the domain. However, even if this is not the case, equivalent indices can be obtained by use of the modulus operator (%). Let cur represent a Uintah vector that contains the nonmodultated cell indices of the ner cell after a new level has been reached, but before the indices have been mapped to the new coarser level. Then, to get the cell indices of the ne cell relative to the coarser cell, the following operation is performed: cur % C, where C is also a Uintah vector and represents the coarsening ratio between the two levels in each of the three directions. For example, C in the y direction is given as follows, Cy = yc yp ; (3.7) where yc represents the current cell spacing in the y direction, and yp represents the cell spacing in the y direction on the previous level. Notice that in Figure 3.11, after the y face has been breached, but before cur is mapped to the coarser level, the indices of cur would become 0,2; 1,2; 2,2; and 3,2. Performing the modulus operation on these values relative to their respective coarsening ratios yields the following values: 0,0; 1,0; 2,0, 3,0. Now, in order to determine the segment length of each of the rays through cell L1:0,1, the value of Tmax[dir] is subtracted from Tmaxprev. Tmaxprev is the length from the ray origin to the level boundary breach, and is equivalent in all four rays, and Tmax[dir] is equal to the length from the origin of the ray to the location where the ray exits cell L1:0,1, and is di erent for each ray. The distance of Tmax[dir] is a function of the location of the ner cell from which the ray entered the coarser cell. Note that for the ray leaving cell L0:3,1, its TDeltaX value for this rst step in the new level is equivalent to TDeltaX of the previous level. Therefore, the TmaxX value correctly describes the next x breach of this ray, and needs no adjustment for this rst step in the new level. The remaining cells L0:0,1, L0:1,1, and L0:2,1 however, will require an additional 3TDeltaX, 2TDeltaX, and TDeltaX, respectively, to be added to the current TmaxX values in order to accurately represent the location at which the next x breach will occur in this new level. 29 Noting the x component of cur, the indices are mapped to the appropriate factor that will be used to become a multiple of TDeltaX in determining the new TmaxX. This holds when the component of the direction vector that corresponds to the breached face is positive. When this component of the direction vector is negative, the mapping is trivial. Tables for the positive and negative examples can be found online at www.chpc.utah.edu/common/home/u0258978/public html. The mapping is accomplished by the variable lineup, which is then used to scale TDelta and update Tmax as demonstrated in the algorithm shown in Figure 3.1. The other component (or components, if in three dimensions) follows a similar procedure. If in three dimensions, the component normal to the page in Figure 3.11 would be handled in an identical fashion to the x component. The y component is also handled identically, given that the usual incrementation of TmaxY is handled prior to the executions of Algorithm (3.1). The modulus of the component of the index that corresponds to the breached face will always return 0, giving a lineup value of 0 for the positive cases and a value of 1 minus the coarsening ratio for the negative cases. When lineup is scaled with TDelta as shown in the algorithm of Figure 3.1, then Tmax in the rst step of the new level will be assigned appropriately such that the subsequent breach in the new level will occur on the wall of the coarser cell, as indicated in Figure 3.11. 3.5 When a Ray Leaves the Domain RMCRT uses Arches-speci c boundary information to determine when a ray has reached the extent of the ow domain. This approach uses a simple test of the boundary condition information to determine whether the next cell along a ray's path is a ow cell or not. If the boundary information of the next cell is representative of a ow cell, the ray perpetuates, if not, the ray either re ects or terminates depending on the boundary condition. For component generality, volume fraction could be used as an alternative, so long as at each step, the location of the ray is compared with the extents of the domain. Otherwise, a ray using volume fraction to locate boundaries would attempt to pass through all non-wall boundaries and reference values that do not exist, leading to segmentation violations. A naive approach to comparing the location of the ray with the domain extents, as well as a recommended approach are as follows. The naive approach to determining when a ray has left the domain extents is to compare each of the three components of the current location with the positive and negative extents of the domain. This would lead to the following six comparisons. low.x() <= cell.x() && 30 low.y() <= cell.y() && low.z() <= cell.z() && high.x() > cell.x() && high.y() > cell.y() && high.z() > cell.z(). However, because for each step, the face through which a ray will breach is known, this information can be used to reduce the number of comparisons to test whether or not the ray has left the domain. For instance, if a ray has just breached an x face, only the x component of the location with the x domain extents need be compared. Generalizing this approach, the following is obtained, low[face] <= cell[face] && high[face] > cell[face]; This simple modi cation saves 5% to 8% of the radiation compute time, and is the recommended approach for use in combination with the volume fraction test. 3.6 Stopping Criteria This section outlines the algorithm by which RMCRT determines when to cease the propagation of a ray. Let represent the optical thickness from the origin to the current location of a ray. The optical thickness is de ned as = x; (3.8) where x is the distance between the origin and the current point of the ray. Then, by the Beer-Lampert law, I = Ioe ; (3.9) where Io is the intensity at the current point, and I is the fraction of that intensity that arrives at the origin following attenuation through the medium. A threshold criterion may be set whereby the ray termination point can be determined. For instance, if a threshold value of 0.05 is chosen, the ray is terminated at the location where less than 5% of the initial intensity at a given location would arrive at the origin. 3.7 Intrusion Cells Intrusion cells are cells that lie within the computational domain, are not ow cells, and do not lie on the domain extents. Examples include solid objects that are placed in the ow of a computational domain and geometric protrusions of the domain boundary. It is 31 often of value to predict radiative uxes to intrusion cells, such as the e ect of placing an object in a re. Further, the existence of intrusion cells changes the nature of radiative heat transfer throughout the domain. To compute the uxes to surfaces in a simulation that may or may not contain intrusion cells, the RMCRT algorithm rst tags which ow cells are adjacent to boundaries, and which faces of the cells interface with the ow. Because intrusion can occur anywhere in the domain, a cell iterator that loops through all interior cells has been implemented. Within this cell iterator, a function that tests to determine if a given cell has a boundary face is invoked. This function loops through the six adjacent cells to the cell at hand, and returns \true" if at least one adjacent cell is a boundary cell. In addition, this function also returns a vector of enumerated values that corresponds to the faces that are adjacent to a boundary. Currently, this function uses Arches-speci c boundary information to determine if an adjacent cell is a ow cell or not. This can be changed for generality, but as is, the algorithm is elegant and e cient as it takes advantage of the integer values of the boundary information. For details of this function, see Appendix E. At this point, the algorithm loops through the cells that have at least one boundary, then loops through the vector of faces that are boundary faces. Second, for ray tracing to occur in the proper direction, the face value is used to determine the proper orientation of the rays. 3.7.1 Ray/Intrusion Boundary Interactions The inclusion of intrusion cells modi es the behavior of radiative transport primarily in two ways: emission from the intrusion cells to other ow cells; and re ection of radiation o the intrusion boundaries, resulting in a change in ray direction and/or intensity. To account for these phenomena, the RMCRT algorithm tests the cells along a ray path to determine whether the ray has encountered a boundary. This is accomplished by a function call that returns the boundary condition information of the current cell. If the current cell is determined to be a wall, whether of intrusion or domain-boundary, the subroutine that handles wall emission and re ection is invoked. 3.7.2 Parallel Implementation During patch domain decomposition, intrusion objects may be broken into two or more di erent patches. Currently, the RMCRT algorithm uses the Uintah-speci c container, called a stencil7, to store the ux values of boundary and intrusion faces. A C++ std::map was also tested, and the following modi cations were used. A map of maps was created as 32 a way to identify which cells on which patches contained intrusion cells. The more general map used the patch ID as its key, and as the value, another map. This more speci c map contained the cell indices and face as the key, and solved uxes as the values. A stencil7 variable, conversely, is a eld variable and as such requires no patch iden- ti cation information. Six of the seven variables of the stencil7 (w,e,s,n,b,t) are doubles that store the ux values of each of the faces. The faces that do not interface between boundaries and ow remain initialized to zero. The seventh value (p) is set to zero if none of the six faces of a ow cell interface with a boundary and to 1.00000000 if one or more of the faces do. Because the p value of the stencil7 is a double, where only a boolean variable would su ce, some memory is wasted. The bene t of the stencil7 approach is that the algorithm to assign ux values is simple and e cient. When placed inside a loop of all boundary and intrusion faces, the assignment is accomplished in a single line as follows boundFlux[origin][ face ] = sumProjI * 2 * pi/ nFluxRays, where sumProjI is the sum of the projected intensities of all rays, and nFluxRays is the number of rays used in the ux ray loop. 3.7.3 E ciency Considerations To improve the e ciency, it was assumed that the intrusion boundaries remain xed in time. The cells that contain faces for which a boundary ux is to be computed are catalogued in a simple vector. Therefore, once the vector containing the faces of all boundary and intrusion cells has been created, the need to loop through all boundary cells before doing ray tracing is avoided, and a simple vector iterator can be invoked, decreasing the computation time. 3.8 Re ections This section explains how the model represents the physics of specular re ections within a domain with nonblack walls. The model is optimized to work with a structured domain composed of rectangular hexahedrons with faces that are always aligned in the Cartesian directions. The properties of this type of mesh greatly reduce the overhead of handling re ections. To illustrate, consider the general form of the equation that determines the postre ection direction vector ~R given the original, incident direction vector ~ and the surface normal vector ~N [94]. ~R = ~ 2(~N ~ )~N (3.10) Equation 3.10 requires the use of three cosine functions, nine multiplications and three subtractions. Clever use of the properties of the Cartesian-oriented mesh reduced this 33 overhead to a single multiplication. The proceeding paragraph explains. In a Cartesian mesh, the six possible surface normals are as follows. Nnorth = 2 4 0 1 0 3 5 (3.11) Nsouth = 2 4 0 1 0 3 5 (3.12) Neast = 2 4 1 0 0 3 5 (3.13) Nwest = 2 4 1 0 0 3 5 (3.14) Nbottom = 2 4 0 0 1 3 5 (3.15) Ntop = 2 4 0 0 1 3 5 (3.16) Let face be a variable that can take on the value of x; y; or z, and represents the cell face that is struck during a re ection. Then the only nonzero component of ~N is Nface: Therefore, the other two components of ~R are equal to the corresponding components of ~ . To compute the remaining component, Rface, we begin with Rface = face 2(N I)Nface: (3.17) Recognizing that N I = jjOjjcos( i), and that the magnitude of is always unity, Eqn. 3.17 may be written as Rface = face 2cos( i)Nface: (3.18) Because the de nition of cosine is the adjacent component divided by the hypotenuse of the triangle formed between ~ and ~N , we substitute cos( i) = face Nface . With this substitution, Eqn. 3.18 becomes Rface = face 2 face Nface Nface: (3.19) Canceling Nface yields Rface = face 2 face = face: (3.20) In other words, Rface is simply the negative if face. Therefore, to determine the new direction vector following a re ection, one need only change the sign of the component that 34 corresponds to the face that was struck during the re ection. This is illustrated in Figure 3.12. What further adds to the simplicity of re ections in Cartesian grids is that the ray marching scheme is minimally a ected by a re ection. Recall from section (3.1) that when the ray location surpasses one of the Tmax values, a step is taken in the corresponding face direction. Conveniently, the sequence in which Tmax values are updated, and thus the sequence in which x; y; and z faces are breached remains unchanged even after re ections (see Figure 3.13). This means that the algorithm need not reset the values of Tmax and TDelta following a re ection. The only variable that requires adjustment is the step variable. This variable determines whether the ray will step in the positive direction of face or the negative direction of face: Because it is a function of the direction vector, when the face component of the direction vector is changed, it too must be changed in a similar manner. Therefore, similar to Eqn. 3.20, stepface is assigned to be the negative of its current value. One caveat of the ray tracing algorithm is that the re ection condition is not triggered until after the ray has stepped outside the domain. Neglecting to account for this would lead to inaccuracies such as re ections occurring within the boundary rather than on the surface. This is remedied by creating a variable that lags behind the current ray location by one step. Then, upon reaching the re ection condition, the current ray location is assigned to the value of the lagging variable. The RTE is also minimally a ected by re ections. For a moment, assume that in Figure 3.13, a re ection does not occur at point A, and that the ray location is currently at point B in Figure 3.13. Then, the black body intensity from the current cell (in this case the cell with the dotted border) is as follows, Ib = T4 cur= ; (3.21) where Tcur represents the temperature of the current cell. Then following augmentation by absorption and emission along the path back to the origin, the intensity from this cell is Ii = Ib[exp( A) exp( B)]: (3.22) Let us now investigate the e ect on Ii for the case in Figure 3.13 where re ections do occur at the domain boundaries, and the ray is currently at position Br of Figure 3.13. The current cell would then be the cell to the left of the dotted cell, and the intensity that reaches the origin from the segment length of A to Br is Ii = Ib[exp( A) exp( Br )]fs; (3.23) 35 where fs represents the remaining fraction of spectral intensity following all previous re ec- tions. In this case, fs = A;where A is the re ectivity of the boundary at point A. Note the minimal di erences between the formula for nonre ected incoming intensity, Eqn. 3.22, and the expression for re ected incoming intensity, Eqn. 3.23. To demonstrate the procedure of determining incident intensity after a series of re ec- tions, consider the intensity emitted between points Cr and Br attenuated back to the origin, where fs = Br A. This scenario may be viewed in a piece-wise fashion as follows. A small amount of intensity is emitted between points Cr and Br based only on the optical thickness between the two points, not the running total of the optical thickness back to the origin. This intensity would then be scaled by Br and then attenuated according to Beer's law, again using only the optical thickness between to point A and point Br. This value would then be scaled by A, and nally attenuated by Beer's law using the optical thickness from the origin to point A. Mathematically, Ii = Ib[(exp(0) exp(( Cr Br ))) Br (exp(0) exp(( Br A))) Aexp( A): (3.24) However, for any series of consecutive, independent optical thicknesses, e:g:; 1; 2; and 3, the following relationship holds [exp( 2) exp( 3)]exp( 1) = exp(( 1 + 2)) exp(( 1 + 3)): (3.25) Furthermore, by the commutative property of multiplication, Br and A of Eqn. 3.24 are factored into the single term, fs and placed at the end of the equation. This yields Ii = Ib[exp( Br ) exp( Cr )]fs: (3.26) This equation is mathematically equivalent to Eqn. 3.24, but is much more tractable as it does not necessitate the storage of independent optical thicknesses for each ray section between re ections. It requires only the running total of the optical thickness as well as the optical thickness from the prior step. In RMCRT, Eqn. 3.26 is repeated for each step of each ray to get the total contribution for all rays. In the model, this is accomplished in a ray-marching loop within the ray loop as sumI += sigmaT4OverPi[prevCell] * ( expOpticalThick prev - expOpticalThick ) * fs, where sumI is the ongoing sum of all steps of all rays for a given cell at a given time step. 3.9 Re ection Veri cation Testing To assess the accuracy of the model, results were compared against an analytical solution provided in Section 13.2 of Modest [86]. 36 Here two in nite parallel black plates are separated a distance L apart. The medium between the plates is gray, with an optical thickness between the plates of L. The solution to the radiative- ux divergence at location is given by, dq d ( ) = 2 (T4w T4m )[E2( ) + E2( L )] 1 + (1= 1)[1 2E3( L)] ; (3.27) where = x and L = L. E1 and E2 are exponential integrals, and are given by E1(x) = Z1 0 ex= d ; (3.28) E2(x) = Z1 0 ex= d ; (3.29) where = cos( ): In this case, the following values were used in the calculation of the exact solution to the radiative- ux divergence, L = 1; (3.30) Tw = 1000K; (3.31) Tm = 1500K; (3.32) = 1: (3.33) To create a numerical model of an in nite parallel plates, a cubic domain with black top and bottom surfaces and perfectly re ecting side surfaces was implemented. The analytical solution at varying values of x that corresponded to the cell centered locations of the numerical solution was computed using a MATLAB script. To compute E1 and E2, the script used trapezoidal integration of 105 discretization points for d . The rst nine digits of the obtained values of the exponential integrals remained insensitive to a factor of 10 increase in the discretization, and the rst six digits of the exponential integrals matched the six digits tabulated by Modest. The advantage of computing the exponential integrals over using the table is that we are now not limited to the relatively sparse number of values in Modest's table, i:e:; we computed the exponential integrals at cell centers for any arbitrary resolution without resulting to interpolation between tabulated values. Furthermore, the computed values have a tolerance of 109 rather than 106. The exact solution that results from using the computed exponential integrals is indicated by the blue lines of Figures 3.14 and 3.15. The values of the radiative- ux divergence computed using RMCRT are given in green in the same gures. f The L2 error norms were computed 37 using the values at the cell centers of the discretized points along the vertical line between the two plates as well as the values of the exact solution at corresponding locations. The equation used for the L2 norm is given by L2 = Xnc i=0 p (Ei Ni)2; where Ei and Ni represent, respectively, the exact solution and the numerical solution at a given cell center. nc represents the number of cells between the plates and is given by nc = L dx ; where dx is the width of a cell, and L is the distance between the plates. The L2 norm as a function of ray number, N; converged at the expected order, O1=2 for 1 < N < 105. For values of N greater than 105; the convergence rate slowed. This is explained by the nite error from the threshold error and grid resolution error becoming nonnegligible for large values of N. The pink and red circles shown in Figure 3.16 have larger N and ner grid resolutions, allowing for an analysis of ray error. 3.10 Scattering Radiative scattering is the redirection of radiation within participating media. Three separate phenomena are responsible for this redirection [86]. 1. Re ection o the surface of particles in the media. 2. Refraction of the radiation after passing through a particle. 3. Di raction of radiation that passes close to particles. Generally, the physical scale at which combustion simulations take place is signi cantly larger than the size of the particles in the media. Therefore, in the RMCRT model, the three scattering phenomena are not di erentiated and a single scattering coe cient, s is used for a given computational cell. RMCRT models scattering as follows. The scattering length, l, is selected as, l = log(R) s ; (3.34) where R is a random number that varies between 0 and 1, and s is the scattering coe cient of the current cell at the time the scattering length is calculated, which is either the origin, or the cell at the location where the most recent scattering event has occurred. Concurrently, 38 the current length, lc is set to zero. As the ray propagates throughout the domain, lc is updated as follows lc = lc + ls; (3.35) where ls is the segment length of the current step, and is computed using disMin; a variable from the ray marching algorithm, and the cell width in the x direction, x, as follows ls = x disMin: (3.36) lc is updated as the ray propagates until the following condition is met lc = l: (3.37) Once lc has met or exceeded the size of l, a scattering event occurs. At a scattering event, a new direction for the ray is chosen. For isotropic scattering, the direction is chosen in the same manner as the original direction for a nonboundary cell whose rays will subtend a full 4 Sr. This is accomplished as follows. z = 2R 1 (3.38) r = p 1 z2 (3.39) = 2 R; (3.40) where R is a random number uniformly distributed between 0 and 1, and z; r; and are the components of the direction vector in polar coordinates. For convenience, these values are converted to Cartesian coordinates, as follows ~d = rcos ; rsin ; z; (3.41) where ~d is the new direction vector. To avoid division in later computations the inverse of this vector is calculated as follows, ~dinv = 1 ~ dx ; 1 ~ dy ; 1 ~ dz : (3.42) In accordance with the ray marching algorithm, new step and sign values are assigned. Each of the three components of step and sign are assigned based on the corresponding components of the new direction vector. If a component of the direction vector is positive, then the corresponding component of step and sign are assigned the value of 1. Otherwise, the values of -1 and 0, respectively, are assigned. Then, based on the current location of 39 the ray, the inverse direction vector, sign, the values of tMax and tDelta are calculated as was demonstrated in Section (3.1). tMaxi = curi + signi i x locationi; (3.43) tDeltai = abs(~di) i x ; (3.44) where i is the x; y; or z component, and i x is the ratio of the cell width in the ith direction relative to the cell width in the x direction. step is then used in the incrementation/ decrementation of cur; the variable that represents the indices of the current cell. Let face represent x; y; or z, and assume that it is updated based on the cell face through which a ray has most recently passed. During some scattering events, there is a sign change for the face component of step, e:g:; the x component of step was -1 prior to a scattering event that occurred on an x cell face, yet +1 after the scattering event. In this circumstance, the ray is scattered back into the cell through which it would have passed had it not scattered. To account for this form of scattering, cur is assigned to the previous cell's indices. Note that in other scenarios, step may change sign, yet cur need not be re-assigned to the previous indices. For instance, if the x component of step changes sign at a scattering event that occurs on the y face, then simply the adjustment of step and sign will account for the correction of the order in which the ray will march from the cells in the domain. In this case, no cell need be referenced more than once as there is only one segment length per cell along the ray's path. In the prior case, however, there will be multiple ray segments within the same cell, requiring cur to be referenced more than once for intensity and attenuation calculations. The RMCRT model accounts for this phenomenon. For the implemented numerical method of scattering, scattering events always occur on cell faces. Then, at a scattering event, one of the three tMax values will be zero since the ray location will lie on a cell face. The corresponding tDelta is added to this tMax value to avoid erroneously stepping immediately in the face direction. After each scattering event, lc is reset to zero, and a new scattering length is chosen according to Eqn. 3.34. This procedure continues until the ray is terminated according to the threshold value. 3.10.1 Veri cation The selected scattering veri cation case was described by Siegel [1]. This case includes a 1m unit cube with cold, in nite parallel plates on the top and bottom, and cold, mirror sides. The cube is lled with absorbing, emitting, scattering media at T = 64.7K. Analytical 40 solutions were given for the surface uxes at the top and bottom plates at varying optical thickness, and varying scattering albedo, ! according to the following equation q( )= T4 i = c(1 + 3 2 c ) 4 3 : Analytical and computed solutions are shown in Figure 3.17. 3.10.2 Properties of Pulverized Coal Particles In the Arches combustion simulations, a common particle in the media is pulverized coal particles. To accurately calculate a scattering coe cient for participating media, the complex index of refraction of the particle is used as an input parameter. As part of this work, research into previously performed experimental studies was performed. One such study suggests that a value of 1:8 2i, where i is the square root of -1, can be used to obtain e ective radiative properties of pulverized coal properties in a radiative eld where the dominant wavelengths are on the order of 10 m [95]. The RMCRT model has been set up to receive this input value. This input value can then be used within a property calculation model, such as that developed by Simeonova [96]. TmaxX = (17 + 1 17:343) 1 :7071 = 0:9291 (3.45) 41 while the normalized intensity is greater than the minimum thresholdf while the ray is in the domainf update the cell's location; // Determine which cell the ray will enter next by nding the // shortest segment length if X is shorter than Y and Z, then face = X if Y is shorter than X and Z, then face = Y if Z is shorter than X and Z, then face = Z step in the face direction; disMin = tMax[face] - the previous tMax value tMax previous = tMax[face] tMax[face] = tMax[face] + tDelta[face] update ray locations g end domain loop pick up any wall emissivity re ect the ray if the wall isn't black update intensity g end intensity loop Figure 3.1. Pseudo code for the ray marching algorithm. Figure 3.2. First step of ray marching. 42 Figure 3.3. Second step of ray marching. Figure 3.4. Third step of ray marching. 43 Figure 3.5. Fourth step of ray marching. Figure 3.6. Fifth step of ray marching. 44 Figure 3.7. Sixth step of ray marching. 45 17, 8 17.343, 8.617 0.7071 0.7071 Figure 3.8. First step in a cubic cell. 46 17, 8 17.343, 8.617 0.7071 0.7071 Figure 3.9. First step in a non-cubic cell with y x=2. 47 Figure 3.10. Multilevel mesh. The processor owns the ne mesh information indicated by the red square, and is passed coarsened versions of the mesh for regions outside of the local extents. L0:0,1 L0:1,1 L0:2,1 L0:3,1 L1:0,1 Figure 3.11. The segment length of the rst step in a new level is a function of the location of the ne cell of interest relative to the coarser cell. 48 N I R Iy Ry Figure 3.12. Specular re ection about the surface normal, N. Note that Ry = Iy: Figure 3.13. Re ection ray marching. This gure demonstrates that the values of Tmax and TDelta need not be adjusted after a re ection. The x and y faces are breached in the same order even after a re ection. For example, after the ray has reached the rst nonblack boundary, indicated by point A, the following breach occurs at a y face as shown both at point B and point Br. Subsequently, there is another re ection at point Br followed by an x breach both at points C and Cr. 49 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 2.5 3 3.5 4 4.5 5 5.5 x 10 5 divQ [W/m3] location between plates [m] Exact RMCRT Figure 3.14. Exact solution for the radiative- ux divergence compared to RMCRT with 1000 rays and 413 cells for Modest's benchmark case 13.2. 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 3 3.5 4 4.5 5 5.5 x 10 5 divQ [W/m3] location between plates [m] Exact RMCRT Figure 3.15. Exact solution for the radiative- ux divergence compared to RMCRT with 100,000 rays and 413 cells for Modest's benchmark case 13.2. 50 0 1 2 3 4 5 6 −3.5 −3 −2.5 −2 −1.5 −1 −0.5 0 log of L2error Norm log of Number of Rays y = − 0.49218*x − 0.24088 data 1 linear data 2 data 3 Figure 3.16. Convergence rate of the L2 error norm of RMCRT on benchmark 13.2 as a function of ray number from 1 to 1,00,000 rays. The blue circles represent the L2 error norms from RMCRT data with a ray threshold of 103 on a grid of 413 cells. The red line is a curve t of these norms. The pink circle represents the L2 error norm of RMCRT with N=1,000,000 rays and a threshold of 104. The red circle represents the L2 error norm of RMCRT with N=1,000,000 and a threshold of 104, but on a grid of 1013 cells. scattering albedo=0 0 2 4 6 8 10 0 0.2 0.4 0.6 0.8 1 1.2 1.4 Optical Thickness, ! Radiative Flux, (w/m2) 0.3 0.6 0.8 0.9 0.95 Figure 3.17. RMCRT-computed (*) and analytical (-) surface uxes along the bottom plate, where the analytical solution is provided by the case described by Siegel at varying optical thicknesses, and scattering albedo. [1]. CHAPTER 4 FLUX CALCULATIONS 52 4.1 Explanation of Boundary Fluxes To compute the boundary uxes, a similar methodology as was implemented for the virtual radiometer model was used. In RMCRT, rays are generated over a hemisphere, rotated into the the appropriate hemisphere for the boundary at hand, traced as usual, then weighted by the cosine of the polar angle from the surface normal. The details of each of these steps are given below. 4.2 Generating Rays on a Hemisphere Because a hemisphere is symmetric about the surface normal, the azimuthal, , is assigned simply as 2 R1, where R1 is a random number between 0 and 1, and thereby achieving the appropriate range of of 0 to 2 . For the polar angle, , because the area of a given ring of the hemisphere is a function of the polar angle, our random number is scaled by the arccosine in order to achieve equi-distribution of rays throughout the solid angle, = acos(R2): 4.3 Rotating Rays Onto a Hemisphere When a ray direction has been selected, initially, the ray will be oriented in the positive z direction, as if it were originating from the top face of a cell. This direction is adjusted to lie within the appropriate hemisphere for the face at hand. For a structured Cartesian mesh, all of the surface normals of the cells are aligned in the coordinate directions. This greatly simpli es the rotation of the rays as it negates the necessity of using a rotation matrix, as was done for virtual radiometers with arbitrary orientations. To re-orient a rays into a new direction such that the rays originate from the face at hand, a simple rearrangement of the vector indices was implemented. This adjustment takes place as follows, where face is an enumeration with the following order: E,W,N,S,T,B. Notice that this enumeration is slightly di erent than the face enumeration that is passed in from a call to the Uintah type \face" iterator, which has the order: W,E,S,N,B,T. A simple array called RayFace, with values [1,0,3,2,5,4] is used to ameliorate the problem, as the RayFace[Uintah face] will return the proper faces. With the proper face enumeration, the direction is reassigned onto the face at hand, and the sign of one of the components may be reversed as well, if the current face is E,N, or T. One may note that for any face, the ray direction will always point toward the inside of the cell, placing the rst segment length of a ray through the origin cell. This is because the operation that loops through the cells in the domain to identify which cells have boundary 53 faces, loops through the interior cells. One could imagine a scenario where the algorithm would loop through the boundary cells and identify which of those have faces that are adjacent to the ow cells. In this scenario, several modi cations to the algorithm would be necessary. First, the positive and negative faces would need to be reversed, as a west boundary face would need to have rays placed on its east face in order to determine the ux at the actual interface between ow cells and boundary cells. Second, the hemisphere would then be on the outside of the cell face as opposed to the inside, as rays should not be traced through boundary material. This would lead to t |
| Reference URL | https://collections.lib.utah.edu/ark:/87278/s6pc69qx |



