| Title | Modeling nondiffeomorphic motion via composite deformation models |
| Publication Type | dissertation |
| School or College | College of Engineering |
| Department | Computing |
| Author | Preston, J. Samuel |
| Date | 2018 |
| Description | Diffeomorphic models of image deformation are a mainstay of medical image registration due to both their advantageous mathematical properties, allowing the analysis of shape and giving rise to the field of computational anatomy, and their practical ability to accurately model the wide variety of anatomical variability and physiological motion encountered in clinical practice. However, the price for these beneficial properties is a restriction on the types of motion that can be modeled, most notably the requirements of differentiability and topology preservation. In this work, we consider observed cases of nondiffeomorphic motion in medical imaging, and develop generative statistical models that accurately represent the observed motion while leveraging the useful diffeomorphic framework by representing nonsmooth motion via multiple diffeomorphic transformations. We focus on two motivating cases: nonsmooth motion in fluoroscopic imaging due to the overlapping 2D projections of 3D objects, and the nonsmooth sliding motion of the lower lungs against the thoracic wall in 4DCT imaging. In fluoroscopic imaging, we represent the observed motion as the additive combination of smoothly deforming layers, investigate the effect of layer image priors (regularization), and show applications to denoising, frame interpolation, and digital subtraction angiography. Motivated by challenges encountered in temporal registration of contrast-enhanced vessels, a novel discrete registration technique is developed for estimating smooth, nonlinear motion of small or repetitive features. Applications of this technique to slice-to-slice microscopy registration are presented. To model discontinuous motion in 4DCT imaging, we propose a framework for representing and estimating globally invertible and piecewise-smooth transformations. This formulation explicitly represents the location of discontinuities in the deformation field. Based on a novel representation of the invertibility constraint, our formulation allows us to automatically estimate the discontinuous motion, including the location of discontinuities, from the image data. Finally, we extend this invertible and piecewise-smooth model to represent spatially localized topological image changes as a separation between smooth segments in the composite deformation. Initial results are presented modeling the physical tearing of 2D histological sections, automatically estimating both the deformation and tearing region. |
| Type | Text |
| Publisher | University of Utah |
| Dissertation Name | Doctor of Philosophy |
| Language | eng |
| Rights Management | © J. Samuel Preston |
| Format | application/pdf |
| Format Medium | application/pdf |
| ARK | ark:/87278/s6jravy8 |
| Setname | ir_etd |
| ID | 1746605 |
| OCR Text | MODELING NONDIFFEOMORPHIC MOTION VIA COMPOSITE DEFORMATION MODELS

by J. Samuel Preston

A dissertation submitted to the faculty of The University of Utah in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computing

School of Computing
The University of Utah
August 2018

Copyright © J. Samuel Preston 2018
All Rights Reserved

The University of Utah Graduate School
STATEMENT OF DISSERTATION APPROVAL

The dissertation of J. Samuel Preston has been approved by the following supervisory committee members: Ross T. Whitaker, Chair(s), 10 Nov 2017 (date approved); Sarang Joshi, Member, 10 Nov 2017 (date approved); Suresh Venkatasubramanian, Member, 10 Nov 2017 (date approved); Guido Gerig, Member, 3 Dec 2017 (date approved); Preston Thomas Fletcher, Member, 10 Nov 2017 (date approved); and by Ross T. Whitaker, Chair/Dean of the Department/College/School of Computing, and by David B. Kieda, Dean of The Graduate School.

ABSTRACT

Diffeomorphic models of image deformation are a mainstay of medical image registration due to both their advantageous mathematical properties, allowing the analysis of shape and giving rise to the field of computational anatomy, and their practical ability to accurately model the wide variety of anatomical variability and physiological motion encountered in clinical practice. However, the price for these beneficial properties is a restriction on the types of motion that can be modeled, most notably the requirements of differentiability and topology preservation. In this work, we consider observed cases of nondiffeomorphic motion in medical imaging, and develop generative statistical models that accurately represent the observed motion while leveraging the useful diffeomorphic framework by representing nonsmooth motion via multiple diffeomorphic transformations. We focus on two motivating cases: nonsmooth motion in fluoroscopic imaging due to the overlapping 2D projections of 3D objects, and the nonsmooth sliding motion of the lower lungs against the thoracic wall in 4DCT imaging. In fluoroscopic imaging, we represent the observed motion as the additive combination of smoothly deforming layers, investigate the effect of layer image priors (regularization), and show applications to denoising, frame interpolation, and digital subtraction angiography. Motivated by challenges encountered in temporal registration of contrast-enhanced vessels, a novel discrete registration technique is developed for estimating smooth, nonlinear motion of small or repetitive features. Applications of this technique to slice-to-slice microscopy registration are presented. To model discontinuous motion in 4DCT imaging, we propose a framework for representing and estimating globally invertible and piecewise-smooth transformations. This formulation explicitly represents the location of discontinuities in the deformation field. Based on a novel representation of the invertibility constraint, our formulation allows us to automatically estimate the discontinuous motion, including the location of discontinuities, from the image data. Finally, we extend this invertible and piecewise-smooth model to represent spatially localized topological image changes as a separation between smooth segments in the composite deformation. Initial results are presented modeling the physical tearing of 2D histological sections, automatically estimating both the deformation and tearing region.
CONTENTS

ABSTRACT
LIST OF FIGURES
LIST OF TABLES

CHAPTERS
1. INTRODUCTION
   1.1 Medical Image Registration Overview
       1.1.1 Layered Deformations in Computer Vision
   1.2 Time Series Image Registration
       1.2.1 Generative Bayesian Model
   1.3 Challenges of Nonsmooth Motion
   1.4 Contributions
2. ADDITIVE LAYER MODEL
   2.1 Fluoroscopic Imaging
   2.2 Additive Layer Model
   2.3 Related Work
3. ADDITIVE LAYER MODEL WITH CONVEX LAYER PRIORS
   3.1 Motion Estimation Model
       3.1.1 Layer Priors
       3.1.2 Linear Model of Layer Deformations
       3.1.3 H1 Layer Prior
   3.2 Optimization for Translational Motion
   3.3 TV Layer Prior
   3.4 Transparent Cloud Model
   3.5 Results on Translating Layers
       3.5.1 Synthetic Data
       3.5.2 Fluoroscopic Phantom
       3.5.3 Cloud Removal
   3.6 Conclusion
4. LAYERED DEFORMATION WITH NONLINEAR MOTIONS
   4.1 Additive Layer Model for Nonlinear Physiological Motion
       4.1.1 Deformation Model
       4.1.2 Layer Gradient Penalty
       4.1.3 Energy Formulation
       4.1.4 Residual Layers
       4.1.5 Discretization and Solution
   4.2 Results
   4.3 Integrating Correspondences
       4.3.1 Correspondence Calculation
       4.3.2 Layer Initialization
       4.3.3 Results
5. MULTISCALE DISCRETE REGISTRATION
   5.1 Background
   5.2 Methods
       5.2.1 Standard MRF Formulation
       5.2.2 Multiscale MRF Formulation
       5.2.3 Subgrid Resolution
       5.2.4 Rotational Invariance
       5.2.5 Optimization
   5.3 Results
   5.4 Conclusion
6. PIECEWISE DIFFEOMORPHISMS FOR SLIDING MOTION ESTIMATION
   6.1 Related Work
   6.2 Piecewise-Diffeomorphic Formulation
   6.3 Registration Formulation and Optimization
       6.3.1 Estimation of the Partitioning Function ℓ0
       6.3.2 Deformation Estimation
       6.3.3 Segmentation Initialization
   6.4 Results
   6.5 Conclusion
7. MODELING SPATIALLY-LOCALIZED NONINVERTIBILITY WITH PIECEWISE DIFFEOMORPHISMS
   7.1 Results
8. CONCLUSION

APPENDICES
A. DETAILS OF ESTIMATING TRANSLATIONAL MOTION IN THE ADDITIVE LAYER MODEL
B. LAYERED DEFORMATION DETAILS

REFERENCES

LIST OF FIGURES
1.1 Frame generation model for a time-varying scene
1.2 Overlapping motion in fluoroscopy
1.3 Nonsmooth motion example
2.1 Fluoroscopic imaging diagram
2.2 Cartoon layer separation
2.3 Layered motion model
3.1 Possible layer separations diagram
3.2 Penalties on layer gradients
3.3 Three-layer synthetic dataset
3.4 Optimization objective
3.5 Fluoroscopic phantom dataset
3.6 Denoising results, fluoroscopic phantom data
3.7 Optimization initialization test
3.8 Cropped fluoroscopic phantom data results
3.9 Direct log-additive layer estimation
3.10 Cloud data layer estimation, independent vs. joint channel estimation
3.11 Layer separation results, cloud data
4.1 Ambiguous deformation
4.2 Layer separation, synthetic dataset results
4.3 Layer separation, clinical dataset results
4.4 Layer separation, interpolation results
4.5 DSA results
4.6 Layer estimation with correspondences
5.1 Discrete multiscale downsampling diagram
5.2 Synthetic vessel registration results
5.3 Per-scale-level results
5.4 Microscopy registration results
6.1 Two-label sliding motion diagram
6.2 Invertibility constraint function comparison
6.3 Graph setup for discrete labeling optimization
6.4 Motion segmentation initialization
6.5 Sliding registration motion segmentation results
6.6 Sliding vs. smooth deformation study
6.7 Sliding registration DIR-lab correspondence errors
6.8 Deformed mask sum
6.9 Tumor data results
7.1 Examples of torn histology slices
7.2 Tearing registration results

LIST OF TABLES
5.1 MSBP registration performance
6.1 Two-label value table for c(l1, l2)
6.2 c_k values
6.3 Sliding registration DIR-lab correspondence error

CHAPTER 1
INTRODUCTION

Medical image registration refers to the process of estimating coordinate transformations that align corresponding anatomical structures in two or more images. State-of-the-art techniques in medical image registration can generate smooth and invertible transformations that provide physically plausible correspondences across a wide variety of applications. Not all physical changes, however, can be accurately represented by a single smooth and invertible deformation; for example, sliding or separation between anatomical structures introduces nonsmoothness. This work introduces new models for both 2D and 3D medical image registration to accommodate such cases.

Two driving instances motivate much of this work: the overlapping motion of transparent anatomical structures in dynamic X-ray imaging, and the discontinuous motion exhibited between adjacent but unconnected organs in the anatomy. In both cases, a smooth and invertible deformation is unable to accurately represent the observed motion. In the case of projective X-ray imaging, the projection of a set of physical 3D locations onto the same 2D image location means that unique pointwise correspondences between 2D images do not exist. The lack of unique correspondences means that it is generally not possible to define a single deformation that aligns anatomical features between successive images. In the case of discontinuous motion in the anatomy, one organ may slide along another during normal physiological motion. Here unique correspondences do exist, but a smooth deformation cannot capture such discontinuous motion.
Although ostensibly very different, this work addresses both cases by introducing techniques for composite deformation modeling, in which a small number of smooth and invertible deformations are used to represent the observed motion. The specifics of the compositing methods, image representation, and estimation techniques are tailored to the particular motion and imaging modality, and will be discussed in later chapters. In general, though, this model allows the reuse of established techniques for modeling constituent deformations, while enabling novel motion representations in the final composite transformation.

In this dissertation, we first define terms such as image registration and medical imaging. To this end, we next provide a brief introduction to medical image registration, allowing the development of terminology and motivating concepts that underpin this work. In addition, our motivating examples specifically concern registration of moving objects. This work uses a generative statistical formulation of the temporal registration problem, and a brief overview of this model-based Bayesian paradigm is given. Finally, we discuss the challenges this work addresses, and present the specific contributions of this research.

1.1 Medical Image Registration Overview

Image registration is a vital step in many medical image analysis tasks. However, medical image registration is sometimes challenging due to image acquisition noise and artifacts, the need to model highly nonlinear transformations, and the dearth of distinct localizable features in the anatomy. To help address these challenges, state-of-the-art techniques in this field enforce smoothness and invertibility constraints on the transformations. We will first discuss image registration in medical imaging and the development of these smooth and invertible models. In addition to modern medical image registration methods, this dissertation draws on multideformation models from the computer vision community. We will therefore cover related material in this area. Finally, our motivating examples deal with temporal registration, which is the focus of the models in this dissertation. We therefore briefly discuss models and applications of time-series registration.

The purpose of image registration in the medical field is to align corresponding anatomical structures between images. This process is necessary in order to combine, compare, or process pointwise information between images. Registration aligns images by generating a dense coordinate transformation of the image domain. In order to align corresponding physical structures between images, this transformation should map locations in a fixed image, f̂, to the location of corresponding structures in another image, the moving image f. For such a transformation φ, this means that the structure represented by the image value f̂(x) corresponds to the structure represented by the image value f(φ(x)). More succinctly,

$$\hat{f} \approx f \circ \phi. \tag{1.1}$$

Due to physiological motion and intersubject variability, mappings representing the observed differences between images can be highly nonlinear. For this reason, medical image registration often represents the transformation φ(x) as a densely sampled vector field, allowing very complex mappings. Simpler linear changes due to a moving patient or viewpoint are possible, but these registrations are usually performed as a preprocessing step.
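As a concrete illustration of Equation (1.1), the sketch below evaluates f ∘ φ on a pixel grid using standard linear interpolation. This is a minimal sketch, not the estimation machinery developed in later chapters; the function name `warp`, the array layout, and the sinusoidal example deformation are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp(f, phi):
    """Compute f(phi(x)) on the pixel grid.

    f   : 2D array, the moving image.
    phi : array of shape (2, H, W) giving, for each fixed-image pixel x,
          the (row, col) coordinates phi(x) in the moving image.
    Values are linearly interpolated since phi(x) is generally off-grid.
    """
    return map_coordinates(f, phi, order=1, mode='nearest')

# Identity grid plus a smooth displacement gives a nonlinear warp.
H, W = 64, 64
rows, cols = np.meshgrid(np.arange(H), np.arange(W), indexing='ij')
phi = np.stack([rows + 2.0 * np.sin(2 * np.pi * cols / W), cols.astype(float)])
f = np.random.rand(H, W)
f_warped = warp(f, phi)  # approximates f̂ when phi aligns f to f̂
```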
Although dense deformation fields allow expressive mappings, estimating such a high-dimensional transformation based solely on minimizing image dissimilarity would give useless results: we could simply map each point to its best match in the target image, yielding a transformation that produces a near-perfect looking image, but which does not resemble the physical motion of points in the imaged object. Furthermore, it is likely that there is no unique optimal mapping; any point with multiple equally ‘best matching’ locations will cause ambiguity in the final result. In order to approximate the true motion of imaged objects, regularization is needed. Ideally, the regularization used will promote not only numerical stability in estimating the solution, but also solutions better approximating the physical system being imaged. In medical imaging, the combination of smoothness and invertibility constraints is often imposed.

The use of smoothness as a regularizer is nearly universal in medical image registration. It is a simple and effective regularizer, used since some of the earliest work in registration [1]. Smoothness can be enforced either explicitly or implicitly, both of which have been extensively studied. Explicit constraints take the form of nonsmoothness penalties, whereas implicit smoothness requirements are enforced by representing the deformation as a smooth parametric function (e.g., b-spline models [2]). In either case, continuity (a prerequisite for smoothness) results in the anatomy deforming as a single object without tearing or separating. Intuitively, continuity states that if two points are nearby in one image, they must be nearby in the other image. The solid nature of organs and tissues of the anatomy means that this assumption holds in a wide range of common cases. Smoothness further requires that the deformations are free of kinks caused by discontinuous derivatives. Even though smoothness constraints are ubiquitous in medical image registration, by themselves they are often insufficient to guarantee physically plausible deformation results.

Nonsmoothness penalties based on differential operators have been widely used in medical image registration under the moniker elastic registration, so named because of their use of differential operators based on physical elastic energy [3, 4, 5]. These methods result in smooth transformations, but highly nonlinear local image-to-image changes often are not faithfully represented. The true change may not be fully represented, either because the transformation model cannot represent the true change, or because doing so would lead to too high a nonsmoothness penalty. Alternatively, folding of the deformation field can occur, resulting in the noninvertible (and physically impossible) situation where multiple points map to the same location. In order to overcome these limitations, more advanced techniques have been developed imposing both smoothness and invertibility constraints.
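Folding can be detected numerically from the Jacobian determinant of the transformation: wherever it is nonpositive, the mapping is locally noninvertible. A minimal finite-difference sketch, assuming a dense 2D deformation sampled on the pixel grid (the array layout and function names are hypothetical):

```python
import numpy as np

def jacobian_determinant(phi):
    """Finite-difference Jacobian determinant of a 2D deformation.

    phi : array of shape (2, H, W); phi[0] is the row coordinate of the
          mapped point, phi[1] the column coordinate.
    Returns an (H, W) array; nonpositive values indicate local folding.
    """
    dphi0_dr, dphi0_dc = np.gradient(phi[0])
    dphi1_dr, dphi1_dc = np.gradient(phi[1])
    return dphi0_dr * dphi1_dc - dphi0_dc * dphi1_dr

def is_folding_free(phi):
    # A deformation is (discretely) folding-free on the grid only if the
    # determinant stays positive everywhere.
    return bool(np.all(jacobian_determinant(phi) > 0))
```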
Invertibility enforces the physical notion that objects cannot appear or disappear between frames; for any given point in one image, a corresponding point can be found in another image. This assumption is reasonable in many applications because medical image registration primarily occurs between images depicting the same anatomical structures. Anatomical changes to a specific subject due to growth or physiological motion may change the relative size and shape of imaged structures, but do not generally result in topological changes to the image; they do not alter the existence or relative configuration of anatomical structures. Therefore, any point in one image represents a location in a structure that also exists in the other image. Similarly, the consistent morphological structure of the anatomy means that the unique correspondence assumption often holds between images of different patients.

Modern medical image registration methods imposing both smoothness and invertibility constraints were developed in order to more accurately represent the complex morphological changes present in intersubject registration, where registration is performed between images of different patients. Christensen et al. [6] proposed a flow-based deformation model that allowed the representation of large deformations while guaranteeing that the deformation is a diffeomorphism: a smooth, invertible function with a smooth inverse. The introduction of optimization techniques for the full flow model, called the large deformation diffeomorphic metric mapping (LDDMM) model [7], allowed the deformation energy to be viewed as a metric on shapes, resulting in the field of computational anatomy [8]. The explicit requirement of smooth, invertible transformations has become common practice in medical image registration. Even when registration techniques that do not guarantee invertibility are used (e.g., diffusion-based models [9] or implicit regularization through b-spline parameterization [2]), noninvertibility of the result is viewed as a failure of the registration method; invertibility is desired and implicitly assumed. In order to address situations where numerical estimation results in undesirable noninvertibility, versions of these models guaranteeing invertibility have been developed ([10] and [11, 12]).
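For reference, one common statement of the flow-based construction is sketched below; the exact form of the matching term (and whether φ_1 or its inverse appears in it) varies across presentations, so this should be read as a schematic rather than the formulation used in any particular chapter.

```latex
% The deformation is the endpoint \phi_1 of the flow of a
% time-dependent velocity field v_t:
\partial_t \phi_t(x) = v_t(\phi_t(x)), \qquad \phi_0(x) = x,
% and registration minimizes a kinetic-energy term plus an image match:
E(v) = \int_0^1 \lVert v_t \rVert_V^2 \, dt
     + \frac{1}{\sigma^2} \big\lVert \hat{f} - f \circ \phi_1 \big\rVert_{L^2}^2 .
% Sufficient smoothness of v_t, enforced through the norm
% \lVert\cdot\rVert_V, guarantees that \phi_1 is a diffeomorphism.
```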
Smoothness is a useful regularizer, and has been successfully applied to a wide variety of registration methods. However, unlike invertibility, smoothness is often not a fundamental property of the anatomical system being modeled. Many cases exist in which the observed changes cannot be accurately represented by a smooth transformation. In addressing these challenges, this dissertation will develop medical image registration techniques that accommodate nonsmooth motion. Nonsmooth motion is the rule rather than the exception in computer vision, and therefore related techniques from this field have informed the models developed here. Next we give a brief overview of related multideformation models from computer vision.

1.1.1 Layered Deformations in Computer Vision

A large body of work from the computer vision community is concerned with motion estimation in video sequences, i.e., optically imaged temporal sequences of real-world scenes. Of particular relation to this work are layered motion models that use multiple transformations to represent the apparent motion of objects. Although the motivations and technical challenges are specific to natural images, these techniques have helped inspire the medical imaging algorithms developed in this dissertation. Below, we briefly review layered motion models in computer vision as they relate to this work.

Registration techniques for medical images share a common goal with registration techniques applied to photography or video, but differ in the specific challenges they face. A primary difference is the projective nature of photography. Medical imaging generally creates images of the same dimensionality as the original objects: 3D imaging of 3D anatomy (e.g., computed tomography (CT) or magnetic resonance (MR) imaging), or 2D imaging of 2D structures (e.g., histology). Photography of natural scenes, however, creates a 2D image of 3D objects, which leads to occlusions and apparent motion discontinuities at object boundaries. Due to occlusions, the set of physical points visible to the camera may change from frame to frame. A single image is therefore insufficient as a model for the scene over time, as it will be missing information about occluded regions that may be visible in other frames. One solution is to build a full 3D model of the scene [13, 14], but this requires a constrained model of the scene or motion, for instance, requiring a static scene imaged by a moving camera. However, a simpler model representing the scene as a number of 2D images, or layers, each undergoing independent motion, has been widely successful [15, 16, 17]. The model of a scene then consists of a number of layers, their corresponding motions, and an ordering constraint that determines which layer is visible.

This dissertation develops composite deformation models for medical imaging. Although inspired by the work in computer vision, models for medical image registration must accommodate the properties of medical images. For instance, the imaging of structures in their native dimensionality means that invertibility constraints may be enforced, ensuring physical interpretability of the results. Another distinct feature of medical image registration concerns the types of changes to be represented. These can be highly nonlinear, displaying large localized changes arising from varying mechanical properties of neighboring anatomical structures, or representing intersubject anatomical variability. We therefore employ large-deformation models for individual layer transformations. Finally, medical images are noisy and contain few uniquely identifiable features. Image quality is constrained by the physical properties of the imaging modality, where tradeoffs must be made between noise levels and factors such as patient safety, comfort, and cost. In the case of X-ray imaging such as CT, reducing the exposure of a patient to ionizing radiation also decreases the signal-to-noise ratio (SNR), and therefore imaging is performed at the lowest adequate SNR for a given task. In the case of MR imaging, increasing the quality comes at the cost of increased imaging time, which is uncomfortable and expensive for patients, and increases the likelihood of motion artifacts. The methods proposed in this work should accommodate such high-noise scenarios. In either modality, the smooth shape and homogeneous internal appearance of many organs means that accurate feature localization is often a difficult task. Therefore, feature-based registration techniques from computer vision are unlikely to be applicable to medical images.

1.2 Time Series Image Registration

Registration is a fundamental component of many medical image analysis tasks, enabling image alignment, improved segmentation [18, 19], and statistical shape analysis [20], among many other applications. This work primarily focuses on registration of time-series image sequences. In this section, we will motivate the need for accurate motion modeling, and review the generative, Bayesian modeling approach used in this dissertation.
An accurate motion model for temporal sequences has many applications; for instance, a model of patient motion allows for motion compensation, which could improve the analysis of coronary angiography data [21]. In addition, a motion model means that all acquired data can be used to estimate a single template image that is deformed to represent the configuration observed at each timepoint, greatly improving image quality [22]. In radiation oncology, an accurate motion model may also be used to predict full 3D motion based on a few tracked points, allowing improved localization of dose delivery to diseased tissue [23]. Although the time-series image registration tasks discussed here model physiological motion of a single patient over a short time period, the ability to model such intrapatient changes is an essential first step to modeling the larger changes observed in interpatient registration. For instance, in measuring shape variability between subject anatomies, it would be undesirable to conclude that large shape differences exist between anatomies simply because measurements were taken at different phases of breathing or heartbeat.

1.2.1 Generative Bayesian Model

This work will primarily consider temporal modeling of anatomy undergoing physiological motion, as opposed to intersubject variability. We will develop Bayesian generative models to describe the observed motion. Here we briefly describe this modeling method, and present a simple example both to clarify concepts and to introduce notation that will be used throughout this work.

As an example, consider the simple deformation model sketched in Figure 1.1. Here we have a noise-free template image f̂ being acted on by a time-varying deformation φ representing object motion. Noisy images are acquired at times t = {t_1, ..., t_T}, resulting in frames {f_1, ..., f_T}. The frame generation equation can then be stated as

$$f_t(x) = \hat{f} \circ \phi_{t,0}(x) + \eta, \tag{1.2}$$

where φ_{t,0} is the deformation φ at time t, and η is noise drawn from some specified distribution.

[Figure 1.1. Frame generation model for a time-varying scene. The template scene f̂ is modified via a transformation φ to produce scenes f̂_t at each time t_t. The camera indicates the generation of an observed frame f_t from scene f̂_t.]

In this formulation, we have a noisy observation of the true objects and we wish to estimate the deformation and (possibly) the template frame f̂. The generative model gives us a statistical formulation of the problem, where the noise model determines the data likelihood, p(f_t | f̂, φ), and the variables of interest (f̂ and φ) are random variables. We can introduce our beliefs regarding the types of objects being imaged and the type of motion by defining the prior distributions from which these variables are drawn, p(f̂) and p(φ). For smooth deformations, a nonsmoothness penalty can be encoded in the prior p(φ). In the case of the large deformation model (LDDMM [7]), this prior would be smoothness on the underlying flow sufficient to guarantee invertibility.

Given a set of frames $\{f_t\}_{t=1}^{T}$, and assuming conditional independence of the frames, the full likelihood is given by

$$p(\{f_t\} \mid \hat{f}, \phi) = \prod_{t=1}^{T} p(f_t \mid \hat{f}, \phi). \tag{1.3}$$

The posterior distribution can then be computed using Bayes' theorem,

$$p(\hat{f}, \phi \mid \{f_t\}) = \frac{p(\{f_t\} \mid \hat{f}, \phi)\, p(\hat{f})\, p(\phi)}{p(\{f_t\})}. \tag{1.4}$$

Standard inference techniques can then be used to generate maximum a posteriori (MAP) estimates of the random variables f̂ and φ,

$$\operatorname*{argmax}_{\hat{f}, \phi}\; p(\hat{f}, \phi \mid \{f_t\}) = \operatorname*{argmax}_{\hat{f}, \phi}\; p(\{f_t\} \mid \hat{f}, \phi)\, p(\hat{f})\, p(\phi). \tag{1.5}$$

MAP estimation generally optimizes over the equivalent log-transformed energy minimization problem

$$\operatorname*{argmin}_{\hat{f}, \phi}\; E_f(\hat{f}, \phi) + E_{\hat{f}}(\hat{f}) + E_\phi(\phi), \tag{1.6}$$

where $E_f(\hat{f}, \phi) = -\ln p(\{f_t\} \mid \hat{f}, \phi)$, $E_{\hat{f}}(\hat{f}) = -\ln p(\hat{f})$, and $E_\phi(\phi) = -\ln p(\phi)$.
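Under the i.i.d. Gaussian noise model, the data term E_f in Equation (1.6) reduces to a sum of squared residuals over all frames. The sketch below evaluates this MAP energy for given estimates; the callables standing in for −ln p(f̂) and −ln p(φ) are placeholders, since the actual priors are application dependent.

```python
import numpy as np

def map_energy(frames, template_warped, sigma, prior_template, prior_phi):
    """Negative log posterior (up to a constant) of Equation (1.6).

    frames          : list of observed images f_t.
    template_warped : list of predictions f̂ ∘ φ_{t,0}, one per frame.
    sigma           : noise standard deviation.
    prior_template, prior_phi : callables returning -ln p(f̂) and -ln p(φ);
                                placeholders for whatever priors are chosen.
    """
    # Gaussian likelihood => sum-of-squares data term over all frames.
    data_term = sum(
        np.sum((f - pred) ** 2) for f, pred in zip(frames, template_warped)
    ) / (2.0 * sigma ** 2)
    return data_term + prior_template() + prior_phi()
```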
Although the combination of a template image estimate with a single diffeomorphic deformation, as described above, is often used in medical image registration, there are limitations on the type of motion that this model can represent. There exist other options for representing observed motion; in the most general form, we assume that an image f_t can be generated from parameters describing the objects in the image at some template configuration, f̂, and parameters describing the transformation from this template configuration to the observed configuration, φ_t. The forms of f̂ and φ_t vary widely based on imaging modality and analysis task. The parameters f̂ may represent a noise-free version of the image, or individual representations of a set of objects that comprise the image. In imaging real-world objects, both f_t and f̂ are generally represented by a dense sampling of intensity values, accompanied by an interpolation function. For 2D imaging of 3D objects, the representations may be projected 2D views (such as might be used for optic flow computation) or full 3D models (for instance, CT reconstruction from a set of projections). The parameters φ_t may represent a low-dimensional parametric transformation such as translation, rigid, or affine motion; a higher-dimensional parametric transformation such as a b-spline deformation; or a discrete set of correspondences on the sampling grid of f_t. In the composite deformation models considered here, the parameters φ_t will consist of multiple transformations defined over the image domain, and the representation of f̂ will be application dependent.

1.3 Challenges of Nonsmooth Motion

Much of the anatomical motion of interest to clinicians is well modeled by smooth, invertible deformations, as supported by the wide use of such models (e.g., [24, 22]). However, there are clinically relevant cases displaying motion that cannot be represented by smooth formulations. Here we will discuss how a smooth motion model is inadequate for our two motivating cases: fluoroscopic imaging, in which 2D projective X-ray imaging of 3D objects results in overlapping object motion; and temporal CT (4DCT) chest imaging, in which sliding motion is observed between the lower lung and thoracic cage.

In fluoroscopic imaging, the independent motion of overlapping structures leads to pointwise object combinations that may not exist in other frames. These frame-to-frame topological changes cannot be accommodated by a diffeomorphic motion model. Figure 1.2 depicts a simplified example of this phenomenon. Fundamentally, the apparent 2D motion of projected 3D structures is poorly represented by a single smooth deformation. We propose approximating the true 3D motion by a series of 2D additive transparent layers.

[Figure 1.2. Cartoon fluoroscopic imaging diagram. Top and bottom rows show different timepoints, t_0 and t_1. Separate motion of overlapping objects causes changes that cannot be represented via a single smooth deformation. Note how the motion of the lung overlaps with the static rib in the center column, and how the moving heart and lung appear in different configurations at different times in the right column.]
In contrast to the apparent nonsmoothness in fluoroscopic imaging, the nonsmooth motion observed in 4DCT imaging is due to the true discontinuous motion of the anatomy. Under normal physiological motion, organs that are not directly attached may slide against each other, causing spatial discontinuities in the true coordinate transformation. Representation of this motion via a single diffeomorphic deformation necessarily smooths this discontinuity, causing unrealistic shearing of the adjacent structures. See Figure 1.3 for an example of discontinuous lung motion. In order to localize the effect of this shearing, smoothness constraints may be relaxed. However, this introduces a tradeoff between localizing the discontinuity and adequately regularizing the deformation. Alternatively, discontinuity-preserving regularization can accommodate sliding in the deformation, but does not guarantee invertibility. Recent work in this area explicitly represents the location of discontinuities, allowing discontinuous motion while guaranteeing invertibility. However, these methods rely on a precomputed motion segmentation.

[Figure 1.3. During normal breathing motion, the lower lungs slide along the thoracic cage, which results in discontinuous motion visible between the lungs and ribs or spine. However, a single, smooth deformation cannot represent this discontinuous motion. The Jacobian determinant of such a smooth deformation is shown in (b). It can be seen that the deformation incorrectly extends beyond the lung boundary. The image difference between the estimated transformation and the true fixed image in (c) shows that the deformation causes nonphysical motion of the ribs and spine.]

1.4 Contributions

Toward overcoming the challenges of nonsmooth motion modeling in medical image registration, this work makes the following contributions:

1. A general formulation of the additive layer model applied to motion estimation in fluoroscopic imaging, jointly estimating both template layers and deformations. Although previous work has formulated fluoroscopic motion as an additive layer model for specific constrained applications, the work presented in Chapters 2–4 represents the first general-purpose additive layer model for motion estimation in fluoroscopic imaging. Specific layer and motion models cover distinct use cases:
   (a) High-noise, translational motion: Chapter 3 provides a formulation of layer estimation as a convex optimization problem in an additive layer model, given known deformations and a convex layer prior. For translational motion, an efficient frequency domain solution is given.
   (b) Low-noise, nonlinear motion: Chapter 4 gives the construction of a novel layer prior for the additive layer model, to aid recovery of layers and deformations in the case of nonlinear physiological motion in fluoroscopic imaging.
2. Formulation of a discrete estimation method allowing accurate registration of small features such as vessels or cell membranes undergoing a large, smooth deformation. A multiscale Markov random field formulation is developed in Chapter 5, improving state-of-the-art performance for discrete techniques tracking small features in microscopy imaging, and providing a parallelizable solution for large 2D registration tasks.
3. A framework for representing and estimating invertible, piecewise-smooth deformations.
   (a) Chapter 6 presents the first framework capable of representing and automatically estimating invertible and piecewise-smooth deformations from raw time-series CT data. The framework is used to accurately estimate the discontinuous motion in 4DCT chest imaging.
   (b) Chapter 7 presents an extension of this model to accommodate noninvertibility due to spatially localized regions lacking interframe correspondences. Preliminary results are presented in estimating slice-to-slice histology registration exhibiting tearing.

These contributions extend the types of motion that can be automatically estimated from medical imaging data. Although motivated by specific example data, these models are quite general and will, we hope, prove useful for a wider range of datasets.

The organization of the rest of this document follows the structure of the contributions as outlined above. Chapter 2 introduces the additive layer model applied to fluoroscopic imaging. Chapter 3 describes methods for estimating deformations under the additive layer model with convex layer priors. The novel formulation eliminates the need for explicit layer estimation. For translating layers, an efficient frequency space solution exists. We give results for translating layers on synthetic data and imaged phantoms, and present modifications to allow application of the method to cloud removal from aerial video sequences of landscapes. Chapter 4 incorporates a nonlinear diffeomorphic deformation model into the additive layer model, investigates observed estimation artifacts, and presents a novel layer regularization penalty and primal-dual optimization procedure. Chapter 5 presents a novel discrete registration technique for estimating smooth deformations, developed to overcome challenges encountered in temporal registration of contrast-enhanced coronary arteries. Results are given on registration of transmission electron microscopy images. Chapter 6 presents a framework for estimating piecewise-diffeomorphic deformations based on a novel discrete formulation of the global invertibility requirement. Results are shown on the DIR-lab 4DCT datasets. Chapter 7 adapts the piecewise-diffeomorphic framework of Chapter 6 to represent topological changes in images as separation between diffeomorphic segments. Preliminary results are presented modeling the physical tearing of 2D histological sections. Finally, Chapter 8 presents final thoughts and discussion.

CHAPTER 2
ADDITIVE LAYER MODEL

This chapter gives an overview of the additive layer model that will be used for motion estimation in Chapters 3 and 4. Since the development of this model is motivated by fluoroscopic imaging data, we start by giving a brief introduction to this modality, as well as motivating the need for accurate motion modeling. Next, the formulation of the additive layer model will be presented, followed by a discussion of related work.

2.1 Fluoroscopic Imaging

Fluoroscopic imaging is an important tool commonly used for diagnostic and interventional procedures. Fluoroscopy produces real-time video sequences showing X-ray attenuation through an imaged object. As seen in Figure 2.1, a radiation source (emitter) produces X-rays that travel through an object. Some of the X-ray photons interact with the matter in the imaged object, attenuating the X-ray intensity. A detector then converts the X-ray intensity into a visible image, either directly via an image intensifier that converts X-rays into visible light, or via a flat-panel detector that converts X-rays into electrical signals to be digitally recorded and displayed as image intensities.

[Figure 2.1. Fluoroscopic imaging diagram. Left: diagram of a C-arm fluoroscope with emitter below and detector above. Right: image formation diagram, where measurement at a specific location is based upon the density of all internal structures along a ray between the emitter and an image location on the detector.]
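The additive structure exploited below in Section 2.2 comes directly from this attenuation process: along each ray the detected intensity falls off exponentially with the traversed material, so a log transform turns the contributions of the individual structures into a sum. A small numerical sketch with made-up attenuation values:

```python
import numpy as np

# Hypothetical attenuation coefficients (mu, per cm) and thicknesses (d, cm)
# for three structures intersected by one ray; values are illustrative only.
mu = np.array([0.5, 0.2, 0.3])
d = np.array([1.0, 4.0, 2.0])
beta = 1000.0  # unattenuated detector intensity

intensity = beta * np.exp(-np.sum(mu * d))    # Beer-Lambert attenuation
log_image = np.log(beta) - np.log(intensity)  # equals sum(mu * d)

# After the log transform, each structure contributes additively:
assert np.isclose(log_image, np.sum(mu * d))
```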
Accurate motion modeling for fluoroscopic imaging promises to improve a number of existing techniques. Most fundamentally, motion-compensated denoising would allow dramatic improvements in image quality without subjecting patients or healthcare workers to additional radiation dose. In addition, processing techniques such as roadmapping, used to guide catheterization, or digital subtraction angiography (DSA), used to remove background structures in angiography analysis, could be improved by removing motion-based artifacts.

Because X-rays are a form of ionizing radiation, reducing patient exposure is a primary concern; however, dose is directly related to image quality. Balancing the tradeoff between image quality and radiation dose is a difficult decision, and there is an active pursuit of image processing techniques to improve image quality without increasing dose. The increased long-term cancer risk associated with increased ionizing radiation exposure is well documented [25], and at high doses severe skin injury may occur [26]. However, signal-to-noise ratio (SNR), a standard measure of image quality, is proportional to the square root of the number of photons. Decreasing the number of photons therefore directly decreases image quality, and could lead to unfavorable clinical outcomes.

Image denoising techniques offer an opportunity to increase image quality without increasing patient dose. However, denoising techniques rely on averaging image intensity over a set of pixels, and can therefore lose fidelity to the original signal. Intuitively, averaging data from different physical locations can reduce noise, but is accurate only to the extent that the differing locations are interchangeable. For instance, spatial or temporal averaging increases SNR at the expense of spatial or temporal resolution, respectively. Techniques such as bilateral filtering or nonlocal means reduce blurring effects, but can remove low-contrast or unique image features. In contrast, temporal averaging of a static object has no side effects: averaging occurs only with data from the same physical location. Motion-compensated temporal averaging improves the SNR without introducing additional undesired effects because, as with temporal averaging of a static object, only data representing the same physical locations are averaged.

Another possibility for dose reduction is reducing the framerate in pulsed image sequences, where photon emission is gated to reduce motion artifacts at a specific framerate. However, such low-framerate sequences destroy the illusion of continuous motion, and can be quite distracting to clinicians. Accurate motion estimation can also be used to improve such low-framerate imaging by allowing frame interpolation or extrapolation, filling in the unacquired frames.
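To make the averaging argument concrete, the sketch below simulates a translating 1D signal under Gaussian noise and compares naive temporal averaging with averaging after (known) motion compensation; the signal, shifts, and noise level are arbitrary illustrations, and real compensation would of course use estimated rather than known motion.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, sigma = 20, 256, 0.5
x = np.linspace(0, 2 * np.pi, N)
template = np.sin(3 * x)              # noise-free signal
shifts = rng.integers(-5, 6, size=T)  # per-frame integer motion

# Noisy frames of the translating signal.
frames = [np.roll(template, s) + sigma * rng.standard_normal(N) for s in shifts]

naive = np.mean(frames, axis=0)       # blurs the moving signal
compensated = np.mean([np.roll(f, -s) for f, s in zip(frames, shifts)], axis=0)

print("naive RMSE:", np.sqrt(np.mean((naive - template) ** 2)))
print("compensated RMSE:", np.sqrt(np.mean((compensated - template) ** 2)))
# Compensated averaging reduces noise by roughly sqrt(T) without blurring,
# since only samples from the same physical location are averaged.
```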
In addition to dose-reduction applications for motion modeling, there are current visual enhancements for fluoroscopic imaging that suffer from motion artifacts; an accurate motion model would be able to remove such artifacts. Roadmapping and DSA use static images to provide enhanced views of dynamic fluoroscopic sequences. In roadmapping, a contrast-enhanced image of a vessel tree is acquired, and overlaid on later unenhanced data to aid in catheter guidance. DSA is a technique for analyzing the vascular structure for diagnosis as well as interventional procedures [27]. In DSA, a radiographic contrast agent is injected into a blood vessel, and then a pre-contrast frame (or mask) is subtracted from all the subsequent frames. Ideally, the resulting subtraction would show only the intensity change due to the injected contrast. In practice, normal physiological motion due to breathing and heartbeat, as well as other patient motion, introduces artifacts in both techniques, and so motion estimation between the static and current frames is needed to suppress these artifacts.

As we have seen, accurate motion estimation has many important applications in fluoroscopic imaging. However, unlike standard 3D image-based deformation estimation problems in which each point in the reference image is assumed to map to a single point in the target image, fluoroscopic imaging techniques generate a projection of a 3D object onto a 2D imaging plane. This projection results in a motion estimation problem in which smooth motion of the true 3D object projects onto nonsmooth motion in the acquired image, and the intensity of each pixel at one timepoint contributes to the intensity of multiple dispersed locations in another timepoint. Next we introduce an additive layer model to approximate the overlapping projected motion in fluoroscopic imaging with a series of 2D layers and corresponding deformations.

2.2 Additive Layer Model

A true motion model for projective medical imaging would reconstruct the 3D anatomy and a 3D transformation associated with each frame of the imaging sequence. Reconstruction of the (assumed) static 3D anatomy from projective X-ray images is exactly how 3D CT scans are generated [28]. Reconstructions directly from fluoroscopic sequences [29] have also been performed, and joint motion and image estimation has also been studied [22] to account for patient motion during acquisition. However, these techniques all assume a set of images taken from differing viewpoints covering a large angular range around the imaged object. Failing to have a large enough angular range is known as the limited angle problem, which results in reconstruction artifacts [30]. Standard fluoroscopic imaging represents the most severe limited angle case, in which we have only a single viewpoint. Although reconstruction from limited angle data has been studied, no approach exists to handle the single-viewpoint case. A large body of work considers the registration of a known 3D model of a patient with a real-time 2D projection image, allowing a high-quality scan acquired prior to an interventional procedure to be aligned with the patient anatomy during the procedure [31]. Reconstruction of 3D scenes based on object motion has been studied in video with opaque objects as structure from motion, which is still a difficult and open area of research.
Alternatively, 3D scene structure can be inferred from 2D image sequences if a strong scene model is used, enforcing strong geometric constraints such as an affine transformation or adherence to known scene geometry [13, 14]. However, this work builds a model of general nonlinear deformations for objects of unknown geometry. As we are primarily interested in estimating realistic deformations and frames as seen from the same viewpoint as the original series, we will model a time series of fluoroscopic imaging frames as a number of superimposed layers, each undergoing a smooth deformation.

For an anatomy composed of multiple objects, we will model the X-ray intensity at the detector as $\beta \exp(-\sum_i \mu_i d_i)$, where $\mu_i$ is the attenuation coefficient, $d_i$ is the thickness of the $i$th object along the ray from source to detector, and $\beta$ is a constant representing the maximum transmission value. Assuming a log-transformed image (although in practice clinical data may have some approximation applied), this formulation results in a model in which the contribution of individual objects to the overall image is an additive (or subtractive) process. The proposed model constructs an image from a set of layers, where layers group anatomical structures undergoing similar transformations. We subtract the contribution of each layer from M = log(β), a maximum observed image intensity, which produces layers where zero values are interpreted as the absence of an object.

To make the model more concrete, assume we are given a series of frames acquired at evenly spaced time intervals, and that we wish to represent this series with K layers. The frames will be indexed by time, where the first frame occurs at time t_0 and frame T occurs at time t_T, giving a total of T + 1 frames. These input frames will be labeled f_0, ..., f_T. Each frame is a function f_t : Ω → ℝ, where Ω is the image domain. We define a general model for additive transparent motion where an image f at time t is created as the sum of K layers, {ℓ^k} for k ∈ [1, ..., K], each independently transformed at time t by a deformation φ^k_{t,0}. The model for frame t is then

$$f_t = M - \sum_{k=1}^{K} \ell^k_t(x) + \eta, \tag{2.1}$$

where $\ell^k_t = \ell^k \circ \phi^k_{t,0}$, $x \in \Omega$, and where $\eta \sim \mathcal{N}(0, \sigma^2)$ represents the corruption of the image measurement at location x by i.i.d. additive zero-mean Gaussian noise. Although this is not technically correct for the log-transformed photon count model, we assume the distribution is sufficiently close to a Gaussian model. Although the physical model constructs subtractive layers, for ease of interpretation we will sometimes consider positive additive layers,

$$f_t(x) = \sum_{k=1}^{K} \ell^k_t(x) + \eta. \tag{2.2}$$

The definition of what constitutes a layer depends on the motion model and the interaction of imaged objects. For instance, multiple objects undergoing similar motion can be grouped into a single layer (for instance, all rigid bones showing no movement), whereas a single object may be split into multiple layers if a single deformation cannot describe its motion (for instance, a vessel tree in which overlapping branches show differing motion). Figure 2.2 gives a conceptual diagram of layer separation, where objects undergoing coherent, smooth motion can be represented by a single layer. In this case, the static ribs, slowly moving lungs, and rapidly beating heart are separated into their own layers. Figure 2.3 diagrams the frame generation model (Equation (2.2)), showing the relationship between layers and frames.

[Figure 2.2. Simple conceptual layer separation diagram.]

[Figure 2.3. The layered motion model. Template layers are shown in the left column, and input frames are shown along the top row. Dashed boxes indicate deformed versions of the template layers.]
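A minimal sketch of the frame generation model of Equation (2.2): template layers on a common grid are each warped by their own deformation and summed, with Gaussian noise added. Integer shifts stand in for the smooth deformations φ^k_{t,0}, and the toy layer contents are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def generate_frame(layers, shifts, sigma):
    """Additive layer frame model (Equation (2.2)).

    layers : list of K 2D arrays, the template layers ℓ^k.
    shifts : list of K (dy, dx) integer shifts standing in for the
             per-layer deformations φ^k_{t,0} (real layers deform smoothly).
    sigma  : std of the additive Gaussian noise η.
    """
    deformed = [np.roll(l, s, axis=(0, 1)) for l, s in zip(layers, shifts)]
    return np.sum(deformed, axis=0) + sigma * rng.standard_normal(layers[0].shape)

# Two toy layers: a static "rib" stripe and a moving "vessel" blob.
H, W = 32, 32
rib = np.zeros((H, W)); rib[14:18, :] = 1.0
vessel = np.zeros((H, W)); vessel[8:12, 8:12] = 0.5
frame_t = generate_frame([rib, vessel], [(0, 0), (2, 3)], sigma=0.05)
```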
2.3 Related Work

Layered motion models have appeared in both the computer vision and medical imaging literature, modeling the motion of both opaque and transparent objects. Here we review the development of these models and the work most related to the proposed model.

Figure 2.2. Simple conceptual layer separation diagram.

Figure 2.3. The layered motion model. Template layers are shown in the left column, and input frames are shown along the top row. Dashed boxes indicate deformed versions of the template layers.

Although there has been extensive work on layer-based representations of 3D scenes in the computer vision community, the vast majority has assumed opaque objects, segmenting the image domain into regions undergoing similar transformations and estimating a set of deformations and unique pixel assignments (e.g., [15, 32, 33, 16, 17, 34]). Brox et al. [35] incorporate correspondences into a dense optic flow model, allowing improved estimation of small objects undergoing significant motion independent of the larger objects in the scene. In optical imaging of natural scenes, the additive layer model has been used to describe the motion of scenes containing semitransparent objects such as reflections on transparent glass or water [36], scenery obscured by clouds or fog [37], and opaque but only partially occluding structures such as leaves or fences. Early work modified the opaque motion model to allow overlapping motion segmentation regions, giving the possibility of multiple motions at each image location [38, 39]. For modeling transparency or reflections, a two-layer model is generally sufficient, and additional constraints such as linear or repetitive motion, or pairs of images obtained with different compositing functions, have been used to create layer estimates [36, 40, 41, 42]. Estimation of multiple motions at each point has been studied via the double optical flow constraint proposed in [43, 44], which estimates two motion vectors at each point. Work based on this and similar formulations [45] attempts to regularize these fields into consistent motion models or to increase the number of motions captured at each point [46, 47, 48, 49, 50, 51]. These formulations do not attempt layer extraction (estimation of the layer intensities) and cannot be used to generate interpolated or motion-compensated frames. A log-additive model [52] has been used to decompose images into separate reflectance and illumination layers, exploiting sparsity in the gradient domain. Work specifically limited to translating layers includes [53, 37, 54], which detect multiple translations by searching for characteristic features in a frequency-transformed spacetime volume. This technique requires a large number of frames showing constant translation, and relies on feature-space clustering techniques, which can be unreliable, especially in the presence of significant noise. Be'ery and Yeredor [55] propose a frequency-domain blind source separation algorithm for semitransparent translating layers to recover both translation parameters and a set of mixing coefficients, but it requires a good initialization of translation values via a correlation of edge-enhanced mixtures.
Recent work on additive reflections [56, 57, 58, 59, 60] has focused on edge-based feature alignment since, unlike intensity values, which are nearly always modified by overlapping additive layers, sparse object edges remain unchanged at most locations under additive layer composition. In applications to X-ray imaging, Close et al. [61] propose an ad hoc hierarchical algorithm for layer extraction in analysis of angiographic stenosis. This formulation assumes linear transformation of layers, and the sequential layer estimation and removal is a greedy technique that cannot recover from errors in previous layer estimates. Zhang et al. [62] propose a similar hierarchical model based on assumptions regarding imaged vessel motion. Chen et al. [63] propose a method for two-layer extraction with one layer being static and the availability of dual-energy X-rays, which is a far more constrained situation than we wish to model. Auvray et al. [64, 65] attempt motion compensation for denoising of fluoroscopic sequences using the three-frame motion constraint of [47]. Although this formulation is derived in terms of translational motion, it is used to solve for nonlinear motion. Also, although multiple motion layers are estimated, only two motions may occur at any point. Further, only motions and regions of influence are modeled, not the actual layer intensities. These restrictions greatly constrain the type of prediction or compensation for which this technique can be used. In [66], the authors use externally measured information regarding breathing motion to help determine layer motions in fluoroscopic imaging. Other work investigates layer estimation assuming known motions [40, 67, 68].

In Chapters 3 and 4, we will further develop our layered deformation model and apply it to specific use cases. Chapter 3 focuses on a model for estimating translational motion in high-noise fluoroscopic imaging, and Chapter 4 develops nonlinear motion models for additive layers.

CHAPTER 3

ADDITIVE LAYER MODEL WITH CONVEX LAYER PRIORS

Due to concerns regarding radiation dose to patients and clinicians, fluoroscopic imaging is often acquired at very low dose. The low dose levels result in images with very low SNR; to the untrained viewer, some catheterization sequences can look like random noise. In order to improve image quality in the presence of noise, a standard approach is to introduce a smoothness prior into a generative image model, encoding our belief that the true noise-free image is smooth in most locations, and encouraging the estimation of such an image. This chapter presents a prior model and estimation techniques for the additive layer model aimed at recovering relatively simple motion from extremely noisy fluoroscopic imaging sequences.

In the additive layer model, the joint estimation of layers and deformations yields a very high-dimensional and nonconvex optimization problem. However, we show that conditional on the motion, and assuming a convex prior on the layers, the layer estimation is a convex problem. Given true layers, however, motion estimation is still a challenging nonconvex optimization. It is not surprising, then, that a large body of previous work on this problem exploits formulations allowing deformation estimation without explicit estimation of the layers (e.g., [69, 58]). Such techniques fundamentally simplify the estimation process, but also remove the ability to add layer constraints, which can improve deformation estimates.
This chapter presents a novel Bayesian formulation for deformation estimation under an additive layer model, including an explicit layer prior. We show that for a class of priors, a closed-form solution for the estimation of the layers conditioned on the deformation exists. Using this solution, we eliminate explicit layer estimation while still enforcing spatial regularity of the now implicitly-defined layers. The role of the spatial prior on layers during deformation estimation is to regularize the deformation objective function, removing ambiguity due to noise or limited data. Once the correct deformation is found, estimation of the layers becomes a much simpler task, and priors respecting the task-specific visual or analytical requirements (e.g., edge or texture preservation, noise or artifact removal) can be imposed.

Although our formulation assumes only that layer motion is represented as a vector field, for the special case of translational motion an especially efficient frequency-domain motion estimation algorithm is presented. We apply this estimation technique to high-noise fluoroscopic imaging, and show that we can robustly estimate the motion of thin objects, such as small wires. The problem of separating moving clouds from the underlying landscape in aerial video is also examined, and a formulation of this problem as an additive layer model under certain assumptions is given. The effects of these assumptions (and their violations) are examined, and we present an algorithm for improved layer reconstruction given the recovered layer transformations.

3.1 Motion Estimation Model

Given T frames, $\{f_t\}$ for $t \in [1 \dots T]$, we wish to recover the deformations, $\phi = \{\phi^k_{t,0}\}$ for $k \in [1 \dots K]$ and $t \in [1 \dots T]$. We propose two alternative estimation methods, whose connection we will explore. First is an ML-MAP estimate,

\operatorname{argmax}_{\phi} \max_{\{\ell^k\}} p(\{f_t\}, \{\ell^k\} \mid \phi), \qquad (3.1)

and second is a Max Marginal Likelihood (MML) estimate,

\operatorname{argmax}_{\phi} p(\{f_t\} \mid \phi) = \operatorname{argmax}_{\phi} \int p(\{f_t\}, \{\ell^k\} \mid \phi) \, d\{\ell^k\}, \qquad (3.2)

where the joint probability has been marginalized over the layers. In both cases, given an appropriate prior on $\{\ell^k\}$, the inner operations over $\{\ell^k\}$ (the maximization in ML-MAP and the integral in MML) can be computed in closed form, leading to an optimization only over the deformations. In the case where the deformation of each layer is a translation with temporally constant velocity, we give an efficient optimization strategy for recovering these velocities.

3.1.1 Layer Priors

The form of the layer prior we choose influences how a frame is separated into layers. Smoothness is a common assumption in image estimation, allowing noise to be separated from signal. We will assume that the layer prior encourages smoothness via a penalty on the gradient magnitude of the layer image, $\int_\Omega p(|\nabla f(x)|) \, dx$. Object boundaries, or edges, create large gradients in image intensity. If $p$ is a convex function, it will attempt to spread edges between layers, while common nonconvex functions will attempt to consolidate each edge into a single layer. For instance, consider the separation of a 1D edge into two layers. Without any motion information, two equally likely separations are shown in Figure 3.1. Because the alignment of edges in separate layers is unlikely, we prefer the separation on the lower right. The layer smoothness penalty we choose will influence which separation our optimization will prefer.

Figure 3.1. A 1D signal with an edge is shown in the top row. Without additional information, separation into layers is underdetermined. Two possible layer separations are shown, one that splits the edge across layers, and one that leaves the edge in one layer while creating an 'empty' layer.

Consider a penalty $\int p\left(\left|\frac{d}{dx} f(x)\right|\right) dx$, where $f : \mathbb{R} \to \mathbb{R}$ is the 1D signal and $p$ is the penalty function. For the linear-step function shown in Figure 3.1, the left split results in a penalty of $2\delta x \, p(c)$, whereas the right split results in a penalty of $\delta x \, p(2c)$, where $c = \frac{\delta I}{2\delta x}$. A convex penalty $p$, such as the squared penalty in Figure 3.2, will prefer splitting the edge between layers, as $p(2c) = 4c^2 \geq 2c^2 = 2p(c)$. The magnitude penalty is agnostic as to the split, as $p(2c) = |2c| = 2|c| = 2p(c)$. Finally, a nonconvex penalty such as the square root penalty in Figure 3.2 will prefer to keep the edge in a single layer, since $p(2c) = \sqrt{2c} \leq 2\sqrt{c} = 2p(c)$. Even though nonconvex penalties have desirable properties in this respect, here we will focus on convex penalties, because the theoretical benefits are generally outweighed by the practical difficulties in optimizing such functions. We will explore both the squared gradient (H1) and the gradient magnitude (total variation) layer smoothness penalties.

Figure 3.2. Examples of possible penalties on layer gradient magnitudes. The squared penalty is strictly convex and differentiable, the magnitude penalty is convex but nondifferentiable, and the square root penalty is nonconvex.
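A quick numeric check of the argument above, comparing the two separations of Figure 3.1 under the three penalties of Figure 3.2 (a hypothetical snippet, taking c = 1 purely for illustration):

    import numpy as np

    c = 1.0  # ramp slope delta_I / (2 delta_x) of the edge in Figure 3.1
    penalties = {"squared":   lambda g: g ** 2,
                 "magnitude": lambda g: np.abs(g),
                 "sqrt":      lambda g: np.sqrt(g)}

    for name, p in penalties.items():
        split = 2 * p(c)     # edge shared across two layers, slope c in each
        single = p(2 * c)    # edge kept in a single layer, slope 2c
        print(f"{name:9s} split={split:.2f} single={single:.2f}")

    # squared:   split=2.00 single=4.00  -> prefers splitting
    # magnitude: split=2.00 single=2.00  -> indifferent
    # sqrt:      split=2.00 single=1.41  -> prefers a single layer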
3.1.2 Linear Model of Layer Deformations

Our goal is to find efficient and robust layer estimation methods given deformations and convex layer priors. We start by showing that frame generation (Equation (2.2)) can be written as a linear system in terms of the layers; in the next section we will show how to incorporate specific layer priors in this linear model. We first define a discretized version of frame $f_t$ or layer $\ell^k$ as $f_t$ or $\ell^k$, respectively, each a $P \times 1$ column vector of pixel values at pixel locations $\{x_p\}_{p=1}^{P}$. The deformation $\phi^k_{t,0}$ can also be represented by samples at these locations, creating a $2P \times 1$ vector $\phi^k_t$. Along with an interpolation strategy (e.g., bilinear interpolation), this can be used to define the $P \times P$ matrix $\Phi^k_t$ such that $\Phi^k_t \ell^k$ is the discrete representation of $\ell^k(\phi^k_{t,0}(x))$. The discrete version of the layer generation equation (2.2) can then be written as

f_t = \sum_{k=1}^{K} \Phi^k_t \ell^k + \eta \qquad (3.3)
    = \Phi^*_t \bar\ell + \eta, \qquad (3.4)

where $\Phi^*_t$ is the $P \times KP$ matrix $[\Phi^1_t \; \Phi^2_t \; \cdots \; \Phi^K_t]$, $\bar\ell$ is the $KP \times 1$ column vector $[(\ell^1)^\top (\ell^2)^\top \cdots (\ell^K)^\top]^\top$, and $\eta$ is a $P \times 1$ vector of i.i.d. noise. With additive Gaussian noise, $\eta \sim N(0, \sigma^2 I)$ (where $y \sim N(\mu, \Sigma)$ implies

p(y \mid \mu, \Sigma) = (2\pi)^{-\frac{D}{2}} |\Sigma|^{-\frac{1}{2}} e^{-\frac{1}{2}(y - \mu)^\top \Sigma^{-1} (y - \mu)}

for a D-dimensional vector $y$), $f_t$ is a multivariate normally distributed random variable,

f_t \sim N(\Phi^*_t \bar\ell, \sigma^2 I). \qquad (3.5)

Given multiple frames $\{f_t\}_{t=1}^{T}$, the frame generation equations are combined as $\bar f = \bar\Phi \bar\ell + \bar\eta$:

\begin{bmatrix} f_1 \\ f_2 \\ \vdots \\ f_T \end{bmatrix}
=
\begin{bmatrix}
\Phi^1_1 & \Phi^2_1 & \cdots & \Phi^K_1 \\
\Phi^1_2 & \Phi^2_2 & \cdots & \Phi^K_2 \\
\vdots   &          & \ddots & \vdots   \\
\Phi^1_T & \Phi^2_T & \cdots & \Phi^K_T
\end{bmatrix}
\begin{bmatrix} \ell^1 \\ \ell^2 \\ \vdots \\ \ell^K \end{bmatrix}
+ \bar\eta, \qquad (3.6)

where $\bar f$ is the $TP \times 1$ column vector $[f_1^\top f_2^\top \cdots f_T^\top]^\top$, $\bar\Phi$ is the $TP \times KP$ matrix

\bar\Phi = [(\Phi^*_1)^\top (\Phi^*_2)^\top \cdots (\Phi^*_T)^\top]^\top, \qquad (3.7)

and $\bar\eta$ is a $TP \times 1$ vector, $\bar\eta \sim N(0, \sigma^2 I)$.
The full data is then a multivariate normally distributed random variable,

\bar f \sim N(\bar\Phi \bar\ell, \sigma^2 I). \qquad (3.8)

For $T \geq K$, and given the deformations $\phi$, the maximum likelihood estimate is given (via minimizing the negative log likelihood) by a linear least squares solution,

\hat{\bar\ell} = \operatorname{argmax}_{\bar\ell} p(\bar f \mid \bar\ell, \bar\Phi) = \operatorname{argmin}_{\bar\ell} \|\bar\Phi \bar\ell - \bar f\|^2. \qquad (3.9)

However, it is worth noting that this system is not full rank, due to the ambiguity of a constant offset between layers; e.g., in the two-layer case, $\{\ell^1 + C, \ell^2 - C\}$ is equivalent to $\{\ell^1, \ell^2\}$ in terms of the objective function for any constant C. This ambiguity is overcome by placing an appropriate prior on $\bar\ell$, most simply by penalizing the squared magnitude of the entries of $\bar\ell$, giving

\bar\ell \sim N(0, (2\epsilon I)^{-1}), \qquad (3.10)

where $\epsilon$ controls the strength of the regularization. This formulation results in a MAP estimate (conditioned on the deformations) equivalent to a regularized least squares estimate,

\operatorname{argmax}_{\bar\ell} p(\bar\ell \mid \bar f, \bar\Phi) = \operatorname{argmax}_{\bar\ell} \frac{p(\bar f \mid \bar\ell, \bar\Phi)\, p(\bar\ell)}{p(\bar f \mid \bar\Phi)} \qquad (3.11)
= \operatorname{argmin}_{\bar\ell} -\ln p(\bar f \mid \bar\ell, \bar\Phi) - \ln p(\bar\ell) \qquad (3.12)
= \operatorname{argmin}_{\bar\ell} \frac{1}{2\sigma^2} \|\bar\Phi \bar\ell - \bar f\|^2 + \epsilon \|\bar\ell\|^2. \qquad (3.13)

3.1.3 H1 Layer Prior

Although Equation (3.13) produces a solvable system, the regularization added does not enforce smoothness on the layer estimates $\bar\ell$. Especially in high-noise cases such as low-dose fluoroscopy, smoothness priors are necessary for accurate image estimation. In this section we will use an H1 prior that penalizes the squared gradient of each layer, $\sum_{k=1}^{K} \|\nabla \ell^k\|^2$. Although this penalty is known to produce images with blurred edges, we are not using it to recover accurate layer images, but to aid in recovering deformations by reducing the probability of matching noise or ambiguous local solutions. In Section 3.3, a formulation for layer estimation with a total-variation-based prior will be given, which can be used to estimate more accurate layer images once the deformations have been recovered.

We write the discrete version of the H1 penalty on $\ell^k$ as $\|G \ell^k\|^2 = (\ell^k)^\top H \ell^k$, where G is the $2P \times P$ matrix representing the discrete 2D gradient operation, and $H = G^\top G$. This discrete formulation can be written as a Gaussian Markov Random Field (GMRF) prior on $\ell^k$, $\ell^k \sim N(0, (2\lambda H)^{-1})$ or, written on all layers in $\bar\ell$, $\bar\ell \sim N(0, (2\lambda \bar H)^{-1})$, where $\bar H$ is the $KP \times KP$ block-diagonal matrix with K copies of H as the blocks along the diagonal, and $\lambda$ controls the strength of the smoothness prior. The derivative operation $\bar H$ has a null space consisting of constant functions, which is the same ambiguity as in the original layer estimation problem. We therefore include the regularization penalty (3.10), giving the final (positive definite) GMRF prior on $\bar\ell$,

\bar\ell \sim N(0, (2\lambda \bar H + 2\epsilon I)^{-1}). \qquad (3.14)

We can now write the joint probability $p(\bar f, \bar\ell \mid \bar\Phi)$ in terms of the data likelihood $p(\bar f \mid \bar\ell, \bar\Phi)$ and the prior $p(\bar\ell)$:

p(\bar f, \bar\ell \mid \bar\Phi) = p(\bar f \mid \bar\ell, \bar\Phi)\, p(\bar\ell)
= C_\sigma e^{-\frac{1}{2\sigma^2} \|\bar\Phi \bar\ell - \bar f\|^2}\, C_{\lambda,\epsilon}\, e^{-\bar\ell^\top (\lambda \bar H + \epsilon I) \bar\ell}
= C_\sigma C_{\lambda,\epsilon}\, e^{-\left[ \bar\ell^\top \left( \frac{1}{2\sigma^2} \bar\Phi^\top \bar\Phi + \lambda \bar H + \epsilon I \right) \bar\ell \; - \; 2 \frac{1}{2\sigma^2} \bar\ell^\top \bar\Phi^\top \bar f \; + \; \frac{1}{2\sigma^2} \bar f^\top \bar f \right]}, \qquad (3.15)

where $C_\sigma = (2\pi\sigma^2)^{-\frac{TP}{2}}$ and $C_{\lambda,\epsilon} = \pi^{-\frac{KP}{2}} |\lambda \bar H + \epsilon I|^{\frac{1}{2}}$. Completing the square (see Section A.1 for details), Equation (3.15) can be written as a scaled Gaussian in $\bar\ell$,

p(\bar f, \bar\ell \mid \bar\Phi) = C_\sigma C_{\lambda,\epsilon}\, S_\Phi\, e^{-(\bar\ell - \mu_\Phi)^\top \Lambda_\Phi (\bar\ell - \mu_\Phi)}, \qquad (3.16)

where

\Lambda_\Phi = \frac{1}{2\sigma^2} \bar\Phi^\top \bar\Phi + \lambda \bar H + \epsilon I, \qquad (3.17)
\mu_\Phi = \frac{1}{2\sigma^2} \Lambda_\Phi^{-1} \bar\Phi^\top \bar f, \qquad (3.18)
S_\Phi = e^{\mu_\Phi^\top \Lambda_\Phi \mu_\Phi - \frac{1}{2\sigma^2} \bar f^\top \bar f}. \qquad (3.19)

The subscript $\Phi$ on these variables is a reminder of their dependence on the deformations.
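Conditioned on the deformations, the MAP layer estimate $\mu_\Phi$ of Equation (3.18) is the solution of the sparse, positive-definite system $\Lambda_\Phi \mu_\Phi = \frac{1}{2\sigma^2}\bar\Phi^\top \bar f$. A minimal SciPy sketch, assuming the stacked warp matrix and block gradient operator are available as sparse matrices (function and variable names are our own):

    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    def map_layers(Phi_bar, G_bar, f_bar, sigma, lam, eps):
        # Lambda_Phi of Eq. (3.17): data term + H1 GMRF prior + ridge term.
        H_bar = G_bar.T @ G_bar
        Lam = (Phi_bar.T @ Phi_bar) / (2 * sigma ** 2) \
              + lam * H_bar + eps * sp.eye(Phi_bar.shape[1])
        rhs = (Phi_bar.T @ f_bar) / (2 * sigma ** 2)
        mu, info = spla.cg(Lam, rhs)  # conjugate gradients: Lam is SPD
        assert info == 0
        return mu  # mu_Phi of Eq. (3.18), the stacked MAP layer estimate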
Note that the inner maximization over $\bar\ell$ in Equation (3.1) occurs at $\bar\ell = \mu_\Phi$, which is the same as the MAP estimate of $\bar\ell$ given the deformations (hence the name ML-MAP). We can therefore write Equation (3.1) as $\max_{\bar\ell} p(\bar f, \bar\ell \mid \bar\Phi) = C_\sigma C_{\lambda,\epsilon} S_\Phi$, or, minimizing the negative log,

\operatorname{argmax}_{\phi} \max_{\bar\ell} p(\bar f, \bar\ell \mid \bar\Phi) = \operatorname{argmin}_{\phi} -\mu_\Phi^\top \Lambda_\Phi \mu_\Phi \qquad (3.20)
= \operatorname{argmin}_{\phi} -\frac{1}{4\sigma^4} \bar f^\top \bar\Phi \Lambda_\Phi^{-1} \bar\Phi^\top \bar f. \qquad (3.21)

However, since $p(\bar f, \bar\ell \mid \bar\Phi)$ is written as a scaled Gaussian in $\bar\ell$, we can also compute the marginalization over $\bar\ell$ in closed form, $\int p(\bar f, \bar\ell \mid \bar\Phi) \, d\bar\ell = C_\sigma C_{\lambda,\epsilon} S_\Phi C_\Phi^{-1}$, where $C_\Phi$ is the normalizing constant for the exponential term in (3.16),

C_\Phi = \pi^{-\frac{KP}{2}} |\Lambda_\Phi|^{\frac{1}{2}}, \qquad (3.22)

meaning we can rewrite the MML equation (3.2) (minimizing the negative log) as

\operatorname{argmax}_{\phi} p(\bar f \mid \bar\Phi) = \operatorname{argmin}_{\phi} \frac{1}{2} \ln |\Lambda_\Phi| - \frac{1}{4\sigma^4} \bar f^\top \bar\Phi \Lambda_\Phi^{-1} \bar\Phi^\top \bar f. \qquad (3.23)

Note that this differs from the ML-MAP objective only by the term $\frac{1}{2} \ln |\Lambda_\Phi|$, which does not depend on the data $\bar f$. For deformations $\phi$ parameterized by $\theta$ (in the most direct case, $\theta$ is simply the entries of all vectors $\phi^k_t$), we can differentiate the objective function (3.21) w.r.t. $\theta$,

\frac{\partial}{\partial\theta} \left( -\frac{1}{4\sigma^4} \bar f^\top \bar\Phi \Lambda_\Phi^{-1} \bar\Phi^\top \bar f \right)
= -\frac{1}{4\sigma^4} \bar f^\top \left( \frac{\partial \bar\Phi}{\partial\theta} \Lambda_\Phi^{-1} \bar\Phi^\top + \bar\Phi \Lambda_\Phi^{-1} \frac{\partial \bar\Phi^\top}{\partial\theta} - \bar\Phi \Lambda_\Phi^{-1} \frac{\partial \Lambda_\Phi}{\partial\theta} \Lambda_\Phi^{-1} \bar\Phi^\top \right) \bar f, \qquad (3.24)

where $\frac{\partial \Lambda_\Phi}{\partial\theta} = \frac{1}{2\sigma^2} \left( \frac{\partial \bar\Phi^\top}{\partial\theta} \bar\Phi + \bar\Phi^\top \frac{\partial \bar\Phi}{\partial\theta} \right)$. For the objective in (3.23), we have one extra term, $\frac{\partial}{\partial\theta} \frac{1}{2} \ln |\Lambda_\Phi| = \frac{1}{2} \operatorname{Tr}\left( \Lambda_\Phi^{-1} \frac{\partial \Lambda_\Phi}{\partial\theta} \right)$, where $\operatorname{Tr}(\cdot)$ is the trace operator.

3.2 Optimization for Translational Motion

Equations (3.21) and (3.23) require computing $\Lambda_\Phi^{-1}$, and optimizing (3.21) or (3.23) also requires computing $\frac{\partial \bar\Phi}{\partial\theta}$. While the previous derivation holds for any parametric or nonparametric motion model, here we assume that each $\phi^k_{t,0}(x)$ is a translation with constant temporal velocity. This translational motion is parameterized as $\phi^k_{t,0}(x) = x + t_t v^k$, so $\theta = \{v^k\}_{k=1}^{K}$. If $\Phi^k_t$ is defined via an interpolation strategy that can be represented as a convolution, then each matrix $\Phi^k_t$ is diagonalizable by a 2D discrete Fourier transform (DFT). The frequency-domain representation of $\bar\Phi$ (denoted $\widehat{\bar\Phi}$) is then composed of diagonal blocks, as is $\widehat{\bar\Phi^\top}$ (we write $\widehat{\bar\Phi^\top}$ as a reminder that $\widehat{\bar\Phi^\top} \neq \widehat{\bar\Phi}^\top$). Because the precision matrix of the prior, $2\lambda \bar H + 2\epsilon I$, is also diagonal in the frequency domain, $\widehat{\Lambda}_\Phi$ can be rearranged (via row reordering) into a block-diagonal matrix composed of P blocks, each of size $K \times K$, making inversion an essentially linear operation for small K.

In order to compute (3.23), the log determinant $\ln |\Lambda_\Phi|$ must also be computed. This is in general a computationally (and numerically) challenging task, but for translations, as in the previous paragraph, $\widehat{\Lambda}_\Phi$ can be converted to a block-diagonal matrix in the Fourier domain, whose determinant is easily calculated as the product of the determinants of its blocks. Each $P \times P$ block of $\Lambda_\Phi$ is block circulant with circulant blocks (BCCB), and so is diagonalizable by the unitary DFT matrix F. We can therefore generate $\widehat{\Lambda}_\Phi$ as

\bar F \Lambda_\Phi \bar F^{-1} = \widehat{\Lambda}_\Phi, \qquad (3.25)

where $\bar F$ is block diagonal with blocks F. $\bar F$ is also unitary, so we have

|\bar F \Lambda_\Phi \bar F^{-1}| = |\Lambda_\Phi| = |\widehat{\Lambda}_\Phi|. \qquad (3.26)

In order to optimize over the velocities, we define each $\Phi^k_t$ using sinc interpolation, which allows us to write the frequency-domain operation (which we will denote $\widehat{\Phi}^k_t$) directly via the Fourier shift theorem. That is, $\widehat{\Phi}^k_t$ is diagonal with entries $e^{-2\pi j\, v^k_t \cdot (\omega / s)}$, where $v^k_t = t_t v^k$ is the translation of layer k at frame t, $\omega$ is the 2D frequency index, s is the 2D grid size, and $\omega / s$ represents coordinate-wise division.
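The following sketch shows this frequency-domain translation operator for a single layer and frame; the negated exponent shifts the image by v, matching the diagonal entries above (NumPy conventions assumed; this is an illustration, not the optimized implementation).

    import numpy as np

    def translate_fft(image, v):
        # Sinc-interpolated translation by v = (vy, vx) pixels via the
        # Fourier shift theorem: multiply each DFT coefficient by
        # exp(-2 pi j v . omega/s), with omega/s given by fftfreq.
        H, W = image.shape
        wy = np.fft.fftfreq(H)[:, None]
        wx = np.fft.fftfreq(W)[None, :]
        shift = np.exp(-2j * np.pi * (v[0] * wy + v[1] * wx))
        return np.fft.ifft2(np.fft.fft2(image) * shift).real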
Differentiation w.r.t. $v^k_t$ is then quite simple in the frequency domain, and we use an adaptive-step gradient descent method to recover the velocities.

3.3 TV Layer Prior

Although the H1 prior on layers helps to accurately estimate deformations, it penalizes sharp edges and is therefore not ideal for layer estimation, where we would like to preserve object boundaries. To preserve edges, we can add a final step of re-estimating the layer images, given the final deformations, using a total-variation-based regularization. Once the deformations have been estimated, the layer estimation with a TV prior is a convex (though nonlinear) optimization problem. In particular, the primal-dual objective [70] can be written as

\operatorname{argmin}_{\bar\ell} \max_{\bar p \in \bar S} \frac{1}{2\sigma^2} \|\bar\Phi \bar\ell - \bar f\|^2 + \lambda \bar p^\top \bar G \bar\ell + \epsilon\, \bar\ell^\top \bar\ell, \qquad (3.27)

where $\bar G$ is the discrete gradient operator applied to each layer and $\bar p = [(p^1)^\top (p^2)^\top \cdots (p^K)^\top]^\top$. Each $p^k$ is a $2P \times 1$ vector representing a 2D vector field, and $\bar S$ is defined such that each $p^k \in S$, requiring that the vector field represented by $p^k$ be smooth and have pointwise norm $\leq 1$. We then use the iterative primal-dual method of [71], and note that at the expense of another Fourier matrix inversion (in the case of translational motion) we can compute the duality gap, giving a bound on the energy difference between the current solution and the optimum.

3.4 Transparent Cloud Model

In addition to modeling fluoroscopic imaging, the additive layer model can be adapted to separate transparent clouds from aerial video. The formation of an image f from a semitransparent foreground image c and an opaque background image $\beta$ can be modeled via the standard alpha compositing equation

f(x) = \beta(x)(1 - \alpha(x)) + c(x)\alpha(x), \qquad (3.28)

where $\alpha$ is the nonuniform opacity of the foreground image, $0 \leq \alpha(x) \leq 1$. In the case of clouds or fog as the foreground image, $c(x)$ can be simplified to a spatial constant c, and Equation (3.28) becomes

f(x) = \beta(x)(1 - \alpha(x)) + c\alpha(x). \qquad (3.29)

If the frame at time t is generated by deforming the foreground and background layers via transformations $\phi^\alpha_{t,0}$ and $\phi^\beta_{t,0}$, respectively, then the frame generation model is

f_t(x) = \beta(\phi^\beta_{t,0}(x))\left(1 - \alpha(\phi^\alpha_{t,0}(x))\right) + c\,\alpha(\phi^\alpha_{t,0}(x)). \qquad (3.30)

If the fully-opaque foreground (cloud) color has greater magnitude than any intensity in the input frames (i.e., $c > f(x)\;\forall x$), and we know or can estimate the color c, then we can write a transformed version of the frames as a sum of transformed versions of the layers,

\tilde f_t(x) = \tilde\beta(\phi^\beta_{t,0}(x)) + \tilde\alpha(\phi^\alpha_{t,0}(x)), \qquad (3.31)

where $\tilde f_t(x) = \ln(c - f_t(x))$, $\tilde\beta(x) = \ln(c - \beta(x))$, and $\tilde\alpha(x) = \ln(1 - \alpha(x))$. Although clouds are generally white when viewed from the air, the assumption that this value is the brightest in the scene can be violated by reflective man-made structures on the ground, and is slightly violated ($c = \max_x f(x)$) if there exist any fully-opaque clouds in the scene. For small violations of this assumption, however, we can perform the optimization using an incorrect cloud color set to $c = \max_x f(x) + \varepsilon$ for small $\varepsilon$. Motion estimation is possible in this framework, but the estimated layers contain visible artifacts. To correct this, we re-estimate the layers given the estimated motions using the original model (3.30), where we iteratively solve for $\beta$, $\alpha$, and c while holding the other parameters fixed.
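A small sketch of the log transformation of Equation (3.31) and its inverse, using the over-estimated cloud color discussed above (an illustration under our own naming, not the full estimation pipeline):

    import numpy as np

    def log_transform_frames(frames, eps=0.01):
        # Map alpha-composited frames into the additive domain, Eq. (3.31):
        # f~ = log(c - f) = beta~ + alpha~, with c = max f + eps.
        c = frames.max() + eps
        return np.log(c - frames), c

    def recover_layers(beta_tilde, alpha_tilde, c):
        # Invert the transform: beta = c - exp(beta~), alpha = 1 - exp(alpha~).
        return c - np.exp(beta_tilde), 1.0 - np.exp(alpha_tilde)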
We note that the optimal solution for the alpha blending model is ambiguous under certain scalings and shifts. To see this, we can define new background and opacity layers as scaled and shifted versions of the original layers,

\hat\alpha(x) = S_\alpha \alpha(x) + C_\alpha, \qquad (3.32)
\hat\beta(x) = S_\beta \beta(x) + C_\beta, \qquad (3.33)

as well as a new cloud color $\hat c$. Plugging these into (3.29) and collecting terms, we have

f = S_\beta (1 - C_\alpha)\beta(x) - S_\beta S_\alpha \beta(x)\alpha(x) + S_\alpha (\hat c - C_\beta)\alpha(x) + C_\beta - C_\beta C_\alpha + \hat c C_\alpha. \qquad (3.34)

If we require equivalency with (3.29) by requiring that the coefficients of each term match, we get a set of constraints:

S_\beta (1 - C_\alpha) = 1, \qquad (3.35)
S_\beta S_\alpha = 1, \qquad (3.36)
S_\alpha (\hat c - C_\beta) = c, \qquad (3.37)
C_\beta - C_\beta C_\alpha + \hat c C_\alpha = 0. \qquad (3.38)

This gives us four constraints on five unknowns ($S_\alpha$, $C_\alpha$, $S_\beta$, $C_\beta$, and $\hat c$). Note that we can use these to solve for $\hat c$ by solving (3.35) for $S_\beta$ and (3.37) for $S_\alpha$, and plugging these substitutions into (3.36), giving

c = \hat c - (C_\beta + \hat c C_\alpha - C_\alpha C_\beta). \qquad (3.39)

The terms in the parentheses equal zero via the constraint of Equation (3.38), so we have $\hat c = c$ and three independent constraints on the remaining four unknowns, giving us a 1D solution space. For our purposes, however, we will consider it acceptable to recover background and opacity up to a difference in scaling and offset.

3.5 Results on Translating Layers

Here we present results of layer and motion estimation for scenes with translating layers. Synthetic data allows us to test a three-layer model under high noise, a phantom imaged using a clinical fluoroscope provides more realistic data with near-translational layer motion, and cloud removal in aerial photography validates the proposed log-additive model for alpha-blended layers.

3.5.1 Synthetic Data

We demonstrate the robustness of this algorithm to noise on random synthetic images showing thin linear structures. Figure 3.3 shows a dataset consisting of three translating layers, and the recovered layers using a sequence of five input frames. The true layers have range [0, 1], and Gaussian noise with $\sigma = 0.5$ is added to the frames.

In order to investigate the effect of motion estimation with the H1 prior as opposed to the TV prior, we have sampled the ML-MAP objective at discrete locations using both the H1 and TV priors on layers. We use a two-layer version of the dataset shown in Figure 3.3, with $\sigma = 2.0$ and $\lambda$ chosen to best recover the layer images. Because we are looking for two 2D velocities, the search space is four-dimensional. We sample a 2D slice containing the assumed global optimum (i.e., the location representing the true velocities used to generate the dataset) at units of one pixel per frame (since we are searching for velocities defined in pixels/frame). Figure 3.4 shows these results. The H1 energy values were calculated directly as described in Section 3.2. For the TV prior, the primal-dual algorithm was used as described in Section 3.3, iterating to convergence to obtain the energy at each sampled velocity. The overall shape of the objective seems to be the same, both attaining a minimum at the location of the true velocities. The H1 objective is overall smoother, likely aiding optimization.

Figure 3.3. Three-layer synthetic dataset, $\sigma = 0.5$. The top row shows three of the five input frames, the middle row shows the true layers, and the bottom row shows the recovered layers using the H1 prior.

3.5.2 Fluoroscopic Phantom

We show results on a phantom dataset imaged with a clinical C-arm fluoroscope. The sequence shows a contrast-filled pump and variously sized wires moving over an anatomical phantom, and was developed to assess object visibility under low-dose techniques.
Low-dose imaging is used to reduce exposure of patients and medical staff to ionizing radiation during lengthy procedures such as cardiac catheterization. Motion tracking in this situation is further complicated by the fact that the object of interest is a small intravascular catheter guide wire. For all tests, frame intensity values were normalized to the range [0, 1] and the regularization constant was set to $\epsilon = 0.01$. For optimization purposes, $\sigma$ and $\lambda$ are not independent, but control the tradeoff between the data and smoothness terms. We therefore fix $\sigma = 1$ and set $\lambda$ empirically. Input frames are of size 1024×1024. Motion was estimated from a 5-frame sequence, followed by layer reconstruction using the same five frames and both the motion estimation (H1) prior and the TV prior. Translation estimation used a multiscale implementation. Figure 3.5 shows the estimated layers and a reconstructed frame using the 5-frame sequence. Figure 3.6 shows details of the original data, as well as denoised versions from layers reconstructed using the H1 and TV priors; note the blurring evident in the H1-regularized image.

Figure 3.4. 2D slice of the 4D (two-layer) ML-MAP objective, sampled discretely on an integer grid, using an H1 layer prior (left) and a TV prior (right).

Translational motion provides a highly constrained formulation, in which it is conceivable that the number of pixels in the image makes estimation possible without any further spatial regularity. To demonstrate the importance of the prior in the optimization process, we ran 100 optimizations of the imaged phantom dataset with velocity initializations taken from a uniform distribution on [-8, 8]×[-8, 8] (in pixels/frame), both with ($\lambda = 20.0$) and without ($\lambda = 0$) a spatial prior. The results are shown in Figure 3.7. In addition, it is possible that the optimization is simply latching on to the strong edges of the contrast-filled pump. We therefore estimated motion using the cropped section of data shown in Figure 3.8(a), with estimated layers shown in (b) and (c).

3.5.3 Cloud Removal

We present results of cloud removal on a short video sequence shot on a phone camera from the window of an airplane. The sequence shows the landscape moving relative to the observer, and the clouds moving relative to the landscape.

Figure 3.5. Top row: three consecutive frames of input data. Bottom row: images (a) and (b) show the estimated layer images, while (c) shows the reconstructed (denoised) frame (all using the TV prior for layer estimation).

Motion estimation was performed on ten frames of a grayscale version of the data via the additive layer model on the log-transformed frames (see Equation (3.31)). The cloud color was not assumed known, and was set to $c = \max_x f(x) + 0.01$. Although motion estimation was possible under this model, layer separation shows significant artifacts (see Figure 3.9) caused by the incorrect cloud color. An initial transparent cloud model for color images represented $\beta$ as a three-channel field, c as a three-entry vector, and $\alpha$ as a scalar field applied independently to each channel of $\beta$ and c. However, results using this model showed hue artifacts that corresponded to cloud location (see Figure 3.10). It is possible that the effect of clouds on different wavelengths of light is not constant, violating the single-channel alpha model. With layer separation (including alpha estimation) run on each channel independently, the estimated colors are much more constant across the frame (Figure 3.11).
Figure 3.6. Original frame (a), H1 denoised frame (b), and TV denoised frame (c).

3.6 Conclusion

The use of a low-dimensional parameterization for the motion model and a convex prior on the layer images has allowed layer and motion estimation in very high-noise cases. The H1 prior helps to smooth the deformation objective function, and can be seen as a strictly convex surrogate for the TV layer prior. We note that while translational motion is a restrictive model, it occurs naturally in cases such as cloud motion, liver motion due to breathing, and motion of interventional tools over a short time period. Further, the computational benefits of translation estimation mean that while the current implementations are not real time, an optimized real-time implementation is conceivable. Future work includes both optimizing for more complex motion models in the general formulation, and extending the translational algorithm to relax the assumption of frame-wide translations.

Figure 3.7. Velocities after optimization for 100 random initializations. With the prior, all solutions agree to < 0.01 pixel; without the prior, 45% have error > 1.0 pixel.

Figure 3.8. Results run on cropped fluoroscopic phantom data, showing correct results even when thin wires are the only moving objects.

Figure 3.9. Direct log-additive layer estimation. Although the motion is correctly estimated, the resulting layers show severe artifacts.

Figure 3.10. Background layer estimate using independent (left) vs. joint (right) alpha channel models. The use of a single alpha map for all color channels leads to color artifacts.

Figure 3.11. Layer separation results on aerial data. Note both the accurate layer separation, as well as the improved noise characteristics of the estimated template image.

CHAPTER 4

LAYERED DEFORMATION WITH NONLINEAR MOTIONS

The previous chapter presents a method for estimating the layer images and deformations in the additive layer model, given a set of input frames. Given a convex prior on layer smoothness and a current estimate of the layer deformations, the optimal layer estimates were used in incrementally updating the deformation estimates. The layer deformations considered in that chapter were translations, limiting the applicability in modeling nonlinear motions caused by physiological functions such as breathing or heartbeat. Here we will consider modeling layer deformations via dense deformation fields, allowing general, nonlinear motion. The joint estimation of layer images and dense deformation fields from a sequence of input images is a severely underconstrained problem. We examine the artifacts encountered when standard layer regularizations, such as the H1 or TV penalties discussed in Chapter 3, are used along with dense deformations in the additive layer model. We then develop a novel TV-based layer regularization to avoid these artifacts, and a primal-dual optimization scheme for this penalty.

Estimating a dense deformation field representing correspondences between image locations is an underconstrained problem, but correspondences modeling image differences due to motion, growth, and intersubject variability of biological structures have been shown to be well modeled by smooth deformation fields [11]. As each layer encompasses some portion of the imaged anatomy undergoing physiological motion, we assume that the motion in each layer can also be modeled by such a deformation.
The overall goal is to estimate layers and deformations such that the sum of smoothly deforming layers accurately describes the motion in the original frames. To achieve this goal, the estimated layers need not represent a segmentation of objects in the scene; they must only separate overlapping objects where contradictory motion violates a smooth-deformation model. The layer separation should therefore be considered motion-based separation as opposed to object-based separation. As a pertinent example, we may consider the contrast-enhanced vessel tree to be a single object in the imaged scene, but branches of the imaged vessel tree may overlap over time such that the motion of the entire tree cannot be represented by a single smooth deformation. In this case, the vessel tree must be separated into multiple layers. Conversely, objects showing similar motion will be grouped into the same layer.

In setting up our model, we assume some general properties of the input data based on the use case of a patient undergoing fluoroscopic imaging. With the exception of objects completely attenuating the imaging signal (a case we will ignore), we can assume that all objects are semitransparent, such that we do not have to consider occlusions. We also assume the object being imaged stays mostly within the field of view, with relatively small portions moving in and out of frame. We wish to allow for very expressive deformations with no low-order parameterization. Further, we will not assume a dominant motion between frames, and will therefore avoid a hierarchical motion decomposition. This chapter is based on publication [72].

4.1 Additive Layer Model for Nonlinear Physiological Motion

The frame generation model used here is the subtractive model given by Equation (2.1) in Section 2.2. This section presents the details of this formulation for modeling nonlinear patient motion in fluoroscopic imaging, including the deformation model, layer smoothness penalty, and extensions for modeling contrast injection. In addition, the discretization and estimation methods will be discussed.

4.1.1 Deformation Model

In order to represent the variety of anatomical motions observed in clinical practice, we model each layer $\ell^k$ as undergoing its own smooth deformation $\phi^k(x, t)$, $t \in [t_0, t_T]$, using the diffeomorphic large deformation model [6, 7]. In this model, $\phi(x, t)$ is defined via the time-varying velocity field $v(x, t)$:

\phi(x, t) = \phi(x, t_0) + \int_{t_0}^{t} v(\phi(x, s), s) \, ds, \quad \text{where } \phi(x, t_0) = x.

Smoothness of the deformation $\phi$ is enforced by penalizing $\int_{t_0}^{t_T} \|v(x, t)\|^2_L \, dt$, where $\|v\|^2_L = \langle Lv, v \rangle_2$ for a smooth velocity field v, and L is a differential operator. Under reasonable smoothness assumptions, the deformation $\phi(x, t)$ is guaranteed to be diffeomorphic, ensuring the existence of $\phi^{-1}(x, t)$, which is computed by integrating the negative of the flow field v backwards in time. Following [7], we use the notation $\phi_{t,s}(x) = \phi(\phi^{-1}(x, t_t), t_s)$ for the deformation moving the point x at time $t_t$ to its location at time $t_s$. Going forward, we will also drop explicit mention of the spatial parameter x for notational convenience.
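As a concrete illustration of the flow model (anticipating the discretization described in Section 4.1.5), the sketch below composes $\phi(x, t)$ from piecewise-constant velocity fields by Euler integration; `warp` is an assumed sampler, such as the bilinear interpolation used elsewhere in this work.

    import numpy as np

    def integrate_flow(vs, identity, warp):
        # Euler integration of phi(x, t) = x + integral of v(phi(x, s), s):
        # phi_{t+1} = phi_t + v_t(phi_t). `identity` is the (2, H, W)
        # coordinate grid; vs holds one (2, H, W) velocity per interval.
        phi = identity.copy()
        phis = [phi.copy()]
        for v in vs:
            phi = phi + np.stack([warp(v[0], phi), warp(v[1], phi)])
            phis.append(phi.copy())
        return phis  # phi at t_0, t_1, ..., t_T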
4.1.2 Layer Gradient Penalty

The proposed method jointly estimates both the layers and the deformations. Even with the smoothness constraint on the deformations imposed by the large deformation model, this is an extremely underconstrained problem. In order to formulate a well-posed estimation problem, some assumptions regarding the properties of the layers must be made. We wish to separate overlapping objects in the input frames into different layers in the proposed model. Reducing the number of edges in layer images will help with this goal, as the overlapping of transparent objects introduces an edge that will appear in multiple layers if proper separation has not been achieved. A natural choice would be to impose a total variation penalty on the layer images. Such a penalty encourages sparsity of gradients within an image and, more importantly for our application, encourages sparsity of gradients across the layer images.

Even with the layer smoothness constraint, the formulation permits ambiguous solutions for even simple motion, as shown in Figure 4.1. This shows a small object moving to the right between two timepoints. Solution (a) is the expected solution; however, we see that solution (b) may actually be the optimal solution given the tradeoff between deformation and gradient penalties. (A pure translation will not be penalized by our smoothness penalty; however, in realistic 2D scenes a pure translation would be uncommon, so for the purposes of this example we can associate an increased size of deformation with an increased smoothness penalty.) In order to improve this situation, we note that other information is available that can improve the solution. Consistent motion across multiple frames can remove ambiguity. We can also use the frame data to improve the layer smoothness penalty. We know that if there is an edge in a frame, that edge should exist in at least one layer. Further, if some location contains no edge in a frame, no edge should exist at that location in any layer at that time. Observe that this is violated in Figure 4.1(b). However, as our goal is to separate objects into different layers, we do not want to force every edge into every layer. Consider the following penalty:

\int_\Omega \|\nabla \ell^k\|_2 - \nabla \ell^k \cdot n_0 \, dx, \qquad (4.1)

where $n_0$ represents the normals of the frame $f_0$, taken with a cutoff regularization based on a parameter $\epsilon$:

n_0 = \begin{cases} -\dfrac{\nabla f_0}{\|\nabla f_0\|_2} & \text{if } \|\nabla f_0\|_2 \geq \epsilon, \\ 0 & \text{if } \|\nabla f_0\|_2 < \epsilon. \end{cases} \qquad (4.2)

Figure 4.1. A 1D example of two possible solutions accounting for the movement of a small object with two layers. Note that a larger deformation is required in case (a), and more edges are required in case (b).

This looks like a standard TV penalty on the layer $\ell^k$ except where an edge occurs in the frame $f_0$. Here the penalty is eliminated if the gradient in the layer aligns (in the same direction) with the gradient of the frame, and is doubled if they align in opposite directions. Also note that this value is always nonnegative, as $\nabla \ell^k \cdot n_0$ takes its maximum value when $n_0 = \frac{\nabla \ell^k}{\|\nabla \ell^k\|_2}$, resulting in zero penalty. It is also zero if $\nabla \ell^k$ is zero, meaning there is no penalty for a layer not representing an edge in $f_0$. Of course, noise in the frame will cause incorrect estimates of the normals. It will be important for our results to have nonzero normals only where true edges in the frame exist. To ensure this, we calculate normals from $\bar f_0$, a denoised version of frame $f_0$. We desire sparse gradients, and therefore use a standard total-variation-based denoising method. We note that a penalty very similar to (4.1) is proposed in [73] for the purposes of denoising, where the normals used are estimates of the true normals of the image being denoised, and are approximated by the normals of the TV-denoised image. The undeformed layers are estimated at time $t_0$, and therefore Equation (4.1) only makes sense for the normals of frame $f_0$.
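A sketch of the penalty (4.1) with the cutoff normals (4.2); function names are hypothetical, and np.gradient stands in for whatever discrete gradient operator is used in practice.

    import numpy as np

    def cutoff_normals(frame, eps):
        # Eq. (4.2): negated unit-norm frame gradients, zeroed below eps.
        gy, gx = np.gradient(frame)
        mag = np.hypot(gy, gx)
        scale = np.where(mag >= eps, -1.0 / np.maximum(mag, 1e-12), 0.0)
        return gy * scale, gx * scale

    def edge_aligned_tv(layer, normals):
        # Eq. (4.1): sum over pixels of |grad l| - grad l . n0; each term
        # is nonnegative since |n0| <= 1.
        gy, gx = np.gradient(layer)
        ny, nx = normals
        return np.sum(np.hypot(gy, gx) - (gy * ny + gx * nx))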
We do not wish to bias our solution to the configuration of objects observed at $t_0$, so we propose a version incorporating all frames,

\sum_{t=0}^{T} \int_\Omega \|\nabla \ell^k\|_2 - \nabla \ell^k \cdot \tilde n_t \, dx, \qquad (4.3)

where $\tilde n_t$ represents the normals of the denoised version of $f_t$ taken after deforming it to time $t_0$, once again with cutoff $\epsilon$:

\tilde n_t = \begin{cases} -\dfrac{\nabla (\bar f_t \circ \phi_{t,0})}{\|\nabla (\bar f_t \circ \phi_{t,0})\|_2} & \text{if } \|\nabla (\bar f_t \circ \phi_{t,0})\|_2 \geq \epsilon, \\ 0 & \text{if } \|\nabla (\bar f_t \circ \phi_{t,0})\|_2 < \epsilon. \end{cases} \qquad (4.4)

Because standard numerical solutions of TV denoising do not result in perfectly uniform image regions, the parameter $\epsilon$ is chosen to ignore very small gradients; in our case, it is approximately one percent of the input image intensity range.

4.1.3 Energy Formulation

We formulate the estimation as an energy minimization problem. Given the current constraints, the total energy is given by

E_{total} = E_{data} + \lambda_\ell E_{layer} + \lambda_v E_{def}, \qquad (4.5)

where $E_{data}$ is the data attachment term, $E_{layer}$ is the layer gradient penalty, and $E_{def}$ enforces the deformation smoothness constraints. The constants $\lambda_\ell$ and $\lambda_v$ control the tradeoff between the goals pursued by the different terms. We can now explicitly state the penalty terms. The data term comes directly from the frame generation equation (2.2),

E_{data} = \sum_{t=0}^{T} \|\hat f_t - f_t\|_2^2, \qquad (4.6)

where $\hat f_t$ is the estimated frame at time $t_t$, $\hat f_t = \sum_{k=0}^{K-1} \ell^k_t$. The layer gradient penalty, as outlined above, is

E_{layer} = \sum_{k=0}^{K-1} \sum_{t=0}^{T} \left\| \|\nabla \ell^k\|_2 - \nabla \ell^k \cdot \tilde n_t \right\|_1, \qquad (4.7)

and the deformation smoothness, again as discussed above, is

E_{def} = \sum_{k=0}^{K-1} \int_{t_0}^{t_T} \|v^k_t\|^2_L \, dt, \qquad (4.8)

where $\|\cdot\|_L$ depends on our choice of L. In this work we use $L = \nabla^2$, the Laplacian operator. Note that we should also normalize for the number of frames, but for brevity we have absorbed this into our constants $\lambda_\ell$ and $\lambda_v$. The estimation problem is then formulated as

\operatorname{argmin}_{\{\ell^k\}, \{v^k_t\}} E_{total}\left(\{f_t\}, \{\ell^k\}, \{v^k_t\}\right). \qquad (4.9)

4.1.4 Residual Layers

Although the smooth deformation of layers described above can adequately describe the motion caused by breathing, heartbeat, and other deformations of 3D anatomy, there are some important cases in fluoroscopic imaging that violate this model. Specifically, the introduction of new objects, such as tools during interventional procedures or contrast agents during angiography, is not handled by our model. In order to accommodate these cases, additional layers with specific properties meant to model these situations are introduced. Consider the case of a radiographic contrast medium introduced into the vascular system to increase the contrast between blood vessels and surrounding tissue, thereby exposing vessel blockages, leaks, and abnormalities. The contrast enters the frame as a large dark object, and spreads rapidly from frame to frame. In order to model the contrast, we introduce a residual layer that accounts for interframe changes not well modeled by a deformation. Based on the observed properties of contrast, we model it as a smooth, contiguous object. The layer should also be sparse, containing mostly zero values. We therefore estimate a layer at each timepoint that is not subject to deformation, and impose a TV penalty and an L1 penalty on this layer. The model for our estimated frame is then $\hat f_t = \sum_{k=0}^{K-1} \ell^k_t - b_t$, where $b_t$ is the residual layer at $t_t$, $t \in \{0 \dots T\}$, and we introduce a new term to the energy minimization,

E_{res} = \lambda_{TV} \sum_{t=0}^{T} \left\| \|\nabla b_t\|_2 \right\|_1 + \lambda_{L1} \sum_{t=0}^{T} \|b_t\|_1, \qquad (4.10)

again accounting for normalization over the number of frames in the constants $\lambda_{TV}$ and $\lambda_{L1}$.
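Putting the terms together, a sketch evaluating $E_{total}$ of Equation (4.5) plus the residual term (4.10); `grad` (returning a (2, H, W) field) and `laplacian` are assumed helper operators, `layers` holds the K undeformed layers, and `layers_t[t]` holds the K deformed layers at frame t. This is an illustration of the objective only, not the optimization scheme of Section 4.1.5.

    import numpy as np

    def total_energy(frames, layers, layers_t, residuals, vs, normals,
                     lam_l, lam_v, lam_tv, lam_l1, grad, laplacian):
        # Eq. (4.6): squared data mismatch, with residual layers subtracted.
        e_data = sum(np.sum((sum(lt) - b - f) ** 2)
                     for lt, b, f in zip(layers_t, residuals, frames))
        # Eq. (4.7): edge-aligned TV on the undeformed layers, all frames.
        e_layer = sum(np.sum(np.hypot(*grad(l)) - np.sum(grad(l) * n, axis=0))
                      for n in normals for l in layers)
        # Eq. (4.8) with L = Laplacian.
        e_def = sum(np.sum(laplacian(v) ** 2) for v in vs)
        # Eq. (4.10): TV + L1 sparsity on the residual layers.
        e_res = sum(lam_tv * np.sum(np.hypot(*grad(b)))
                    + lam_l1 * np.sum(np.abs(b)) for b in residuals)
        return e_data + lam_l * e_layer + lam_v * e_def + e_res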
4.1.5 Discretization and Solution

Because the time interval between frames is arbitrary in our formulation, we choose unit temporal spacing, $t_0 = 0, t_1 = 1, \dots, t_T = T$. We expect small deformations between subsequent frames, so our discretization of a time-varying velocity field $v(x, t)$ will match the frame times, such that there is one piecewise-constant (in time) velocity field $v_t(x)$ corresponding to each frame $f_t$, $t \in \{0 \dots T-1\}$. Euler integration in time will be used for generating $\phi$ from v, and bilinear interpolation is used for deformation of images and composition of deformations. For reverse-time integration, the small-deformation approximation $v^{-1} = -v$ will be used.

In order to optimize the layer gradient penalty (4.7), we use a primal-dual strategy based on [71]. This choice is based on the properties of the regularized primal variation as $\nabla \ell^k_t \to 0$ with $\nabla f_t \neq 0$, which could force some portion of $\nabla f_t$ into each layer. Noting that $\|\nabla \ell^k\|_2 = \nabla \ell^k \cdot \frac{\nabla \ell^k}{\|\nabla \ell^k\|_2}$ and choosing a dual variable $p^k$ with $\|p^k\|_2 \leq 1$ approximating $\frac{\nabla \ell^k}{\|\nabla \ell^k\|_2}$ for $k \in \{0 \dots K-1\}$, we rewrite (4.7) as

E_{layer} = \sum_{k=0}^{K-1} \sum_{t=0}^{T} \left\| \nabla \ell^k \cdot (p^k - \tilde n_t) \right\|_1. \qquad (4.11)

This transforms the minimization (4.9) into a max/min problem. When including a residual layer in our formulation, we also introduce a dual variable $q_t$ for each residual image $b_t$ in order to solve the total variation penalty. The L1 penalty is solved using a shrinkage-based [74] L1 update on $b_t$. A multiscale algorithm is used to ensure correct deformations are found even in cases of large movements of small structures such as vessels. At each scale level, $\bar f_t$ is computed from the appropriately downsampled version of $f_t$, and $\tilde n_t$ is then calculated from $\bar f_t$ via Equation (4.4). The algorithm iteratively updates each $\{\ell^k\}$, $\{p^k\}$, and $\{v^k_t\}$ (and $\{q_t\}$ and $\{b_t\}$ if using residual layers), taking appropriate gradient descent steps on the primal variables, and gradient ascent steps on the dual variables followed by reprojection onto their constraints, repeating until convergence.

4.2 Results

We first present results on a synthetic dataset meant to approximate a fluoroscopic image sequence of a contrast-enhanced vessel structure (see the first column of Figure 4.2). This is included to highlight characteristics of the solutions this algorithm produces. Results are then presented on clinical angiography data with similar appearance to the synthetic dataset (see the first column of Figure 4.3), with qualitative results included for frame interpolation (see Figure 4.4). Finally, results are presented for the DSA application using the deformation-with-residual model on a clinical dataset (Figure 4.5 (a) and (b)). Values for the $\lambda$ constants were determined experimentally. The synthetic and clinical data frames are 256×256 and 512×512 pixels, respectively. Optimization was run on an NVIDIA Tesla C1060. Each gradient descent iteration on the clinical data at the finest scale level takes approximately 310 ms. Results presented were run for 4000 iterations at each scale level to ensure convergence, taking approximately 30 minutes to compute.

Figure 4.2. Results of layer and deformation estimation on the synthetic dataset. Column (a) shows the input frames. Column (b) shows each layer at $t_0$. Column (c) shows the estimated frame at each timepoint. Column (d) shows the reconstruction with re-estimated layers.
The synthetic data is generated from four layers: a static rib, a slowly moving diaphragm, and two vessel layers representing a vessel tree undergoing a nonlinear deformation with branches overlapping from the imaged viewpoint. Although the data is generated from four layers, experimental results reveal that three layers are sufficient to capture the independent motion of the different structures, and additional layers do not improve the result. The results in Figure 4.2 are a representative example of the solutions found by our algorithm. Notwithstanding that the results are incorrect from a segmentation perspective, in a given region any objects displaying contradictory motion are separated. Note that the diaphragm object has been removed from the upper portion of $\ell^0$, allowing the vessel to cross the diaphragm boundary. While we employ a total-variation-based regularization on the layers in order to help separate objects and estimate coincident motions, we do not typically want the highly denoised results shown in columns (b) and (c). In fact, we see areas at the tips of the vessel structure where fine detail has been obliterated by the denoising properties of the estimation. If the set of estimated deformations is consistent with the motion of the imaged objects, it is possible to re-estimate only the layer intensities given these fixed deformations. In this case, very little regularization is necessary in the intensity estimation, and the resulting layers preserve the noise texture and much of the fine detail of the original images, as shown in column (d). By temporal interpolation, we can use our layer model to generate intermediate frames, as shown in row (a) of Figure 4.4, where the re-estimated layers have been used.

Figure 4.3. Results of layer and deformation estimation on the clinical dataset. Column (a) shows the input frames. Column (b) shows each layer at $t_0$. Column (c) shows the estimated frame at each timepoint. Column (d) shows the reconstruction with re-estimated layers.

Figure 4.3 shows results on a clinical angiography dataset. A three-layer model has again been chosen based on experimental results. Note the separation of the most prominent vessel from the diaphragm. Once again, we show initial denoised layers as well as re-estimated layers. Row (b) of Figure 4.4 shows an interpolated frame between times $t_0$ and $t_1$. Note the crossing of vessels in the upper portion of the image. Figure 4.5 shows the results of using a residual layer and two deforming layers to estimate the motion and contrast between two frames of an angiographic sequence used to diagnose stent placement for treatment of an abdominal aortic aneurysm, and compares against static subtraction and elastic registration.

Figure 4.4. Frame interpolation using estimated layers and deformations. Row (a) shows results on the synthetic data from Figure 4.2, and row (b) on the clinical data from Figure 4.3. The center frame is estimated between $t_0$ and $t_1$.

Figure 4.5. DSA results. The initial (mask) frame and current frame are shown in (a) and (b), respectively. Static subtraction is shown in (c). The estimated current frame (from layered registration) is shown in (d), without the residual layer. DSA results using frame (d) are shown in (e). Subtraction using simple elastic registration between frames is shown in (f).
4.3 Integrating Correspondences

The primary structures of interest in cardiac fluoroscopy are the vessels of the heart. These vessels are both relatively small compared to the field of view and moving large distances between frames. Discovering these types of correspondences is difficult using local gradient-based optimization. Standard multiscale techniques often fail to recover the motion of these small objects, particularly in our case, where incomplete layer separation during optimization may cause the estimated deformation to be 'pinned' by overlaid static objects. We therefore present a model for incorporating sparse correspondences into the deformation optimization.

Let $C^k_{s,t} = \{(x_1, y_1), (x_2, y_2), \dots, (x_{|C^k_{s,t}|}, y_{|C^k_{s,t}|})\}$ be a set of correspondences for layer k, where $x_i$ and $y_i$ are the corresponding locations at times s and t, respectively. The penalty $E_{corr}$ will attempt to match correspondences between any two frames, and for all layers,

E_{corr} = \sum_{k=0}^{K-1} \sum_{s=0}^{T-1} \sum_{t=s+1}^{T} \sum_{(x,y) \in C^k_{s,t}} \|y - \phi^k_{s,t}(x)\|_2. \qquad (4.12)

Because we anticipate the existence of incorrect correspondences, a robust sum-of-distances penalty is used. The correspondence term $E_{corr}$ is included in the full deformation estimation. Although the robust penalty on correspondence matching limits the effect of incorrect correspondences, these nonmatching correspondences degrade the quality of the final estimated deformation. Therefore, after convergence, all correspondences that have not been matched by the deformation within some tolerance $\delta$ (generally one pixel) are removed, and optimization is again run to convergence.

If initial layer estimates $\{\ell^k\}$ are given, correspondences between these layers and the frames at subsequent timepoints can be calculated, i.e., $C^k_{0,t}$ for $t \in \{1 \dots T\}$. Updated layers and new deformations can be calculated from these correspondences. Frame-to-frame correspondences can also be calculated, i.e., between frames $f_s$ and $f_t$ for some $0 \leq s < t \leq T$. These correspondences can improve our layer and deformation estimates, as small features in the input frames may be lost in the layer estimation. However, we do not know the layers to which each correspondence should belong. A simple best-match assignment is used, in which each correspondence is assigned to the layer whose current deformation best matches the correspondence offset. For a correspondence pair (x, y), the layer to which it is assigned is given by $\operatorname{argmin}_k \|y - \phi^k_{s,t}(x)\|$. The layers and deformations can then be re-estimated with the frame-to-frame correspondences included.
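A sketch of the robust correspondence term (4.12) and the post-convergence pruning step, for a single layer and frame pair (hypothetical naming; `phi` is assumed to map a source point to its deformed location):

    import numpy as np

    def corr_energy(correspondences, phi):
        # Eq. (4.12) for one (k, s, t) triple: robust (unsquared) sum of
        # distances between targets y and warped sources phi(x).
        return sum(np.linalg.norm(y - phi(x)) for x, y in correspondences)

    def prune(correspondences, phi, delta=1.0):
        # Drop pairs the converged deformation fails to match within delta
        # pixels; optimization is then re-run on the survivors.
        return [(x, y) for x, y in correspondences
                if np.linalg.norm(y - phi(x)) <= delta]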
4.3.1 Correspondence Calculation

Whether between layers and frames or between two frames, our approach to correspondence calculation is to find the best match between each pixel of the initial image and some location in the final image, giving a dense set of correspondences. These dense correspondences are then filtered to retain only those correspondences we are fairly certain are correct. In order to allow matching between layers and complete frames, or between frames in which objects may pass over other objects, we require a matching criterion that is insensitive to absolute intensity level. Because fluoroscopic images have relatively high noise levels, and because the objects being imaged are undergoing local deformations, gradient-based matching criteria do not perform well. We instead choose patch-based Normalized Cross Correlation (NCC) as our matching criterion.

Given a patch structure defined by offsets $G^r = \{(k_x, k_y)\}$ for $k_x, k_y \in [-r, \dots, r]$, we can define a region centered at x as $\{x + g\}$ for $g \in G^r$. For regions of images $f^A$ and $f^B$ defined by centers x and y, the normalized cross-correlation is then calculated as

\frac{1}{|G^r|} \sum_{g \in G^r} \frac{\left(f^A(x+g) - \mu^A_{G^r}(x)\right)\left(f^B(y+g) - \mu^B_{G^r}(y)\right)}{\sigma^A_{G^r}(x)\, \sigma^B_{G^r}(y)}, \qquad (4.13)

where $\mu^A_{G^r}(x)$ and $\sigma^A_{G^r}(x)$ are the mean and standard deviation, respectively, of the region of $f^A$ defined by $G^r$ and centered at x. If we expect the maximum motion of an object in any direction to be r, we define a window around each pixel given by the offsets $G^r$. We select the correspondence for a location $x_i$ as

v_i = \operatorname{argmax}_{v \in G^r} \mathrm{NCC}\left(P(f^A, x_i, w),\, P(f^B, x_i + v, w)\right), \qquad (4.14)

where $P(f, x, w) = \{f(x+g)\}_{g \in G^w}$ is the patch in image f centered at x with radius w, and NCC is the normalized cross-correlation between two patches. This dense set of correspondences is then filtered for inverse consistency and consistency with neighbors, yielding a sparse set of high-confidence correspondences.
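A sketch of the patch NCC of Equation (4.13) and the exhaustive window search of Equation (4.14) (bounds handling omitted; names are illustrative):

    import numpy as np

    def ncc(a, b):
        # Normalized cross-correlation of two equally sized patches.
        a = (a - a.mean()) / (a.std() + 1e-12)
        b = (b - b.mean()) / (b.std() + 1e-12)
        return np.mean(a * b)

    def best_offset(fa, fb, x, w, r):
        # Eq. (4.14): search the (2r+1)^2 offsets in G^r for the one
        # maximizing NCC between radius-w patches of f^A and f^B.
        y0, x0 = x
        pa = fa[y0 - w:y0 + w + 1, x0 - w:x0 + w + 1]
        offsets = [(dy, dx) for dy in range(-r, r + 1)
                            for dx in range(-r, r + 1)]
        return max(offsets, key=lambda v: ncc(
            pa, fb[y0 + v[0] - w:y0 + v[0] + w + 1,
                   x0 + v[1] - w:x0 + v[1] + w + 1]))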
Standard multiscale methods to convexify the optimization ignore small objects or features, implicitly assuming that the motion of small features closely follows the motion of larger objects. In fluoroscopic image registration, the motion of the vessels (driven by the beating heart) clearly does not follow the motion of larger structures such as bones or the liver. Even with an ideal layer separation that contains one or more layers showing only vessels, multiscale registration may have difficulty estimating the large motion of relatively small vessels. Motivated by temporal vessel registration, this chapter will focus on standard single-layer registration, and develop improved techniques for registration of small objects undergoing relatively large displacements.

Although motivated by vessel registration, we will primarily focus on a similar registration task: slice-to-slice registration of biological data. Such datasets are common, collected from histology or microscopy, and registration is an important component in understanding the 3D structure represented by stacks of 2D slices. Such slice-to-slice registration of biological structures shares a number of similarities with temporal vessel datasets:

1. The types of deformations encountered are highly nonlinear, excluding the use of a low-order parameterization. An affine transformation may be present, but it is also generally accompanied by both global nonlinear warping and changes in local structure.

2. The deformations are required to be globally smooth, in order to provide a dense bidirectional mapping from one image to another.

3. The data is often noisy, especially from microscopy.

4. The data often lacks consistent features. Even though some distinct features may exist between slices, large areas are often featureless or have only repetitive structures that are difficult to match uniquely.

Although accurate methods are available for global linear alignment of images, progress is still required in estimating the nonlinear warping needed to match individual structures between slices. Improvements in these techniques can greatly aid methods such as cell segmentation and tracking: if a perfect registration were available, a correct segmentation in a single slice could automatically be propagated to a full 3D structure.

Current state-of-the-art methods use variational optimization techniques to minimize a pointwise dissimilarity function while imposing smoothness criteria. These methods are inherently local, and are therefore dependent on a good initialization. Two common techniques for this secondary (post-affine) initialization are 1) a scale-space approach, which registers larger objects first and successively refines the registration by including smaller features (e.g., [75]), or 2) a feature-based alignment, which uses distinct features in the images and a low-order parameterized deformation to generate a rough alignment that can be used as an initialization to the variational methods (e.g., tools in [76]). Although these procedures do not suffer from the same initialization requirement as the original variational method, they make additional assumptions about the data that may be violated, leading to poor results. The scale-space approach assumes that small features closely follow the deformation experienced by larger objects, i.e., that their motion can simply be interpolated.
The feature-based approach assumes that there are enough unique features in the scene that can be correctly paired using a best-match criterion to define a transformation, and that this simplified transformation gives a good approximation of the final deformation. Although these methods work well for initial rigid or affine alignment, more complex deformations require higher-quality correspondences than are often available.

Discrete techniques based on Markov random field (MRF) formulations offer the promise of a single-step (after initial global alignment) procedure overcoming these limitations by allowing a nonparametric smooth deformation while simultaneously considering all possible matches within a large region. Advances in MRF optimization have produced algorithms that can give results quite near the global optimum. The major factors limiting the applicability of these techniques are computational (both processing time and memory), and the discrete nature of the resulting deformation. Although modern MRF optimization techniques have greatly improved the speed and quality of results, simply computing and storing the dense image matching term can be challenging for large images or large deformations. In fact, in 3D this step alone makes these methods currently infeasible on a standard desktop, and explains why current discrete methods in medical imaging focus on solving a low-resolution (relative to the image) MRF representing control points of a b-spline deformation.

While dense deformation fields are generally represented in discrete form, only the function's domain is discretized: a function v̄ can be approximated as a discrete set of sampled vectors {v_p}_{p∈P}. Discrete MRF formulations also discretize the codomain, allowing only a discrete set of possible values for each vector v_p. This representation is particularly problematic because we wish to represent a smooth deformation. For example, transitions between consecutive offset values are necessarily abrupt, even when a continuous deformation might be smooth at resolutions below the output discretization; e.g., a slow, smooth change in the true (continuous) deformation is represented by piecewise-constant segments with jumps at the resolution of the output discretization. These effects can be reduced by a finer discretization of the offset space, but at an unacceptable increase in the number of offsets that must be considered. Alternatively, these artifacts can be removed by creating a spline approximation to the final deformation field, or by formulating the original optimization as an MRF on the control points of a spline. However, this means that the interpolated values are chosen purely by smoothness, while ideally we would like them to be driven by the data matching as well, even at subgrid resolution. In fact, the subgrid accuracy of variational methods is a major advantage over most discrete methods.

We propose a hierarchical MRF-based approach that, while considering a dense matching term, limits the number of possible offsets at a given scale. This method can be run to arbitrary subgrid accuracy, resulting in a smooth deformation that still uses the full data matching term at each point. We further note that a limitation of many region-based matching criteria (e.g., patch-based intensity matching or normalized cross correlation) is that they do not consider changes to the local geometry due to the transformation we are trying to recover.
Although rotationally invariant features exist, these generally throw out all information regarding spatial layout, some of which is surely useful in determining correspondences. We therefore propose a simple extension that imparts a measure of rotational invariance to a selected data matching metric. The material in this chapter appears in publication [77].

5.1 Background

Discrete Markov random field formulations for image registration have been well studied, especially as the optical flow problem in computer vision. Advances in discrete optimization have led to a number of algorithms for efficient approximate MRF maximum likelihood inference [78, 79, 80, 81]. In applications to medical image registration, the efficient α-expansion optimization of [78] has been applied to an MRF formulation of the image registration problem [82, 83]. This algorithm guarantees solutions within a known factor of the global optimum, and has been shown to give good empirical results for many problems. However, it imposes restrictions on the class of smoothness functions it can represent, and the authors are therefore constrained to use a discontinuity-preserving penalty as opposed to one enforcing global smoothness, which is generally required in medical image registration. Direct application of belief propagation techniques (which allow penalties enforcing global smoothness) to dense 2D registration problems with large possible deformations is still too computationally demanding. Approximations exist that limit the offsets considered on a per-pixel basis (e.g., [84]). These have been applied to medical image registration [85, 86], but do not seem well suited to problems such as ours, with a strong smoothness prior and a combination of high noise and sparse or repetitive features; these indicate that the image match term alone may not be a good predictor of the best match. More efficient alternate MRF formulations also exist [87], but do not improve the overall computational complexity and therefore still cannot handle the huge number of labels necessary for direct multidimensional registration with large deformations.

A number of tools exist for dense nonlinear registration of 2D biological image data (e.g., [75, 76]); discrete optimization techniques, however, are rarer. The most widely used discrete techniques for medical image registration, and the most similar to our work, are due to Glocker et al. [88, 89], in which a joint MRF/b-spline formulation is used with a hierarchical scheme refining both the b-spline grid and the considered offsets, as well as sequentially solving at successively finer image resolutions. Similar to our technique, an MRF optimization is run at each step to improve the localization of the deformation. However, while their technique focuses on quickly finding an approximate solution based on a sparse set of block matching locations (i.e., the b-spline grid control points), our method tries to approximate a fully dense matching optimization as closely as possible.

5.2 Methods

The registration problem we wish to solve can be stated as

    argmin_{v̄} E_match(f^s, T_{v̄}(f^m)) + E_smooth(v̄),   (5.1)

where f^s is the static (target) image and f^m is the moving (source) image, x̄ is the identity transform, v̄ is an offset field mapping locations in f^s to correspondences in f^m, and T_{v̄}(f^m) is a transformation of f^m by the offset field v̄.
The term E_match is a measure of dissimilarity between the static image and the registered moving image, and E_smooth is a measure of nonsmoothness of v̄.

5.2.1 Standard MRF Formulation

Let P be the set of pixels in f^m or f^s. We use a standard pairwise Markov random field formulation with a 4-connected neighborhood, where each pixel p ∈ P corresponds to a node in the MRF, and each node is connected by an undirected edge to its two neighbors in each dimension. Let E, the set of edges, be the set of unordered pairs {p, q} such that p and q are neighbors. Each node p has a label v_p that takes values from a discrete set V of two-dimensional offsets. The maximum likelihood estimation we are interested in attempts to assign a label v_p to each node p ∈ P such that the probability of the labeling v̄ = {v_p}_{p∈P} is maximized. This is equivalent to minimizing the negative log likelihood

    L(v̄) = Σ_{p∈P} D_p(v_p) + Σ_{{p,q}∈E} S(v_p, v_q).   (5.2)

The term D_p is some measure of the likelihood of v_p being the true offset based on the image data, generally a data dissimilarity between the fixed image near point x_p and the moving image near point x_p + v_p. The term S is a smoothness term ensuring that offsets are similar to neighboring offsets. Looking at Equation (5.1), we see that E_match is represented here by Σ_{p∈P} D_p(v_p), and E_smooth by Σ_{{p,q}∈E} S(v_p, v_q). We wish to enforce global smoothness, so we use a squared Euclidean distance as the smoothness term

    S(v_p, v_q) = λ_per ‖v_p − v_q‖²₂,   (5.3)

where λ_per is a constant that balances the smoothness and data terms. Let us define a two-dimensional regular grid of offsets, centered at zero, with 2ρ + 1 positions in each dimension and a grid spacing of δ, as G^ρ_δ = {(δk_x, δk_y)} for k_x, k_y ∈ [−ρ, …, ρ]. If we assume a maximum deformation size of R pixels in any dimension, the set of offsets needed for single-pixel precision is given by V = G^R_1, and the number of offsets is then |V| = (2R + 1)², which can quickly become intractable for large deformations.

5.2.2 Multiscale MRF Formulation

Instead of optimizing over the full set of offsets V = G^R_1 needed for pixel-level resolution, we will fix the grid size to have a sample radius (in number of offsets) of r in each dimension, and approximate the full problem by a series of MRF optimizations, iteratively approaching the true solution by using the solution at a higher (coarser) level to restrict the search space of the next lower (finer) level. Figure 5.1 illustrates the downsampling. The sample radius r fixes the number of offsets at all levels, |V| = (2r + 1)². Optimization starts at the coarsest level s = S and proceeds to pixel-resolution offsets at level s = 0.

Figure 5.1. Illustrations of label multiscale downsampling for a 5×5 search window. (a) shows the search window (solid box) and possible offsets (circles) at downsample level s, and the search window (dotted box) and offsets (dots) at the downsample level s − 1, assuming the arrow shows the 'best' offset chosen at level s. The shaded region shows the area over which the 'min' is taken in Equation (5.5). (b) shows neighboring nodes during the upsampling process from s = 2 to s = 1 for S = 2. The grey dots represent the MRF nodes.

At a given scale s, the set of possible offsets for a node p is given by

    V^s_p = {b^s_p + g | g ∈ G^r_{2^s}}.   (5.4)

That is, V^s_p is an evenly spaced two-dimensional grid, centered at b^s_p, whose inter-offset spacing increases with increasing level s.
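As a concrete illustration of the per-level offset sets (our own sketch; offset_grid is a hypothetical helper), note that the radius in offsets stays fixed while the spacing doubles with each level, so that the top level spans the full deformation range:

    import numpy as np

    def offset_grid(radius, spacing):
        # G^radius_spacing: all 2D offsets (spacing*kx, spacing*ky)
        # with kx, ky in [-radius, ..., radius]
        ks = np.arange(-radius, radius + 1) * spacing
        return np.stack(np.meshgrid(ks, ks, indexing="ij"), axis=-1).reshape(-1, 2)

    r, R = 10, 40                      # sample radius and maximum deformation
    S = int(np.log2(R / r))            # number of coarse levels, S = log2(R/r)
    for s in range(S, -1, -1):
        V = offset_grid(r, 2 ** s)     # offsets at level s, relative to b^s_p
        # |V| = (2r+1)^2 at every level; V spans [-r*2^s, r*2^s] per dimension
        assert len(V) == (2 * r + 1) ** 2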
However, we do not want to restrict the locations at which a particular node can match. We therefore define the data penalty D^s_p for each node p based on the best match at the base level (D⁰_p = D_p), i.e., min-pooling over the finest-scale data match energies in a patch of offsets:

    D^s_p(v_p) = min_{g∈G^t_1} D_p(v_p + g),   (5.5)

with the pooling radius t = 2^{s−1} (for s ≤ 0 the patch contains only the zero offset, so D^s_p = D_p). This allows all matches at the finest resolution to be represented at any scale. However, because many possible matches are identified with a single offset, we are essentially ignoring nonsmoothness that can only be represented at lower scales. At each level the function being optimized is

    L^s(v̄) = Σ_{p∈P} D^s_p(v_p) + Σ_{{p,q}∈E} S(v_p, v_q),   (5.6)

and we find the maximum likelihood labeling v̄̂^s = argmin_{v̄} L^s(v̄). The initial base offset at each point is set to the zero offset, b^S_p = 0. At each subsequent scale level, the base offset at each node p, b^s_p, is initialized to the maximum likelihood offset at the previous level, b^s_p = v̂^{s+1}_p. The number of levels S at which multiscale registration is run is then determined by the maximum size of the deformation that is to be represented. For a maximum deformation size of R and sample radius r, the required number of levels is given by S = log₂(R/r). Messages for all nodes are initialized to zero at each level.

5.2.3 Subgrid Resolution

A drawback of most discrete registration techniques is the lack of subgrid resolution, at least without greatly increasing the computational burden. As long as the chosen data penalty D_p can be evaluated at nongrid locations, our method can be trivially extended to provide arbitrary subgrid accuracy. The formulation stays exactly as above, but we allow negative level indices in Equations (5.4) and (5.5). In practice these extra levels converge very quickly and therefore add little to the computation time.

5.2.4 Rotational Invariance

Standard data penalty terms that consider the spatial layout of a region surrounding the location to be matched suffer from the fact that they do not take into account a possible deformation of the local region due to the transformation that we are attempting to recover. This limitation is due to the assumption of pointwise independence of the data match term; in fact, the local transformation is defined by the offsets within a local neighborhood of a point. Rotationally invariant features have been used, but these generally throw out all spatial information about the local region, some of which is surely useful in determining correspondences. Even for large nonlinear deformations, the transformation of a small local region can often be described quite accurately by a low-dimensional transformation, such as a rigid or affine transform. The parameters of such a transformation should vary smoothly under a smooth deformation, so we could discretize these parameters and optimize over them as well, of course at the cost of greatly increasing the required |V|. As we are not especially interested in recovering parameters other than the offset at each point, we can instead compute the minimum cost over some discrete set of transformation parameters, at only the cost of increasing the computation time of D_p (which is precomputed for each scale level). This is equivalent to ignoring the smoothness requirement for these parameters, in which case the maximum likelihood estimate is independent at each point, and given simply by the value that minimizes D_p.
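A minimal sketch (ours) of this best-rotation data term; patch extraction and boundary handling are simplified, and the rotation set matches the [−π/8, π/8] sampling used in the results below:

    import numpy as np
    from scipy.ndimage import rotate

    def data_cost_best_rotation(f_s, f_m, x, v, w=5,
                                angles=np.linspace(-np.pi / 8, np.pi / 8, 9)):
        # D_p(v) minimized over a discrete set of patch rotations. The
        # rotation only affects the moving patch used in the comparison;
        # it is not added to the deformation being estimated.
        px, py = x
        patch_s = f_s[px - w:px + w + 1, py - w:py + w + 1]
        mx, my = px + v[0], py + v[1]
        patch_m = f_m[mx - w:mx + w + 1, my - w:my + w + 1]
        costs = []
        for a in angles:
            rot_m = rotate(patch_m, np.degrees(a), reshape=False, order=1)
            costs.append(float(((patch_s - rot_m) ** 2).sum()))
        return min(costs)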
Taking the minimum over rotations in this way means that the rotation used to compute the data matching term at each point is not required to be consistent across the deformation, nor is it required to be consistent with the local rotation introduced by the estimated deformation. This does not introduce discontinuities in the estimated deformation; the deformation is still required to be smooth. However, it may underestimate the dissimilarity of the final solution. This represents a tradeoff between the possibility of missing structural matches due to local deformation and the possibility of overestimating the degree of local match. Because our primary concern is finding potential matches for small, sparse structures, we choose to bias the results in favor of including potential matches.

5.2.5 Optimization

As noted previously, there are a number of available optimization techniques for inference on pairwise MRFs [90]. Some restrictions apply based on our choice of a squared Euclidean smoothness metric: it is nonsubmodular, and therefore the standard α-expansion graph-cut-based technique cannot be applied. We note that techniques have been developed that use a series of graph cuts in their optimization but do not suffer from the same restrictions on smoothness functions [81]; however, no open-source implementation has been produced, making integration or independent evaluation difficult. The choice of smoothness penalty does make some optimization techniques more appealing. Belief propagation techniques can take advantage of the separable nature of the squared Euclidean distance (i.e., ‖v_p − v_q‖²₂ = (v^x_p − v^x_q)² + (v^y_p − v^y_q)²) to reduce message computation time from O(|V|²) to O(|V|), as long as the possible offset values for a given node lie on a grid. Because our offsets all lie on a grid (although not necessarily the same grid, due to the per-node base offset), we will use loopy belief propagation (LBP) [79] for each optimization step in our multiscale algorithm. It is worth noting that while graph-cut-based algorithms are quite efficient, they are also inherently serial. The message passing structure of the LBP algorithm, on the other hand, is trivially parallelizable down to the node (pixel) level. We also note that more sophisticated BP techniques exist [80], and while our method may benefit from improved optimization, the formulation presented above would remain unchanged.

In order to improve optimization rates, we use hierarchical belief propagation (HBP) [79] on the initial (coarsest) scale level s = S. In this technique, the resolution of the grid of MRF nodes is changed in order to speed convergence. Note that this is separate from our multiscale formulation, where the full-resolution grid is used at each stage but the set of possible offsets V and the data penalties D_p are changed. The use of HBP is only possible if neighboring nodes share possible offsets, so at lower scales, where b^s_p is not zero, optimization starts on the full-resolution grid.

5.3 Results

For the results given here, a simple patch-based squared intensity difference has been used as the image dissimilarity metric:

    D_p(v_p) = Σ_{g∈G^w_1} (f^s(x_p + g) − f^m(x_p + v_p + g))²,   (5.7)

where the size of the patch is defined by w.

Figure 5.2 shows results on synthetic vessel data. The two input images have been created by hand tracing part of a contrast-enhanced coronary vessel tree from consecutive frames of a fluoroscopic sequence.
In addition to the local frame-to-frame changes, a global rotation is synthetically introduced to cause greater misalignment. Because our pairwise smoothness constraint penalizes linear transformations, the optimal solution will generally not represent a global rotation, and the rotation can therefore be used to stand in for a general nonlinear transformation. Finally, additive i.i.d. Gaussian noise has been added to each image. Figures 5.2 (a) and (b) show an initial synthetic image and an initial difference image, respectively. Figures 5.2 (c) and (d) show that allowing the best match over a sampled set of rotations leads to better matching, and Figures 5.2 (e) and (f) show that, in addition to better matching, less local volume change is necessary when using best-rotation patch matching.

Figure 5.2. First row: synthetic vessel dataset, (a) static (template) image and (b) initial difference image. Second row: final difference images, (c) translation-only patch matching and (d) best-rotation patch matching. Third row: Jacobian determinant images, (e) translation-only patch matching and (f) best-rotation patch matching.

Figure 5.3 shows the results of registration at each scale level for the best-rotation results in Figure 5.2. The top row of results shows the source image f^m deformed by the current estimate of the deformation. The min-pooling of offset errors over regions at each scale level results in the blocky registration visible at the initial scales. For instance, at scale 3 the goal is simply to bring each location within 2³ = 8 pixels (in each dimension) of its true correspondence. The second row shows the Jacobian determinant of the deformation estimated at each scale level. Note that at scale level 0, the standard single-pixel deformation resolution for discrete registration, discretization artifacts and regions of noninvertibility in the deformation field are still clearly visible.

Figure 5.3. Per-scale-level deformed image (first row) and Jacobian determinant (second row) results, for scales 3 through −2. The downsample factor is in powers of two, so scale 2 has minimum offset increments of 2² = 4 pixels, and scale −1 has minimum offset increments of 2^{−1} = 1/2 pixel.

In order to compare against real imaging data with a known ground truth, we apply a random deformation to a C. elegans section imaged using transmission electron microscopy. The data is smoothed and downsampled to remove noise, and a 256×256 region of interest is used for our tests. Each deformation has been constructed as the composition of smooth, small-scale perturbations meant to represent slice-to-slice changes in cell appearance, and a random global rotation between 0 and 15 degrees meant to represent large nonlinear warping or poor global slice-to-slice registration. In these results, the proposed method has been run with r = 10 and s ∈ {2, …, −3}, i.e., a 21 × 21 grid of offsets at each scale level, two scales above image resolution and three below, giving a total representable deformation range of 40 pixels in any direction and a resolution of 1/8 pixel.

We compare against the drop software package [89, 81], a state-of-the-art system for 2D and 3D image registration using discrete optimization techniques. To give the fairest comparison, registration parameters for drop have been set to have the same maximum deformation range (40 pixels) and their "dense" sampling scheme.
To match our software, the SSD data metric and quadratic (i.e., squared Euclidean) smoothness metric have been used. Other parameters have been empirically chosen to give the best results on our challenging large global synthetic deformations. Table 5.1 shows RMS and maximum pointwise error from the true deformation, for each method, over the same ten randomly generated deformations. Errors have been masked to exclude regions where the true deformation falls outside the image domain. Multiscale belief propagation (MSBP) is our method, with results shown for both translational patch matching (no rot) and best-rotation matching (best rot), where the best match has been taken over 9 evenly spaced samples in the range [−π/8, π/8]. In addition, results are shown where min-pooling over the local data match energies is disabled (no agg), yielding a sparse-sampling scheme. Note that our BP implementation uses MPI-based multithreading, and tests have been run on 64 cores. Because the entire process is parallelizable, and the memory requirements are relatively small, we believe a GPU implementation would make this reasonable to run on a desktop.

Table 5.1. MSBP performance: av. time is the average time per run, and data time is the amount of time used computing the data match term.

    method           rms error   max error   av. time       data time
    MSBP, no agg     0.80        26.2        132 (8448¹)    12 (768¹)
    MSBP, no rot     0.78        5.88        155 (9920¹)    39 (2496¹)
    MSBP, best rot   0.76        5.12        567 (36349¹)   451 (28925¹)
    drop             1.77        59.2        116            94

    ¹ estimated times if run with a single CPU thread

As shown in Figure 5.4, the failures of drop occur in regions undergoing large deformation and towards the edges of the image. The fact that our method consistently comes very close to the true deformation, with or without sampling the rotational space, shows the importance of dense sampling so as not to miss possible correspondences. The addition of rotational invariance shows a more modest improvement. In practice we find that smoothness generally resolves ambiguous matches, and rotational invariance reduces the prevalence of small-scale artifacts resulting from the precomputed patch-based matching term.

Figure 5.4. Registration results showing a failure example of the drop package with large rotation. Subplot (a) shows the generated target image and the true deformation field. Subplot (b) shows registration results for our method, while (c) shows registration results using drop. Note that while the results in (c) look underregularized, higher smoothness parameters caused drop to miss the large deformation entirely.

5.4 Conclusion

This chapter presents a discrete registration technique for estimating smooth deformations, allowing both large deformations and dense, subvoxel matching criteria. Previous approaches have employed a coarse registration grid with a nonlinear interpolation function to make the estimation of large, smooth deformations tractable. The proposed method achieves reasonable computation times through a multiscale formulation and the use of parallelizable loopy belief propagation for optimization. This formulation also incorporates a measure of rotational invariance in patch-based matching terms in order to accommodate local deformation effects. Discrete registration techniques alleviate the dependence of optimization results on initialization, and are particularly effective in cases of sparse or repetitive features. The application to transmission electron microscopy data shows the method's effectiveness in such a challenging registration case.
Further work is necessary to incorporate this method into a composite deformation model such as the additive layer model presented in previous chapters, but it holds promise for improving the robustness of related estimation problems involving multiple incoherent motions and repetitive structures.

CHAPTER 6

PIECEWISE DIFFEOMORPHISMS FOR SLIDING MOTION ESTIMATION

The previous chapters consider nonsmoothness arising from the imaging modality, where anatomical motion, assumed to be smooth in 3D, is viewed as nonsmooth overlapping motion when projected as a 2D view. However, the common assumption of smooth anatomical motion is regularly violated when adjacent but disconnected organs slide against each other along the inter-organ boundary during normal physiological motion. This chapter considers such physical nonsmoothness when imaged in an object's native dimensionality, developing a model for representing this type of discontinuity while maintaining key principles of current state-of-the-art registration models, such as invertibility of the resulting deformation.

A primary consideration in developing a registration model is the choice of transformation; a representation must be chosen that is sufficiently flexible to accommodate the expected changes between images, while aiding optimization by disallowing unrealistic solutions. The highly nonlinear deformations needed to represent anatomical variability or physiological motion preclude the use of low-order parameterizations, making a densely sampled vector field an appealing representation for the transformation. This representation is extremely expressive, but requires additional constraints to produce physically plausible results. Towards this end, requiring smooth and invertible (diffeomorphic) transformations has become standard practice in medical image registration. The choice of allowable transformations represents a tradeoff between fidelity to the physical process being modeled and tractability of the resulting optimization. Smooth, invertible transformations have proven to represent a useful point of balance in this tradeoff. Smoothness conditions regularize the otherwise ill-posed registration problem; they guarantee continuity, constraining nearby points in one image to remain nearby in the deformed image, as well as differentiability. This representation of an imaged scene as solid, connected objects accurately represents much of the anatomy on which medical image registration focuses. Invertibility, meanwhile, encodes the intuitive notion of one-to-one correspondence, disallowing the appearance or disappearance of structures during deformation. This assumes that registration will occur between images with objects showing the same morphological structure, or, as applied to medical imaging, that the anatomies to be registered show the same configuration of organs and internal structures. This assumption is especially valid in modeling patient motion, where the same anatomy is present in all images. The wide adoption of methods enforcing smoothness and invertibility constraints is evidence that anatomical motion and variability are generally well modeled by diffeomorphic transformations. There are clinically important instances, however, where the global smoothness constraint (i.e., smoothness enforced over the entire deformation domain) does not properly model transformations in the imaged anatomy.
For instance, discontinuous motion occurs along organ interfaces that are not directly attached, such as between the lung and thoracic cage, or the liver and diaphragm. In order to accurately model nonsmooth anatomical motion, we propose replacing the assumption of global smoothness with a piecewise-smooth model, while continuing to require a globally invertible deformation. Discontinuous motion is then allowed between the smooth regions. This work introduces a method for estimating such globally invertible and piecewise-smooth deformations.

The proposed method constructs the final piecewise-smooth deformation from a set of diffeomorphic deformations and a segmentation indicating which deformation affects each location. The segmentation and component deformations are estimated such that global invertibility of the final deformation is maintained. Driven by a novel formulation of the invertibility constraint as a discrete pairwise penalty, optimization alternates between a graph-cut-based estimation of the motion segmentation and a continuous optimization of the constituent deformations. The motion segmentation guarantees that discontinuities occur only along a limited set of coherent boundaries, and the joint estimation of mask and deformations allows anatomical structures to be automatically grouped based on the similarity of their motion.

The remainder of the chapter is organized as follows: Section 6.1 discusses related work. Section 6.2 introduces the globally invertible and piecewise-smooth deformation model we use for representing discontinuous motion, and Section 6.3 presents the registration formulation and discrete/continuous optimization method. We present quantitative and qualitative results in Section 6.4, applying the proposed method to model discontinuous motion of the lung in 4DCT imaging. Finally, Section 6.5 presents conclusions and discussion. Portions of this chapter have been previously published in [91].

6.1 Related Work

The challenge of discontinuous anatomical motion has often been noted, especially in registering 4DCT chest imaging, which shows conspicuously discontinuous motion between the lower lung and thoracic wall. Early work addressing this problem [92] segmented organs or organ groups, and separately registered each corresponding set. Wu et al. [93] recognized the need to add a penalty to avoid gaps (or overlaps) in the resulting deformations. This is the same principle behind the invertibility constraint proposed in this work; however, in this work the region deformations are jointly computed, and segmentations are estimated as part of the registration process. Various other methods have been proposed for modeling discontinuous motion in respiration, including biomechanics-based finite element models (FEM) [94, 95] and hybrid FEM/intensity-based registration [96]. These methods explicitly represent discontinuity boundaries, but require extensive patient-specific modeling. Vandemeulebroucke et al. [97] give an automated method for creating a motion mask of the anatomy that defines discontinuity locations, but base it on anatomical features, not the observed motion. Other methods attempt to better accommodate lung motion without explicit discontinuity modeling, using techniques such as spatially varying regularization [98], modified differential penalties allowing large shear [99], and modeling lung motion via a diffeomorphism on densities [100].
A great deal of image registration work from computer vision has also considered the similar problem of motion-based segmentation (e.g., [101, 102]), in which the image domain is partitioned into a set of regions, with each region undergoing independent motion. This includes methods using an optimization structure similar to this work's, alternating between discrete and continuous updates [103]. However, the primary challenge this work addresses is enforcing invertibility, a constraint not applicable to video sequences, where occlusion is expected.

Recent work on modeling discontinuous anatomical motion has either guaranteed invertibility but required a precomputed segmentation of the discontinuity boundary, or automatically estimated discontinuities but not guaranteed invertibility. Thus, these methods cannot estimate anatomically consistent deformations directly from time-series images. Work requiring precomputed locations of discontinuity has enforced invertibility via a b-spline decomposition [104], Neumann boundary conditions [105], or anisotropic diffusion [106]. The invertibility constraint we propose is similar to [107], disallowing region-based overlap. However, this work adapts the constraint to estimate the partitioning of the domain instead of requiring the discontinuity locations as an input to the algorithm. In work automatically estimating discontinuous anatomical transformations, various discontinuity-preserving regularization models have been used, such as total variation [108, 109], bilateral filtering [110], and truncated quadratic penalties [111]. Although these methods can represent discontinuous motion, they may introduce spurious non-anatomical discontinuities and do not guarantee the invertibility of the deformation. Similarly, Heinrich et al. [112] model discontinuous motion by combining the results of multiple discrete registrations (using belief propagation) of the same image using different supervoxel clusterings and graph structures, allowing discontinuities but without guaranteeing invertibility. As in the proposed model, Kiriyanthan et al. [113] control the locations of discontinuities by estimating a partition of the deformation domain, allowing discontinuities only along the partition boundaries. However, unlike this work, their formulation does not guarantee the invertibility of the resulting composite deformation. Schmidt-Richberg et al. [114] propose a diffusion-based regularization that allows discontinuities along a predefined boundary. In addition, they propose a technique for automatically estimating the locations of discontinuous motion. However, their discontinuity estimation is based on local measures of tangential deformation discontinuity at a precomputed set of image edges. Requiring discontinuities to occur only at edge locations incorrectly constrains the representable discontinuity boundaries, because not all inter-organ interfaces are detectable from static images. Further, estimating motion discontinuity locally is error-prone, and the results show areas of discontinuity along image edges that should be experiencing locally smooth motion.

The method we propose allows automatic and accurate estimation of the discontinuity boundary, while still guaranteeing invertibility of the resulting deformation. We represent the location of discontinuities via a labeling on the deformation domain, guaranteeing a coherent boundary structure. The next section lays out the deformation model and invertibility constraints in detail.
6.2 Piecewise-Diffeomorphic Formulation

We refer to the allowable discontinuous motion in the proposed model as sliding, because nearby points on either side of a discontinuity boundary may show large relative motion, while the region boundaries maintain contact. A globally invertible and piecewise-smooth (piecewise-diffeomorphic) deformation allows discontinuous motion between smooth regions, while disallowing overlap or separation between these regions. Informally, we can see this through the effects of the invertibility and smoothness assumptions. Invertibility asserts the overlap and separation conditions pointwise, as each point in the codomain has a preimage in the domain (surjectivity, disallowing separation), and that preimage is unique (injectivity, disallowing overlap). The piecewise smoothness conditions guarantee that the boundaries of each region are preserved by the deformation, meaning the initial boundary set maps onto the deformed boundary set.

In order to represent this sliding motion, we construct the final piecewise-diffeomorphic deformation from multiple constituent deformations, each diffeomorphic over the entire image domain, and a partitioning function indicating how the constituent deformations are combined. The solution then consists of K deformations and a K-label partition of the domain. The global invertibility condition is enforced by disallowing separation or overlap between or among the individually deformed diffeomorphic regions.

More formally, we define a D-dimensional piecewise-diffeomorphic deformation φ̄_{t,0} : Ω_t → Ω_0 with Ω_t, Ω_0 ⊂ R^D. Using the notation of [7], the function φ̄_{t,0}(x) maps a point x at time t to its position φ̄_{t,0}(x) at time 0. The piecewise-diffeomorphic deformation is composed of a set of K labels L := {1 … K} and a segmentation defined on Ω_0, ℓ_0 : Ω_0 → L. For each label k ∈ L there is an associated diffeomorphic deformation φ^k_{t,0} : Ω_t → Ω_0. The piecewise function φ̄_{t,0} can then be defined pointwise by taking the value of one of the constituent deformations, φ̄_{t,0}(x) := φ^k_{t,0}(x), where k is chosen such that ℓ_0 ∘ φ^k_{t,0}(x) = k. The global invertibility condition disallowing separation or overlap can then be seen as ensuring that this definition exists and is unique at each point x ∈ Ω_t, as illustrated in Figure 6.1.

Figure 6.1. The upper row shows a two-layer (i.e., two-label) composite deformation. The red region represents label 1, while the blue region represents label 2. Deformation φ¹_{t,0} affects the red region, while φ²_{t,0} affects the blue region. The lower row shows problems that can arise if φ¹_{t,0} and φ²_{t,0} are arbitrary diffeomorphic deformations. At point x_A the deformations agree as to which deformation (φ¹_{t,0}) is in effect, as ℓ_0 ∘ φ¹_{t,0}(x_A) = 1 and ℓ_0 ∘ φ²_{t,0}(x_A) ≠ 2. A similar situation occurs at x_C (where φ¹_{t,0} is in effect). However, both deformations are in effect at x_B (ℓ_0 ∘ φ¹_{t,0}(x_B) = 1 and ℓ_0 ∘ φ²_{t,0}(x_B) = 2), and no deformation is in effect at x_D (ℓ_0 ∘ φ¹_{t,0}(x_D) ≠ 1 and ℓ_0 ∘ φ²_{t,0}(x_D) ≠ 2).

In order to formulate the global invertibility constraint, we define binary indicator functions for each label k ∈ L, χ^k_0 : Ω_0 → {0, 1}, as

    χ^k_0(y) := 1 if ℓ_0(y) = k, 0 otherwise.   (6.1)

By construction, then, Σ_{k∈L} χ^k_0(y) = 1 for all y ∈ Ω_0. Defining

    χ^k_t := χ^k_0 ∘ φ^k_{t,0},   (6.2)

disallowing separation or overlap can be stated as the constraint

    Σ_{k∈L} χ^k_t(x) = 1   ∀ x ∈ Ω_t.   (6.3)
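The constraint (6.3) is easy to check numerically. A small sketch (ours, with nearest-neighbor lookups standing in for the continuous compositions) counting the pointwise violations for a sampled K-label model:

    import numpy as np

    def invertibility_violations(label0, phis):
        # label0: integer segmentation l_0 (values 1..K) on the grid of Omega_0
        # phis: list of K integer coordinate arrays; phis[k-1] gives the
        #       nearest-neighbor coordinates of phi^k_{t,0}, shape (H, W, 2),
        #       mapping the grid of Omega_t into Omega_0
        H, W = label0.shape
        chi_sum = np.zeros((H, W), dtype=int)
        for k, phi in enumerate(phis, start=1):
            warped = label0[phi[..., 0], phi[..., 1]]  # l_0 composed with phi^k
            chi_sum += (warped == k)                   # chi^k_t on the grid
        # Eq. (6.3) requires chi_sum == 1 everywhere:
        # 0 indicates separation, 2 or more indicates overlap
        return int((chi_sum != 1).sum())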
An indicator function taking the value 1, χ^k_t(x) = 1, asserts that deformation φ^k_{t,0} is in effect at point x. Stated in terms of the labeling, the constraint is that, at each x, ℓ_0(φ^k_{t,0}(x)) = k should be true for exactly one k. If the statement is not true for any label k, then we have separation in the deformation at time t, and if it is true for two or more labels, we have overlap at this point. Given the indicator functions, we can define the segmentation at time t,

    ℓ_t(x) := k s.t. χ^k_t(x) = 1,   (6.4)

which, for x ∈ Ω_t, is guaranteed to exist and be unique by Equation (6.3). We can then rewrite the definition of φ̄_{t,0} as

    φ̄_{t,0}(x) := φ^{ℓ_t(x)}_{t,0}(x).   (6.5)

Now that φ̄_{t,0} is defined, we can verify that it is invertible. It is surjective because each point y ∈ Ω_0 can be mapped to a point x = φ^{ℓ_0(y)}_{0,t}(y) via the invertibility of the constituent deformations, and, based on the definitions of φ̄_{t,0} and ℓ_t(x), φ^{ℓ_0(y)}_{t,0} must be the constituent deformation in effect at x. It is injective because any φ̄_{t,0}(x) with ℓ_t(x) = k is a unique mapping among points with the same label, {x̂ | ℓ_t(x̂) = k}, via the invertibility of the constituent deformations, and unique among points with different labels, {x̂ | ℓ_t(x̂) ≠ k}, due to the nonoverlapping constraint between label regions. Together this shows the invertibility of the composite deformation φ̄_{t,0}.

6.3 Registration Formulation and Optimization

Given two input images f_0 and f_t (assigned artificial time indices 0 and t), we solve the standard pairwise registration problem, finding a deformation φ̄_{t,0} such that

    f_0 ∘ φ̄_{t,0} ≈ f_t.   (6.6)

The input images f_0 and f_t are D-dimensional scalar-valued functions, f_0 : Ω_0 → R and f_t : Ω_t → R, where Ω_0, Ω_t ⊂ R^D, and φ̄_{t,0} : Ω_t → Ω_0. We measure the image dissimilarity via a functional of the form

    E_data(f_0 ∘ φ̄_{t,0}, f_t) := ∫_{Ω_t} D(f_0 ∘ φ̄_{t,0}, f_t, x) dx,   (6.7)

where D(g, h, x) is a real-valued local dissimilarity term between images g and h, measured at a point x. In this work we use a normalized gradient field (NGF) dissimilarity.

Given the constraints on the labeling ℓ_0 and deformations Φ guaranteeing a piecewise-diffeomorphic deformation (6.3), we choose among valid deformations by optimizing an objective function that balances the tradeoff between data matching and the regularity of the deformation and labeling,

    E(Φ, ℓ_0) := E_data(Φ, ℓ_0) + λ_per E_per(ℓ_0) + λ_pdf E_pdf(ℓ_0) + λ_reg Σ_{k∈L} E_reg(φ^k_{t,0}).   (6.8)

We reformulate the data term (6.7) as a function of Φ and ℓ_0,

    E_data(Φ, ℓ_0) := Σ_{k∈L} ∫_{Ω_t} χ^k_t(x) D(f_0 ∘ φ^k_{t,0}, f_t, x) dx,   (6.9)

where χ^k_t is defined by ℓ_0 and φ^k_{t,0} via Equations (6.1) and (6.2). Deformation regularization is taken over each constituent deformation, with the form of E_reg dependent on the registration method used. The term E_per penalizes the size of the segmentation perimeter, guaranteeing that discontinuities are permitted only on a set of measure zero. In addition, the penalty E_pdf is a region-based intensity segmentation penalty, encouraging the labeling ℓ_0 to segment regions of similar intensity, for instance following organ boundaries. This behavior helps the method choose anatomically conforming results where there is too little information in the deformations to define a unique boundary. In modeling lung motion, for instance, although sliding is easily identifiable in the lower lung, the motion of the upper lungs closely matches that of the surrounding tissue.
Because no sliding is observed, the constituent deformations may be very similar in this region. In regions where two constituent deformations are identical, any boundary satisfies the invertibility condition and has identical data cost, making the motion segmentation ambiguous. In such cases the perimeter penalty E_per encourages the smallest possible boundary, cutting through the upper lung. The intensity segmentation penalty E_pdf instead encourages the discontinuity boundary to follow the visible lung boundary.

With the energy terms defined, the problem we wish to solve can be stated as

    argmin_{Φ,ℓ_0} E(Φ, ℓ_0)   s.t.   Σ_{k∈L} χ^k_t(x) = 1 ∀ x ∈ Ω_t.   (6.10)

Image registration is already a nonconvex optimization problem, and here we greatly increase the parameter space by attempting to simultaneously estimate multiple deformations and a segmentation function. In addition, the optimization is made more challenging by the constraint on the interaction between the segmentation and deformations. In order to achieve an approximate solution, the invertibility constraint (6.3) is relaxed to a penalty,

    E_inv(Φ, ℓ_0) := ‖ Σ_{k∈L} χ^k_t − 1 ‖_1.   (6.11)

We write the relaxed version of (6.10) as

    argmin_{Φ,ℓ_0} E(Φ, ℓ_0) + λ_inv E_inv(Φ, ℓ_0),   (6.12)

and optimization alternates between finding an optimal labeling ℓ_0 for a fixed set of deformations Φ, and optimizing over Φ while keeping ℓ_0 fixed. These alternating optimizations do not increase the objective (6.12) at any step, guaranteeing convergence.

6.3.1 Estimation of the Partitioning Function ℓ_0

In the segmentation estimation step we hold the deformations Φ fixed and optimize (6.12) over ℓ_0:

    argmin_{ℓ_0} E_data(Φ, ℓ_0) + λ_per E_per(ℓ_0) + λ_pdf E_pdf(ℓ_0) + λ_inv E_inv(Φ, ℓ_0).   (6.13)

The labeling ℓ_0 takes discrete values, and we therefore use a discrete optimization strategy. We discretize the problem by defining f_0 and ℓ_0 only on a discrete set of N points arranged on a regular grid, {y_n}^N_{n=1} ⊂ Ω_0, and similarly define f_t, ℓ_t, φ^k_{t,0}, etc. on {x_n}^N_{n=1} ⊂ Ω_t. The objective is written as a discrete set of unary and pairwise functions of node labelings that can be solved exactly for binary labelings (K = 2), and approximately otherwise, via graph-cut algorithms [78, 115]. To define values at an arbitrary point we use linear interpolation for the real-valued frames f_0 and f_t, and nearest-neighbor interpolation for the integer-valued labeling ℓ_0. Nearest-neighbor interpolation is commonly used to deform labelings, as it cannot introduce new nonlabel values. It also allows us to write any evaluation of the deformed labeling ℓ_0(φ^k_{t,0}(x_n)) as an evaluation at a discretized location ℓ_0(y_m), where y_m = nn(φ^k_{t,0}(x_n)), and where nn(·) is the nearest-neighbor projection operator. This formulation allows us to write terms involving the deformed labeling using only values at the discretized locations {y_n}^N_{n=1} ⊂ Ω_0.

The data matching term E_data (6.9) can be written discretely as a unary pointwise penalty on the labeling ℓ_0, as can E_pdf. In order to write E_data in terms of the discretized labeling ℓ_0, we map the discretely evaluated matching function defined at time t back to the labeling locations at time 0 that determine whether a given deformation is in effect. The matching cost of assigning label k to point y_n is then given by

    d_{y_n}(k) := Σ_{x∈M^k_{y_n}} D(f_0 ∘ φ^k_{t,0}, f_t, x),   (6.14)

where M^k_{y_n} := {x_m | nn(φ^k_{t,0}(x_m)) = y_n}, i.e., the set of discrete points in {x_n}^N_{n=1} that map to point y_n under nearest-neighbor projection of φ^k_{t,0}.
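A sketch of this scatter step (our own illustration): the per-pixel matching costs, evaluated on the grid of Ω_t, are accumulated onto the label locations y_n that control them, yielding the unary costs d_{y_n}(k).

    import numpy as np

    def unary_costs(match_cost, phis):
        # match_cost[k]: D(f_0 o phi^k, f_t, x) at every grid point of
        #                Omega_t, shape (H, W), for each label k
        # phis[k]: nearest-neighbor projected coordinates nn(phi^k_{t,0}(x)),
        #          integer arrays of shape (H, W, 2)
        # returns d[y, k], the cost of assigning label k at location y (Eq. 6.14)
        H, W = match_cost[0].shape
        d = np.zeros((H, W, len(phis)))
        for k, (cost, phi) in enumerate(zip(match_cost, phis)):
            np.add.at(d[..., k], (phi[..., 0], phi[..., 1]), cost)
        return d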
We write E_pdf as a negative log likelihood penalty on ℓ_0,

    E_pdf(ℓ_0) :∝ −ln(p(f_t | ℓ_0)) = −Σ^N_{n=1} ln(p(f_t(y_n) | ℓ_0(y_n))),   (6.15)

where p(f_t(y_n) | ℓ_0(y_n) = k) is a kernel density estimate of the distribution of {f_t(y_n) s.t. ℓ⁻_0(y_n) = k}, and ℓ⁻_0 is the segmentation from the previous iteration. Because this penalty is intended only to resolve ambiguous situations, and not to enforce or rely on an intensity-based segmentation, its influence is kept relatively weak; for many problems λ_pdf = 0 yields acceptable results.

We now formulate E_per and E_inv discretely as sums of pairwise terms. For ease of notation when working with label values, we define the functions q : L × L → {0, 1} (and similarly ¬q) for indicating equality (or inequality) of labelings (q for equals):

    q(i, j) := 1 if i = j, 0 otherwise;   ¬q(i, j) := 1 if i ≠ j, 0 otherwise.   (6.16)

The perimeter penalty E_per is written as a simple Potts model smoothness term on ℓ_0,

    E_per(ℓ_0) := Σ^N_{n=1} Σ_{y_m∈N(y_n)} ¬q(ℓ_0(y_n), ℓ_0(y_m)),   (6.17)

where N(y_n) is a set containing the neighbors of y_n. In this work we use a 4- or 6-connected neighborhood in 2D or 3D, respectively. We can directly write the discretized version of E_inv(Φ, ℓ_0) (6.11) as the sum of N terms (one associated with each x_n), term n taking as inputs the K labelings {ℓ_0 ∘ φ^k_{t,0}(x_n)}_{k∈L}:

    E_inv(Φ, ℓ_0) := Σ^N_{n=1} c(ℓ_0 ∘ φ¹_{t,0}(x_n), …, ℓ_0 ∘ φ^K_{t,0}(x_n)),   (6.18)

with

    c(l_1, …, l_K) = | Σ_{k∈L} q(l_k, k) − 1 |.   (6.19)

See Table 6.1 for a two-label example.

Table 6.1. Binary value table for c(l_1, l_2).

    l_1   l_2   c(l_1, l_2)
    1     1     0             (ℓ_t(x) = 1)
    1     2     1             (overlap)
    2     1     1             (separation)
    2     2     0             (ℓ_t(x) = 2)

Although Equation (6.18) discretely encodes the invertibility constraint, for K > 2 the K-input terms c(·) associated with each point are difficult and costly to optimize over. Instead, we introduce auxiliary variables representing the labeling ℓ_t, and optimize over both ℓ_0 and ℓ_t. Under the invertibility constraint (6.11), ℓ_t is uniquely determined by ℓ_0 and Φ, so we have not introduced any additional unknowns. However, the auxiliary variables allow us to rewrite each K-input term in (6.18) as the sum of K pairwise terms:

    E_inv(Φ, ℓ_0, ℓ_t) := Σ^N_{n=1} Σ_{k∈L} c_k(ℓ_0 ∘ φ^k_{t,0}(x_n), ℓ_t(x_n)),   (6.20)

where c_k(i, j) := ¬q(q(i, k), q(j, k)). The function c_k(ℓ_0 ∘ φ^k_{t,0}(x_n), ℓ_t(x_n)) indicates whether the labels ℓ_0 ∘ φ^k_{t,0}(x_n) and ℓ_t(x_n) agree that deformation k is (or is not) in effect at φ^k_{t,0}(x_n) (see Table 6.2). To see the equivalence of c(l_1, …, l_K) and Σ_k c_k(l_k, l_t), where l_k and l_t indicate ℓ_0 ∘ φ^k_{t,0}(x_n) and ℓ_t(x_n), respectively, for some point x_n, consider that the auxiliary variable l_t is an input to K binary factors c_k(l_k, l_t) for k ∈ L. In order for l_t = j to satisfy each of these factors (i.e., Σ_k c_k(l_k, l_t) = 0), exactly one of the labels must indicate that its corresponding deformation is in effect, specifically l_j = j and l_k ≠ k for k ≠ j. See Figure 6.2 for a 3-label example of this equivalence. In the discretized formulation, E_inv is a pixelwise count of the constraint violations of Equation (6.3). For a large enough value of λ_inv, E_inv becomes an essentially hard constraint. Consider a single-label segmentation ℓ_0 ≡ k. This labeling has no boundary and therefore zero boundary cost (E_per(k) = 0), and no possible constraint violations (E_inv(k) = 0).
Therefore, setting λ_inv to a value greater than the minimum single-label segmentation cost, i.e.,

    λ_inv = min_k (E_data(Φ, k) + λ_pdf E_pdf(k)) + ε

for some positive ε, ensures that there is a valid solution of lower cost than a single constraint violation, guaranteeing that the optimal solution satisfies the constraint.

Table 6.2. c_k values for inputs p and q; here k̄ indicates that the input is any label other than k. For the 2-label case, this reduces to Table 6.1, with c = c_1 = c_2.

    p    q    c_k(p, q)
    k̄    k̄    0
    k̄    k    1
    k    k̄    1
    k    k    0

Figure 6.2. Comparison showing the equivalence of a single 3-input factor c, representing the pointwise sum penalty |Σ_{k∈L} χ^k_t(x) − 1| for a point x ∈ Ω_t, with three binary factors {c_k}³_{k=1} and an auxiliary label l_t. The negated numbers indicate any labeling except the given label, i.e., l_k = 3̄ indicates l_k ∈ {1, 2}. The value l̂_t in the table represents the minimum-cost label(s) for l_t, which is unambiguous given the values of the inputs {l_k}³_{k=1}.

We now have the objective function (6.13) written as unary (E_data and E_pdf) and pairwise (E_per and E_inv) terms on the label values ℓ_0(y_n) and ℓ_t(x_n). We can therefore formulate the label estimation as inference on a first-order Markov random field, for which efficient optimization techniques exist. See Figure 6.3 for a diagram of the graph setup.

Figure 6.3. Graph structure for a 2-label optimization induced by a point x ∈ Ω_t. Black nodes represent label values, the unary d factors represent E_data at point x, the s factors represent E_per in the affected neighborhoods, and the c_k factors encode E_inv at x.

For binary labels a single graph cut can compute the optimal labeling [115], and for K > 2 the efficient α-expansion [78] method can generate an approximate solution. In order to apply α-expansion optimization, we must verify that the proposed pairwise penalties satisfy the relation from [115]:

    F(p, α) + F(α, q) ≥ F(p, q) + F(α, α)   (6.21)

for any labels p, q, α and pairwise dissimilarity F. The pairwise energy terms from E_per, ¬q(p, q), form a metric on labels, and therefore satisfy (6.21). The function c_k making up the pairwise terms of E_inv is a pseudometric, as it assigns zero distance to some distinct labels, e.g., c_2(1, 3) = 0. Because c_k(p, p) = 0 for any label p, (6.21) reduces to the triangle inequality, which is satisfied by the pseudometric c_k.
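Both properties used above are simple to verify numerically. The following sketch (ours) checks, for K = 3, that the K-input factor c of Eq. (6.19) equals the pairwise sum Σ_k c_k minimized over the auxiliary label l_t, and that each c_k satisfies the α-expansion condition (6.21):

    import itertools

    def q(i, j):
        return 1 if i == j else 0

    def c(labels):
        # |sum_k q(l_k, k) - 1|, Eq. (6.19); labels are 1-based
        return abs(sum(q(l, k) for k, l in enumerate(labels, start=1)) - 1)

    def ck(k, i, j):
        # pairwise terms of Eq. (6.20): notq(q(i, k), q(j, k))
        return 1 if q(i, k) != q(j, k) else 0

    K = 3
    L = range(1, K + 1)
    for ls in itertools.product(L, repeat=K):
        best = min(sum(ck(k, ls[k - 1], lt) for k in L) for lt in L)
        assert c(ls) == best  # equivalence of the two formulations

    for k in L:
        for p, s, a in itertools.product(L, repeat=3):
            # alpha-expansion condition (6.21) for the pseudometric c_k
            assert ck(k, p, a) + ck(k, a, s) >= ck(k, p, s) + ck(k, a, a)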
Representing the partition function as a discretely sampled labeling allows for efficient and robust discrete optimization, but it also ties the invertibility constraint to the resolution of the discretization. Based on the particular values of the continuous deformation fields along a boundary, locations may exist where overlap (or separation) of up to one voxel is accommodated without violating the discrete constraints, while in other locations arbitrarily small overlap (or separation) may cause a constraint violation. This means the invertibility constraint along the partition boundary can only be consistently enforced to the level of voxel resolution.

The discretization also affects the discontinuity boundaries representable under the invertibility constraint. Nearest-neighbor interpolation can map deformation locations representing distinct points (with distinct labels) onto the same discrete label variable. This overlap can cause situations in which, even when given an analytic model of sliding motion satisfying the invertibility constraints, a discretely sampled representation may violate the invertibility constraints at some voxel locations. These spurious violations mean that enforcing the hard constraints can yield unsatisfactory results, producing segmentations that cannot conform to the true discontinuity boundary. Instead, we choose λ_inv to allow some constraint violations. In locations of constraint violation the labeling ℓ_t is ambiguous, but the proposed optimization model chooses among ambiguous labels by minimizing the data energy. Ideally, the violated constraints allow accurate discontinuity boundaries to be chosen without introducing overlap or separation beyond the subvoxel level, which is not consistently enforced even by the hard constraint. Although relaxation of the constraints allows the possibility of large regions of noninvertibility, in practice (see Section 6.4) the overlap penalty is sufficient that violations beyond the subvoxel level are extremely rare.

6.3.2 Deformation Estimation

In the constituent deformation estimation step, we hold the labeling ℓ_0 fixed and optimize (6.12) over the deformations Φ:

    argmin_Φ E_data(Φ, ℓ_0) + λ_reg Σ_{k∈L} E_reg(φ^k_{t,0}) + λ_inv E_inv(Φ, ℓ_0).   (6.22)

In order to guarantee that each of the constituent deformations is a diffeomorphism, we use the large deformation diffeomorphic metric mapping (LDDMM) model [6, 7], where φ(x, t) is defined via the time-varying velocity field v(x, t),

    φ(x, t) = φ(x, t_0) + ∫_{t_0}^{t} v(φ(x, s), s) ds,

where φ(x, t_0) = x. Smoothness of the deformation φ is enforced by penalizing ∫_{t_0}^{t} ‖v(x, t)‖²_L dt, where ‖v‖²_L = ⟨Lv, v⟩₂ for a smooth velocity field v, and L is a differential operator penalizing nonsmoothness. Under reasonable smoothness assumptions the deformation φ(x, t) is guaranteed to be diffeomorphic, guaranteeing the existence of φ⁻¹(x, t). However, the optimization is agnostic towards the deformation model, and a b-spline deformation model was also tested as part of this work. In the b-spline case we represent each φ^k_{t,0} via a b-spline control grid, and impose additional regularization via E_reg(φ^k_{t,0}) := ‖∇φ^k_{t,0}‖². Although invertibility is not guaranteed by this regularization, it can be easily verified. In practice the regularization is significant (as allowed by the piecewise formulation) and the deformations are sufficiently well behaved that we do not observe noninvertibility during optimization. In order to produce a continuous objective function, linear interpolation is used in computing the deformed masks during optimization of (6.22), and optimization over the L1 norm in E_inv is performed via the iteratively reweighted least squares algorithm [116, 117].

The ability of the alternating optimization to accurately estimate the discontinuity boundary relies on the inability of a single deformation to accurately describe the observed motion at the discontinuity. If a single deformation can accurately depict the imaged motion (without incurring a large nonsmoothness penalty), the perimeter penalty E_per will cause a uniform (single-value) segmentation, collapsing the model to a single-deformation solution. Therefore, in order to ensure correct convergence, a multiscale optimization is performed.
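For intuition, a minimal forward-Euler sketch (ours; actual LDDMM implementations integrate more carefully, and here a 2D pixel grid is assumed) of how the time-varying velocity field is integrated into the deformation φ:

    import numpy as np
    from scipy.ndimage import map_coordinates

    def integrate_velocity(v, n_steps):
        # v: velocity field samples over time, shape (T, H, W, 2)
        # returns phi(x, 1) via forward Euler:
        #   phi_{t+dt}(x) = phi_t(x) + dt * v(phi_t(x), t)
        T, H, W, _ = v.shape
        dt = 1.0 / n_steps
        ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
        phi = np.stack([ys, xs], axis=-1).astype(float)  # identity map
        for step in range(n_steps):
            vt = v[min(step * T // n_steps, T - 1)]
            coords = [phi[..., 0].ravel(), phi[..., 1].ravel()]
            # sample the velocity at the current positions phi_t(x)
            vy = map_coordinates(vt[..., 0], coords, order=1).reshape(H, W)
            vx = map_coordinates(vt[..., 1], coords, order=1).reshape(H, W)
            phi[..., 0] += dt * vy
            phi[..., 1] += dt * vx
        return phi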
For the LDDMM-based deformation, the individual velocity fields vt are initially truncated in frequency space to allow only very smooth deformations. Such frequency-space truncation has been shown to be an effective regularizer in other work [118, 119]. As optimization progresses and the motion masks become more accurate, the truncation is gradually relaxed. For the b-spline deformation model, refinement of the control point grid accomplishes the same goal. In addition to overcoming nonconvexity in the joint deformation and segmentation estimation, this provides the convexification benefits of standard multiscale registration techniques for each constituent deformation.

6.3.3 Segmentation Initialization

Because we are optimizing a highly nonconvex function, an initialization sufficiently close to the desired result is often advantageous. Because a goal of the proposed algorithm is to segment the domain along regions of motion discontinuity, we automatically create an initial segmentation that separates regions showing different motion. This initial motion mask is created from a single discontinuity-preserving registration, which does not guarantee invertibility. We enforce only a simple anisotropic TV penalty on deformation smoothness (as in [82]):

$$\operatorname*{argmin}_{\nu}\; \int_{\Omega_t} D\big(f_0(x + \nu(x)), f_t\big) + \lambda_{tv} \sum_{d=1}^{D} \|\nabla_d \nu(x)\| \, dx, \tag{6.23}$$

where ∇d is the derivative along dimension d. This penalty results in a discontinuous motion field containing piecewise-constant regions. The elements of ν are then grouped into K clusters using an unsupervised Gaussian mixture model, and the cluster labels are then mapped back to the corresponding spatial locations, creating an initial segmentation; a minimal sketch of this clustering step is given below. Because the deformation contains many spurious discontinuities, it is discarded after initial segmentation generation, and the individual deformations of the model are each initialized to the identity transform. Although this initial segmentation is rather crude, it is sufficient for optimization to converge to the desired solution (see Figure 6.4).

Figure 6.4. Initialization via clustering of a discontinuity-preserving deformation (left), and the final segmentation found by the proposed algorithm (right) for DIR-lab case 1. Although the initial segmentation is rather crude, it is sufficient for optimization to find the desired solution.
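A minimal sketch of the clustering step described above, assuming scikit-learn's GaussianMixture; the function name and array layout are illustrative, not the dissertation's code.

```python
# Cluster the displacement vectors of the TV-regularized deformation into K
# motion groups, then map the cluster labels back onto the image grid.
import numpy as np
from sklearn.mixture import GaussianMixture

def initial_segmentation(nu, K=2):
    """nu: displacement field of shape (D, *vol_shape). Returns integer labels
    of shape vol_shape, one motion-cluster label per voxel."""
    D = nu.shape[0]
    vol_shape = nu.shape[1:]
    vectors = nu.reshape(D, -1).T                 # one D-dimensional vector per voxel
    gmm = GaussianMixture(n_components=K).fit(vectors)
    return gmm.predict(vectors).reshape(vol_shape)
```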
6.4 Results

We have applied the proposed method to estimating the discontinuous motion observed as the lung slides along the thoracic cage during respiration. Anatomically, the lungs are directly connected to the rest of the anatomy only via the pulmonary arteries, trachea, and other structures of the mediastinum that attach to the inner lung near the heart. The pleural cavity, a thin fluid-filled space between the lungs and thoracic wall, allows the outer boundary of the lungs to slide freely. Although the lungs are not directly attached to the diaphragm, negative pressure during inhalation causes the low-density lung tissue to expand to fill the space above the contracting diaphragm. The sliding motion is most visible in the lower lung, where the large movement of the lung–diaphragm interface is contrasted by the nearly static ribs.

We have tested the proposed method on the publicly available DIR-lab dataset [120, 121], which consists of ten 4DCT chest datasets as well as landmark correspondences. Resolution (voxel size) varies slightly between datasets but is approximately 1×1×2.5 mm. Datasets 6–10 contain a large volume of empty space around the patient, and were cropped to a region of interest around the patient body. We have registered the maximum inhale and exhale images for each dataset, each of which has 300 voxel-resolution correspondences manually annotated. We have used a binary motion segmentation (K = 2) to represent the breathing motion. A multiscale optimization was used, both in the local adaptability of the deformations (see Section 6.3.2) and in the image resolution (for computational efficiency). Ten outer iterations (alternating between discrete segmentation estimation and continuous deformation optimization) were performed at each scale level. Continuous optimization of the constituent LDDMM deformations was implemented in CUDA and performed on a Kepler-generation GPU. Discrete optimization of the motion segmentation used a modified version of the gco library [78]. The L operator in the LDDMM model takes the form of the viscous fluid operator given in [6],

$$Lu = \Delta u + 0.2\, \nabla(\nabla \cdot u) + 10^{-4}\, u,$$

and each time-varying velocity field was discretized with four timesteps. Other parameters for both the initialization (λtv) and optimization (λper, λpdf, λreg, λinv) steps were empirically chosen, but fixed across all datasets. In order to better register fine features and overcome intensity differences due to density changes in the lung during breathing, we chose a normalized gradient field (NGF) dissimilarity [122, 123] as the pointwise image dissimilarity measure D(·) in Equation (6.7); a minimal sketch of this measure is given at the end of this section.

Figure 6.5 shows the computed motion segmentation for each dataset. The qualitative results presented here are from the LDDMM motion model, but the b-spline model produced very similar results. During inhalation the diaphragm contracts, pressing down on the abdominal organs and moving them lower and outward. The lower lungs therefore show motion similar to that of the organs of the abdominal cavity. The segmented region therefore includes the lungs as well as regions of visible motion continuing into these abdominal organs, while excluding structures such as bone that show little motion. This motion segmentation is in contrast to a segmentation of the lung, which is the discontinuity map used by many of the techniques that require predefined sliding locations [114, 106, 107]. Using the visible lung boundary allows simple automated or semiautomated segmentation techniques to be used, but as other authors have noted [104, 97], the observed motion is better accommodated by a sliding boundary that extends below the lung along the abdominal wall, as in the results in Figure 6.5.

Figure 6.6 compares the effect of the piecewise-smooth deformation to that of a single smooth deformation run with the same parameters. This highlights the errors caused by smooth registration, as the motion of the lungs spreads into the static structures of the spine, and clearly shows the discontinuous motion along the outer lung boundary. Figure 6.7 and Table 6.3 give the landmark errors on these datasets. We compare the methods described here, labeled LDDMM and b-spline, to the leading methods NGF(b) of [124] and pTV of [109].

Figure 6.5. Motion segmentation results for all DIR-lab datasets (cases 1–10). For each case, the left figure shows a coronal slice near the spine with the motion mask region highlighted in red, and the right figure shows a transparent 3D rendering of this region.
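As referenced above, here is a minimal sketch of an NGF-style pointwise dissimilarity in the spirit of [122]; the edge parameter eps and the exact normalization are our assumptions rather than the parameters used in the experiments.

```python
# Normalized gradient field (NGF) dissimilarity: compare gradient directions,
# D(f, g) = sum over voxels of 1 - <n(f), n(g)>^2,
# where n(h) = grad(h) / sqrt(|grad(h)|^2 + eps^2).
import numpy as np

def ngf_dissimilarity(f, g, eps=1e-2):
    def normalized_gradient(h):
        grads = np.stack(np.gradient(h))                   # (D, ...) gradient stack
        norm = np.sqrt(np.sum(grads**2, axis=0) + eps**2)  # edge-aware magnitude
        return grads / norm
    nf, ng = normalized_gradient(f), normalized_gradient(g)
    return np.sum(1.0 - np.sum(nf * ng, axis=0) ** 2)
```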
NGF(b) uses a precomputed mask to apply normalized gradient field registration only within the lungs, while pTV uses a discontinuity-preserving regularization that does not guarantee invertibility. While our method shows slightly worse average performance (average error of 1.07 mm with the b-spline model or 1.09 mm with the LDDMM deformation model vs. 1.00 mm for NGF(b) and 1.01 mm for pTV), it provides invertibility without requiring a precomputed mask.

While the LDDMM model guarantees invertibility of the constituent deformations, it provides slightly worse performance in terms of correspondence error. Based on our experience, the direct spatial smoothness constraint imposed on the constituent deformations by the b-spline control grid resolution is somewhat more effective and more easily tuned than the frequency truncation of the time-varying velocity fields we use for multiscale smoothness in the LDDMM model. This agrees with the observation that composition of very smooth velocity fields can lead to highly nonlinear local deformations [12]; even with highly regularized flow fields, the resulting deformation can show a significant amount of shearing, reducing the algorithm's ability to distinguish between deformations and sliding boundaries.

Figure 6.6. Comparison of our piecewise-smooth deformation ((a) and (c)) and a single globally smooth deformation ((b) and (d)) on DIR-lab dataset 1. The figures on the top row show the Jacobian determinant of the deformation on a log-scale colormap. Note the clear sliding boundary (dark/red colors where the Jacobian is undefined along the sliding boundary) in (a), and the nonphysical deformation of the spine region in (b). On the bottom row are difference images between exhale and deformed inhale for the two methods. Again notice the motion of the spine with a single deformation.

While the relaxed invertibility constraint Einv (6.11) can be enforced exactly by a sufficiently high value of λinv (as discussed in Section 6.3.1), the use of nearest-neighbor interpolation of the labeling means that even a true continuous solution may cause discrete constraint violations. We therefore choose a value of λinv that does not enforce the constraint exactly. In order to evaluate the effect of the constraint violations, we look at the sum of the deformed segmentation indicator functions $\sum_{k \in L} \chi^k_t(x)$ under linear interpolation (see Figure 6.8); a minimal sketch of this check follows the table below. The use of nearest-neighbor interpolation in the discrete sum penalty Einv means that even under satisfaction as a hard constraint (i.e., Einv(Φ, ℓ0) = 0), there may be subvoxel overlap or separation of the discrete segmentation boundaries under continuous deformation.

Figure 6.7. Comparison of DIR-lab correspondence errors (mean error plus standard deviation, in mm, per dataset and in total) for our framework (b-spline and LDDMM) and leading DIR-lab registration methods (NGF(b) and pTV).

Table 6.3. Mean and standard deviation (in parentheses) of correspondence errors, in millimeters, for the n = 300 landmarks defined at the extreme breathing phases for each dataset. NGF(b) [124] and pTV [109] are leading registration methods for the DIR-lab datasets, and b-spline and LDDMM use our described method with a b-spline or LDDMM deformation model, respectively.

dataset   NGF(b)        pTV           b-spline      LDDMM
1         0.76 (0.89)   0.79 (0.94)   0.79 (0.90)   0.80 (0.92)
2         0.80 (0.88)   0.74 (0.91)   0.79 (0.95)   0.84 (0.94)
3         0.96 (1.07)   0.95 (1.09)   0.87 (1.04)   0.88 (1.06)
4         1.33 (1.29)   1.21 (1.19)   1.35 (1.28)   1.30 (1.28)
5         1.18 (1.45)   1.24 (1.50)   1.25 (1.53)   1.27 (1.51)
6         1.03 (1.04)   0.96 (1.01)   1.05 (1.04)   1.11 (1.03)
7         0.92 (0.93)   0.97 (0.99)   1.12 (1.18)   1.11 (1.24)
8         1.13 (1.15)   1.17 (1.47)   1.22 (1.26)   1.27 (1.43)
9         1.00 (0.96)   1.03 (1.08)   1.12 (1.03)   1.18 (1.01)
10        0.91 (0.99)   1.00 (1.02)   1.12 (1.38)   1.11 (1.15)
av. err   1.00 (1.08)   1.01 (1.14)   1.07 (1.17)   1.09 (1.17)
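As referenced above, a minimal sketch of the violation check (warp each binary label mask with linear interpolation, sum the warped masks, and count full-voxel violations among perimeter voxels); the function name, array shapes, and tolerances are our assumptions.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def violation_counts(masks, phis):
    """masks: list of K binary arrays over Omega_0; phis: list of K coordinate
    arrays (D, *shape) giving phi^k_{t,0}(x) for each voxel x in Omega_t."""
    total = np.zeros(masks[0].shape)
    for mask, phi in zip(masks, phis):
        # chi^k_t(x) = chi^k_0(phi^k_{t,0}(x)), sampled with linear interpolation.
        total += map_coordinates(mask.astype(float), phi, order=1, mode="nearest")
    perimeter = np.abs(total - 1.0) > 1e-6            # sum constraint not identically 1
    full = (total > 2.0 - 1e-6) | (total < 1e-6)      # full-voxel overlap or separation
    return perimeter.sum(), (full & perimeter).sum()
```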
As full-voxel overlap or separation would lead to a value of 2 or 0, respectively, we observe that nearly all violations are of subvoxel size. Across all datasets, only 156 total voxels are such full-voxel violations. This is less than 0.01% of perimeter voxels, where we define perimeter voxels as those for which the locally evaluated sum constraint is not identically 1, i.e., $\sum_{k \in L} \chi^k_t(x) \neq 1$, under linear interpolation of the indicator functions.

The method includes an intensity segmentation term (Epdf), but consistent motion is the primary force driving the segmentation. This is especially important when we are interested in tracking, e.g., tumors in the lung that do not have the appearance of lung tissue. In Figure 6.9 we show an example from radiation oncology planning in which a lung tumor occurs near the boundary of the lung. Since the tumor moves with the lung tissue, the sliding boundary correctly encompasses the tumor rather than showing a discontinuity along the visible tumor/lung boundary.

Figure 6.8. Sum of the deformed label masks, $\sum_{k \in L} \chi^k_t$, with linear interpolation. No voxels of full overlap ($\sum_{k \in L} \chi^k_t(x) = 2$) or separation ($\sum_{k \in L} \chi^k_t(x) = 0$) are observed in this slice, and less than 0.01% of perimeter voxels (156 total voxels) are such full-voxel violations across all datasets.

Figure 6.9. A slice from a 4DCT dataset showing a lung tumor near the ribs (a), the motion mask generated by our algorithm (b), and a 3D rendering of the sliding surface and the tumor segmentation (c).

6.5 Conclusion

In this work we present the first framework for estimating a globally invertible and piecewise-smooth image registration based on an automatically generated motion segmentation. Such deformations can accurately model nonsmooth physical changes that are ill suited to standard diffeomorphic image registration. Our success in estimating the invertible deformation and locations of discontinuity is enabled by two modeling choices:

1. Defining each constituent deformation over the entire image domain allows updates to the partitioning function by guaranteeing that any partitioning function has a well-defined cost in our optimization model.

2. Optimizing the partitioning function discretely allows large optimal or near-optimal updates, and does not require relaxing from, or reprojecting to, integer constraints.

The novel discrete formulation of the invertibility constraint allows effective optimization of the motion segmentation. The automated, accurate modeling of sliding motion of the lung has been demonstrated. However, the proposed method could be extended to accommodate other data. Modeling less prominent sliding motion, such as between the lung and liver, or between lung lobes, remains an open challenge. The introduction of a shape model indicating likely regions of coherent motion could help model less distinct sliding motion, as well as improve robustness to poor initialization.
While we have described the allowable discontinuous motion in the proposed model as sliding, more exotic transformations are technically allowed. For instance, regions not sharing a boundary could swap locations while still maintaining the invertibility of the full deformation. This is permitted because we have only enforced the invertibility constraint at the endpoints of the deformation, allowing noninvertible intermediate deformations as defined by the time-varying flow fields v(x, t). We believe this could be addressed by enforcing the sum constraint (6.3) at intermediate times s ∈ [0, t], but this would require introducing the intermediate timesteps into the constraint graph for the labeling update. In practice, however, the local gradient-based optimization of the deformations, starting from the identity transformation, does not allow such exotic cases to develop.

CHAPTER 7

MODELING SPATIALLY-LOCALIZED NONINVERTIBILITY WITH PIECEWISE DIFFEOMORPHISMS

The previous chapter developed an image registration framework for representing sliding motion in the image domain, a particular type of discontinuous motion where adjacent but disconnected objects show relative motion along a common boundary. This framework uses a piecewise-diffeomorphic formulation, maintaining desirable diffeomorphic properties over regions displaying diffeomorphic changes. However, there are many more instances of localized nondiffeomorphic changes that do not fall into the category of sliding motion. For instance, pre- and postoperative image pairs may show changes such as introduced medical devices, resected tissue, or the appearance of edema. Although much of the change between images may be accurately represented by a diffeomorphic transformation, these changes to the structure of the anatomy introduce localized regions of noninvertibility.

This chapter introduces a new model for representing such regions of noninvertibility while estimating a diffeomorphic transformation over the remainder of the image domain. We do so by extending the piecewise-diffeomorphic framework of the previous chapter to allow separation between diffeomorphic segments, and represent noninvertible image regions as occupying these areas of separation. Explicitly modeling nondiffeomorphic changes in a diffeomorphic framework has been widely studied in brain imaging, where tumors, lesions, or traumatic brain injury lead to structural changes in the brain [125, 126, 127, 128, 129]. However, these methods rely on precomputed segmentations, normative atlases, or a large set of structurally similar data from which to bootstrap a normative model. While inspired by this area, this chapter focuses on a 2D case of localized noninvertibility, providing a proof of concept for the proposed formulation. A common instance of localized noninvertibility in two dimensions occurs in registration between imaged histological slices, which often tear during preparation [130] (see Figure 7.1 for examples). The proposed formulation jointly estimates the piecewise-diffeomorphic transformation and the segmentation of the noninvertible region directly from the input images. In the case of registering torn histological slices, the formulation therefore represents this physical tearing as separation between diffeomorphic regions, while estimating the noninvertible region introduced by the tear.

Figure 7.1. Examples of torn histology slices from the Mouse Brain Library [131].
Diffeomorphic image registration requires the existence of a one-to-one mapping between points in the input images, and therefore does not accommodate the introduction of new objects. The previous chapter relaxed the global smoothness constraints of the diffeomorphic registration model, allowing nonsmoothness but maintaining global invertibility. Here we introduce local noninvertibility into our model in a way that preserves the invertibility guarantees over regions for which correspondences exist by defining the deformation φ̄t,0 on a subset of the codomain,

$$\bar\phi_{t,0} : \Omega_0 \to \Omega_t \setminus \omega^\tau_t,$$

where ωτt is a region of Ωt lacking correspondence in Ω0. The invertibility constraint (6.3) now becomes

$$\sum_{k \in L} \chi^k_t(x) + \chi^\tau(x) = 1 \quad \forall x \in \Omega_t. \tag{7.1}$$

The deformation is therefore still invertible (on Ωt \ ωτt) and piecewise smooth. The region of noninvertibility is well defined, and does not overlap with (or separate from) the deformed segmentation. A penalty is placed on the size of this region to discourage labeling locations for which correct correspondences exist as noninvertible:

$$E_\tau(\chi^\tau) := \int_{\Omega_t} \chi^\tau(x)\, dx. \tag{7.2}$$

Given an indicator function χτ marking noninvertible locations, the deformation estimation can be trivially updated to accommodate the modified constraint (7.1). The discrete segmentation optimization can also easily be modified to estimate noninvertibility by introducing an additional segmentation label τ to represent noninvertible locations in the time-t labeling ℓt. If we consider the discrete formulation in which both ℓ0 and ℓt are estimated, the data cost associated with assigning label τ to a point xn ∈ Ωt or yn ∈ Ω0 is given by

$$d_{y_n}(\tau) := \infty, \qquad d_{x_n}(\tau) := \lambda_\tau.$$

Thus we allow regions to be labeled as noninvertible only in the time-t segmentation, and a penalty λτ is imposed on the size of these noninvertible regions; a minimal sketch of these unary costs is given at the end of this section.

7.1 Results

Figure 7.2 shows results on a synthetic tear generated from an optically imaged mouse brain histology slice from the Mouse Brain Library [131]. A squared intensity difference was used as the pointwise dissimilarity metric D(·). Initialization and optimization follow the procedure described for sliding motion estimation, with no pixels initially labeled as noninvertible. For tear estimation, the intensity segmentation penalty Epdf was turned off (λpdf = 0). A b-spline deformation model was used with two segmentation labels and the additional noninvertible-region label. Multiscale optimization was used in image resolution (two scales) and b-spline resolution (two refinements per image scale). Although this is a relatively simple registration problem, the automatic noninvertible-region estimation performs quite well, even correctly (according to our model) classifying the small strip of torn region that exists in Figure 7.2(a) as part of the invertible region and not part of the tearing-region segmentation.

These results demonstrate a proof of concept for our model, which represents unmatched structures between images as separation between diffeomorphic regions. The general principle is to explicitly model the unmatched regions, and estimate an invertible transformation on a restricted domain, namely the original domain with unmatched regions removed. Joint estimation of the deformations and the noninvertible region allows correctly matched areas to help localize the unmatched structures. We believe this technique has applications beyond such literal tearing, to model more general cases of registration in which there is structural variation among images.
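As referenced above, a minimal sketch of the unary costs with the additional τ label; the array layout and the use of a large finite value for the infinite cost are our assumptions (the dissertation used a modified gco library).

```python
# The tau (noninvertible) label is forbidden in the time-0 labeling and costs a
# flat lambda_tau in the time-t labeling, as in d_{y_n}(tau) and d_{x_n}(tau).
import numpy as np

INF = 1e18  # effectively infinite cost for the discrete solver

def unary_costs(data_costs, is_time_t, lam_tau):
    """data_costs: (N, K) matching costs for N nodes and K segment labels.
    Returns (N, K+1) costs where column K is the tau label."""
    N, _ = data_costs.shape
    tau_col = np.full((N, 1), lam_tau if is_time_t else INF)
    return np.hstack([data_costs, tau_col])
```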
Figure 7.2. (a) and (b) are f0 and ft, respectively. (c) is the deformed version of f0, with the noninvertible tearing region marked. (d) and (e) are the initial and final difference images, respectively, with the tearing region shown in black in (e). (f) and (g) are f0 and ft with overlaid segmentations ℓ0 and ℓt, respectively, here with the tearing region masked in opaque red.

CHAPTER 8

CONCLUSION

The work presented in this dissertation enables modeling and automatic estimation of challenging nondiffeomorphic motion in specific settings where existing techniques were inadequate. The composite deformation models allow greater flexibility in the types of motion they can represent; however, the increased parameter space also increases the nonconvexity of the resulting optimization problem, and care must be taken to find an appropriate solution. In estimating the parameters of the additive layer model, this took the form of direct estimation techniques over the layer parameters, allowing optimization to focus on estimating the deformations, as well as a custom layer prior to convexify the objective function. In estimating piecewise-diffeomorphic deformations, the use of discrete optimization techniques and good initialization allows accurate estimation of the observed motion.

Another recurring theme in this work is the use of accurate motion estimation to improve the detection of outliers: injected contrast in the case of fluoroscopic motion modeling, and noninvertible regions in histology slice registration.

Additionally, the role of invertibility constraints is worth highlighting. Invertibility has traditionally been used to avoid undesirable results, such as folding, in smooth deformations [6]. However, as explored in this work (especially Chapters 6 and 7), it is a constraint that can be applied to discontinuous or locally noninvertible motion models as well. We believe that invertibility as a regularization, separate from smoothness, is applicable to a wider variety of deformation modeling, and that this area warrants additional investigation.

Finally, the models developed in this dissertation are very general in the sense that they make only the most basic smoothness assumptions regarding motion and images. This allows them to be easily applied to new data, but does not take advantage of known anatomical shape or motion constraints. Performance improvements in specific tasks could be realized by adapting the priors to incorporate known or learned information. For instance, a shape prior on the discontinuity boundary for piecewise-diffeomorphic motion estimation could significantly improve performance, as could an appearance prior to aid estimation of localized noninvertible regions. In fluoroscopic imaging, enforcing different layer motion models based on the expected layer contents for a particular application would help simplify estimation. Incorporating such domain knowledge or training data is an obvious avenue for future work.

APPENDIX A

DETAILS OF ESTIMATING TRANSLATIONAL MOTION IN THE ADDITIVE LAYER MODEL

A.1 ML-MAP: Completing the Square

$$p(\bar f \mid \phi) = \int p(\bar f, \bar\ell \mid \phi)\, d\bar\ell = \int p(\bar f \mid \bar\ell, \phi)\, p(\bar\ell)\, d\bar\ell$$

Writing out the probabilities from (3.8) and (3.14), we have

$$\begin{aligned}
p(\bar f, \bar\ell \mid \phi) &= p(\bar f \mid \bar\ell, \phi)\, p(\bar\ell) \\
&= C_\sigma \exp\Big( -\frac{1}{2\sigma^2} \|\bar\Phi\bar\ell - \bar f\|_2^2 \Big)\; C_{\lambda,\epsilon} \exp\Big( -\bar\ell^\top \big(\lambda\bar H + \epsilon I\big)\bar\ell \Big) \\
&= C_\sigma C_{\lambda,\epsilon} \exp\Big( -\frac{1}{2\sigma^2} \|\bar\Phi\bar\ell - \bar f\|_2^2 - \bar\ell^\top \big(\lambda\bar H + \epsilon I\big)\bar\ell \Big) \\
&= C_\sigma C_{\lambda,\epsilon} \exp\Big( -\Big[ \bar\ell^\top \Big( \frac{1}{2\sigma^2}\bar\Phi^\top\bar\Phi + \lambda\bar H + \epsilon I \Big)\bar\ell - 2\,\frac{1}{2\sigma^2}\,\bar\ell^\top\bar\Phi^\top\bar f + \frac{1}{2\sigma^2}\,\bar f^\top\bar f \Big] \Big),
\end{aligned}$$

where $C_\sigma = (2\pi\sigma^2)^{-TP/2}$ and $C_{\lambda,\epsilon} = \pi^{-KP/2}\, |\lambda\bar H + \epsilon I|^{1/2}$. We can complete the square and write this as a scaled Gaussian in ℓ̄. We want to write the terms in the exponential as

$$(\bar\ell - \mu_\Phi)^\top \Lambda_\Phi (\bar\ell - \mu_\Phi) = \bar\ell^\top \Lambda_\Phi \bar\ell - 2\,\bar\ell^\top \Lambda_\Phi \mu_\Phi + \mu_\Phi^\top \Lambda_\Phi \mu_\Phi,$$

so by inspection we have

$$\Lambda_\Phi = \frac{1}{2\sigma^2}\,\bar\Phi^\top\bar\Phi + \lambda\bar H + \epsilon I \tag{A.1}$$

and, matching the linear term $-2\,\bar\ell^\top \Lambda_\Phi \mu_\Phi = -2\,\frac{1}{2\sigma^2}\,\bar\ell^\top\bar\Phi^\top\bar f$,

$$\mu_\Phi = \frac{1}{2\sigma^2}\,\Lambda_\Phi^{-1}\bar\Phi^\top\bar f, \tag{A.2}$$

giving

$$p(\bar f, \bar\ell \mid \phi) = C_\sigma C_{\lambda,\epsilon} \exp\Big( -(\bar\ell - \mu_\Phi)^\top\Lambda_\Phi(\bar\ell - \mu_\Phi) + \mu_\Phi^\top\Lambda_\Phi\mu_\Phi - \frac{1}{2\sigma^2}\,\bar f^\top\bar f \Big) = C_\sigma C_{\lambda,\epsilon}\, S_\Phi\, \exp\Big( -(\bar\ell - \mu_\Phi)^\top\Lambda_\Phi(\bar\ell - \mu_\Phi) \Big), \tag{A.3}$$

where

$$S_\Phi = \exp\Big( \mu_\Phi^\top\Lambda_\Phi\mu_\Phi - \frac{1}{2\sigma^2}\,\bar f^\top\bar f \Big). \tag{A.4}$$
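A small numeric sketch of (A.1) and (A.2) on a toy problem, assuming dense NumPy arrays; the sizes and the stand-in prior matrix H are illustrative only.

```python
# The posterior over the layers l is Gaussian with precision
# Lambda = Phi^T Phi / (2 sigma^2) + lambda H + eps I  (A.1)
# and mean mu = Lambda^{-1} Phi^T f / (2 sigma^2)      (A.2).
import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 12                        # layer and data dimensions (toy sizes)
Phi = rng.standard_normal((m, n))   # stacked warp/sum operator Phi-bar
f = rng.standard_normal(m)          # stacked observed frames f-bar
H = np.eye(n)                       # stand-in for the prior matrix H-bar
sigma, lam, eps = 1.0, 0.1, 1e-3

Lambda = Phi.T @ Phi / (2 * sigma**2) + lam * H + eps * np.eye(n)  # (A.1)
mu = np.linalg.solve(Lambda, Phi.T @ f / (2 * sigma**2))           # (A.2)
```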
APPENDIX A DETAILS OF ESTIMATING TRANSLATIONAL MOTION IN THE ADDITIVE LAYER MODEL A.1 ML-MAP: Completing the Square Z p(f̄ , ¯` | φ) d¯` Z p(f̄ | ¯`, φ)p(¯`) d¯` p(f̄ | φ) = = Writing out the probabilities from (3.8) and (3.14), we have p(f̄ , ¯` | φ) = p(f̄ | ¯`, φ)p(¯`) 1 > 2 ¯ = Cσ exp − 2 kΦ̄` − f̄ k2 Cλ, exp −¯` λH̄ + I ¯` 2σ 1 ¯` − f̄ k2 + ¯`> λH̄ + I ¯` k Φ̄ = Cσ Cλ, exp − 2 2σ 2 1 ¯> > ¯ 1 ¯> > 1 > > ¯ ¯ = Cσ Cλ, exp − ` Φ̄ Φ̄` − 2 2 ` Φ̄ f̄ + 2 f̄ f̄ + ` λH̄ + I ` 2σ 2 2σ 2σ 1 ¯> > 1 > 1 > > ¯ ¯ Φ̄ Φ̄ + λH̄ + I ` − 2 2 ` Φ̄ f̄ + 2 f̄ f̄ = Cσ Cλ, exp − ` 2σ 2 2σ 2σ KP TP 2 1 and Cλ, = π − 2 |λH̄ + I| 2 . We can complete the square and write this as a scaled Gaussian in ¯`. We want to write the terms in the exponential as where Cσ = (2πσ 2 )− > > (¯` − µΦ )> ΛΦ (¯` − µΦ ) = ¯` ΛΦ ¯` − 2¯` ΛΦ µΦ + µ> Φ ΛΦ µΦ , so by inspection we have ΛΦ = 1 > Φ̄ Φ̄ + λ H̄ + I 2σ 2 (A.1) and 1 > > > −2¯` ΛΦ µΦ = −2 2 ¯` Φ̄ f̄ 2σ 1 −1 > µΦ = 2 ΛΦ Φ̄ f̄ , 2σ (A.2) 89 giving p(f̄ , ¯` | φ) = p(f̄ | ¯`, φ)p(¯`) 1 > > > = Cσ Cλ, exp − (¯` − µΦ ) ΛΦ (¯` − µΦ ) − µΦ ΛΦ µΦ + 2 f̄ f̄ 2σ = Cσ Cλ, SΦ exp −(¯` − µΦ )> ΛΦ (¯` − µΦ ) (A.3) where 1 > > SΦ = exp µΦ ΛΦ µΦ − 2 f̄ f̄ . 2σ (A.4) APPENDIX B LAYERED DEFORMATION DETAILS B.1 Optimization Details In order to optimize the layer gradient penalty, we use a primal-dual strategy based on [71]. This choice is based on properties of the regularized primal variation as ∇`kt → 0 with ∇ft 6= 0, which could force some portion of ∇ft into each layer. Noting that k∇`k k2 = ∇`k · choosing a dual variable pk approximating Elayer = ∇`k k∇`k k2 K−1 T XX ∇`k k∇`k k2 and for n ∈ {0 . . . K − 1}, we rewrite (4.7) as ∇`k · pk − ñt k=0 t=0 1 . (B.1) For the deformation-only model (without a residual layer) the energy minimization problem (4.9) now becomes the min/max problem: argmax argmin Etotal {ft }, {`k }, {pk }, {vtk } , {pk } s.t. kpk k2 ≤1 (B.2) {`k },{vtk } which will be solved with alternating gradient descent steps on each `k and vtk and gradient ascent steps on each pk . After each update of pk , it is projected back onto its constraint kpk k2 ≤ 1. Taking the first variation of Etotal w.r.t. each of these variables gives ∇`k Etotal = − T X t=0 k Tφ†k (f̂t t,0 k − ft ) + λ` ∇ · p − ñt , ∇pk Etotal = λ` ∇` , ∇vk Etotal t (B.4) T X = K Tφ†k s=t+1 (B.3) s,t+1 (f̂s − fs ) Tφk ∇`kt s,t + λv vtk . (B.5) where K = L−1 , Tφ represents composition via bilinear interpolation, i.e., Tφ `k = `k ◦ φ using bilinear interpolation for nongridpoint lookup into `k , and Tφ† is the numerical adjoint of Tφ . See Appendix B.2 for details on the calculation of ∇vk Etotal . Application of the K operator is done t efficiently in frequency space, see [132] for details. 91 If we include the residual layer, Equations (B.3)-(B.5) stay the same except for the inclusion of P k bs in f̂s , f̂s = K−1 k=0 `s − bs . In addition, we now require an update to bt . As in Elayer , we will introduce a dual variable qt representing ∇bt k∇bt k2 for t ∈ {0 . . . T} ∇qt Etotal = λTV ∇bt , (B.6) ∇bt Etotal = −(f̂t − ft ) − λTV ∇ · qt + λL1 sgn(bt ). (B.7) We will update bt based on a shrinkage-based L1 update bt = shrink bt + δb (f̂t − ft ) + λTV ∇ · qt , δb λL1 , (B.8) where δb is a step size for the update of bt , and the pointwise shrink operation is shrink(b(x), λ) = max(b(x) − λ, 0) + min(b(x) + λ, 0). B.2 Calculation of ∇vtk Etotal In order to take the variation of Etotal w.r.t. vtk we must write φks,0 explicitly in terms of vtk . 
B.2 Calculation of ∇_{vᵏt} E_total

In order to take the variation of E_total w.r.t. vᵏt we must write φᵏs,0 explicitly in terms of vᵏt. We do this in terms of the velocity field discretization, small-deformation velocity field inversion, and Euler integration described in Section 4.1.5. This gives us

$$\phi^k_{s,0} = \big(x - v^k_0\big) \circ \big(x - v^k_1\big) \circ \cdots \circ \big(x - v^k_{s-1}\big).$$

Note that φᵏs,0 depends on vᵏt only if s > t. If this holds, we can write

$$\phi^k_{s,0} = \phi^k_{t,0} \circ \big(x - v^k_t\big) \circ \phi^k_{s,t+1}. \tag{B.9}$$

The discretized version of the deformation smoothness penalty (Equation (4.8)) is $\sum_{k=0}^{K-1}\sum_{t=0}^{T} \|v^k_t\|^2_L$, where we have let normalization constants be absorbed by λv. We let vᵏt lag in the E_layer term, computing only the dependence of E_data and E_def on vᵏt. Expanding E_data based on (B.9) and taking the first variation w.r.t. vᵏt gives

$$\nabla_{v^k_t} E_{total} = \sum_{s=t+1}^{T} T^\dagger_{\phi^k_{s,t+1}}\Big[ \big(\hat f_s - f_s\big)\, T_{\phi^k_{s,t}}\, \nabla\ell^k_t \Big] + \lambda_v L v^k_t. \tag{B.10}$$

We now transform this gradient into the space defined by the norm ‖·‖_L by application of the K operator, yielding more stable updates (see the discussion in [7]):

$$\nabla_{v^k_t} E_{total} = \sum_{s=t+1}^{T} K\, T^\dagger_{\phi^k_{s,t+1}}\Big[ \big(\hat f_s - f_s\big)\, T_{\phi^k_{s,t}}\, \nabla\ell^k_t \Big] + \lambda_v v^k_t. \tag{B.11}$$

REFERENCES

[1] B. K. Horn and B. G. Schunck, "Determining optical flow," Artificial Intelligence, vol. 17, no. 1-3, pp. 185–203, 1981.

[2] D. Rueckert, L. I. Sonoda, C. Hayes, D. L. Hill, M. O. Leach, and D. J. Hawkes, "Nonrigid registration using free-form deformations: application to breast MR images," IEEE Transactions on Medical Imaging, vol. 18, no. 8, pp. 712–721, 1999.

[3] R. Bajcsy, R. Lieberson, and M. Reivich, "A computerized system for the elastic matching of deformed radiographic images to idealized atlas images," Journal of Computer Assisted Tomography, vol. 7, no. 4, pp. 618–625, 1983.

[4] R. Bajcsy and S. Kovačič, "Multiresolution elastic matching," Computer Vision, Graphics, and Image Processing, vol. 46, no. 1, pp. 1–21, 1989.

[5] G. E. Christensen, R. D. Rabbitt, and M. I. Miller, "3D brain mapping using a deformable neuroanatomy," Physics in Medicine and Biology, vol. 39, no. 3, p. 609, 1994.

[6] G. E. Christensen, R. D. Rabbitt, and M. I. Miller, "Deformable templates using large deformation kinematics," IEEE Transactions on Image Processing, vol. 5, pp. 1435–1447, 1996.

[7] M. F. Beg, M. I. Miller, A. Trouvé, and L. Younes, "Computing large deformation metric mappings via geodesic flows of diffeomorphisms," IJCV, vol. 61, no. 2, pp. 139–157, 2005.

[8] M. I. Miller, A. Trouvé, and L. Younes, "On the metrics and Euler-Lagrange equations of computational anatomy," Annual Review of Biomedical Engineering, vol. 4, no. 1, pp. 375–405, 2002.

[9] J.-P. Thirion, "Image matching as a diffusion process: an analogy with Maxwell's demons," Medical Image Analysis, vol. 2, no. 3, pp. 243–260, 1998.

[10] T. Vercauteren, X. Pennec, A. Perchant, and N. Ayache, "Diffeomorphic demons: Efficient non-parametric image registration," NeuroImage, vol. 45, no. 1, pp. S61–S72, 2009.

[11] D. Rueckert, P. Aljabar, R. A. Heckemann, J. V. Hajnal, and A. Hammers, "Diffeomorphic registration using B-splines," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2006, pp. 702–709.

[12] S. Y. Chun and J. A. Fessler, "A simple regularizer for B-spline nonrigid image registration that encourages local invertibility," IEEE Journal of Selected Topics in Signal Processing, vol. 3, no. 1, pp. 159–169, 2009.
[13] M. Pollefeys, D. Nistér, J.-M. Frahm, A. Akbarzadeh, P. Mordohai, B. Clipp, C. Engels, D. Gallup, S.-J. Kim, P. Merrell et al., "Detailed real-time urban 3D reconstruction from video," International Journal of Computer Vision, vol. 78, no. 2-3, pp. 143–167, 2008.

[14] R. A. Newcombe and A. J. Davison, "Live dense reconstruction with a single moving camera," in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010, pp. 1498–1505.

[15] J. Wang and E. Adelson, "Representing moving images with layers," Transactions on Image Processing, vol. 3, no. 5, pp. 625–638, 1994.

[16] J. Wills, S. Agarwal, and S. Belongie, "What went where [motion segmentation]," in Computer Vision and Pattern Recognition, vol. 1, 2003, pp. I-37–I-44.

[17] J. Xiao and M. Shah, "Motion layer extraction in the presence of occlusion using graph cuts," Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp. 1644–1659, 2005.

[18] B. B. Avants, C. L. Epstein, M. Grossman, and J. C. Gee, "Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain," Medical Image Analysis, vol. 12, no. 1, pp. 26–41, 2008.

[19] J. M. Lötjönen, R. Wolz, J. R. Koikkalainen, L. Thurfjell, G. Waldemar, H. Soininen, D. Rueckert, A. D. N. Initiative et al., "Fast and robust multi-atlas segmentation of brain magnetic resonance images," NeuroImage, vol. 49, no. 3, pp. 2352–2365, 2010.

[20] N. Singh, P. T. Fletcher, J. S. Preston, L. Ha, R. King, J. S. Marron, M. Wiener, and S. Joshi, "Multivariate statistical analysis of deformation momenta relating anatomical shape to neuropsychological measures," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2010, pp. 529–537.

[21] C. Blondel, R. Vaillant, G. Malandain, and N. Ayache, "3D tomographic reconstruction of coronary arteries using a precomputed 4D motion field," Physics in Medicine and Biology, vol. 49, no. 11, p. 2197, 2004.

[22] J. Hinkle, P. T. Fletcher, B. Wang, B. Salter, and S. Joshi, "4D MAP image reconstruction incorporating organ motion," in Information Processing in Medical Imaging. Springer, 2009, pp. 676–687.

[23] K. K. Brock, L. A. Dawson, M. B. Sharpe, D. J. Moseley, and D. A. Jaffray, "Feasibility of a novel deformable image registration technique to facilitate classification, targeting, and monitoring of tumor and normal tissue," International Journal of Radiation Oncology*Biology*Physics, vol. 64, no. 4, pp. 1245–1254, 2006.

[24] K. Murphy, B. Van Ginneken, J. M. Reinhardt, S. Kabus, K. Ding, X. Deng, K. Cao, K. Du, G. E. Christensen, V. Garcia et al., "Evaluation of registration methods on thoracic CT: The EMPIRE10 challenge," Medical Imaging, IEEE Transactions on, vol. 30, no. 11, pp. 1901–1920, 2011.

[25] E. Ron, "Ionizing radiation and cancer risk: evidence from epidemiology," Radiation Research, vol. 150, no. 5s, pp. S30–S41, 1998.

[26] S. Balter, J. W. Hopewell, D. L. Miller, L. K. Wagner, and M. J. Zelefsky, "Fluoroscopically guided interventional procedures: A review of radiation effects on patients' skin and hair," Radiology, vol. 254, no. 2, pp. 326–341, 2010.

[27] J. Jasper, "Role of digital subtraction fluoroscopic imaging in detecting intravascular injections," Pain Physician, vol. 6, no. 3, pp. 369–372, 2003.

[28] P. Suetens, Fundamentals of Medical Imaging. Cambridge University Press, 2009.

[29] C. Rottman, L. McBride, A. Cheryauka, R. Whitaker, and S. Joshi, "Mobile C-arm 3D reconstruction in the presence of uncertain geometry," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 692–699.

[30] R. Rangayyan, A. P. Dhawan, and R. Gordon, "Algorithms for limited-view computed tomography: an annotated bibliography and a challenge," Applied Optics, vol. 24, no. 23, pp. 4000–4012, 1985.
[31] P. Markelj, D. Tomaževič, B. Likar, and F. Pernuš, "A review of 3D/2D registration methods for image-guided interventions," Medical Image Analysis, vol. 16, no. 3, pp. 642–661, 2012.

[32] S. Ayer and H. Sawhney, "Layered representation of motion video using robust maximum-likelihood estimation of mixture models and MDL encoding," in Computer Vision, 1995. Proceedings., Fifth International Conference on, 1995, pp. 777–784.

[33] Q. Ke and T. Kanade, "A robust subspace approach to layer extraction," in Motion and Video Computing, 2002. Proceedings. Workshop on, 2002, pp. 37–43.

[34] P. Bhat, K. Zheng, N. Snavely, A. Agarwala, M. Agrawala, M. Cohen, and B. Curless, "Piecewise image registration in the presence of multiple large motions," in Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, vol. 2, 2006, pp. 2491–2497.

[35] T. Brox, C. Bregler, and J. Malik, "Large displacement optical flow," in Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 2009, pp. 41–48.

[36] R. Szeliski, S. Avidan, and P. Anandan, "Layer extraction from multiple images containing reflections and transparency," in Computer Vision and Pattern Recognition, 2000. Proceedings. IEEE Conference on, vol. 1. IEEE, 2000, pp. 246–253.

[37] P. Milanfar, "Two-dimensional matched filtering for motion estimation," Image Processing, IEEE Transactions on, vol. 8, no. 3, pp. 438–444, 1999.

[38] T. Darrell and E. Simoncelli, "Nulling filters and the separation of transparent motions," in Computer Vision and Pattern Recognition. IEEE, 1993, pp. 738–739.

[39] M. Black and P. Anandan, "The robust estimation of multiple motions: Parametric and piecewise-smooth flow fields," Computer Vision and Image Understanding, vol. 63, no. 1, pp. 75–104, 1996.

[40] J. Toro, F. Owens, and R. Medina, "Using known motion fields for image separation in transparency," Pattern Recognition Letters, vol. 24, no. 1, pp. 597–605, 2003.

[41] B. Sarel and M. Irani, "Separating transparent layers through layer information exchange," Computer Vision–ECCV, pp. 328–341, 2004.

[42] ——, "Separating transparent layers of repetitive dynamic behaviors," in International Conference on Computer Vision, vol. 1. IEEE, 2005, pp. 26–32.

[43] M. Shizawa and K. Mase, "Simultaneous multiple optical flow estimation," in International Conference on Pattern Recognition, vol. 1. IEEE, 1990, pp. 274–278.

[44] ——, "Principle of superposition: A common computational framework for analysis of multiple motion," in IEEE Workshop on Visual Motion. IEEE, 1991, pp. 164–172.

[45] C. Mota, I. Stuke, and E. Barth, "Analytic solutions for multiple motions," in International Conference on Image Processing, vol. 2. IEEE, 2001, pp. 917–920.

[46] W. Yu, G. Sommer, and K. Daniilidis, "Multiple motion analysis: in spatial or in spectral domain?" Computer Vision and Image Understanding, vol. 90, no. 2, pp. 129–152, 2003.

[47] M. Pingault et al., "A robust multiscale B-spline function decomposition for estimating motion transparency," Transactions on Image Processing, vol. 12, no. 11, pp. 1416–1426, 2003.

[48] J. Bergen, P. Burt, R. Hingorani, S. Peleg et al., "A three-frame algorithm for estimating two-component image motion," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 9, pp. 886–896, 1992.
[49] I. Stuke, T. Aach, E. Barth, C. Mota et al., "Estimation of multiple motions by block matching," in Proceedings of the ACIS Fourth International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD03), 2003, pp. 358–362.

[50] I. Stuke et al., "Estimation of multiple motions using block matching and Markov random fields," in Proceedings of SPIE, vol. 5308, 2004, pp. 486–496.

[51] A. Ramírez-Manzanares, M. Rivera, P. Kornprobst, and F. Lauze, "Variational multi-valued velocity field estimation for transparent sequences," Journal of Mathematical Imaging and Vision, vol. 40, no. 3, pp. 285–304, 2011.

[52] Y. Weiss, "Deriving intrinsic images from image sequences," in Computer Vision, 2001. ICCV 2001. Proceedings. Eighth IEEE International Conference on, vol. 2. IEEE, 2001, pp. 68–75.

[53] D. Vernon, "Decoupling Fourier components of dynamic image sequences: A theory of signal separation, image segmentation, and optical flow estimation," ECCV, pp. 69–85, 1998.

[54] M. Pingault and D. Pellerin, "Motion estimation of transparent objects in the frequency domain," Signal Processing, vol. 84, no. 4, pp. 709–719, 2004.

[55] E. Be'Ery and A. Yeredor, "Blind separation of superimposed shifted images using parameterized joint diagonalization," Image Processing, IEEE Transactions on, vol. 17, no. 3, pp. 340–353, 2008.

[56] K. Gai, Z. Shi, and C. Zhang, "Blindly separating mixtures of multiple layers with spatial shifts," in Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on. IEEE, 2008, pp. 1–8.

[57] ——, "Blind separation of superimposed images with unknown motions," in Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 2009, pp. 1881–1888.

[58] ——, "Blind separation of superimposed moving images using image statistics," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 34, no. 1, pp. 19–32, 2012.

[59] X. Guo, X. Cao, and Y. Ma, "Robust separation of reflection from multiple images," in Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on. IEEE, 2014, pp. 2195–2202.

[60] T. Xue, M. Rubinstein, C. Liu, and W. T. Freeman, "A computational approach for obstruction-free photography," ACM Transactions on Graphics (TOG), vol. 34, no. 4, p. 79, 2015.

[61] R. Close et al., "Accuracy assessment of layer decomposition using simulated angiographic image sequences," Transactions on Medical Imaging, vol. 20, no. 10, pp. 990–998, 2001.

[62] W. Zhang, H. Ling, S. Prummer, K. S. Zhou, M. Ostermeier, and D. Comaniciu, "Coronary tree extraction using motion layer separation," in Medical Image Computing and Computer-Assisted Intervention–MICCAI 2009. Springer, 2009, pp. 116–123.

[63] Y. Chen, T.-C. Chang, C. Zhou, and T. Fang, "Gradient domain layer separation under independent motion," in Computer Vision, 2009 IEEE 12th International Conference on. IEEE, 2009, pp. 694–701.

[64] V. Auvray, P. Bouthemy, and J. Lienard, "Motion estimation in x-ray image sequences with bi-distributed transparency," in Image Processing, 2006 IEEE International Conference on, 2006, pp. 1057–1060.

[65] V. Auvray, P. Bouthemy, and J. Liénard, "Joint motion estimation and layer segmentation in transparent image sequences: application to noise reduction in x-ray image sequences," EURASIP Journal on Advances in Signal Processing, vol. 2009, p. 19, 2009.
[66] P. Fischer, T. Pohl, A. Maier, and J. Hornegger, "Surrogate-driven estimation of respiratory motion and layers in x-ray fluoroscopy," in Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015. Springer, 2015, pp. 282–289.

[67] ——, "Markov random field-based layer separation for simulated x-ray image sequences," in Bildverarbeitung für die Medizin 2015. Springer, 2015, pp. 329–334.

[68] P. Fischer, T. Pohl, T. Köhler, A. Maier, and J. Hornegger, "A robust probabilistic model for motion layer separation in x-ray fluoroscopy," in Information Processing in Medical Imaging. Springer, 2015, pp. 288–299.

[69] I. Stuke, T. Aach, E. Barth, and C. Mota, "Multiple-motion-estimation by block matching using MRF," International Journal of Computer and Information Science, vol. 5, no. 1, 2004.

[70] A. Chambolle, "An algorithm for total variation minimization and applications," Journal of Mathematical Imaging and Vision, vol. 20, no. 1-2, pp. 89–97, 2004.

[71] M. Zhu and T. Chan, "An efficient primal-dual hybrid gradient algorithm for total variation image restoration," UCLA CAM Report, pp. 08–34, 2008.

[72] J. S. Preston, C. Rottman, A. Cheryauka, L. Anderton, R. T. Whitaker, and S. Joshi, "Multi-layer deformation estimation for fluoroscopic imaging," in Information Processing in Medical Imaging. Springer, 2013, pp. 123–134.

[73] S. Osher et al., "An iterative regularization method for total variation-based image restoration," Multiscale Modeling & Simulation, vol. 4, no. 2, pp. 460–489, 2005.

[74] A. Beck and M. Teboulle, "A fast iterative shrinkage-thresholding algorithm for linear inverse problems," SIAM Journal on Imaging Sciences, vol. 2, no. 1, pp. 183–202, 2009.

[75] I. Arganda-Carreras, C. Sorzano, R. Marabini, J. Carazo, C. Ortiz-de Solorzano, and J. Kybic, "Consistent and elastic registration of histological sections using vector-spline regularization," Computer Vision Approaches to Medical Image Analysis, pp. 85–95, 2006.

[76] A. Cardona, S. Saalfeld, J. Schindelin, I. Arganda-Carreras, S. Preibisch, M. Longair, P. Tomancak, V. Hartenstein, and R. J. Douglas, "TrakEM2 software for neural circuit reconstruction," PLoS ONE, vol. 7, no. 6, p. e38011, 2012.

[77] J. S. Preston, S. Joshi, and R. Whitaker, "Multiscale MRF optimization for robust registration of 2D biological data," in 2015 IEEE 12th International Symposium on Biomedical Imaging (ISBI). IEEE, 2015, pp. 302–305.

[78] Y. Boykov, O. Veksler, and R. Zabih, "Fast approximate energy minimization via graph cuts," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 23, no. 11, pp. 1222–1239, 2001.

[79] P. F. Felzenszwalb and D. P. Huttenlocher, "Efficient belief propagation for early vision," International Journal of Computer Vision, vol. 70, no. 1, pp. 41–54, 2006.

[80] V. Kolmogorov, "Convergent tree-reweighted message passing for energy minimization," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 28, no. 10, pp. 1568–1583, 2006.

[81] N. Komodakis, G. Tziritas, and N. Paragios, "Fast, approximately optimal solutions for single and dynamic MRFs," in Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on. IEEE, 2007, pp. 1–8.

[82] T. W. Tang and A. C. Chung, "Non-rigid image registration using graph-cuts," in Medical Image Computing and Computer-Assisted Intervention–MICCAI 2007. Springer, 2007, pp. 916–924.

[83] R. W. So, T. W. Tang, and A. Chung, "Non-rigid image registration of brain magnetic resonance images using graph-cuts," Pattern Recognition, vol. 44, no. 10, pp. 2450–2467, 2011.
[84] Q. Yang, L. Wang, and N. Ahuja, "A constant-space belief propagation algorithm for stereo matching," in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010, pp. 1458–1465.

[85] A. Roozgard, N. Barzigar, S. Cheng, and P. Verma, "Medical image registration using sparse coding and belief propagation," in Engineering in Medicine and Biology Society (EMBC), 2012 Annual International Conference of the IEEE. IEEE, 2012, pp. 1141–1144.

[86] M. P. Heinrich, M. Jenkinson, M. Brady, and J. A. Schnabel, "Textural mutual information based on cluster trees for multimodal deformable registration," in Biomedical Imaging (ISBI), 2012 9th IEEE International Symposium on. IEEE, 2012, pp. 1471–1474.

[87] A. Shekhovtsov, I. Kovtun, and V. Hlaváč, "Efficient MRF deformation model for non-rigid image matching," Computer Vision and Image Understanding, vol. 112, no. 1, pp. 91–99, 2008.

[88] B. Glocker, N. Komodakis, G. Tziritas, N. Navab, and N. Paragios, "Dense image registration through MRFs and efficient linear programming," Medical Image Analysis, vol. 12, no. 6, pp. 731–741, 2008.

[89] B. Glocker, A. Sotiras, N. Komodakis, and N. Paragios, "Deformable medical image registration: Setting the state of the art with discrete methods," Annual Review of Biomedical Engineering, vol. 13, pp. 219–244, 2011.

[90] R. Szeliski, R. Zabih, D. Scharstein, O. Veksler, V. Kolmogorov, A. Agarwala, M. Tappen, and C. Rother, "A comparative study of energy minimization methods for Markov random fields with smoothness-based priors," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 30, no. 6, pp. 1068–1080, 2008.

[91] J. S. Preston, S. Joshi, and R. Whitaker, "Deformation estimation with automatic sliding boundary computation," in Medical Image Computing and Computer-Assisted Intervention–MICCAI 2016. Springer, 2016.

[92] E. Rietzel and G. T. Chen, "Deformable registration of 4D computed tomography data," Medical Physics, vol. 33, no. 11, pp. 4423–4430, 2006.

[93] Z. Wu, E. Rietzel, V. Boldea, D. Sarrut, and G. C. Sharp, "Evaluation of deformable registration of patient lung 4DCT with subanatomical region segmentations," Medical Physics, vol. 35, no. 2, pp. 775–781, 2008.

[94] R. Werner, J. Ehrhardt, R. Schmidt, and H. Handels, "Patient-specific finite element modeling of respiratory lung motion using 4D CT image data," Medical Physics, vol. 36, p. 1500, 2009.

[95] A. Derksen, S. Heldmann, T. Polzin, and B. Berkels, "Image registration with sliding motion constraints for 4D CT motion correction," in Bildverarbeitung für die Medizin 2015. Springer, 2015, pp. 335–340.

[96] L. Han, D. Hawkes, and D. Barratt, "A hybrid biomechanical model-based image registration method for sliding objects," in SPIE Medical Imaging. International Society for Optics and Photonics, 2014, p. 90340G.

[97] J. Vandemeulebroucke, O. Bernard, S. Rit, J. Kybic, P. Clarysse, and D. Sarrut, "Automated segmentation of a motion mask to preserve sliding motion in deformable registration of thoracic CT," Medical Physics, vol. 39, no. 2, pp. 1006–1015, 2012.

[98] T. Schmah, L. Risser, and F.-X. Vialard, "Left-invariant metrics for diffeomorphic image registration with spatially-varying regularisation," in Medical Image Computing and Computer-Assisted Intervention–MICCAI 2013. Springer, 2013, pp. 203–210.
[99] D. Ruan, S. Esedoglu, and J. A. Fessler, "Discriminative sliding preserving regularization in medical image registration," in Proceedings of the Sixth IEEE International Conference on Symposium on Biomedical Imaging: From Nano to Macro, ser. ISBI'09. Piscataway, NJ, USA: IEEE Press, 2009, pp. 430–433. [Online]. Available: http://dl.acm.org/citation.cfm?id=1699872.1699981

[100] C. Rottman, M. Bauer, K. Modin, and S. Joshi, "Weighted diffeomorphic density matching with applications to thoracic image registration," in Medical Image Computing and Computer-Assisted Intervention–MICCAI 2016.

[101] T. Brox, A. Bruhn, and J. Weickert, "Variational motion segmentation with level sets," in European Conference on Computer Vision. Springer, 2006, pp. 471–483.

[102] D. Sun, E. B. Sudderth, and M. J. Black, "Layered segmentation and optical flow estimation over time," in IEEE CVPR. IEEE, 2012, pp. 1768–1775.

[103] V. Lempitsky, S. Roth, and C. Rother, "FusionFlow: Discrete-continuous optimization for optical flow estimation," in Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on. IEEE, 2008, pp. 1–8.

[104] V. Delmon, S. Rit, R. Pinho, and D. Sarrut, "Direction dependent B-splines decomposition for the registration of sliding objects," in Proceedings of the Fourth International Workshop on Pulmonary Image Analysis, 2011, pp. 45–55.

[105] L. Risser, F.-X. Vialard, H. Y. Baluwala, and J. A. Schnabel, "Piecewise-diffeomorphic image registration: Application to the motion estimation between 3D CT lung images with sliding conditions," Medical Image Analysis, vol. 17, no. 2, pp. 182–193, 2013.

[106] D. F. Pace, S. R. Aylward, and M. Niethammer, "A locally adaptive regularization based on anisotropic diffusion for deformable image registration of sliding organs," Medical Imaging, IEEE Transactions on, vol. 32, no. 11, pp. 2114–2126, 2013.

[107] S. Heldmann, T. Polzin, A. Derksen, and B. Berkels, "An image registration framework for sliding motion with piecewise smooth deformations," in Scale Space and Variational Methods in Computer Vision. Springer, 2015, pp. 335–347.

[108] M. P. Heinrich, M. Jenkinson, M. Brady, and J. Schnabel, "Discontinuity preserving regularisation for variational optical-flow registration using the modified Lp norm," Medical Image Analysis for the Clinic: A Grand Challenge, pp. 185–194, 2010.

[109] V. Vishnevskiy, T. Gass, G. Székely, and O. Goksel, "Total variation regularization of displacements in parametric image registration," in Abdominal Imaging. Computational and Clinical Applications. Springer, 2014, pp. 211–220.

[110] B. W. Papież, M. P. Heinrich, J. Fehrenbach, L. Risser, and J. A. Schnabel, "An implicit sliding-motion preserving regularisation via bilateral filtering for deformable image registration," Medical Image Analysis, vol. 18, no. 8, pp. 1299–1311, 2014.

[111] S. Y. Chun, J. A. Fessler, and M. L. Kessler, "A simple penalty that encourages local invertibility and considers sliding effects for respiratory motion," pp. 72592U–72592U-9, 2009. [Online]. Available: http://dx.doi.org/10.1117/12.811181

[112] M. P. Heinrich, I. J. Simpson, B. W. Papież, M. Brady, and J. A. Schnabel, "Deformable image registration by combining uncertainty estimates from supervoxel belief propagation," Medical Image Analysis, vol. 27, pp. 57–71, 2016.
[113] S. Kiriyanthan, K. Fundana, and P. C. Cattin, "Discontinuity preserving registration of abdominal MR images with apparent sliding organ motion," in Abdominal Imaging. Computational and Clinical Applications. Springer, 2011, pp. 231–239.

[114] A. Schmidt-Richberg, R. Werner, H. Handels, and J. Ehrhardt, "Estimation of slipping organ motion by registration with direction-dependent regularization," Medical Image Analysis, vol. 16, no. 1, pp. 150–159, 2012.

[115] V. Kolmogorov and R. Zabih, "What energy functions can be minimized via graph cuts?" Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 26, no. 2, pp. 147–159, 2004.

[116] P. W. Holland and R. E. Welsch, "Robust regression using iteratively reweighted least-squares," Communications in Statistics-Theory and Methods, vol. 6, no. 9, pp. 813–827, 1977.

[117] I. Daubechies, R. DeVore, M. Fornasier, and C. S. Güntürk, "Iteratively reweighted least squares minimization for sparse recovery," Communications on Pure and Applied Mathematics, vol. 63, no. 1, pp. 1–38, 2010.

[118] M. I. Miller, S. C. Joshi, and G. E. Christensen, "Large deformation fluid diffeomorphisms for landmark and image matching," Brain Warping, pp. 115–131, 1999.

[119] M. Zhang and P. T. Fletcher, "Finite-dimensional Lie algebras for fast diffeomorphic image registration," in International Conference on Information Processing in Medical Imaging. Springer, 2015, pp. 249–260.

[120] R. Castillo, E. Castillo, R. Guerra, V. E. Johnson, T. McPhail, A. K. Garg, and T. Guerrero, "A framework for evaluation of deformable image registration spatial accuracy using large landmark point sets," Physics in Medicine and Biology, vol. 54, no. 7, p. 1849, 2009.

[121] E. Castillo, R. Castillo, J. Martinez, M. Shenoy, and T. Guerrero, "Four-dimensional deformable image registration using trajectory modeling," Physics in Medicine and Biology, vol. 55, no. 1, p. 305, 2009.

[122] E. Haber and J. Modersitzki, "Intensity gradient based registration and fusion of multi-modal images," in MICCAI 2006. Springer, 2006, pp. 726–733.

[123] E. Hodneland, A. Lundervold, J. Rørvik, and A. Z. Munthe-Kaas, "Normalized gradient fields for nonlinear motion correction of DCE-MRI time series," Computerized Medical Imaging and Graphics, vol. 38, no. 3, pp. 202–210, 2014.

[124] L. König and J. Rühaak, "A fast and accurate parallel algorithm for non-linear image registration using normalized gradient fields," in Biomedical Imaging (ISBI), 2014 IEEE 11th International Symposium on. IEEE, 2014, pp. 580–583.

[125] M. Prastawa, E. Bullitt, S. Ho, and G. Gerig, "A brain tumor segmentation framework based on outlier detection," Medical Image Analysis, vol. 8, no. 3, pp. 275–283, 2004.

[126] O. Clatz, H. Delingette, I.-F. Talos, A. J. Golby, R. Kikinis, F. A. Jolesz, N. Ayache, and S. K. Warfield, "Robust nonrigid registration to capture brain shift from intraoperative MRI," IEEE Transactions on Medical Imaging, vol. 24, no. 11, pp. 1417–1427, 2005.

[127] P. Risholm, E. Samset, I.-F. Talos, and W. Wells, "A non-rigid registration framework that accommodates resection and retraction," in Information Processing in Medical Imaging. Springer, 2009, pp. 447–458.

[128] M. Niethammer, G. L. Hart, D. F. Pace, P. M. Vespa, A. Irimia, J. D. Van Horn, and S. R. Aylward, "Geometric metamorphosis," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2011, pp. 639–646.

[129] D. Kwon, M. Niethammer, H. Akbari, M. Bilello, C. Davatzikos, and K. M. Pohl, "PORTR: Pre-operative and post-recurrence brain tumor registration," IEEE Transactions on Medical Imaging, vol. 33, no. 3, pp. 651–667, 2014.
[130] U. Bagci and L. Bai, "Automatic best reference slice selection for smooth volume reconstruction of a mouse brain from histological images," IEEE Transactions on Medical Imaging, vol. 29, no. 9, pp. 1688–1696, 2010.

[131] G. D. Rosen, A. G. Williams, J. A. Capra, M. T. Connolly, B. Cruz, L. Lu, D. C. Airey, K. Kulkarni, and R. W. Williams, "The mouse brain library @ www.mbl.org," in Int Mouse Genome Conference, vol. 14, 2000, p. 166.

[132] B. C. Davis, Medical Image Analysis via Fréchet Means of Diffeomorphisms. ProQuest, 2008.
| Reference URL | https://collections.lib.utah.edu/ark:/87278/s6jravy8 |



