| Title | Estimating generative models for statistical shape analysis |
| Publication Type | dissertation |
| School or College | College of Engineering |
| Department | Computing |
| Author | Agrawal, Praful |
| Date | 2020 |
| Description | Statistical shape analysis plays a crucial role in computer vision and medical imaging applications. Shape-based models help improve the performance of object detection/segmentation tasks and also provide critical insights for medical science research. Over time, researchers have adopted various shape modeling strategies, ranging from primitive geometry-based models to data-driven machine learning approaches. This dissertation focuses on generative modeling techniques for statistical shape analysis. Three chapters (2-4) in this dissertation summarize the four research contributions. Chapter 2 presents a Bayesian approach to generate a probabilistic summary from volumetric shape representations. The proposed approach has shown favorable results for varied medical imaging applications. Chapters 3 and 4 lay out the research contributions involving surface-based shape models. Specifically, the third chapter showcases the technical contributions of surface-based shape models for orthopedic research. The fourth chapter focuses on state-of-the-art deep learning algorithms that help generate better statistical shape models. The machine learning approach has helped alleviate the need for manual effort, often performed by clinicians, in the analysis of medical images. The primary contribution of this dissertation is to improve statistical shape modeling using both classic Bayesian learning and currently favored deep learning methods. |
| Type | Text |
| Publisher | University of Utah |
| Dissertation Name | Doctor of Philosophy |
| Language | eng |
| Rights Management | © Praful Agrawal |
| Format | application/pdf |
| Format Medium | application/pdf |
| ARK | ark:/87278/s6rnw7k0 |
| Setname | ir_etd |
| ID | 2064171 |
| OCR Text | ESTIMATING GENERATIVE MODELS FOR STATISTICAL SHAPE ANALYSIS by Praful Agrawal A dissertation submitted to the faculty of The University of Utah in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computing School of Computing The University of Utah May 2020 Copyright © Praful Agrawal 2020 All Rights Reserved The University of Utah Graduate School STATEMENT OF DISSERTATION APPROVAL The dissertation of Praful Agrawal has been approved by the following supervisory committee members: Ross T. Whitaker, Chair, approved February 11, 2020; Shireen Y. Elhabian, Chair, approved January 03, 2020; Sarang C. Joshi, Member, approved January 03, 2020; Tolga Tasdizen, Member, approved January 03, 2020; Vivek Srikumar, Member, approved January 03, 2020; and by Ross T. Whitaker, Chair/Dean of the Department/College/School of Computing, and by David B. Kieda, Dean of The Graduate School.
ABSTRACT
Statistical shape analysis plays a crucial role in computer vision and medical imaging applications. Shape-based models help improve the performance of object detection/segmentation tasks and also provide critical insights for medical science research. Over time, researchers have adopted various shape modeling strategies, ranging from primitive geometry-based models to data-driven machine learning approaches. This dissertation focuses on generative modeling techniques for statistical shape analysis. Three chapters (2-4) in this dissertation summarize the four research contributions. Chapter 2 presents a Bayesian approach to generate a probabilistic summary from volumetric shape representations. The proposed approach has shown favorable results for varied medical imaging applications. Chapters 3 and 4 lay out the research contributions involving surface-based shape models.
Specifically, the third chapter showcases the technical contributions of surface-based shape models for orthopedic research. The fourth chapter focuses on state-of-the-art deep learning algorithms that help generate better statistical shape models. The machine learning approach has helped alleviate the need for manual effort, often performed by clinicians, in the analysis of medical images. The primary contribution of this dissertation is to improve statistical shape modeling using both classic Bayesian learning and currently favored deep learning methods.
Dedicated to the memory of my grandfather, a role model for hard work, honesty, and integrity.
CONTENTS
ABSTRACT . . . iii
ACKNOWLEDGMENTS . . . vii
CHAPTERS
1. INTRODUCTION . . . 1
1.1 Generative Modeling . . . 2
1.2 Volumetric-Shape Models . . . 3
1.3 Surface-Based Shape Models . . . 3
1.4 Medical Imaging Applications . . . 4
1.5 References . . . 9
2. AN OPTIMAL, GENERATIVE MODEL FOR ESTIMATING MULTILABEL PROBABILISTIC MAPS . . . 18
2.1 Abstract . . . 19
2.2 Introduction . . . 19
2.3 Related Work . . . 21
2.4 Methods . . . 23
2.5 Experimental Setup . . . 29
2.6 Results and Discussion . . . 35
2.7 Conclusion . . . 37
2.8 References . . . 38
3. STATISTICAL SHAPE ANALYSIS OF HIP JOINT AND THREE DIMENSIONAL COVERAGE ESTIMATION IN DYSPLASTIC POPULATION . . . 48
3.1 Abstract . . . 49
3.2 Introduction . . . 49
3.3 Related Work . . . 50
3.4 Methods . . . 50
3.5 Experimental Details . . . 54
3.6 Results and Discussion . . . 54
3.7 Conclusion . . . 55
3.8 References . . . 56
4. DEEP LEARNING FEATURES FOR AUTOMATED PLACEMENT OF CORRESPONDENCE POINTS ON ENSEMBLES OF COMPLEX SHAPES . . . 61
4.1 Abstract . . . 62
4.2 Introduction . . . 62
4.3 Related Work . . . 64
4.4 Methodology . . . 68
4.5 Results and Discussion . . . 73
4.6 Conclusion . . . 77
4.7 References . . . 79
5. CONCLUSION . . . 93
5.1 References . . . 94
ACKNOWLEDGMENTS
On the verge of this milestone, I would like to express my heartfelt gratitude to the people who have inspired, guided, and supported me throughout the journey. Above all, my adviser Ross Whitaker has been a constant source of energy and knowledge, which has helped me sail through the tough moments. I am filled with pride and immense gratitude toward him as I begin to comprehend the learning and accomplishments of the last 5 years. I am thankful to my coadviser Shireen Elhabian, who has been a very resourceful and hard-working companion. Her support and persuasion have helped me meet crucial deadlines.
I am fortunate to have benefited from the knowledge and guidance of my committee members: Sarang Joshi, Tolga Tasdizen, and Vivek Srikumar. Their crucial feedback and interactions helped highlight some of the contributions in my dissertation. I want to thank the SCI founding director Chris Johnson for establishing a warm, cheerful, and welcoming environment for graduate students. His efforts and guidance set a high standard for humility and maturity as one progresses through milestones. The often unsung administrative and support staff at SCI have really helped reduce the stress of graduate life. Among many, I would like to specifically acknowledge Brenda, Ed, Deb, Magali, Awais, Nick, and Ali for accommodating my numerous requests. I would like to thank Christine for providing crucial edits for my writing. Her guidance and feedback have helped me improve my writing skills. I would like to thank my collaborators: Andrew Anderson, Penny Atkins, J. Mozingo, K. Uemura, Jeff Weiss, Steve Maas, Alan Morris, Riddhish Bhalodia, and Cameron Christensen for their support and guidance. I would like to acknowledge the National Institutes of Health (NIH) for funding my dissertation work through research grants R01-HL135568, R01-EB016701, P41-GM103545, U24-EB029011, and R01-AR076120. I am grateful to my parents and lifetime mentors: my grand uncle Raman Agrawal and undergraduate adviser Sanjay Goel for believing in me and encouraging me to pursue higher goals. I would like to thank my family members, Pulkit, Saloni, Megha, and Tushar, for their companionship and unconditional support. I am grateful for the support and encouragement of my dear friends: Ashok, Michael, Meg, Lalitha, Mina, Amey, Mehran, Himanshu, Samarth, Blake, Markus, Gopal, Nisha, Prasanna, Eleanor, John, Jimmy, TAJO, Will, and Kris. Their love and support help me cherish the good and tackle the bad moments in life.
CHAPTER 1
INTRODUCTION
Research in statistical shape modeling is usually based on one of the three types of shape representations: volumes, surfaces, and coordinate transformations. Such statistical shape models help us understand the underlying natural variability in an object population, and identify the object boundary and its uncertainty, which are useful in a wide range of applications. In medical applications, the focus of this dissertation, systematic understanding of shape abnormalities in the bones (e.g., hip, scapula, and skull) and soft tissue structures (e.g., heart, brain cortical structures, and lungs) helps improve clinical practices. Statistical shape modeling provides a systematic and consistent approach to model these abnormalities. The choice of shape modeling method depends on the anatomy and application at hand. In population-based studies, a notion of an average shape and deviations with respect to the average summarize the anatomical shape variations. Transformation- and surface-based shape models are often best suited for this purpose. Surface-based methods aim to establish an anatomical correspondence among the samples in the population; therefore, shapes with consistent anatomies, such as bones, can be modeled effectively in this fashion. On the other hand, soft tissue structures often vary in a nonlinear fashion and may not have completely matching anatomies. Therefore, coordinate transformations provide a better description to understand the average alignments of the tissues across samples in the population. Transformation-based shape models are often estimated directly on intensity images. However, a segmented anatomy is useful for improving performance in cases with challenging intensity variations. Surface-based shape models leverage the segmented shapes in the form of a volumetric label map, which indicates the presence of the shape in a given image domain.
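Volumetric label maps are often converted to implicit representations before modeling. As a minimal sketch (hypothetical toy 2D data, NumPy only; production pipelines use fast distance-transform routines on 3D volumes), a signed distance transform can be computed from a binary label map by brute force:

```python
import numpy as np

# Toy binary label map: 1 inside the object, 0 outside.
label = np.zeros((8, 8))
label[2:6, 2:6] = 1.0

inside = np.argwhere(label == 1)
outside = np.argwhere(label == 0)

def min_dist(points, targets):
    # Distance from each point to its nearest target voxel (brute force).
    d = np.linalg.norm(points[:, None, :] - targets[None, :, :], axis=2)
    return d.min(axis=1)

# Convention: negative inside the object, positive outside; the zero
# level set approximates the object boundary.
sdt = np.zeros_like(label)
sdt[label == 1] = -min_dist(inside, outside)
sdt[label == 0] = min_dist(outside, inside)

print(sdt[0, 0], sdt[3, 3])  # positive far outside, negative inside
```

This discrete voxel-to-voxel convention is only an approximation of the distance to the continuous surface, but it is enough to illustrate the implicit shape representation used by level-set methods.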
Further, many diagnostic and screening tests require the segmentation of target structures for planning and identifying the need for treatment. Human experts perform well on image segmentation tasks; however, the vast number of pixel locations to be labeled limits the throughput. For automated algorithms, the segmentation task becomes complex due to variation in appearance and systematic image acquisition noise in medical images. Volumetric-shape models are designed to model the uncertainty in voxel-wise labeling of shape objects, and hence can help image segmentation algorithms generate accurate segmentations. Among many approaches, generative modeling methods perform well for medical imaging applications. Therefore, in this dissertation, we focus on shape generative models based on surface- and volumetric-shape representations. This section briefly describes the philosophy of generative modeling with a general idea of volumetric- and surface-based shape models. Further, we discuss medical imaging applications that utilize shape modeling techniques and conclude with a snapshot of the contributions in this dissertation.
1.1 Generative Modeling
In generative modeling, the goal is to estimate the probability of the existence of a sample in the underlying space of the given data. A good generative model should be able to produce examples that mimic the properties of given data, while being different from specific instances [1]. To construct a generative model, a probability density function is defined over the independent variables defining the sample space. In some cases, this space comprises a large number of dimensions, whereas the underlying variability spans a significantly lower-dimensional space.
Therefore, the data samples are typically projected onto a low-dimensional subspace, e.g., using principal component analysis (PCA) [2], such that the samples can be reconstructed with minimum error, while the subspace also captures the variations among samples. A probability density function is then defined on this subspace, leading to a generative model. Another approach is to directly learn a low-dimensional subspace with predetermined dimensionality, such that the generative model is optimized in the resulting subspace, as done in probabilistic PCA [2]. With the understanding of how data are generated, these models are being used to better inform the learning of discriminative tasks, such as object classification and image segmentation. An image segmentation algorithm, from thresholding-based techniques to deep learning models, categorizes image voxels based on local intensity variations. Therefore, the segmentation performance often depends on the complexity of the objects' appearance. Recent studies have shown that generative models can guide image segmentation algorithms toward human-like performance [1].
1.2 Volumetric-Shape Models
Volumetric-shape models use voxel-wise shape representations to quantify the uncertainty in the labeling of a spatial domain. The parameters of the model are a field of probability vectors defined over the spatial domain, which indicates the likelihood of the presence of an object at every location. This parametric spatial map is often referred to as a probabilistic label map or a probabilistic segmentation. Typically, the voxel-wise probability density function is modeled over indicator label maps or signed distance transforms. The probabilistic segmentation thus obtained represents a summary of the training data. Therefore, the generative ability of the resulting model depends on the choice of the modeling function as well as the properties of the input shape representation.
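As a toy illustration of such a voxel-wise model (hypothetical synthetic data, NumPy only; this is not the estimator proposed in Chapter 2), averaging indicator label maps yields the maximum-likelihood probabilistic segmentation under an independent Bernoulli model at each voxel:

```python
import numpy as np

rng = np.random.default_rng(0)
grid = np.zeros((8, 8))
grid[2:6, 2:6] = 1.0  # the "true" object on a small 2D grid

# Simulate N noisy segmentations by flipping ~5% of voxels per sample.
N = 20
population = []
for _ in range(N):
    noise = rng.random(grid.shape) < 0.05
    population.append(np.abs(grid - noise))  # XOR flips the noisy voxels
population = np.stack(population)  # shape (N, 8, 8)

# Voxel-wise ML estimate under independent Bernoulli labels:
# the probabilistic label map is simply the per-voxel mean.
prob_map = population.mean(axis=0)

# Interior voxels should be near 1, background near 0.
print(prob_map[3, 3], prob_map[0, 0])
```

Real volumetric models replace this naive per-voxel average with richer modeling functions (and can operate on signed distance transforms instead of indicator maps), but the probabilistic-summary idea is the same.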
Figure 1.1 shows a sample population and the corresponding probabilistic segmentation. In practice, this generative modeling can further be used in an expectation maximization (EM) framework to discover the subgroups in a given shape population [3]. These subgroups can then be used to formulate a more powerful, mixture-based generative model. The unimodal and the mixture-based models can be used in various medical imaging applications, such as image segmentation and label fusion.
1.3 Surface-Based Shape Models
Surface-based shape models capture the underlying surface-to-surface correspondence across samples in a given shape population. The parameters of the model vary based on the modeling assumptions. The end result is a statistical model with the ability to estimate an average representation and quantify the modes of variation. Further, pairwise coordinate transformations can be derived using the surface-to-surface correspondence. The object surface is typically extracted from the volumetric label maps. Explicit surface meshes (contours in 2D) and implicit level sets of a signed distance map are common representations for this purpose. Further, anatomical correspondence is established among the surfaces to generate a shape correspondence model. Figure 1.2 shows a sample shape population and the associated point-cloud-based shape correspondence model. Once a shape correspondence model is established, a probability density function can be imposed over the space of resulting shape representations (e.g., point-clouds in Figure 1.2) to construct a generative model. Often the shape representation is very high dimensional (e.g., number of points × dimensionality, in the case of point-clouds), whereas the underlying variability spans a much lower-dimensional space. Therefore, an intermediate subspace representation may be obtained (e.g., using PCA [2]) before constructing the generative model.
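The PCA construction described above can be sketched on a toy correspondence model (hypothetical 2D point sets with one true mode of variation; real models use 3D surface correspondences):

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 32, 50  # correspondence points per shape, number of shapes
theta = np.linspace(0, 2 * np.pi, M, endpoint=False)
base = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # unit circle

# One latent mode: each sample stretches the circle along x.
shapes = []
for _ in range(N):
    s = 1.0 + 0.2 * rng.standard_normal()
    shapes.append((base * np.array([s, 1.0])).ravel())
X = np.stack(shapes)  # (N, 2M) matrix of flattened point sets

# PCA via thin SVD of the centered data matrix.
mean_shape = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mean_shape, full_matrices=False)
var = S**2 / (N - 1)  # variance captured by each mode

# A Gaussian over the leading-mode coefficient gives a generative
# model: sample a coefficient, reconstruct a plausible new shape.
coeff = rng.standard_normal() * np.sqrt(var[0])
new_shape = (mean_shape + coeff * Vt[0]).reshape(M, 2)

print(var[0] / var.sum())  # the single true mode dominates
```

Sampling coefficients within a few standard deviations along the leading eigenvectors is exactly how such point-distribution models generate new, statistically plausible shape instances.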
In medical imaging, surface-based shape models are specifically useful for population-based statistical studies and quantifying the inherent anatomical variations. They also aid systematic comparison of different subgroups of interest and patient-specific studies.
1.4 Medical Imaging Applications
With the increasing use of different imaging modalities for diagnosis and treatment planning, much emphasis has been placed on developing automated techniques. With such automated tools, clinicians can perform better quantitative assessments, resulting in improved patient outcomes. Further, researchers in medical science study and quantify the behavior and appearance of different aspects of human and animal anatomies. Shape often plays an important role in understanding disease-related symptoms. Therefore, numerous studies aim to quantify the shape properties of different anatomies. For instance, the shape of the hippocampus helps researchers understand the functionality of the limbic system. In orthopedics, shape characteristics of the hip joint directly relate to the presentation of particular diseases. The shape of the heart often influences its underlying mechanical and electrical processes. In this section, we discuss some of the important medical imaging applications related to shape modeling.
1.4.1 Medical Image Segmentation
Identifying and marking the boundary of the object shape is often a key challenge as part of a medical research or clinical pipeline. Due to methodological differences in the acquisition process, each imaging modality presents its own unique challenges in the accurate identification and segmentation of the objects. The presence of speckle noise and inconsistent intensity patterns significantly degrades the performance of intensity-based image segmentation algorithms, which is where prior knowledge of shape may help in delineating object boundaries.
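To make the role of a shape prior concrete, here is a toy sketch (hypothetical object/background setup, NumPy only; real methods such as those cited below are far more sophisticated) of per-voxel MAP labeling, where a probabilistic atlas prior is combined with a Gaussian intensity likelihood:

```python
import numpy as np

rng = np.random.default_rng(2)

# Probabilistic label map acting as the shape prior: object likely
# in the center of a small 2D grid (an idealized atlas for the sketch).
prior = np.full((8, 8), 0.05)
prior[2:6, 2:6] = 0.95

# Noisy intensity image: object ~ N(1, 0.5), background ~ N(0, 0.5).
truth = (prior > 0.5).astype(float)
image = truth + 0.5 * rng.standard_normal(prior.shape)

def gauss(x, mu, sigma=0.5):
    # Unnormalized Gaussian likelihood (normalizer cancels in the ratio).
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

# Per-voxel MAP decision: posterior(object) vs posterior(background).
post_obj = gauss(image, 1.0) * prior
post_bg = gauss(image, 0.0) * (1.0 - prior)
segmentation = (post_obj > post_bg).astype(float)

# Intensity-only thresholding for comparison; the prior pulls noisy
# voxels toward the plausible shape.
intensity_only = (image > 0.5).astype(float)
print((segmentation == truth).mean(), (intensity_only == truth).mean())
```

The prior here is deliberately idealized; in practice the probabilistic map itself is estimated from training segmentations, as in the volumetric-shape models described earlier.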
Both volumetric and surface-based shape models can be used for this purpose, depending on the application at hand. For instance, surface-based shape priors have been used to improve the performance of a Bayesian image segmentation algorithm to identify the left atrium shape in MRI images [4]. Figure 1.3 showcases the results from the study. It is important to highlight the crucial shape information provided by the shape prior, making the segmentation algorithm robust to typical noise in MRI images. In [5], Sultana et al. used a surface-based statistical shape model derived from the cranial nerve endings to identify cranial nerves from MRI images. As shown in Figure 1.4, the intensity profile in the neighborhood of cranial nerves matches with many other associated regions. Crucial prior knowledge in the form of a statistical shape model is able to guide the image segmentation to achieve accurate results. Pohl et al. [6] used a volumetric-shape model to perform image segmentation on MRI images of human brains. Their results showed that using a shape model significantly improved segmentation performance on subcortical structures.
1.4.2 Label Fusion
In medical imaging, an intensity image and corresponding label map are jointly referred to as an atlas. Manual labeling of atlases requires significant effort and results vary based on the domain expertise. Therefore, there is often a need to either build a consensus among label maps created by different experts or estimate a label map for a new image using a few reference atlases. In both scenarios, a probabilistic segmentation is generated using label fusion on the given atlases. The first scenario is referred to as consensus generation [7]. Figure 1.5 shows an example of computing a consensus-based label map from manual segmentations of the epicardium in a heart MRI image [7]. Here, the typical approach is to estimate a probabilistic summary of given label maps, a.k.a.
label fusion, such that possible mistakes within the individual manual labelings can be rectified. In the second scenario, the goal is to generate a label map (or a segmentation) for a new intensity image, using the given atlases as references. Hence, this process is aptly referred to as multiatlas segmentation. Figure 1.6 shows the two-step process followed by most methods, in the context of brain image segmentation [8]. First, a nonrigid transformation is estimated between the reference and the new intensity images, which is then used to warp the corresponding reference label maps to the new image. As a result, we get multiple possible segmentations for the new image, one from each reference atlas. A consensus is built among these segmentations to generate the final result.
1.4.3 Mixture Models
Anatomical shapes vary significantly due to various genetic factors, resulting in subgroups of shape variability. These subgroups may share a similar overall shape with group-specific transformations, and therefore mixture modeling techniques are used to perform medical imaging tasks on such shape populations. Figure 1.7 shows sample results from a shape-based clustering algorithm [7] on a synthetic shape population. Later in this dissertation, we also showcase clustering results on a real anatomical population of left atrium shapes. Veni [9] suggested using mixture models in place of a unimodal shape prior to perform medical image segmentation in a Bayesian framework.
1.4.4 Anatomical Variability
Quantifying anatomical shape variations leads to meaningful population studies in many medical imaging applications. For instance, in orthopedics, hip arthritis is often related to shape abnormalities in the hip bone(s). The assessment of symptoms may vary based on the experience of the clinician. Therefore, a quantitative model of normal hip anatomy could greatly help in better treatment planning.
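The idea of a quantitative model of normal anatomy can be sketched as follows (hypothetical toy correspondence data, NumPy only): estimate a mean and per-point standard deviation from control shapes, then express a patient's deviation in units of control variability to localize abnormality:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy control population: 40 shapes, each with 16 corresponding 2D
# points, jittered around a common template (real studies use 3D).
M = 16
controls = 1.0 + 0.05 * rng.standard_normal((40, M, 2))

# A synthetic "patient": the control average with a localized
# abnormality injected at correspondence point 3.
patient = controls.mean(axis=0).copy()
patient[3] += 0.5

# Normal-anatomy model: per-point mean and standard deviation.
mean_shape = controls.mean(axis=0)
std_shape = controls.std(axis=0)

# Per-point deviation in units of control standard deviations; large
# values localize the abnormality, as in control-vs-patient color maps.
z = np.linalg.norm((patient - mean_shape) / std_shape, axis=1)
print(z.argmax())
```

This per-point z-score is the simplest form of the patient-specific deviation maps discussed below; published studies add rigid alignment, correspondence optimization, and statistical testing on top of it.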
Several studies have been done to understand hip anatomical variations [10]. Atkins et al. [11] looked into quantifying femur shape variations using surface-based statistical shape models. Figure 1.8 shows the first mode of variation in a femur population, as discovered by a surface-based statistical shape model. The variation along the mode encapsulates the elongation and shrinkage of the femoral neck, consistent with the qualitative observations of the experts. Further, patient-specific pathomorphology can be quantified using the statistical mean from a control population. Harris et al. [12] used a statistical model to quantify impingement in cam-type FAI patients. Figure 1.9 shows one such result from the study, where the difference with respect to the control mean highlights the location and extent of the impingement. Bhalodia et al. [13] used a statistical shape model to understand the variations in skull shapes and quantify the severity of metopic craniosynostosis. Such quantitative measures can help clinicians make consistent and robust assessments. Figure 1.10 shows sample results from the study, where a statistical mode of variation highlights the shape abnormalities consistent with practical observations. Figure 1.11 shows the subjects mapped onto the latent severity axis, obtained from the statistical model, establishing a quantitative shape-based metric to assess the extent of severity. Beyond individual shape variations, surface-based statistical shape models are also used to understand the combined variations in closely interacting anatomies [14]. We later showcase the results on variations in hip joint anatomy for a dysplastic population.
1.4.5 Group-Wise Comparisons
Surface-based statistical shape models are further being used to compare the variations between pathological and control groups and to understand the related symptoms. Atkins et al. [15] used the statistical shape model to develop a linear discriminant function.
The goal was to distinguish between patients with cam-type femoroacetabular impingement and control subjects. Figure 1.12 showcases sample results from the study.
1.4.6 Data Augmentation for Learning
With the recent advances in deep learning research, more algorithms are being developed using supervised training of complex network architectures. Due to the large number of parameters and highly complex configurations, these networks require large amounts of training data to produce impactful results. In medical imaging research, adequate training data are often not available to meet deep learning requirements. Therefore, shape-based generative models are being used for data augmentation to enable successful training of deep networks [16]. Both types of shape generative models have been addressed in the literature; however, more work is needed to extend those models to more complex anatomies. In this dissertation, we propose a new generative model to estimate a probabilistic segmentation from a population of label maps (a volumetric shape representation). The model has shown favorable performance in various medical imaging applications and has the ability to handle multiple interacting labels in an unbiased fashion. We use geometric features derived from surface normals and deep learning methods to enable correspondence-based shape models for complex anatomies, specifically the pelvis and scapula. Modeling pelvis shape variability helped in understanding hip joint coverage variability for a dysplastic population. Chapter-wise contributions in this dissertation are summarized below:
• Chapter 2 proposes a novel estimation of multilabel probabilistic segmentations based on generative modeling.
• Chapter 3 describes the shape modeling and 3D coverage variability of the hip joint in a dysplastic population.
• Chapter 4 proposes to train deep learning models to extract geometric features from complex anatomies.
The goal is to use such features to optimize point-based correspondence models.
1.5 References
[1] D. Foster, Generative Deep Learning: Teaching Machines to Paint, Write, Compose and Play. Sebastopol, CA, USA: O'Reilly Media, Inc., 2019.
[2] C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics). New York, NY, USA: Springer Science & Business Media, 2006.
[3] J. Grim, “EM cluster analysis for categorical data,” in The Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR), 2006, pp. 640–648.
[4] G. Veni, S. Y. Elhabian, and R. T. Whitaker, “ShapeCut: Bayesian surface estimation using shape-driven graph cuts,” Med. Image Anal., vol. 40, pp. 11–29, Aug. 2017.
[5] S. Sultana, P. Agrawal, S. Y. Elhabian, R. T. Whitaker, T. Rashid, J. E. Blatt, J. S. Cetas, and M. A. Audette, “Towards a statistical shape-aware deformable contour model for cranial nerve identification,” in The Clinical Image-based Procedures Workshop, 2016, pp. 68–76.
[6] K. Pohl, J. Fisher, R. Kikinis, W. Grimson, and W. Wells, “Shape based segmentation of anatomical structures in magnetic resonance images,” in The International Workshop on Computer Vision for Biomedical Image Applications, 2005, pp. 489–498.
[7] S. Y. Elhabian, P. Agrawal, and R. T. Whitaker, “Optimal parameter map estimation for shape representation: A generative approach,” in The IEEE International Symposium on Biomedical Imaging, 2016, pp. 660–663.
[8] J. M. Lötjönen, R. Wolz, J. R. Koikkalainen, L. Thurfjell, G. Waldemar, H. Soininen, and D. Rueckert, “Fast and robust multi-atlas segmentation of brain magnetic resonance images,” Neuroimage, vol. 49, no. 3, pp. 2352–2365, Feb. 2010.
[9] G. Veni, “Surface-based image segmentation using application-specific priors,” Ph.D. dissertation, University of Utah, Salt Lake City, UT, Dec. 2016.
[10] P. R.
Atkins, “Characterization of cam femoroacetabular impingement using subject-specific biomechanics and population-based morphological measurements,” Ph.D. dissertation, University of Utah, Salt Lake City, UT, May 2018.
[11] P. R. Atkins, Y. Shin, P. Agrawal, S. Y. Elhabian, R. T. Whitaker, J. A. Weiss, S. K. Aoki, C. L. Peters, and A. E. Anderson, “Which two-dimensional radiographic measurements of cam femoroacetabular impingement best describe the three-dimensional shape of the proximal femur?” Clin. Orthop. Relat. Res., vol. 477, no. 1, pp. 242–253, Jan. 2019.
[12] M. D. Harris, M. Datar, R. T. Whitaker, E. R. Jurrus, C. L. Peters, and A. E. Anderson, “Statistical shape modeling of cam femoroacetabular impingement,” J. Orthop. Res., vol. 31, no. 10, pp. 1620–1626, Jul. 2013.
[13] R. Bhalodia, L. A. Dvoracek, A. M. Ayyash, L. Kavan, R. Whitaker, and J. A. Goldstein, “Quantifying the severity of metopic craniosynostosis: A pilot study in application of advanced machine learning in craniofacial surgery,” J. Craniofac. Surg., Jan. 2020.
[14] K. Gorczowski, M. Styner, J.-Y. Jeong, J. Marron, J. Piven, H. C. Hazlett, S. M. Pizer, and G. Gerig, “Statistical shape analysis of multi-object complexes,” in The IEEE Conference on Computer Vision and Pattern Recognition, 2007, pp. 1–8.
[15] P. R. Atkins, S. Y. Elhabian, P. Agrawal, M. D. Harris, R. T. Whitaker, J. A. Weiss, C. L. Peters, and A. E. Anderson, “Quantitative comparison of cortical bone thickness using correspondence-based shape modeling in patients with cam femoroacetabular impingement,” J. Orthop. Res., vol. 35, no. 8, pp. 1743–1753, Aug. 2017.
[16] R. Bhalodia, S. Y. Elhabian, L. Kavan, and R. T. Whitaker, “DeepSSM: A deep learning framework for statistical shape modeling from raw images,” in The Workshop on Shape in Medical Imaging, 2018, pp. 244–257.
Figure 1.1.
A volumetric shape model: (a) sample segmentations from a synthetic shape population and (b) the probabilistic segmentation derived from the shape population.

Figure 1.2. A surface-based shape model: (a) sample segmentations from a synthetic shape population and (b) a point-cloud-based model establishing anatomical correspondence among sample shapes.

Figure 1.3. Segmentation results using [4] (red and blue curves) and the manual labels (green and purple curves): (a) left atrium results and the intensity profile of the MRI slice and (b) results on a simulated image with a similar intensity profile. Figure adapted from [4].

Figure 1.4. Improved results for cranial nerve segmentation using average shape models. Figure adapted from [5].

Figure 1.5. Consensus of the epicardium of the LGE-MRI slice. The first row shows the manual segmentations of three human experts who disagree on the accurate contour of the epicardium. The second row shows the parameter maps estimated by [7], leaving the respective expert segmentation out of the training samples. Figure reprinted with permission [7].

Figure 1.6. Steps of multiatlas segmentation: (a) nonrigid registration used to register all atlases to patient data, (b) classifier fusion using majority voting to produce class labels for all voxels, and (c) postprocessing of the multiatlas segmentation result by various algorithms taking into account the intensity distributions of different structures. Figure reprinted with permission [8].

Figure 1.7. Synthetic data clustering results. Samples for each cluster are presented in decreasing order of log-likelihood (left to right). Figure reprinted with permission [7].

Figure 1.8. First mode of femur shape variation discovered from the statistical shape model. Shapes represent plus (right) and minus (left) two standard deviations from the mean shape (center) along the mode of variation.

Figure 1.9. Color plot of a single cam FAI femur (shown) and the amount it deviates from the average control femur. These plots could be used as a guide for planning surgical debridement to relieve FAI. Figure reprinted with permission [12].

Figure 1.10. Normal skull with a colorimetric overlay generated through the statistical model, indicating the regions of malformation in the metopic CS skull, with color intensity proportional to severity. Figure based on results from [13].

Figure 1.11. Representative top-down view of patients with varying degrees of metopic craniosynostosis severity. Figure adapted from [13].

Figure 1.12. Statistical discriminant function to distinguish between cam and control shapes. For the purpose of illustration, five subject-specific shapes from three cam patients and two controls are shown with their mapping value at the appropriate location. Figure reprinted with permission [15].

CHAPTER 2

AN OPTIMAL, GENERATIVE MODEL FOR ESTIMATING MULTILABEL PROBABILISTIC MAPS

P. Agrawal, R. T. Whitaker, and S. Y. Elhabian, "An Optimal, Generative Model for Estimating Multilabel Probabilistic Maps," IEEE Trans. Med. Imag., 2020, to appear. (Reformatted and reprinted with permission.) © 2020 IEEE

The authors would like to thank the Division of Cardiology (data were collected under the oversight of Nassir Marrouche, MD, and with additional management by Brent Wilson, MD, PhD) at the University of Utah for providing the left atrium MRI scans and their corresponding segmentations.

2.1 Abstract

Multilabel probabilistic maps, a.k.a. probabilistic segmentations, parameterize a population of intimately coexisting anatomical shapes and are useful for various medical imaging applications, such as segmentation, anatomical atlases, shape analysis, and consensus generation.
Existing methods to estimate probabilistic segmentations rely on ad hoc intermediate representations (e.g., averages of Gaussian-smoothed label maps and smoothed signed distance maps) that do not necessarily conform to the underlying generative process. Generative modeling of such maps could help discover subgroups in a given population and aid statistical analysis via clustering and mixture modeling techniques. In this paper, we propose a generative approach to estimating multilabel probabilistic maps and showcase its favorable performance for modeling anatomical shapes such as the left atrium of the human heart and brain structures. The proposed formulation relies on a constrained optimization in the natural parameter space of the exponential family form of categorical distributions. A smoothness prior provides generalizability in the model and improves modeling performance on unseen samples. We demonstrate and compare the effectiveness of the proposed method for Bayesian image segmentation, multiatlas segmentation, and shape-based clustering.

2.2 Introduction

Representing the statistics of anatomical shapes that include multiple, interacting regions is a particularly challenging, yet important, problem in several medical imaging applications. In image segmentation, for instance, precise delineation of object boundaries is usually confounded by misleading image information, including heterogeneous intensities, imaging artifacts, partial volume effects, and weak boundaries. Such imaging challenges, along with the shapes of complex structures, motivate the need for a statistical model that can guide accurate segmentations [1–6]. Likewise, in the context of multiatlas segmentation, a probabilistic summarization of given labeled samples needs to be generated via a fusion process (e.g., [7, 8]).
Probabilistic label maps, also referred to as probabilistic segmentations, quantify the uncertainty in labeling a spatial domain, and therefore they are a useful tool for label fusion in multiatlas segmentation [7, 8], deformable atlas building [4, 9, 10], image segmentation [1, 4–6], shape clustering [11], shape retrieval [11], tractography [12], and longitudinal anatomical studies [2, 3]. In practice, probabilistic maps are often learned from a set of label maps that corresponds to an ensemble of anatomical shapes. Most commonly, probabilistic label maps are computed using heuristic approaches that typically lack a statistical foundation or generative modeling. Bazin and Pham [1] used signed distance maps (SDMs) to compute a statistical atlas as a shape prior for brain image segmentation. Habas et al. [3] and Pohl et al. [4–6] used probabilistic atlases based on SDMs and Gaussian-smoothed label maps for various segmentation problems. Iglesias et al. [7] and Sabuncu et al. [8] used SDM-based probabilistic models for multiatlas segmentation and label fusion applications. Since they do not attempt to learn any generative process, such models do not generalize well to unseen samples, as showcased for binary label data in [13]. In [13], the authors proposed a principled data-driven approach for estimating optimal probabilistic label maps for binary label data (i.e., a single object on background) and demonstrated its superior performance in generalizing to unseen data and in generating an optimal consensus from a set of segmentations. In this regard, a LogOdds transformation [14] was used to perform unconstrained optimization in the natural parameter space of the exponential family form of Bernoulli distributions. However, for more than two labels, a similar approach is likely to introduce undesired bias toward one particular label [15].
Further, it would overestimate the variability captured around foreground-background interactions at the expense of foreground-foreground boundaries. In this paper, the formulation of [13] is extended in a significant way to the multicategory scenario to estimate optimal probabilistic label maps for anatomical ensembles with an arbitrary number of labels. In addition, the smoothness prior and optimization schemes are investigated to ensure optimal results. For the binary label case, the new formulation converges to the one introduced in [13]. We demonstrate the effectiveness of the proposed approach on three medical imaging applications: multilabel image segmentation, shape-based clustering, and multiatlas segmentation. We also showcase a qualitative comparison of the probabilistic maps generated by the proposed and existing methods.

2.3 Related Work

Here, we propose to learn a probabilistic shape representation that captures the underlying generative model of a set of label maps, which is different from learning a statistical shape model (SSM) that encodes the factors of shape variation. Learning such probabilistic label maps aids important medical imaging applications, including label fusion, segmentation, shape-based clustering, and consensus generation. Further, a probabilistic map provides a spatially varying parameterization of the underlying generative model, which is crucial to identify subgroups within a given population, and thereby benefits tasks relying on mixture models. A probabilistic label map is defined over a spatial domain with a vector of probability values associated with each spatial location (e.g., voxel) in the domain. Learning a probability distribution from a given set of label maps amounts to estimating the parameters of an underlying model that can generate similar label maps. The main challenge comes from the absence of a vector space structure due to the non-Gaussian, i.e., categorical, data likelihood.
By construction, the elements of every probability vector are constrained to sum to one; hence, some methods opt to use an unconstrained space defined by a bijective mapping from the original vector space [4, 15]. Log-ratio transformations provide a useful alternative to probabilities, transforming the probability vector maps into a vector space with probabilistic interpretations for addition and scalar multiplication [4]. To capitalize on this convenient bijective transformation, most existing methods adopt ad hoc vector spaces and then use the inverse transformation to generate a probabilistic map. Typically, the vector spaces are defined by first selecting a spatially varying shape representation (e.g., level sets), and each sample representation amounts to a point in that vector space [16–25]. For instance, Pohl et al. [4, 14], Riklin-Raviv et al. [10], and Iglesias et al. [7] used signed distance maps (SDMs) as intermediate shape representations to define the vector space and further used the additive log-ratio (ALR) transformation as the inverse mapping to compute the probabilistic label maps. However, the extension of SDM-based representations to more than two labels is not straightforward and would require further modeling assumptions, as also noted by Pohl et al. [4]. Hence, in their work [4], they chose to use Gaussian-smoothed label maps (Gauss) for the multicategory case. Ashburner and Friston [9] proposed to learn probabilistic shape templates as average warped label maps and performed interpolation based on ALR-transformed probabilistic maps. Habas et al. [3] learned probabilistic labeling from blurred label maps and then used ALR-transformed probabilistic maps to interpolate an atlas over time. Bazin et al. [1] also constructed a probabilistic atlas from smoothed label maps. In [26], Malcolm et al. argued that the background label should be treated the same as the other object labels to establish an unbiased and optimal shape representation.
They used a regular simplex-based shape representation and modeled probabilistic maps as exponential distributions centered at the vertices of the simplex. Moreover, they argued that using SDMs could bias the shape representation toward the background label, and that smoothed label maps could lead to undesirable shape representations. Comparative results indicated that the simplex-based representation could outperform the SDM- and ALR-based representations. Nonetheless, performing principal component analysis (PCA) on this simplex-based representation may produce invalid points along the modes of variation that lie outside the simplex (i.e., negative probabilities or probabilities that do not sum to one). Tsai et al. [27], on the other hand, proposed an irregular simplex-based representation by not considering the background to be the same as the other object labels, leading to a representation that is biased toward the background label. In a similar vein, Neda and Ghassan [28] and Andrews et al. [15] discussed the effectiveness of using the isometric log-ratio (ILR) [29] over the ALR to estimate probabilistic label maps from smoothed binary label maps. The main rationale was that, unlike the ILR, the ALR biases the representation with respect to a particular label. However, the ILR works on a specific arrangement of labels and is not guaranteed to produce consistent results if the labels are permuted. Even though the existing literature does provide insights about estimating a probabilistic label map, clear insights about learning optimal probabilistic labeling that reflects the underlying generative process are still lacking. Another line of research [30–33] focuses on label fusion from multiple label maps to achieve an optimal consensus among different segmentations of the same object obtained from algorithms or human experts. Its main focus is to simultaneously evaluate and correct the given segmentations by building a "ground truth" segmentation.
Such a consensus can be considered an optimized weighted average of label maps. Nonetheless, such an approach has been shown to overfit with limited training data [34]. On the contrary, here, we propose to estimate a globally optimal probabilistic label map that conforms with the underlying generative process. The produced probabilistic segmentation shows better performance in guiding a Bayesian image segmentation and enables us to discover natural subgroups via shape-based clustering.

2.4 Methods

We propose to estimate probabilistic labeling for a set of multilabel maps based on a generative model, constructed over a field of categorical, a.k.a. multinoulli [35], random variables defined over the spatial domain. The optimal probabilistic segmentation is estimated as a maximum-a-posteriori (MAP) estimate of the corresponding natural parameters of the exponential family form of multinoulli distributions. In this section, we formulate the generative model and further explain the optimization process to estimate globally optimal probabilistic label maps.

2.4.1 Generative Model for Label Maps

Given rigidly aligned segmentations, due to inherent variations within anatomical shapes, a simple average does not aptly represent the given population. Therefore, we formulate a data likelihood and further impose a prior to achieve a representative probabilistic map conforming to the generative process. Consider a raster defined over a d-dimensional spatial domain \Omega \subset \mathbb{R}^d containing M pixels and L \in \mathbb{N}^+ objects including background, \omega_l \subset \Omega, l = 1, \cdots, L, where L \geq 2. A label map f defines a labeling function that maps each image voxel x \in \mathbb{R}^d to a 1-of-L binary encoding vector f(x), defined as

f(x) = \left[ f^1(x), \ldots, f^L(x) \right]^T \in \{0, 1\}^L, \quad \text{s.t.} \quad \mathbf{1}^T f(x) = 1,

where f^l(x) = 1 iff x \in \omega_l and \mathbf{1} is a vector of ones in the L-dimensional space.
A label map can be considered a realization of a field of independent multinoulli random variables defined over the spatial domain \Omega with a voxel-wise parameter vector q(x) = \left[ q^1(x), \ldots, q^L(x) \right]^T \in [0, 1]^L such that \mathbf{1}^T q(x) = 1. The parameter q^l(x) quantifies the probability of a voxel x being assigned to the l-th object. Collectively, these parameters define a probabilistic map q^l over the spatial domain \Omega such that q^l : \Omega \to [0, 1]^M. These probabilistic maps then jointly form a multinoulli parameter map, q \in [0, 1]^{L \times M}, which defines a probability distribution over the observed label maps and quantifies the underlying uncertainty about the label assignment. Though effective, the above formulation models every pixel independently, hence undermining the correlation between adjacent pixels belonging to the same object. Spatial regularity priors on the label maps, such as Markov random fields (MRF) [36–39], model local structure interactions between pixels and capture the general property that nearby pixels are more likely (than not) to have the same label, and therefore the voxel-wise independence assumption can be relaxed. The probability of a label map drawn from such a spatially regular distribution can be formulated in terms of a generative model given by (2.1), which includes the voxel-wise multinoulli likelihood and the MRF spatial prior:

p(f | q) = \prod_{x \in \Omega} \prod_{l=1}^{L} \left[ q^l(x) \right]^{f^l(x)} \times \frac{1}{Z} \exp\left( -\frac{E(f(x))}{T} \right), \quad (2.1)

where E(f(x)) are clique potentials that favor spatially coherent label maps, Z is a Gibbs distribution normalization constant, and T is its temperature [40]. The choice of MRF prior on label maps in (2.1) does not affect the learning process of q, and thus it will be omitted hereafter for notational simplicity. Consider a set of label maps \mathcal{F} = \{f_1, f_2, \ldots, f_N\} independently drawn from a population with L labels including the background.
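Setting aside the MRF term (which, as noted above, does not affect the learning of q), the voxel-wise multinoulli model of (2.1) can be sampled directly. The following sketch uses an illustrative toy parameter map, not data from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative toy parameter map: L = 3 labels over M = 4 voxels.
# Column m holds q(x_m); each column sums to one.
q = np.array([[0.7, 0.10, 0.2, 0.50],
              [0.2, 0.80, 0.3, 0.25],
              [0.1, 0.10, 0.5, 0.25]])

def sample_label_map(q, rng):
    """Draw one 1-of-L encoded label map f from the voxel-wise multinoulli field."""
    L, M = q.shape
    f = np.zeros((L, M))
    for m in range(M):
        # Independent categorical draw per voxel, as in the likelihood term of (2.1).
        f[rng.choice(L, p=q[:, m]), m] = 1.0
    return f

f = sample_label_map(q, rng)
```

Repeating such draws and averaging the sampled maps recovers q in the limit, which is the sense in which q summarizes the population.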
We seek to estimate the globally optimal parameter map q^* \in [0, 1]^{L \times M} that corresponds to the generative model in (2.1). Here, we use a MAP estimation of q, where the posterior distribution of q given the training set of label maps \mathcal{F} reads as p(q | \mathcal{F}) \propto p(\mathcal{F} | q) \, p(q). The corresponding MAP estimate is given by

q^* = \arg\max_{q} \; p(q | \mathcal{F}), \quad \text{subject to} \quad q^* \in [0, 1]^{L \times M} \; \text{and} \; \mathbf{1}^T q(x) = 1, \; \forall x \in \Omega.

Solving this constrained optimization problem is highly complex due to the positivity constraints on each component of q^* along with the constraint that the probability values at every voxel sum to one. In this paper, we reformulate this estimation process as a more relaxed constrained optimization by exploiting the natural parameter space of the exponential family distributions. In particular, the multinoulli likelihood has an equivalent form in terms of the exponential family distributions that is parameterized by an unconstrained field of real values \phi \in \mathbb{R}^{L \times M}, known as exponential parameters or, by the more widely used term, natural parameters. The parameter vector q(x) \in [0, 1]^L, corresponding to \phi(x) \in \mathbb{R}^L, is the first moment of this exponential form, and hence it is denoted as a mean or expectation parameter. The merit behind considering such an equivalence is casting any parameter estimation problem as a more relaxed constrained optimization in the natural parameter space. The typically adopted linking function that would relate expectation parameters to the corresponding natural parameters is the LogOdds/logistic function (a.k.a. the additive log-ratio, ALR, transformation [41]). Nonetheless, this transformation is biased toward the background label (or some other arbitrarily designated label). Hence, the ALR overestimates the variability captured around foreground-background interactions at the expense of boundaries between different foreground labels [15], resulting in a suboptimal shape representation.
The ILR transformation [29] also imposes a specific arrangement of labels and hence does not guarantee consistent results if the labels are permuted. We chose to use the centered log-ratio (CLR) transformation, which provides a permutation-invariant log-ratio transformation. Unlike the ILR, the CLR transformation suffers from singularity, and a vector in the CLR-transformed space is constrained to sum to zero. However, this constraint is much easier to manage by a simple substitution, as shown in (2.9). The CLR mapping of a probability vector and its inverse are given by

\phi = \mathrm{clr}(q) = \left[ \ln \frac{q_1}{g(q)}, \cdots, \ln \frac{q_L}{g(q)} \right], \quad g(q) = \sqrt[L]{q_1 q_2 \cdots q_L},

q = \mathrm{clr}^{-1}(\phi) = \mathcal{C}(\exp(\phi)).

The above equations can be written in matrix form as follows:

\phi(x) = B \log q(x) \; \Longleftrightarrow \; q(x) = \mathcal{C}\left( \exp(\phi(x)) \right), \quad (2.2)

where \mathcal{C} is a re-normalization, a.k.a. closure, operator defined as \mathcal{C}(v) = v / \mathbf{1}^T v. The matrix B = [b_1, b_2, \ldots, b_L] is a symmetric matrix of size L \times L such that

b_l = \frac{1}{L} \big[ \underbrace{-1, \ldots, -1}_{l-1 \text{ elements}}, \; L-1, \; \underbrace{-1, \ldots, -1}_{L-l \text{ elements}} \big]^T. \quad (2.3)

Hence, the corresponding generative model in the natural parameter space is written as

\log p(f | \phi) = \sum_{x \in \Omega} f(x)^T \log \frac{\exp(\phi(x))}{\mathbf{1}^T \exp(\phi(x))} \quad (2.4)

= \sum_{x \in \Omega} \left\{ \phi(x)^T f(x) - \log\left[ \mathbf{1}^T \exp(\phi(x)) \right] \right\}. \quad (2.5)

Next, we derive the MAP estimation for the optimal \phi with special emphasis on the model hyperparameters that promote generalizability.

2.4.2 Optimal Natural Parameter Map Estimation

In the natural parameter space, we seek the optimal natural parameter map \phi^* : \Omega \to \mathbb{R}^{L \times M} as the MAP estimate of the generative model in (2.4), i.e.,

\phi^* = \arg\max_{\phi} \; p(\phi | \mathcal{F}). \quad (2.6)

This section further formulates the posterior objective function and derives the iterative gradient-based updates to estimate \phi^*.

2.4.2.1 Objective Function for MAP Estimation

The log-posterior probability to be maximized can be written as

\log p(\phi | \mathcal{F}) \propto \log p(f_1, \ldots, f_N | \phi) + \log p(\phi). \quad (2.7)

For the optimal \phi, we can ignore terms independent of \phi.
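As a concrete sketch of the CLR mapping and its closure inverse in (2.2) — with a small epsilon guard against the singularity at zero probabilities, an implementation detail not specified in the text:

```python
import numpy as np

def clr(q, eps=1e-12):
    """Centered log-ratio: phi_l = ln(q_l / g(q)), with g(q) the geometric mean of q."""
    logq = np.log(q + eps)                          # eps guards the log singularity
    return logq - logq.mean(axis=0, keepdims=True)  # subtracting ln g(q)

def clr_inv(phi):
    """Inverse CLR via the closure operator C(v) = v / (1^T v)."""
    e = np.exp(phi - phi.max(axis=0, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=0, keepdims=True)

# Columns are probability vectors over L = 3 labels at M = 2 voxels (toy values).
q = np.array([[0.5, 0.2],
              [0.3, 0.3],
              [0.2, 0.5]])
phi = clr(q)
```

The subtraction of the column mean of log q is exactly division by the geometric mean inside the logarithm, which is why the CLR treats all labels symmetrically and is invariant to label permutation.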
Using (2.5), the log marginal likelihood reads as

\log p(f_1, \ldots, f_N | \phi) = \sum_{x \in \Omega} \left\{ \sum_{n=1}^{N} \phi(x)^T f_n(x) - N \log\left[ \mathbf{1}^T \exp(\phi(x)) \right] \right\}. \quad (2.8)

To perform any gradient-based optimization on the above marginal likelihood, the constraint \phi^T \mathbf{1} = 0 must be satisfied. Therefore, we rewrite the marginal likelihood in an unconstrained (L-1)-dimensional space by using \phi_L = -\sum_{l=1}^{L-1} \phi_l. The revised log marginal likelihood reads as

\log p(f_1, \ldots, f_N | \phi) = \sum_{x \in \Omega} \Bigg\{ \sum_{n=1}^{N} \left[ \sum_{l=1}^{L-1} \phi_l(x) f_n^l(x) - f_n^L(x) \sum_{l=1}^{L-1} \phi_l(x) \right] - N \log\left[ \sum_{l=1}^{L-1} \exp\left( \phi_l(x) \right) + \exp\left( -\sum_{l=1}^{L-1} \phi_l(x) \right) \right] \Bigg\}. \quad (2.9)

To obtain spatially coherent parameter maps for the labels and to avoid overfitting when N is relatively small, we introduce a Gaussian MRF prior over the individual \phi_l-maps with \phi_l : \Omega \to \mathbb{R}^M, where the prior in (2.7) can be factored as p(\phi) = \prod_{l=1}^{L-1} p(\phi_l). This smoothness prior over a \phi_l-map is written as a Gibbs distribution, p(\phi_l) \propto \exp\{-\lambda U(\phi_l)\}, \lambda > 0. The hyperparameter \lambda controls the generalizability of the resulting shape representation. It favors more smoothness if the training data were drawn from a population of high variability (relative to N). Note that \lambda is positive by construction, as negative values would favor nonsmooth \phi_l-maps. The Gibbs energy U(\phi_l), hence, is chosen to favor smooth parameter maps by penalizing abrupt edges. We use a Dirichlet energy, i.e., U(\phi_l) = \|\nabla \phi_l\|_2^2, to quantify abrupt local changes within the resulting \phi_l-map. The MAP estimation can thus be written as a minimization problem with the corresponding objective function given by

E(\phi | \mathcal{F}, \lambda) = \sum_{x \in \Omega} \Bigg\{ -\sum_{n=1}^{N} \left[ \sum_{l=1}^{L-1} \phi_l(x) f_n^l(x) - f_n^L(x) \sum_{l=1}^{L-1} \phi_l(x) \right] + N \log\left[ \sum_{l=1}^{L-1} \exp\left( \phi_l(x) \right) + \exp\left( -\sum_{l=1}^{L-1} \phi_l(x) \right) \right] \Bigg\} + \frac{\lambda}{2} \sum_{l=1}^{L-1} \|\nabla \phi_l\|_2^2. \quad (2.10)

The objective function E is convex with respect to \phi; a detailed proof of convexity is derived in Section 2.4.3.
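The data term of (2.8) can be evaluated directly in the full L-dimensional parameterization; a minimal sketch (array shapes and the toy label maps are illustrative, not from the paper):

```python
import numpy as np

def neg_log_likelihood(phi, F):
    """Negative multinoulli log-likelihood of label maps F under natural parameters phi.

    phi : (L, M) natural parameter map
    F   : (N, L, M) stack of 1-of-L encoded label maps
    """
    N = F.shape[0]
    # log(1^T exp(phi(x))) per voxel, stabilized with the usual max shift
    m = phi.max(axis=0)
    lse = m + np.log(np.exp(phi - m).sum(axis=0))
    # sum over n and x of phi(x)^T f_n(x)
    data = np.einsum('lm,nlm->', phi, F)
    return N * lse.sum() - data

# Two one-hot label maps over L = 3 labels and M = 4 voxels.
F = np.zeros((2, 3, 4))
F[0, 0, :] = 1.0
F[1, [0, 1, 1, 2], np.arange(4)] = 1.0
```

At phi = 0 (the uniform map) this evaluates to N * M * log L, a convenient sanity check; any phi closer to the log of the empirical frequencies lowers the value.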
2.4.2.2 Gradient Descent-Based Optimization

Due to the convexity of E, the global optimum \phi^* of (2.10) can be obtained using a gradient descent scheme. However, to enable faster convergence with stable large time steps, we use a semi-implicit update scheme with forward time marching. The first variation of (2.10) with respect to \phi_l reads as

\frac{\partial E}{\partial \phi_l} = N \left( q_l - q_L - \bar{f}_l + \bar{f}_L \right) + \lambda \Delta \phi_l,

where \bar{f} = \frac{1}{N} \sum_n f_n is the average label map of \mathcal{F}, and \Delta denotes the Laplacian operator applied to the individual \phi_l-maps. Hence, the iterative update based on a semi-implicit scheme for \phi_l is given by

\phi_l^{(t+1)} = \frac{1}{1 + \lambda \, \delta t \, \Delta} \otimes \left[ \phi_l^{(t)} - \delta t \, N \left( q_l - q_L - \bar{f}_l + \bar{f}_L \right) \right], \quad \forall l \in \{1, \ldots, L-1\}. \quad (2.11)

The final component \phi_L is obtained using the constraint, \phi_L^{(t+1)} = -\sum_{l=1}^{L-1} \phi_l^{(t+1)}, at the end of every iteration. We perform the spatial convolution \otimes in the Fourier domain for efficient computation. The convex objective function further enables us to use any initial \phi. We opt to use a voting-based initialization where \bar{f} is used as the initial q and then transformed to \phi using (2.2). We further restrict q to the (0, 1) interval to prevent overflow of log-ratios at pixels that have unanimous labeling in the training data. Estimation of the optimal \phi-map depends on the choice of the hyperparameter \lambda, which can be viewed as a regularization parameter. It can also be considered indicative of the training process, in particular, the variability of the population at hand along with the sampling process from such a population. Hence, we estimate the optimal hyperparameter \lambda^* using cross-validation over the training data to account for unseen data. The model selection criterion S is expressed in terms of the negative log-likelihood on validation sets (held-out samples) as

S(\lambda, \mathcal{F}, K) = \frac{1}{K} \sum_{k=1}^{K} \frac{1}{|\mathcal{F}_k|} \sum_{f^n \in \mathcal{F}_k} -\log p\left( f^n \,\middle|\, \phi^*(\lambda, \mathcal{F}_{-k}) \right), \quad (2.12)

where \mathcal{F}_k is the held-out set in the k-th fold, and \mathcal{F}_{-k} contains the remaining label maps used for training.
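One iteration of a semi-implicit update in the spirit of (2.11) can be sketched on a 1-D periodic domain, where the implicit smoothing operator is diagonal in the Fourier basis; the Fourier symbol k^2 stands in for the smoothing operator, and the grid size and step sizes below are illustrative assumptions:

```python
import numpy as np

def semi_implicit_step(phi_l, data_grad, lam, dt):
    """Solve (implicit-smoothing operator) phi_l^{t+1} = phi_l^t - dt * data_grad in Fourier space.

    phi_l     : (M,) one natural-parameter map on a periodic 1-D grid
    data_grad : (M,) data term of the gradient, N * (q_l - q_L - fbar_l + fbar_L)
    """
    rhs = phi_l - dt * data_grad
    k = 2.0 * np.pi * np.fft.fftfreq(phi_l.size)
    denom = 1.0 + lam * dt * k**2          # diagonal symbol of the implicit operator
    return np.real(np.fft.ifft(np.fft.fft(rhs) / denom))

# A pure high-frequency mode is damped by the implicit smoothing step.
phi = np.array([0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0])
smoothed = semi_implicit_step(phi, np.zeros_like(phi), lam=5.0, dt=0.5)
```

Because the zero-frequency symbol is 1, the voxel-wise mean of the map is untouched while oscillatory components are damped, which is what makes large, stable time steps possible.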
We use a coarse-to-fine grid search to efficiently find the optimal value \lambda^*, which is then used to estimate \phi^*(\lambda^*, \mathcal{F}).

2.4.3 Proof of Convexity

The objective function in (2.10) consists of two components: likelihood and prior. The prior term is a convex function; therefore, we will focus on proving the convexity of the negative log-likelihood term E_{LL}, which reads as

E_{LL} = \sum_{x \in \Omega} \left\{ N \log\left[ \mathbf{1}^T \exp(\phi(x)) \right] - \sum_{n=1}^{N} \phi(x)^T f^n(x) \right\},

where \exp(\phi(x)) is a vector of size L, the number of labels, with each component i being \exp(\phi(x)_i). The first- and second-order partial derivatives of E_{LL} with respect to the l-th element in the vector \phi(x) read as

\frac{\partial E_{LL}}{\partial \phi(x)_l} = N \left\{ \frac{\exp(\phi(x)_l)}{\mathbf{1}^T \exp(\phi(x))} - \bar{f}(x)_l \right\},

\frac{\partial^2 E_{LL}}{\partial \phi(x)_l \, \partial \phi(x)_k} = -N \frac{\exp(\phi(x)_l) \exp(\phi(x)_k)}{\left[ \mathbf{1}^T \exp(\phi(x)) \right]^2},

\frac{\partial^2 E_{LL}}{\partial \phi(x)_l^2} = N \left\{ \frac{\exp(\phi(x)_l)}{\mathbf{1}^T \exp(\phi(x))} - \frac{\exp(\phi(x)_l)^2}{\left[ \mathbf{1}^T \exp(\phi(x)) \right]^2} \right\}.

The gradient vector and Hessian matrix formed by the above elements can be written as

\frac{\partial E_{LL}}{\partial \phi(x)} = N \left\{ \frac{\exp(\phi(x))}{\mathbf{1}^T \exp(\phi(x))} - \bar{f}(x) \right\},

\frac{\partial^2 E_{LL}}{\partial \phi(x)^2} = N \left\{ \frac{\mathrm{diag}\left[ \exp(\phi(x)) \right]}{\mathbf{1}^T \exp(\phi(x))} - \frac{\exp(\phi(x)) \exp(\phi(x))^T}{\left[ \mathbf{1}^T \exp(\phi(x)) \right]^2} \right\},

where \mathrm{diag}\left[ \exp(\phi(x)) \right] is a diagonal matrix with \mathrm{diag}\left[ \exp(\phi(x)) \right]_{ii} = \exp(\phi(x)_i), i \in \{1, \cdots, L\}. Using the inverse CLR mapping from (2.2),

\frac{\partial^2 E_{LL}}{\partial \phi(x)^2} = N \left[ \mathrm{diag}(q(x)) - q(x) q(x)^T \right]. \quad (2.13)

Let A represent the above Hessian matrix; then

A_{(i,j); i \neq j} = -q(x)_i q(x)_j, \quad A_{(i,i)} = q(x)_i - q(x)_i^2,

\sum_{j=1}^{L} A_{(i,j)} = q(x)_i - q(x)_i \sum_{j=1}^{L} q(x)_j = 0; \quad \sum_{j=1}^{L} q(x)_j = 1.

The above equations show that A is a Hermitian, diagonally dominant matrix with real nonnegative diagonal entries and hence is positive semidefinite.

2.5 Experimental Setup

In [13], we demonstrated the effectiveness of the proposed method for the binary case (i.e., the single object scenario) over existing solutions based on the generalization performance and in the context of medical imaging applications: consensus generation and shape-based clustering.
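The structure of the per-voxel Hessian in (2.13) (rows summing to zero, diagonal dominance, positive semidefiniteness) can be checked numerically; a small sketch with an illustrative probability vector:

```python
import numpy as np

def nll_hessian(q, N=1):
    """Per-voxel Hessian of the negative log-likelihood, N * (diag(q) - q q^T), as in (2.13)."""
    return N * (np.diag(q) - np.outer(q, q))

# Illustrative probability vector over L = 3 labels at one voxel.
q = np.array([0.5, 0.3, 0.2])
A = nll_hessian(q, N=10)
```

The null direction of A is the all-ones vector, which is precisely the direction removed by the zero-sum constraint of the CLR parameterization.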
In this paper, we compare the existing solutions with the proposed model on medical image segmentation and multiatlas segmentation applications. Furthermore, we showcase shape-based clustering in the multilabel scenario. In this section, we briefly explain the experimental protocol, including the existing methods, applications, evaluation metrics, and datasets used for comparison.

2.5.1 Existing Methods

Existing methods include probabilistic label maps obtained from Gaussian-smoothed label maps and smooth signed distance maps (SDMs) [15]. During experimentation, we observed that the performance of Gaussian-smoothed label maps varies significantly based on the numerics involved in avoiding log(0) in the calculation of the multinoulli log-likelihood in (2.8). To avoid this unstable behavior, we compute smooth label maps by applying a logistic function to signed distance maps. The probabilistic label map is estimated as the average of all smooth label maps. A smooth label map h and the shape representation for a set of label maps are computed as

h^l(x) = \frac{1}{1 + \exp\left( -k \left( D^l(x) - c \right) \right)}, \quad \forall x \in \Omega, \; l \in \{1, \cdots, L\}, \quad (2.14)

\phi_{\text{Logistic}} = \frac{1}{N} \sum_{n=1}^{N} h_n, \quad (2.15)

where D^l(x) defines the signed Euclidean distance of a point x \in \mathbb{R}^d to the closest point on \partial\omega_l, the zero level set defining the boundary of object \omega_l, i.e., the object with label l. In our experiments, the optimal values of the k and c parameters are determined using cross-validation and the model selection criterion defined in (2.12), in a similar fashion as the optimal hyperparameter \lambda^* estimation.

2.5.2 Datasets

We considered three publicly available datasets: BrainWeb [42], MouseBrain [43], and LeftAtrium [44]. The MouseBrain dataset is used to showcase segmentation performance for medical image segmentation and multiatlas segmentation applications. The LeftAtrium dataset is used to showcase the shape-based clustering.
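The smooth-label-map baseline of (2.14)-(2.15) reduces to a logistic squashing of signed distances followed by averaging. A sketch, where the sign convention (positive inside the object) and the toy distance values are assumptions for illustration:

```python
import numpy as np

def smooth_label_map(D, k=1.0, c=0.0):
    """h^l(x) = 1 / (1 + exp(-k (D^l(x) - c))), as in (2.14); D holds signed distances."""
    return 1.0 / (1.0 + np.exp(-k * (D - c)))

def logistic_representation(Ds, k=1.0, c=0.0):
    """phi_Logistic: average of the N smooth label maps, as in (2.15)."""
    return np.mean([smooth_label_map(D, k, c) for D in Ds], axis=0)

# Signed distances for one label over five voxels (illustrative values,
# assuming positive distances inside the object).
D = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
h = smooth_label_map(D, k=2.0)
```

The steepness k and offset c play the role of the cross-validated parameters mentioned above: large k approaches a hard binary map, small k yields a heavily blurred one.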
All datasets and their experimental protocols are summarized below:

1) The BrainWeb dataset contains 20 normal brain models. The label maps are linearly registered to the International Consortium for Brain Mapping average brain space [42]. The dataset comprises 11 labels that we merge into six categories: Background (Bg), CSF, Gray Matter (GM), White Matter (WM), Fat/Muscle/Skin (FMS), and Skull/Marrow/Dura (SMD). Figure 2.1(a) shows an example of these shapes. Vessels and connective tissue are labeled according to their nearest neighbors. Models are trained using training datasets of sizes 4, 8, 12, and 16, where five independent experiments were performed in each case. Results from this dataset are used for qualitative comparison of the resulting probabilistic maps.

2) The LeftAtrium dataset [44] contains 72 anonymized late-gadolinium enhancement (LGE) cardiac MRI volumes, obtained retrospectively from patients with atrial fibrillation.¹ The dataset contains corresponding expert-delineated binary segmentations for the epicardium and endocardium; see Figure 2.2 for sample shapes. All the images were rigidly aligned and scaled to one of the subjects from the dataset. The dataset contains a few subgroups based on the size of the atrium and the branching pattern of the pulmonary veins, and we therefore use this dataset to demonstrate shape-based clustering.

3) The MouseBrain parcellation dataset [43] consists of 19 label maps from 9 in vivo and 10 in vitro MR images of normal adult mouse brains, segmented into 20 regions excluding background. All images were rigidly aligned to one of the reference images from the dataset. For computational ease, we combined these 20 regions into five broad categories according to Atlas ver. 1:² Cerebral cortex (Cr), Cerebral nuclei (Cn), Cerebellum (Cb), Inter brain (Ib), and Mid brain/Brain Stem (MbBs); see Figure 2.3 for sample shapes.
Training sizes of 4, 8, 12, and 16 are used for the image segmentation experiments, where five independent experiments are performed in each case.

¹ Scans and segmentations were obtained retrospectively from the AFib database at the University of Utah.
² http://help.brain-map.org/display/mousebrain/Documentation

2.5.3 Application to Image Segmentation With Shape Prior

Let I \in \mathbb{R}^M be the input image to be segmented and f \in [0, 1]^{L \times M} be the unknown labeling function of I. Bayesian image segmentation seeks an optimal labeling f^* of the given image I that maximizes the log-posterior probability p(f | I, \phi), where \phi serves as a parameterization of the probability distribution of a shape class of interest. The intensity/appearance likelihood can be written as

p(I | f) = \prod_{x \in \Omega} \prod_{l=1}^{L} \left[ p\left( I(x) \,\middle|\, x \in \omega_l \right) \right]^{f^l(x)}, \quad (2.16)

where we use a nonparametric density estimate to model object intensities based on a given set of training images. The probability of a labeling function f generated from a distribution parameterized by the corresponding parameter map q is given by (2.1). Hence, the optimal labeling function f^* can be estimated as

f^* = \arg\min_f \; -\log p(f | I, \phi) \propto -\log p(I | f) - \log p(f | \phi)

= \arg\min_f \; \sum_{x \in \Omega} \sum_{l=1}^{L} -f^l(x) \left[ \log p\left( I(x) \,\middle|\, x \in \omega_l \right) + \log q^l(x) \right] - \frac{1}{T} U(f), \quad (2.17)

where we use the standard Potts model for the pairwise potentials U(f), and the temperature constant is set via cross-validation. A graph-cut on Markov random fields (MRFs) under a maximum-a-posteriori estimation framework enables the explicit incorporation of high-level contextual information (appearance- and shape-based) in the energy function while guaranteeing a global optimum. Therefore, the energy in (2.17) can be efficiently minimized via the α-β-swap algorithm [45], which works for the semimetric potentials in (2.17) and produces an optimal low-order polynomial time solution.
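Dropping the pairwise Potts term, the unary part of (2.17) already yields a per-voxel MAP labeling; this simplified sketch (not the graph-cut solver used in the paper, and with illustrative numbers) shows how the shape prior q enters the decision:

```python
import numpy as np

def map_labeling_unary(log_lik, q, eps=1e-12):
    """Per-voxel argmax_l [ log p(I(x)|x in omega_l) + log q^l(x) ]; pairwise term dropped.

    log_lik : (L, M) per-label intensity log-likelihoods
    q       : (L, M) probabilistic shape prior
    """
    return np.argmax(log_lik + np.log(q + eps), axis=0)

# Two voxels, two labels: the intensity term alone mislabels voxel 1,
# but the shape prior pulls it back to label 0. (Numbers are illustrative.)
log_lik = np.array([[0.0, -1.0],
                    [-1.0, 0.0]])
q = np.array([[0.9, 0.9],
              [0.1, 0.1]])
labels = map_labeling_unary(log_lik, q)
```

With the pairwise term restored, the same unary scores become the data costs handed to the α-β-swap graph-cut solver.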
To evaluate segmentation performance, we used the Dice coefficient [46], which quantifies the agreement between the ground-truth labeling and the estimated one.

2.5.4 Application to Multiatlas Segmentation

Given a set of atlases for a particular object class, the task is to segment that object in a new intensity image. In multiatlas segmentation, a registration step first estimates a deformable registration between the given intensity image and each atlas intensity image. These registration parameters are then used to deform the atlas label maps. Subsequently, all the deformed atlases are fused together to generate a segmentation for the given intensity image. The performance of multiatlas segmentation relies heavily on this registration step; therefore, in most cases, background removal techniques (e.g., skull stripping for brain images) are used to preprocess the intensity images and improve registration estimates. Moreover, due to the limited number of atlases in most medical applications, the fusion step must be robust enough to not ignore subject-specific features, which may not be present in all the atlases [47]. The MouseBrain dataset is used to demonstrate the superior performance of the proposed method in label fusion for multiatlas segmentation. A skull stripping step is performed to clear out the other head structures in the intensity images, achieved by masking out the objects of interest using the ground-truth segmentations. A sequence of rigid, affine, and B-spline-based deformable image registrations is used to estimate the transformation parameters between a particular sample and all other samples in the dataset. For a new subject, the four closest samples are chosen based on the Frobenius norm of the deformation matrix of the B-spline-based deformable registration. The corresponding label maps are then deformed using the estimated parameters.
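The atlas-selection step above can be sketched directly: given one deformation (coefficient) matrix per registered atlas, the closest atlases are ranked by the Frobenius norm. The names, and the assumption that a smaller norm means a closer atlas, are illustrative:

```python
import numpy as np

def select_closest_atlases(deformation_matrices, k=4):
    """Rank atlases by the Frobenius norm of their B-spline deformation
    matrices and return the indices of the k closest (smallest norm)."""
    norms = [np.linalg.norm(D, ord='fro') for D in deformation_matrices]
    return np.argsort(norms)[:k].tolist()
```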
The fusion of label maps is achieved by the proposed multinoulli formulation, by estimating an optimal parameter map using the four deformed label maps as training samples. The output is a probabilistic segmentation, which is then converted to a multicategory label map by simply selecting, for each voxel, the label with the highest probability. We compare the performance of label fusion with STAPLE [30] and the commonly used majority voting scheme.

2.5.5 Application to Shape-Based Clustering

Here, we formulate an expectation-maximization (EM)-based clustering algorithm to discover subpopulations within a given set of label maps. The objective function optimized in most EM-based clustering algorithms is the data likelihood of the given samples with respect to the cluster centers. Hence, the cluster-center estimation in the M-step reduces to a simple weighted average of samples [48]. Here, we formulate a posterior-based objective function, which yields the proposed multinoulli representation as a more robust estimate of the cluster centers. Let the given set of label maps F = {f_1, ..., f_N} be divided into K clusters, and let π_k and φ_k be the likelihood probabilities of the clusters and their centers, respectively. The objective function for clustering can thus be written as

L(Θ) = \log p(Θ|F) ≈ \log p(F|Θ) + \log p(Θ),

\log p(F|Θ) = \sum_{n=1}^{N} \log p(f_n|Θ) = \sum_{n=1}^{N} \log \sum_{k=1}^{K} p(f_n, z_n^k = 1 \mid Θ),   (2.18)

Θ = {φ_1, ..., φ_K, π_1, ..., π_K},

where z_n ∈ {0, 1}^K are the latent binary indicator vectors that identify the cluster labeling for each label map. We impose an uninformative prior on the cluster probabilities using the Beta distribution with parameters (1, 1). Further, using the Dirichlet energy from (2.10), we get

\log p(Θ) = \sum_{k=1}^{K} \left[ -\frac{λ_k}{2} \sum_{l=1}^{L-1} ||∇φ_k^l||_2^2 + β(π_k; 1, 1) \right],

where λ_k is the hyperparameter for the k-th cluster.
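The Dirichlet (smoothness) energy entering log p(Θ) can be approximated on a discrete parameter map with finite differences. A minimal sketch with illustrative names, for a 2D map of one label's parameters; the returned energy is the term subtracted from the log prior:

```python
import numpy as np

def dirichlet_energy(phi, lam=1.0):
    """Discrete analogue of (lam/2) * sum ||grad phi||^2, using forward
    finite differences on a 2D parameter map."""
    gy = np.diff(phi, axis=0)   # vertical differences
    gx = np.diff(phi, axis=1)   # horizontal differences
    return 0.5 * lam * (np.sum(gy ** 2) + np.sum(gx ** 2))
```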
Using Jensen’s inequality, the objective function becomes

L(Θ) ≥ \sum_{n=1}^{N} \sum_{k=1}^{K} Q(z_n^k = 1) \log \frac{p(f_n, z_n^k = 1 \mid Θ)}{Q(z_n^k = 1)} + \sum_{k=1}^{K} \left[ \log p(φ_k) + \log p(π_k) \right],   (2.19)

where Q(z_n^k = 1) is the variational approximate posterior and \sum_{k=1}^{K} Q(z_n^k = 1) = 1. The expectation and maximization steps of clustering can then be derived as follows:

E-step:

Q(z_n^k = 1) = \frac{π_k \, p(f_n \mid φ_k)}{\sum_{j=1}^{K} π_j \, p(f_n \mid φ_j)}.   (2.20)

M-step:

π_k = \frac{1}{N} \sum_{n=1}^{N} Q(z_n^k = 1).   (2.21)

Using (2.2),

φ_k = \frac{1}{1 + λ_k \, δt \, Δ} ⊗ \left( φ_k - δt \sum_{n=1}^{N} w_n^k \left( q_k - \frac{\sum_{n=1}^{N} w_n^k f_n}{\sum_{n=1}^{N} w_n^k} \right) \right),   (2.22)

where w_n^k = Q(z_n^k = 1). It is important to note that computing p(f_n|φ_k) suffers from underflow, due to the product of a large number of pixel-wise multinoulli likelihoods. Therefore, in all experiments we rely on the log likelihood from (2.5). However, the log likelihood alone does not make the E-step in (2.20) tractable; some numerical care is needed. In particular, we divide the numerator and denominator by max{p(f_n|φ_1), ..., p(f_n|φ_K)}, which does not change the equation. This results in crisp weights Q(z_n^k = 1) with values close to zero or one, effectively making this a max-max optimization, similar to k-means clustering.

2.6 Results and Discussion

Here, we refer to the shape representations from the proposed and existing methods defined in (2.6) and (2.15) as the Multinoulli and Logistic representations, respectively. Figures 2.1–2.3 compare the probabilistic segmentations obtained from the Multinoulli and Logistic representations for the lowest and highest training sizes. Evidently, the Logistic representation is significantly oversmoothed even for the highest number of training samples, resulting in suboptimal generalization performance.
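The max-normalization trick used to stabilize the E-step (2.20) amounts to subtracting the largest per-sample log likelihood before exponentiating, which keeps the responsibilities finite even when the raw likelihoods underflow. A minimal sketch, with illustrative names:

```python
import numpy as np

def e_step_responsibilities(log_lik, log_pi):
    """Stable E-step of (2.20): log_lik is (N, K) with entries
    log p(f_n | phi_k); log_pi is (K,). Dividing numerator and
    denominator by max_k pi_k p(f_n | phi_k) is a subtraction in
    the log domain."""
    log_post = log_lik + log_pi                             # unnormalized log posterior
    log_post = log_post - log_post.max(axis=1, keepdims=True)
    w = np.exp(log_post)                                    # max entry is exactly 1
    return w / w.sum(axis=1, keepdims=True)
```

With pixel-wise log likelihoods on the order of -10^6, exponentiating directly would underflow to zero; after the shift, the dominant cluster has weight exp(0) = 1, and the responsibilities remain well defined (and, as noted above, typically crisp).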
Using WM from the BrainWeb dataset as an example, Figure 2.1(g) highlights that the Logistic representation is noisy near object boundaries, where there is high variability in the training samples, and hence requires more smoothing. Among the four representations in Figure 2.1(g), the Logistic representations are the most smoothed. Comparing the Multinoulli for training sample sizes of 4 versus 16, the representation obtained from four training samples is naturally smoother, but still less smooth than the Logistic representation generated from 16 training samples. Overall, the proposed model generates smooth, meaningful probabilistic label maps that respect the underlying variability; specifically, WM in the BrainWeb dataset best demonstrates this contrast. Also, the Multinoulli representation is more robust in modeling the uncertainty in labeling near object boundaries.

2.6.1 Image Segmentation

We use the estimated probabilistic segmentations of the MouseBrain dataset as shape priors to segment mouse brain images. Figure 2.4 compares the segmentation performance of the Multinoulli and Logistic representations on all labels in the MouseBrain dataset. Each plot compares the Dice coefficient [46] between the two methods, for a particular label, across all experiments, including five independent experiments for each of the four training sizes. For every case, the Multinoulli representation outperforms the Logistic as a shape prior in aiding the segmentation process. Both methods performed well in discriminating background versus foreground; however, the Multinoulli fared slightly better due to superior shape representation near boundaries. It is important to note that, especially for shapes with high variability across samples (Cerebral Nuclei, Interbrain, and Midbrain), the Multinoulli outperformed the Logistic on nearly every sample in every experiment, including training as
well as test sets. For other shapes with more consistently solid structures, such as Cerebral Cortex and Cerebellum, the Logistic performance improved; however, the Multinoulli still performed better on at least 65% of samples for those shapes. Figure 2.4(d) shows a sample case where the Dice coefficient for the Logistic is better, due to the extra thin structures detected by the Multinoulli. Both methods performed consistently well on the segmentation of the background label. Figure 2.5 showcases sample segmentation results, underlining the contrast in performance between the two methods. Figure 2.5(a) illustrates the manual segmentations for the image regions in Figure 2.5(b). The squared regions highlight the complexity in terms of multiple interacting shapes and the thin structures in some shapes. The comparison results in Figures 2.5(c) and 2.5(d) clearly highlight the superiority of the Multinoulli in marking the object boundaries, especially with thin overlapping structures. Using the Logistic representation as a shape prior oversmoothed the thin regions, whereas the Multinoulli representation is able to model such regions consistently using the same training samples. This generalizability of the Multinoulli approach makes it more robust in generating accurate segmentations. Further, it is important to note the difference in performance for the second rectangular region, where the Multinoulli representation produced consistently good segmentations for different training sample sizes. In contrast, even with no thin structures, the Logistic representation led to varied, suboptimal results. The Logistic representation overfits the training samples, hence producing inconsistent results over different training sizes. The Multinoulli representation, on the other hand, generalizes consistently across different training sets.

2.6.2 Multiatlas Segmentation

Figure 2.6 compares the label-wise performance of the three methods for multiatlas segmentation.
The bar plots correspond to the performance over all the samples in the dataset. The proposed method outperforms STAPLE and majority voting for label fusion. The superior performance is mainly attributed to the MAP estimation using the generative model over the label maps, which helps in generalizing over unseen samples.

2.6.3 Shape-Based Clustering

Figure 2.7 presents results for shape-based clustering on the LeftAtrium dataset using the Multinoulli representation. We chose to find three clusters, based on the background information from the dataset. Initial cluster centers are assigned using the dominant modes of the PCA subspace. Results indicate that samples with similar shape characteristics are eventually assigned to the same cluster. The three clusters evolved based on the size of the pulmonary veins as well as the shape of the left atrium. Clustering was performed using the complete multilabel configuration; however, for brevity, only the endocardium shape is displayed. Figure 2.7 presents the endocardium shape representation for the initial and final cluster centers. The top ten members from each cluster are also displayed, arranged in descending order of log likelihood (2.5).

2.7 Conclusion

This paper proposed to estimate an optimal multilabel probabilistic segmentation using a generative model over a given set of training label maps. The experimental results showcase that the proposed method can be used for various medical imaging applications. The representation obtained by the proposed method outperformed the current ad hoc practice of averaging smoothed label maps for medical image segmentation, multiatlas segmentation, and consensus generation (for the two-label case). Specifically, the importance of the proposed method is highlighted by the robust results obtained with a small number of training samples across all three datasets used in the experiments.
Further, effective shape-based clustering can be performed using the representation obtained from the proposed method. The proposed model can be extended from a MAP-based point estimate to estimating the posterior distribution over the shape representation using a fully Bayesian formulation. Estimating this posterior distribution could be useful for generating samples from the underlying generative model and could therefore be used for data augmentation to generate large datasets for training deep networks. Here, our focus was to generate a probabilistic segmentation that respects the underlying generative model and is useful for various medical imaging applications such as image segmentation, shape clustering, and label fusion.

2.8 References

[1] P. Bazin and D. Pham, “Homeomorphic brain image segmentation with topological and statistical atlases,” Med. Image Anal., vol. 12, no. 5, pp. 616–625, Oct. 2008.

[2] E. Dittrich, T. Raviv, G. Kasprian, R. Donner, P. Brugger, D. Prayer, and G. Langs, “A spatio-temporal latent atlas for semi-supervised learning of fetal brain segmentations and morphological age estimation,” Med. Image Anal., vol. 18, no. 1, pp. 9–21, Jan. 2014.

[3] P. Habas, K. Kim, J. Corbett-Detig, F. Rousseau, O. Glenn, A. Barkovich, and C. Studholme, “A spatiotemporal atlas of MR intensity, tissue probability and shape of the fetal brain with application to segmentation,” Neuroimage, vol. 53, no. 2, pp. 460–470, Nov. 2010.

[4] K. Pohl, J. Fisher, S. Bouix, M. Shenton, R. McCarley, W. Grimson, R. Kikinis, and W. Wells, “Using the logarithm of odds to define a vector space on probabilistic atlases,” Med. Image Anal., vol. 11, no. 5, pp. 465–477, Oct. 2007.

[5] K. Pohl, J. Fisher, R. Kikinis, W. Grimson, and W. Wells, “Shape based segmentation of anatomical structures in magnetic resonance images,” in The International Workshop on Computer Vision for Biomedical Image Applications, 2005, pp. 489–498.

[6] K. Pohl, R. Kikinis, and W.
Wells, “Active mean fields: Solving the mean field approximation in the level set framework,” in The International Conference on Information Processing in Medical Imaging, 2007, pp. 26–37.

[7] J. Iglesias, M. Sabuncu, and K. Leemput, “A unified framework for cross-modality multi-atlas segmentation of brain MRI,” Med. Image Anal., vol. 17, no. 8, pp. 1181–1191, Dec. 2013.

[8] M. Sabuncu, B. Yeo, K. Leemput, B. Fischl, and P. Golland, “A generative model for image segmentation based on label fusion,” IEEE Trans. Med. Imag., vol. 29, no. 10, pp. 1714–1729, Oct. 2010.

[9] J. Ashburner and K. Friston, “Computing average shaped tissue probability templates,” Neuroimage, vol. 45, no. 2, pp. 333–341, Apr. 2009.

[10] T. Raviv, K. Leemput, B. Menze, W. Wells, and P. Golland, “Segmentation of image ensembles via latent atlases,” Med. Image Anal., vol. 14, no. 5, pp. 654–665, Oct. 2010.

[11] A. Srivastava, S. H. Joshi, W. Mio, and X. Liu, “Statistical shape analysis: Clustering, learning, and testing,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 4, pp. 590–602, Apr. 2005.

[12] C. Brown, B. Booth, and G. Hamarneh, “K-confidence: Assessing uncertainty in tractography using K optimal paths,” in The IEEE International Symposium on Biomedical Imaging, 2013, pp. 250–253.

[13] S. Y. Elhabian, P. Agrawal, and R. T. Whitaker, “Optimal parameter map estimation for shape representation: A generative approach,” in The IEEE International Symposium on Biomedical Imaging, 2016, pp. 660–663.

[14] K. Pohl, J. Fisher, M. Shenton, R. McCarley, W. Grimson, R. Kikinis, and W. Wells, “Logarithm odds maps for shape representation,” in The International Conference on Medical Image Computing and Computer Assisted Intervention, 2006, pp. 955–963.

[15] S. Andrews, N. Changizi, and G. Hamarneh, “The isometric log-ratio transform for probabilistic multi-label anatomical shape representation,” IEEE Trans. Med. Imag., vol. 33, no. 9, pp. 1890–1899, Sep. 2014.

[16] M. E. Leventon, W. E. L.
Grimson, and O. Faugeras, “Statistical shape influence in geodesic active contours,” in The IEEE Conference on Computer Vision and Pattern Recognition, 2000, pp. 316–323.

[17] A. Tsai, A. Yezzi Jr, W. Wells, C. Tempany, D. Tucker, A. Fan, W. E. Grimson, and A. Willsky, “A shape-based approach to the segmentation of medical imagery using level sets,” IEEE Trans. Med. Imag., vol. 22, no. 2, pp. 137–154, Feb. 2003.

[18] M. Rousson, N. Paragios, and R. Deriche, “Implicit active shape models for 3D segmentation in MR imaging,” in The International Conference on Medical Image Computing and Computer Assisted Intervention, 2004, pp. 209–216.

[19] S. Dambreville, Y. Rathi, and A. Tannenbaum, “Shape-based approach to robust image segmentation using kernel PCA,” in The IEEE Conference on Computer Vision and Pattern Recognition, 2006, pp. 977–984.

[20] S. Dambreville, Y. Rathi, and A. Tannenbaum, “A framework for image segmentation using shape models and kernel space shape priors,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 8, pp. 1385–1399, Aug. 2008.

[21] F. Lecumberry, Á. Pardo, and G. Sapiro, “Simultaneous object classification and segmentation with high-order multiple shape models,” IEEE Trans. Image Process., vol. 19, no. 3, pp. 625–635, Mar. 2010.

[22] D. Cremers, S. J. Osher, and S. Soatto, “Kernel density estimation and intrinsic alignment for shape priors in level set segmentation,” International Journal of Computer Vision, vol. 69, no. 3, pp. 335–351, May 2006.

[23] J. Kim, M. Çetin, and A. S. Willsky, “Nonparametric shape priors for active contour-based image segmentation,” Signal Process., vol. 87, no. 12, pp. 3021–3044, Jun. 2007.

[24] A. Wimmer, G. Soza, and J. Hornegger, “A generic probabilistic active shape model for organ segmentation,” in The International Conference on Medical Image Computing and Computer Assisted Intervention, 2009, pp. 26–33.

[25] D. M. Gavrila, “A Bayesian, exemplar-based approach to hierarchical shape matching,” IEEE Trans.
Pattern Anal. Mach. Intell., vol. 29, no. 8, pp. 1408–1421, Aug. 2007.

[26] J. Malcolm, Y. Rathi, M. Shenton, and A. Tannenbaum, “Label space: A coupled multishape representation,” in The International Conference on Medical Image Computing and Computer Assisted Intervention, 2008, pp. 416–424.

[27] A. Tsai, W. Wells, C. Tempany, E. Grimson, and A. Willsky, “Mutual information in coupled multi-shape model for medical image segmentation,” Med. Image Anal., vol. 8, no. 4, pp. 429–445, Dec. 2004.

[28] N. Changizi and G. Hamarneh, “Probabilistic multi-shape representation using an isometric log-ratio mapping,” in The International Conference on Medical Image Computing and Computer Assisted Intervention, 2010, pp. 563–570.

[29] J. J. Egozcue, V. Pawlowsky-Glahn, G. Mateu-Figueras, and C. Barcelo-Vidal, “Isometric logratio transformations for compositional data analysis,” Math. Geol., vol. 35, no. 3, pp. 279–300, Apr. 2003.

[30] S. Warfield, K. Zou, and W. Wells, “Simultaneous truth and performance level estimation (STAPLE): An algorithm for the validation of image segmentation,” IEEE Trans. Med. Imag., vol. 23, no. 7, pp. 903–921, Jul. 2004.

[31] T. R. Langerak, U. A. van der Heide, A. N. T. J. Kotte, M. A. Viergever, M. van Vulpen, and J. P. W. Pluim, “Label fusion in atlas-based segmentation using a selective and iterative method for performance level estimation (SIMPLE),” IEEE Trans. Med. Imag., vol. 29, no. 12, pp. 2000–2008, Dec. 2010.

[32] A. J. Asman and B. A. Landman, “Robust statistical label fusion through consensus level, labeler accuracy, and truth estimation (COLLATE),” IEEE Trans. Med. Imag., vol. 30, no. 10, pp. 1779–1794, Oct. 2011.

[33] O. Commowick, A. Akhondi-Asl, and S. K. Warfield, “Estimating a reference standard segmentation with spatially varying performance parameters: Local MAP STAPLE,” IEEE Trans. Med. Imag., vol. 31, no. 8, pp. 1593–1606, Aug. 2012.

[34] W. Crum, O. Camara, and D.
Hill, “Generalized overlap measures for evaluation and validation in medical image analysis,” IEEE Trans. Med. Imag., vol. 25, no. 11, pp. 1451–1461, Nov. 2006.

[35] K. P. Murphy, Machine Learning: A Probabilistic Perspective. Cambridge, MA, USA: MIT Press, 2012.

[36] M. Szummer, P. Kohli, and D. Hoiem, “Learning CRFs using graph cuts,” in The European Conference on Computer Vision, 2008, pp. 582–595.

[37] P. Kohli, L. Ladický, and P. H. Torr, “Robust higher order potentials for enforcing label consistency,” International Journal of Computer Vision, vol. 82, no. 3, pp. 302–324, Jan. 2009.

[38] S. Nowozin and C. H. Lampert, “Global connectivity potentials for random field models,” in The IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 818–825.

[39] C. Rother, P. Kohli, W. Feng, and J. Jia, “Minimizing sparse higher order energy functions of discrete variables,” in The IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 1382–1389.

[40] S. Z. Li, Markov Random Field Modeling in Image Analysis. London, UK: Springer Science & Business Media, 2009.

[41] J. Aitchison, The Statistical Analysis of Compositional Data. London, UK: Chapman & Hall, Ltd., 1986.

[42] B. Aubert-Broche, M. Griffin, G. B. Pike, A. C. Evans, and D. L. Collins, “Twenty new digital brain phantoms for creation of validation image data bases,” IEEE Trans. Med. Imag., vol. 25, no. 11, pp. 1410–1416, Nov. 2006.

[43] Y. Ma, P. Hof, S. Grant, S. Blackband, R. Bennett, L. Slatest, M. McGuigan, and H. Benveniste, “A three-dimensional digital atlas database of the adult C57BL/6J mouse brain by magnetic resonance microscopy,” Neuroscience, vol. 135, no. 4, pp. 1203–1215, Sep. 2005.

[44] G. Veni, S. Y. Elhabian, and R. T. Whitaker, “ShapeCut: Bayesian surface estimation using shape-driven graph,” Med. Image Anal., vol. 40, pp. 11–29, Aug. 2017.

[45] Y. Boykov, O. Veksler, and R. Zabih, “Fast approximate energy minimization via graph cuts,” IEEE Trans. Pattern Anal. Mach.
Intell., vol. 23, no. 11, pp. 1222–1239, Nov. 2001.

[46] A. A. Taha and A. Hanbury, “Metrics for evaluating 3D medical image segmentation: Analysis, selection, and tool,” BMC Med. Imaging, vol. 15, no. 1, p. 29, Aug. 2015.

[47] J. E. Iglesias and M. R. Sabuncu, “Multi-atlas segmentation of biomedical images: A survey,” Med. Image Anal., vol. 24, no. 1, pp. 205–219, Aug. 2015.

[48] J. Grim, “EM cluster analysis for categorical data,” in The Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR), 2006, pp. 640–648.

Figure 2.1. BrainWeb dataset: (a) sample of the five foreground labels, (b) Multinoulli representation obtained using four training samples, (c) Logistic representation obtained using four training samples, (d) Multinoulli representation obtained using 16 training samples, (e) Logistic representation obtained using 16 training samples, (f) color map for the shape representations, and (g) comparing the zoomed snapshot from the four representations in (b)–(e) for the WM shape.

Figure 2.2. LeftAtrium dataset: (a) sample of the two foreground labels, (b) Multinoulli representation obtained using 12 training samples, (c) Logistic representation obtained using 12 training samples, (d) Multinoulli representation obtained using 48 training samples, (e) Logistic representation obtained using 48 training samples, and (f) color map for the shape representations.

Figure 2.3. MouseBrain dataset: (a) sample of the five foreground labels, (b) Multinoulli representation obtained using four training samples, (c) Logistic representation obtained using four training samples, (d) Multinoulli representation obtained using 16 training samples, (e) Logistic representation obtained using 16 training samples, and (f) color map for the shape representations.

Figure 2.4.
Comparing the Dice coefficient for segmentation results for different labels in the MouseBrain dataset, obtained using the Multinoulli and Logistic representations. Plots (a)–(f) correspond to the Background, Cerebral Cortex, Cerebral Nuclei, Cerebellum, Interbrain, and Midbrain labels, respectively.

Figure 2.5. MouseBrain dataset: (a) sample slices of manual segmentations and zoomed white rectangular regions and (b) image slices corresponding to the segmentations in (a) and zoomed rectangular regions. Results obtained using (c) the Multinoulli representation and (d) the Logistic representation, with different training sample sizes. All results and zoomed figures correspond to the segmentations and image slices presented in (a) and (b), respectively. The color scheme for labels is consistent with Figure 2.3(a).

Figure 2.6. Comparing the Dice coefficient for multiatlas segmentation results for different labels in the MouseBrain dataset. The three schemes compared for label fusion are the proposed Multinoulli, majority voting, and STAPLE.

Figure 2.7. Clustering results on the LeftAtrium dataset: endocardium shape representation for the initial and final cluster centers along with their top 10 members (sorted based on log likelihood values).

CHAPTER 3

STATISTICAL SHAPE ANALYSIS OF HIP JOINT AND THREE DIMENSIONAL COVERAGE ESTIMATION IN DYSPLASTIC POPULATION

P. Agrawal, J. Mozingo, K. Uemura, S. Y. Elhabian, R. T. Whitaker, A. E. Anderson, “Modeling Three Dimensional Coverage in Dysplastic Population Using Statistical Shape Modeling of Hip Joint,” Under Submission.

3.1 Abstract

This work addresses the problems of statistical shape modeling of pelvis shapes and of modeling the joint shape variability of the femur bone and pelvis. This chapter also describes a method to model joint shape variations with interacting anatomies that addresses the issue of relative alignment.
The pelvis bone presents complex shape modeling challenges, including the high-curvature and thin regions of the iliac wing and acetabulum. With the successful modeling of the pelvis, one of the major challenges for joint modeling of the entire hip anatomy has been resolved, which has further enabled us to understand the coverage of the femoral head with respect to the acetabulum as the hip joint varies across the population. The results are showcased using a population with clinically determined dysplasia.

3.2 Introduction

Understanding the morphology of the hip joint is crucial for successful surgical treatment of various hip abnormalities, including, but not limited to, pincer femoroacetabular impingement (FAI) and hip dysplasia. Specifically, the coverage of the femoral head by the acetabulum is of great interest in surgical planning. Current clinical practice often relies on two-dimensional (2D) radiographic measurements to articulate this understanding [1–5]. Recent studies have shown that three-dimensional (3D) modeling can produce a more reliable measure of coverage [6]. With recent advances in statistical shape modeling, the task of computing coverage on hip models can be automated and may lead to more consistent results than the manual labeling needed in current methods. Further, statistical shape models can be used to correlate the coverage statistics in symptomatic populations, such as dysplasia and FAI, with the directions of shape variation and may lead to predictive models. In this research, we formalize the process of computing coverage used in [6] by using shape correspondence models, and we compute the coverage statistics along the directions of shape variability. The results are computed using presurgery hip joints with clinically determined dysplasia.

3.3 Related Work

Hip joint coverage is often evaluated using 2D radiographic measurements of the lateral center-edge angle (LCEA) [1, 2, 4], the extrusion index [5], and the Tönnis angle [3].
The 3D modeling techniques [7–13] used to compute coverage are also limited in their shape modeling, and the coverage results are affected by the CT position and gait of the participants [6]. Moreover, coverage computation based on principal curvature and manual identification of landmarks to construct the femoral coordinate system [6] limits the throughput of large cohort-based studies and may introduce inconsistencies in the coverage measurement. Therefore, a correspondence-based statistical shape model can automate the process of identifying the required geometry and ensure consistent results. However, thus far, there are no good shape correspondence models for the pelvis anatomy, due to its high shape complexity. Moreover, shape variations that are large relative to the feature size of the shapes pose an additional challenge in establishing correspondence. Recent advances in statistical shape modeling [14] have shown success in modeling such shape complexes with the help of nonlinear geometric features. In this research, we aim to address the challenges in generating a good shape correspondence model for the pelvis. Next, we formalize the alignment of the joint shape space to construct a statistical shape model of the hip joint. Further, we demonstrate a more automated and consistent way of estimating the coverage of the femoral head by the acetabulum and hence compute the coverage along the directions of shape variation as learned from the statistical shape model of the hip joint.

3.4 Methods

Overall, there are three main steps for analyzing coverage in the given hip joint population, starting with generating an anatomical correspondence model. The next step involves alignment of the hip joint for the purpose of statistical analysis, which is then used in the final step to estimate 3D coverage along the directions of shape variation.

3.4.1 Normals-Based PSM

To model the hip joint, the first challenge is to generate a good shape correspondence model for the pelvis anatomy.
We use the particle-based correspondence model, a.k.a. the particle shape model (PSM) [15]. The PSM uses entropy-based optimization of particle locations constrained to the shape surface, where the correspondence between particles on different shapes is established based on the entropy modeled on spatial locations. Due to the high-curvature regions and thin structures, the spatial location of point clouds is not enough to establish a good anatomical correspondence. Figure 3.1 showcases the mean shape from a correspondence model developed based on spatial locations alone. As expected, the two sides of the iliac wing and the lower arc collapsed in the average shape, due to incorrect correspondence estimation. Inspired by the work of Datar et al. [14], we chose to augment the shape matrix in PSM with surface normals. Figure 3.2 shows how normals can help distinguish the two sides of thin structures, like the iliac wing in the pelvis, whose spatial locations closely match. Similar to Datar et al. [14], this method incorporates additional features in the model; however, surface normals do not require any manual annotation of landmarks, as needed for computing geodesics-based features. Consider a shape population S = {z_1, z_2, ..., z_N} with N surfaces, where z_n = [z_n^1, z_n^2, ..., z_n^M] ∈ R^{dM}, with z_n^m ∈ R^d, represents the point cloud with M particles on the n-th surface. The objective function optimized by PSM [15] is

Q = H(Z) − \sum_{n=1}^{N} H(X_n),

where H(Z) is the correspondence entropy modeled using a Gaussian distribution over the shape vectors z_n ∈ R^{dM}, each constructed from the particle positions on the n-th shape. In the case of augmented features, such as geodesics or surface normals, each z_n is augmented with the surface normal at every particle position, resulting in z_n ∈ R^{2dM}. Subsequently, the entropy is modeled in the same fashion, i.e., using a Gaussian distribution on the augmented shape vectors. The optimization steps are followed as per Cates et al.
[15]. However, the gradient update for particle locations requires an additional Jacobian matrix formed by the gradient of the surface normals. Let f_n(x, y, z) be the image volume that represents the n-th implicit shape. The shape surface is then identified by the zero level set of the implicit shape, i.e., f_n(x, y, z) = 0. The surface normal unit vector and its gradient are given by

n̂ = \frac{∇f}{|∇f|}, and ∇n̂ = [I − n̂ n̂ᵀ] \frac{∇²f}{|∇f|},

where ∇²f is the Hessian of f and I is the identity matrix.

3.4.2 Statistical Shape Analysis of Hip Joint

Once a good shape correspondence model is established, the goal is to quantify the shape variability in the hip joint. For the purpose of shape analysis, uniform scale, rotation, and translation, jointly known as a similarity transform, are removed to discover the subtle local variations. Therefore, we use generalized Procrustes analysis (GPA) on the combined correspondence model to remove the undesired similarity transformation. Let S_i represent the combined shape vectors, formed by concatenating the associated femur and pelvis point clouds F_i and P_i, respectively. In GPA, Procrustes alignment [16] is used to estimate the transformation T_i between the sample shape S_i and a template shape S_µ, given by

T_i = \arg\min_{T ∈ Sim(3)} D²(T(S_i), S_µ),   (3.1)

where D² is the squared distance between the template and the Procrustes-aligned shape vectors. Typically, the average shape of the given population is used as the template, in order to eliminate bias toward a particular sample. The mean template is iteratively evolved with the aligning shapes, and the iterative alignment is performed until convergence. The resulting transformation T_i is then applied to the F_i and P_i vectors independently to obtain the aligned hip joints. The resulting mean template S_µ preserves the average relative position of the two components of the hip joint. The hip joint consists of an articulation between the femoral head and the acetabulum of the pelvis.
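A single Procrustes alignment to a fixed template, restricted to rotation and translation, can be sketched with the Kabsch algorithm. The names below are illustrative, and the similarity-transform version used in (3.1) for the combined alignment would additionally estimate a uniform scale:

```python
import numpy as np

def rigid_procrustes(points, template):
    """Align an (M, 3) correspondence point set to a fixed template with
    rotation + translation only (Kabsch algorithm), minimizing the squared
    distance of (3.1) over rigid transforms instead of Sim(3)."""
    mu_p, mu_t = points.mean(axis=0), template.mean(axis=0)
    P, Tm = points - mu_p, template - mu_t       # center both point sets
    U, _, Vt = np.linalg.svd(P.T @ Tm)           # SVD of the cross-covariance
    R = (U @ Vt).T
    if np.linalg.det(R) < 0:                     # guard against reflections
        Vt[-1] *= -1
        R = (U @ Vt).T
    t = mu_t - R @ mu_p
    return R, t                                  # aligned = points @ R.T + t
```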
The two shapes are rigid structures that move independently relative to each other in a controlled fashion. This independent movement may result in a different pose (at the time of CT scan) for two very similar hip anatomies. The alignment of the combined shape vectors cannot remove this relative motion, as illustrated in Figure 3.3. Such movements result in a suboptimal statistical model and may alter the desired subtle shape variations. Therefore, we use a modified GPA scheme to further align the femur and pelvis shapes. Unlike the previous step, this alignment is performed independently on the femur and pelvis shapes. We use the final mean template $S_\mu$ from the combined alignment in order to preserve the integrity of the average hip anatomy, which is why we also modify the GPA to not update the template shape over iterations. $F_\mu$ and $P_\mu$ (obtained from $S_\mu$) are used as fixed template shapes for the femur and pelvis, respectively. Moreover, we consider only rotations and translations in the Procrustes alignment, so as to preserve the relationship between the pelvis and femur sizes in individual hip joints. Once the shapes are aligned, we use principal component analysis (PCA) to construct a statistical shape model. The modes of variation from the mean shape, in the PCA space, are then used to measure the coverage within the hip joint. Next, we describe the steps involved in estimating the coverage.

3.4.3 Computing the Coverage in Hip Joint

The coverage of the hip joint is defined by the "area where the lunate surface of the acetabulum covers the femoral head" [6]. We consider the total as well as the regional coverage, where the femoral head is divided into superior, inferior, anterior, and posterior regions. The coverage is computed using the following steps, as in [6]:

• The lunate surface of the acetabulum is defined using the second principal curvature [17].
• The femoral head is defined using the first principal curvature [17].
• The femur coordinate system is defined as follows:
  – The superior-inferior axis is defined as the best-fit vector to the nodes of the posterior aspect of the greater trochanter.
  – The transverse axis is defined as the long axis of the best-fit cylinder to the femoral neck.
  – The neck axis is defined based on the femoral neck regions, identified using the anterior aspect of the intertrochanteric line.
• The four quadrants of the femoral head are identified using the planes that bisect the neck axis (see Figure 3.4).
• The coverage elements of the femoral head are identified as the elements that intersect with the normal projection of any element on the lunate surface.

Since we built a shape correspondence model, we can leverage the anatomical correspondence and define the required geometry only on the mean shape, which is then propagated to +/- 2 standard deviations of the modes of interest from the joint statistical shape model. We calculate the total area of the femoral head. The coverage percentage is defined as the ratio of the area of the femoral head covered by the lunate surface to the total area. We use the area coverage tool in PostView [18] to calculate the area of the coverage. Similarly, we also compute the total area and coverage percentage for each quadrant.

3.5 Experimental Details

We used a dataset of 84 presurgery femur-pelvis pairs from 48 female subjects with dysplasia. The presence of dysplasia was determined using the clinical radiographic measurement LCEA (lateral center edge angle). The age of the subjects ranged from 16 to 58 years, with a mean and standard deviation of 37.74 and 10.23 years, respectively. All the samples were obtained from Kameda Daiichi Hospital, Japan. The participants were scanned using a CT scanner, in supine position. The limbs were aligned with the table and were rotated such that the patella was directed upwards. The CT images were collected at a voxel resolution of 0.76-0.84 x 0.76-0.84 x 1-1.25 mm3.
The bones were segmented from the CT images using Amira software (Visage Imaging, San Diego, CA, USA). The binary segmentations were then resampled to a consistent resolution of 0.5 x 0.5 x 0.5 mm3, and signed distance transforms were generated for all the shapes, as needed for input to PSM [15]. Shape models with 2048 correspondences on each shape were optimized jointly for the femur and pelvis shapes, using the method discussed in Section 3.4.1. We compared the coverage results with a recent study on subjects without dysplasia [6]. We thank Dr. Tokunaga for data collection and for providing this dataset.

3.6 Results and Discussion

Figure 3.5 presents a fully automated statistical shape model for the pelvis shape. The complex thin and high-curvature regions in the pelvis shape were resolved with the help of surface normals. Figure 3.5 shows sample point-clouds (in orange) and the mean pelvis shape (in gray) from the correspondence model. With the successful pelvis modeling, we then generated a correspondence model for the complete hip joint by optimizing the shape statistics in the combined space of the femur and pelvis. Figure 3.6 shows the mean hip joint resulting from the correspondence model. Optimizing in the joint space ensures that the shape variability of the articulation is modeled, which is useful for studying symptoms such as coverage statistics in a dysplastic population. We used PCA to learn a low-dimensional space of shape variability. Figure 3.6 shows the resulting cumulative variance plot, which indicates the total amount of data variation covered up to a particular number of PCA modes. A total of 26 dominant modes covered 90% of the variation in the given dysplastic hip joints. The variations along the first three modes, mapped onto the mean shape, are presented in Figure 3.7. As evident from the color maps, there is significant variation in the hip joint, specifically near the acetabulum, leading to variations in coverage.
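The mode-count criterion above (26 modes covering 90% of the variation) can be sketched as follows; this is a minimal numpy sketch on synthetic data, not the actual model.

```python
import numpy as np

def modes_for_variance(shape_matrix, threshold=0.90):
    """Smallest number of PCA modes whose cumulative variance reaches `threshold`.
    shape_matrix: (N, dM) matrix of aligned correspondence vectors, one row per shape."""
    X = shape_matrix - shape_matrix.mean(axis=0)
    s = np.linalg.svd(X, compute_uv=False)       # singular values
    var = s ** 2                                 # proportional to mode variances
    cumulative = np.cumsum(var) / var.sum()
    return int(np.searchsorted(cumulative, threshold) + 1)
```

For correspondence models, each row of `shape_matrix` would be the flattened x-y-z coordinates of one subject's particles after alignment.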
We computed the coverage statistics along the PCA modes of variation to understand whether the symptoms may be correlated with these variabilities. Such a correlation could lead to a clinically relevant quantitative model. Table 3.1 presents the summary of the estimated coverage for the complete femoral head as well as the four quadrants (as discussed in Section 3.4.3). A preliminary comparison with results on the control population in Uemura et al. [6] indicates reduced total and region-wise coverage, specifically in the anterior and superior quadrants. This quadrant-wise reduced coverage is consistent with clinical observations.

3.7 Conclusion

This study resolved the challenges in the estimation of a good correspondence model for the pelvis shape. As a result, a good quality shape model was obtained for the hip joint, considering the femur and pelvis bones as a combined anatomy. The correspondence-based statistical shape model thus obtained was used to formalize the coverage estimation in the hip joint, requiring significantly less manual effort compared to previous studies. The coverage results indicate that the statistical modes of variation indeed correlate with the reduced coverage, as observed clinically in dysplastic patients.

3.8 References

[1] J. C. Clohisy, J. C. Carlisle, P. E. Beaulé, Y.-J. Kim, R. T. Trousdale, R. J. Sierra, M. Leunig, P. L. Schoenecker, and M. B. Millis, "A systematic approach to the plain radiographic evaluation of the young adult hip," J. Bone Joint Surg. Am., vol. 90, no. Suppl 4, p. 47, Nov. 2008.

[2] G. Wiberg, "Shelf operation in congenital dysplasia of the acetabulum and in subluxation and dislocation of the hip," J. Bone Joint Surg. Am., vol. 35, no. 1, pp. 65–80, Jan. 1953.

[3] D. Tönnis and A. Heinecke, "Current concepts review - acetabular and femoral anteversion: Relationship with osteoarthritis of the hip," J. Bone Joint Surg. Am., vol. 81, no. 12, pp. 1747–70, Dec. 1999.

[4] E. N. Novais, S. Duncan, J. Nepple, G.
Pashos, P. L. Schoenecker, and J. C. Clohisy, "Do radiographic parameters of dysplasia improve to normal ranges after Bernese periacetabular osteotomy?" Clin. Orthop. Relat. Res., vol. 475, no. 4, pp. 1120–1127, Apr. 2017.

[5] S. B. Murphy, R. Ganz, and M. Müller, "The prognosis in untreated dysplasia of the hip. A study of radiographic factors that predict the outcome," J. Bone Joint Surg. Am., vol. 77, no. 7, pp. 985–989, Jul. 1995.

[6] K. Uemura, P. R. Atkins, S. A. Maas, C. L. Peters, and A. E. Anderson, "Three-dimensional femoral head coverage in the standing position represents that measured in vivo during gait," Clin. Anat., vol. 31, no. 8, pp. 1177–1183, Nov. 2018.

[7] H. van Bosse, J. H. Wedge, and P. Babyn, "How are dysplastic hips different? A three-dimensional CT study," Clin. Orthop. Relat. Res., vol. 473, no. 5, pp. 1712–1723, May 2015.

[8] H. Cheng, L. Liu, W. Yu, H. Zhang, D. Luo, and G. Zheng, "Comparison of 2.5D and 3D quantification of femoral head coverage in normal control subjects and patients with hip dysplasia," PLoS One, vol. 10, no. 11, p. e0143498, Nov. 2015.

[9] W. Dandachli, V. Kannan, R. Richards, Z. Shah, M. Hall-Craggs, and J. Witt, "Analysis of cover of the femoral head in normal and dysplastic hips: New CT-based technique," J. Bone Joint Surg. Br., vol. 90, no. 11, pp. 1428–1434, Nov. 2008.

[10] B. J. Hansen, M. D. Harris, L. A. Anderson, C. L. Peters, J. A. Weiss, and A. E. Anderson, "Correlation between radiographic measures of acetabular morphology with 3D femoral head coverage in patients with acetabular retroversion," Acta Orthop., vol. 83, no. 3, pp. 233–239, May 2012.

[11] C. M. Larson, A. Moreau-Gaudry, B. T. Kelly, J. T. Byrd, J. Tonetti, S. Lavallee, L. Chabanas, G. Barrier, and A. Bedi, "Are normal hips being labeled as pathologic? A CT-based method for defining normal acetabular coverage," Clin. Orthop. Relat. Res., vol. 473, no. 4, pp. 1247–1254, Apr. 2015.

[12] D. Miyasaka, T. Ito, N. Imai, K. Suda, I. Minato, Y.
Dohmae, and N. Endo, "Three-dimensional assessment of femoral head coverage in normal and dysplastic hips: A novel method," Acta Med. Okayama, vol. 68, no. 5, pp. 277–284, Oct. 2014.

[13] D. Suzuki, S. Nagoya, H. Takashima, K. Tateda, and T. Yamashita, "Three-dimensional orientation of the acetabulum," Clin. Anat., vol. 30, no. 6, pp. 753–760, Sep. 2017.

[14] M. Datar, I. Lyu, S. Kim, J. Cates, M. A. Styner, and R. Whitaker, "Geodesic distances to landmarks for dense correspondence on ensembles of complex shapes," in The International Conference on Medical Image Computing and Computer Assisted Intervention, 2013, pp. 19–26.

[15] J. Cates, M. Meyer, T. Fletcher, and R. Whitaker, "Entropy-based particle systems for shape correspondence," in The Workshop on Mathematical Foundations of Computational Anatomy, 2006, pp. 90–99.

[16] C. Goodall, "Procrustes methods in the statistical analysis of shape," J. R. Statist. Soc. B, vol. 53, no. 2, pp. 285–321, 1991.

[17] A. L. Kapron, S. K. Aoki, C. L. Peters, S. A. Maas, M. J. Bey, R. Zauel, and A. E. Anderson, "Accuracy and feasibility of dual fluoroscopy and model-based tracking to quantify in vivo hip kinematics during clinical exams," J. Appl. Biomech., vol. 30, no. 3, pp. 461–470, Jun. 2014.

[18] S. A. Maas, B. J. Ellis, G. A. Ateshian, and J. A. Weiss, "FEBio: Finite elements for biomechanics," J. Biomech. Eng., vol. 134, no. 1, p. 011005, Jan. 2012.

Figure 3.1. Mean shape from a correspondence model based on spatial locations alone; the two sides of the iliac and the lower arc collapse due to incorrect correspondence.

Figure 3.2. Two points placed on opposite sides of the iliac, with very similar spatial locations but completely different surface normals.

Figure 3.3. Illustrating the residual rotations observed after the combined alignment.

Figure 3.4. Illustrating the four quadrants identified in the femoral head to compute 3D coverage.

Figure 3.5. Pelvis correspondence model: orange - sample point-clouds and reconstructed shapes; gray - statistical mean shape.

Figure 3.6.
Hip joint correspondence model: combined mean shape and the variance plot for the PCA modes.

Figure 3.7. Modes of variation from the combined statistical model of femur and pelvis shapes. The color plots illustrate the -2 standard deviation for particular modes with respect to the mean shape. The physical deviation, measured in mm, is mapped onto the mean shape, showcasing the positive/negative deviations at every location.

Table 3.1. Percentage coverage along the PCA modes of variation in the combined shape space.

               Anterior  Superior  Posterior  Inferior  Total
  Mean           17.81     35.82     54.54      6.22    29.88
  Mode 1, +2σ    18.91     38.96     55.49      5.95    31.24
  Mode 1, -2σ    16.88     32.66     52.93      6.12    28.36
  Mode 2, +2σ    18.56     34.11     52.77      6.29    29.18
  Mode 2, -2σ    17.13     37.47     55.52      6.14    30.45
  Mode 3, +2σ    17.66     35.68     53.88      6.27    29.69
  Mode 3, -2σ    18.09     36.06     54.80      5.88    30.03
  min            16.88     32.66     52.77      5.88    28.36
  max            18.91     38.96     55.52      6.29    31.24

CHAPTER 4

DEEP LEARNING FEATURES FOR AUTOMATED PLACEMENT OF CORRESPONDENCE POINTS ON ENSEMBLES OF COMPLEX SHAPES

P. Agrawal, R. T. Whitaker, S. Y. Elhabian, "Learning Deep Features for Automated Placement of Correspondence Points on Ensembles of Complex Shapes," in The International Conference on Medical Image Computing and Computer Assisted Intervention, pp. 185–193, 2017, Springer. [Portions adapted and reprinted with permission.]

P. Agrawal, R. T. Whitaker, S. Y. Elhabian, "Deep Learning Features for Automated Placement of Correspondence Points on Ensembles of Complex Shapes," Under Submission.

The authors would like to thank Heath B. Henninger, PhD, and Matthijs Jacxsens, MD, for providing scapula shapes with anatomical landmarks.

4.1 Abstract

Correspondence-based shape models support various medical imaging applications that rely on statistical analysis of populations of anatomical shape.
One strategy for automatic correspondence placement is to simultaneously learn a compact representation of the underlying anatomical variation in the shape space while capturing the geometric characteristics of individual shapes. The inherent geometric complexity and population-level shape variation in anatomical structures introduce significant challenges in finding optimal shape correspondence models. Existing approaches have adopted iterative optimization schemes with objective functions derived from probabilistic modeling of the shape space, e.g., entropy of a Gaussian-distributed shape space, to find useful sets of dense correspondences on shape ensembles. Nonetheless, anatomical shape distributions can be far more complex than this Gaussian assumption, which entails linear shape variation. Recent works have addressed this limitation by adopting an application-specific notion of correspondence through lifting positional data to a higher dimensional feature space (e.g., sulcal depth, brain connectivity, and geodesic distance to anatomical landmarks), with the goal of simplifying the optimization problem. However, the success of the optimization may rely on manual expertise in creating domain-specific features and on the consistent placement of landmarks. This paper proposes a supervised feature learning approach, using deep convolutional neural networks, for the optimization of dense point correspondence on shape ensembles. The proposed method endows anatomical shapes with learned features that enhance the shape correspondence objective function to deal with complex objects and populations. Further, an unsupervised domain adaptation scheme is used to adapt the pretrained geometric features to new anatomies. Our results demonstrate that deep learning features perform better than methods that rely on position alone and compete favorably with handcrafted features.
4.2 Introduction

Statistical shape models have important applications in various medical imaging tasks, such as image segmentation (e.g., [1]), hypothesis testing (e.g., [2]), anatomical reconstruction (e.g., [3]), and pathology detection (e.g., [4]). The utility of such models is influenced by a shape representation that facilitates statistical analysis. In this regard, landmarks are a popular choice as a lightweight but effective representation. Compared to embedding shapes in the image intensity values at voxels, a landmarks-based representation is more intuitive and promotes visual communication of the results [5]. To perform shape statistics, landmarks should be defined consistently within a given population to refer to the same anatomical position on every shape instance, a concept known as correspondence. Such correspondences are often created manually, but this is time- and labor-intensive, requires qualified specialists (e.g., radiologists), and is cost-prohibitive for 3D images and large collections of imaging data. Recent advances in shape analysis research have achieved some success in the automatic placement of such correspondences on anatomical structures [6–10]. However, this success is limited to anatomies that conform to the modeling assumptions in existing methods. Typically, the assumptions include mapping to primitive shapes (such as a sphere), linear shape variations, and no consideration of population-based shape variations. Such assumptions often do not generalize well to complex anatomies. In shape analysis research, the complexity of a shape anatomy is assessed by comparing the statistical shape variations with the inherent shape features (such as Feret's diameter). The complexity of the shape increases as the shape features get smaller compared to the observed variations among different samples.
Due to this smaller feature size, the task of establishing correspondence becomes more challenging, which limits the success of existing methods. Datar et al. [11] proposed to incorporate domain knowledge to help guide correspondence optimization for complex shapes. However, their work relies on clinical expertise to build this domain-specific knowledge. The clinical experts identify a combination of geometric features (such as sulcal depth, curvature, and brain connectivity) and key landmarks on the anatomy that can help resolve the correspondence in critical regions. Here, we propose automated learning of geometric features from a given anatomy. These features are then used to establish correspondence among sample shapes. Specifically, we propose the following: 1) supervised learning of features to establish shape correspondence; 2) feature learning using direct surface geometry information rather than relying on predefined surface descriptors; 3) unsupervised adaptation of pretrained features to new, complex anatomies; and 4) incorporation of the learned features in the optimization of dense correspondences for complex shape ensembles. Experiments on synthetic and clinical data demonstrate that the proposed deep features help improve shape-correspondence models.

4.3 Related Work

In this research, we intend to generate shape-correspondence models for complex shape anatomies. In this regard, we use deep learning tools to automatically learn geometric features from a given shape population. Initially, a supervised learning scheme is used to capture the geometric properties. Unsupervised learning is then used to adapt these learned features to new anatomies. The trained features (supervised or unsupervised) are incorporated into existing frameworks to generate improved shape correspondence models. In this section, we first discuss the existing methods used to generate shape correspondence models.
Next, we discuss the deep learning methods used to establish shape correspondence. Last, we discuss the domain adaptation schemes, with a focus on feature-level adaptation.

4.3.1 Correspondence Estimation

In landmarks-based correspondence models, each shape surface is represented by a point-cloud. The notion of correspondence is established by the index position of each point (a.k.a. particle) in the respective point-cloud. In an optimal correspondence model, a particular index should represent the same anatomical location across all the point-clouds. The problem of solving point-wise correspondence is challenging due to the variations observed across different shapes. A recent study [12] has shown that particle-based shape modeling (PSM), proposed by Cates et al. [13], outperforms the other methods in establishing shape correspondence while preserving the population-based shape variations. Therefore, in this research, we focus on PSM and its variants for the experimental results. Ad hoc methods, such as choosing nearest points between surfaces, are prone to mismatches and result in substandard analytical results [14]. Alternatively, other methods try to match parameterizations of surfaces while maintaining regularity. For example, the spherical harmonics point distribution model (SPHARM-PDM) [15] relies on a smooth one-to-one mapping from each shape instance to the unit sphere. The mapping to a sphere, as well as the a priori reliance on smoothness rather than shape features in the population, is a shortcoming. Other automated approaches have introduced the statistics of the population itself to drive the matching of shape features. For instance, minimum description length (MDL) [16] optimizes point correspondences using an information content objective, but it too relies on intermediate spherical surface parameterizations, which places limitations on the types of shapes and the optimization process.
Forgoing the parameterization, the PSM approach offers a flexible, nonparametric, and general framework to establish dense point correspondences across shape ensembles without constraining surface topology or parameterization [17,18]. The PSM approach formulates a trade-off between the geometric description of each shape and the statistical compactness of the population, formulated in terms of entropy. Current PSM implementations rely on a Gaussian model in the shape space, which is (approximately) the vector space formed by the x-y-z coordinates of the collection of correspondences (modulo a similarity transform). However, the distribution of anatomical structures can be far more complex than the Gaussian assumes, and surface locations are not always indicative of their correspondence. To address these shortcomings, Datar et al. [11] proposed to use geodesic distances to user-identified landmarks on individual shapes. These point-wise distance values were used to guide the entropy-based optimization of shape correspondence. Further, Oguz et al. (in the context of brain imaging) considered sulcal depth [19] and brain connectivity [20] as additional features in the PSM optimization. While promising, such approaches are tailored to a particular dataset, anatomy, and application.

4.3.2 Deep Learning for Shape Correspondence

We intend to use machine learning techniques to automatically learn shape-specific features to guide the correspondence optimization in the PSM framework. The idea is motivated by advances in computer vision and computer graphics that improve the performance of shape matching techniques. In these methods, shapes are represented by 3D meshes, and point-wise matching is used to establish correspondence between a pair of shapes. The goal of PSM is to achieve this point-wise matching and simultaneously optimize the population-based statistics.
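Under the Gaussian shape-space model mentioned above, the ensemble entropy reduces, up to additive constants, to the log-determinant of the covariance of the shape vectors. The following is a minimal numpy sketch; the ridge term `eps` is an illustrative regularization choice, not the PSM implementation's.

```python
import numpy as np

def ensemble_entropy(Z, eps=1e-6):
    """Gaussian ensemble entropy (up to additive constants) of shape vectors.
    Z: (N, dM) matrix, one row per shape. The ridge eps keeps the covariance
    full rank when N - 1 < dM, the usual regime for dense correspondences."""
    Y = Z - Z.mean(axis=0)
    cov = Y.T @ Y / (Z.shape[0] - 1)
    eigvals = np.linalg.eigvalsh(cov + eps * np.eye(cov.shape[0]))
    return 0.5 * np.sum(np.log(eigvals))
```

Minimizing this term over particle positions drives the ensemble toward a compact, low-rank distribution, which is the statistical half of the PSM trade-off.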
Recent works have trained deep convolutional neural networks (CNNs) to estimate the point-wise matching between an input 3D mesh and a template mesh [21]. Boscaini et al. [22] proposed supervised learning of point-wise shape correspondence using anisotropic CNNs. The network was trained against a predefined correspondence model. Thus, this method requires retraining for a new set of correspondences, with a different number of points or a new correspondence model. Groueix et al. [23] chose to estimate shape correspondence with respect to a template, by way of feature descriptors. For a given shape, the feature descriptor is used to model the transformation from the template shape. The choice of template shape impacts the resulting statistical variations and thereby does not lead to a generic population-based statistical model. Kleiman and Ovsjanikov [24] adopted a graph matching approach to establish region-wise correspondence among shapes. The method works with any given shape representation and relies on the inherently symmetric nature of shapes. This symmetry assumption limits its success for shapes with significantly varying topologies. Sun et al. [25] trained a Siamese network for correspondence versus noncorrespondence classification of an input pair. The training relies on predefined spectral shape descriptors to enable shape matching. Although predefined descriptors may not generalize to complex shapes, this method does not require the selection of a template. Hence, we adopt this idea of a Siamese network for correspondence versus noncorrespondence classification. As input, we use a direct snapshot of the local surface geometry in place of spectral descriptors, forcing the network to automatically learn the shape features.

4.3.3 Domain Adaptation

Most existing methods, manual or automated, use shape-specific features to resolve point-wise correspondence on complex anatomies.
Further, machine learning methods often rely on supervised training, thus limiting their generalizability to a new anatomy. Here, our goal is to learn a set of geometric features that may generalize to unknown shapes. The supervised learning of such generic features requires a large pool of exhaustive training data. Obtaining such a dataset may be expensive and infeasible. Therefore, we plan to update the pretrained features to model the new anatomies. Further, for complex shapes, labeled training pairs are often expensive and require manual expertise. Thus, to alleviate the need for labeled data, we rely on unsupervised domain adaptation techniques. A recent survey on domain adaptation [26] illustrates the different techniques to learn semantically meaningful and domain-invariant features. Broadly, domain adaptation techniques follow either an adversarial training paradigm or a feature space matching approach. In feature space matching, the source and target distributions are made to align in the output feature space. For instance, Sun et al. [27] formulated the problem as minimizing the distance between the covariance matrices of the source and target domain feature spaces, thus aligning the source domain feature space with that of the target domain. Fernando et al. [28] used subspace matching to map the input data from the source domain onto the target domain. Although simple, the approach of Fernando et al. depends on correct subspace estimation, which typically requires a large number of samples. In the case of a limited number of samples in either or both domains, these methods may lead to overfitting. Adversarial training is a more generic approach in which a discriminator function is trained on the output features, with an adverse objective function. The goal of this adversarial objective is to promote confusion for the discriminator function in distinguishing between the source and target domains.
There are two types of models for adversarial training: 1) those based on a generative model, which tries to generate impostor samples for the discriminator network [29–31]; and 2) those based on learning discriminative features by directly using them as inputs to the discriminator network. The goal is to ensure that the output features perform poorly at discriminating between samples from the two domains, while remaining good at the original task [32–36]. We adopt the domain adversarial training proposed by Ganin and Lempitsky [32], where the output features are used to solve a parallel domain classification problem. The sign-reversed gradients from the augmented discriminator network are backpropagated to encourage domain confusion.

4.4 Methodology

In this paper, we propose to train a deep CNN such that the output features can mimic the geometry of any given anatomy. Specifically, the features should highlight the shape properties of the anatomy, such as convexity/concavity, saddle structures, and flat regions. Such a feature bank can help correlate anatomical regions between different shape samples, thus guiding the correspondence optimization. We have devised a supervised feature learning scheme via a paired-classification task. Once trained, the network should encode similar anatomical regions with the same feature vectors. These feature vectors can thus be used to optimize correspondence between shape samples. We further use unsupervised domain adversarial training [32] to adapt the trained network to new anatomies. The trained feature maps, supervised and/or unsupervised, are then used to aid correspondence optimization in the entropy-based PSM. As input to the deep neural network, we extract local surface geometric characteristics using circular geodesic patches in the neighborhood of given point locations on a shape's surface (see Figure 4.1).
In this section, we discuss the details of input patch extraction, followed by the deep network configurations used for feature learning. We conclude the section with the details of incorporating the extracted features into an existing method for correspondence optimization and the evaluation criteria used for the resulting shape-correspondence models.

4.4.1 Geodesic Patch Extraction

The success of deep CNNs has been demonstrated on analyzing functions defined on Euclidean grid-like domains such as images. Nonetheless, non-Euclidean data, particularly surface geometry, do not directly benefit from these deep models, in which operations such as convolution and pooling are not readily defined. Therefore, using a 3D patch from a volumetric shape representation (e.g., label maps or signed distance maps) to represent the local surface geometry may not yield optimal results. Further, using 3D convolutions would result in a very large number of network parameters with a limited field of view. Recently, local charting methods (e.g., [37]) have been proposed as a generalization of convolution to non-Euclidean domains, where a patch operator is defined to extract local surface patches that are subsequently associated with predefined shape descriptors. Here, we rely directly on surface geometry to compute such local shape descriptors. Specifically, a spatial function of surface distance to the tangent space can encode local geometric properties in a local neighborhood. Hence, we propose to use the signed normal distance to the tangent space, sampled in the geodesic neighborhood of a surface point $x$, as a snapshot of the local surface geometry. We use the principal curvature directions $(\vec{u}, \vec{v})$ to define the local intrinsic coordinate system at the surface point $x$. As illustrated in Figure 4.1, neighboring points that lie in the geodesic ball $B(x) = \{x' : d_X(x, x') \leq \rho_o\}$ with a radius $\rho_o > 0$ are sampled on a circular geodesic patch.
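The polar sampling of such a patch can be sketched as follows. For simplicity, this minimal sketch replaces geodesic distances with offsets in the tangent plane and represents the surface locally as a height function over that plane (a second-order approximation); the actual extraction operates on the mesh geometry.

```python
import numpy as np

def tangent_patch(height_fn, rho_o, n_rings=8, n_rays=16):
    """Sample the signed normal distance to the tangent plane on a polar grid.
    height_fn(a, b): surface height over the tangent plane at offset a*u + b*v,
    where (u, v) are the principal curvature directions at the patch center."""
    rho = np.linspace(rho_o / n_rings, rho_o, n_rings)          # ring radii
    theta = np.linspace(0.0, 2.0 * np.pi, n_rays, endpoint=False)
    P, T = np.meshgrid(rho, theta, indexing="ij")
    return height_fn(P * np.cos(T), P * np.sin(T))              # (n_rings, n_rays)

# Local quadratic model with principal curvatures k1 = 1, k2 = -1 (a saddle):
saddle = lambda a, b: 0.5 * (a**2 - b**2)
patch = tangent_patch(saddle, rho_o=0.5)
```

The two-channel input discussed below corresponds to evaluating the same patch with bases $(\vec{u}, \vec{v})$ and $(-\vec{u}, -\vec{v})$, i.e., `height_fn(-a, -b)`.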
Each ring in the patch corresponds to a particular geodesic distance $d_X(x, x') = \rho$ with $\rho \in [0, \rho_o]$. Every ray originating from $x$, perpendicular to the geodesic rings, is inclined at an angle $\theta \in [0, 2\pi)$ to the first principal curvature direction at $x$. $\rho_o$ is automatically set to 5% of the maximum shape diameter in the ensemble. Principal curvature directions are estimated using least squares fitting of a quadratic [38] while enforcing the right-hand rule on the local coordinates, with the normal direction representing the z-axis. However, the principal curvature directions are accurate only up to the sign of the vector. To address this ambiguity, we extract two patches per surface point (see Figure 4.1c), one with bases defined by the positive sign of the principal directions and the other by the negative sign, and use them as a two-channel image for input to the neural network. Further, we generate all four possible combinations of an input pair and provide them as four independent training samples, to help the network become invariant to this sign ambiguity.

4.4.2 Supervised Feature Learning

In this paper, we use deep convolutional neural networks (CNNs) to learn local shape features for establishing shape correspondence. The goal is to make some of the hidden layers of the network respond similarly at corresponding locations across the given shape ensemble, and dissimilarly at noncorresponding ones. Therefore, we design the feature learning problem as a paired-classification task (corresponding versus noncorresponding). Given a pair of inputs, the task is to identify whether they belong to a similar anatomical location. To accomplish this task, we use a Siamese network configuration, which consists of two identical copies of the same deep CNN.
We train the Siamese CNN to learn the binary classification problem, where the positive class indicates correspondence between a pair of input patches, and the negative class suggests that the patches possess different shape characteristics. A contrastive loss [39] is optimized to facilitate the Siamese training. We rely on an initial point-cloud-based correspondence model to generate training samples. Using the point-wise correspondence, we generate corresponding and noncorresponding pairs. The geodesic patches extracted at the point locations are fed as inputs to the Siamese network. For the CNN configuration, we use the network settings of Chopra et al. [39]; see Figure 4.2. During experiments, we found that using p_drop > 0 does not impact classification performance; therefore, we set it to zero. Batch normalization is used at the end of the convolutional layers. It is critical for correspondence optimization methods that the output features exhibit smooth transitions from one shape feature to another. Therefore, we replace the max pooling layers with average pooling. Further, we use the softplus activation function in place of ReLU; softplus ensures smooth output features while maintaining convergence behavior similar to ReLU. All weights are initialized using a zero-mean normal distribution with 0.05 standard deviation, and biases are initialized to zero. We subtract the mean of the training data from all training patches (and the same mean is subtracted from testing patches for feature extraction) so that the input data align with the distribution of initial network weights (spanning the hypersphere).

4.4.3 Unsupervised Feature Adaptation

Given a pretrained network for a particular shape S, the objective is to generalize the output features to a new anatomy T, without providing labeled corresponding/noncorresponding samples for this new shape.
To accomplish this objective, we adopt the idea of adversarial training proposed by Ganin and Lempitsky [32]. In [32], feature-level domain adaptation is achieved by augmenting a domain classifier onto the original learning network (the Siamese network in our case; see Figure 4.3). The domain classifier is tasked with identifying the domain of an incoming sample, represented by the feature vector f_w(x). The gradients from this domain classifier network are backpropagated into the Siamese network after a sign reversal, typically performed by a gradient reversal layer. This sign reversal facilitates adversarial training of the domain classifier network and makes it difficult to identify the domain of the input sample. As a result of this confusion, the CNN in the Siamese network is forced to update its parameters so as to combine the topology of the new anatomy T with the shape characteristics of S, which are learned via supervision. Similar to the configuration used by Ganin and Lempitsky [32], we use a two-layered network for the domain classification. The two layers contain 500 and 1000 convolution filters of size 1 × 1. The binary cross-entropy loss is used with a weight λ, resulting in the total loss from the joint network: Loss_total = Loss_siamese + λ·Loss_domain. We use λ = 0.01 for all our experiments. The Siamese part of the network is initialized using the weights from the pretrained network; Glorot initialization [40] is used for the domain classifier. The initialized network is trained to minimize the total loss, using the samples from S and T. The samples from S have corresponding/noncorresponding labels as well as a domain-identifying label, whereas samples from T possess only domain identification. At the end of training, the CNN is expected to produce features that contain shape characteristics from both the S and T anatomies.

4.4.4 Deep Feature PSM

We incorporate our learned features into the framework of entropy-based PSM [13].
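The weighted total loss and the sign-reversed gradient flow from the adversarial training above can be sketched numerically. In this sketch λ weights the domain loss, so the reversal itself is a plain sign flip; an equivalent formulation folds λ into the reversal instead. The gradient values are illustrative placeholders.

```python
import numpy as np

LAMBDA = 0.01  # domain-loss weight used in the experiments

def total_loss(loss_siamese, loss_domain, lam=LAMBDA):
    # joint objective: Loss_total = Loss_siamese + lambda * Loss_domain
    return loss_siamese + lam * loss_domain

def reverse_gradient(grad_from_domain_classifier):
    # gradient reversal layer: identity on the forward pass; on the backward
    # pass the domain classifier's gradient is negated before it reaches the
    # shared feature extractor, so the features are pushed to CONFUSE the
    # domain classifier while the classifier itself still learns normally
    return -grad_from_domain_classifier
```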
The method uses a set of dynamic particle systems, one for each shape, in which particles interact with one another through mutually repelling forces to sample (and therefore describe) the surface geometry. Consider a cohort of shapes S = {z_1, z_2, ..., z_N} containing N surfaces, each with its own set of M corresponding particles. Each shape vector z_n = [z_n^1, z_n^2, ..., z_n^M] ∈ R^{dM}, where z_n^m ∈ R^d is the position vector of the m-th particle on the n-th shape, and the ordering implies correspondence across shapes. Correspondences are established by minimizing an entropy-based cost function that combines a shape correspondence cost and a surface sampling cost: Q = H(Z) − Σ_{n=1}^{N} H(X_n). Here, H(Z) measures the differential entropy, assuming a multivariate normal distribution over samples in the dM-dimensional shape space; minimizing this entropy enables correspondence among the shape vectors. Further, H(X_n) is a nonparametric density estimator in the (3D) configuration space of the n-th shape; maximizing this entropy ensures uniform sampling of the n-th shape surface. For groupwise modeling, the shapes in S should share the same world coordinate system. Generalized Procrustes alignment [41] is used to estimate a rigid transformation matrix T_n per shape instance, such that z_n^m = T_n x_n^m. In particular, the correspondence entropy relies solely on particle positions to balance a compact ensemble representation in shape space (first term) against a uniform distribution of particles on each surface for accurate shape representation (second term). In this paper, we propose to modify the correspondence term H(Z) by incorporating the features extracted from the trained CNN. Optimizing the updated objective function results in a compact statistical shape model for complex anatomies. The extracted features encode only local geometric properties and require position information, unlike geodesic distances, which can encode global shape characteristics.
Therefore, we also keep the particle positions to guide the optimization in locating corresponding points. The updated shape vector becomes z_n = [z_n^1, ..., z_n^M]^T ∈ R^{(d+L)M}, where z_n^m = [f^1(x_n^m), ..., f^L(x_n^m), (T_n x_n^m)^T]^T, and f^l(x_n^m) is the l-th deep feature extracted at the m-th particle on the n-th shape.

4.4.5 Evaluation Criterion

We evaluate the correctness of correspondence models using qualitative inspection and quantitative measures. The qualitative assessment of shape correspondence models involves examining the anatomical correctness and integrity of the mean shape. Further, the anatomical stability of shapes in the PCA (principal component analysis) modes of variation indicates the correctness of the shape variability. For a quantitative assessment, the resulting shape generative model is evaluated. To construct the generative model, a low-dimensional PCA subspace is learned from the high-dimensional shape space. The accuracy of the generative model constructed using the resulting subspace is evaluated by the following quantitative measures:

• Variance plot: We plot the percentage of variance with respect to the PCA modes to demonstrate the compactness of the sample distribution in the resulting subspace.

• Generalization: Proposed by Davies et al. [42], this measures the ability of the generative model to represent unseen samples. The reconstruction error of the correspondence model for an unseen shape instance, using the PCA model built from training samples, is evaluated in a leave-one-out fashion. We plot the mean of the point-wise physical distance between test samples and their reconstructions with respect to the number of PCA modes.

• Specificity: Proposed by Davies et al. [42], this measures the ability of the generative model to produce plausible shapes. It is quantified as the Euclidean distance between a sampled shape (from the PCA model built using all training samples) and the closest training sample (lower is better).
The average over 1000 samples is used.

4.5 Results and Discussion

We showcase the results of supervised feature learning and unsupervised adaptation of pretrained features using synthetic and clinical datasets. First, we demonstrate the success of deep learning features, from supervised training, in establishing a good shape-correspondence model for complex anatomies. We then employ the orthopedic datasets to showcase the adaptation of features learned from a simpler shape to a complex anatomy.

4.5.1 Shape Correspondence Models Using Supervised Features

Two synthetic datasets and a clinical dataset are used to demonstrate the ability of deep feature PSM to generate optimal statistical models for complex shapes. The proposed method (deep feature PSM) is compared with positional PSM [17] and geodesic PSM [11].

4.5.1.1 Proof-of-Concept

The first synthetic dataset (Bean2) contains 30 shapes that represent a coffee bean with a spatially varying structure. The second dataset (Bean4) comprises 23 samples of a more complex coffee bean shape, with a group of closely located thin structures collectively varying in position (see Figure 4.4). To generate training samples for the Siamese network, we use an optimized statistical shape model with 3072 points for Bean2, obtained using the geodesic PSM. Patches extracted using correspondence points from six randomly selected shapes are used to generate the training data. The CNN configuration with L = 10 output features yields an optimal Siamese classification performance of 0.92 AUC (area under the curve) of the ROC (receiver operating characteristic) curve, resulting in a 90% true positive rate (TPR) at the expense of a 20% false positive rate (FPR). Figure 4.5 shows sample features. Given that there are multiple regions with similar shape characteristics, which may lead to a higher FPR, 20% is a relatively small penalty.
Moreover, using position information in deep feature PSM helps reduce the impact of false positives. All 10 features are used in the PSM to generate a shape model with 4096 correspondence points. Figure 4.6 presents the mean shape and quantitative evaluation of the correspondence model. The results indicate the compactness of the statistical shape model and better generalization and specificity performance over the geodesic PSM. Bean4 shapes have characteristics similar to those of Bean2, and therefore, we use the same trained Bean2 network to extract features for the Bean4 shapes. Figure 4.5 presents samples of the same feature on the two datasets. Comparative results on the Bean4 data, presented in Figure 4.7, highlight the better performance of the proposed method over positional PSM in generating a more compact statistical shape model. The generative model from the proposed method also outperforms in its ability to generalize over unseen data and to produce plausible shape samples.

4.5.1.2 Clinical Data

A clinical dataset of 20 scapula shapes (10 controls and 10 patients with osseous Hill-Sachs lesions) is used for experiments. Samples are rigidly aligned with respect to the glenoidal plane, and a set of 16 anatomical landmarks is defined manually. Reconstructed scapula meshes are then clipped to the glenoid, acromion, and coracoid to model the areas of high geometric curvature related to constraint of the humeral head; Figure 4.4 illustrates sample shapes. A shape model with 2432 points, using six randomly selected shapes, is optimized with the geodesic PSM to generate the training data. Using L = 5 output features gives an optimal classification performance of AUC = 0.80. Figure 4.5 shows sample feature maps. For additional comparison, we augment the geodesic and positional PSM with local surface curvature.
Figure 4.8 showcases the mean shape from the correspondence model generated using the proposed deep features, along with the improved quantitative measures in comparison to handcrafted features. It is important to note that the proposed method achieves errors of about 2 voxels using the dominant modes (5 modes), in contrast to a minimum of 3 voxels for the other methods.

4.5.2 Shape Correspondence Models Using Adapted Features

Using the coffee bean and human scapula datasets, we demonstrated the improved performance of statistical shape models obtained by using the proposed deep learning features in PSM optimization. However, for highly complex medical anatomies, labeled training data are often not available. Therefore, the goal is to capitalize on supervised learning on a relatively simpler anatomy and adapt the learned features onto complex shapes, without supervision. Here, we use the femur and pelvis shapes to demonstrate feature adaptation via domain adversarial learning. Figure 4.9 shows sample shapes, with inherently different topological features. Of the two, femur shapes can be modeled with great success using position-based shape models [6]. The pelvis shape, on the other hand, has complex small-sized shape features and widely varying curvature; therefore, it is challenging for position-based models to generate a good shape-correspondence model. In this research, we showcase automated learning of geometric features for the pelvis shape. Unlike Section 4.5.1, the goal here is to learn these features without requiring a correspondence model for the pelvis. We first train a CNN in the Siamese framework, using labeled samples from a femur correspondence model. The trained network is then refined using unsupervised domain adversarial training to adapt to the pelvis anatomy. As a result, the pretrained features evolve into a combined feature space of the femur and pelvis anatomies.
We use an optimized statistical shape model with 2045 correspondence points, on a dataset of eight femur shapes, to train the Siamese network. The CNN configuration with 10 output features performed best for corresponding/noncorresponding classification, yielding 0.90 AUC on the training samples (80% of the total samples) and 0.87 AUC on the validation set (10% of the total samples). Figure 4.10 showcases sample features from the network. The learned features highlight key anatomical aspects of the femur shape, similar to the coffee bean and scapula shapes in Figure 4.5. To validate the pelvis features, we independently train a Siamese network using a quality-controlled model of eight pelvis shapes and 2048 correspondence points. Figure 4.11 shows the sample features thus obtained. To highlight the need for feature adaptation, we use the CNN trained on femur shapes to extract features for the pelvis shape; Figure 4.12 shows the output feature maps. These feature maps do not generalize to the pelvis anatomy. The femur-trained model is then adapted using domain adversarial training. Figure 4.13 shows the adapted features extracted on the pelvis shape. As a result of the feature adaptation, the new features are able to model both the high-curvature regions and the flat anatomical regions in the iliac wing of the pelvis. We use the deep learning features to optimize the shape correspondence model for a dataset of 23 pelvis shapes. The proposed deep learning features are trained using the geodesic patch, which captures the local surface geometry and is invariant to the location and orientation of the shapes. Such invariance is useful for the generalizability of trained networks onto new datasets. As discussed earlier, we use position information with the deep learning features to resolve the correspondence in similar anatomical regions across the shape.
For the pelvis anatomy, orientation information plays a vital role in resolving complex regions, such as the very thin iliac wing. Therefore, we use position and orientation information along with the deep features in the PSM optimization. Specifically, we compare the performance of the following combinations of position, orientation (surface normals), and deep learning features:

• XYZ: positional PSM [13];
• Normals: position and surface normals in the PSM framework;
• FemurFea: position and femur-trained features (Figure 4.12) in the PSM framework;
• PelvisFea: position and pelvis-trained features (Figure 4.11) in the PSM framework;
• DannFea: position and domain-adapted features (Figure 4.13) in the PSM framework;
• FemurFea-normals: position, surface normals, and femur-trained features (Figure 4.12) in the PSM framework;
• PelvisFea-normals: position, surface normals, and pelvis-trained features (Figure 4.11) in the PSM framework; and
• DannFea-normals: position, surface normals, and domain-adapted features (Figure 4.13) in the PSM framework.

The pelvis is a complex shape with very thin structures, and therefore, both position and orientation provide crucial support to the other geometric features. First, we showcase the mean shapes generated using models involving position and deep learning features; see Figure 4.14. As expected, the pelvis-trained and domain-adapted features significantly reduce the number of incorrect correspondences compared to the positional PSM (XYZ) and the FemurFea models. The few unresolved correspondences in the PelvisFea and DannFea models belong to the highly thin region (up to a voxel in thickness, 0.5 mm) in the iliac wing (highlighted by the red circle in Figure 4.14). Adding surface normals to the PSM further resolves all the incorrect correspondences; see Figure 4.15. Figure 4.16 presents a quantitative comparison of all the PSM variants. Figure 4.16(a) compares the variance plots for the different models.
PelvisFea-normals produces the most compact model, followed by PelvisFea, DannFea-normals, Normals, DannFea, XYZ, FemurFea, and FemurFea-normals, in that order. This trend highlights that the adapted features are an improvement over the traditional PSM models (positional, and position with surface normals) and are upper-bounded by the performance of deep features learned under supervision (PelvisFea and PelvisFea-normals). The quantitative performance plots in Figure 4.16(b-c) closely follow a similar trend, thus highlighting the need for domain-specific geometric features to generate improved shape correspondence models.

4.6 Conclusion

This research proposed deep learning methods to resolve the challenges in shape correspondence estimation for complex anatomies. Supervised and unsupervised learning approaches are proposed to facilitate geometric feature learning. The unsupervised technique is especially useful when a training correspondence model is difficult to obtain for complex anatomies. Thus far, such models have been generated with the help of manual expertise, which may limit the throughput of studies with a large number of sample shapes. Quantitative results obtained using synthetic and clinical datasets showcase the success of the proposed method in improving shape correspondence models. The results showed that position and orientation still play a crucial role in the correspondence optimization. Next steps should involve learning geometric features that also encapsulate this global information and thus reduce the high computation cost of the Hessian required for any normals-based PSM. Further, image-based feature learning can help reduce the preprocessing overhead of shape segmentation.

4.7 References

[1] T. Heimann and H.-P. Meinzer, “Statistical shape models for 3D medical image segmentation: A review,” Med. Image Anal., vol. 13, no. 4, pp. 543–563, Aug. 2009.

[2] T. L. Bredbenner, T. D. Eliason, R. S. Potter, R. L. Mason, L. M. Havill, and D. P.
Nicolella, “Statistical shape modeling describes variation in tibia and femur surface geometry between control and incidence groups from the osteoarthritis initiative database,” J. Biomech., vol. 43, no. 9, pp. 1780–1786, Jun. 2010.

[3] S. Balestra, S. Schumann, J. Heverhagen, L. Nolte, and G. Zheng, “Articulated statistical shape model-based 2D-3D reconstruction of a hip joint,” in The International Conference on Information Processing in Computer-Assisted Interventions. Springer, 2014, pp. 128–137.

[4] K.-K. Shen, J. Fripp, F. Mériaudeau, G. Chételat, O. Salvado, and P. Bourgeat, “Detecting global and local hippocampal shape changes in Alzheimer’s disease using statistical shape models,” Neuroimage, vol. 59, no. 3, pp. 2155–2166, Feb. 2012.

[5] N. Sarkalkan, H. Weinans, and A. A. Zadpoor, “Statistical shape and appearance models of bones,” Bone, vol. 60, pp. 129–140, Mar. 2014.

[6] M. D. Harris, M. Datar, R. T. Whitaker, E. R. Jurrus, C. L. Peters, and A. E. Anderson, “Statistical shape modeling of cam femoroacetabular impingement,” J. Orthop. Res., vol. 31, no. 10, pp. 1620–1626, Jul. 2013.

[7] P. R. Atkins, S. K. Aoki, R. T. Whitaker, J. A. Weiss, C. L. Peters, and A. E. Anderson, “Does removal of subchondral cortical bone provide sufficient resection depth for treatment of cam femoroacetabular impingement?” Clin. Orthop. Relat. Res., vol. 475, no. 8, pp. 1977–1986, Aug. 2017.

[8] P. R. Atkins, S. Y. Elhabian, P. Agrawal, M. D. Harris, R. T. Whitaker, J. A. Weiss, C. L. Peters, and A. E. Anderson, “Quantitative comparison of cortical bone thickness using correspondence-based shape modeling in patients with cam femoroacetabular impingement,” J. Orthop. Res., vol. 35, no. 8, pp. 1743–1753, Aug. 2017.

[9] P. R. Atkins, Y. Shin, P. Agrawal, S. Y. Elhabian, R. T. Whitaker, J. A. Weiss, S. K. Aoki, C. L. Peters, and A. E.
Anderson, “Which two-dimensional radiographic measurements of cam femoroacetabular impingement best describe the three-dimensional shape of the proximal femur?” Clin. Orthop. Relat. Res., vol. 477, no. 1, pp. 242–253, Jan. 2019.

[10] R. Bhalodia, L. A. Dvoracek, A. M. Ayyash, L. Kavan, R. Whitaker, and J. A. Goldstein, “Quantifying the severity of metopic craniosynostosis: A pilot study in application of advanced machine learning in craniofacial surgery,” J. Craniofac. Surg., Jan. 2020.

[11] M. Datar, I. Lyu, S. Kim, J. Cates, M. A. Styner, and R. Whitaker, “Geodesic distances to landmarks for dense correspondence on ensembles of complex shapes,” in The International Conference on Medical Image Computing and Computer Assisted Intervention, 2013, pp. 19–26.

[12] A. Goparaju, I. Csecs, A. Morris, E. Kholmovski, N. Marrouche, R. Whitaker, and S. Elhabian, “On the evaluation and validation of off-the-shelf statistical shape modeling tools: A clinical application,” in The Workshop on Shape in Medical Imaging. Springer, 2018, pp. 14–27.

[13] J. Cates, M. Meyer, T. Fletcher, and R. Whitaker, “Entropy-based particle systems for shape correspondence,” in The Workshop on Mathematical Foundations of Computational Anatomy, 2006, pp. 90–99.

[14] P. J. Besl and N. D. McKay, “A method for registration of 3-D shapes,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 14, no. 2, pp. 239–256, Feb. 1992.

[15] M. Styner, I. Oguz, S. Xu, C. Brechbühler, D. Pantazis, J. J. Levitt, M. E. Shenton, and G. Gerig, “Framework for the statistical shape analysis of brain structures using SPHARM-PDM,” Insight J., no. 1071, pp. 242–250, 2006.

[16] R. H. Davies, C. J. Twining, T. F. Cootes, J. C. Waterton, and C. J. Taylor, “A minimum description length approach to statistical shape modeling,” IEEE Trans. Med. Imag., vol. 21, no. 5, pp. 525–537, May 2002.

[17] J. Cates, P. T. Fletcher, M. Styner, M. Shenton, and R.
Whitaker, “Shape modeling and analysis with entropy-based particle systems,” in The International Conference on Information Processing in Medical Imaging, 2007, pp. 333–345.

[18] I. Oguz, J. Cates, M. Datar, B. Paniagua, T. Fletcher, C. Vachet, M. Styner, and R. Whitaker, “Entropy-based particle correspondence for shape populations,” Int. J. Comput. Assist. Radiol. Surg., vol. 11, no. 7, pp. 1221–1232, Jan. 2016.

[19] I. Oguz, J. Cates, T. Fletcher, R. Whitaker, D. Cool, S. Aylward, and M. Styner, “Cortical correspondence using entropy-based particle systems and local features,” in The IEEE International Symposium on Biomedical Imaging, 2008, pp. 1637–1640.

[20] I. Oguz, M. Niethammer, J. Cates, R. Whitaker, T. Fletcher, C. Vachet, and M. Styner, “Cortical correspondence with probabilistic fiber connectivity,” in The International Conference on Information Processing in Medical Imaging, 2009, pp. 651–663.

[21] O. Van Kaick, H. Zhang, G. Hamarneh, and D. Cohen-Or, “A survey on shape correspondence,” Comput. Graph. Forum, vol. 30, no. 6, pp. 1681–1707, Jul. 2011.

[22] D. Boscaini, J. Masci, E. Rodolà, M. M. Bronstein, and D. Cremers, “Anisotropic diffusion descriptors,” Comput. Graph. Forum, vol. 35, no. 2, pp. 431–441, May 2016.

[23] T. Groueix, M. Fisher, V. G. Kim, B. C. Russell, and M. Aubry, “3D-CODED: 3D correspondences by deep deformation,” in The European Conference on Computer Vision, 2018, pp. 230–246.

[24] Y. Kleiman and M. Ovsjanikov, “Robust structure-based shape correspondence,” Comput. Graph. Forum, vol. 38, no. 1, pp. 7–20, Apr. 2019.

[25] Z. Sun, Y. He, A. Gritsenko, A. Lendasse, and S. Baek, “Embedded spectral descriptors: Learning the point-wise correspondence metric via Siamese neural networks,” 2017.

[26] M. Wang and W. Deng, “Deep visual domain adaptation: A survey,” Neurocomputing, vol. 312, pp. 135–153, Oct. 2018.

[27] B. Sun, J. Feng, and K.
Saenko, “Return of frustratingly easy domain adaptation,” in The Association for the Advancement of Artificial Intelligence, 2016, pp. 2058–2065.

[28] B. Fernando, A. Habrard, M. Sebban, and T. Tuytelaars, “Unsupervised visual domain adaptation using subspace alignment,” in The IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 2960–2967.

[29] K. Bousmalis, N. Silberman, D. Dohan, D. Erhan, and D. Krishnan, “Unsupervised pixel-level domain adaptation with generative adversarial networks,” in The IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 3722–3731.

[30] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in The IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1125–1134.

[31] M.-Y. Liu and O. Tuzel, “Coupled generative adversarial networks,” in The Conference on Neural Information Processing Systems, 2016, pp. 469–477.

[32] Y. Ganin and V. Lempitsky, “Unsupervised domain adaptation by backpropagation,” Sep. 2014, arXiv:1409.7495 [stat.ML].

[33] Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky, “Domain-adversarial training of neural networks,” J. Mach. Learn. Res., vol. 17, no. 1, pp. 2096–2030, Jan. 2016.

[34] E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell, “Adversarial discriminative domain adaptation,” in The IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 7167–7176.

[35] E. Tzeng, J. Hoffman, T. Darrell, and K. Saenko, “Simultaneous deep transfer across domains and tasks,” in The IEEE International Conference on Computer Vision, 2015, pp. 4068–4076.

[36] E. Tzeng, C. Devin, J. Hoffman, C. Finn, P. Abbeel, S. Levine, K. Saenko, and T. Darrell, “Adapting deep visuomotor representations with weak pairwise constraints,” Nov. 2015, arXiv:1511.07111 [cs.CV].

[37] D. Boscaini, J. Masci, E. Rodolà, and M.
Bronstein, “Learning shape correspondence with anisotropic convolutional neural networks,” in The Conference on Neural Information Processing Systems, 2016, pp. 3189–3197.

[38] S. Rusinkiewicz, “Estimating curvatures and their derivatives on triangle meshes,” in The International Symposium on 3D Data Processing Visualization and Transmission, 2004, pp. 486–493.

[39] S. Chopra, R. Hadsell, and Y. LeCun, “Learning a similarity metric discriminatively, with application to face verification,” in The IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, 2005, pp. 539–546.

[40] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in The International Conference on Artificial Intelligence and Statistics, 2010, pp. 249–256.

[41] J. C. Gower, “Generalized procrustes analysis,” Psychometrika, vol. 40, no. 1, pp. 33–51, 1975.

[42] R. Davies, C. Twining, and C. Taylor, Statistical Models of Shape: Optimisation and Evaluation. London, UK: Springer Science & Business Media, 2008.

Figure 4.1. Geodesic patch extraction at a surface point x: (a) sample patches on a human scapula mesh; (b) finding a point x′ at geodesic distance ρ in a direction that makes angle θ with the first principal curvature direction (u) in the tangent space of x; (c) two channels of an input to the CNN, representing geodesic patches where every pixel corresponds to the signed normal distance of a point x′ in the patch to the tangent plane at x.

Figure 4.2. The Siamese network architecture used to learn features for correspondence.

Figure 4.3. The combined network configuration used for adversarial training of the Siamese network.

Figure 4.4. Sample shapes from synthetic and clinical datasets: (a) side view and (b) top view of Bean2, (c) side view and (d) top view of Bean4, and (e) scapula. Left column: controls; right column: patients.

Figure 4.5. Sample feature maps from networks trained using the Bean2 and scapula datasets.

Figure 4.6.
Bean2 results: (a) mean shape from deep feature PSM, (b) variance plot, (c) generalization (mm), and (d) specificity (mm); voxel size = 1 mm.

Figure 4.7. Bean4 results: (a) mean shape from positional PSM, (b) mean shape from deep feature PSM, (c) variance plot, (d) generalization (mm), and (e) specificity (mm); voxel size = 1 mm.

Figure 4.8. Scapula results: (a) mean shape from deep feature PSM, (b) variance plot, (c) generalization (mm), and (d) specificity (mm); voxel size = 0.5 mm.

Figure 4.9. Showcasing the anatomical complexity in sample femur and pelvis shapes.

Figure 4.10. Sample features for the femur shape learned using the Siamese network.

Figure 4.11. Sample pelvis features obtained via supervised training using the initial pelvis correspondence model.

Figure 4.12. Sample features extracted on the pelvis shape using the CNN trained on the femur shape.

Figure 4.13. Sample adapted features extracted for the pelvis shape.

Figure 4.14. Comparing mean shapes from the positional and the three deep feature PSM models. The highlighted regions contain incorrect correspondence points.

Figure 4.15. Comparing mean shapes after adding normals to the positional and deep feature PSM models. The highlighted regions contain incorrect correspondence points.

Figure 4.16. Pelvis results: (a) variance plot, (b) generalization (mm), and (c) specificity (mm); voxel size = 0.5 mm.

CHAPTER 5

CONCLUSION

This dissertation helps improve shape-generative models derived from volumetric and surface-based shape representations. A data-driven generative model is proposed to quantify the uncertainty in object segmentation. Unlike previous methods, the probabilistic solution evolves directly from the evidence rather than relying on ad hoc modeling assumptions. As a result, the proposed method performs better for medical imaging applications involving anatomical segmentations.
The volumetric shape-generative models from this research can help guide the training of deep networks, by generating labeled samples and/or providing contextual information as a shape prior. Recent advances in adversarial training of object segmentation networks are a step in this direction. However, the proposed method may provide more contextual guidance than a regularization based on random noise. The proposed deep learning methods for surface-based shape models help mitigate the manual effort needed to generate successful models for complex anatomies. Reducing this manual effort greatly impacts the throughput of medical research and associated clinical applications. For orthopedic applications, a systematic approach is proposed as a guide to modeling and understanding the shape variations involved with closely interacting anatomies, such as bone joints. With the help of the proposed automated feature learning, more challenging anatomies (such as the sacrum) can be successfully modeled. The requirement of accurate object segmentations is the major bottleneck in current surface-based shape models. Bhalodia et al. [1] have made some attempts to generate shape models directly from intensity images. However, their model relies on supervised learning and thus requires a correspondence model to generate labeled samples. An unsupervised adaptation of shape models from one anatomy to another could be the key to relaxing the need for labeled samples.

5.1 References

[1] R. Bhalodia, S. Y. Elhabian, L. Kavan, and R. T. Whitaker, “DeepSSM: A deep learning framework for statistical shape modeling from raw images,” in The Workshop on Shape in Medical Imaging, 2018, pp. 244–257.



