Comparing optimal cut-points from the Youden index and Euclidean index

School or College	School of Medicine
Department	Public Health Division
Project type	Master of Statistics (MSTAT): Biostatistics Project
Author	Barbeau, William
Title	Comparing optimal cut-points from the Youden index and Euclidean index
Date	2021-04-23
Description	Screening and diagnostic tests based on continuous biomarker measurements have become essential tools in medicine. Designing diagnostic tests based on continuous biomarkers is challenging because the biomarker's distributions in the population with the disease and the population without the disease are rarely completely separated. Overlap in these two distributions means the test will always result in misclassification. The Youden index and Euclidean index (also known as the point closest to (0,1)) are two popular methods for choosing optimal cut-points to maximize sensitivity and specificity of diagnostic tests. Although both methods have been described and deployed in practice, there is little guidance on which method to use and in which situation. Through mathematical derivations, we show in the binormal case that the Euclidean index, relative to the Youden index, yields optimal thresholds with lower absolute differences between sensitivity and specificity, especially when the difference in biomarker variances is large. If developers of diagnostic tests aim to maximize sensitivity and specificity for normally distributed data, then the Euclidean index is the preferred optimal cut-point method.
Type	Text
Publisher	University of Utah
Subject	Statistics; biostatistics
Rights Management	© William Barbeau
ARK	ark:/87278/s6qg4swj
Setname	ir_dph
ID	1703770
OCR Text	COMPARING OPTIMAL CUT-POINTS FROM THE YOUDEN INDEX AND EUCLIDEAN INDEX

by William Barbeau

A project submitted to the faculty of The University of Utah in partial fulfillment of the requirements for the degree of Master of Statistics

Department of Family and Preventive Medicine
The University of Utah
April 2021

Copyright © William Barbeau 2021
All Rights Reserved

The University of Utah Graduate School

STATEMENT OF PROJECT APPROVAL

The project of William Barbeau has been approved by the following supervisory committee members:

Fares Qeadan, Chair(s). Date Approved: 23 Apr 2021
Charlie Casper, Member. Date Approved: 23 Apr 2021
Marlene Egger, Member. Date Approved: 23 Apr 2021

ABSTRACT

Screening and diagnostic tests based on continuous biomarker measurements have become essential tools in medicine. Designing diagnostic tests based on continuous biomarkers is challenging because the biomarker's distributions in the population with the disease and the population without the disease are rarely completely separated. Overlap in these two distributions means the test will always result in misclassification. The Youden index and Euclidean index (also known as the point closest to (0,1)) are two popular methods for choosing optimal cut-points to maximize sensitivity and specificity of diagnostic tests. Although both methods have been described and deployed in practice, there is little guidance on which method to use and in which situation. Through mathematical derivations, we show in the binormal case that the Euclidean index, relative to the Youden index, yields optimal thresholds with lower absolute differences between sensitivity and specificity, especially when the difference in biomarker variances is large. If developers of diagnostic tests aim to maximize sensitivity and specificity for normally distributed data, then the Euclidean index is the preferred optimal cut-point method.

CONTENTS

ABSTRACT
LIST OF FIGURES
CHAPTERS
1. INTRODUCTION
   1.1 Background and Definitions
2. SIMULATIONS
3. MAIN FINDINGS
   3.1 Derivations
       3.1.1 Theoretical Optimal Youden Cut-Point
       3.1.2 Theoretical Optimal Euclidean Cut-Point
   3.2 The Variance Controls Optimal Cut-Points Across Different Means
4. APPLICATIONS AND CONCLUSION
   4.1 Real Data Analysis
   4.2 Future Directions
   4.3 Conclusions
5. BIBLIOGRAPHY

LIST OF FIGURES

1.1 The relationship between the PDFs of X and Y with sensitivity and specificity. Dashed blue line represents the cut-point. Adapted from [7].
2.1 Simulated optimal Youden & Euclidean cut-points. $X \sim N(0, 1)$ and $Y \sim N(1, \sigma^2)$.
2.2 Sensitivity - Specificity for simulated optimal Youden & Euclidean cut-points. $X \sim N(0, 1)$ and $Y \sim N(1, \sigma^2)$.
3.1 Theoretical and simulated optimal Youden and Euclidean cut-points as a function of $\mathrm{Var}[Y]$. $X \sim N(0, 1)$ and $Y \sim N(1, \sigma^2)$.
3.2 Sensitivity and specificity difference for theoretical and simulated optimal Youden and Euclidean cut-points as a function of $\mathrm{Var}[Y]$. $X \sim N(0, 1)$ and $Y \sim N(1, \sigma^2)$.
3.3 Optimal Youden and Euclidean cut-points as a function of $\sigma^2$. $X \sim N(0, 1)$ and $Y \sim N(\mu, \sigma^2)$ where $\mu = \{0.5, 1, 1.5, 2\}$.
3.4 Sensitivity - Specificity for optimal Youden and Euclidean cut-points as a function of $\sigma^2$. $X \sim N(0, 1)$ and $Y \sim N(\mu, \sigma^2)$ where $\mu = \{0.5, 1, 1.5, 2\}$.
4.1 Optimal Youden and Euclidean cut-points (pg/mL) for cytokines that predict metastasized cancer. SD is the difference in population standard deviations. Se=Sensitivity. Sp=Specificity.

CHAPTER 1
INTRODUCTION

Screening tests based on biomarkers have become essential tools in public health. Some of the greatest public health successes in the U.S. are from widespread screening programs [1]. Once penicillin was widely available to the civilian U.S. population, cheap, nontreponemal tests were used to implement national screening programs for syphilis. As a result, the incidence of syphilis dramatically declined from 350 cases per 100,000 in 1941 to fewer than 100 cases per 100,000 in 1963 [2]. Similarly, widespread implementation of Pap smear tests for cervical cancer screening reduced cervical cancer mortality from 5 deaths per 100,000 in 1975 to 2 deaths per 100,000 in 2003 [3].
Test validity is one of the key components for a successful screening program [4]. Screening and diagnostic tests based on continuous biomarker measurements rely on the fact that certain biomarkers have different distributions in diseased vs. non-diseased populations. For example, on average, men with prostate cancer have higher serum concentrations of prostate specific antigen (PSA) than men without prostate cancer [5]. Constructing valid screening and diagnostic tests based on continuous biomarkers is challenging because the biomarker's distributions in the population with the disease and the population without the disease are rarely completely separated. Overlap in these two distributions means the test will always result in misclassification. Constructing valid screening and diagnostic tests from continuous biomarkers requires picking an optimal threshold that minimizes misclassification.

Methods for determining optimal cut-points based on receiver operating characteristic (ROC) curves have been developed and deployed in practice for decades [6]. However, little guidance exists for test developers on which optimal cut-point method to use and when. This paper will compare two popular optimal cut-point methods, the Youden Index and the Euclidean Index (also known as the point closest to (0, 1)). In the binormal case, I prove that the Euclidean Index yields optimal cut-points with smaller absolute differences in sensitivity and specificity, even as the difference in variances between the biomarker's distributions in those with the disease and those without increases. Based on these results, we recommend test developers use the Euclidean Index if they have binormal data and seek to maximize sensitivity and specificity.

1.1 Background and Definitions

Let X denote the random variable for the biomarker distribution among those without the disease. We will represent the CDF of X as $F_X$. Similarly, let Y denote the random variable for the biomarker distribution among those with the disease.
We will represent the CDF of Y as $G_Y$. With this framework, we can define two common measures of test validity, sensitivity and specificity, in terms of the CDFs of X and Y. Sensitivity, also known as the true positive rate, is the probability that a test is positive given the patient has the disease. Similarly, specificity, or the true negative rate, is the probability that a test is negative given the patient does not have the disease. The relationship between the distributions of X and Y and sensitivity and specificity is shown in Figure 1.1 [7].

Figure 1.1 demonstrates the reciprocal nature of sensitivity and specificity. As the cut-point increases, the area under g(y) to the right of the cut-point (sensitivity) decreases, but the area under f(x) to the left of the cut-point (specificity) increases. This reciprocal relationship between sensitivity and specificity makes it challenging to select an optimal cut-point that simultaneously maximizes both.

ROC curves are used to visualize sensitivity and specificity tradeoffs for all possible cut-points. An ROC curve is the set of points $\{(1 - Sp(c), Se(c)) \ \forall\, c\}$. As specificity decreases along the x-axis, sensitivity increases along the y-axis. ROC curves also visualize the overall predictive utility of biomarkers. A biomarker with no predictive utility has a corresponding ROC curve of the 45° line passing through the origin. A biomarker with perfect predictive utility has a corresponding ROC curve consisting of two line segments: the first from the origin to (0, 1), and the second from (0, 1) to (1, 1). At the point (0, 1), sensitivity = specificity = 1, which would be the ideal diagnostic cut-point. In practice, ROC curves lie between these two extremes [8]. Two popular methods for optimal cut-point selection are defined using the ROC curve.
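As an illustration of these definitions, the following minimal Python sketch computes sensitivity, specificity, and ROC points from the two CDFs. It assumes, purely for illustration, that X and Y are normally distributed (the binormal setting this project studies in later chapters). The function names `sens_spec` and `roc_points` and the chosen parameter values are hypothetical and are not part of the original project, which used R. ROC points are returned as (1 - Sp, Se) pairs, so the first coordinate belongs on the x-axis and sensitivity on the y-axis.

```python
from statistics import NormalDist


def sens_spec(c, mu_y=1.0, sigma_y=1.0):
    """Return (Se, Sp) at cut-point c, assuming X ~ N(0, 1) and Y ~ N(mu_y, sigma_y^2)."""
    non_diseased = NormalDist(0.0, 1.0)   # F_X, assumed normal for illustration
    diseased = NormalDist(mu_y, sigma_y)  # G_Y, assumed normal for illustration
    se = 1.0 - diseased.cdf(c)            # P(Y >= c): area under g(y) right of c
    sp = non_diseased.cdf(c)              # P(X < c): area under f(x) left of c
    return se, sp


def roc_points(cuts, mu_y=1.0, sigma_y=1.0):
    """Return the ROC points (1 - Sp(c), Se(c)) for each cut-point in cuts."""
    points = []
    for c in cuts:
        se, sp = sens_spec(c, mu_y, sigma_y)
        points.append((1.0 - sp, se))
    return points


# Cut-points from -4.0 to 6.0 in steps of 0.1 (an arbitrary illustrative grid).
cuts = [i / 10 for i in range(-40, 61)]
points = roc_points(cuts, mu_y=1.0, sigma_y=2.0)
```

Raising the cut-point moves along the curve from near (1, 1) toward (0, 0): sensitivity falls while specificity rises, which is the tradeoff described above.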
The Youden index is defined as,

$$ c^*_{Yu} = \max_c\,[Se + Sp - 1] \tag{1.1} $$

Geometrically, the Youden index is the point on the ROC curve that maximizes the vertical distance between the 45° line and the ROC curve. Intuitively, the Youden Index is the farthest point on the ROC curve from the worst possible performance of a test. The Euclidean index is defined as,

$$ c^*_{Eu} = \min_c\left[\sqrt{(1 - Se)^2 + (1 - Sp)^2}\right] \tag{1.2} $$

Geometrically, the Euclidean index is the point on the ROC curve that minimizes the distance between the point (0, 1) and the ROC curve. Intuitively, the Euclidean Index is the closest point on the ROC curve to the best possible performance of a test.

Figure 1.1. The relationship between the PDFs of X and Y with sensitivity and specificity. Dashed blue line represents the cut-point. Adapted from [7]. (Plot omitted; axes: x and Serum Osmolarity (mOsm/L) versus Density; curves: f(x), without disease, and g(y), with disease; regions: Sp and Se, negative and positive test.)

At first glance, these two methods would appear to yield similar optimal cut-points. Perkins & Schisterman (2006) proved that optimal Youden and Euclidean cut-points are only equivalent if sensitivity equals specificity at the optimal cut-point. Even though these methods yield the same optimal cut-point under particular circumstances, little work has evaluated their performance. Perkins & Schisterman (2006) advocate for the use of the Youden index because it minimizes overall misclassification, a clinically relevant metric. In contrast, the Euclidean index minimizes the square of overall misclassification, which is hard to interpret clinically [9]. Hajian-Tilaki (2018) found through simulation that the Youden index and Euclidean index yield different optimal cut-points when the variances of the biomarker's distributions in those with the disease and those without differ.
Hajian-Tilaki (2018) did not explore how different population variances affected the sensitivity and specificity of the optimal cut-points [10]. I will expand on this work by examining the sensitivity and specificity of optimal cut-points as a function of the difference in variances.

CHAPTER 2
SIMULATIONS

I used Monte Carlo simulations to explore how the difference in variances between the population with the disease and the population without affects the optimal cut-points. Consider a continuous biomarker where larger values are associated with a greater probability of having the disease. We will consider a simulated population size of 100,000 and a disease prevalence of 10%. Let the random variable for the distribution of the biomarker in those without the disease, X, be normally distributed with mean 0 and variance 1. Similarly, let the random variable of the biomarker in those with the disease, Y, be normally distributed with mean 1 and variance $\sigma^2$ with $\sigma^2 \geq 1$. In this framework, we can examine how increasing $\mathrm{Var}[Y]$ affects the optimal cut-points and their corresponding sensitivities and specificities. In the interval $\sigma^2 \in [1, 20]$, I incremented $\sigma^2$ by 0.1. With each $\sigma^2$ increment, new realizations of X and Y were drawn, and the ROC curve, optimal cut-points, and corresponding sensitivities and specificities were computed. Empirical ROC curves for each sample were estimated with the R package pROC [11].

Figure 2.1 summarizes optimal Youden and Euclidean cut-points as $\mathrm{Var}[Y]$ increases. Both the optimal Youden and Euclidean cut-points have a monotonically increasing, concave-down J-shape. Similar to what Hajian-Tilaki (2018) found, the optimal Youden and Euclidean cut-points are similar when the difference in variances is small. However, as $\mathrm{Var}[Y]$ increases, the optimal Youden cut-points increase at a faster rate than the optimal Euclidean cut-points. As a result, the optimal Euclidean cut-points are smaller than the optimal Youden cut-points.
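One step of the simulation described above can be sketched in Python as follows. This is only an illustrative stand-in: the project itself estimated empirical ROC curves with the R package pROC, whereas this sketch uses the Python standard library and a simple sweep over the sorted sample. The function name `simulate_cutpoints`, its arguments, the seed, and the choice of $\sigma^2 = 9$ for the example call are all assumptions introduced here.

```python
import random


def simulate_cutpoints(var_y, mean_y=1.0, n=100_000, prevalence=0.10, seed=0):
    """Empirical Youden and Euclidean cut-points from one simulated population."""
    rng = random.Random(seed)
    n_dis = round(n * prevalence)   # diseased subjects, Y ~ N(mean_y, var_y)
    n_non = n - n_dis               # non-diseased subjects, X ~ N(0, 1)
    sd_y = var_y ** 0.5
    sample = [(rng.gauss(0.0, 1.0), False) for _ in range(n_non)]
    sample += [(rng.gauss(mean_y, sd_y), True) for _ in range(n_dis)]
    sample.sort()                   # candidate cut-points in increasing order

    best_youden = None              # (J, cut, Se, Sp)
    best_euclid = None              # (distance to (0, 1), cut, Se, Sp)
    non_below = dis_below = 0       # subjects strictly below the current cut
    for value, diseased in sample:
        # Decision rule: the test is positive when the biomarker is >= value.
        sp = non_below / n_non
        se = 1.0 - dis_below / n_dis
        j = se + sp - 1.0
        dist = ((1.0 - se) ** 2 + (1.0 - sp) ** 2) ** 0.5
        if best_youden is None or j > best_youden[0]:
            best_youden = (j, value, se, sp)
        if best_euclid is None or dist < best_euclid[0]:
            best_euclid = (dist, value, se, sp)
        if diseased:
            dis_below += 1
        else:
            non_below += 1

    keys = ("criterion", "cut", "se", "sp")
    return {"youden": dict(zip(keys, best_youden)),
            "euclidean": dict(zip(keys, best_euclid))}


# One illustrative draw with Var[Y] = 9 (an arbitrary value inside [1, 20]).
result = simulate_cutpoints(var_y=9.0, seed=1)
```

Repeating the call for each $\sigma^2$ in [1, 20] in steps of 0.1 would trace out curves of the kind summarized in Figure 2.1. Because each call is a single random draw, the empirical cut-points fluctuate around their population values.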
Figure 2.2 shows the difference between sensitivity and specificity for the optimal Youden and Euclidean cut-points as $\mathrm{Var}[Y]$ increases. The difference in sensitivity and specificity for both the optimal Youden and Euclidean cut-points has a monotonically decreasing, concave-up J-shape. Since the optimal cut-points are similar when $\mathrm{Var}[Y] \approx \mathrm{Var}[X]$, the absolute differences between sensitivity and specificity are also similar for optimal Youden and Euclidean cut-points. However, as $\mathrm{Var}[Y]$ increases, the magnitude of the sensitivity/specificity difference increases much faster for the optimal Youden cut-point. As a result, the optimal Euclidean cut-points have a smaller absolute difference between sensitivity and specificity, indicating a better balance between the two.

Figure 2.1. Simulated optimal Youden & Euclidean cut-points. $X \sim N(0, 1)$ and $Y \sim N(1, \sigma^2)$. (Plot omitted; x-axis: Variance; y-axis: Optimal Cut-Points; series: Euclidean, Youden.)

Figure 2.2. Sensitivity - Specificity for simulated optimal Youden & Euclidean cut-points. $X \sim N(0, 1)$ and $Y \sim N(1, \sigma^2)$. (Plot omitted; x-axis: Variance; y-axis: Se - Sp; series: Euclidean, Youden.)

CHAPTER 3
MAIN FINDINGS

3.1 Derivations

Both of the curves in Figure 2.2 are reminiscent of an exponential curve. With this observation, we sought to derive the mathematical relationship between $\mathrm{Var}[Y]$ and $Se - Sp$. As above, we assume X, the distribution of the biomarker in those without the disease, to be normal with mean 0 and variance 1. We also assume Y, the distribution of the biomarker in those with the disease, to be normal with mean 1 and variance $\sigma^2$. With these assumptions, sensitivity and specificity can be expressed as functions of normal CDFs.

$$ Se = P(\text{test}+ \mid \text{diseased}) = P(Y \geq c) = 1 - G_Y(c) \tag{3.1} $$

$$ Sp = P(\text{test}- \mid \text{non-diseased}) = P(X < c) = F_X(c) \tag{3.2} $$

where $F_X$ is the CDF of X, and $G_Y$ is the CDF of Y.

3.1.1 Theoretical Optimal Youden Cut-Point

Substituting the above expressions for sensitivity and specificity into equation 1.1, the Youden index can be defined as:

$$ c^*_{Yu} = \max_c\,[Se + Sp - 1] = \max_c\,[1 - G_Y(c) + F_X(c) - 1] $$

$$ c^*_{Yu} = \max_c\,[F_X(c) - G_Y(c)] \tag{3.3} $$

Since the normal CDF is differentiable, the maximum can be found by taking the first derivative with respect to c and solving for critical points.

$$
\begin{aligned}
\frac{\partial}{\partial c}\left(F_X(c) - G_Y(c)\right) = f_X(c) - g_Y(c) &= 0 \\
\frac{1}{\sqrt{2\pi}}\, e^{-\frac{c^2}{2}} - \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(c-1)^2}{2\sigma^2}} &= 0 \\
e^{-\frac{c^2}{2}} &= \frac{1}{\sigma}\, e^{-\frac{(c-1)^2}{2\sigma^2}} \\
-\frac{c^2}{2} &= \ln\frac{1}{\sigma} - \frac{(c-1)^2}{2\sigma^2} \\
-\frac{c^2}{2} &= \ln\frac{1}{\sigma} - \frac{c^2 - 2c + 1}{2\sigma^2} \\
-\frac{c^2}{2} + \frac{c^2}{2\sigma^2} - \frac{c}{\sigma^2} + \frac{1}{2\sigma^2} &= \ln\frac{1}{\sigma} \\
\frac{1 - \sigma^2}{2\sigma^2}\, c^2 - \frac{c}{\sigma^2} + \frac{1}{2\sigma^2} - \ln\frac{1}{\sigma} &= 0
\end{aligned}
$$

Using the quadratic formula, we can find the optimal cut-point for the Youden index, $c^*_{Yu}$, as a function of $\sigma$.

$$ c^*_{Yu} = \frac{\frac{1}{\sigma^2} \pm \sqrt{\left(\frac{-1}{\sigma^2}\right)^2 - 4\left(\frac{1 - \sigma^2}{2\sigma^2}\right)\left(\frac{1}{2\sigma^2} - \ln\frac{1}{\sigma}\right)}}{\frac{1 - \sigma^2}{\sigma^2}} \tag{3.4} $$

Of the two roots, the optimal cut-point is the one that maximizes $F_X(c) - G_Y(c)$. When $\sigma = 1$, the quadratic term vanishes and the critical-point equation reduces to $c = 1/2$. With the closed form solution of the optimal Youden cut-point known, the difference between sensitivity and specificity can be written as,

$$ Se - Sp = 1 - G_Y(c^*_{Yu}) - F_X(c^*_{Yu}) $$

$$ Se - Sp = 1 - \Phi\left(\frac{c^*_{Yu} - 1}{\sigma}\right) - \Phi(c^*_{Yu}) \tag{3.5} $$

where $\Phi()$ is the standard normal CDF.

3.1.2 Theoretical Optimal Euclidean Cut-Point

The Euclidean index can also be expressed as a function of normal CDFs by using equations 1.2, 3.1, and 3.2. Because the square root is monotone, the cut-point that minimizes the distance also minimizes the squared distance.

$$
\begin{aligned}
c^*_{Eu} &= \min_c\left[\sqrt{(1 - Se)^2 + (1 - Sp)^2}\right] \\
&= \min_c\left[(1 - Se)^2 + (1 - Sp)^2\right] \\
&= \min_c\left[(1 - (1 - G_Y(c)))^2 + (1 - F_X(c))^2\right] \\
&= \min_c\left[G_Y(c)^2 + (1 - F_X(c))^2\right]
\end{aligned}
$$

Since the normal CDF is differentiable, the minimum can be found by taking the first derivative with respect to c and solving for critical points.

$$
\begin{aligned}
\frac{\partial}{\partial c}\left(G_Y(c)^2 + (1 - F_X(c))^2\right) = 2\,G_Y(c)\,g_Y(c) - 2\,(1 - F_X(c))\,f_X(c) &= 0 \\
g_Y(c)\,\Phi\left(\frac{c - 1}{\sigma}\right) - f_X(c) + f_X(c)\,\Phi(c) &= 0
\end{aligned}
$$

The optimal Euclidean cut-point does not have a closed form solution. The optimal Euclidean cut-point can be estimated numerically by minimizing equation 1.2 [12].
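The two derivations above can be checked numerically with a short Python sketch. It evaluates the closed-form Youden cut-point of equation 3.4 (keeping whichever root maximizes $F_X(c) - G_Y(c)$) and estimates the Euclidean cut-point by minimizing the squared form of equation 1.2. The source cites Brent's method [12] for the numerical step; the sketch substitutes a plain golden-section search, which is a simpler stand-in and not the author's implementation. The function names, the search bracket [0, 3], and the example value $\sigma = 2$ are assumptions made for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution, used for Phi


def se_sp(c, sigma):
    """Se and Sp at cut-point c for X ~ N(0, 1), Y ~ N(1, sigma^2) (eqs. 3.1, 3.2)."""
    se = 1.0 - STD_NORMAL.cdf((c - 1.0) / sigma)
    sp = STD_NORMAL.cdf(c)
    return se, sp


def se_minus_sp(c, sigma):
    """Difference Se - Sp at cut-point c (the form used in eq. 3.5)."""
    se, sp = se_sp(c, sigma)
    return se - sp


def youden_cut(sigma):
    """Closed-form Youden cut-point from eq. 3.4."""
    if math.isclose(sigma, 1.0):
        return 0.5  # quadratic coefficient is zero; the equation reduces to c = 1/2
    s2 = sigma * sigma
    a = (1.0 - s2) / (2.0 * s2)
    b = -1.0 / s2
    k = 1.0 / (2.0 * s2) - math.log(1.0 / sigma)
    root = math.sqrt(b * b - 4.0 * a * k)
    candidates = [(-b + root) / (2.0 * a), (-b - root) / (2.0 * a)]
    # Keep the critical point that maximizes J = Se + Sp - 1.
    return max(candidates, key=lambda c: sum(se_sp(c, sigma)) - 1.0)


def euclidean_cut(sigma, lo=0.0, hi=3.0, tol=1e-8):
    """Golden-section minimization of (1 - Se)^2 + (1 - Sp)^2 on [lo, hi].

    The bracket [0, 3] is an assumption suited to sigma >= 1 with mean 1.
    """
    def dist2(c):
        se, sp = se_sp(c, sigma)
        return (1.0 - se) ** 2 + (1.0 - sp) ** 2

    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    x1 = hi - ratio * (hi - lo)
    x2 = lo + ratio * (hi - lo)
    f1, f2 = dist2(x1), dist2(x2)
    while hi - lo > tol:
        if f1 < f2:   # minimum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - ratio * (hi - lo)
            f1 = dist2(x1)
        else:         # minimum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + ratio * (hi - lo)
            f2 = dist2(x2)
    return 0.5 * (lo + hi)


sigma = 2.0  # illustrative value, i.e. Var[Y] = 4
c_yu = youden_cut(sigma)
c_eu = euclidean_cut(sigma)
```

Comparing `se_minus_sp(c_yu, sigma)` with `se_minus_sp(c_eu, sigma)` gives the two quantities whose magnitudes are contrasted throughout this chapter.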
With the numerically estimated optimal Euclidean cut-point, the difference between sensitivity and specificity can be written as,

$$ Se - Sp = 1 - \Phi\left(\frac{c^*_{Eu} - 1}{\sigma}\right) - \Phi(c^*_{Eu}) \tag{3.6} $$

Figure 3.1 shows that the theoretical optimal Youden and Euclidean cut-points match the simulated cut-points. Notably, the optimal Euclidean cut-points increase at a slower rate than the optimal Youden cut-points. Similarly, the theoretical difference in sensitivity and specificity matches the simulated difference (Figure 3.2). This confirms in the binormal case that as the difference in variance between X and Y increases, the optimal Euclidean cut-points have a smaller absolute difference between sensitivity and specificity than the optimal Youden cut-points.

Figure 3.1. Theoretical and simulated optimal Youden and Euclidean cut-points as a function of $\mathrm{Var}[Y]$. $X \sim N(0, 1)$ and $Y \sim N(1, \sigma^2)$. (Plot omitted; x-axis: Variance; y-axis: Optimal Cut-Points; series: Simulated Euclidean, Simulated Youden, Theoretical Euclidean, Theoretical Youden.)

3.2 The Variance Controls Optimal Cut-Points Across Different Means

So far we have demonstrated that the difference in variances of the biomarker's distributions in those without the disease and those with the disease controls the optimal cut-points of the Youden and Euclidean index. Next, we wanted to explore whether this relationship is preserved as the difference in means between these distributions changes. Again, $X \sim N(0, 1)$ and $Y \sim N(\mu, \sigma^2)$. As before, the optimal cut-points and their difference in sensitivity and specificity were simulated and computed using the derived formulas as a function of $\mathrm{Var}[Y]$. This was done in the four cases where $\mu = \{0.5, 1, 1.5, 2\}$. As expected, increasing the difference in means between X and Y increased optimal cut-points for both the Youden and Euclidean index (Figure 3.3). Based on the graphs, as the mean increases, the difference between the Youden and Euclidean optimal cut-points decreases. This change in the optimal cut-points also changes their sensitivity and specificity (Figure 3.4).
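The comparison across means can be explored with a short Python sketch. Since equation 3.4 as derived above is specific to $\mu = 1$, this sketch does not use a closed form; instead it locates both optimal cut-points by a dense grid search over candidate cut-points, which is an assumption of this illustration rather than the method used in the project. The function name `optimal_cuts`, the grid range [-2, 6], the step size, and the example value $\sigma^2 = 10$ are likewise hypothetical.

```python
from statistics import NormalDist


def optimal_cuts(mu, var_y, lo=-2.0, hi=6.0, step=0.001):
    """Grid-search Youden and Euclidean cut-points for X ~ N(0, 1), Y ~ N(mu, var_y).

    Returns {"youden": (cut, Se - Sp), "euclidean": (cut, Se - Sp)}.
    """
    non_diseased = NormalDist(0.0, 1.0)
    diseased = NormalDist(mu, var_y ** 0.5)
    best_youden = None   # (J, cut, Se - Sp)
    best_euclid = None   # (squared distance to (0, 1), cut, Se - Sp)
    n_steps = int(round((hi - lo) / step))
    for i in range(n_steps + 1):
        c = lo + i * step
        se = 1.0 - diseased.cdf(c)
        sp = non_diseased.cdf(c)
        j = se + sp - 1.0
        d2 = (1.0 - se) ** 2 + (1.0 - sp) ** 2
        if best_youden is None or j > best_youden[0]:
            best_youden = (j, c, se - sp)
        if best_euclid is None or d2 < best_euclid[0]:
            best_euclid = (d2, c, se - sp)
    return {"youden": best_youden[1:], "euclidean": best_euclid[1:]}


# The four means considered in this section, at one illustrative variance.
table = {mu: optimal_cuts(mu, var_y=10.0) for mu in (0.5, 1.0, 1.5, 2.0)}
```

Each entry of `table` holds the cut-point and the corresponding Se - Sp for both criteria, so the two methods can be compared mean by mean at a fixed variance.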
The Euclidean index still has a smaller absolute difference between sensitivity and specificity across all variances. However, as the mean increases, $|Se - Sp|$ for the Youden index decreases.

Figure 3.2. Sensitivity and specificity difference for theoretical and simulated optimal Youden and Euclidean cut-points as a function of $\mathrm{Var}[Y]$. $X \sim N(0, 1)$ and $Y \sim N(1, \sigma^2)$. (Plot omitted; x-axis: Variance; y-axis: Se - Sp; series: Simulated Euclidean, Simulated Youden, Theoretical Euclidean, Theoretical Youden.)

Figure 3.3. Optimal Youden and Euclidean cut-points as a function of $\sigma^2$. $X \sim N(0, 1)$ and $Y \sim N(\mu, \sigma^2)$ where $\mu = \{0.5, 1, 1.5, 2\}$. (Plot omitted; panels: Mean 0.5, Mean 1, Mean 1.5, Mean 2; x-axis: Variance; y-axis: Optimal Cut-Points; series: Simulated Euclidean, Simulated Youden, Theoretical Euclidean, Theoretical Youden.)

Figure 3.4. Sensitivity - Specificity for optimal Youden and Euclidean cut-points as a function of $\sigma^2$. $X \sim N(0, 1)$ and $Y \sim N(\mu, \sigma^2)$ where $\mu = \{0.5, 1, 1.5, 2\}$. (Plot omitted; panels: Mean 0.5, Mean 1, Mean 1.5, Mean 2; x-axis: Variance; y-axis: Se - Sp; series: Simulated Euclidean, Simulated Youden, Theoretical Euclidean, Theoretical Youden.)

CHAPTER 4
APPLICATIONS AND CONCLUSION

4.1 Real Data Analysis

To illustrate the differences between optimal Youden and Euclidean cut-points, we will use an example of identifying cytokines predictive of metastasized cancer using data from [13]. In the study, 27 patients underwent lymph node biopsies.
Lymph node malignancy was ascertained by histology, and multiplex immunoassay was used to detect and quantify the concentration of 34 cytokines. Of the 27 patients, 15 had malignant lymph nodes, and 12 had benign lymph nodes. Of the 34 cytokines, 6 were found to be associated with malignant lymph nodes, which indicates their potential use in screening for metastasized cancer. The optimal Youden and Euclidean cut-points were estimated for each of the 6 potential cytokines using the binormal assumption and equations 3.4 & 1.2. After assessing the histograms, the cytokine measurements were log transformed to satisfy the binormal assumption.

Figure 4.1 shows the estimated optimal Youden and Euclidean cut-points for each of the 6 cytokines. The standard deviation of the log of the angiopoietin-2 cytokine concentration is the same in those with benign lymph nodes and those with malignant lymph nodes. As expected, the optimal Youden and Euclidean cut-points are the same for angiopoietin-2. For the 5 other cytokines, the standard deviations are different in the two populations and yield different optimal cut-points. Also as expected (Figure 3.2), specificity is greater than sensitivity at both optimal cut-points for these 5 cytokines. IL-6 and uPA have the greatest standard deviation differences and the worst balance between sensitivity and specificity for their optimal cut-points. For IL-6 and uPA, the Euclidean optimal cut-point has a better balance between sensitivity and specificity. The Euclidean cut-point is strikingly better for uPA. The sensitivity and specificity for the Youden cut-point are 0.467 and 1.0, respectively, whereas the sensitivity and specificity for the Euclidean cut-point are 0.667 and 1.0, respectively. With the same specificity, the Euclidean cut-point has 0.2 greater sensitivity than the Youden cut-point.
This example highlights the dramatic difference between optimal Youden and Euclidean cut-points when there is a large difference in the biomarker variance in those without the disease and those with.

                          Youden                      Euclidean
Cytokine         SD       Cut-Point  Se     Sp        Cut-Point  Se     Sp
Angiopoietin-2   0        6.791      0.933  0.917     6.791      0.933  0.917
sVEGFR-1         0.320    7.724      0.867  1.000     7.708      0.867  1.000
PLGF             0.517    4.522      0.800  1.000     4.225      0.800  0.917
VEGF-A           0.757    7.537      0.800  1.000     7.550      0.800  1.000
IL-6             0.947    3.948      0.667  1.000     3.834      0.800  1.000
uPA              1.218    6.176      0.467  1.000     5.809      0.667  1.000

Figure 4.1. Optimal Youden and Euclidean cut-points (pg/mL) for cytokines that predict metastasized cancer. SD is the difference in population standard deviations. Se=Sensitivity. Sp=Specificity.

4.2 Future Directions

This study found that the optimal Euclidean cut-point had a smaller absolute difference in sensitivity and specificity than the optimal Youden cut-point in the binormal case. This raises the question: do these findings generalize? Addressing this question will require exploring different distributional families (e.g., logistic or gamma). Additionally, this study assumed that the biomarker distributions in those without the disease and those with are of the same family, an assumption that may not be valid in practice. It will be important to examine whether mixtures of the biomarker distributions affect optimal cut-points.

We found the optimal Euclidean cut-point to be uniformly better at maximizing sensitivity and specificity even across varying population means. This result may not hold across different distributional families, warranting further investigation of the effect of the population means on the behavior of optimal cut-points. Exploring the surface of optimal cut-points as a multivariable function of the population means and variances will help address this aim.

4.3 Conclusions

In this paper, we derived the optimal Youden cut-point in the binormal case.
Additionally, we showed that the optimal Youden cut-point is greater than the optimal Euclidean cut-point, and that this difference becomes more pronounced as the difference in biomarker variance between those without the disease and those with increases. Due to the difference in optimal cut-points, the Euclidean index yields optimal cut-points with smaller absolute differences between sensitivity and specificity. Based on these findings, we recommend use of the Euclidean index to compute the optimal cut-point of diagnostic tests if the goal is to produce a test that maximizes sensitivity and specificity.

BIBLIOGRAPHY

[1] A. Morabia and F. F. Zhang. "History of medical screening: from concepts to action". In: Postgraduate Medical Journal 80.946 (Aug. 2004), pp. 463-469. issn: 0032-5473. doi: 10.1136/pgmj.2003.018226.
[2] CDC. Syphilis - 2018 Sexually Transmitted Diseases Surveillance. Oct. 8, 2019. url: https://www.cdc.gov/std/stats18/syphilis.htm (visited on 04/14/2021).
[3] CDC. CDC Vitalsign: Cervical Cancer is Preventable. Centers for Disease Control and Prevention. Jan. 6, 2020. url: https://www.cdc.gov/vitalsigns/cervical-cancer/index.html (visited on 04/12/2021).
[4] Ann Aschengrau and George R. Seage. Essentials of epidemiology in public health. 3rd ed. OCLC: ocn826123155. Burlington, MA: Jones & Bartlett Learning, 2014. 526 pp. isbn: 978-1-4496-5733-8 978-1-284-02891-1.
[5] Hans Lilja, David Ulmert, and Andrew J. Vickers. "Prostate-specific antigen and prostate cancer: prediction, detection and monitoring". In: Nature Reviews. Cancer 8.4 (Apr. 2008), pp. 268-278. issn: 1474-1768. doi: 10.1038/nrc2351.
[6] W. J. Youden. "Index for rating diagnostic tests". In: Cancer 3.1 (Jan. 1950), pp. 32-35. issn: 0008-543X. doi: 10.1002/1097-0142(1950)3:1<32::aid-cncr2820030106>3.0.co;2-3.
[7] Farrokh Habibzadeh, Parham Habibzadeh, and Mahboobeh Yadollahie. "On determining the most appropriate test cut-off value: the case of tests with continuous results". In: Biochemia medica 26.3 (2016). Publisher: Medicinska naklada, pp. 297-307.
[8] Xiao-hua Zhou, Donna K. McClish, and Nancy A. Obuchowski. Statistical methods in diagnostic medicine. 2nd ed. Wiley series in probability and statistics. Hoboken, N.J: Wiley, 2011. 545 pp. isbn: 978-0-470-18314-4.
[9] Neil J. Perkins and Enrique F. Schisterman. "The Inconsistency of Optimal Cutpoints Obtained using Two Criteria based on the Receiver Operating Characteristic Curve". In: American Journal of Epidemiology 163.7 (Apr. 1, 2006), pp. 670-675. issn: 1476-6256, 0002-9262. doi: 10.1093/aje/kwj063. url: http://academic.oup.com/aje/article/163/7/670/77813/The-Inconsistency-of-Optimal-Cutpoints-Obtained (visited on 04/16/2021).
[10] Karimolla Hajian-Tilaki. "The choice of methods in determining the optimal cut-off value for quantitative diagnostic test evaluation". In: Statistical Methods in Medical Research 27.8 (Aug. 2018), pp. 2374-2383. issn: 1477-0334. doi: 10.1177/0962280216680383.
[11] Xavier Robin et al. "pROC: an open-source package for R and S+ to analyze and compare ROC curves". In: BMC Bioinformatics 12.1 (Dec. 2011), p. 77. issn: 1471-2105. doi: 10.1186/1471-2105-12-77. url: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-12-77 (visited on 04/16/2021).
[12] Richard P Brent. Algorithms for minimization without derivatives. Courier Corporation, 2013.
[13] Ali I Saeed et al. "A novel cytokine profile associated with cancer metastasis to mediastinal and hilar lymph nodes identified using fine needle aspiration biopsy - A pilot study". In: Cytokine 89 (2017). Publisher: Elsevier, pp. 98-104.
Reference URL	https://collections.lib.utah.edu/ark:/87278/s6qg4swj