Original Contribution
Section Editors: Clare Fraser, MD; Susan Mollan, MD

Deep Learning System Outperforms Clinicians in Identifying Optic Disc Abnormalities

Caroline Vasseneix, MD, Simon Nusinovici, PhD, Xinxing Xu, PhD, Jeong-Min Hwang, MD, Steffen Hamann, MD, PhD, John J. Chen, MD, PhD, Jing Liang Loo, MBBS, MMed, FRCS(Ed), Leonard Milea, BA, Kenneth B.K. Tan, MBBS, MMed EM, Daniel S.W. Ting, MD, PhD, Yong Liu, PhD, Nancy J. Newman, MD, Valerie Biousse, MD, Tien Ying Wong, MD, PhD, Dan Milea, MD, PhD, Raymond P. Najjar, PhD, for the BONSAI (Brain and Optic Nerve Study With Artificial Intelligence) Group

Background: The examination of the optic nerve head (optic disc) is mandatory in patients with headache, hypertension, or any neurological symptoms, yet it is rarely or poorly performed in general clinics. We recently developed the Brain and Optic Nerve Study with Artificial Intelligence deep learning system (BONSAI-DLS), capable of accurately detecting optic disc abnormalities, including papilledema (swelling due to elevated intracranial pressure), on digital fundus photographs, with a classification performance comparable to that of expert neuro-ophthalmologists; its performance relative to first-line clinicians, however, remains unknown.

Methods: In this international, cross-sectional, multicenter study, the DLS, trained on 14,341 fundus photographs, was tested on a retrospectively collected convenience sample of 800 photographs (400 normal optic discs, 201 papilledema, and 199 other abnormalities) from 454 patients with a robust ground-truth diagnosis provided by the referring expert neuro-ophthalmologists. The areas under the receiver-operating-characteristic curves were calculated for the BONSAI-DLS. Error rates, accuracy, sensitivity, and specificity of the algorithm were compared with those of 30 clinicians with or without ophthalmic training (6 general ophthalmologists, 6 optometrists, 6 neurologists, 6 internists, and 6 emergency department [ED] physicians) who graded the same testing set of images.

Affiliations: Visual Neuroscience Group (CV, SN, DT, TYW, DM, RPN), Singapore Eye Research Institute, Singapore; Duke-NUS Medical School (DT, TYW, DM, RPN), National University of Singapore, Singapore; Institute of High Performance Computing (XX, YL), Agency for Science, Technology and Research (A*STAR), Singapore; Department of Ophthalmology (J-MH), Seoul National University College of Medicine, Seoul National University Bundang Hospital, Seongnam-si, Korea (the Republic of); Department of Ophthalmology (SH), Rigshospitalet, University of Copenhagen, Copenhagen, Denmark; Departments of Ophthalmology and Neurology (JJC), Mayo Clinic, Rochester, Minnesota; Singapore National Eye Centre (JLL, DT, TYW, DM), Singapore; Berkeley University (LM), Berkeley, California; Department of Emergency Medicine (KT), Singapore General Hospital, Singapore; Departments of Ophthalmology, Neurology and Neurological Surgery (NJN, VB), Emory University School of Medicine, Atlanta, Georgia; and Department of Ophthalmology (RPN), Yong Loo Lin School of Medicine, National University of Singapore, Singapore.

Supported by the Singapore National Medical Research Council Clinician Scientist Individual Research grant (CIRG18Nov-0013) and the Duke-NUS Medical School Ophthalmology and Visual Sciences Academic Clinical Program grant (05/FY2019/P2/06-A60). The authors report no conflicts of interest. Supplemental digital content is available for this article. Direct URL citations appear in the printed text and are provided in the HTML and PDF versions of this article on the journal's Web site (www.jneuro-ophthalmology.com). BONSAI group members are listed in the acknowledgments. C. Vasseneix, D. Milea, and R. P. Najjar contributed equally.

Address correspondence to Dan Milea, MD, PhD, Singapore Eye Research Institute, 20 College Road, Discovery Tower Level 6, The Academia, 169856, Singapore; E-mail: dan.milea@singhealth.com.sg
Results: With an error rate of 15.3%, the DLS outperformed all clinicians (average error rates 24.4%, 24.8%, 38.2%, 44.8%, and 47.9% for general ophthalmologists, optometrists, neurologists, internists, and ED physicians, respectively) in the overall classification of optic disc appearance. The DLS displayed significantly higher accuracies than 100%, 86.7%, and 93.3% of clinicians (n = 30) for the classification of papilledema, normal discs, and other disc abnormalities, respectively.

Conclusions: The performance of the BONSAI-DLS in classifying optic discs on fundus photographs was superior to that of clinicians with or without ophthalmic training. A trained DLS may offer valuable diagnostic aid to clinicians in various clinical settings for the screening of optic disc abnormalities harboring potentially sight- or life-threatening neurological conditions.

Journal of Neuro-Ophthalmology 2023;43:159–167
doi: 10.1097/WNO.0000000000001800
© 2023 by North American Neuro-Ophthalmology Society

Medical imaging has greatly improved the screening and diagnosis of various diseases, including in ophthalmology. Moreover, the recent emergence of artificial intelligence (AI) is expected to provide valuable additional aid to physicians of various specialties for the interpretation of medical images (1). Several studies have compared AI, and more specifically deep learning (DL), with human clinicians in the interpretation of medical imaging, in order to evaluate the applicability of these tools in real-world settings and to assess how DL algorithms could affect clinicians' decision-making process (2,3).
In neuro-ophthalmology, because experts are too few and not always readily available (4), ocular imaging has recently emerged as an alternative for the screening and diagnosis of optic nerve head (optic disc) abnormalities caused by various conditions, including life- or sight-threatening diseases such as cerebral space-occupying lesions and various optic neuropathies, or manifestations of systemic diseases such as malignant hypertension (5,6). However, even when fundus photographs are available, the detection and classification of optic disc abnormalities by non-ophthalmology-trained physicians remain suboptimal (7), and remote interpretation by neuro-ophthalmologists via telemedicine is usually necessary, but not always effective (8). Within an international consortium, BONSAI (Brain and Optic Nerve Study with Artificial Intelligence) (9), we recently developed a deep learning system (BONSAI-DLS) capable of discriminating with high accuracy among normal optic discs, discs with papilledema (optic disc swelling secondary to raised intracranial pressure), and optic discs displaying other abnormalities (e.g., congenital abnormalities, optic disc drusen, inflammatory, ischemic, or atrophic changes) on ocular fundus photographs. The BONSAI-DLS classified optic disc appearance on ocular fundus photographs with a performance similar to that of expert neuro-ophthalmologists (overall error rate of 15.3% [122/800] for the DLS vs 15.6% [125/800] for expert 1 and 19.9% [159/800] for expert 2) (10). To date, no study has compared the performance of a DLS with that of health care providers who are not trained in ophthalmology but who are expected to diagnose and manage, in the first line, patients with optic disc abnormalities caused by neurological or systemic diseases (e.g., neurologists, internists, and emergency department physicians) (11).
The aim of this study was to compare the performance of the BONSAI-DLS with that of various clinicians, with and without ophthalmic training, in classifying optic disc appearance on the same set of fundus photographs previously used to compare the DLS with expert neuro-ophthalmologists (10).

METHODS

Study Design

In this cross-sectional multicenter study, we used a sample of fundus photographs retrospectively collected from the BONSAI Consortium, previously used to compare the BONSAI-DLS with expert neuro-ophthalmologists (10), to evaluate and compare the classification performance of the BONSAI-DLS and of 30 clinicians with and without ophthalmic training. The BONSAI Consortium's objective was to develop, train, and test the performance of the BONSAI-DLS in classifying optic discs as normal, displaying features of papilledema, or displaying other optic disc abnormalities (9). The study was approved by the Centralized Institutional Review Board (CIRB) of SingHealth, Singapore, and by each contributing institution, and was conducted in accordance with the Declaration of Helsinki. Informed consent was exempted given the retrospective nature of the study and the use of de-identified medical information and ocular fundus photographs.

Deep Learning System, Eligibility Criteria, Reference Standard and Retinal Imaging

The BONSAI-DLS is described in detail elsewhere (9,10) (see Supplemental Digital Content, Appendix 1, http://links.lww.com/WNO/A670). De-identified standard digital ocular fundus photographs of patients of different ethnicities with normal optic discs, and of patients with definite neuro-ophthalmic diagnoses affecting the optic discs (including papilledema of various severities), were provided by neuro-ophthalmology experts from 3 international reference centers of the BONSAI study (Copenhagen, Denmark; Mayo Clinic, Rochester, USA; and Seoul, South Korea) (10).
The photographs were not used in the training of the BONSAI-DLS and were taken after pupil dilation, using various digital desktop cameras (10). The reference standard and inclusion criteria were described in detail elsewhere (10). All fundus photographs provided by the 3 participating centers were reviewed by 2 neuro-ophthalmologists (D.M. and C.V.); after exclusion of images of poor quality or with incomplete data or mislabeling, a convenience sample of 800 of 1,347 fundus photographs (half normal and half abnormal optic discs) from 454 patients (346 patients with both eyes imaged and 108 singletons with 1 eye imaged) (10) was used (Table 1).

Clinicians' Selection and Testing Procedure

Thirty clinicians of different levels of expertise from 5 different medical specialties participated in this study, all practicing in the Singapore General Hospital and the Singapore National Eye Centre, Singapore. Twelve clinicians had ophthalmic training: 6 general ophthalmologists and 6 optometrists. Eighteen clinicians were physicians without ophthalmic training, naive to fundus photograph reading but with some experience in optic disc examination: 6 neurologists, 6 internists, and 6 ED physicians. None of them had fellowship-level training or specific expertise in neuro-ophthalmology.

TABLE 1.
Demographics and characteristics of patients and imaged eyes per center

                                          Copenhagen     Rochester     Seoul         Overall
Mean age ± SD (yr)                        40.1 ± 16.9    50.9 ± 18.6   43.0 ± 21.2   44.5 ± 20.1
Female sex (%)                            73.7           62.6          53.0          60.1
Race (%)
  Asian                                   0              0             99.6          48.7
  Caucasian                               100            98.4          0.4           50.9
  Other                                   0              1.6           0             0.4
Number of patients (imaged eyes)
  Normal optic discs                      46 (86)        58 (92)       131 (222)     235 (400)
  Papilledema                             23 (45)        26 (50)       53 (106)      102 (201)
  Other OD abnormalities                  30 (57)        39 (66)       48 (76)       117 (199)
  Total patients (eyes)                   99 (188)       123 (208)     232 (404)     454 (800)
Type of other OD abnormalities (per eye)
  Optic atrophy                           18             28            26            72
  Optic disc drusen                       24             11            17            52
  NAION                                   3              16            20            39
  Other OD swelling                       4              2             10            16
  Congenital anomalous OD                 3              7             2             12
  Other                                   5              2             1             8

NAION, nonarteritic ischemic optic neuropathy; OD, optic disc.

A semi-automated software (12) was used to randomly display the 800 fundus photographs on the same individual computer screen (LG-34WK650, 100% brightness, 80% contrast) and to combine the gradings provided by the clinicians in an Excel sheet, as previously done for the expert neuro-ophthalmologists (10). Before being tested, all clinicians received a short training session on how to use the software, on 50 fundus photographs illustrating all categories of optic disc findings. Subsequently, without any clinical information, each clinician was independently given the fundus photographs and asked to classify the optic discs as normal, papilledema, or other optic disc abnormalities. The time spent to complete the classification of the 800 fundus photographs was recorded for each clinician.
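The grading procedure above reduces each photograph to one of three labels, which are then scored against the reference standard as an overall multiclass error rate and as one-vs-rest sensitivity, specificity, and accuracy (see Statistical Analyses). A minimal Python sketch of these computations, using hypothetical label lists (the study itself performed its analyses in R with the "DTComPair" package):

```python
def multiclass_error_rate(truth, graded):
    """Fraction of photographs whose three-way label disagrees with the reference standard."""
    wrong = sum(t != g for t, g in zip(truth, graded))
    return wrong / len(truth)

def one_vs_rest(truth, graded, positive):
    """Sensitivity, specificity, and accuracy for one class treated as positive vs the rest."""
    tp = sum(t == positive and g == positive for t, g in zip(truth, graded))
    tn = sum(t != positive and g != positive for t, g in zip(truth, graded))
    fp = sum(t != positive and g == positive for t, g in zip(truth, graded))
    fn = sum(t == positive and g != positive for t, g in zip(truth, graded))
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(truth)
```

On the study's 800-image sample, an overall error rate of 15.3% corresponds to 122 misclassified photographs.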
Statistical Analyses

The performance characteristics of the BONSAI-DLS and the clinicians were determined through the calculation of the overall multiclass error rate percentage (i.e., the rate of incorrect overall classification compared with the reference standard), and of sensitivity, specificity, and accuracy, using a one-vs-rest strategy (normal vs the rest, papilledema vs the rest, and other abnormalities vs the rest). The area under the receiver operating characteristic curve (AUC) was previously calculated for the BONSAI-DLS (10). A pairwise McNemar test with Bonferroni correction was used to compare the sensitivities, specificities, and accuracies between the BONSAI-DLS and each clinician (13). The percent error rate was compared between ophthalmology-trained and non-ophthalmology-trained clinicians using a Mann–Whitney U test. The percent error rate was also compared between specialty groups using a Kruskal–Wallis analysis of variance on ranks followed by pairwise comparisons using a Tukey post hoc test. Analyses were performed using a modified version of the package "DTComPair" in R v.3.6.3: A Language and Environment for Statistical Computing (R Core Team, Vienna, Austria). The multiclass classification intergrader agreements (within each group of clinicians) were computed using Cohen's kappa agreement scores. Confidence intervals (95% CIs) were calculated for the AUC, sensitivity, specificity, accuracy, and kappa scores. Unless indicated otherwise, data are presented in the text, tables, and figures as average (95% CI).

RESULTS

Demographics and Fundus Photograph Characteristics

The study population was the same as in our previous study (10) and included 454 patients (mean age 44.5 ± 20.1 years), of whom 273/454 (60.1%) were women (Table 1).
Half of the fundus photographs were normal (400/800, 50.0%); the other half were divided into papilledema (201 images, 25.1%) and other optic disc abnormalities (199 images, 24.9%), listed in Table 1.

Error Rates of the Brain and Optic Nerve Study With Artificial Intelligence-Deep Learning System and Clinicians for Overall Classification

The overall error rates of the clinicians for the classification of optic disc appearance on this sample of 800 photographs were 24.4% for ophthalmologists, 24.8% for optometrists, 38.2% for neurologists, 44.8% for internists, and 47.9% for ED physicians, all statistically higher than the error rate of the BONSAI-DLS (P < 0.001) (Fig. 1A, B). Clinicians with ophthalmic training displayed a lower error rate than physicians without ophthalmic training (Fig. 1A, 24.6% vs 43.6%, T = 79, P < 0.001). The performance of ophthalmologists and optometrists was comparable (Fig. 1B, q = 0.33, P = 1). The BONSAI-DLS was on average 168 times faster than the clinicians in classifying the 800 fundus photographs (average times: BONSAI-DLS, 25 seconds; clinicians, 70 minutes, SD ±22.3 minutes, range 40–139 minutes).

Comparison of Performance (Area Under the Receiver Operating Characteristic Curve, Accuracy, Sensitivity, and Specificity) Between the Brain and Optic Nerve Study With Artificial Intelligence-Deep Learning System and Clinicians

The BONSAI-DLS had an AUC of 0.96 (95% CI 0.94–0.97), 0.97 (95% CI 0.96–0.98), and 0.89 (95% CI 0.87–0.92) for the detection of papilledema, normal optic discs, and other optic disc abnormalities, respectively (10). The overall performances of the clinicians for the detection of papilledema, normal optic discs, and other optic disc abnormalities all fell below the ROC curves of the DLS, indicating a lower classification performance overall (Fig. 2).
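The clinician-versus-DLS significance tests in this study rely on the pairwise McNemar test with Bonferroni correction described in the Methods, which considers only the images on which the two graders disagree in correctness. A minimal continuity-corrected Python sketch, for illustration only (the study used a modified "DTComPair" package in R):

```python
import math

def mcnemar_p(b, c):
    """Continuity-corrected McNemar test on discordant counts:
    b = images grader A classified correctly and grader B incorrectly, c = the reverse.
    Returns the p-value of the chi-square statistic with 1 degree of freedom."""
    if b + c == 0:
        return 1.0
    chi2 = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2))

def bonferroni(p_values):
    """Bonferroni correction: scale each p-value by the number of comparisons, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

With 30 clinician-vs-DLS comparisons per metric, the Bonferroni step here multiplies each raw p-value by 30 before it is compared with the significance threshold.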
Of the 30 clinicians, 30 (100%), 26 (86.7%), and 28 (93.3%) had significantly lower accuracies than the DLS for the grading of papilledema, normal optic discs, and other optic disc abnormalities, respectively (Fig. 3A, Table 2; see Supplemental Digital Content, Appendix Tables 1 and 2, http://links.lww.com/WNO/A670). Three ophthalmologists and 1 optometrist were as accurate as the DLS for the detection of normal optic discs, and 1 ophthalmologist and 1 optometrist were as accurate as the DLS for the detection of other optic disc abnormalities. For the detection of papilledema, 11/30 (36.7%) clinicians were significantly less sensitive than the DLS, whereas 28/30 (93.3%) were less specific. Sensitivity and specificity are inversely interdependent; hence, some neurologists, internists, and ED physicians displayed relatively high sensitivities at the cost of a high false-positive rate (i.e., low specificity; Figs. 2, 3B, C, Table 2). For the detection of normal optic discs and other optic disc abnormalities, 24/30 (80%) and 30/30 (100%) clinicians were less sensitive than the DLS, and 8/30 (26.7%) and 16/30 (53.3%) were less specific, respectively (Figs. 2, 3B, C; see Supplemental Digital Content, Appendix Tables 1 and 2, http://links.lww.com/WNO/A670).

Intergrader Agreement Within Clinician Groups

Intergrader agreements for ophthalmologists, optometrists, neurologists, internists, and ED physicians followed a decreasing trend, from moderate (0.62) for ophthalmologists and optometrists to minimal agreement (0.25) for the ED physicians (see Supplemental Digital Content, Appendix Figure 1, http://links.lww.com/WNO/A670).

FIG. 1. A–B. Overall error rates of the deep learning system (dashed grey line) and clinicians for the classification of optic disc appearance. A. Clinicians grouped according to their training in ophthalmology. B. Detailed results for each group of specialists, from the highest (left) to the lowest (right) performance.
Data are represented as percentages, average ± SD. The error rate of the deep learning system was significantly lower than that of all the clinicians (A, B). Physicians without ophthalmic expertise (n = 18) had significantly higher error rates than ophthalmologists and optometrists (n = 12) (A, Mann–Whitney U test [P < 0.001]; B, Kruskal–Wallis one-way ANOVA on ranks [H = 20.67, P < 0.001]). The Tukey test was used for post hoc analyses. ANOVA, analysis of variance.

FIG. 2. A–C. Receiver operating characteristic (ROC) curves corresponding to the performance of the deep learning system (BONSAI-DLS) and clinicians (n = 30) for the classification of optic discs on fundus photographs. A. Normal optic discs. B. Discs with papilledema. C. Optic discs with other abnormalities. The results for clinicians are presented as average and standard deviation for each group of specialists. The DLS had an AUC of 0.97 (95% CI 0.96–0.98) for normal optic discs, 0.96 (95% CI 0.94–0.97) for papilledema, and 0.89 (95% CI 0.87–0.92) for other optic disc abnormalities. For more details on the performance characteristics of the DLS and clinicians, see Table 2 and Supplemental Digital Content, Appendix Tables 1 and 2. AUC, area under the receiver operating characteristic curve.

CONCLUSIONS

The main finding of this study is that a DLS outperformed nonexpert clinicians from various medical specialties, including non-ophthalmology-trained physicians and eye-care professionals, in the classification of optic discs on ocular fundus photographs. Notably, the DLS was more accurate than any individual clinician who performed the same task for the detection of papilledema, a sign of often life- or sight-threatening conditions.
Deep learning algorithms are increasingly developed to detect systemic biomarkers or retinal diseases associated with general conditions using fundus photographs acquired with digital cameras (14,15). However, only a few studies have used deep learning techniques to detect optic disc abnormalities related to neuro-ophthalmic conditions (11). Optic disc abnormalities can affect a substantial number of patients with severe hypertension, headaches, or neurologic deficits who present to nonophthalmic providers (16). To our knowledge, there is no other study comparing the performance of a deep learning algorithm with that of non-ophthalmology-trained physicians, such as neurologists, internists, or emergency physicians, for the detection of optic disc abnormalities. In this study, neurologists, internists, and ED physicians evaluating fundus photographs without any clinical information missed 26.8% of papilledema on average, vs 8.5% for the DLS, 16% on average for nonexpert eye-care professionals (i.e., general ophthalmologists and optometrists), and 10.7% for expert neuro-ophthalmologists (10). Likewise, nonophthalmologists on average falsely diagnosed normal optic discs or optic discs with other abnormalities as papilledema 20.6% of the time, vs 4.3% for the DLS, 11.1% on average for eye-care professionals, and 5.9% for expert neuro-ophthalmologists (10). High error rates and false positives for the detection of papilledema could potentially lead to diagnostic errors or delays, or to unnecessary and costly tests or referrals (17). Comparing a DLS to clinicians is an important step before any potential implementation of automated screening or diagnosis of diseases in real-world clinical practice. In ophthalmology, some studies showed that DLS performance was at least as good as that of expert human evaluators and better than that of nonexpert eye-care professionals (2,18–20).
However, 2 recent reviews highlighted several concerns regarding the study design, reporting standards, and risk of bias of many studies that compared the performance of DLS with human clinicians for the interpretation of medical imaging (2,3). In this study, we used a DLS previously trained on 14,341 fundus photographs in a large, externally validated multicenter study. The reference standard was robust, established by expert neuro-ophthalmologists from the participating centers, and the human comparator group was composed of 30 clinicians of various specialties, grouped by specialty (6 clinicians each). The same dataset of 800 fundus photographs was used to compare the BONSAI-DLS with 2 expert neuro-ophthalmologists in a previous study (10) and with the 30 clinicians included here. In our study, intergrader agreements were moderate for ophthalmologists and for optometrists (0.62 for both), and minimal to weak for non-ophthalmology-trained physicians (0.45, 0.29, and 0.25 for neurologists, internists, and ED physicians, respectively). A lack of agreement among experts in the interpretation of ocular imaging has been previously described (19,20). This variability of interpretation of fundus images among groups of physicians of equivalent level of expertise emphasizes the need for a reproducible deep learning algorithm for screening purposes.

FIG. 3. A–C. Accuracy (A), sensitivity (B), and specificity (C) of the DLS and clinicians for the detection of normal discs, discs with papilledema, or discs with other abnormalities. The gray area represents the 95% confidence interval of the accuracy, sensitivity, and specificity of the DLS, and the dashed gray line the average. Boxplots represent the mean and SD of accuracies, sensitivities, and specificities in each group of clinicians. Each evaluator is represented by a circle. DLS, deep learning system.

TABLE 2. Performance of the deep learning system and clinicians for the classification of discs with papilledema from normal discs and discs with other abnormalities

Classification of Discs With Papilledema

                    Sensitivity (%)†        Specificity (%)         Accuracy (%)
DLS                 83.1 (77.9–88.3)        94.3 (92.5–96.2)        91.5 (89.4–93.3)
Ophthalmologist 1   76.6 (70.8–82.5)        82.5 (79.4–85.5)***‡    81.0 (78.1–83.7)***
Ophthalmologist 2   66.2 (59.6–72.7)***     87.5 (84.8–90.1)***     82.1 (79.3–84.7)***
Ophthalmologist 3   52.2 (45.3–59.1)***     94.0 (92.1–95.9)        83.5 (80.7–86.0)***
Ophthalmologist 4   90.5 (86.5–94.6)*       80.8 (77.6–84.0)***     83.3 (80.5–85.8)***
Ophthalmologist 5   82.6 (77.3–87.8)        81.6 (78.5–84.7)***     81.9 (79.0–84.5)***
Ophthalmologist 6   82.1 (76.8–87.4)        90.0 (87.6–92.4)***     88.0 (85.5–90.2)**
Optometrist 1       85.6 (80.7–90.4)        85.6 (82.8–88.5)***     85.6 (83.0–88.0)***
Optometrist 2       88.6 (84.2–93.0)        86.1 (83.4–88.9)***     86.8 (84.2–89.0)**
Optometrist 3       68.2 (61.7–74.6)***     90.3 (87.9–92.7)**      84.8 (82.1–87.2)***
Optometrist 4       89.1 (84.7–93.4)        81.8 (78.7–84.9)***     83.6 (80.9–86.1)***
Optometrist 5       85.6 (80.7–90.4)        81.3 (78.2–84.4)***     82.4 (79.6–85.0)***
Optometrist 6       97.0 (94.7–99.4)***     81.1 (78.0–84.3)***     85.1 (82.5–87.5)***
Neurologist 1       66.7 (60.1–73.2)**      80.0 (76.8–83.2)***     76.6 (73.5–79.5)***
Neurologist 2       87.1 (82.4–91.7)        76.5 (73.1–79.9)***     79.1 (76.1–91.9)***
Neurologist 3       84.6 (79.6–89.6)        70.5 (66.8–74.1)***     74.0 (70.8–77.0)***
Neurologist 4       93.5 (90.1–96.9)***     72.8 (69.2–76.4)***     78.0 (75.0–80.8)***
Neurologist 5       82.1 (76.8–87.4)        79.0 (75.7–82.2)***     79.8 (76.8–82.5)***
Neurologist 6       90.0 (85.9–94.2)*       48.7 (44.7–52.8)***     59.1 (55.6–62.6)***
Internist 1         94.5 (91.4–97.7)**      68.1 (64.4–71.8)***     74.8 (71.6–77.7)***
Internist 2         70.7 (64.4–76.9)*       87.0 (84.3–89.7)***     82.9 (80.1–85.4)***
Internist 3         19.4 (13.9–24.9)***     95.0 (93.2–96.7)        76.0 (72.9–78.9)***
Internist 4         72.6 (66.5–78.8)**      87.1 (84.5–89.8)***     83.5 (80.7–86.0)**
Internist 5         90.5 (86.5–94.6)*       68.9 (65.2–72.7)***     74.4 (80.7–86.0)***
Internist 6         72.6 (66.5–78.8)**      86.1 (83.4–88.9)***     82.8 (79.9–85.3)***
ED physician 1      71.6 (65.4–77.9)*       59.1 (55.2–63.0)***     62.3 (58.8–65.6)***
ED physician 2      51.2 (44.3–58.2)***     51.9 (47.9–56.0)***     51.8 (48.2–55.3)***
ED physician 3      88.6 (84.2–93.0)        75.8 (72.4–79.2)***     79.0 (76.0–81.8)***
ED physician 4      87.1 (82.4–91.8)        79.5 (76.2–82.7)***     81.4 (78.5–84.0)***
ED physician 5      41.8 (35.0–48.6)***     40.2 (36.3–44.2)***     40.6 (37.2–44.1)***
ED physician 6      90.5 (86.5–94.6)*       79.5 (76.2–82.7)***     82.3 (79.4–84.8)***

† Data are represented as mean (95% CI). 95% CIs were calculated using the asymptotic method. Statistical significance, DLS vs clinicians: *P < 0.05; **P < 0.01; ***P < 0.001. ED, emergency department; DLS, deep learning system.

This study has inherent limitations. First, the clinicians were not given any clinical information along with the fundus photographs, which is not representative of real-life clinical settings and could have affected their performance (2). Second, for the purpose of comparing the DLS with human performance in classifying optic disc findings, we selected a retrospective convenience sample with a comparable distribution of each category of optic disc abnormalities across the different datasets (approximately 50% normal discs, 25% discs with papilledema, and 25% other optic disc abnormalities). This is not the typical condition prevalence observed in clinical practice. However, the fundus photographs for each category were randomly collected, and the clinicians were not aware of the diagnosis distribution before being tested, likely reducing the risk of bias in their interpretation of the images.
Furthermore, the severity of papilledema, which was not assessed by the expert neuro-ophthalmologists in this study, may have affected the performance of the DLS and the clinicians. To mitigate such potential biases, future prospective studies should include consecutive patients, selected from real-life conditions, with additional clinical information provided along with ocular imaging. Third, all the clinicians who participated in this study were practicing in the same hospital group, the Singapore General Hospital. Although most of them were trained abroad, this could have biased the clinicians' performance.

In conclusion, an AI-based DLS outperformed 30 clinicians in the classification of optic disc appearance on ocular fundus photographs, especially for the detection of papilledema, a potentially life- and sight-threatening condition not infrequently encountered in nonophthalmic settings, particularly in neurology clinics. The BONSAI-DLS could be a valuable diagnostic aid in medical and referral decision-making when patients present to neurology, internal medicine, or the ED with visual loss, headaches, neurologic symptoms, or severe hypertension. Such a diagnostic aid could be particularly valuable in low-income countries or regions that lack immediate access to ophthalmologists. It also has the potential to influence management, improve value-based care, and reduce unnecessary imaging and treatment in developed health care systems. To confirm the use of the BONSAI-DLS in real-life settings, further prospective studies are needed using consecutive data collection, ideally from nonmydriatic digital cameras. In addition, the performance of the DLS should be compared with standards of care, especially in settings where neuro-ophthalmology care is lacking.
STATEMENT OF AUTHORSHIP

Conception and design: C. Vasseneix, L. Milea, D. Ting, N. J. Newman, V. Biousse, D. Milea, R. P. Najjar; Acquisition of data: C. Vasseneix, X. Xu, J.-M. Hwang, S. Hamann, J. J. Chen, J. L. Loo, L. Milea, K. Tan, Y. Liu, N. J. Newman, V. Biousse, D. Milea, R. P. Najjar; Analysis and interpretation of data: C. Vasseneix, S. Nusinovici, X. Xu, Y. Liu, N. J. Newman, V. Biousse, D. Milea, R. P. Najjar; Drafting the manuscript: C. Vasseneix, D. Milea, R. P. Najjar; Revising the manuscript for intellectual content: C. Vasseneix, S. Nusinovici, X. Xu, Y. Liu, N. J. Newman, V. Biousse, T. Y. Wong, D. Milea, R. P. Najjar; Final approval of the completed manuscript: C. Vasseneix, S. Nusinovici, X. Xu, J.-M. Hwang, S. Hamann, J. J. Chen, J. L. Loo, L. Milea, K. Tan, D. Ting, Y. Liu, N. J. Newman, V. Biousse, T. Y. Wong, D. Milea, R. P. Najjar.

ACKNOWLEDGMENTS

The authors thank the departments of Neurology, Internal Medicine, and Emergency Medicine of SingHealth, and all the clinicians who agreed to participate in this study. The authors also thank all the collaborators of the BONSAI study group.

BONSAI Study Group Collaborators and Affiliations (alphabetical order by city): Philippe Gohier, MD, Department of Ophthalmology, University Hospital Angers, Angers, France. Neil Miller, MD, Departments of Ophthalmology, Neurology and Neurosurgery, Johns Hopkins University School of Medicine, Baltimore, Maryland. Kavin Vanikieti, MD, Department of Ophthalmology, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok, Thailand. Chiara La Morgia, MD, PhD, IRCCS Istituto delle Scienze Neurologiche di Bologna, UOC Clinica Neurologica, Bologna, Italy; Dipartimento di Scienze Biomediche e Neuromotorie, Università degli Studi di Bologna, Bologna, Italy. Marie-Bénédicte Rougier, MD, PhD, Service d'Ophtalmologie, Unité Rétine - Uvéites - Neuro-Ophtalmologie, Hôpital Pellegrin, CHU de Bordeaux, Bordeaux, France.
Selvakumar Ambika, DO, DNB, Department of Neuro-ophthalmology, Sankara Nethralaya - A Unit of Medical Research Foundation, Chennai, India. Pedro Fonseca, MD, Department of Ophthalmology, Centro Hospitalar e Universitário de Coimbra (CHUC), Coimbra, Portugal, Coimbra Institute for Biomedical Imaging and Translational Research (CIBIT), Faculty of Medicine, University of Coimbra (FMUC), Coimbra, Portugal. Wolf Alexander Lagrèze, MD, Eye Center, Medical Center, Medical Faculty, University of Freiburg, Freiburg, Germany. Nicolae Sanda, MD, PhD, Clinical Neuroscience Department, Geneva University Hospital, Geneva, Switzerland. Christophe Chiquet, MD, PhD, Department of Ophthalmology, University Hospital of Grenoble-Alpes, Grenoble-Alpes University, HP2 Laboratory, INSERM U1042, Grenoble, France. Hui Yang, MD, PhD, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, P.R. China. Carmen K. M. Chan, MRCP, FRCSEd(Ophth), Carol Y. Cheung, PhD, Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong Special Administrative Region, China, Hong Kong Eye Hospital, Hong Kong Special Administrative Region, China. Tran Thi Ha Chau, MD, Department of Ophthalmology, Lille Catholic Hospital, Lille Catholic University and Inserm U1171, Lille, France. Neringa Jurkute, MD, FEBO, Patrick Yu-Wai-Man, MB, BS, FRCPath, FRCOphth, PhD, Moorfields Eye Hospital NHS Foundation Trust, London, United Kingdom, UCL Institute of Ophthalmology, University College London, London, United Kingdom. Richard Kho, MD, American Eye Center, Mandaluyong City, Manila, Philippines. Jost B. Jonas, MD, Department of Ophthalmology, Medical Faculty Mannheim of the Ruprecht-Karls-University of Heidelberg, Mannheim, Germany. Catherine Vignal-Clermont, MD, Fondation Adolphe de Rothschild, Paris, France. Dong Hyun Kim, MD.
Hee Kyung Yang, MD, Department of Ophthalmology, Seoul National University College of Medicine, Seoul National University Bundang Hospital, Korea (the Republic of). Tin Aung, MD, PhD, Shweta Singhal, MBBS, PhD, Sharon Tow, MBBS, FRCSEd, Monisha Esther Nongpiur, MD, Shamira Perera, MD, Arun Narayanaswamy, MD, Umapathi N. Thirugnanam, MD, Singapore National Eye Centre, Singapore Eye Research Institute, Singapore, Duke-NUS Medical School, Singapore, Yong Loo Lin School of Medicine, National University of Singapore, Singapore General Hospital, Singapore. Clare L. Fraser, MBBS, MMed, FRANZCO, Save Sight Institute, Faculty of Health and Medicine, The University of Sydney, NSW, Australia. Luis J. Mejico, MD, Department of Neurology, SUNY Upstate Medical University, Syracuse, New York. Masoud Aghsaei Fard, MD, Farabi Eye Hospital, Tehran University of Medical Science, Tehran, Iran.

REFERENCES

1. Ting DSW, Pasquale LR, Peng L, Campbell JP, Lee AY, Raman R, Tan GSW, Schmetterer L, Keane PA, Wong TY. Artificial intelligence and deep learning in ophthalmology. Br J Ophthalmol. 2019;103:167–175.
2. Liu X, Faes L, Kale AU, Wagner SK, Fu DJ, Bruynseels A, Mahendiran T, Moraes G, Shamdas M, Kern C, Ledsam JR, Schmid MK, Balaskas K, Topol EJ, Bachmann LM, Keane PA, Denniston AK. A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. Lancet Digital Health. 2019;1:e271–e297.
3. Nagendran M, Chen Y, Lovejoy CA, Gordon AC, Komorowski M, Harvey H, Topol EJ, Ioannidis JPA, Collins GS, Maruthappu M. Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies. BMJ. 2020;368:m689.
4.
Bruce BB, Lamirel C, Wright DW, Ward A, Heilpern KL, Biousse V, Newman NJ. Nonmydriatic ocular fundus photography in the emergency department. N Engl J Med. 2011;364:387–389.
5. Biousse V, Bruce BB, Newman NJ. Ophthalmoscopy in the 21st century: the 2017 H. Houston Merritt lecture. Neurology. 2018;90:167–175.
6. Biousse V, Newman NJ. Diagnosis and clinical features of common optic neuropathies. Lancet Neurol. 2016;15:1355–1367.
7. Bruce BB, Bidot S, Hage R, Clough LC, Fajoles-Vasseneix C, Melomed M, Keadey MT, Wright DW, Newman NJ, Biousse V. Fundus photography vs. ophthalmoscopy outcomes in the emergency department (FOTO-ED) phase III: web-based, in-service training of emergency providers. J Neuroophthalmol. 2018;42:269–274.
8. Rathi S, Tsui E, Mehta N, Zahid S, Schuman JS. The current state of teleophthalmology in the United States. Ophthalmology. 2017;124:1729–1734.
9. Milea D, Najjar RP, Zhubo J, Ting D, Vasseneix C, Xu X, Aghsaei Fard M, Fonseca P, Vanikieti K, Lagreze WA, La Morgia C, Cheung CY, Hamann S, Chiquet C, Sanda N, Yang H, Mejico LJ, Rougier MB, Kho R, Tran TH, Singhal S, Gohier P, Clermont-Vignal C, Cheng CY, Jonas JB, Yu-Wai-Man P, Fraser CL, Chen JJ, Ambika S, Miller NR, Liu Y, Newman NJ, Wong TY, Biousse V. Artificial intelligence to detect papilledema from ocular fundus photographs. N Engl J Med. 2020;382:1687–1695.
10.
Biousse V, Newman NJ, Najjar RP, Vasseneix C, Xu X, Ting DS, Milea LB, Hwang J, Kim DH, Yang HK, Hamann S, Chen JJ, Liu Y, Wong TY, Milea D, Ronde‐Courbis B, Gohier P, Biousse V, Newman NJ, Vasseneix C, Miller N, Padungkiatsagul T, Poonyathalang A, Suwan Y, Vanikieti K, Milea LB, Amore G, Barboni P, Carbonelli M, Carelli V, La Morgia C, Romagnoli M, Rougier M, Ambika S, Komma S, Fonseca P, Raimundo M, Hamann S, Karlesand I, Alexander Lagreze W, Sanda N, Thumann G, Aptel F, Chiquet C, Liu K, Yang H, Chan CK, Chan NC, Cheung CY, Chau Tran TH, Acheson J, Habib MS, Jurkute N, Yu‐Wai‐Man P, Kho R, Jonas JB, Chen JJ, Sabbagh N, Vignal‐Clermont C, Hage R, Khanna RK, Hwang J, Kim DH, Yang HK, Aung T, Cheng C, Lamoureux E, Loo JL, Milea D, Najjar RP, Singhal S, Ting D, Tow S, Vasseneix C, Wong TY, Liu Y, Xu X, Jiang Z, Fraser CL, Mejico LJ, Fard MA; for the BONSAI Brain and Optic Nerve Study with Artificial Intelligence Study Group. Optic disc classification by deep learning versus expert neuro‐ophthalmologists. Ann Neurol. 2020;88:785–795.
11. Milea D, Singhal S, Najjar RP. Artificial intelligence for detection of optic disc abnormalities. Curr Opin Neurol. 2020;33:106–110.
12. Milea L, Najjar RP. Classif-Eye: A Semi-automated Image Classification Application, 2020. GitHub repository. Available at: https://github.com/milealeonard/Classif-Eye/. Accessed April 13, 2020.
13. McNemar Q. Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika. 1947;12:153–157.
14. Abràmoff MD, Lavin PT, Birch M, Shah N, Folk JC. Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices. NPJ Digital Med. 2018;1:39.
15. Rim TH, Lee G, Kim Y, Tham YC, Lee CJ, Baik SJ, Kim YA, Yu M, Deshmukh M, Lee BK, Park S, Kim HC, Sabayanagam C, Ting DSW, Wang YX, Jonas JB, Kim SS, Wong TY, Cheng CY.
Prediction of systemic biomarkers from retinal photographs: development and validation of deep-learning algorithms. Lancet Digital Health. 2020;2:e526–e536.
16. Sachdeva V, Vasseneix C, Hage R, Bidot S, Clough LC, Wright DW, Newman NJ, Biousse V, Bruce BB. Optic nerve head edema among patients presenting to the emergency department. Neurology. 2018;90:e373–e379.
17. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25:44–56.
18. Jammal AA, Thompson AC, Mariottoni EB, Berchuck SI, Urata CN, Estrela T, Wakil SM, Costa VP, Medeiros FA. Human versus machine: comparing a deep learning algorithm to human gradings for detecting glaucoma on fundus photographs. Am J Ophthalmol. 2020;211:123–131.
19. De Fauw J, Ledsam JR, Romera-Paredes B, Nikolov S, Tomasev N, Blackwell S, Askham H, Glorot X, O'Donoghue B, Visentin D, van den Driessche G, Lakshminarayanan B, Meyer C, Mackinder F, Bouton S, Ayoub K, Chopra R, King D, Karthikesalingam A, Hughes CO, Raine R, Hughes J, Sim DA, Egan C, Tufail A, Montgomery H, Hassabis D, Rees G, Back T, Khaw PT, Suleyman M, Cornebise J, Keane PA, Ronneberger O. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nat Med. 2018;24:1342–1350.
20. Brown JM, Campbell JP, Beers A, Chang K, Ostmo S, Chan RVP, Dy J, Erdogmus D, Ioannidis S, Kalpathy-Cramer J, Chiang MF; for the Imaging and Informatics in Retinopathy of Prematurity i-ROP Research Consortium. Automated diagnosis of plus disease in retinopathy of prematurity using deep convolutional neural networks. JAMA Ophthalmol. 2018;136:803–810.