Title | Deep Learning and Transfer Learning for Optic Disc Laterality Detection: Implications for Machine Learning in Neuro-Ophthalmology |
Creator | T. Y. Alvin Liu; Daniel S. W. Ting; Paul H. Yi; Jinchi Wei; Hongxi Zhu; Prem S. Subramanian; Taibo Li; Ferdinand K. Hui; Gregory D. Hager; Neil R. Miller |
Affiliation | Department of Ophthalmology (TYAL, NRM), Wilmer Eye Institute, Johns Hopkins University, Baltimore, Maryland; Department of Ophthalmology (DSWT), Singapore Eye Research Institute, Singapore National Eye Center, Duke-NUS Medical School, National University of Singapore, Singapore; Department of Radiology (PHY, FKH), Johns Hopkins University, Baltimore, Maryland; Department of Biomedical Engineering (JW), Johns Hopkins University, Baltimore, Maryland; Computational Interaction and Robotics Lab (HZ, GDH), Johns Hopkins University, Baltimore, Maryland; Department of Ophthalmology (PSS), University of Colorado School of Medicine, Aurora, Colorado; School of Medicine (TL), Johns Hopkins University, Baltimore, Maryland; and Malone Center for Engineering in Healthcare (GDH), Johns Hopkins University, Baltimore, Maryland |
Abstract | Background: Deep learning (DL) has demonstrated human expert levels of performance for medical image classification in a wide array of medical fields, including ophthalmology. In this article, we present the results of our DL system designed to determine optic disc laterality, right eye vs left eye, in the presence of both normal and abnormal optic discs. Methods: Using transfer learning, we modified the ResNet-152 deep convolutional neural network (DCNN), pretrained on ImageNet, to determine optic disc laterality. After 5-fold cross-validation, we generated receiver operating characteristic curves and corresponding area under the curve (AUC) values to evaluate performance. The data set consisted of 576 color fundus photographs (51% right and 49% left). Both 30° photographs centered on the optic disc (63%) and photographs with varying degrees of optic disc centration and/or a wider field of view (37%) were included. Both normal (27%) and abnormal (73%) optic discs were included. Various neuro-ophthalmological diseases were represented, such as, but not limited to, atrophy, anterior ischemic optic neuropathy, hypoplasia, and papilledema. Results: Using 5-fold cross-validation (70% training; 10% validation; 20% testing), our DCNN for classifying right vs left optic disc achieved an average AUC of 0.999 (±0.002) with optimal threshold values, yielding an average accuracy of 98.78% (±1.52%), sensitivity of 98.60% (±1.72%), and specificity of 98.97% (±1.38%). When tested against a separate data set for external validation, our 5-fold cross-validation model achieved the following average performance: AUC 0.996 (±0.005), accuracy 97.2% (±2.0%), sensitivity 96.4% (±4.3%), and specificity 98.0% (±2.2%). Conclusions: Small data sets can be used to develop high-performing DL systems for semantic labeling of neuro-ophthalmology images, specifically in distinguishing between right and left optic discs, even in the presence of neuro-ophthalmological pathologies. Although this may seem like an elementary task, this study demonstrates the power of transfer learning and provides an example of a DCNN that can help curate large medical image databases for machine-learning purposes and facilitate ophthalmologist workflow by automatically labeling images according to laterality. |
Subject | Algorithms; Deep Learning; Diagnostic Techniques, Ophthalmological; Humans; Machine Learning; Neurology; Ophthalmology; Optic Disk / diagnostic imaging; Optic Nerve Diseases / diagnosis; ROC Curve |
OCR Text | Original Contribution

Deep Learning and Transfer Learning for Optic Disc Laterality Detection: Implications for Machine Learning in Neuro-Ophthalmology

T. Y. Alvin Liu, MD, Daniel S. W. Ting, MD, PhD, Paul H. Yi, MD, Jinchi Wei, BSE, Hongxi Zhu, BS, MS, Prem S. Subramanian, MD, PhD, Taibo Li, BS, ME, Ferdinand K. Hui, MD, Gregory D. Hager, PhD, Neil R. Miller, MD

Background: Deep learning (DL) has demonstrated human expert levels of performance for medical image classification in a wide array of medical fields, including ophthalmology. In this article, we present the results of our DL system designed to determine optic disc laterality, right eye vs left eye, in the presence of both normal and abnormal optic discs.

Methods: Using transfer learning, we modified the ResNet-152 deep convolutional neural network (DCNN), pretrained on ImageNet, to determine optic disc laterality. After 5-fold cross-validation, we generated receiver operating characteristic curves and corresponding area under the curve (AUC) values to evaluate performance. The data set consisted of 576 color fundus photographs (51% right and 49% left). Both 30° photographs centered on the optic disc (63%) and photographs with varying degrees of optic disc centration and/or a wider field of view (37%) were included. Both normal (27%) and abnormal (73%) optic discs were included. Various neuro-ophthalmological diseases were represented, such as, but not limited to, atrophy, anterior ischemic optic neuropathy, hypoplasia, and papilledema.

Results: Using 5-fold cross-validation (70% training; 10% validation; 20% testing), our DCNN for classifying right vs left optic disc achieved an average AUC of 0.999 (±0.002) with optimal threshold values, yielding an average accuracy of 98.78% (±1.52%), sensitivity of 98.60% (±1.72%), and specificity of 98.97% (±1.38%). When tested against a separate data set for external validation, our 5-fold cross-validation model achieved the following average performance: AUC 0.996 (±0.005), accuracy 97.2% (±2.0%), sensitivity 96.4% (±4.3%), and specificity 98.0% (±2.2%).

Conclusions: Small data sets can be used to develop high-performing DL systems for semantic labeling of neuro-ophthalmology images, specifically in distinguishing between right and left optic discs, even in the presence of neuro-ophthalmological pathologies.
Although this may seem like an elementary task, this study demonstrates the power of transfer learning and provides an example of a DCNN that can help curate large medical image databases for machine-learning purposes and facilitate ophthalmologist workflow by automatically labeling images according to laterality.

Journal of Neuro-Ophthalmology 2020;40:178-184. doi: 10.1097/WNO.0000000000000827. © 2019 by North American Neuro-Ophthalmology Society.

The authors report no conflicts of interest. Address correspondence to T. Y. Alvin Liu, MD, Department of Ophthalmology, Wilmer Eye Institute, The Johns Hopkins Hospital, 600 N. Wolfe Street, Maumenee 726, Baltimore, MD 21287; E-mail: tliu25@jhmi.edu

Artificial intelligence (AI) in the form of deep learning (DL) has generated immense interest in the medical field in recent years. Briefly, DL methods are representation learning methods that use multilayered neural networks, the performance of which can be enhanced by using backpropagation algorithms to reiteratively adjust the internal parameters (1). With well-annotated, large data sets, DL can be used to classify medical images accurately, and it has been applied in a wide variety of medical disciplines, including pathology (2), dermatology (3), radiology (4-6), and ophthalmology. Within ophthalmology, deep learning systems (DLSs) have been developed to detect various conditions, such as diabetic retinopathy (7-11), age-related macular degeneration (7,12-15), glaucoma (7,16-18), retinopathy of prematurity (19), and cardiovascular diseases (20,21), and oftentimes the performance of these DLSs has been found to be on par with that of human clinicians.

DL applications in ophthalmology have been particularly successful because of the availability of very large, annotated data sets, such as the color fundus photographs from the Age-Related Eye Disease Study (AREDS) (22). However, manually annotating these large data sets is labor intensive and time consuming. Therefore, one potential application of DL in ophthalmology is the automatic semantic labeling of fundus images, for example, the designation of laterality, which can streamline machine learning-related workflow and database curation. Recognizing that a laterality algorithm can be useful for efficiently organizing large databases, understanding that the optic disc/peripapillary area is likely important for a computer algorithm to determine laterality, as shown by Jang et al (23), and hypothesizing that the presence of optic disc pathologies would likely diminish the performance of such an algorithm, we set out to develop a DL system using transfer learning that can reliably discern laterality even in the presence of various optic disc pathologies, for example, in the setting of a neuro-ophthalmology data set.

METHODS

Data Set

The majority (62.7%) of our primary data set consisted of deidentified color fundus photographs obtained from the neuro-ophthalmological practice of one of the authors (N.R.M.). These photographs were collected over several decades. Black, white, and Asian patients were included, but more specific demographic information (other than diagnosis) was not available. The photographs were either digital photographs or analog photographs that were digitized at various resolutions; all were taken with a 30° camera centered on the optic disc. Three other publicly available data sets were included in our primary data set. The DRIONS database (24) contained 110 color fundus photographs taken with an analog fundus camera (roughly 45° field of view) and digitized at a resolution of 600 × 400 pixels and 8 bits/pixel.
The cohort was 46.2% male and 100% white, with a mean age of 53.0 years. The High-Resolution Fundus database (25) contained 15 images of healthy patients, 15 images of patients with diabetic retinopathy, and 15 images of patients with glaucoma. The images were likely taken with a camera with a 45° field of view. No other demographic or technical information was available for this data set. Sixty images from the American Society of Retina Specialists database (26) were included. These images were taken with different cameras with varying degrees of optic disc centration. No other demographic or technical information was available for these 60 images.

A total of 576 images were included: 291 right eyes and 285 left eyes. The images contained both normal (157 images; 27.3% of total) and abnormal optic discs (419 images; 72.7%). In addition, the data set contained both 30° photographs centered on the optic disc (362 images; 62.8%) and photographs with varying degrees of optic disc centration and/or a wider field of view (214 images; 37.2%). All images were deidentified and annotated by a neuro-ophthalmologist (N.R.M.) and a retinal specialist (T.Y.A.L.). A separate data set, obtained from the neuro-ophthalmological practice of one of the authors (P.S.S.), was used for external validation. This data set contained 100 images (50 right eyes and 50 left eyes; 67 abnormal and 33 normal). No protected health information was obtained or recorded. This research study was reviewed by our institutional review board (IRB) and deemed to be IRB exempt.

Computer Hardware and Software Specifications

After the data set was annotated, the deidentified images were transferred to a personal computer running the Windows 10 operating system and containing an Intel Core i5 central processing unit (CPU) (Intel Corporation, Santa Clara, CA), 8 GB of RAM, and an Nvidia GeForce GTX 1050 graphics processing unit (GPU) (Nvidia Corporation, Santa Clara, CA). The images were then uploaded to a computing cluster whose CPU and GPU nodes consisted of, respectively, an Intel Broadwell dual-socket, 14-core 2.6 GHz CPU with 128 GB of RAM and 2 Nvidia Tesla K80 GPUs (Nvidia Corporation). All computations were performed using 6 CPUs and 3 GPUs. All available images were obtained in the Joint Photographic Experts Group, Portable Network Graphics, or bitmap format and resized to 256 × 256 pixels. All DL software was programmed using the PyTorch (Version 0.4.1) DL framework (https://pytorch.org).

Deep Learning System Development

A deep convolutional neural network (DCNN) is a complex computational model that uses multiple algorithmic layers to create high-level interpretations of data (e.g., classifying images), as opposed to performing single specific tasks (e.g., detecting a line or edge on an image) (1). Because our data set was small, training all the parameters of a DCNN architecture from scratch was infeasible. As a result, this study used an alternative DL approach called transfer learning and adopted a readily available ResNet-152 (27) DCNN that was pretrained on ImageNet (http://www.image-net.org), a database of 1.2 million color images of everyday objects sorted into 1000 categories. We started with the ResNet-152 and redefined the last linear layer to have 2 outputs instead of the default 1000. In the training/validation phase, we fine-tuned all the model parameters (with pretrained weights as initialization) using our data set.
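For illustration, a minimal PyTorch sketch of this setup (ours, not the authors' published code) loads an ImageNet-pretrained ResNet-152 and replaces its default 1000-way output layer with a 2-way right/left classifier:

```python
# A minimal sketch (not the authors' published code) of the transfer-learning
# setup described in the text: ResNet-152 pretrained on ImageNet, with the
# default 1000-way output layer replaced by a 2-way right/left classifier.
import torch.nn as nn
from torchvision import models

model = models.resnet152(pretrained=True)  # ImageNet weights as initialization
# The classifier is fed by a global average pool over 2048 feature maps (the
# "second-to-last layer" noted below); map those 2048 features to 2 outputs
# instead of the default 1000.
model.fc = nn.Linear(model.fc.in_features, 2)
```

Because all parameters remain trainable and the pretrained weights serve only as initialization, optimizing this model corresponds to the full fine-tuning described in the text.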
The last layer was a single linear layer with 2 neurons that used softmax regression as the activation function. The second-to-last layer was an average pooling layer containing 2048 neurons, and each batch contained 10 images. Transfer learning is based on the theory that networks trained to recognize certain high-level features, such as edges and shadows, can be optimized for more precise classification of a newly introduced data set unrelated to the original training data. In brief, the general workflow in developing a DLS involves 6 steps: acquisition of a data set; expert annotation of the data set (the ground truth); division of the data set into mutually exclusive training, validation, and testing subsets; training the DCNN using the training subset; fine-tuning the DCNN using the validation subset; and, finally, testing the performance of the DCNN using the testing subset.

TABLE 1. Proportion of images from each data source in each subset (numbers represent percentages)

Subset | ASRS | DRIONS | HRF | NRM
CV1 training | 9.7 | 19.9 | 7.2 | 63.2
CV1 validation | 6.9 | 17.2 | 10.3 | 65.5
CV1 testing | 14.7 | 17.2 | 8.6 | 59.5
CV2 training | 11.7 | 19.6 | 6.7 | 62.0
CV2 validation | 10.3 | 5.2 | 13.8 | 70.7
CV2 testing | 6.1 | 24.3 | 8.7 | 60.9
CV3 training | 10.9 | 19.1 | 8.9 | 61.0
CV3 validation | 12.1 | 15.5 | 5.2 | 67.2
CV3 testing | 7.8 | 20.9 | 5.2 | 66.1
CV4 training | 9.2 | 21.1 | 6.7 | 63.0
CV4 validation | 15.5 | 17.2 | 13.8 | 53.4
CV4 testing | 12.2 | 13.0 | 8.7 | 66.1
CV5 training | 10.2 | 19.1 | 7.4 | 63.3
CV5 validation | 10.3 | 17.2 | 10.3 | 62.1
CV5 testing | 11.3 | 20.0 | 7.8 | 60.9

ASRS, American Society of Retina Specialists; CV, cross-validation; HRF, High-Resolution Fundus; NRM, deidentified color fundus photographs from the neuro-ophthalmological practice of author N.R.M.

We used the above data sets to train, validate, and test the ResNet-152 DCNN for classification of images into right eyes and left eyes. During training, we used cross entropy to measure the loss of our model and stochastic gradient descent with the following solver parameters to optimize our model: 49 epochs, a learning rate of 0.001, momentum of 0.9, and weight decay of 1 × 10⁻⁵. We trained the model primarily to distinguish between right eye and left eye optic discs in color fundus photographs. At each epoch, we compared the validation accuracy of the current weights with that of the previous best-performing weights and saved the new weights whenever performance improved.

Image Processing

Before training, validation, and testing, every image was resized to 256 × 256 pixels and cropped to 224 × 224 pixels to fit the input dimension of our ResNet-152 model. During each training and validation epoch, each image was randomly cropped, rotated ±5°, flipped vertically with a 50% chance, and perturbed in brightness to prevent overfitting. No data augmentation was performed for the testing subset.
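The preprocessing, augmentation, and solver settings reported above might be expressed as follows; this is a hedged sketch, and details the article does not specify, such as the brightness perturbation range, are assumptions:

```python
# A sketch of the reported augmentation pipeline and solver settings, using
# torchvision transforms and SGD. The brightness range (0.1) is an assumption.
import torch.nn as nn
import torch.optim as optim
from torchvision import models, transforms

# Reported preprocessing: resize to 256 x 256, crop to 224 x 224. Reported
# augmentation: random crop, rotation within +/- 5 degrees, vertical flip with
# 50% probability, and brightness perturbation.
train_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.RandomCrop(224),
    transforms.RandomRotation(degrees=5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.1),
    transforms.ToTensor(),
])

# No augmentation for the testing subset: deterministic resize and center crop.
test_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

# Loss and optimizer as reported: cross entropy; SGD with learning rate 0.001,
# momentum 0.9, and weight decay 1e-5, run for 49 epochs with batches of 10.
model = models.resnet152(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 2)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=1e-5)
```

Note that a vertical flip preserves laterality, whereas a horizontal flip would effectively invert the right/left label; the Results section reports a control experiment built on exactly this property.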
Statistical Analysis

Using our primary data set, we performed 5-fold cross-validation. Each of the 5 cross-validation tests consisted of all 576 images, split into 3 subsets: a training subset (70%), a validation subset (10%), and a testing subset (20%). The testing subsets of the 5 cross-validation tests were mutually exclusive, so at the conclusion of the 5-fold cross-validation, each of the 576 images had been subjected to testing exactly once. We then combined accuracy, receiver operating characteristic curves, and corresponding area under the curve (AUC) values to evaluate performance. Standard diagnostic measures of performance (accuracy, sensitivity, and specificity) were also generated based on optimal thresholds chosen by F1 score. The proportion of images from each data source used in the primary data set is summarized in Table 1. The 5-fold cross-validation model was then tested against a separate data set for external validation.
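As a concrete illustration of this protocol, the following is a minimal sketch of per-fold evaluation, assuming hypothetical NumPy arrays `labels` (0 = left, 1 = right) and `scores` (the model's softmax probability for "right") collected from one testing subset; scikit-learn is our choice here, as the article does not name its statistical tooling:

```python
# A minimal sketch of per-fold evaluation: AUC from the ROC curve, plus
# accuracy, sensitivity, and specificity at the threshold maximizing F1 score.
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score, roc_curve

def evaluate_fold(labels, scores):
    auc = roc_auc_score(labels, scores)
    # Sweep the ROC thresholds and keep the one that maximizes the F1 score,
    # as described for choosing the optimal operating point.
    _, _, thresholds = roc_curve(labels, scores)
    best_t = max(thresholds, key=lambda t: f1_score(labels, scores >= t))
    preds = scores >= best_t
    tp = np.sum(preds & (labels == 1))
    tn = np.sum(~preds & (labels == 0))
    fp = np.sum(preds & (labels == 0))
    fn = np.sum(~preds & (labels == 1))
    return {
        "auc": auc,
        "accuracy": (tp + tn) / len(labels),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }
```

Averaging these per-fold metrics over the 5 mutually exclusive testing subsets yields summary statistics of the kind reported in the Results section below.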
Heat Map Generation

To identify the features in the color fundus photographs used by the DCNN to determine optic disc laterality, we created heat maps through class activation mapping (28), a technique that visually highlights the areas of an image that are important to the classification decision (the "warmer" the color, e.g., red, the more important a particular area is). We chose this technique for its ability to convey information in a visually vivid manner: the original image is preserved, allowing all the image features to remain visible, and the overlaid color spectrum provides a clear linear scale of feature importance.
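Because the final linear layer of a ResNet acts on globally average-pooled feature maps, such class activation maps can be computed directly from the trained model. The sketch below is ours, under that assumption; the function and variable names are illustrative and not from the article:

```python
# A hedged sketch of class activation mapping (reference 28) for a ResNet-152
# classifier: the 2048 feature maps entering global average pooling are
# weighted by the final linear layer's weights for the class of interest.
import torch
import torch.nn.functional as F

def class_activation_map(model, image, target_class):
    """image: a preprocessed (3, H, W) tensor; returns an (H, W) heat map."""
    model.eval()
    with torch.no_grad():
        # All layers up to, but not including, the average pool and classifier.
        backbone = torch.nn.Sequential(*list(model.children())[:-2])
        features = backbone(image.unsqueeze(0)).squeeze(0)  # (2048, h, w)
        weights = model.fc.weight[target_class]             # (2048,)
        cam = torch.einsum("chw,c->hw", features, weights)  # weighted sum of maps
        cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # scale to [0, 1]
        # Upsample to the input size so the map can be overlaid on the photo;
        # "warmer" (higher) values mark regions driving the classification.
        cam = F.interpolate(cam[None, None], size=image.shape[1:],
                            mode="bilinear", align_corners=False)[0, 0]
    return cam
```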
RESULTS

Using 5-fold cross-validation, our DCNN for classifying right vs left optic disc achieved an average AUC of 0.999 (±0.002) (Fig. 1) with optimal threshold values, yielding an average accuracy of 98.78% (±1.52%), sensitivity of 98.60% (±1.72%), and specificity of 98.97% (±1.38%). Key results are summarized in Table 2.

FIG. 1. Receiver operating characteristic curve and area under the curve of the deep learning system for detection of right vs left optic disc, compared with professional graders' performance, with ophthalmologists' grading as the reference standard. This graph was generated from cross-validation #3, the experiment with the median performance among the 5 cross-validation folds.

TABLE 2. Key performance metrics of our deep learning system for detection of optic disc laterality, generated from 5-fold cross-validation

Metric | CV #1 | CV #2 | CV #3 | CV #4 | CV #5 | Average | SD
AUC | 1 | 0.998 | 1 | 0.996 | 1 | 0.999 | 0.002
Accuracy | 100% | 97.39% | 100% | 96.52% | 100% | 98.78% | 1.52%
Sensitivity | 100% | 96.49% | 100% | 96.49% | 100% | 98.60% | 1.72%
Specificity | 100% | 98.28% | 100% | 96.55% | 100% | 98.97% | 1.38%

AUC, area under the curve; CV, cross-validation.

When we horizontally flipped the images 100% of the time during the training and validation phases, the AUC of our model dropped significantly, to an average of 0.317 (±0.057), indicating that it was, on average, making opposite predictions in the testing phase, as expected. The AUC was not 0 for the following reasons. Flipping the images horizontally mainly affected "global" features; "local" features (such as edges) were embedded in the pretrained ResNet-152 weights and were likely unchanged or only minimally changed in the training/validation phase, given that such local features are universally similar across different image data sets. These local features partially explain why the resulting classifier did not attain an AUC of 0. In addition, the optimization procedure tends to guide the learning process (i.e., the objective function) toward maximizing performance: the DCNN stops adjusting its weights once there is no improvement in the AUC, and the AUC gets "stuck" at that value; the procedure is not designed to go the opposite way and minimize the AUC.

During class activation mapping analyses, activation was shown at the optic disc, at the peripapillary area, and at the central macula (Fig. 2A-D).

FIG. 2. Class activation mapping analysis of 30° photographs centered on the right (A) and left (B) optic discs. Similar analysis of wider-field photographs of the right (C) and left (D) optic discs.

In our 5-fold cross-validation experiment, every image in the primary database was tested once. Of the 576 images, only 7 were incorrectly labeled during testing, yielding an overall error rate of approximately 1%. Of the 7 images, 2 contained a normal optic disc. The characteristics of these 7 images are summarized in Table 3, and sample images that failed testing are shown in Figure 3.

TABLE 3. Characteristics of the 7 images (of 576) that failed testing during the 5-fold cross-validation experiment

Image # | Diagnosis | Image Quality Issue
1 | Toxic optic neuropathy | No
2 | Congenital anomaly of the disc | No
3 | Congenital anomaly of the disc | No
4 | Normal disc; macula dystrophy | Yes; part of the disc margin was obscured
5 | Normal disc; age-related macular degeneration | Yes; part of the disc margin was obscured
6 | Nonglaucomatous cupping | Yes; part of the disc margin was obscured
7 | Optic disc hypoplasia | No

FIG. 3. Sample images that failed testing during the 5-fold cross-validation experiment. Congenital anomaly of the disc (A, B). An example of image quality failure, where the margin of a normal optic disc is partially obscured (C).

For external validation, we tested our 5-fold cross-validation model against a separate data set of 100 images. In this experiment, the model achieved the following average performance: AUC 0.996 (±0.005), accuracy 97.2% (±2.0%), sensitivity 96.4% (±4.3%), and specificity 98.0% (±2.2%).

DISCUSSION

In this study, we aimed to develop a DLS that can reliably discern eye laterality, to lay the groundwork for further machine-learning endeavors in neuro-ophthalmology. Our DLS reliably detects eye laterality, with an average AUC of 0.999, and it achieved similarly robust performance during external validation, with an average AUC of 0.996. We make the following observations.

First, with transfer learning, semantic labeling is possible in DL even with very few image samples. The development of a DLS for medical images traditionally has required a large number of data points, typically tens of thousands of clinical images. Using transfer learning (29,30), our DLS was able to achieve an average AUC of 0.999, although our data set included only 576 images. Laterality detection in ophthalmology using DL has been published before (10,23), but these studies typically used much larger data sets for training. Using the study by Jang et al (23) as an example, although that model and our model achieved similar performances, with mean accuracies of 98.9% and 98.78%, respectively, the 2 data sets differed vastly in size: their data set contained 25,911 images, whereas ours contained 576. The performance of our algorithm suggests that DL, using transfer learning and within a clinical context in which the binary classification is relatively "obvious" to the DCNN, can produce clinically deployable results even if the training data set is small. Our result is also in agreement with the observation in the radiology DL literature that the easier the task, the smaller the data set required to achieve a high AUC (6).

Second, our class activation mapping analysis showed activation at the disc and in the peripapillary area (Fig. 2A, B), similar to the findings of Jang et al (23) that these areas are important features for the DCNN in determining laterality. We performed additional class activation mapping analysis for photographs with a wider field of view (Fig. 2C, D), which showed additional activation in the central macula, suggesting that in photographs in which the central macula is visible, the perimacular retinal vessels and foveal reflex are also important features for the DCNN.

Third, we analyzed the 7 images in our primary data set that failed testing in our 5-fold cross-validation experiment. Of these 7 images, 3 had image quality issues; namely, the disc margin was partially obscured. Another 3 had grossly disorganized optic nerve head structure due to congenital anomaly of the disc or optic disc hypoplasia. Although it is difficult to draw definitive conclusions given our low error rate, it appears that suboptimal image quality and gross disorganization of optic nerve head structure can diminish the performance of our DLS. By contrast, the presence of a blurred disc margin did not seem to have a significant effect on performance, given the accurate results for images of neuro-ophthalmological conditions that present with a blurred disc margin, such as papilledema, optic disc drusen, and anterior ischemic optic neuropathy.

Fourth, in neuro-ophthalmology, the laterality of the involved eye and whether the pathologic process is unilateral or bilateral are extremely important clinical variables that influence the differential diagnoses and dictate subsequent imaging and/or systemic evaluation. Therefore, the development of any AI algorithm for neuro-ophthalmological diseases invariably will require correct identification of the optic disc involved; this was a major impetus for the current study.
The novelty and strength of our database lie in the wide range of neuro-ophthalmological conditions represented, including anterior ischemic optic neuropathy, optic atrophy, compressive optic neuropathy, congenital anomalies of the optic disc, optic disc drusen, hereditary optic neuropathy (e.g., Leber's), optic disc hypoplasia, optic disc infiltration, morning glory disc, nonglaucomatous cupping, optic nerve sheath meningioma, papilledema, tilted disc, and toxic optic neuropathy. Given our finding, and the finding by Jang et al (23), that the optic disc/peripapillary area is important for laterality differentiation and that the performance of a DCNN can be affected by the presence of pathologies, it is notable that our DLS could still reliably detect eye laterality in the presence of various optic disc pathologies. This suggests that our DLS would likely perform reasonably well when deployed in a neuro-ophthalmology clinic or used to curate a neuro-ophthalmology data set.

Although distinguishing optic disc laterality is a relatively straightforward task for a human clinician, such an algorithm nevertheless is instrumental for future machine-learning endeavors in neuro-ophthalmology. For example, such an algorithm can rapidly and automatically segregate right eye disc photographs from left eye disc photographs in an image bank containing a large number of optic disc images. Also, as the field of medical-image AI gravitates toward simultaneous multilabeling within the same image, for example, labeling an image as "a left optic disc with blurred disc margins, suggestive of anterior ischemic optic neuropathy," such an algorithm will be useful for generating one of these essential labels.

Most images in our primary data set were derived from the clinical practice of one neuro-ophthalmologist based in an urban setting. Although we aimed to increase the diversity of images by including images from 3 other databases, it is unclear how our DLS will perform when tested against a database with images derived from a patient population of vastly different ethnic distribution or disease prevalence. It is also unclear how our DLS will perform in a "real-world" setting when deployed clinically. These uncertainties, together with the lack of detailed demographic information in our data set, are the major weaknesses of our current study.

CONCLUSIONS

With transfer learning and only several hundred images for training, we have developed a DLS that can reliably detect eye laterality even in the presence of a variety of optic disc pathologies. As a next step, we will evaluate the performance of our DLS against prospectively collected clinical images and against a data set obtained from outside the United States, that is, one derived from a patient population of different ethnic make-up and/or disease prevalence.

STATEMENT OF AUTHORSHIP

Category 1: a. conception and design: T. Y. A. Liu, D. S. W. Ting, P. H. Yi, and N. R. Miller; b. acquisition of data: T. Y. A. Liu, N. R. Miller, and P. S. Subramanian; c. analysis and interpretation of data: T. Y. A. Liu, J. Wei, H. Zhu, T. Li, and N. R. Miller. Category 2: a. drafting the manuscript: T. Y. A. Liu; b. revising it for intellectual content: T. Y. A. Liu, D. S. W. Ting, P. H. Yi, J. Wei, H. Zhu, T. Li, F. K. Hui, G. D. Hager, N. R. Miller, and P. S. Subramanian. Category 3: a. final approval of the completed manuscript: T. Y. A. Liu, D. S. W. Ting, P. H. Yi, J.
Wei, H. Zhu, T. Li, F. K. Hui, G. D. Hager, N. R. Miller, and P. S. Subramanian.

REFERENCES

1. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521:436-444.
2. Ehteshami Bejnordi B, Veta M, Johannes van Diest P, van Ginneken B, Karssemeijer N, Litjens G, van der Laak J, the CAMELYON16 Consortium, Hermsen M, Manson QF, Balkenhol M, Geessink O, Stathonikos N, van Dijk MC, Bult P, Beca F, Beck AH, Wang D, Khosla A, Gargeya R, Irshad H, Zhong A, Dou Q, Li Q, Chen H, Lin HJ, Heng PA, Hass C, Bruni E, Wong Q, Halici U, Oner MU, Cetin-Atalay R, Berseth M, Khvatkov V, Vylegzhanin A, Kraus O, Shaban M, Rajpoot N, Awan R, Sirinukunwattana K, Qaiser T, Tsang YW, Tellez D, Annuscheit J, Hufnagl P, Valkonen M, Kartasalo K, Latonen L, Ruusuvuori P, Liimatainen K, Albarqouni S, Mungal B, George A, Demirci S, Navab N, Watanabe S, Seno S, Takenaka Y, Matsuda H, Ahmady Phoulady H, Kovalev V, Kalinovsky A, Liauchuk V, Bueno G, Fernandez-Carrobles MM, Serrano I, Deniz O, Racoceanu D, Venancio R. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA. 2017;318:2199-2210.
3. Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, Thrun S. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017;542:115-118.
4. Chung SW, Han SS, Lee JW, Oh KS, Kim NR, Yoon JP, Kim JY, Moon SH, Kwon J, Lee HJ, Noh YM, Kim Y. Automated detection and classification of the proximal humerus fracture by using deep learning algorithm. Acta Orthop. 2018;89:468-473.
5. Lakhani P, Sundaram B. Deep learning at chest radiography: automated classification of pulmonary tuberculosis by using convolutional neural networks. Radiology. 2017;284:574-582.
6. Lakhani P. Deep convolutional neural networks for endotracheal tube position and X-ray image classification: challenges and opportunities. J Digit Imaging. 2017;30:460-468.
7. Ting DSW, Cheung CY, Lim G, Tan GSW, Quang ND, Gan A, Hamzah H, Garcia-Franco R, San Yeo IY, Lee SY, Wong EYM, Sabanayagam C, Baskaran M, Ibrahim F, Tan NC, Finkelstein EA, Lamoureux EL, Wong IY, Bressler NM, Sivaprasad S, Varma R, Jonas JB, He MG, Cheng CY, Cheung GCM, Aung T, Hsu W, Lee ML, Wong TY. Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes. JAMA. 2017;318:2211-2223.
8. Gulshan V, Peng L, Coram M, Stumpe MC, Wu D, Narayanaswamy A, Venugopalan S, Widner K, Madams T, Cuadros J, Kim R, Raman R, Nelson PC, Mega JL, Webster DR. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA. 2016;316:2402-2410.
9. Gargeya R, Leng T. Automated identification of diabetic retinopathy using deep learning. Ophthalmology. 2017;124:962-969.
10. Raju M, Pagidimarri V, Barreto R, Kadam A, Kasivajjala V, Aswath A. Development of a deep learning algorithm for automatic diagnosis of diabetic retinopathy. Stud Health Technol Inform. 2017;245:559-563.
11. Takahashi H, Tampo H, Arai Y, Inoue Y, Kawashima H. Applying artificial intelligence to disease staging: deep learning for improved staging of diabetic retinopathy. PLoS One. 2017;12:e0179790.
12. Burlina P, Pacheco KD, Joshi N, Freund DE, Bressler NM. Comparing humans and deep learning performance for grading AMD: a study in using universal deep features and transfer learning for automated AMD analysis. Comput Biol Med. 2017;82:80-86.
13. Burlina PM, Joshi N, Pekala M, Pacheco KD, Freund DE, Bressler NM. Automated grading of age-related macular degeneration from color fundus images using deep convolutional neural networks. JAMA Ophthalmol. 2017;135:1170-1176.
14. Matsuba S, Tabuchi H, Ohsugi H, Enno H, Ishitobi N, Masumoto H, Kiuchi Y. Accuracy of ultra-wide-field fundus ophthalmoscopy-assisted deep learning, a machine-learning technology, for detecting age-related macular degeneration. Int Ophthalmol. 2019;39:1269-1275.
15. Treder M, Lauermann JL, Eter N. Automated detection of exudative age-related macular degeneration in spectral domain optical coherence tomography using deep learning. Graefes Arch Clin Exp Ophthalmol. 2018;256:259-265.
16. Asaoka R, Murata H, Iwase A, Araie M. Detecting preperimetric glaucoma with standard automated perimetry using a deep learning classifier. Ophthalmology. 2016;123:1974-1980.
17. Cerentini A, Welfer D, Cordeiro d'Ornellas M, Pereira Haygert CJ, Dotto GN. Automatic identification of glaucoma using deep learning methods. Stud Health Technol Inform. 2017;245:318-321.
18. Muhammad H, Fuchs TJ, De Cuir N, De Moraes CG, Blumberg DM, Liebmann JM, Ritch R, Hood DC. Hybrid deep learning on single wide-field optical coherence tomography scans accurately classifies glaucoma suspects. J Glaucoma. 2017;26:1086-1094.
19. Brown JM, Campbell JP, Beers A, Chang K, Ostmo S, Chan RVP, Dy J, Erdogmus D, Ioannidis S, Kalpathy-Cramer J, Chiang MF, for the Imaging and Informatics in Retinopathy of Prematurity Research Consortium. Automated diagnosis of plus disease in retinopathy of prematurity using deep convolutional neural networks. JAMA Ophthalmol. 2018;136:803-810.
20. Poplin R, Varadarajan AV, Blumer K, Liu Y, McConnell MV, Corrado GS, Peng L, Webster DR. Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning. Nat Biomed Eng. 2018;2:158-164.
21. Ting DSW, Wong TY. Eyeing cardiovascular risk factors. Nat Biomed Eng. 2018;2:140-141.
22. Age-Related Eye Disease Study Research Group. The Age-Related Eye Disease Study (AREDS): design implications. AREDS report no. 1. Control Clin Trials. 1999;20:573-600.
23. Jang Y, Son J, Park KH, Park SJ, Jung KH. Laterality classification of fundus images using interpretable deep neural network. J Digit Imaging. 2018;31:923-928.
24. Carmona EJ, Rincón M, García-Feijoó J, Martínez-de-la-Casa JM. Identification of the optic nerve head with genetic algorithms. Artif Intell Med. 2008;43:243-259.
25. Budai A, Bock R, Maier A, Hornegger J, Michelson G. Robust vessel segmentation in fundus images. Int J Biomed Imaging. 2013;2013:154860.
26. American Society of Retina Specialists. Retina Image Bank. Available at: http://imagebank.asrs.org/. Accessed September 18, 2018.
27. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016:770-778. Presented June 26-July 1, 2016; Las Vegas, NV.
28. Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A. Learning deep features for discriminative localization. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016:2921-2929. Presented June 26-July 1, 2016; Las Vegas, NV.
29. Kermany DS, Goldbaum M, Cai W, Valentim CCS, Liang H, Baxter SL, McKeown A, Yang G, Wu X, Yan F, Dong J, Prasadha MK, Pei J, Ting MYL, Zhu J, Li C, Hewett S, Dong J, Ziyar I, Shi A, Zhang R, Zheng L, Hou R, Shi W, Fu X, Duan Y, Huu VAN, Wen C, Zhang ED, Zhang CL, Li O, Wang X, Singer MA, Sun X, Xu J, Tafreshi A, Lewis MA, Xia H, Zhang K. Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell. 2018;172:1122-1131.e9.
30. Ting DSW, Liu Y, Burlina P, Xu X, Bressler NM, Wong TY. AI for medical imaging goes deep. Nat Med. 2018;24:539-540. |
Date | 2020-06 |
Language | eng |
Format | application/pdf |
Type | Text |
Publication Type | Journal Article |
Source | Journal of Neuro-Ophthalmology, June 2020, Volume 40, Issue 2 |
Collection | Neuro-Ophthalmology Virtual Education Library: Journal of Neuro-Ophthalmology Archives: https://novel.utah.edu/jno/ |
Publisher | Lippincott Williams & Wilkins |
Holding Institution | Spencer S. Eccles Health Sciences Library, University of Utah |
Rights Management | © North American Neuro-Ophthalmology Society |
ARK | ark:/87278/s6bc9nwb |
Setname | ehsl_novel_jno |
ID | 1592870 |
Reference URL | https://collections.lib.utah.edu/ark:/87278/s6bc9nwb |