Original Contribution
Section Editors: Clare Fraser, MD; Susan Mollan, MD
Interobserver and Intra-Observer Reliability of Eyelid Tests for Ocular Myasthenia Gravis
Thanchat Jienmaneechotchai, MD, Supanut Apinyawasisuk, MD, Supharat Jariyakosol, MD, Parima Hirunwiwatkul, MD
Background: The lid fatigability test (LFT), Cogan lid twitch (CLT), and forced eyelids closure test (FECT) are simple clinical screening tests for ocular myasthenia gravis (OMG). However, these tests are subjectively interpreted. We thus evaluated the interobserver and intra-observer reliability of each test.
Methods: The 3 eyelid tests were performed in patients with ptosis associated with various conditions, including OMG. Video clips of all tests were recorded using a smartphone with a built-in camera in the following order: LFT, CLT, and FECT. All video clips were distributed to 3 neuro-ophthalmologists and 3 general ophthalmologists, who were trained to evaluate the tests using a single standard instruction. After 3 months, all video clips were reordered for the second evaluation. Interobserver and intra-observer reliability were calculated using the Cohen kappa coefficient and the Fleiss kappa statistic.
Results: The 3 eyelid tests were performed and recorded in 35 patients, whose diagnoses included OMG, levator muscle dehiscence, partial oculomotor nerve palsy, and Horner syndrome. CLT showed moderate-to-substantial interobserver reliability in the neuro-ophthalmologist group (Fleiss kappa 0.77 [95% CI 0.60–0.94] and 0.66 [95% CI 0.46–0.85] in the first and second evaluations, respectively), but the results varied in the general ophthalmologist group (Fleiss kappa 0.58 [95% CI 0.37–0.79] and 0.54 [95% CI 0.33–0.76] in the first and second evaluations, respectively). FECT and LFT showed lower interobserver reliability in both groups.
CLT also showed moderate-to-almost perfect intra-observer reliability in the neuro-ophthalmologist group (Cohen kappa 0.55, 0.58, and 0.92), whereas FECT and LFT showed lower intra-observer reliability. The intra-observer reliability varied among general ophthalmologists for all 3 eyelid tests.
Conclusions: CLT is the most reliable test among the 3 eyelid tests. However, all tests should be interpreted with caution by general ophthalmologists.
Journal of Neuro-Ophthalmology 2022;42:230–233
doi: 10.1097/WNO.0000000000001425
© 2021 by North American Neuro-Ophthalmology Society
Ophthalmology Department (TJ, SA, SJ, PH), King Chulalongkorn Memorial Hospital, Bangkok, Thailand; and Department of Ophthalmology (TJ, SA, SJ, PH), Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand.
The authors report no conflicts of interest.
Supplemental digital content is available for this article. Direct URL citations appear in the printed text and are provided in the HTML and PDF versions of this article on the journal’s Web site (www.jneuro-ophthalmology.com).
Address correspondence to Supanut Apinyawasisuk, MD, Department of Ophthalmology, Faculty of Medicine, Chulalongkorn University, 1873 Rama 4 Road, Pathum Wan, Bangkok 10330, Thailand; E-mail: s.apinyawasisuk@gmail.com
Ocular myasthenia gravis (OMG) is an autoimmune disorder in which autoantibodies against the acetylcholine receptor at the postsynaptic neuromuscular junction produce fatigable weakness of the levator palpebrae superioris, extraocular muscles, and orbicularis oculi, causing ptosis, diplopia, and lagophthalmos (1). Diagnosis of OMG is sometimes challenging because of the limited sensitivity and specificity of diagnostic tests (2). Patients with ptosis sometimes undergo many diagnostic tests to exclude OMG.
To avoid unnecessary invasive tests, simple clinical tests such as the lid fatigability test (LFT), Cogan lid twitch (CLT), and forced eyelids closure test (FECT) should be used as a screening method. Previous studies reported that CLT and FECT had reasonably high sensitivity and specificity. For CLT, sensitivity ranged from 50% to 75% and specificity ranged from 91.7% to 99%. For FECT, sensitivity and specificity were 94% and 91%, respectively (3–5). Because the results of these tests are subjectively interpreted, this study aims to evaluate the interobserver and intra-observer reliability of the 3 eyelid tests (LFT, CLT, and FECT) in patients with ptosis caused by OMG and other mimicking conditions.
Jienmaneechotchai et al: J Neuro-Ophthalmol 2022; 42: 230-233
Copyright © North American Neuro-Ophthalmology Society. Unauthorized reproduction of this article is prohibited.
TABLE 1. Cohen kappa coefficient shows interobserver reliability among neuro-ophthalmologists
Evaluation      First              First              First              Second             Second             Second
Interpreters    N1; N2             N1; N3             N2; N3             N1; N2             N1; N3             N2; N3
LFT (95% CI)    0.56 (0.28–0.85)   0.44 (0.13–0.76)   0.37 (0.03–0.70)   0.64 (0.35–0.93)   0.39 (0.04–0.73)   0.44 (0.10–0.78)
CLT (95% CI)    0.86 (0.67–1)      0.64 (0.35–0.93)   0.61 (0.30–0.93)   0.61 (0.30–0.93)   0.49 (0.16–0.83)   0.70 (0.43–0.98)
FECT (95% CI)   0.75 (0.51–0.98)   0.49 (0.19–0.80)   0.47 (0.15–0.79)   0.40 (0–0.84)      0.41 (0.02–0.80)   0.58 (0.24–0.92)
CI, confidence interval; CLT, Cogan lid twitch; FECT, forced eyelids closure test; LFT, lid fatigability test; N1, neuro-ophthalmologist #1; N2, neuro-ophthalmologist #2; N3, neuro-ophthalmologist #3.
METHODS
This observational study included patients with ptosis caused by OMG and other conditions who presented to the Neuro-ophthalmology Clinic, Oculoplastic and Reconstructive Surgery Clinic, and General Outpatient Clinic at the Department of Ophthalmology, King Chulalongkorn Memorial Hospital from February 2017 to February 2018.
Inclusion criteria were age 18 years or older and ptosis at the time of testing or a history of ptosis. Exclusion criteria were inability to understand how to perform the eyelid tests, upper eyelids that could not move downward for any reason (making CLT impossible to perform), and abnormal eyelid movements that interfered with evaluation of the tests.
The diagnosis of OMG was based on typical clinical findings and a positive result on at least one of the following diagnostic tests: acetylcholine receptor antibody, single-fiber electromyography, and repetitive nerve stimulation test. Patients with negative tests who showed a dramatic response to pyridostigmine treatment or later developed generalized myasthenia gravis also received the diagnosis. A positive ice pack test was always confirmed by one of the aforementioned diagnostic tests to establish the diagnosis. The results of the 3 eyelid tests being studied were not used to diagnose the condition. The other conditions mimicking OMG were diagnosed using their standard diagnostic methods. Levator muscle dehiscence was diagnosed by experienced oculoplastic specialists based on clinical findings and a negative ice pack test. Partial oculomotor nerve palsy was diagnosed based on the clinical signs of ptosis plus limitation of the involved extraocular muscles. Horner syndrome was diagnosed based on clinical signs of ptosis plus ipsilateral miosis and a positive cocaine test.
The study protocol was approved by the institutional review board of the Faculty of Medicine, Chulalongkorn University and adhered to the tenets of the Declaration of Helsinki. Informed consent was obtained from all patients and interpreters. The Thai Clinical Trials Registry approved the project; the identification number is TCTR20200415002.
Three eyelid tests were performed in each patient, and video clips were recorded using a smartphone with a built-in camera positioned at the patient’s primary gaze. Only part of the face, from just above the eyebrows to below the lower eyelid margins, was framed. The tests were performed in the following order: LFT, CLT, and FECT. LFT was performed by asking the patient to look upward for 1 minute; downward movement of the upper eyelid was a positive result. CLT was performed by asking the patient to look downward for 5–10 seconds and then immediately refixate in primary position while an examiner immobilized the patient’s frontalis muscle with the examiner’s fingers to eliminate the contribution of the muscle. Excessive upward eyelid movement followed by downward drooping was a positive result. FECT was performed by asking the patient to close the eyelids tightly for 5–10 seconds and then immediately open them and fixate in primary position while an examiner immobilized the patient’s frontalis muscle as in CLT. Excessive upward eyelid movement followed by downward drooping was defined as a positive result (See Supplemental Digital Content, Video, http://links.lww.com/WNO/A526).
An investigator (T.J.) recorded the tests, put all de-identified video clips onto digital video discs, and distributed the discs to all interpreters: 3 neuro-ophthalmologists (P.H., S.J., S.A.) and 3 general ophthalmologists. Each interpreter was trained to evaluate the tests using the same instruction. Each test result was classified as positive or negative. Interpreters watched the video clips, evaluated the results of the 3 tests in all patients, and filled in the record form. After 3 months, T.J. rearranged the order of all video clips and redistributed them to all interpreters for the second evaluation. All interpreters were unaware of the final diagnosis of each patient in the video clips.
The Cohen kappa coefficient was used to analyze intra-observer reliability and interobserver reliability between 2 interpreters in each group. Fleiss kappa was used to calculate interobserver reliability among the 3 interpreters in each group.
RESULTS
Thirty-five participants with ptosis underwent the 3 tests during the study period. Most participants had a diagnosis of OMG or levator muscle dehiscence. A small group of participants had partial oculomotor nerve palsy or Horner syndrome.
TABLE 2. Cohen kappa coefficient shows interobserver reliability among general ophthalmologists
Evaluation      First              First              First              Second             Second             Second
Interpreters    G1; G2             G1; G3             G2; G3             G1; G2             G1; G3             G2; G3
LFT (95% CI)    0.26 (0–0.62)      0.07 (0–0.47)      0.40 (0.04–0.77)   0.31 (0.01–0.60)   0.18 (0–0.47)      0.36 (0–0.78)
CLT (95% CI)    0.41 (0.01–0.80)   0.26 (0–0.75)      0.21 (0–0.69)      0.35 (0–0.75)      0.03 (0–0.48)      0.64 (0.31–0.97)
FECT (95% CI)   0.15 (0–0.57)      0.03 (0–0.50)      0.32 (0–0.81)      0.11 (0–0.43)      0.05 (0–0.38)      0.37 (0–1)
CI, confidence interval; CLT, Cogan lid twitch; FECT, forced eyelids closure test; G1, general ophthalmologist #1; G2, general ophthalmologist #2; G3, general ophthalmologist #3; LFT, lid fatigability test.
TABLE 3. Fleiss kappa shows interobserver reliability in each group
Interpreters    N, Test 1          N, Test 2          G, Test 1          G, Test 2
LFT (95% CI)    0.50 (0.29–0.72)   0.54 (0.33–0.76)   0.39 (0.17–0.61)   0.31 (0.09–0.54)
CLT (95% CI)    0.77 (0.60–0.94)   0.66 (0.46–0.85)   0.58 (0.37–0.79)   0.54 (0.33–0.76)
FECT (95% CI)   0.62 (0.42–0.82)   0.66 (0.46–0.85)   0.47 (0.25–0.69)   0.31 (0.09–0.54)
CI, confidence interval; CLT, Cogan lid twitch; FECT, forced eyelids closure test; G, general ophthalmologist; LFT, lid fatigability test; N, neuro-ophthalmologist.
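The two agreement statistics named in the Methods can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names `cohen_kappa` and `fleiss_kappa` and the example ratings are hypothetical, test results are coded 1 (positive) or 0 (negative), every interpreter is assumed to have rated every clip, and confidence intervals are not computed.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen kappa for two sets of ratings of the same video clips."""
    if not ratings_a or len(ratings_a) != len(ratings_b):
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    # Observed agreement: share of clips given the same result.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement from each rater's own positive/negative frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


def fleiss_kappa(ratings_by_rater, categories=(0, 1)):
    """Fleiss kappa for several raters who each rated every clip."""
    n_raters = len(ratings_by_rater)
    if n_raters < 2:
        raise ValueError("need at least two raters")
    n_clips = len(ratings_by_rater[0])
    if n_clips == 0 or any(len(r) != n_clips for r in ratings_by_rater):
        raise ValueError("every rater must rate the same non-empty set of clips")
    # counts[i][j]: number of raters who gave clip i the result categories[j].
    counts = [[sum(r[i] == c for r in ratings_by_rater) for c in categories]
              for i in range(n_clips)]
    # Mean per-clip agreement: proportion of rater pairs that agree.
    pairs = n_raters * (n_raters - 1)
    p_bar = sum((sum(k * k for k in row) - n_raters) / pairs
                for row in counts) / n_clips
    # Chance agreement from the overall share of each result.
    shares = [sum(row[j] for row in counts) / (n_clips * n_raters)
              for j in range(len(categories))]
    p_e = sum(s * s for s in shares)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical ratings of 6 clips by 3 interpreters (not study data).
rater_1 = [1, 0, 1, 1, 0, 0]
rater_2 = [1, 0, 1, 1, 1, 0]
rater_3 = [1, 0, 0, 1, 0, 0]

print(round(cohen_kappa(rater_1, rater_2), 2))              # 0.67
print(round(fleiss_kappa([rater_1, rater_2, rater_3]), 2))  # 0.56
```

With these formulas, kappa is 1 for perfect agreement, 0 when observed agreement equals chance agreement, and below 0 when observed agreement falls short of chance agreement. Passing one interpreter's first and second evaluations to `cohen_kappa` gives the intra-observer version of the statistic.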
In the neuro-ophthalmologist group, CLT had the highest interobserver reliability (Cohen kappa 0.49–0.86). FECT and LFT had lower interobserver reliability (Cohen kappa 0.40–0.75 and 0.37–0.64, respectively) (Table 1). In the general ophthalmologist group, interobserver reliability varied widely. However, CLT still had the highest interobserver reliability (Cohen kappa 0.03–0.64). FECT and LFT had lower interobserver reliability (Cohen kappa 0.03–0.37 and 0.07–0.40, respectively) (Table 2).
We obtained the same results when using Fleiss kappa to analyze interobserver reliability among the 3 interpreters. Among neuro-ophthalmologists, CLT had the highest reliability, with Fleiss kappa of 0.77 (95% CI 0.60–0.94) and 0.66 (95% CI 0.46–0.85) in the first and second evaluations, respectively. Among general ophthalmologists, CLT also had the highest reliability, with Fleiss kappa of 0.58 (95% CI 0.37–0.79) and 0.54 (95% CI 0.33–0.76) in the first and second evaluations, respectively. FECT and LFT had lower interobserver reliability in both the neuro-ophthalmologist and general ophthalmologist groups (Table 3).
In the neuro-ophthalmologist group, CLT had the highest intra-observer reliability (Cohen kappa 0.55, 0.58, and 0.92). FECT had lower intra-observer reliability (Cohen kappa 0.38, 0.44, and 0.62). LFT had the lowest intra-observer reliability (Cohen kappa 0.16, 0.53, and 0.55). Intra-observer reliability among general ophthalmologists varied among interpreters and could not be calculated for one interpreter for FECT because the observed concordance was smaller than the mean-chance concordance (Table 4).
DISCUSSION
This is the first study evaluating the reliability of the 3 eyelid tests used for OMG screening. Although previous studies have shown that the tests have reasonably high accuracy (3–5), both interobserver and intra-observer reliability should be addressed for these subjective tests.
We found that CLT had high interobserver and intra-observer reliability when evaluated by neuro-ophthalmologists. However, reliability decreased when the test was evaluated by general ophthalmologists. A possible explanation is that CLT is widely considered a standard clinical test for OMG and is performed more routinely by most neuro-ophthalmologists than the other 2 tests. In contrast, most general ophthalmologists may have fewer OMG patients in their practices; thus, they may perform CLT infrequently and lack experience in evaluating the test. The more experience the interpreters have, the higher the level of agreement in interpreting the test results.
TABLE 4. Cohen kappa coefficient shows intra-observer reliability in neuro-ophthalmologists and general ophthalmologists
Interpreter     N1                 N2                 N3                 G1                 G2                 G3
LFT (95% CI)    0.55 (0.25–0.85)   0.53 (0.22–0.84)   0.16 (0–0.52)      0.29 (0–0.60)      0.64 (0.35–0.93)   0.41 (0–0.84)
CLT (95% CI)    0.58 (0.28–0.89)   0.92 (0.77–1)      0.55 (0.23–0.88)   0.03 (0–0.48)      0.58 (0.23–0.92)   0.47 (0.04–0.90)
FECT (95% CI)   0.38 (0.03–0.73)   0.62 (0.32–0.93)   0.44 (0.10–0.78)   0.10 (0–0.42)      0.26 (0–0.75)      NA
CI, confidence interval; CLT, Cogan lid twitch; FECT, forced eyelids closure test; G, general ophthalmologist; LFT, lid fatigability test; N, neuro-ophthalmologist; NA, not available.
In contrast to CLT, FECT had lower reliability among both neuro-ophthalmologists and general ophthalmologists. Although FECT was invented in 1982 (5), it has been performed only by a small group of neuro-ophthalmologists, whereas most ophthalmologists are not familiar with the test.
Although a positive result shows a similar eyelid movement pattern in FECT and CLT (overshoot of the upper eyelid followed by downward movement), immediate eyelid opening in FECT may partially lift the upper eyelids and falsely enhance the overshoot more than refixation in primary position does in CLT. This sometimes leaves observers unsure whether the overshoot is a positive FECT result or an effect of the lifting associated with immediate eyelid opening. These factors may explain why FECT had lower and more widely ranging reliability.
The sequence of the tests performed in this study (LFT, CLT, and then FECT) may have affected the interpretation. We hypothesized that the FECT result would be enhanced by the patient’s fatigue caused by the preceding tests (LFT and CLT), so the FECT result should theoretically be the most obvious and the most consistently interpreted by observers. However, the reliability of FECT was lower than expected, which implies that FECT may actually have even lower reliability when performed separately or before the other tests.
The strength of our study is that we used a single standard instruction to train all interpreters to eliminate the chance of disagreement caused by differences in their background knowledge. In addition, we left an interval between the first and second evaluations to lower the possibility that interpreters would recognize the patients and repeat their first evaluation, which could falsely raise intra-observer reliability. However, our study has some limitations. First, all interpreters evaluated the test results by watching video clips, which may differ from real-world practice, in which the observer sits in front of the patient and observes the test face-to-face.
Second, the recording technique itself was sometimes challenging: the smartphone camera was not always precisely positioned at the patient’s primary gaze as planned, and the resulting poor recording view made evaluation difficult. Third, the sequence in which the tests were performed should be kept in mind when generalizing the results of this study. Fourth, because this study aimed to investigate the overall reliability that could be applied to all conditions causing ptosis rather than to each specific condition, we did not record the definite diagnosis of the patient in each video clip and were unable to perform a subgroup analysis for each condition such as OMG or levator muscle dehiscence. This important concern should be addressed in a further study investigating the reliability of the tests separately for OMG and levator muscle dehiscence. Fifth, the diagnostic accuracy of each test was not evaluated in our study for the same reason: the diagnoses were not recorded.
In conclusion, eyelid tests are simple screening tests for OMG. CLT is the most reliable among the 3 tests. Nevertheless, these tests should be used with caution when performed and evaluated by general ophthalmologists because of low interobserver and intra-observer reliability.
STATEMENT OF AUTHORSHIP
Category 1: a. Conception and design: T. Jienmaneechotchai, S. Apinyawasisuk, S. Jariyakosol, and P. Hirunwiwatkul; b. Acquisition of data: T. Jienmaneechotchai; c. Analysis and interpretation of data: T. Jienmaneechotchai.
Category 2: a. Drafting the manuscript: T. Jienmaneechotchai, S. Apinyawasisuk, S. Jariyakosol, and P. Hirunwiwatkul; b. Revising it for intellectual content: T. Jienmaneechotchai, S. Apinyawasisuk, S. Jariyakosol, and P. Hirunwiwatkul.
Category 3: a. Final approval of the completed manuscript: T. Jienmaneechotchai, S. Apinyawasisuk, S. Jariyakosol, and P. Hirunwiwatkul.
REFERENCES
1. Vaphiades MS, Bhatti MT, Lesser RL. Ocular myasthenia gravis. Curr Opin Ophthalmol. 2012;23:537–542.
2. Benatar M. A systematic review of diagnostic studies in myasthenia gravis. Neuromuscul Disord. 2006;16:459–467.
3. Singman EL, Matta NS, Silbert DI. Use of the Cogan lid twitch to identify myasthenia gravis. J Neuroophthalmol. 2011;31:239–240.
4. Van Stavern GP, Bhatt A, Haviland J, Black EH. A prospective study assessing the utility of Cogan’s lid twitch sign in patients with isolated unilateral or bilateral ptosis. J Neurol Sci. 2007;256:84–85.
5. Apinyawasisuk S, Zhou X, Tian JJ, Garcia GA, Karanjia R, Sadun AA. Validity of forced eyelid closure test: a novel clinical screening test for ocular myasthenia gravis. J Neuroophthalmol. 2017;37:253–257.