| Title | Signal processing, human factors, and modelling to support bedside care in the intensive care unit |
| Publication Type | dissertation |
| School or College | College of Engineering |
| Department | Biomedical Engineering |
| Author | Görges, Matthias |
| Date | 2011-08 |
| Description | Medical error causes preventable death in nearly 100,000 patients per year in the US alone. Common sources of error include medication-related problems, technical equipment failure, interruptions, complicated and error-prone devices, information overload (providing too much patient data for one person to process effectively), and environmental problems such as inadequate lighting or distracting ambient noise. Intensive care units are among the riskiest locations in a hospital, with up to 9 reported events per 100 patient days. This risk stands in stark contrast to anesthesia in the operating room, where substantial advances in patient safety have dropped the average risk of anesthesia-related death to less than 1 in 200,000 anesthetics, an improvement by a factor of 20 over the past 30 years. Improvements in technology and other innovations contributing to this success now need to be adapted for and implemented in the intensive care unit setting. Nurses are increasingly regarded as key decision makers within the healthcare team, as they outnumber physicians 4:1. Reducing nurses' workload and improving medical decision making by providing decision support tools can have a significant impact in reducing the chances of medical error. 
This dissertation consists of four manuscripts: 1) a review of previous medical display evaluations, providing insight into solutions that have worked in the past; 2) a study on reducing false alarms and increasing the usefulness of the remaining alarms by introducing alarm delays and detecting alarm context, such as suctioning automatically silencing ventilator alarms; 3) a study on simplifying the frequent but complicated task of titrating vasoactive medications by providing a titration support tool that predicts blood pressure changes 5 minutes into the future; and 4) a study on supporting the triage of unfamiliar patients by introducing a far-view display that incorporates information from previously disparate devices and presents trend and alarm information at one location that is easy to scan and interpret. |
| Type | Text |
| Publisher | University of Utah |
| Subject | Graphical display; Human factors; Intensive care unit; Nursing; Signal processing |
| Dissertation Institution | University of Utah |
| Dissertation Name | Doctor of Philosophy |
| Language | eng |
| Rights Management | Copyright © Matthias Görges 2011 |
| Format | application/pdf |
| Format Medium | application/pdf |
| Format Extent | 1,416,902 bytes |
| Identifier | us-etd3,59319 |
| Source | Original housed in Marriott Library Special Collections, RA4.5 2010 .G67 |
| ARK | ark:/87278/s68346rw |
| DOI | https://doi.org/doi:10.26053/0H-VRMV-3R00 |
| Setname | ir_etd |
| ID | 194506 |
| OCR Text | SIGNAL PROCESSING, HUMAN FACTORS, AND MODELLING TO SUPPORT BEDSIDE CARE IN THE INTENSIVE CARE UNIT

by Matthias Görges

A dissertation submitted to the faculty of The University of Utah in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Department of Bioengineering
The University of Utah
August 2010

Copyright © Matthias Görges 2010
All Rights Reserved

THE UNIVERSITY OF UTAH GRADUATE SCHOOL
STATEMENT OF DISSERTATION APPROVAL

The dissertation of Matthias Görges has been approved by the following supervisory committee members:

Dwayne R. Westenskow, Chair (Date Approved 4/26/2010)
Douglas A. Christensen, Member (Date Approved 4/28/2010)
Robert S. MacLeod, Member (Date Approved 4/26/2010)
Boaz A. Markewitz, Member (Date Approved 4/26/2010)
Joseph A. Orr, Member (Date Approved 4/26/2010)

and by Richard D. Rabbitt, Chair of the Department of Bioengineering, and by Charles A. Wight, Dean of The Graduate School.

ABSTRACT

Medical error causes preventable death in nearly 100,000 patients per year in the US alone. Common sources for error include medication related problems, technical equipment failure, interruptions, complicated and error-prone devices, information overload (providing too much patient data for one person to process effectively), and environmental problems like inadequate lighting or distracting ambient noise. Intensive care units are one of the riskiest locations in a hospital, with up to 9 reported events per 100 patient days. This risk is in large contrast to anesthesia in the operating rooms. Here much advancement in the area of patient safety has been made in the past, dropping the average risk for anesthesia related death to less than 1 in 200,000 anesthetics, an improvement by a factor of 20 in the past 30 years. Improvements in technology and other innovations contributing to this success now need to be adapted for and implemented in the intensive care unit setting. 
Nurses are increasingly regarded as key decision makers within the healthcare team, as they outnumber physicians 4:1. Reducing nurses' workload and improving medical decision making by providing decision support tools can have a significant impact in reducing the chances of medical errors. This dissertation consists of four manuscripts: 1) a review of previous medical display evaluations, providing insight into solutions that have worked in the past; 2) a study on reducing false alarms and increasing the usefulness of the remaining alarms by introducing alarm delays and detecting alarm context, such as suctioning automatically silencing ventilator alarms; 3) a study of simplifying the frequent but complicated task of titrating vasoactive medications by providing a titration support tool that predicts blood pressure changes 5 minutes into the future; and 4) a study on supporting the triage of unfamiliar patients by introducing a far-view display that incorporates information from previously disparate devices and presents trend and alarm information at one easy to scan and interpret location.

My parents.

CONTENTS

ABSTRACT
LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGMENTS

CHAPTERS

1. INTRODUCTION
   1.1 Background
      1.1.1 Medical Error
      1.1.2 Medical Decision Making
      1.1.3 Human Factors
   1.2 Goals and Contributions to the Literature
      1.2.1 Motivation for Focusing on the Intensive Care Unit
      1.2.2 Review of Physiologic Monitoring Display Evaluations
      1.2.3 Alarm Reductions Using Delays and Clinical Context
      1.2.4 Titration Advisory System with Patient Specific Sensitivity Identification
      1.2.5 Intensive Care Unit Far-View Display Supporting Triaging Tasks
   1.3 References
2. EVALUATIONS OF PHYSIOLOGIC MONITORING DISPLAYS: A SYSTEMATIC REVIEW
   2.1 Abstract
   2.2 Introduction
   2.3 Background
   2.4 Methods
   2.5 Results
      2.5.1 Study Settings
      2.5.2 Study Participants
      2.5.3 Display Type
      2.5.4 Study Design
      2.5.5 Tasks
      2.5.6 Dependent Variables
   2.6 Discussion
      2.6.1 Study Settings
      2.6.2 Study Participants
      2.6.3 Study Designs
      2.6.4 Tasks and Scenarios
      2.6.5 Future Display Evaluations
   2.7 Conclusions
   2.8 PubMed Search Terms
   2.9 Acknowledgments
   2.10 References
3. IMPROVING ALARM PERFORMANCE IN THE MEDICAL INTENSIVE CARE UNIT USING DELAYS AND CLINICAL CONTEXT
   3.1 Abstract
   3.2 Introduction
   3.3 Methods
      3.3.1 Setting
      3.3.2 Data Recording
      3.3.3 Alarm Classifications
      3.3.4 Data Analysis
   3.4 Results
      3.4.1 Ventilator Alarms
      3.4.2 Unnecessary Alarms Occurring During Patient Care
      3.4.3 Health Care Provider Presence and Tasks
   3.5 Discussion
      3.5.1 Comparison with the Literature
      3.5.2 Alarm Classification Method
      3.5.3 Introducing an Alarm Delay
      3.5.4 Reducing Ventilator Alarms
      3.5.5 Reducing InfP and FeedP Alarms
      3.5.6 Reducing Alarms Occurring During Patient Care
      3.5.7 Health Care Provider Presence and Tasks
   3.6 Conclusions
   3.7 Acknowledgments
   3.8 References
4. A TOOL PREDICTING FUTURE MEAN ARTERIAL BLOOD PRESSURE VALUES IMPROVES THE TITRATION OF VASOACTIVE DRUGS
   4.1 Abstract
   4.2 Introduction
      4.2.1 Alternatives to Manual Titration
      4.2.2 Purpose of the Study
   4.3 Methods
      4.3.1 Identification of Patient Sensitivity
      4.3.2 Sensitivity Identification Performance Evaluation
         4.3.2.1 Creating Unique Patient Responses to SNP Infusions
         4.3.2.2 Identification of SNP Sensitivity
      4.3.3 Dopamine and Dobutamine Sensitivity Identifications
      4.3.4 Blood Pressure Titration Tool
         4.3.4.1 Apparatus
         4.3.4.2 Training
         4.3.4.3 Scenario
         4.3.4.4 Procedure
      4.3.5 Data Analysis
   4.4 Results
      4.4.1 Sensitivity Identification Performance
      4.4.2 Blood Pressure Titration Tool Evaluation
   4.5 Discussion
      4.5.1 Existing Sensitivity Identification Methods
      4.5.2 Limitations
      4.5.3 Conclusions
   4.6 Acknowledgements
   4.7 Appendix A: Identification of Optimal Step Size and Duration
   4.8 Appendix B: Dopamine Sensitivity Identification
      4.8.1 Methods
      4.8.2 Evaluation of Sensitivity Identification Performance
      4.8.3 Results
   4.9 Appendix C: Dobutamine Sensitivity Identification
      4.9.1 Methods
      4.9.2 Exponential Saturating Sensitivity Identification
      4.9.3 Evaluation of Sensitivity Identification Performance
      4.9.4 Results
   4.10 References
5. A FAR-VIEW INTENSIVE CARE UNIT MONITORING DISPLAY ENABLES FASTER TRIAGE
   5.1 Abstract
   5.2 Introduction
      5.2.1 Background
      5.2.2 Problems with Current Monitoring
      5.2.3 Prioritizing Attention to Patients
      5.2.4 Purpose of the Study
   5.3 Methods
      5.3.1 Far-View Display Development
         5.3.1.1 Trend Component
         5.3.1.2 Alarm Indicator
         5.3.1.3 Syringe Pump Information
         5.3.1.4 Therapy Support Indicator
      5.3.2 Far-View Display Evaluation
         5.3.2.1 Scenarios
         5.3.2.2 Power Analysis
         5.3.2.3 Participants
         5.3.2.4 Training and Quiz
         5.3.2.5 Apparatus
         5.3.2.6 Scenario and Procedure
         5.3.2.7 Data Analysis
   5.4 Results
      5.4.1 Decision Times
      5.4.2 Decision Accuracy
      5.4.3 Workload Scores and Display Preference
   5.5 Discussion
      5.5.1 Decision Times
      5.5.2 Decision Accuracy
      5.5.3 Accuracy Difference Between Both Far-View Displays
      5.5.4 Workload Scores and Display Preference
      5.5.5 Comparison with Existing Solutions from the Literature
      5.5.6 Limitations
      5.5.7 Future Work
      5.5.8 Conclusion
   5.6 Acknowledgments
   5.7 References
6. CONCLUSION
   6.1 Central Theme
      6.1.1 Four Manuscripts
      6.1.2 Contribution of the Four Parts to the Central Theme
   6.2 Summary and Conclusions
      6.2.1 Review of Physiologic Monitoring Display Evaluations
      6.2.2 Alarm Reductions Using Delays and Clinical Context
      6.2.3 Titration Advisory System with Patient Specific Sensitivity Identification
      6.2.4 Intensive Care Unit Far-View Display Supporting Triaging Tasks
   6.3 Impact
   6.4 Future Work
      6.4.1 Review of Physiologic Monitoring Display Evaluations
      6.4.2 Alarm Reductions Using Delays and Clinical Context
      6.4.3 Titration Advisory System with Patient Specific Sensitivity Identification
      6.4.4 Intensive Care Unit Far-View Display Supporting Triaging Tasks
   6.5 References

LIST OF FIGURES

3.1 Number and duration of alarms per hr
3.2 Cumulative alarm number and classification
3.3 Number and duration of health care provider visits to the patient's room
3.4 Tasks frequency
4.1 Sodium-nitroprusside titration advisor
4.2 An illustration of the steps in the sensitivity identification algorithm
4.3 MATLAB implementation of Slate's sodium-nitroprusside model
4.4 The error in our prediction of mean arterial blood pressure 5 min after starting a sodium-nitroprusside infusion rate of 2 mcg/kg/min
4.5 The error in our estimation of sensitivity to sodium-nitroprusside for 100 simulated patients
4.6 The error in our estimation of sensitivity to dopamine for 100 simulated patients
4.7 The error in our prediction of steady-state mean arterial blood pressure after an increase in dobutamine infusion rate of 1 mcg/kg/min
4.8 User performance with and without the advisory system
4.9 NASA TLX self-reported workload scores with and without use of the advisory system
4.10 Sodium-nitroprusside (SNP) sensitivity identification error over most of the SNP model's parameter space
5.1 Far-view display in "Bar" presentation, showing a linear 12 hr trend looking like a strip chart
5.2 Far-view display in "Clock" presentation, showing the trend information on a circle looking like a 12 hr clock
5.3 Control display consisting of a Dräger Kappa XLT patient monitor and an Alaris Medley infusion pump
5.4 Answer times for each display
5.5 Answer times for each display, grouped by scenario
5.6 Answer accuracy for each display, grouped by scenario
5.7 NASA TLX self-reported workload scores for each display
6.1 Manuscript summaries and how they tie together

LIST OF TABLES

2.1 Physiological monitoring display evaluations
3.1 Alarm frequency, duration, and classification
3.2 Number of ventilator alarms per hr
3.3 Ventilator alarms occurring during or within 2 min of patient care tasks
4.1 Sodium-nitroprusside model parameters
4.2 Dopamine model parameters
4.3 Dobutamine model parameters
5.1 Differences between critical and less critical patients

ACKNOWLEDGMENTS

I would first like to thank Dwayne Westenskow, Director of the Anesthesiology Bioengineering division, for inviting me to join his research group and providing me the opportunity to perform the research described in this dissertation. 
His expert advice, encouraging comments, integrity, and dedication to research were invaluable. Thanks to Joseph Orr, Co-Director of the Anesthesiology Bioengineering division, for his guidance and support of my research activities in the lab, as well as providing me with the opportunity to be involved in a second display design and evaluation project during my time in Utah. I appreciate the time and guidance provided by the members of my PhD supervisory committee: Dwayne R. Westenskow, Douglas A. Christensen, Robert S. MacLeod, Boaz A. Markewitz, and Joseph A. Orr. Their commitment helped focus the project and create a good balance between science, engineering, and medicine. Thanks to Kai Kück for having the foresight to start exploring context awareness and its applications in medicine. His support and encouraging feedback during our bimonthly conference calls helped keep the project on track and focused on the importance of applying engineering to medicine in order to improve patient care and safety. Without his involvement I would never have pursued this research. I am also indebted to Mark Ansermino at the University of British Columbia, who reviewed this dissertation and provided many useful comments. I am looking forward to joining his research group as a postdoctoral fellow this fall. I would like to express many thanks to other collaborators in this project: Jim Agutter for instruction in good design and superb suggestions and feedback during the creation of the far-view display. Nancy Staggers for providing me with the opportunity to expand a project I started during her Human Factors class, which culminated in a review of existing monitoring evaluation literature. Sven Koch, not only for working on the close-view nursing display, which is intended to accompany the far-view display developed here, but also for frequent feedback during the far-view display design and evaluation process. 
I am most thankful to Cris LaPierre and Lara Brewer for sharing the entire PhD experience with me, while becoming close friends, and frequently providing tips, encouragement, and support for my research. I am thankful for the support of the Anesthesia department staff and my fellow labmates Tammy Anderson, Sören Höhne, Cameron Jacobson, and Carl Tams. Special thanks to David Liu for statistics and study design advice and to Bryce Hill for providing exciting community projects totally unrelated to my research. Special gratitude goes to my parents, to whom this dissertation is dedicated, my siblings, and my friends for their support and encouragement during my long time away from home. Permission to reprint the paper published in Anesthesia & Analgesia was granted by the International Anesthesia Research Society, and permission to reprint the papers published in the Journal of Clinical Monitoring and Computing was granted by Springer Science+Business Media. Finally, I would like to thank Drägerwerk AG, Lübeck, Germany, for their interest in and support of this research project.

CHAPTER 1
INTRODUCTION

This dissertation is the compilation of my work at the University of Utah, focusing on reducing nurses' workload and improving medical decision making, thereby reducing the chances of medical errors. It consists of four manuscripts. The first is a review of previous medical display evaluations. The remaining three are studies suggesting the following improvements to nurses' work: a) reduction of false alarms and increased usefulness of the remaining alarms; b) simplification of a common but complicated task, the titration of vasoactive medications; and c) support for triaging unfamiliar patients using a far-view display.

1.1 Background

1.1.1 Medical Error

In 1999 an Institute of Medicine report estimated the number of preventable deaths caused by medical error to be between 44,000 and 98,000.1 This report started the modern patient-safety movement. 
Preventable medication errors have been found to occur in up to 1.5% of all hospital admissions.2 Medical errors are common in intensive care units (ICUs), with 36-89 reported events per 1,000 ICU patient days.3, 4 Causes of errors include complicated and error-prone devices, information overload (providing too much patient data for one person to process effectively), and environmental problems like inadequate lighting or distracting ambient noise.5 The most common medical errors in the ICU are medication errors, problems with intravenous infusions, and technical equipment failure.6 Problems in patient identification,7 wrong patients or wrong locations in operations,8 interruptions,9 and team communication in the operating room10 are only some of the areas where improvements are needed and have been proposed. Computerized physician-order-entry and decision-support systems can reduce certain types of medication error but have the drawbacks of slowing clinical workflow and introducing new errors if not implemented carefully.11

1.1.2 Medical Decision Making

Evidence-based medicine12, 13 aims to address the problem of clinical-practice variation by replacing personal clinical experience as the primary resource for medical decision making with practice recommendations and guidelines based on systematic studies of populations.14 Sources of medical decision-making support15 include artificial neural networks,16, 17 statistical methods such as Bayesian inference18 or fuzzy logic,19 case-based reasoning,20 and expert systems.21, 22 Data integration, using clinical dashboards23 or single indicators combining multiple variables,24 has shown promise for improving patient care. Nurses are increasingly regarded as key decision makers within the healthcare team25 and outnumber physicians 4:1. Nurses prefer humans as information sources, as these deliver context-specific information when needed. 
Additionally, literature use almost never occurs at the point of decision making but rather after the fact.25 Research information needs to be presented in formats optimized for limited consumption opportunities, as nurses have limited time to explore the literature.26 Finally, the follow-up report to "To Err Is Human"1 specifically asked for decision support tools, such as reminders and alerts.27

1.1.3 Human Factors

Human factors, the science of applying understanding of human capabilities and limitations to the design, development, and deployment of systems and services, has led to major safety improvements in aviation28 and nuclear engineering.29 More recently it has been applied to medicine, starting as early as the 1980s in the field of anesthesiology.1 In this field, collections of preventable incidents30 or closed insurance claims31 led to recommendations for preventing and detecting such incidents. Inadequate situational awareness has been identified as one of the primary factors in accidents attributed to human error.32 There are three levels of situational awareness: 1) perception, which includes detection of elements or identification of values; 2) comprehension, which includes the synthesis of multiple elements towards understanding the current situation; and 3) projection, which extrapolates trends forward in time (e.g., for therapy planning). 
All three levels of situational awareness must be fulfilled to prevent errors.33-35 In the ICU, human factors techniques such as qualitative observations have been used to identify problems in commonly occurring tasks; for example, interruptions to a nurse's attention during medication preparation, and tasks being forgotten because of nurses' large cognitive workload.36 Safety problems caused by shortcomings in nontechnical skills such as task management, teamwork, situation awareness, and decision making can be analyzed using root-cause analysis or observational studies.37 Clinical technologies such as graphical displays, medical device interfaces and clinical-application designs have been analyzed for their usability, and improvements have been reported, but these analyses still need to focus more on nurses as users.38

1.2 Goals and Contributions to the Literature

1.2.1 Motivation for Focusing on the Intensive Care Unit

Anesthesiology has been at the forefront of technology and patient safety, as practitioners of anesthesiology are enthusiastic about technological innovation.39 Examples of innovation in this field include the introduction of cardiac monitoring, pulse oximetry, and capnography; these have led to anesthesiology being acknowledged as a model for patient safety in medicine.40 These technological improvements and other innovations now need to be adapted for and implemented in the ICU. The following four chapters contain manuscripts focusing on reducing nurses' workload and improving medical decision making, thereby reducing the chances of medical errors.

1.2.2 Review of Physiologic Monitoring Display Evaluations

The purpose of this evaluation, which forms Chapter 2 of this dissertation, was to present the findings of past physiologic monitoring display evaluations that demonstrate reductions in medical errors and provider workload (both physical and mental) and improvements in medical decision making.
It provides an opportunity to examine past work across studies and learn which ideas worked well and which did not, and it sets the stage for the design and conduct of future evaluations in the two subsequent studies performed in this dissertation. Participants were faster at detecting an adverse event or making a diagnosis or decision in 57% of the evaluations. They showed improved accuracy in a clinical decision or diagnosis in 67% of the studies measuring this outcome, and a perceived workload decrease in 43% of the studies assessing this variable. The majority of the evaluations (61%) used anesthesiologists, practitioners in a field from which many medical innovations originate, and only 16% used nurses. This highlights the need for future clinical studies to focus on participants besides anesthesiologists.

1.2.3 Alarm Reductions Using Delays and Clinical Context

The purpose of this study, which comprises Chapter 3 of this dissertation, was to identify methods for reducing the number of false alarms by using time delays and the correlations between alarms and clinical context. This information was obtained by observing health care providers caring for patients in the medical intensive care unit (MICU). The study proposed a 19 sec alarm delay, which would have eliminated 67% of the ignored and ineffective alarms, thereby reducing the noise level in the unit and potentially reducing nurses' workload. It identified nurses as the main monitoring users, making 66% of all visits to a patient's room, a finding that should lead future research to design displays supporting nurses specifically. It also observed that nurses used equipment functions in ways not intended by the manufacturers (e.g., intentionally entering a smaller infusate volume than was available, so that the infusion pump alarm reminded them when the pump was nearly empty). These behaviors lead to unnecessary alarms.
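The effect of a fixed annunciation delay can be illustrated with a minimal sketch. This is a simplified model, not the study's actual algorithm: only the 19 s threshold comes from the study, and the alarm episodes below are hypothetical. An alarm is annunciated only if its triggering condition persists for the full delay window, so transient, self-resolving alarms are suppressed:

```python
# Minimal sketch of a fixed annunciation delay (illustrative only; not the
# dissertation's actual algorithm). Alarm episodes are (start_s, end_s)
# intervals during which an alarm condition is active; timestamps here are
# hypothetical. Only episodes lasting at least the delay are annunciated.
DELAY_S = 19  # delay proposed in the study

def annunciated(episodes, delay_s=DELAY_S):
    """Return only the episodes whose condition persists for the full delay."""
    return [(start, end) for (start, end) in episodes if end - start >= delay_s]

# Two transient artifacts (5 s and 10 s) and two sustained alarms.
episodes = [(0, 5), (30, 130), (200, 210), (300, 400)]
sounding = annunciated(episodes)            # [(30, 130), (300, 400)]
suppressed = len(episodes) - len(sounding)  # 2 of 4 episodes suppressed
```

The trade-off, of course, is that every genuine alarm now sounds 19 s later, which is why the study pairs delays with clinical context rather than relying on delays alone.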
Additionally, nurses had to integrate information from many disparate sources, with only information from the cardiac monitor being available outside the patient's room. Finally, we observed that the titration of vasoactive medications was a challenging task, requiring significant nursing resources (in terms of staff availability as well as mental workload for the nurse performing this task). Future work should combine clinical context, such as provider presence and performed tasks (e.g., suctioning triggering alarm silencing, or medication titration supported by predictions of vital sign changes), with the patient's state in the physiological monitor.

1.2.4 Titration Advisory System with Patient Specific Sensitivity Identification

Chapter 4 of this dissertation is the first example of supporting nurses in their clinical practice by reducing their workload and improving their decision making. The purpose of this study was to use simulation to test the feasibility of using small-step changes in infusion rates to automatically identify a patient's sensitivity to sodium nitroprusside (SNP), dobutamine, or dopamine as the drug is being infused, and to evaluate whether an advisory system that predicts blood pressure values 5 min in the future enhances a clinician's ability to manage SNP infusion. Findings indicate a 52-82% improvement in the accuracy of the mean arterial blood pressure (MAP) prediction when using the identification system for the three investigated medications (SNP, dopamine and dobutamine); a median time reduction of 6.1 min to reach the desired MAP; and significant reductions in mental workload and effort. Finally, the sensitivity identification led to a proposed extension of existing therapy support indicators, such as the inspired oxygen fraction and ventilator-provided minute volume supporting blood oxygen saturation, to vasoactive drugs altering heart rate, blood pressure or cardiac output.
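The patient-specific sensitivity idea can be illustrated with a deliberately simplified sketch. The assumptions here are mine, not the dissertation's: a linear, noise-free steady-state dose-response fitted by ordinary least squares, with hypothetical rate and MAP values. The actual model behind the 5-min prediction is richer than this:

```python
# Hedged sketch of patient-specific sensitivity identification (illustrative
# assumption: steady-state MAP responds linearly to the infusion rate,
#   MAP = baseline + sensitivity * rate,
# and the sensitivity is estimated by ordinary least squares from the small
# step changes in rate). Data below are hypothetical, not from the study.

def identify_sensitivity(rates, maps):
    """Least-squares fit of MAP = baseline + sensitivity * rate."""
    n = len(rates)
    mean_r = sum(rates) / n
    mean_m = sum(maps) / n
    cov = sum((r - mean_r) * (m - mean_m) for r, m in zip(rates, maps))
    var = sum((r - mean_r) ** 2 for r in rates)
    sensitivity = cov / var          # mmHg change per unit of infusion rate
    baseline = mean_m - sensitivity * mean_r
    return baseline, sensitivity

def predict_map(baseline, sensitivity, rate):
    """Predicted steady-state MAP for a candidate infusion rate."""
    return baseline + sensitivity * rate

# Hypothetical observations: SNP rates (µg/kg/min) and resulting MAP (mmHg).
rates = [0.5, 1.0, 1.5, 2.0]
maps = [118.0, 112.0, 106.0, 100.0]
baseline, sensitivity = identify_sensitivity(rates, maps)
predicted = predict_map(baseline, sensitivity, 3.0)
```

Once the sensitivity is identified, the advisory system can show the nurse the expected MAP for a candidate rate before the change is made, which is the decision-support step the study evaluated.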
1.2.5 Intensive Care Unit Far-View Display Supporting Triaging Tasks

Chapter 5 of this dissertation is the second example of supporting nurses in their clinical practice, by supporting them in triaging unfamiliar patients. The goal of the study was to test two hypotheses: a) the information provided by a far-view display allows a clinician to identify more quickly which patients need the most immediate attention, and b) the far-view display reduces the clinicians' mental workload and improves situational awareness. The novel display was designed specifically for nurses as its main users (proposed in Chapter 2) and includes infusion pumps indicating the time until they are empty (proposed in Chapter 3), as well as therapy support indicators (proposed in Chapter 4). It might find future application not only in making triage decisions for unfamiliar patients but also in communicating patients' vital signs in change-of-shift reports. A nurse-specific close-view display, integrating multiple devices, such as cardiac patient monitors, infusion pumps, ventilators and the electronic medical record, into a single easy-to-use device for nurses was designed and evaluated as a separate project performed by Sven Koch.41

1.3 References

1. Committee on Quality of Health Care in America. Errors in Health Care: A Leading Cause of Death and Injury. In: Kohn LT, Corrigan JM, Donaldson MS, editors. To Err Is Human: Building a Safer Health System. 1st ed. National Academies Press; 2000. p. 26-48.
2. Bates DW, Spell N, Cullen DJ, Burdick E, Laird N, Petersen LA, et al. The costs of adverse drug events in hospitalized patients. Adverse Drug Events Prevention Study Group. JAMA. 1997 Jan;277(4):307-311.
3. Osmon S, Harris CB, Dunagan WC, Prentice D, Fraser VJ, Kollef MH. Reporting of medical errors: an intensive care unit experience. Crit Care Med. 2004 Mar;32(3):727-733.
4. Rothschild JM, Landrigan CP, Cronin JW, Kaushal R, Lockley SW, Burdick E, et al.
The Critical Care Safety Study: The incidence and nature of adverse events and serious medical errors in intensive care. Crit Care Med. 2005 Aug;33(8):1694-1700.
5. Donchin Y, Seagull FJ. The hostile environment of the intensive care unit. Curr Opin Crit Care. 2002 Aug;8(4):316-320.
6. Flaatten H, Hevroy O. Errors in the intensive care unit (ICU). Experiences with an anonymous registration. Acta Anaesthesiol Scand. 1999 Jul;43(6):614-617.
7. Murphy MF, Kay JDS. Patient identification: problems and potential solutions. Vox Sang. 2004 Jul;87 Suppl 2:197-202.
8. Sandberg WS, Häkkinen M, Egan M, Curran PK, Fairbrother P, Choquette K, et al. Automatic detection and notification of 'wrong patient-wrong location' errors in the operating room. Surg Innov. 2005 Sep;12(3):253-260.
9. Liu D, Grundgeiger T, Sanderson PM, Jenkins SA, Leane TA. Interruptions and blood transfusion checks: lessons from the simulated operating room. Anesth Analg. 2009 Jan;108(1):219-222.
10. Davies JM. Team communication in the operating room. Acta Anaesthesiol Scand. 2005 Aug;49(7):898-901.
11. Handler JA, Feied CF, Coonan K, Vozenilek J, Gillam M, Peacock PRJ, et al. Computerized physician order entry and online decision support. Acad Emerg Med. 2004 Nov;11(11):1135-1141.
12. Timmermans S, Mauck A. The promises and pitfalls of evidence-based medicine. Health Aff (Millwood). 2005 Jan;24(1):18-28.
13. Sackett DL, Rosenberg WM, Gray JA, Haynes RB, Richardson WS. Evidence based medicine: what it is and what it isn't. BMJ. 1996 Jan;312(7023):71-72.
14. Eddy DM. Clinical Decision Making: From Theory to Practice: A Collection of Essays From the Journal of the American Medical Association. 1st ed. Sudbury, MA: Jones and Bartlett Publishers; 1996.
15. Berner ES, editor. Clinical Decision Support Systems: Theory and Practice. 2nd ed. New York, NY: Springer-Verlag; 2006.
16. Kattan MW, Beck JR. Artificial neural networks for medical classification decisions. Arch Pathol Lab Med. 1995 Aug;119(8):672-677.
17. Sharpe PK, Caleb P. Artificial neural networks within medical decision support systems. Scand J Clin Lab Invest Suppl. 1994;219:3-11.
18. Ashby D. Bayesian statistics in medicine: a 25 year review. Stat Med. 2006 Nov;25(21):3589-3631.
19. Bates JHT, Young MP. Applying fuzzy logic to medical decision making in the intensive care unit. Am J Respir Crit Care Med. 2003 Apr;167(7):948-952.
20. Dussart C, Pommier P, Siranyan V, Grelaud G, Dussart S. Optimizing clinical practice with case-based reasoning approach. J Eval Clin Pract. 2008 Oct;14(5):718-720.
21. Brokel JM, Shaw MG, Nicholson C. Expert clinical rules automate steps in delivering evidence-based care in the electronic health record. Comput Inform Nurs. 2006 Jul;24(4):196-205.
22. Heldt T, Long B, Verghese GC, Szolovits P, Mark RG. Integrating data, models, and reasoning in critical care. Conf Proc IEEE Eng Med Biol Soc. 2006;1:350-353.
23. Egan M. Clinical dashboards: impact on workflow, care quality, and patient safety. Crit Care Nurs Q. 2006 Oct;29(4):354-361.
24. Tarassenko L, Hann A, Young D. Integrated monitoring and analysis for early warning of patient deterioration. Br J Anaesth. 2006 Jul;97(1):64-68.
25. Thompson C, Cullum N, McCaughan D, Sheldon T, Raynor P. Nurses, information use, and clinical decision making-the real world potential for evidence-based decisions in nursing. Evid Based Nurs. 2004 Jul;7(3):68-72.
26. Thompson C, McCaughan D, Cullum N, Sheldon T, Raynor P. Barriers to evidence-based practice in primary care nursing-why viewing decision-making as context is helpful. J Adv Nurs. 2005 Nov;52(4):432-444.
27. Committee on Data Standards for Patient Safety. Patient Safety: Achieving a New Standard for Care. Aspden P, Corrigan JM, Wolcott J, Erickson S, editors. Washington, DC: National Academies Press; 2004.
28. Stokes A, Wickens C. Aviation Displays. In: Wiener E, Nagel D, editors. Human Factors in Aviation. San Diego, CA: Academic Press; 1988. p. 387-431.
29. Stanton NA, editor.
Human Factors in Nuclear Safety. Bristol, PA: Taylor & Francis; 1996.
30. Cooper JB, Newbower RS, Kitz RJ. An analysis of major errors and equipment failures in anesthesia management: considerations for prevention and detection. Anesthesiology. 1984 Jan;60(1):34-42.
31. Cheney FW. The American Society of Anesthesiologists Closed Claims Project: what have we learned, how has it affected practice, and how will it affect practice in the future? Anesthesiology. 1999 Aug;91(2):552-556.
32. Nullmeyer RT, Stella D, Montijo GA, Harden SW. Human Factors in Air Force Flight Mishaps: Implications for Change. In: The Interservice/Industry Training, Simulation & Education Conference. 2260. Arlington, VA; 2005. p. 1-11.
33. Drews F, Westenskow D. Human computer interaction in health care. In: Carayon P, editor. Handbook of Human Factors and Ergonomics in Health Care and Patient Safety. Lawrence Erlbaum Associates; 2006. p. 423-438.
34. Goodstein L. Discriminative display support for process operators. In: Rasmussen J, Rouse W, editors. Human Detection and Diagnosis of System Failures. Springer; 1981. p. 433-449.
35. Pew RW. The State of Situation Awareness Measurement: Heading Toward the Next Century. In: Endsley MR, Garland DJ, editors. Situation Awareness Analysis and Measurement. Mahwah, NJ: Lawrence Erlbaum Associates; 2000. p. 33-47.
36. Potter P, Wolf L, Boxerman S, Grayson D, Sledge J, Dunagan C, et al. Understanding the cognitive work of nursing in the acute care environment. J Nurs Adm. 2005 Jul;35(7-8):327-335.
37. Reader T, Flin R, Lauche K, Cuthbertson BH. Non-technical skills in the intensive care unit. Br J Anaesth. 2006 May;96(5):551-559.
38. Alexander G, Staggers N. A systematic review of the designs of clinical technology: findings and recommendations for future research. ANS Adv Nurs Sci. 2009 Jul;32(3):252-279.
39. Melo MFV, Leone BJ. Introduction of new monitors into clinical anesthesia. Anesth Analg.
2008 Sep;107(3):749-750.
40. Gaba DM. Anaesthesiology as a model for patient safety in health care. BMJ. 2000 Mar;320(7237):785-788.
41. Koch SH, Staggers N, Weir CR, Agutter J, Liu D, Westenskow DR. Integrated Information Displays for ICU Nurses: Field Observations, Display Design, and Display Evaluation. In: Proceedings of the 53rd Annual Meeting of the Human Factors and Ergonomics Society. 429. San Francisco, CA; 2010.

CHAPTER 2

EVALUATIONS OF PHYSIOLOGIC MONITORING DISPLAYS: A SYSTEMATIC REVIEW

2.1 Abstract

The purpose of this paper is to present the findings from a systematic review of evaluation studies for physiologic monitoring displays, centered on empirical assessments across all available settings and samples. The findings from this review give readers the opportunity to examine past work across studies and set the stage for the design and conduct of future evaluations.

A broad search of the literature from 1991 to June 2007 on the PubMed and PsycINFO databases was completed to locate data-based articles for physiologic monitoring device display evaluations. The results of this search plus several unpublished works yielded 23 publications and 31 studies. Participants were faster at detecting an adverse event or making a diagnosis or a clinical decision in 18 of 31 studies. They showed improved accuracy in a clinical decision or diagnosis in 13 of 19 studies, and they perceived a decreased mental workload in 3 of 8 studies. Eighteen studies used a within-subjects design (mean sample size 16.5), and 9 studies used a between-groups design (mean group size 7.6). Study settings were usability laboratories for 15 studies and patient simulation laboratories for 6 studies. Study participants were anesthesiologists or anesthesiology residents for 19 studies and nurses for 5 studies.

The advent of integrated graphical displays ushered a new era into physiological monitoring display designs. All but one study reported significant differences between traditional, numerical displays and novel displays; yet we know little about which graphical displays are optimal and why particular designs work. Future authors should use a theoretical model or framework to guide the study design, focus on other clinical study participants besides anesthesiologists, employ additional research methods, and use more realistic and complex tasks and settings to increase external validity.

With kind permission from Springer Science+Business Media: Görges M, Staggers N. Evaluations of physiological monitoring displays: a systematic review. J Clin Monit Comput. 2008;22(1):45-66. ©Springer 2007

2.2 Introduction

The use of physiological monitoring displays is an essential part of clinical care in contemporary health settings. More to the point, the design and interpretation of these displays allows clinicians to detect critical events in a time-sensitive manner, optimally leading to improved patient outcomes. Empirical evaluations of physiological display designs have been published since the early 1990s, when computer technology became advanced enough for graphical, real-time monitoring to occur. Yet, no systematic review of the field is currently available.

Two previous, less formal reviews have been published. Sanderson et al.1 discussed advantages and disadvantages of advanced display technology, comparing these display methods for anesthesiology: advanced visual displays, head-mounted displays, auditory displays and combinations thereof. As part of a literature review of 9 citations through the year 2002, Drews and Westenskow2 examined previous work on traditional and graphical displays for detection, diagnosis and treatment modalities in anesthesia. Both of these excellent reviews center on anesthesiology. However, nurses are the largest group of clinical display users in clinical settings.
This review improves upon previous work by broadening the assessments to all evaluations in all settings, including citations through mid-2007, and employing formal systematic review techniques to analyze past work. The purpose of this paper is to present the findings from a systematic review of evaluation studies for physiologic monitoring displays, centered on empirical assessments across all available settings and samples. The findings will give readers the opportunity to examine past work across studies and set the stage for the design and conduct of future evaluations.

2.3 Background

The first recording of a human electrocardiogram (ECG) in 1887 and its improvements by Einthoven led to the development of cardiac patient monitors. Computerized ECG was one of the first applications for continuous patient monitoring.3 Since then, standard cardiovascular patient monitoring has changed little. Only small enhancements, such as color displays or trending (both tabular and graphical), have been incorporated into displays available in the marketplace. A more significant but rather hidden improvement occurred with better alarm algorithms, e.g., as outlined by Imhoff and Kuhls,4 and better sensors to reduce the number of false alarms. Current physiological patient monitoring displays follow the single-sensor, single-indicator paradigm, showing one waveform and/or numeric for each sensor.5 Some sensors provide more than one indicator, such as pulse oximeters or pulmonary artery catheters. Most importantly, all available monitors still require health care providers to integrate multiple sources of pertinent information in their heads to make an appropriate clinical decision. Some novel graphical displays are available commercially; however, few have been formally evaluated.
Conversely, recent empirical evaluations for proposed integrated displays have been completed, but only two are commercially available in the marketplace currently: (a) an anesthesia drug display evaluated by Syroid et al.6 and Drews et al.7 is in the GE CareStation's Navigator Applications Suite (GE Healthcare, Waukesha, WI), and (b) a variation of George Blike's display is in Dräger's Zeus anesthesia workstation (Dräger Medical AG, Germany). The numeric, polygon and histogram displays evaluated by Gurushanthaiah et al.8 were initially in the Ohmeda Modulus CD anesthesia machine (Ohmeda, Madison, WI, now GE Healthcare). However, this anesthesia machine is no longer available, and newer versions do not include the novel display. Thus, only two integrated displays in the commercial market have had the benefit of an empirical evaluation.

2.4 Methods

A broad search of the literature from 1991 to June 2007 was undertaken to locate articles dealing with evaluations of physiologic monitoring device displays. The search began with the year 1991 because the technical capabilities for displays were not advanced enough before then to provide graphical displays. The search was performed on the PubMed and PsycINFO databases using the terms found in Appendix A. The search yielded 1,012 (999 on PubMed and 13 on PsycINFO) references. Both authors independently assessed citations for relevancy using the following criteria: (a) physiological monitoring display evaluation, (b) empirical assessment, and (c) English language. Exclusion criteria were: (a) editorials or opinion pieces, (b) descriptions of usage or adoption only, (c) design explanations with no evaluation, (d) review articles, and (e) qualitative research. The raters compared relevancy results and discussed any differences in findings. Where differences existed, the citation was included for further evaluation.
Additionally, if relevancy could not be determined from the title, the citation was included in the next step of the relevancy assessment. From these initial references, 93 articles were identified as being potentially relevant. The authors independently evaluated the abstracts and categorized them as relevant, questionably relevant, or not relevant. The raters compared the results for agreement; for any discrepancies, the raters discussed each abstract. If any question about relevancy remained, the article was rated as questionably relevant and the full article was retrieved for evaluation. At the end of this process, all articles rated as relevant or questionably relevant were retrieved for further evaluation. A total of 59 articles were retrieved, read, rated and discussed by the two raters. The articles were rated for relevancy in a dichotomous manner, yielding 18 articles. One additional article,11 published in late 2007 while this manuscript was under review, was added to the set because of its pertinence. Fugitive literature was included when it was discovered: (a) 2 posters, (b) 1 doctoral dissertation and (c) 1 paper from a journal (Cognition, Technology & Work) not listed in PubMed or PsycINFO. The final set consisted of 23 references.

2.5 Results

The 23 articles matching the relevance criteria are listed in Table 2.1. Several of the articles reported results of multiple studies; therefore, the total number of completed studies is 31. Each of the studies was evaluated using a quality assessment instrument called QUASII.29 This new instrument was developed as a tool specifically for assessing empirical studies in clinical informatics.
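The screening flow described above can be tallied as a quick consistency check. This is a minimal sketch using only the counts reported in the text; it is not part of the review's methodology:

```python
# Quick tally of the article-screening flow reported above. All counts come
# from the text; the sketch simply verifies that the stages add up to the
# final set of 23 references.
searched = 999 + 13            # PubMed + PsycINFO hits = 1,012
title_screen_kept = 93         # potentially relevant after title screening
fulltext_retrieved = 59        # retrieved, read, rated and discussed
rated_relevant = 18            # kept after dichotomous full-text rating
late_addition = 1              # article added while the manuscript was in review
fugitive = 2 + 1 + 1           # posters + dissertation + journal paper

final_set = rated_relevant + late_addition + fugitive  # = 23 references
```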
Items are organized around the four "threats to validity" model of Cook and Campbell30 and Shadish, Cook and Campbell31 and were adapted from the general meta-analytic literature and accepted texts on evaluating research quality.32-34 During item development for the instrument, clarification was achieved iteratively until an inter-rater reliability with a final overall kappa between two raters of 0.85-0.94 was obtained. The QUASII scores for the articles ranged between 78 and 123 out of a possible total of 126.

2.5.1 Study Settings

Studies were completed in laboratories in Australia, Canada, Germany, Sweden, the United Kingdom and the United States; 12 of 31 were performed at the University of Utah. The most common study settings were usability laboratories (15 studies) or a patient simulation laboratory (6 studies). Two studies were conducted in a naturalistic environment, one in a medical intensive care unit and one in a meeting room of a neonatal intensive care unit. The remaining 8 studies used static computer screens, computer simulations and, in 2 cases, paper mock-ups of designs where the setting was immaterial.

2.5.2 Study Participants

Researchers used both clinical and nonclinical participants. Nineteen studies used anesthesiologists and anesthesiology residents. Six studies had various nurse, respiratory therapist and/or physician participants. Six study samples were nonclinical: 2 each with engineering students, general public and anesthesia staff, and psychology undergraduates.

Table 2.1: Physiological monitoring display evaluations. Fields per study: sample and setting; study design and tasks; dependent variables; key findings; QUASII score and quality considerations.

Agutter et al. (2006)9
- Sample, setting: 30 nurses (15 student nurses and 15 nurses); static computer screens in a laboratory setting.
- Design, tasks: Within subjects, comparing a graphical visualization for arterial blood gas and respiratory values to a traditional numeric display; nurse expertise as a between-groups variable. Task: 22 questions about acid-base and respiratory parameters.
- Dependent variables: Time to diagnosis, accuracy, and perceived workload.
- Key findings: Faster in responding accurately; more accurate in the diagnosis and trending of acid-base questions; more accurate in the diagnosis of oxygen-related parameters; reduced perceived workload.
- QUASII score, quality considerations: 115. Iterative design with usability evaluations of each design; fixed order of events, but one group started with the visual graphic and the other with the traditional display.

Agutter et al. (2003)10
- Sample, setting: 20 anesthesiologists; human patient simulator in a simulated operating room.
- Design, tasks: Between groups, comparing cardiovascular values on a numeric and a graphical display. Task: two scenarios (anaphylaxis or AP during a total hip replacement, and myocardial infarction or MI during a radical prostatectomy); each lasted 10 min; talk-aloud protocol.
- Dependent variables: Times to detect an adverse event, to diagnosis, and to treatment; vital sign deviations; perception of workload.
- Key findings: Faster MI detection time with the graphical display but no difference for AP; no difference in time to diagnose; faster treatment time for MI using the graphical display; less BP and CVP deviation in MI using the graphical display; users rated the graphical display more useful than the control group; no differences in perceived workload.
- QUASII score, quality considerations: 88. Small sample size per cell; no assessment of group equivalency; randomized scenario order and display condition; short 10-min scenario.

Albert et al. (2007)11
- Sample, setting: 16 anesthesiologists (7 attendings, three 2nd-year and six 3rd-year residents); human patient simulator in a simulated operating room.
- Design, tasks: Between groups, comparing cardiovascular values on a numeric and a graphical display. Task: five scenarios (mild pain, myocardial ischemia/infarction or MI, left ventricular failure or LVF, hypovolemia, and acute respiratory distress syndrome or ARDS), each lasting 5-9 min; talk-aloud protocol.
- Dependent variables: Expert ranking of performance, times to diagnosis and treatment, perception of workload.
- Key findings: Improved performance with the graphical display for mild pain, MI and LVF; no difference for hypovolemia and ARDS; faster detection time for MI, LVF and high pulmonary wedge pressure with the graphical display; faster treatment time for MI with the graphical display; no effect on perceived workload.
- QUASII score, quality considerations: 104. Small sample size per cell; randomized, counterbalanced design; data from the sepsis scenario was discarded, disrupting the counterbalanced design; short, 5-9-min scenarios.

Blike et al. (2000)12
- Sample, setting: 7 anesthesiologists (5 senior residents and 2 attendings); static computer screens in a laboratory setting.
- Design, tasks: Within subjects, comparing 3 display formats (numeric, object or OD, and object minus shapes or OMS). Task: 2 diagnostic tasks in 10 randomly presented scenarios (5 with and 5 without shock) during 2 sessions (displays: numeric and OMS, then OD and OMS).
- Dependent variables: Time to detect shock and accuracy of possible etiology.
- Key findings: Faster detection time with OMS; worse accuracy in recognizing the clinical state with OD; faster etiology determination with OD; both numeric and OD had higher error rates for etiology determination than OMS.
- QUASII score, quality considerations: 86. Possible order effect due to display (OMS tested twice, and etiology time significantly faster in session 2); learning effect, as detection time averaged 1.8 in the 2nd session versus 2.2 in the 1st; random order of scenario and display.

Blike et al. (1999)13
- Sample, setting: 11 anesthesiologists (senior residents and attendings); static computer screens in a laboratory setting.
- Design, tasks: Within subjects, comparing graphical object and numeric displays. Task: 10 clinical scenarios (5 with and 5 without shock) in a fixed presentation order during separate testing sessions.
- Dependent variables: Time to decision and diagnostic accuracy of the shock/no-shock condition.
- Key findings: Faster time to recognize no-shock and determine shock etiology with the object display; improved diagnostic accuracy with the object display; lower proportion of erroneous diagnostic decisions with the object display.
- QUASII score, quality considerations: 106. Task simplicity (stated by the author); could have assessed performance equivalency for levels of physicians; random order for scenarios, fixed display order; potential learning effect, as the same scenarios were repeated.

Cole and Steward (1994)14
- Sample, setting: 8 respiratory therapists (4 supervisors) using paper sheets.
- Design, tasks: Within subjects, comparing a paper graphical metaphor to a table of respiratory values. Task: 32 trials judging the patient's respiratory state (4 different states x 4 trials x 2 displays), ordered 2 different ways.
- Dependent variables: Time to decision and accuracy.
- Key findings: Anecdotal report that learning times for the metaphor took less than 5 min; time to make a decision halved with the metaphor; similar error rates with both.
- QUASII score, quality considerations: 94. Counterbalanced blocks (4) of 8 trials; random assignment of subjects to blocks; potential learning effect (only 2 sequences versus random order); less than 10 min training time for all subjects.

Doig (2006) [15, study 2]
- Sample, setting: 30 critical care nurses; static computer screens in a laboratory setting.
- Design, tasks: Between groups, comparing a new visual graphic with the standard numeric display. Task: 25 multiple-response questions based on patient scenarios; usability questionnaire.
- Dependent variables: Time and accuracy of diagnosis or clinical decision; display usability.
- Key findings: No improvement or reduction in data interpretation accuracy; improvements in response accuracy for 2 scenarios, one for each display type; graphical display was favorably rated in terms of acceptance and usability.
- QUASII score, quality considerations: 94. Randomized order of scenarios; short (5-7 min) display training provided for both groups; group equivalency assessed.

Drews et al. (2006)7
- Sample, setting: 30 anesthesiologists with three levels of expertise; human patient simulator in a simulated operating room.
- Design, tasks: Between groups, comparing a visual display of real-time drug concentrations to a control group without the display. Task: intravenous anesthesia for simulated shoulder surgery; surgical plan altered once to increase task complexity.
- Dependent variables: Hemodynamic control of a simulated patient (deviation from baseline vital signs); patient induction, wake-up, and overall procedure times; perceived workload; satisfaction and subjective utility of the drug display.
- Key findings: Significantly less heart rate and blood pressure deviation using the drug display; 2-min faster wake-up time; shorter total procedure times; higher subjective performance with the display; no interaction effects for expertise and tasks.
- QUASII score, quality considerations: 118. Standardized training for both groups; surgeon interacting with the anesthesiologist following prescripted comments, questions and visual cues.

Effken et al. (1997)16, study 1
- Sample, setting: 18 psychology undergraduates; computer simulator in a laboratory setting.
- Design, tasks: Between groups, comparing 3 displays (traditional strip-chart or TSC, integrated balloon or IBD, and etiological potentials or EPD) showing cardiovascular values. Task: three scenarios (low heart strength, high resistance, low fluid), twice each.
- Dependent variables: Time to initiate treatment, number of drugs used, percentage of time in the target range.
- Key findings: No differences for time to treat; fewer drugs and more time in the target vital sign range with EPD; the low heart strength scenario showed the greatest time in the vital sign target range.
- QUASII score, quality considerations: 78. Psychology students not familiar with clinical tasks; small sample size; training for 20-30 min on each display; use of simulated drugs influencing only 1 parameter each.

Effken et al. (1997)16, study 2
- Sample, setting: 11 psychology undergraduates; computer simulator in a laboratory setting.
- Design, tasks: Same as study 1, using a within-subjects design.
- Dependent variables: Same as above.
- Key findings: Faster times to initiate treatment for both IBD and EPD; fewer drugs with EPD overall; fewer drugs with EPD in the low fluid scenario; the low heart strength scenario showed drugs TSC > IBD > EPD.
- QUASII score, quality considerations: 96. Use of psychology students; counterbalanced scenario presentation order, but same display order; training on all displays.

Effken et al. (1997)16, study 3
- Sample, setting: 6 experienced critical care nurses and 6 nursing students; computer simulator in a laboratory setting.
- Design, tasks: Same as study 2, adding skill level as a between-groups variable.
- Dependent variables: Same as above.
- Key findings: Faster time to initiate treatment for IBD and EPD, with no difference between skill levels; fewer drugs with EPD, but only in the low fluid and heart strength scenarios; greater time in the cardiovascular target with EPD; novices equaled experts' target time performance with IBD; no difference in low fluid for IBD and EPD; more time in target with EPD than with the two other displays.
- QUASII score, quality considerations: 109.

Görges et al. (2006)17
- Sample, setting: 12 2nd- and 3rd-year anesthesia residents using static computer screens; poster presentation.
- Design, tasks: Within subjects, comparing three different trend windows (control, simple trend and complex trend). Task: 6 scenarios (control, bronchospasm, pulmonary edema, pneumothorax, pulmonary embolism, malignant hyperthermia, control scenario).
- Dependent variables: Time to correct diagnosis and perceived workload.
- Key findings: No differences in time, with a trend toward decreased times for correct diagnosis using the simple trend and complex trend windows.
- QUASII score, quality considerations: 113. Randomized order of events and displays; small sample size; should reanalyze data using repeated-measures ANOVA versus Fisher's ANOVA.

Gurushanthaiah et al. (1995)8, study 1
- Sample, setting: 13 anesthesiology residents (1st-4th year); computer simulator in a laboratory setting.
- Design, tasks: Combined within subjects, comparing 3 displays (polygon, histogram or numeric), and between groups for high (9 trials each per display) and low (4) stimuli.
Subsequently, frequency data paired to create a within subjects variable Task: 6 anesthesia scenarios with 10 physiologic variables lasting 6 min during 2 separate sessions Study 1: Time to detect change, accuracy (which variable and the direction of the change) Study 1: No effect for time on stimulus frequency or accuracy when analyzed as a between groups variable Faster times for all other residents compared to firstyear residents Faster detection time with the histogram or polygon display Increased accuracy (changed variable and direction of change) with histogram and polygon display Correct identification responses occurred more rapidly than incorrect ones and no difference between identification and direction of change Study 1: 123 Pilot work done Training with competency levels verification to determine adequacy Small sample for between groups design Assessed for confounders (caffeine, alcohol, sleep) Change detection without interpretation of cause 25 Table 2.1 continued Source Sample, setting Study design, tasks Dependent variable(s) Key findings QUASII score and quality considerations Gurushan-thaiah et al. (1995)8 Study 2: 5 of the same subjects studied in 4 additional sessions Study 2: Same task, design, displays with additional trials. Randomized, blinded, Latin-squared within groups design with high/low frequency randomized in pairs. Study 2: Same Study 2: Faster response time and accuracy for histogram and polygon displays No performance (time or accuracy ) improvement with additional sessions (users were sufficiently practiced) Study 2: 123 Randomized, blinded, crossover, Latin-Square design Gurushan-thaiah et al. 
(1995)8 Study 3: 5 nonmedical volunteers (anesthesia staff) Study 3: Between groups (anesthesiology users and nonmedical users) Study 3: Same Study 3: No differences for time with displays for nonmedical volunteers Decreased accuracy between nonmedical and anesthesia residents with all displays Study 3: 102 26 Table 2.1 continued Source Sample, setting Study design, tasks Dependent variable(s) Key findings QUASII score and quality considerations Jungk et al. (2000)18 Study 1: 16 anesthesiol-ogists Anesthesia computer simulator in a usability laboratory Design-Study 1: Within subjects comparing a simulator monitor with the same monitor plus an ecological interface (EI) Task: Two critical incidents (blood loss and cuff leakage) during a simulated inguinal hernia repair. Eye-tracking and think-aloud protocol. Study 1: Number of successful trials (identifying critical events), time to identify events; time and frequency of eye fixation on various display regions Study 1: 43% of the surgery time spent on the EI Faster identification of cuff leakage with EI Equivalent time to identify blood loss in both 3 of 8 subjects using the EI missed the blood loss event; none did with the control Eye fixation was diverse Study 1: 111 3 subjects had experience with the EI 45min training, familiarization times 27 Table 2.1 continued Source Sample, setting Study design, tasks Dependent variable(s) Key findings QUASII score and quality considerations Jungk et al. 
(2000)18 Study 2: 8 anesthesiol-ogists Anesthesia computer simulator in a usability laboratory Study 2: Within subjects design and same tasks as Study 1 except the use of a redesigned ecological interface display (EI) Study 2: Time to identify critical events and number of successful trials Study 2:All correctly identified blood loss but 1 of 8 missed the cuff leakage event Faster identification of both events with the EI Study 2: 113 45 min training or familiarization times All subjects (same subjects as study 1) used the new design. Results compared to the previous study 28 Table 2.1 continued Source Sample, setting Study design, tasks Dependent variable(s) Key findings QUASII score and quality considerations Jungk et al. (1999) 19 20 anesthe-siologists (experts and novices) Static computer screens in a laboratory setting Design: Within subjects comparing 2 new displays (profilogram or PD and ecological display or ED) to a traditional trend display (TD) Task: Normalizing vital signs from a pathological start state by adjusting sliders. Think-aloud protocol and eye-tracking used. Ideal circulatory performance (fewer frequency of slider actions, eye tracking parameters, vital sign parameters, and time to completion) ED accuracy highest. Goal not achieved in 37% of tasks with TD, 19% with PD and 13% with ED. No effect of experience or age on analysis parameters Faster trial time, lower frequency of slider actions and eye fixations for the traditional TD. Correlation between time and entropy (strategic scan paths = system understanding) for ED and TD 83 Unclear whether displays and tasks were counterbalanced Potentially subjects still learning the task with only 2 tasks Analyzed differences between trial 1 & 2 Control task not clinically relevant 20-30 min training 29 Table 2.1 continued Source Sample, setting Study design, tasks Dependent variable(s) Key findings QUASII score and quality considerations Law et al. 
(2005) 20 40 neonatal intensive care unit volunteers (3 levels of nurses and 2 levels of physicians) Static computer screens tested in a meeting room Design: Within subjects, counter-balanced comparing text summaries to trend graphs for NICU patients Task: 8 medical scenarios each for 2 conditions. Actions selected from a standard list of 18 items. Conditions completed on days 0-31, most in 3-21 days. Scenario completion time, main expected actions, proportion of correct actions, proportion of nurse and doctor actions, total number of actions and of these the number of appropriate actions Higher accuracy with text for main actions, proportion of correct ones, nurse/doctor actions, total number of actions and proportion of chosen actions that were appropriate Higher subjective preference for the graphical display No differences in speed of responses, groups or an interaction effect 112 Scenarios may not be equivalent Subjects may remembers scenarios during short intervals No randomized order of events or presentation condition Trends contained information not available in the text presentation 30 Table 2.1 continued Source Sample, setting Study design, tasks Dependent variable(s) Key findings QUASII score and quality considerations Liu and Osvalder (2004)21 20 nursing students Static computer screens in a laboratory setting Design: Within subjects comparing a circular graphical design and numerical reference data Tasks: Six scenarios showing before and after state of a ventilator deviation. Randomized task sequences during 2 testing sessions. Objective: Change detection time, 3 types of errors (number of deviations, their meaning and the overall situation) Subjective: Deviation severity, reasons for their decision and state opinions about the circular display design. No differences in detection time. 
Fewer errors in interpreting the meaning of changes No difference in the number of detected deviations or assessing the overall situation Most preferred and found it easier to detect changes and assess the overall situation with the circular, graphical display 108 Nursing students were new to ventilator issues (construct validity issue) Used a pilot study to optimize study methods Discussed prototype with investigator with added scenarios 31 Table 2.1 continued Source Sample, setting Study design, tasks Dependent variable(s) Key findings QUASII score and quality considerations Michels et al. (1997) 22 10 anesthe-siologists Anesthesia computer simulator in a laboratory setting Design: Between groups comparing graphical to traditional numeric and waveform display of physiological variables Task: 4 critical events (blood loss, inadequate paralysis, endotracheal tube cuff leak, depletion of soda lime) Detection time and correct identification of critical anesthesia events Results dependent upon clinical event Faster detection for 2 of 4 events (inadequate paralysis and cuff leak) with graphical display Correct identification sooner for 3 of 4 events (paralysis, cuff leak and blood loss) with graphical display 94 Very small sample per cell (5) No assessment for group equivalency Same sequence of scenarios used for each participant 15 min introduction to displays Alarms silenced to rely on visual observations only 32 Table 2.1 continued Source Sample, setting Study design, tasks Dependent variable(s) Key findings QUASII score and quality considerations Ng et al. (2005)23 10 engineering students Simulated clinical setting in a usability laboratory Design: Within subjects comparing 3 alarms: vibro-tactile, auditory alarm and a combination of the two. Task: 24 randomly generated alarm events for training. 30 events during a 30 min interval based on real clinical data using 6 simulated alarm patterns in three levels of severity. 
Subjects trained to recognize 6 alarm patterns Training, identification rate (number of events detected), accuracy of alarm patterns, response time, comfort and satisfaction No difference in number of training alarms required to learn display alarms Higher identification rate with the vibro-tactile than audible or combined alarm display Higher identification rate for combined than auditory alone No difference in time to respond to an alarm Perception that vibro-tactile would attract attention more readily Preference for vibro-tactile (4) than auditory (3) or combination (3) Reduced accuracy for combined than vibro-tactile alone (for Level 1 alarm only) 90% of the subjects reported some discomfort with the vibro-tactical alarms. Subjects preferred the vibro-tactile alarm despite the discomfort 107 Use of engineering students performing clinical tasks Auditory accuracy for level 1 alarm only Pilot study used to optimize vibro-tactile display Randomized display order. Under if scenarios randomized 33 Table 2.1 continued Source Sample, setting Study design, tasks Dependent variable(s) Key findings QUASII score and quality considerations Syroid et al. (2002)6 15 anesthe-siologists (seven attendings, three 2nd-year and five 3rd-year residents) Anesthesia computer simulator in a laboratory setting Design: Within subjects, counter-balanced with and without a graphic display showing intravenous drug concentrations Tasks: 2 clinical scenarios (abscess drainage and mass removal) using the same 3 drugs Precision in drug adminis-tration, number of bolus doses, vital signs to indicate pain response, and perceived workload. 
Lower variation (tighter control) in the effect-site concentrations of anesthetics with the drug display During maintenance, more remifentanil doses given with the drug display No differences in propofol boluses No differences in vital signs (pain levels) Perceived decreased mental demand, frustration, effort and increased performance with the drug display 116 Subjects commented that the bolusing of anesthetic agents was not realistic Randomized scenario and display order Simulation required extra effort to obtain patient responses Low task complexity, short scenarios, artificial simulation 34 Table 2.1 continued Source Sample, setting Study design, tasks Dependent variable(s) Key findings QUASII score and quality considerations Wachter et al. (2006)24 19 clinical volunteers (nine anesthesia faculty, four 2nd-year residents and six 3rd-year residents) from 2 universities Patient simulator in a usability lab Design: Between groups comparing a pulmonary graphical display to traditional numeric displays Task: Five scenarios (4 adverse, obstructed endotracheal tube, endobronchial intubation, intrinsic PEEP, hypoventilation; 1 normal event). Time to correct diagnosis, time to treatment (experts viewed videotapes) and perception of workload Faster detection and treatment times for 2 of 4 events - obstructed endotracheal tube and intrinsic PEEP events using the graphical display. Unnecessary treatment given by 3 clinicians using the graphical and 5 using numerical display No difference in diagnostic accuracy Lower subjective workload for obstructed endotracheal tube and intrinsic PEEP scenarios. 89 No assessment of group equivalency. 
Did not measure critical individual differences Pilot study used to determine adequate training time Randomized order of events No data about group equivalency No discussion about unnecessary treatments 35 Table 2.1 continued Source Sample, setting Study design, tasks Dependent variable(s) Key findings QUASII score and quality considerations Wachter et al. (2005)25 32 caregivers (critical care physicians, nurses and respiratory therapists) Pulmonary metaphor graphical display used in an actual intensive care unit Design: Descriptive 11 day observational study of display use in a medical intensive care unit. Display observations per caregiver visit, perceived usefulness, acceptance, desirability and accuracy of the display Profession/number of times entering the room/number of display observations per visit Nurses/ 775/ 1.3, Respiratory therapists (RTs)/ 74/ 3 Physicians/ 34/ 6 Physicians and RTs looked at the display more often over the course of the study No difference in questionnaire response for caregiver groups Perceptions ranged from 5-6.5 (0-9 scale on usefulness, desirability, accuracy and acceptance) N/A, Descriptive Study Display provided new (etCO2) information not available to caregivers beforehand Mid-scale perception ratings interpreted as positive 36 Table 2.1 continued Source Sample, setting Study design, tasks Dependent variable(s) Key findings QUASII score and quality considerations Wachter et al. (2003)26 46 clinicians (22 anesthes-iologists, 1 nurse anes-thetists, 18 residents and 5 medical students from 3 facilities) Static computer screens in a laboratory setting Design: Descriptive for 5 design iterations for a pulmonary graphical display evaluated using paper-based tests Correct identification of pulmonary design components to anatomical parts and pulmonary variables, ability to diagnose pulmonary events. 
Improved anatomical intuitiveness by 25% (to 98%) and variable mapping intuitiveness by 34% (to 91%) for 5th design Fifth design decreased diagnostic accuracy by 4%. (to 79%). N/A, Descriptive Study Use of multiple choice tests limited choices for subjects Different compositions of iteration testing groups as well as different sample sizes Participants not given waveforms or history for displayed values available 37 Table 2.1 continued Source Sample, setting Study design, tasks Dependent variable(s) Key findings QUASII score and quality considerations Watson and Sander-son (2004)27 Study 1: 23 paid general public participants (7 men, 16 women) Laboratory setting Design-Study 1: Within subjects comparing 3 recorded respiratory sonifications for 3 conditions (respiratory rate or RR, end-tidal carbon dioxide or etCO2 and tidal volume or VT). Task: 12 anesthesia scenarios (3 for training) lasting 4.5-5min each with physiological events and mechanical changes Study 1: Assessing abnormality (high, low or normal value) and direction (increasing, decreasing or steady), confidence of judgment and perception of workload Study 1:Improved abnormality assessment with the varying sonification, especially for sonification of etCO2 and VT, which also had a slight preference in user preference. No effect for direction judgments Subjects preferred the varying tone for RR, VT and etCO2 No workload effect Study 1: 96 Use of the general public for a clinical task Large age range (19-55) Possible order effect Use of prerecorded audio files without scenario randomization 38 Table 2.1 continued Source Sample, setting Study design, tasks Dependent variable(s) Key findings QUASII score and quality considerations Watson and Sander-son (2004)27 Study 2: 11 anesthesiol-ogists and 10 information technology postgradu-ates Laboratory setting Design-Study 2: Within subjects, same objectives. 
Task: Six scenarios with fewer abnormal changes than Study 1 Study 2: same as in study 1 Study 2:Improved abnormality judgments and direction for anesthesiologists than IT postgraduates. Anesthesiologists had higher perceived workload but not significantly Study 2: 104 Arithmetic control task Use of prerecorded audio files. No scenario randomization 39 Table 2.1 continued Source Sample, setting Study design, tasks Dependent variable(s) Key findings QUASII score and quality considerations Watson and Sander-son (2004) 27 Study 3: Same participants as in study 2 Laboratory setting Design-Study 3: Same design and objectives Task: Nine scenarios lasting approximately 9 min each, using a computer simulation with sonification alone (S), visual display (V) and combined (SV). Used a distracter task of arithmetic determinations. Added an additional alarm for heart rate or HR. Arithmetic accuracy communicated as the main study goal Study 3: same as in study 1 Study 3:Improved abnormality judgment main effect with SV, then V, then S but no effect for anesthesiologists Higher abnormality judgment with HR task and least with VT Anesthesiologists performed better than IT postgrads Less directional accuracy with VT than other events Higher confidence in O2 judgments and lowest in RR Anesthesiologists preferred the combined mode although it was perceived to have the highest workload Study 3: 104 Quasi-randomized query for parameters Potential learning effects 40 Table 2.1 continued Source Sample, setting Study design, tasks Dependent variable(s) Key findings QUASII score and quality considerations Zhang et al. (2002) 28 Study 1: 12 anesthesiol-ogists (attending and residents) Human patient simulator in a simulated operating room Design-Study 1: Within subjects comparing Blike's 3-D object display to traditional numerical display Task: 6 scenarios in random order for training. 
Four 10-min events (hypovolemia, myocardial ischemia, arrhythmia, bronchospasm) Study 1: Time to recognize event, time to diagnose and situational awareness (SA) scores Study 1:No difference in event recognition time for cardiovascular events Faster detection times for bronchospasm with the 3-D object display. Interaction effect: Intermediate level SA scores greater for hypovolemia with the object display. Interaction effect: Low level SA scores greater during arrhythmia, hypovolemia and bronchospasm with traditional displays Study 1: 105 Issues with training, practice Potential order effect for displays Randomized scenario order Simulation freeze technique to allow subjects to answer questionnaires Scenarios had different difficulty levels 41 Nine of the 31 studies reported the sample's mean age, ranging from 31-42.6 years. In one paper27 the ages of the nonclinical samples vary from 19-55 and 29-62 in comparison to the clinician group's age range of 23-44 years. Six of the 31 studies report the expertise of participants in mean postgraduate years, ranging from 5-13.9 years. Ten studies did not report expertise while 13 studies include samples with 2 or more levels of expertise. Doig15 mentioned that study groups were balanced for intensive care nurses' expertise. Other participant variables were measured: 5 studies measured hr of sleep in the previous night, 5 reported participants' caffeine and medication consumption and 1 obtained additional measures such as color vision, vision quality, and dominant hand. Average sample sizes ranged from 5-46 subjects. Within subjects designs had a mean sample size of 16.5 while between group designs had an average of 7.6 participants per cell. Total sample sizes for between groups studies ranged from 5 to 30. 
2.5.3 Display Type

A variety of displays were studied: 13 hemodynamic/cardiovascular, 6 pulmonary/respiratory, 4 integrated anesthesia and 2 anesthesia drug graphical displays, 3 respiratory sonifications, and 1 each of a combined vibro-tactile and sonification display, an arterial blood gas graphic, and a physiologic trend graphic. All but Görges et al.17 reported significant improvements in accuracy and/or speed with the new designs.

2.5.4 Study Design

Eighteen studies used a within subjects design, while 9 used a between groups design. Two studies employed combined designs (both within subjects and between groups), and two other studies were descriptive (an observational study and a description of design iterations for a pulmonary metaphor). Twenty-one studies randomized (or counterbalanced) scenario order and 10 randomized display order. Gurushanthaiah et al.,8 for example, used Latin-squared randomization to guide the order of tasks.

2.5.5 Tasks

Fifteen studies devised anesthesia scenarios and 2 others used medical decision tasks. Seven studies used deviation or event detection tasks, while 2 studies used multiple choice questions about respiratory events. The 2 descriptive studies outlined the use of the display in normal clinical workflow. Nonclinical participants worked with the clinical scenarios in 6 studies; these participants included psychology students,16 nonmedical anesthesia staff,8 engineering students,23 the general public and IT postgraduates,27 and bioengineering students.28 Twenty-two authors reported giving training to participants, while 2 studies provided "instruction." Nineteen authors reported that participants were allowed to practice with the new device. The combination of practice and training with the displays lasted from 2-45 min. One author allowed more practice if participants did not meet cut scores. Seven authors either used cut scores for admitting participants into the study or had participants practice until specific performance goals were met.
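The scenario-order counterbalancing noted in Section 2.5.4 (e.g., the Latin-squared randomization used by Gurushanthaiah et al.8) can be sketched with a simple cyclic Latin square, in which every condition appears exactly once in every serial position across participants. This is an illustrative sketch, not the reviewed studies' actual procedure, and the display labels are borrowed from Effken et al.16 only as an example:

```python
def latin_square(conditions):
    """Cyclic Latin square: row i is participant i's presentation order.
    Every condition appears once per row and once per column (serial
    position), so position effects are balanced across the sample."""
    n = len(conditions)
    return [[conditions[(i + j) % n] for j in range(n)] for i in range(n)]

# e.g., three display conditions, cycled across three participant rows
for row in latin_square(["TSC", "IBD", "EPD"]):
    print(row)
```

Note that this cyclic form balances only serial position; a fully balanced Latin square, which also controls first-order carryover (which condition precedes which), requires a different construction.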
2.5.6 Dependent Variables

The most common dependent variable was time to complete a task (make a diagnosis, detect an adverse event or initiate treatment), measured in 30 of the 31 evaluation studies. Participants were faster at detecting an adverse event or making a diagnosis or decision in 18 studies.7-14, 18, 19, 22, 24, 28 Participants in 13 of 19 studies showed improved accuracy in a clinical decision or diagnosis.8, 9, 11, 13, 15, 19-23, 27 Five studies used a control task, measuring the percentage of time spent within a target range or deviations in vital signs. With graphical designs, participants6, 7, 10, 16 had fewer vital sign deviations or deviations from a target range. Three of 8 studies showed decreased perceived workload with a graphical design,6, 9, 24 and 3 studies described screen display regions of interest measured with an eye tracker. Other dependent variables, each measured in 3 studies, included satisfaction, subjective utility, situational awareness, display usefulness and whether the scenario was realistic. Overall, these studies demonstrated the positive impact of a graphical design on speeding clinicians' time to detect an event, determine a correct diagnosis, and stay within a target range of variables.

2.6 Discussion

None of the studies reported using a theoretical model or framework to guide the study or its methods, although a number of theoretical works are now available.35-39 Theoretical models or frameworks are organizing structures researchers can use to assist with study design. These conceptual structures allow researchers to consider the major variables of interest as well as potential confounding variables. For instance, frameworks with a developmental timeline37, 38 remind researchers to consider both practice and training because users and technology change over time. Likewise, individual characteristics guide researchers to measure and/or control for participant differences.
These kinds of elements might appear straightforward to readers; however, these variables were not consistently reported or considered in published studies.

2.6.1 Study Settings

The most common settings for studies were usability laboratories or laboratories simulating operating rooms (ORs). However, practicing clinicians use monitors in a number of settings besides the OR, e.g., emergency departments, telemetry units, intensive care units, and prehospital modes of transportation such as air transport and ambulances. In particular, pediatric units, neonatal units, and even battlefields are not represented in the available studies. Remote monitoring of critical care patients, e.g., as outlined by Breslow et al.,40 is a relatively new care delivery method, presenting a novel setting for future evaluations. With the exception of select intensive care units, the settings mentioned here are as yet unexplored or only simulated in usability laboratories. Drews and Westenskow2 noted that, at this point, researchers cannot be clear about how studies performed in laboratory settings correlate with participants' performance in actual clinical settings. Embedding the participant in a more realistic environment, such as a simulated clinical setting with a human patient simulator, is a good step forward; however, researchers will want to test their displays in actual clinical settings as well.

2.6.2 Study Participants

Anesthesiologists comprised 61% of the total participants in past studies. Displays have not yet been designed and evaluated for the largest group of monitor users: nurses. Their concerns and tasks are distinct from anesthesiologists', so designs are needed for nurses' particular tasks and mental models. More importantly, current commercial physiological displays do not support a walk-by, at-a-glance assessment of the patient's status, a benefit needed by nurses as they multitask during patient care.
Respiratory therapists (RTs) are another group of understudied monitor users. Display users in various settings will not be homogeneous, even within professions. For instance, nurses performing trauma care in the emergency department may require different display designs than nurses in intensive care units, where monitoring is more routine. Likewise, physicians other than anesthesiologists have not been included in evaluation studies, except in two.20, 25 Participant demographics and individual characteristics are inconsistently reported and/or controlled.2 Age was not reported in 18 studies and caffeine intake was not reported in 23 studies. Moreover, the age range of study participants, when reported at all, varied by as much as 30 years. Factors such as age and caffeine intake may be potential confounding variables in studies using response time as a dependent variable. For example, Gurushanthaiah et al. [8, study 3] reported an influence of age and caffeine consumption on participant response times for nonclinical volunteers. Age and caffeine did not influence their results for clinicians; however, the sample size of 5 was very small. Response time and age are positively correlated, so including participants in their 50s or 60s should be carefully considered in the future, and a narrower age range should be contemplated. Expertise is another important variable to track or control, especially if a between groups experimental design is used. Levels of expertise may confound the observed results, particularly when students are combined with more seasoned clinicians. Future researchers should routinely report participant demographics and pertinent variables such as caffeine intake. Last, using nonclinical participants, while convenient, raises questions about the external validity and significance of the results. That anesthesiologists outperformed IT professionals or the general public is not surprising.
2.6.3 Study Designs

The majority of studies used within subjects designs. These are particularly well suited to studies involving response time because they control for individual differences, which can vary widely across users. Studies using between groups designs received lower quality ratings, primarily due to the lack of control for individual differences and the larger sample sizes required to assure adequate power. Six of the 9 studies with a between groups design had fewer than 15 participants per cell (mean = 7.6) and did not assess group equivalence. No researcher reported conducting a power analysis. Without a power analysis, researchers should have at least 15 participants per cell in a between groups study to assure adequate power.41

2.6.4 Tasks and Scenarios

A few authors reported validity assessments for clinical scenarios, e.g., Blike et al.12 or Doig,15 using clinical experts to validate scenarios or consulting sample case studies from the medical literature. Other authors shortened scenarios for study purposes, e.g., Syroid et al.6 or Wachter et al.24 While these abbreviated scenarios are likely to increase mental workload, they artificially condense time frames,2 which may confuse study participants or cause them to eliminate potentially correct diagnoses. Future researchers can learn from these examples by including a scenario validity assessment, e.g., using external experts, and by considering more realistic scenarios. Multiple scenarios are likely to have different levels of complexity, e.g., detecting bronchospasm compared to detecting an arrhythmia28 or detecting bronchospasm compared to detecting a pulmonary embolism.17 Differences in task complexity need to be assessed and controlled carefully, as they may become additional covariates that can mask valid results. Once understood, complexity levels can either be randomized to reduce an order effect or controlled across groups to assure equivalency.
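The per-cell sample-size guidance in Section 2.6.3 can be checked with an a priori power calculation. A minimal sketch using the standard normal approximation for a two-sided, two-sample comparison of means is shown below; the effect sizes are illustrative (Cohen's conventional values), not estimates from the reviewed studies:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per cell for a two-sided,
    two-sample comparison of means (normal approximation to the t-test),
    given a standardized effect size (Cohen's d)."""
    z = NormalDist().inv_cdf  # inverse standard normal CDF
    n = 2 * ((z(1 - alpha / 2) + z(power)) / effect_size) ** 2
    return ceil(n)

# Even a 'large' display effect (d = 0.8) needs about 25 per cell at 80%
# power, and a 'medium' effect (d = 0.5) about 63 - well above the
# roughly 8 per cell typical of the reviewed between groups studies.
print(n_per_group(0.8), n_per_group(0.5))
```

Such a calculation makes explicit that detecting anything short of a large display effect with 5-15 participants per cell is unlikely, which is consistent with the quality concerns raised above.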
Of course, tasks can only be randomized if doing so does not destroy the clinical relevancy of the scenario. Otherwise, several scenarios can be presented with equivalent tasks in differing order.

Low mental workload is common across current studies. Displays were essentially isolated from other stimuli, merely showing waveforms and numeric information from the sensors familiar to clinicians. In most studies, participants could focus exclusively on the required control or diagnostic task without competing demands. Sanderson et al.1 warn that new displays reveal higher-order properties of patient states, yet their benefits in high mental workload situations are unknown. In a realistic environment, a clinician often cares for more than one patient and may need to perform several tasks at once. Attention to clinician mental workload is needed in the future.

New designs may include variables not typically measured in the clinical setting, creating a dilemma for designers.22 The choices are: (a) not to display certain elements of the design, (b) not to show the display at all, or (c) to assume values in order for the display to function, all of which might pose substantial problems for obtaining FDA approval. Albert et al.11 offered one solution: condensing the Agutter et al.10 display by removing the missing variables while preserving the overall metaphor.

Future researchers can eliminate nonclinical control tasks such as arithmetic distracter tasks.19, 27 These do not assist with the external validity of the study, and they create a different mental workload than typical clinical tasks. More relevant control tasks include participants' pagers beeping during the scenario, staff talking to the participant during the task, overheard staff cell phone conversations and other ambient noise.
Interruptions are a common occurrence in all settings, yet only a few studies7, 10, 24 integrated disruptions and distractions into their simulated or actual study settings, e.g., having an investigator distract and interrupt the participant by acting like a surgeon. Scenarios with distractions and requirements for multitasking7, 42 provide more realistic environments for participants and aid in requirements development for designers. Seven studies used cut-scores to test training adequacy before participants were admitted to the study. Cut-scores or other competency assessments can be useful for future researchers to decrease individual differences and variability across subjects. Pilot tests are particularly useful for testing study methods and training requirements, and for determining the number of practice tasks needed to ensure adequate practice. Researchers can plot performance times against tasks to observe the resulting performance curves. When the performance curve flattens, the number of tasks and amount of practice is adequate.

2.6.5 Future Display Evaluations

Thirty of 31 studies reported significant findings with the new display. This likely reflects publication bias; however, from the collected studies, one might surmise that any novel design yields significant results. The next logical step may be to compare graphical designs to each other to find out why particular designs are effective. Additionally, adding a qualitative portion to a study could identify why users find particular designs optimal. Sanderson43 cites an interview with Matt Weinger about future patient monitoring that would provide real-time, continuous information on organ functions down to the cellular level. Designers will be challenged to integrate vast numbers of values into logical displays to aid clinical decision-making under time pressure. The NASA-TLX44 is a tool used in 6 studies.
The tool measures various aspects of perceived mental workload, is easy for participants to use, and provides another dimension to users' work with displays. The development of this instrument is described in an original paper,44 and a comparison with alternative workload assessment instruments can be found in Rubio et al.45 Future researchers may wish to incorporate one of these tools into their work and also perform formal psychometric testing of the instrument to build upon its fine conceptual development. All studies to date have examined only the dyad of user and display. However, clinicians typically work as teams in clinical environments. How a monitor might be devised to address the work of teams has not been studied. Last, the opportunities for future researchers are great because many currently available displays lack empirical evaluations.

2.7 Conclusions

The advent of integrated graphical displays ushered in a new era of physiological monitoring display design. This systematic review analyzed 31 studies of these novel designs. All but one study reported significant differences between traditional numerical displays and novel displays using graphs or sound: decreasing the time to detect an event or to make a diagnosis, or increasing the accuracy of the diagnosis. Yet we know little about which graphical displays are optimal and why particular designs work. Most studies focused on anesthesia-related participants, while future work can explore nurses, respiratory therapists, and nonanesthesia physician users, as well as teams of users. The majority of current studies were conducted in laboratory settings. In the future, more realistic, complex tasks and settings would provide greater external validity. Most acute care clinical settings and concomitant tasks in emergency departments, pediatric units, ambulances, neonatal intensive care units, and even battlefields are, as yet, unexplored.
Future researchers can improve their studies by: (a) using a theoretical model or framework to guide the study, (b) reporting and controlling for individual differences of participants, (c) completing validity assessments of clinical scenarios to ensure clinical realism, (d) assuring adequate power by conducting a power analysis to estimate the number of required participants, and (e) adding a qualitative component to studies in order to better understand how designs work for clinical decision-making.

2.8 PubMed Search Terms

PubMed search terms: ("computer simulation"[MeSH] OR "data display"[MeSH] OR "monitoring, physiologic"[MeSH:noexp] OR "patient simulation"[MeSH] OR "user-computer interface"[MeSH] OR "models, biological"[MeSH:noexp] OR "computer graphics"[MeSH]) AND ("blood pressure"[MeSH] OR "heart rate"[MeSH] OR "intubation, intratracheal/instrumentation"[MeSH] OR "hemodynamic processes"[MeSH] OR "respiration"[MeSH] OR "respiration, artificial"[MeSH] OR "anesthesiology"[MeSH] OR "Anesthetics"[MeSH] OR "Critical Care"[MeSH] OR "Intensive Care Units"[MeSH]) AND (ecological[tiab] OR graphic[tiab] OR graphics[tiab] OR graphical[tiab] OR GUI[tiab] OR visual[tiab] OR simulator[tiab] OR simulation[tiab]) AND English[lang] AND ("1991/01/01"[EDAT] : "2007/06/01"[EDAT]) AND "Journal Article"[ptyp]

2.9 Acknowledgments

The authors would like to thank Dr. Dwayne Westenskow for his thoughtful comments on a previous draft of this manuscript.

2.10 References

1. Sanderson PM, Watson MO, Russell WJ. Advanced patient monitoring displays: tools for continuous informing. Anesth Analg. 2005 Jul;101(1):161-168.
2. Drews FA, Westenskow DR. The right picture is worth a thousand numbers: data displays in anesthesia. Hum Factors. 2006 Spring;48(1):59-71.
3. Jenkins JM. Computerized electrocardiography. Crit Rev Bioeng. 1981 Nov;6(4):307-350.
4. Imhoff M, Kuhls S.
Alarm algorithms in critical care monitoring. Anesth Analg. 2006 May;102(5):1525-1537.
5. Goodstein L. Discriminative display support for process operators. In: Rasmussen J, Rouse W, editors. Human Detection and Diagnosis of System Failures. Springer; 1981. p. 433-449.
6. Syroid ND, Agutter J, Drews FA, Westenskow DR, Albert RW, Bermudez JC, et al. Development and evaluation of a graphical anesthesia drug display. Anesthesiology. 2002 Mar;96(3):565-575.
7. Drews FA, Syroid N, Agutter J, Strayer DL, Westenskow DR. Drug delivery as control task: improving performance in a common anesthetic task. Hum Factors. 2006 Spring;48(1):85-94.
8. Gurushanthaiah K, Weinger MB, Englund CE. Visual display format affects the ability of anesthesiologists to detect acute physiologic changes. A laboratory study employing a clinical display simulator. Anesthesiology. 1995 Dec;83(6):1184-1193.
9. Agutter J, Albert R, Syroid N, Doig A, Johnson K, Westenskow D. Arterial blood gas visualization for critical care clinicians. In: Proceedings of the Annual Meeting of the Society for Technology in Anesthesiology. San Diego, CA; 2006.
10. Agutter J, Drews F, Syroid N, Westenskow D, Albert R, Strayer D, et al. Evaluation of graphic cardiovascular display in a high-fidelity simulator. Anesth Analg. 2003 Nov;97(5):1403-1413.
11. Albert RW, Agutter JA, Syroid ND, Johnson KB, Loeb RG, Westenskow DR. A simulation-based evaluation of a graphic cardiovascular display. Anesth Analg. 2007 Nov;105(5):1303-1311.
12. Blike GT, Surgenor SD, Whalen K, Jensen J. Specific elements of a new hemodynamics display improves the performance of anesthesiologists. J Clin Monit Comput. 2000;16(7):485-491.
13. Blike GT, Surgenor SD, Whalen K. A graphical object display improves anesthesiologists' performance on a simulated diagnostic task. J Clin Monit Comput. 1999 Jan;15(1):37-44.
14. Cole WG, Stewart JG. Human performance evaluation of a metaphor graphic display for respiratory data. Methods Inf Med.
1994 Oct;33(4):390-396.
15. Doig AK. Graphical Cardiovascular Display for Hemodynamic Monitoring [PhD thesis]. Salt Lake City, UT: University of Utah; 2006.
16. Effken JA, Kim NG, Shaw RE. Making the constraints visible: testing the ecological approach to interface design. Ergonomics. 1997 Jan;40(1):1-27.
17. Görges M, Förger K, Westenskow DR. Trend based decision support system for anesthesiologists improves diagnosis speed and accuracy. In: Proceedings of the Annual Mountain West Biomedical Engineering Conference. Snowbird, UT; 2006.
18. Jungk A, Thull B, Hoeft A, Rau G. Evaluation of two new ecological interface approaches for the anesthesia workplace. J Clin Monit Comput. 2000;16(4):243-258.
19. Jungk A, Thull B, Hoeft A, Rau G. Ergonomic evaluation of an ecological interface and a profilogram display for hemodynamic monitoring. J Clin Monit Comput. 1999 Dec;15(7-8):469-479.
20. Law AS, Freer Y, Hunter J, Logie RH, McIntosh N, Quinn J. A comparison of graphical and textual presentations of time series data to support medical decision making in the neonatal intensive care unit. J Clin Monit Comput. 2005 Jun;19(3):183-194.
21. Liu Y, Osvalder AL. Usability evaluation of a GUI prototype for a ventilator machine. J Clin Monit Comput. 2004 Dec;18(5-6):365-372.
22. Michels P, Gravenstein D, Westenskow DR. An integrated graphic data display improves detection and identification of critical events during anesthesia. J Clin Monit. 1997 Jul;13(4):249-259.
23. Ng JYC, Man JCF, Fels S, Dumont G, Ansermino JM. An evaluation of a vibro-tactile display prototype for physiological monitoring. Anesth Analg. 2005 Dec;101(6):1719-1724.
24. Wachter SB, Johnson K, Albert R, Syroid N, Drews F, Westenskow D. The evaluation of a pulmonary display to detect adverse respiratory events using high resolution human simulator. J Am Med Inform Assoc. 2006 Nov;13(6):635-642.
25. Wachter SB, Markewitz B, Rose R, Westenskow D.
Evaluation of a pulmonary graphical display in the medical intensive care unit: an observational study. J Biomed Inform. 2005 Jun;38(3):239-243.
26. Wachter SB, Agutter J, Syroid N, Drews F, Weinger MB, Westenskow D. The employment of an iterative design process to develop a pulmonary graphical display. J Am Med Inform Assoc. 2003 Jul;10(4):363-372.
27. Watson M, Sanderson P. Sonification supports eyes-free respiratory monitoring and task time-sharing. Hum Factors. 2004 Fall;46(3):497-517.
28. Zhang Y, Drews FA, Westenskow DR, Foresti S, Agutter J, Bermudez JC, et al. Effects of integrated graphical displays on situation awareness in anaesthesiology. Cognition, Technology & Work. 2002 Jun;4(2):82-90.
29. Phansalkar S, Staggers N, Weir C. Development of the QUASII (QUality Assessment of Studies in Informatics Implementations) instrument. In: VA HSR&D National Meeting. Washington, DC; 2006.
30. Cook TD, Campbell DT. Quasi-Experimentation: Design and Analysis Issues for Field Settings. Boston: Houghton Mifflin; 1979.
31. Shadish WR, Cook TD, Campbell DT. Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Boston: Houghton Mifflin; 2002.
32. Cooper H, Hedges LV, editors. The Handbook of Research Synthesis. New York: Russell Sage Foundation; 1994.
33. The Cochrane Collaboration. The Cochrane Manual; 2007 [updated 8/23/2007; cited 9/18/2007]. Available from: http://www.cochrane.org/admin/manual.htm.
34. Shadish WR, Fuller S, editors. The Social Psychology of Science. New York: Guilford Press; 1994.
35. Ammenwerth E, Iller C, Mahler C. IT-adoption and the interaction of task, technology and individuals: a fit framework and a case study. BMC Med Inform Decis Mak. 2006;6:3.
36. Carayon P, Schoofs Hundt A, Karsh BT, Gurses AP, Alvarado CJ, Smith M, et al. Work system design for patient safety: the SEIPS model. Qual Saf Health Care. 2006 Dec;15 Suppl 1:50-58.
37. Despont-Gros C, Mueller H, Lovis C.
Evaluating user interactions with clinical information systems: a model based on human-computer interaction models. J Biomed Inform. 2005 Jun;38(3):244-255.
38. Staggers N. Human-computer interaction. In: Englebardt S, Nelson R, editors. Information Technology in Health Care: An Interdisciplinary Approach. Harcourt Health Science Company; 2001. p. 321-345.
39. Daniels J, Fels S, Kushniruk A, Lim J, Ansermino JM. A framework for evaluating usability of clinical monitoring technology. J Clin Monit Comput. 2007 Oct;21(5):323-330.
40. Breslow MJ, Rosenfeld BA, Doerfler M, Burke G, Yates G, Stone DJ, et al. Effect of a multiple-site intensive care unit telemedicine program on clinical and economic outcomes: an alternative paradigm for intensivist staffing. Crit Care Med. 2004 Jan;32(1):31-38.
41. Hinkle DE, Wiersma W, Jurs SG. Applied Statistics for the Behavioral Sciences. Boston, MA: Houghton Mifflin; 2003.
42. Strayer DL, Drews FA, Crouch DJ. A comparison of the cell phone driver and the drunk driver. Hum Factors. 2006 Summer;48(2):381-391.
43. Sanderson P. The multimodal world of medical monitoring displays. Appl Ergon. 2006 Jul;37(4):501-512.
44. Hart S, Staveland L. Development of NASA-TLX (Task Load Index): results of empirical and theoretical research. In: Hancock P, Meshkati N, editors. Human Mental Workload. Amsterdam: North Holland Press; 1988. p. 139-183.
45. Rubio S, Diaz E, Martin J, Puente J. Evaluation of subjective mental workload: a comparison of SWAT, NASA-TLX, and workload profile methods. Applied Psychology: An International Review. 2004;53(1):61-86.

CHAPTER 3

IMPROVING ALARM PERFORMANCE IN THE MEDICAL INTENSIVE CARE UNIT USING DELAYS AND CLINICAL CONTEXT

3.1 Abstract

In an intensive care unit, alarms are used to call attention to a patient, to alert a change in the patient's physiology, or to warn of a failure in a medical device; however, up to 94% of the alarms are false.
Our purpose in this study was to identify means of reducing the number of false alarms. An observer recorded time-stamped information on alarms and the presence of health care team members in the patient room; each alarm response was classified as effective (action taken within 5 min), ineffective (no response to the alarm), or ignored (alarm consciously ignored or actively silenced). During the 200-hr study period, 1271 separate entries by an individual into the room being observed were recorded, 1214 alarms occurred, and 2344 tasks were performed. On average, alarms occurred 6.07 times per hr and were active for 3.28 min per hr; 23% were effective, 36% were ineffective, and 41% were ignored. The median alarm duration was 17 sec. A 14 sec delay before alarm presentation would remove 50% of the ignored and ineffective alarms, and a 19 sec delay would remove 67%. Suctioning, washing, repositioning, and oral care caused 152 ignored or ineffective ventilator alarms. Introducing a 19 sec alarm delay and automatically detecting suctioning, repositioning, oral care, and washing could reduce the number of ineffective and ignored alarms from 934 to 274. More reliable alarms could elicit more timely responses, reduce workload, reduce noise pollution, and potentially improve patient safety.

With kind permission from Wolters Kluwer Health / Lippincott, Williams & Wilkins: Görges M, Markewitz BA, Westenskow DR. Improving alarm performance in the medical intensive care unit using delays and clinical context. Anesth Analg. 2009 May;108(5):1546-52. ©2009 International Anesthesia Research Society

3.2 Introduction

Intensive care unit (ICU) alarms were designed to call attention to a patient, to alert a change in the patient's physiology, or to alert staff to a device problem. Alarms are triggered when a physiologic variable crosses a set threshold.
In their excellent literature review, Imhoff and Kuhls report alarm frequencies of 1.6 to 14.6 alarms/hr and a false alarm rate of up to 90%.1 Chambrin et al.2 reported the lowest rate of alarms at 1.6 alarms/hr; however, their study did not include infusion pumps (InfP) or alerts. Tsien and Fackler3 reported one of the highest alarm rates at 9.8 alarms/hr in a noisier environment, but limited their study to alarms from the cardiac patient monitor. The problem with simple threshold alarms is that up to 94.5% of the alarms that sound in the ICU are false, are provider-induced,4 and frequently sound unnecessarily.1, 2, 4 Default settings by the equipment manufacturers are set to avoid missing a single false negative alarm and thereby result in many false positive alarms.5 New alarm algorithms and improvements in sensors are reported to reduce the number of false alarms, but many of these suggestions have not been incorporated into current monitors nor have their improvements been evaluated in patients.1 Rheineck- Leyssius and Kalkman6 proposed a highly effective method for reducing pulse oxime-ter (Spo2) alarms by introducing a 6 sec delay thereby reducing alarm rates by 50%. One of the new and interesting approaches to reducing the number of false alarms is the use of context awareness.7, 8 Dey8 defines context-awareness as: "A system is context-aware if it uses context to provide relevant information and/or services to the user, where relevancy depends on the user's task." Chambrin et al.2 report that 42% of the transient ICU alarms are triggered by patient movement or respiratory effort. Therefore, an alarm system that knows the patient is moving or coughing could suppress many motion induced alarms. Although other investigators2-4, 9, 10 55 have classified false alarms into general categories, such as "staff manipulation" or "the patient," we propose using specific tasks performed by the health care provider and each patient's current condition and actions. 
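As an illustration only (not part of the study protocol), the proposed use of clinical context can be sketched as a lookup that suppresses alarm variables known to be disturbed by an active care task. The task names and suppression sets below are hypothetical examples, not the study's actual rules:

```python
# Hypothetical context-aware alarm filter: suppress alarm variables that
# an active bedside task is known to disturb. The task names and the
# suppression sets are illustrative assumptions.
SUPPRESSED_DURING = {
    "suctioning":    {"tidal_volume", "minute_volume", "peak_airway_pressure"},
    "repositioning": {"tidal_volume", "minute_volume"},
    "oral_care":     {"minute_volume", "respiratory_rate"},
    "washing":       {"tidal_volume"},
}

def should_annunciate(alarm_variable, active_tasks):
    """True if the alarm should sound given the tasks currently in progress."""
    return not any(alarm_variable in SUPPRESSED_DURING.get(task, set())
                   for task in active_tasks)
```

With such a rule, a tidal volume alarm raised while suctioning is in progress would be held silently, while an Spo2 alarm would still annunciate.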
Some work regarding alarms and their context has been performed. For example, Seagull and Sanderson11 investigated anesthesia alarms in the context of the surgical phase (induction, maintenance, emergence). However, there is still more to explore in the ICU setting. The purpose of this study was to observe alarms in the medical ICU (MICU) to identify methods for reducing the number of false alarms by using time delays and the correlations between alarms and clinical context.

3.3 Methods

Approval was obtained from the University of Utah Health Sciences Center's IRB, and informed consent was obtained from 22 participating health care team members. At the beginning of each day, for 24 days, the investigator randomly selected a patient room in the MICU where a tracheally intubated patient was receiving respiratory support. A different patient and room were chosen every morning, except for one patient who was observed twice. The investigator recorded health care team members' actions while they were in the patient's room and whether they came into the room in response to an alarm. Health care team members included attending physicians, fellow physicians, resident physicians, nurses, respiratory therapists, health care assistants, physical therapists, medical students, pharmacists, and other providers. Observations began at approximately 7:30 am and ended before 7 pm.

3.3.1 Setting

The 12-bed adult MICU is organized in an H shape, with individual patient rooms to the north and south, a central station in its center, and additional function rooms between the two rows of rooms. The doors to the patients' rooms were left open unless procedures were performed or privacy was required. The unit was staffed with one nurse for every two patients, one health care assistant, and one health unit coordinator. Respiratory therapists checked a patient's ventilator when paged or at least once every 4 hr.
Most patients had sepsis, respiratory failure, acute respiratory distress syndrome, multisystem organ failure, or renal failure. Approximately 25% of the patients had myocardial infarction, cardiomyopathy, or arrhythmias. A cardiac monitor with at least electrocardiography, Spo2, and noninvasive arterial blood pressure (NBP) modules was present in each patient's room (HP M1094B, Philips Medical Systems, N.A., Bothell, WA). The unit's central monitoring station was generally not staffed. Ventilators included a Siemens Servo 300/300A (Draeger Medical, Telford, PA), a Nellcor Puritan Bennett 840 (Nellcor Puritan Bennett LLC, Pleasanton, CA), or a Viasys Avea (VIASYS Healthcare, Conshohocken, PA). Alaris Medley infusion pumps were used in every room (Cardinal Health, Dublin, OH). Flexiflo Quantum feeding pumps (Abbott Laboratories, Abbott Park, IL) were used in 13 observed rooms.

3.3.2 Data Recording

Time-stamped, detailed information on alarms and the presence of health care team members was recorded manually using a COMPAQ iPAQ Pocket PC (Hewlett-Packard Company, Palo Alto, CA) and abcDB Database v.6.0 (PocketSOFT.ca, Lloydminster, SK, Canada). For health care team members, the time of entrance and exit as well as the provider category were recorded using a predefined list. When an alarm occurred, the observer recorded the device sounding the alarm, the alarm threshold settings, the alarm cause if identifiable, and the variable that produced the alarm: heart rate, Spo2, arterial blood pressure or NBP, pulmonary artery pressure, central venous pressure, temperature, peak airway pressure, minute volume (MV), tidal volume (TV), respiratory rate (RR) and apnea, InfP faults, and feeding pump (FeedP) faults. For bedside tasks, the observer selected interventions from a predefined list and added free-text comments with more detail.
The following task categories were used: device alarm silenced, drug administered/dosage changed, patient assessment, physical therapy, washing, oral care, patient monitor settings changed, ventilator settings changed, data charted, arterial blood gas drawn, blood glucose level measured, patient repositioned, airway suctioned, or other action taken.

3.3.3 Alarm Classifications

During the study, the observer classified each alarm as true, true irrelevant, or false. However, the observer was not a clinician, so all alarms were reclassified after the conclusion of the study using the following categories: effective, ineffective, or ignored. An alarm was classified as effective when an alarm-related action was performed by a qualified health care provider within 5 min of the end of the alarm. A qualified provider is one who has the authority to take alarm-related action. For example, physical therapists, phlebotomists, and health care assistants were only qualified to call for assistance, whereas nurses were qualified to administer medications, suction the patient's airway, and change patient monitor settings. Only respiratory therapists and physicians were qualified to change ventilator settings. Effective alarms were separated into two categories based on the action performed: (a) technical actions included restarting infusion pumps, changing alarm thresholds, remeasuring values, changing sensor positions, reconnecting breathing circuits, and all other equipment-related actions; and (b) patient actions included giving sedatives to an agitated patient, suctioning the airway, changing vasoactive drug infusion rates, repositioning agitated patients, and all other patient-related actions. An alarm was classified as ineffective if the alarm sounded but a qualified health care provider did not enter the room in response to the alarm or was not present during the alarm.
An alarm was classified as ignored when a qualified health care provider was present in the patient's room and no alarm-related action was taken during, or within 5 min of the end of, the alarm, or when the alarm was silenced from the nursing station and no action occurred.

3.3.4 Data Analysis

Analysis of the data was performed using MATLAB (The MathWorks, Natick, MA). The Pocket PC-generated ACCESS/EXCEL files (Microsoft Corporation, Redmond, WA) were parsed, events were categorized, and alarm start and end times were paired with the times a person entered and left the room.

3.4 Results

Twenty-two health care team members participated in the study and gave written consent: 13 nurses, 3 nursing student interns, 3 respiratory therapists, 1 health care assistant, and 2 attending physicians. Several others, including phlebotomists, technicians, and residents, who participated in the study gave verbal consent. Two hundred hr of data were collected from 22 patients over 24 days (13 males and 9 females; mean age 54.6 ± 18.5 yr, range 21 to 93 yr). One day's data were lost, and during 1 day participating health care team members did not care for a patient who met the inclusion criteria. Observations were made for an average of 9.16 hr per day (range, 6.25-10.5 hr). Two patients' lungs were ventilated using a Viasys Avea ventilator, 10 patients using a Siemens Servo 300 or 300A ventilator, and 10 patients using a Nellcor Puritan Bennett 840 ventilator. Respiratory therapists, and occasionally the attending or fellow physicians, changed the ventilator alarm thresholds; nurses changed the cardiac monitor alarm thresholds. We observed 10 changes to the patient monitor's alarm settings (5 NBP, 1 Spo2, and 4 not recorded) and 23 changes to ventilator alarm settings (8 MV, 4 peak airway pressure, 4 TV, 1 RR, 1 multiple changes, and 5 not recorded).
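The reclassification rules of Section 3.3.3 can be sketched roughly as follows. This is an illustrative Python rendering, not the authors' actual MATLAB analysis; the function name and the boolean representation of provider presence are assumptions:

```python
# Rough sketch of the alarm classification in Section 3.3.3
# (illustrative only; the study's analysis was done in MATLAB).
# All times are in seconds.
def classify_alarm(alarm_end, qualified_present_during, action_time):
    """alarm_end: time the alarm stopped;
    qualified_present_during: whether a qualified provider was in the
    room during the alarm (or entered in response to it);
    action_time: timestamp of an alarm-related action by a qualified
    provider, or None if no action was taken."""
    if action_time is not None and action_time <= alarm_end + 5 * 60:
        return "effective"    # qualified action within 5 min of alarm end
    if qualified_present_during:
        return "ignored"      # qualified provider present, no action
    return "ineffective"      # no qualified provider responded
```

In the study, effective alarms were further split into patient-related and technical actions based on the recorded task.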
During the 200 hr of observation, 1214 alarms occurred (6.07 alarms per hr): Table 3.1 shows that 5.3% were effective and patient-related, 17.7% were effective and technically related, 36.3% were ineffective, and 40.7% were ignored. Figure 3.1 shows the number of alarms generated by each variable and the length of time each alarm was active. The median alarm length was 17 sec (range, 1 sec to 17.25 min): 45.1% lasted 15 sec or less, 74.4% lasted 30 sec or less, and 89.4% lasted 60 sec or less. Of all the alarms, 34.3% ended without any health care team member being present in the patient's room; they canceled themselves when the alarming condition cleared. Many more alarms cleared when no health care team member qualified to respond to the alarm was present. Only the feeding pump and the infusion pump always required user intervention for the alarm to stop. Figure 3.2 shows the total number of alarms for each of the four alarm types. A 19 sec alarm delay would reduce the number of ignored and ineffective alarms by 67.1%, whereas a 14 sec alarm delay would reduce it by 51.3%. For the effective alarms, the median time between the end of the alarm and the timestamp for the solution was 20 sec; 77 solutions were performed before the alarm had ended.

Table 3.1: Alarm frequency, duration, and classification
| Variable | No. of alarms (#) | Alarm frequency (#/hr) | Alarm duration (sec/hr) | Effective patient (%) | Effective technical (%) | Ignored (%) | Ineffective (%) |
| Tidal volume | 247 | 1.24 | 15.9 | 7.7 | 3.6 | 39.3 | 49.4 |
| Minute volume | 197 | 0.99 | 21.0 | 9.1 | 7.1 | 55.8 | 27.9 |
| Pulse oximeter | 188 | 0.94 | 36.5 | 1.1 | 3.7 | 32.4 | 62.8 |
| Infusion pump | 147 | 0.74 | 42.7 | 0.0 | 82.9 | 17.1 | 0.0 |
| Heart rate and arrhythmias | 134 | 0.67 | 14.0 | 3.7 | 5.2 | 50.0 | 41.0 |
| Blood pressure (arterial and noninvasive) | 127 | 0.64 | 38.2 | 7.1 | 12.6 | 53.5 | 26.8 |
| Respiratory rate | 75 | 0.38 | 10.2 | 8.0 | 9.3 | 37.3 | 45.3 |
| Peak airway pressure | 37 | 0.19 | 2.9 | 13.5 | 2.7 | 43.2 | 40.5 |
| Other | 32 | 0.16 | 2.3 | 0.0 | 18.8 | 59.4 | 21.9 |
| Feeding pump | 30 | 0.15 | 13.7 | 0.0 | 90.3 | 9.7 | 0.0 |
| Overall | 1214 | 6.07 | 197.5 | 5.3 | 17.8 | 40.7 | 36.2 |

Figure 3.1: Number and duration of alarms per hr. The alarms are sorted by alarm frequency, starting with the device with the most alarms per hr. The gray shading indicates the length of time the alarm was active, where each category does not include alarms already counted in shorter-length categories. Alarms are: HR, heart rate and arrhythmias; Spo2, pulse oximeter; ABP, arterial or noninvasive blood pressure; Pmax, peak airway pressure; MV, minute volume; TV, tidal volume; RR, respiratory rate; InfP, infusion pump; FeedP, feeding pump; and Other, all alarms not fitting into these categories.

Figure 3.2: Cumulative alarm number and classification (ignored; ineffective; effective, technical; effective, patient). Alarms lasting longer than 180 sec were categorized as having lasted 181 sec. The dashed line at the 19 sec mark indicates the proposed alarm delay duration.
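The delay analysis behind Figure 3.2 can be sketched as follows, under the assumption that an alarm delayed by d seconds is never presented if its condition clears in less than d seconds. The durations in the example are made up for illustration; they are not the study data:

```python
def fraction_suppressed(durations_sec, delay_sec):
    """Fraction of alarms that would never be presented if annunciation
    were delayed by delay_sec: alarms whose condition clears before the
    delay elapses resolve silently."""
    suppressed = sum(1 for d in durations_sec if d < delay_sec)
    return suppressed / len(durations_sec)

# Made-up alarm durations in seconds (NOT the observed data):
example = [5, 8, 10, 12, 16, 18, 25, 40, 90, 181]
print(fraction_suppressed(example, 14))  # 0.4 with these made-up values
print(fraction_suppressed(example, 19))  # 0.6 with these made-up values
```

Applied to the observed duration distribution of ignored and ineffective alarms, this is the calculation that yields the reported 51.3% suppression at a 14 sec delay and 67.1% at 19 sec.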
| Reference URL | https://collections.lib.utah.edu/ark:/87278/s68346rw |



