Improved patient identification and feature extraction through free text query and processing for clinical research

Publication Type dissertation
School or College School of Medicine
Department Biomedical Informatics
Author Redd, Douglas Fletcher
Title Improved patient identification and feature extraction through free text query and processing for clinical research
Date 2016-05
Description Electronic Health Records (EHRs) provide a wealth of information for secondary uses. Methods are developed to improve the usefulness of free-text query and text processing, and the advantages of using these methods for clinical research, specifically cohort identification and enhancement, are demonstrated. Cohort identification is a critical early step in clinical research. Problems may arise when too few patients are identified or when the cohort consists of a nonrepresentative sample. Methods of improving query formation through query expansion are described. Inclusion of free-text search in addition to structured data search is investigated to determine the incremental improvement of adding unstructured text search over structured data search alone. Query expansion using topic- and synonym-based expansion improved information retrieval performance. An ensemble method was not successful. Compared with structured data search alone, the addition of free-text search increased cohort size in all cases, with dramatic increases in some. Representation of patients in subpopulations that might otherwise have been underrepresented is also shown. We demonstrate clinical impact by showing that a serious clinical condition, scleroderma renal crisis, can be predicted by adding free-text search. A novel information extraction algorithm, Regular Expression Discovery for Extraction (REDEx), is developed and evaluated for cohort enrichment. The REDEx algorithm is demonstrated to accurately extract information from free-text clinical narratives. Temporal expressions as well as body weight-related measures are extracted. Using these extracted values, additional patients and additional measurement occurrences are identified that were not identifiable through structured data alone. The REDEx algorithm transfers the burden of machine learning training from annotators to domain experts.
We developed automated query expansion methods that greatly improve the performance of keyword-based information retrieval. We also developed NLP methods for unstructured data and demonstrated that cohort size can be greatly increased, a more complete population can be identified, and important clinical conditions that are often missed otherwise can be detected. We found that a much more complete representation of patients can be obtained. We also developed a novel machine learning algorithm for information extraction, REDEx, that efficiently extracts clinical values from unstructured clinical text, adding information and observations beyond what is available in structured data alone.
Type Text
Publisher University of Utah
Subject MeSH Electronic Health Records; Information Systems; Information Storage and Retrieval; Algorithms; Machine Learning; Cohort Studies; Medical Informatics Computing; Natural Language Processing; Quality Improvement; Data Mining; Patient Identification Systems
Dissertation Institution University of Utah
Dissertation Name Doctor of Philosophy
Language eng
Relation is Version of Digital version of Improved Patient Identification and Feature Extraction Through Free Text Query and Processing for Clinical Research
Rights Management Copyright © Douglas Fletcher Redd 2016
Format Medium application/pdf
Format Extent 4,309,914 bytes
Source Original in Marriott Library Special Collections
ARK ark:/87278/s6m37b72
Setname ir_etd
ID 1426439
Reference URL https://collections.lib.utah.edu/ark:/87278/s6m37b72