Improved patient identification and feature extraction through free text query and processing for clinical research

Publication Type	dissertation
School or College	School of Medicine
Department	Biomedical Informatics
Author	Redd, Douglas Fletcher
Title	Improved patient identification and feature extraction through free text query and processing for clinical research
Date	2016-05
Description	Electronic Health Records (EHRs) provide a wealth of information for secondary uses. We develop methods that improve the usefulness of free text query and text processing, and we demonstrate the advantages of these methods for clinical research, specifically cohort identification and enhancement. Cohort identification is a critical early step in clinical research. Problems may arise when too few patients are identified or when the cohort is a nonrepresentative sample. Methods of improving query formation through query expansion are described. We also investigate adding free text search to structured data search to determine the incremental improvement that unstructured text search provides over structured data search alone. Query expansion using topic- and synonym-based expansion improved information retrieval performance; an ensemble method was not successful. Adding free text search to structured data search increased cohort size in all cases, dramatically in some. We also show that patients from subpopulations that might otherwise have been underrepresented are captured. We demonstrate clinical impact by showing that a serious clinical condition, scleroderma renal crisis, can be predicted by adding free text search. For cohort enrichment, we develop and evaluate a novel information extraction algorithm, Regular Expression Discovery for Extraction (REDEx). The REDEx algorithm is shown to accurately extract information from free text clinical narratives, including temporal expressions and bodyweight-related measures. Using these extracted values, we identify additional patients and additional measurement occurrences that were not identifiable through structured data alone. The REDEx algorithm transfers the burden of machine learning training from annotators to domain experts.
We developed automated query expansion methods that greatly improve the performance of keyword-based information retrieval. We also developed NLP methods for unstructured data and demonstrate that cohort size can be greatly increased, a much more complete representation of the patient population can be obtained, and important clinical conditions that are often missed otherwise can be detected. Finally, we developed a novel machine learning algorithm for information extraction, REDEx, that efficiently extracts clinical values from unstructured clinical text, adding information and observations beyond what is available in structured data alone.
Type	Text
Publisher	University of Utah
Subject MESH	Electronic Health Records; Information Systems; Information Storage and Retrieval; Algorithms; Machine Learning; Cohort Studies; Medical Informatics Computing; Natural Language Processing; Quality Improvement; Data Mining; Patient Identification Systems
Dissertation Institution	University of Utah
Dissertation Name	Doctor of Philosophy
Language	eng
Relation is Version of	Digital version of Improved Patient Identification and Feature Extraction Through Free Text Query and Processing for Clinical Research
Rights Management	Copyright © Douglas Fletcher Redd 2016
Format Medium	application/pdf
Format Extent	4,309,914 bytes
Source	Original in Marriott Library Special Collections
ARK	ark:/87278/s6m37b72
Setname	ir_etd
ID	1426439
Reference URL	https://collections.lib.utah.edu/ark:/87278/s6m37b72