Novel applications of natural language processing and machine learning to extract information from clinical text and automate cancer stage collection in a central cancer registry

Publication Type dissertation
School or College School of Medicine
Department Biomedical Informatics
Author AAlAbdulsalam, Abdulrahman Khalifa
Title Novel applications of natural language processing and machine learning to extract information from clinical text and automate cancer stage collection in a central cancer registry
Date 2018
Description The primary objective of cancer registries is to capture clinical care data on cancer populations in order to aid prevention, allow early detection, determine prognosis, and assess the quality of various treatments and interventions. Furthermore, the role of cancer registries is paramount in supporting cancer epidemiological studies and medical research. Existing cancer registries depend mostly on humans, known as Cancer Tumor Registrars (CTRs), to conduct manual abstraction of the electronic health records to find reportable cancer cases and extract other data elements required for regulatory reporting. This is often a time-consuming and laborious task, prone to human error that affects the quality, completeness, and timeliness of cancer registries. Central state cancer registries take responsibility for consolidating data received from multiple sources for each cancer case and assigning the most accurate information. The Utah Cancer Registry (UCR) at the University of Utah, for instance, leads and oversees more than 70 cancer treatment facilities in the state of Utah to collect data for each diagnosed cancer case and consolidate multiple sources of information. Although software tools helping with the manual abstraction process exist, they mainly focus on cancer case findings based on pathology reports and do not support automatic extraction of other data elements such as TNM cancer stage information, an important prognostic factor required before initiating clinical treatment. In this study, I present novel applications of natural language processing (NLP) and machine learning (ML) to automatically extract clinical and pathological TNM stage information from unconsolidated clinical records of cancer patients available at the central Utah Cancer Registry. To further support CTRs in their manual efforts, I demonstrate a new approach based on machine learning to consolidate TNM stages from multiple records at the patient level.
Type Text
Publisher University of Utah
Subject Information technology; Computer science; Medicine
Dissertation Name Doctor of Philosophy
Language eng
Rights Management (c) Abdulrahman Khalifa AAlAbdulsalam
Format Medium application/pdf
ARK ark:/87278/s6w144v9
Setname ir_etd
ID 1496338
Reference URL https://collections.lib.utah.edu/ark:/87278/s6w144v9