Description
Consistent assessment of the quality of health data is a growing concern in translational research. Researchers use different approaches to understand the quality of a given dataset, interchangeably called Data Quality Assessment (DQA), data profiling, or data characterization. Multiple conceptual Data Quality Frameworks (DQFs) have been proposed in the literature, each consisting of different Data Quality Concepts (DQCs). There is a lack of consensus among these DQFs, and the existing DQCs have vague semantics. DQA is mostly performed on an ad hoc basis, and systematic processes and shared methods of Data Quality (DQ) representation are needed to enable the FAIR (Findable, Accessible, Interoperable, and Reusable) principles in translational research. This dissertation addresses three important challenges in DQA for translational research. The first challenge is understanding the primitives associated with DQA and representing DQCs in a computable store (knowledge repository). The second challenge is designing and developing shared (standard) representations of DQCs and their associated visualization methods. The final challenge is designing and developing an architecture for DQA across heterogeneous and disparate data sources that leverages the knowledge repository and visualization meta-schema developed in the previous steps. The findings from this dissertation are a first step toward architecting a generic, platform-agnostic, and scalable software ecosystem approach to DQA in health data. The product deliverables were disseminated as open-source artifacts, supporting reproducible processes for the health data quality community.