What is health data standardisation and why is it important?
Hannah Gaimster, PhD
"The world's health data assets are currently not being used to their full potential"
Introduction to data standardisation
In research and healthcare, the amount of data needed to address critical questions keeps growing. New technologies have made it possible to create large health datasets, including the digitisation of medical tools, the accumulation of electronic health records (EHRs), and lower-cost genome sequencing.
These enormous collections of data have the potential to answer crucial research questions and ultimately improve lives. Recent landmark studies demonstrate the power of big data in health research: the 100,000 Genomes Project study on rare diseases, and research on nearly 60,000 people describing the host factors underlying severe COVID-19.
Health data comes from various sources, including biobanks, clinical trials, patient registries, internal studies and EHRs. This diversity brings wide variability in how data is described and stored. For example:
- There are many different formats that health data can take, including free text (like doctor's notes) and CSV or JSON files.
- Different datasets can use distinct terms to describe the same information. For example, "sex" and "gender" are alternative names for the same field in different datasets.
- Different datasets can use different medical vocabularies, such as ICD-10 and SNOMED CT.
- Data is frequently left uncleaned, so typos and other errors remain.
To solve these health data analysis problems, data must be transformed into interoperable formats. This process is known as data standardisation.
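The kinds of variability listed above (different field names for the same information, inconsistent values, and typos) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the alias tables, the field names and the two toy datasets are hypothetical, and a real pipeline would rely on curated mappings rather than hand-written dictionaries.

```python
# Hypothetical alias tables: neither the field names nor the values come
# from a real standard; they only illustrate the idea of harmonisation.
FIELD_ALIASES = {
    "sex": "sex",
    "gender": "sex",             # same field, different name
    "dob": "birth_date",
    "date_of_birth": "birth_date",
}
VALUE_ALIASES = {
    "sex": {
        "m": "male", "male": "male",
        "f": "female", "female": "female",
        "femlae": "female",      # a known typo, corrected explicitly
    },
}


def standardise_record(record):
    """Return a copy of `record` using the agreed field names and values."""
    out = {}
    for key, value in record.items():
        field = FIELD_ALIASES.get(key.strip().lower())
        if field is None:
            continue  # this sketch simply drops fields it does not know
        if isinstance(value, str):
            value = value.strip()
            # Look the value up case-insensitively; keep it unchanged if
            # no alias is defined for this field or value.
            value = VALUE_ALIASES.get(field, {}).get(value.lower(), value)
        out[field] = value
    return out


# Two toy datasets that describe the same information differently.
dataset_a = [{"sex": "F", "dob": "1980-02-01"}]
dataset_b = [{"Gender": " femlae ", "date_of_birth": "1975-11-30"}]

combined = [standardise_record(r) for r in dataset_a + dataset_b]
print(combined)
```

After this step both records share the same field names and value conventions, so they can be analysed together as one table.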
What is health data standardisation?
"Health data standardisation is the process of bringing data into an agreed-upon common format that allows for collaborative analysis. It is critical as it enables combined research and analysis of the data."
When data is standardised in this way, it can be effectively combined, making it more valuable than simply the sum of its parts. Larger combined datasets give research more statistical power and can lead to more findings.
Common Data Models (CDMs) are being increasingly utilised in the healthcare sector to overcome the lack of consistency in health data.
Collaborative health research on data across nations, sources, and systems is made possible by the standard approach to health data provided by CDMs. Combining and assessing information is much simpler when all health data are organised following a single worldwide standard.
Examples of clinical CDMs are the Observational Medical Outcomes Partnership (OMOP) CDM and the standards from the Clinical Data Interchange Standards Consortium (CDISC).
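To make the idea of a common data model more concrete, the following Python sketch maps condition codes from two sources, one coded in ICD-10 and one in SNOMED CT, onto a single shared concept ID so that they can be counted together. It is a simplified illustration under stated assumptions, not the actual OMOP or CDISC schema: the concept IDs (1001, 1002) are placeholders, the row layout is hypothetical, and the source codes are included for illustration only and should be checked against the vocabularies themselves.

```python
from collections import Counter

# Hypothetical lookup table: (vocabulary, source code) -> shared concept ID.
# The IDs are placeholders, NOT real OMOP concept IDs.
CONCEPT_MAP = {
    ("ICD10", "E11"): 1001,          # type 2 diabetes, ICD-10 style code
    ("SNOMED", "44054006"): 1001,    # same condition, SNOMED CT style code
    ("ICD10", "J45"): 1002,          # asthma, ICD-10 style code
    ("SNOMED", "195967001"): 1002,   # same condition, SNOMED CT style code
}


def to_common_model(person_id, vocabulary, code):
    """Convert one source row into a row of the (hypothetical) common model."""
    return {
        "person_id": person_id,
        # None marks a code that has no mapping yet and needs review.
        "condition_concept_id": CONCEPT_MAP.get((vocabulary, code)),
        "source_vocabulary": vocabulary,
        "source_code": code,
    }


# Two toy sources that use different vocabularies.
hospital_a = [(1, "ICD10", "E11"), (2, "ICD10", "J45")]
registry_b = [(3, "SNOMED", "44054006"), (4, "SNOMED", "999")]

rows = [to_common_model(*r) for r in hospital_a + registry_b]

# Once mapped, the same analysis runs across both sources.
counts = Counter(
    r["condition_concept_id"]
    for r in rows
    if r["condition_concept_id"] is not None
)
unmapped = [r for r in rows if r["condition_concept_id"] is None]
print(counts, len(unmapped))
```

Keeping the original vocabulary and code next to the mapped concept is a deliberate choice in this sketch: unmapped or wrongly mapped rows can then be traced back to their source and corrected.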
What is OMOP?
OMOP is an open community data standard created to standardise observational data formats and content and to facilitate quick analyses. The standardised vocabulary maintained by the Observational Health Data Sciences and Informatics (OHDSI) community is a key part of the OMOP CDM. The OHDSI vocabularies enable standard analytics and allow medical terms to be organised and standardised across the various clinical domains of the OMOP CDM.
What is CDISC?
CDISC creates data standards for the gathering, analysing, and sharing of clinical data in conjunction with a wide spectrum of international professionals. Researchers, pharmaceutical and biotech firms, governmental organisations (such as the FDA, PMDA, and NMPA), and technology suppliers all utilise CDISC standards. The standards help to make data more easily accessible, interoperable, and reusable so that clinical research and global health can be improved.
Featured resource:
Read Lifebit's whitepaper on Lifebit's approach to data standardisation.
Why is health data transformation important?
The World Economic Forum estimates that 97% of hospital data goes unused.
Since the majority of users of health data (64%) lack the knowledge necessary to standardise data quickly, researchers spend too much time preparing the data for analysis.
80% of data scientists' time is spent cleaning and organising data.
It is clear that limited health data standardisation stalls research progress.
Without standardisation, health data cannot be combined for research and analysis, preventing collaboration across datasets. Health data must be standardised so researchers can collaborate quickly and effectively across global health data resources. In turn, increased collaboration can lead to new insights and discoveries in the healthcare and research sector.
Featured resource:
Read our article on the potential and challenges of health data linkage.
Large cohorts of health data are often required to gain novel insights and better understand the basis and treatments of disease. Researchers must be able to effectively combine health datasets to improve the statistical power of their work. The ability to harness more data, from more sources, in less time can enable faster insights to be gained from this valuable health data.
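The link between sample size and statistical power can be sketched with a short calculation. The Python example below is an illustration under simplifying assumptions, not a description of any study mentioned here: it uses a textbook two-sided z-test comparing two equally sized groups with known variance, and the effect size and group sizes are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Power of a two-sided, two-sample z-test with known variance.

    `effect_size` is the standardised mean difference between the groups
    and `n_per_group` is the number of participants in each group.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # The expected test statistic grows with the square root of n.
    shift = effect_size * sqrt(n_per_group / 2)
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Hypothetical small effect, evaluated at growing group sizes.
for n in (500, 5_000, 50_000):
    print(n, round(approx_power(0.1, n), 3))
```

With a small effect, a single modest dataset is likely to miss it, while the same effect becomes much easier to detect once several datasets are pooled into a larger sample.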
"One genome-wide association study showed that increasing sample size by 10-fold led to a 100-fold increase in findings, enabling genetic variants of interest to be more easily validated and studied"
Summary
Health data comes from various sources and exists in a mixture of formats. Combining this data to gain novel insights is only possible if the data is made interoperable. Standardising health datasets is crucial to maximise insights and discoveries.
Look out for the next blog in our series, where we will describe further, specific benefits that standardisation of health data can bring to researchers and clinicians.
Author: Hannah Gaimster, PhD
Contributors: Hadley E. Sheppard, PhD and Amanda White
About Lifebit
Lifebit provides health data standardisation services for clients, including Genomics England, Boehringer Ingelheim, Flatiron Health and more, to help researchers transform data into discoveries.
Lifebit’s services are making health data usable quickly.