In research and healthcare, the size of the datasets needed to solve crucial problems continues to grow. The digitisation of healthcare tools, the accumulation of electronic healthcare records and massively reduced costs for high-throughput technologies such as genome sequencing all contribute to these large datasets.
However, secure storage and analysis of these large, sensitive datasets is becoming significantly harder. There are three key reasons for this:
Data federation is solving the problem of data access, without compromising data security
Researchers and clinicians are missing out on the potential of these huge health datasets because the data is difficult to access and combine for analysis without risking security. Research progress and patient benefits are stalling because of inefficient models for secure health data access.
In its simplest terms, data federation is a software process that enables numerous databases to work together as one. This technology is highly relevant for accessing sensitive biomedical and health data: the data remains within its appropriate jurisdictional boundaries, while metadata is centralised and searchable, and researchers are virtually linked to where the data resides for analysis.
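To make the idea more concrete, here is a minimal sketch in Python of how a federated query might work. It is an illustration under stated assumptions, not a description of any particular platform: the `Site` class, the `build_catalogue`, `find_sites` and `federated_mean` functions, and the sample records are all hypothetical. Each site shares only metadata with a central catalogue, runs the analysis locally and returns an aggregate, so row-level records stay where they are.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Site:
    """Hypothetical data custodian; row-level records stay inside this object."""
    name: str
    jurisdiction: str
    records: List[Dict[str, float]] = field(default_factory=list, repr=False)

    def metadata(self) -> Dict[str, object]:
        # Only descriptive metadata is shared with the central catalogue.
        fields = sorted({key for record in self.records for key in record})
        return {
            "site": self.name,
            "jurisdiction": self.jurisdiction,
            "fields": fields,
            "n_records": len(self.records),
        }

    def run_local(self, field_name: str) -> Tuple[int, float]:
        # The analysis runs where the data resides; only (count, sum) leaves.
        values = [r[field_name] for r in self.records if field_name in r]
        return len(values), sum(values)


def build_catalogue(sites: List[Site]) -> List[Dict[str, object]]:
    """Centralised, searchable metadata (no row-level data)."""
    return [site.metadata() for site in sites]


def find_sites(catalogue: List[Dict[str, object]], field_name: str) -> List[str]:
    """Search the catalogue for sites that hold a given field."""
    return [entry["site"] for entry in catalogue if field_name in entry["fields"]]


def federated_mean(sites: List[Site], field_name: str) -> float:
    """Combine per-site aggregates into a global mean."""
    eligible = set(find_sites(build_catalogue(sites), field_name))
    total_n, total_sum = 0, 0.0
    for site in sites:
        if site.name in eligible:
            n, s = site.run_local(field_name)
            total_n += n
            total_sum += s
    if total_n == 0:
        raise ValueError(f"No site holds any values for {field_name!r}")
    return total_sum / total_n


# Hypothetical sample data, invented purely for illustration.
sites = [
    Site("site_a", "UK", [{"age": 40.0}, {"age": 60.0}]),
    Site("site_b", "DK", [{"age": 50.0, "bmi": 24.0}]),
    Site("site_c", "DE", [{"bmi": 27.0}]),
]

if __name__ == "__main__":
    print(find_sites(build_catalogue(sites), "age"))
    print(federated_mean(sites, "age"))
```

In this sketch the researcher only ever sees the catalogue entries and the combined result. A real deployment would also need authentication, access agreements and controls on what aggregates may be released, none of which are modelled here.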
This is an alternative to a model in which data is moved or duplicated and then housed centrally. When data is moved it becomes vulnerable to interception, and moving large datasets is often very costly for researchers.
With federation, data is never moved or copied, so security is maximised while the data is queried and analysed. Federated data analysis has other important advantages, which are summarised in the table below.
The expanded security and decreased costs that data federation brings help to democratise access to valuable health and biomedical information, ultimately empowering researchers worldwide to share, access and collaborate on data safely.
In the case of genomics, the majority of research undertaken to date has focused on populations of European heritage. This lack of diversity in genomics research is a serious problem because it can result in misdiagnosis, inadequate understanding of conditions and inconsistent care delivery. As a result, not everyone benefits equally from genetic medicine. To boost confidence and encourage participation in research among underrepresented communities, a focused global engagement effort is needed, alongside enhanced transparency and work to build public trust.
Public and patient trust remains a key factor in participant recruitment, particularly for historically marginalised populations. In a federated data access model, the public’s data remains in the secure control of the data custodian, which could help engender greater trust. However, it is crucial that data access agreements are negotiated in a manner that is acceptable to research participants, particularly those from historically underrepresented, marginalised or vulnerable groups.
It is also possible that federated platforms, with their associated benefit of lower cost, could help make big data analytics more accessible to low- and middle-income countries. This, in turn, could help improve the diversity of the cohorts that can be built and accessed via federated networks.
Summary
In summary, data federation can bring wide-ranging benefits to researchers. It can provide secure access to global cohorts of data to help power their analyses, answer important research questions and drive scientific discovery. Federated data analysis also offers value for money, because costly data transfers are avoided. Ultimately, data federation can help democratise data access and promote global collaboration to help ensure equitable benefit sharing.
Look out for the next blog in our series, where we will take a detailed look at the key technical requirements that organisations must meet to enable data federation.
Author: Hannah Gaimster, PhD
Contributors: Hadley E. Sheppard, PhD and Amanda White
About Lifebit
At Lifebit, we develop secure federated data analysis solutions for clients including Genomics England, NIHR Cambridge Biomedical Research Centre, Danish National Genome Centre and Boehringer Ingelheim to help researchers turn data into discoveries.