Federated Architecture in Genomics with Dr. Pablo Prieto Barja

Genome initiatives spearheaded by a growing number of data custodians, such as Genomics England (GEL), a Lifebit customer, leverage technological advances to amass large volumes of patient data. As a result, data custodians hold veritable gold mines of genomic data that could transform precision medicine and help meet diverse healthcare needs equitably. Pharmaceutical companies recognise the growing value of this data and are partnering with biobanks to power massive sequencing projects, such as AstraZeneca’s 2 Million Genomes Project and Boehringer Ingelheim’s participation in a nationwide research collaboration in Finland to analyse 500K genomes.

There is, however, a significant barrier that deters scientists from tapping into these genomic initiatives: the sequenced data is fragmented and stored in silos, where the available data remains locked in safe environments that are accessible only to a few authorised personnel to mitigate security risks. Data consumers such as big pharma thus need to find innovative solutions that collate fragmented data successfully while keeping data privacy and security at their core.

Federation is a technology that bridges the gap between holistic data accessibility and very large, unwieldy datasets, enabling data consumers to work with exabytes of data while using it for deep clinical analysis. Talking us through what federation means, and what a federated analysis of genome data looks like, is Lifebit co-founder and CTO Dr Pablo Prieto Barja, a recognised industry leader in informatics programmes for whole genome sequencing, bioinformatics, medical informatics and high performance computing.

Missing ingredient in Data Structures

Understanding Federated Architecture in Genomics

In the early days of his research, Dr Prieto Barja recalls the excitement surrounding the advent of next-generation sequencing (NGS) techniques, which heralded the transformation of precision medicine. Precision medicine is an evidence-based healthcare approach that uses an individual’s genome data to stratify patients into treatment groups, supporting drug discovery and holding the potential to tailor each patient’s treatment to their genetic constitution.

Maximising the scope and scale of precision medicine requires many data samples to validate clinical insights. NGS drove an exponential increase in dataset volumes and gave rise to large-scale genomics projects such as the Encyclopedia of DNA Elements Project (ENCODE), on which Dr Prieto Barja worked as a researcher.

However, as data volumes exponentially increased, scientists struggled to analyse it.

“ENCODE was a huge project…but there was so much data, [that] was really unstructured, and required a lot of processing and data analysis. And people didn’t know how to properly use it,” says Dr Prieto Barja.

Bringing order to the overwhelming influx of data required collaboration and cross-functional expertise from diverse fields such as bioinformatics and data engineering. As Dr Prieto Barja worked toward structuring genome data, he was amazed at the sheer potential and the insights that could be gleaned from it. However, individual organisations were building their own software to wrangle their data, which caused increasing fragmentation.

Finding himself in the unique position of understanding the potential of harnessing genome data to power healthcare, while also confronting the challenges of managing huge datasets, Dr Prieto Barja turned his focus to ways of supporting researchers’ efforts to organise and analyse genomic data without compromising its security.

“There was a huge reproducibility crisis being raised,” Dr Prieto Barja explains. “[Genome data] created a lot of confusion on how to use the right tools for analysis. We had to ask: what are the standards and best practices that can be used to standardise data and store it?”

Genome data needed to be benchmarked and secured so that a global standard could be maintained for data normalisation, formatting and storage. Also, distributed datasets are conventionally siloed to forestall security breaches.
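
To make the idea of harmonisation concrete, the Python sketch below shows one very small piece of it: two hypothetical sites label the same chromosome differently ("chr1" versus "1"), so identical variants fail to match until the names are normalised. It assumes both sites use the same reference assembly and already-normalised alleles; real pipelines also handle left-alignment, multi-allelic splitting and assembly differences, none of which is shown. The `Variant`, `normalise_chrom` and `normalise` names are illustrative only, not part of any specific tool.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (VCF-style fields)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def normalise_chrom(name: str) -> str:
    """Drop a leading 'chr' prefix (any case) and upper-case the remainder."""
    if name.lower().startswith("chr"):
        name = name[3:]
    return name.upper()


def normalise(v: Variant) -> Variant:
    """Return a copy with a harmonised chromosome name and upper-case alleles."""
    return Variant(normalise_chrom(v.chrom), v.pos, v.ref.upper(), v.alt.upper())


# Two hypothetical sites describing the same two variants in different styles.
site_a = [Variant("chr1", 12345, "a", "g"), Variant("chrX", 5000, "C", "T")]
site_b = [Variant("1", 12345, "A", "G"), Variant("X", 5000, "C", "T")]

raw_overlap = set(site_a) & set(site_b)
shared = {normalise(v) for v in site_a} & {normalise(v) for v in site_b}
print(len(raw_overlap), len(shared))  # 0 2
```

Without normalisation the two sites appear to share nothing; after it, both variants match.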

Closing the gap between data custodians (providers) and data consumers (researchers) called for innovative platform solutions, such as federated analysis, that address data management, security, scaling and accessibility.

Federation is a disruptor in the Genomics Field

The Global Alliance for Genomics and Health (GA4GH) is a policy-framing body that sets standards and frameworks, and provides open source Application Programming Interfaces (APIs), to enable secure access to genomic data. As genome sequencing initiatives continue to multiply, GA4GH supports the genome data ‘life-cycle’, from generation to analysis, through approaches such as data federation, which diverse institutions can adopt so that data becomes more discoverable and researchers gain better access to resources around the world.
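
Beacon is one of the GA4GH standards built on this idea: a site answers whether it holds a given allele rather than handing over individual-level records. The Python sketch below is a heavily simplified, hypothetical imitation of that pattern; it does not implement the actual Beacon API, and the class names, field names and coordinate convention are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

VariantKey = Tuple[str, int, str, str]  # (chromosome, position, ref, alt)


@dataclass(frozen=True)
class AlleleQuery:
    chrom: str
    pos: int
    ref: str
    alt: str
    assembly: str  # e.g. "GRCh38"


class PresenceNode:
    """Hypothetical site that only ever answers yes/no about an allele."""

    def __init__(self, name: str, assembly: str, variants: Set[VariantKey]):
        self.name = name
        self._assembly = assembly
        self._variants = variants  # kept private; callers only see query answers

    def query(self, q: AlleleQuery) -> Optional[bool]:
        if q.assembly != self._assembly:
            return None  # this site cannot answer for a different assembly
        return (q.chrom, q.pos, q.ref, q.alt) in self._variants


def sites_reporting(nodes: List[PresenceNode], q: AlleleQuery) -> Dict[str, Optional[bool]]:
    """Send the same question to every site and collect only the answers."""
    return {node.name: node.query(q) for node in nodes}


nodes = [
    PresenceNode("site_a", "GRCh38", {("1", 12345, "A", "G")}),
    PresenceNode("site_b", "GRCh38", {("2", 500, "C", "T")}),
    PresenceNode("site_c", "GRCh37", {("1", 12345, "A", "G")}),
]
print(sites_reporting(nodes, AlleleQuery("1", 12345, "A", "G", "GRCh38")))
# {'site_a': True, 'site_b': False, 'site_c': None}
```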

While working closely with researchers in the genomics field, Dr Prieto Barja recognized the value of adopting standardised industry practices that would enable responsible data sharing and data normalisation.

“Combining technology with infrastructure, we thought of building [platforms] that conform to industry standards, allowing organisations around the world to use our solutions for solving real-world problems.”

Researchers in genomics need secure, accessible and collaborative platforms that can adapt to advancing technologies and manage large volumes of data. Most importantly, a trusted research environment (TRE) that virtually brings fragmented datasets together would enable data analysis without having to move data around. A federated architecture accomplishes this: data stays in its original location for analysis, so all security and compliance requirements continue to be met.
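
The core pattern, sending the analysis to the data rather than the data to the analysis, can be sketched in a few lines of Python. In this hypothetical example each site holds genotype calls for a single variant (coded as 0, 1 or 2 alternate alleles per person); an approved function runs on-site and returns only aggregate counts, which a coordinator combines into a pooled allele frequency. `SiteNode`, `allele_counts` and `federated_allele_frequency` are illustrative names, and a real TRE would add authentication, governance checks and network transport, none of which is modelled here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SiteNode:
    name: str
    genotypes: List[int]  # alternate-allele count per individual; stays on-site

    def run(self, analysis: Callable[[List[int]], Dict[str, int]]) -> Dict[str, int]:
        # The analysis executes where the data lives; only its return value leaves.
        return analysis(self.genotypes)


def allele_counts(genotypes: List[int]) -> Dict[str, int]:
    """Return aggregate counts only, never individual-level genotypes."""
    return {"alt_alleles": sum(genotypes), "total_alleles": 2 * len(genotypes)}


def federated_allele_frequency(nodes: List[SiteNode]) -> float:
    """Coordinator: combine per-site aggregates into one pooled frequency."""
    results = [node.run(allele_counts) for node in nodes]
    alt = sum(r["alt_alleles"] for r in results)
    total = sum(r["total_alleles"] for r in results)
    return alt / total


nodes = [SiteNode("site_a", [0, 1, 2, 0]), SiteNode("site_b", [1, 1, 0])]
print(federated_allele_frequency(nodes))  # 5 alternate alleles out of 14
```

Because allele counts simply add up across sites, the pooled result is identical to what a centralised analysis would give, yet no genotype list ever leaves its node.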

Federation technology has been deployed in diverse fields; for example, Google coined the term Federated Learning in 2016 to describe a machine learning approach that leverages data from multiple distributed datasets without centralising it.
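
A common way to train one model across distributed datasets is federated averaging (FedAvg), introduced by Google researchers alongside the federated learning concept: each site trains on its own data, and only model parameters are sent back and averaged, weighted by site size. The pure-Python sketch below applies that idea to a toy linear model y ≈ w0 + w1 * x with synthetic data; the learning rate, number of rounds and data are arbitrary assumptions chosen for illustration, not settings from any production system.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]
Weights = Tuple[float, float]


def local_update(w: Weights, data: List[Point], lr: float = 0.05, epochs: int = 20) -> Weights:
    """Full-batch gradient descent on squared error, run entirely on one site's data."""
    w0, w1 = w
    n = len(data)
    for _ in range(epochs):
        g0 = sum(2 * (w0 + w1 * x - y) for x, y in data) / n
        g1 = sum(2 * (w0 + w1 * x - y) * x for x, y in data) / n
        w0, w1 = w0 - lr * g0, w1 - lr * g1
    return w0, w1


def fed_avg(w: Weights, sites: List[List[Point]], rounds: int = 100) -> Weights:
    """Each round: every site trains locally, then updates are averaged by site size."""
    total = sum(len(d) for d in sites)
    for _ in range(rounds):
        updates = [local_update(w, d) for d in sites]
        w = (
            sum(len(d) * u[0] for d, u in zip(sites, updates)) / total,
            sum(len(d) * u[1] for d, u in zip(sites, updates)) / total,
        )
    return w


# Synthetic data for three hypothetical sites, from y = 0.5 + 2.0 * x plus small noise.
rng = random.Random(0)
sites = []
for n in (30, 50, 80):
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    sites.append([(x, 0.5 + 2.0 * x + rng.gauss(0, 0.01)) for x in xs])

w0, w1 = fed_avg((0.0, 0.0), sites)
print(round(w0, 2), round(w1, 2))
```

The coordinator only ever sees two numbers per site per round, and the averaged model ends up close to the parameters used to generate the data.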

“Federation could be understood in terms of tech companies and our mobile phones. Mobiles generate tons of data from multiple applications, such as usage patterns, which remain stored on the device. These data can be compiled and tracked in a decentralised manner while conforming to app restrictions, thus providing data for analytics. For example, a GPS system helps us navigate routes based on traffic data generated by multiple users.”

“Similarly, multiple datasets can be compiled for federated analytics to help researchers validate the quality of a genome dataset, and leverage it for learning about a disease, without having to move sensitive information from its secure location.”
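
Checking the quality of a distributed dataset can work the same way. In the hypothetical sketch below, each site reduces a quality metric (here, per-sample mean sequencing depth) to three numbers: a count, a sum and a sum of squares. Those are enough to compute the pooled mean and standard deviation exactly, without sharing per-sample values. The `min_n` threshold is an assumed disclosure rule of the kind a data custodian might set, and the depth values are made up for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Summary:
    n: int
    total: float
    total_sq: float


def summarise(values: List[float], min_n: int = 5) -> Optional[Summary]:
    """Run on-site: return sufficient statistics, or nothing if the group is too small."""
    if len(values) < min_n:
        return None  # hypothetical small-cell suppression rule
    return Summary(len(values), sum(values), sum(v * v for v in values))


def pooled_mean_sd(summaries: List[Optional[Summary]]) -> Tuple[float, float]:
    """Run centrally: combine site summaries into an overall mean and sample SD."""
    usable = [s for s in summaries if s is not None]
    n = sum(s.n for s in usable)
    total = sum(s.total for s in usable)
    total_sq = sum(s.total_sq for s in usable)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, math.sqrt(variance)


# Illustrative per-sample mean depths (made-up values) at three hypothetical sites.
site_a = [30.1, 32.4, 29.8, 31.0, 33.2]
site_b = [28.5, 30.0, 31.5, 29.0, 30.5, 32.0]
site_c = [25.0, 27.0]  # too few samples, so this site returns nothing

mean, sd = pooled_mean_sd([summarise(site_a), summarise(site_b), summarise(site_c)])
print(f"pooled mean depth {mean:.2f}, SD {sd:.2f} (n = 11)")
```

The sum-of-squares formula is fine for small, well-scaled values like these; for very large datasets a numerically stabler combination (for example, per-site means with centred sums of squares) would be preferable.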

Federated analysis is thus gaining popularity to power healthcare initiatives, such as the UK National Health Service (NHS) adopting federated learning to manage diverse clinical data, and Canadian Distributed Infrastructure for Genomics (CanDIG) employing federation to draw insights from both genomic and clinical datasets.

The UK government, in their recent Genome UK policy paper, have also outlined that they plan to set up a federated infrastructure for management of UK genomics data resources. Federated analysis in genomics is advantageous in many aspects. Data custodians have full control over their data, and can follow their own custom guidelines to deploy infrastructures that conform to their governance models. It also promotes data traceability, allowing researchers to understand the scale and scope of genome data usability.
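
The two properties described here, custodian-defined rules and traceability, can be illustrated with a small hypothetical Python sketch: each node checks incoming requests against its own policy and records every request, approved or not, in an audit log. `CustodianPolicy`, `GovernedNode` and the analysis names are invented for illustration; a real deployment would rely on proper identity management, tamper-evident logging and the custodian's own governance model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Set


@dataclass
class CustodianPolicy:
    """Rules chosen by the data custodian, not by the platform or the researcher."""
    allowed_analyses: Set[str]
    approved_researchers: Set[str]


@dataclass
class GovernedNode:
    name: str
    policy: CustodianPolicy
    audit_log: List[Dict[str, object]] = field(default_factory=list)

    def request(self, researcher: str, analysis: str) -> bool:
        """Check a request against local policy and record it either way."""
        approved = (
            researcher in self.policy.approved_researchers
            and analysis in self.policy.allowed_analyses
        )
        self.audit_log.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "researcher": researcher,
            "analysis": analysis,
            "approved": approved,
        })
        return approved


node = GovernedNode("site_a", CustodianPolicy({"allele_frequency"}, {"alice"}))
print(node.request("alice", "allele_frequency"))      # True
print(node.request("alice", "export_raw_genotypes"))  # False
print(node.request("bob", "allele_frequency"))        # False
print(len(node.audit_log))                            # 3
```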

Future of federated analysis in Genomics

By 2025, more than 60 million patients are expected to have their genomes sequenced, a gold mine for big pharmaceutical companies. Federated analysis of complex data allows seamless integration of distributed datasets, but its application is not confined to the genomics space.

“Federated analysis and federated learning, [they’re] going to be life changing, and a huge disruptor for healthcare,” says Dr Prieto Barja. “As a use case, federated machine learning in the NHS is using imaging data for diagnosis of eye diseases. The eye condition can be picked up in its early stages without ever having to go to the doctor.”

Currently, relevant clinical data is scattered throughout different health centres, hospitals, clinics and healthcare providers. Also, the data may not be standardised, thus leading to poor interfacing between different datasets.

Therefore, while federated analysis could redefine the future of healthcare and genomics, realising its full potential hinges on data standardisation. Lifebit and other platforms use key standards such as the common data model (CDM) of the Observational Medical Outcomes Partnership (OMOP), which captures data uniformly across different health institutions.

Other standards that are being adopted more widely across the healthcare industry, and which are used on Lifebit, include Health Level Seven (HL7) and its Fast Healthcare Interoperability Resources (FHIR) standard.
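
As a rough illustration of what a common data model does, the sketch below maps diagnosis records from two hypothetical sites, one using in-house codes and one using ICD-10 codes, into rows shaped like the OMOP CDM `CONDITION_OCCURRENCE` table, so both sites express the same condition with the same concept identifier. Only a handful of that table's columns are shown, and the concept IDs in `SITE_CODE_TO_CONCEPT` are placeholders rather than real OMOP vocabulary entries; an actual mapping would use the OHDSI standard vocabularies. Unmapped codes fall back to concept ID 0, the OMOP convention for "no matching concept".

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date

# Placeholder concept IDs (NOT real OMOP concepts), one per condition.
TYPE_2_DIABETES = 900001
HYPERTENSION = 900002

# Hypothetical per-site code mappings: site_a uses in-house codes, site_b ICD-10.
SITE_CODE_TO_CONCEPT = {
    "site_a": {"T2D": TYPE_2_DIABETES, "HTN": HYPERTENSION},
    "site_b": {"E11": TYPE_2_DIABETES, "I10": HYPERTENSION},
}


@dataclass(frozen=True)
class ConditionOccurrence:
    """A few columns modelled on the OMOP CDM CONDITION_OCCURRENCE table."""
    person_id: int
    condition_concept_id: int
    condition_start_date: date
    condition_source_value: str  # original local code, kept for traceability


def to_cdm(site: str, person_id: int, local_code: str, start: str) -> ConditionOccurrence:
    concept_id = SITE_CODE_TO_CONCEPT[site].get(local_code, 0)  # 0 = no matching concept
    return ConditionOccurrence(person_id, concept_id, date.fromisoformat(start), local_code)


rows = [
    to_cdm("site_a", 1, "T2D", "2021-03-04"),
    to_cdm("site_b", 2, "E11", "2020-11-20"),
    to_cdm("site_b", 3, "I10", "2022-01-15"),
    to_cdm("site_a", 4, "XYZ", "2019-07-01"),
]
print(Counter(r.condition_concept_id for r in rows))
```

Once both sites speak the same model, a single federated query (for example, counting `condition_concept_id` values) means the same thing everywhere it runs.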

Thus, federated analysis and federated learning could, in the future, allow researchers to apply their algorithms and analytics to distributed data, avoiding issues with compliance since no data needs to be moved.

“Federation is thus the way forward, not just for genomics, but for any sort of clinical healthcare data where multiple data sources are going to be in the future,” concluded Dr Prieto Barja.

If you have any questions about federated analysis and Lifebit’s patented platform to deploy it, reach out to us here.

Dr Pablo Prieto Barja has over 15 years of experience in IT, including service management experience maintaining and managing bioinformatics platforms. He was instrumental in developing novel and innovative methods, frameworks and best practices for big bioinformatics data analysis, including Nextflow, and in assessing reproducibility in HPC and its impact on large-scale bioinformatics analysis.
