I have had the opportunity to get to know a great many people in the life sciences. Whether they work in pharmaceutical, biotechnology, or direct-to-consumer (DTC) companies, they all share the common goal of advancing medicine and improving life. These brilliant researchers and scientists also share a common challenge: they are drowning in big data, the very data that holds the key to new discoveries and the promise of revolutionising health care. Addressing the big data challenges in genomics is critical to unlocking new avenues in research and treatment.
Genomics has progressed faster than anyone could have foreseen. Since 2003, when the first whole human genome was sequenced (at a cost of US$2.7 billion), we have managed to reduce sequencing time from years to mere hours and the cost to do so to about US$550 (with the anticipated ‘$100 genome’ set to push the boundaries even further). Not surprisingly, this has had the effect of amassing torrents of genetic data worldwide with no abatement in sight.
As the field progresses, making the most of genomic discoveries will depend on understanding these big data challenges and on adopting strategies and technologies that make data integration and analysis practical.
Understanding Big Data Challenges in Genomics
Overcoming the big data challenges in genomics will play a vital role in the future of personalised medicine.
Estimates predict that by 2025 over 60 million patients will have had their genomes sequenced in a healthcare setting. Another study estimates that up to 2 billion genomes in total will be sequenced by 2025, translating to approximately 40 exabytes of data.
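As a rough sanity check on those figures, the implied average footprint per genome can be worked out directly. The sketch below simply divides the two estimates quoted above; it assumes decimal units (1 exabyte = 10^18 bytes), and the function name is illustrative only.

```python
EXABYTE = 10**18   # bytes, decimal (SI) units assumed
GIGABYTE = 10**9   # bytes


def bytes_per_genome(total_bytes: int, genomes: int) -> float:
    """Average storage footprint implied by a total volume and a genome count."""
    return total_bytes / genomes


# Figures quoted in the text: up to 2 billion genomes, roughly 40 exabytes.
per_genome = bytes_per_genome(40 * EXABYTE, 2_000_000_000)
print(f"{per_genome / GIGABYTE:.0f} GB per genome")  # prints "20 GB per genome"
```

In other words, the two estimates together imply something in the region of 20 GB of stored data per sequenced genome.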
This accumulation of genomic data makes the big data challenges in genomics impossible to ignore. Organisations that address them can better harness the potential of their data and drive impactful discoveries. Doing so is also critical for regulatory compliance and, for companies, for staying competitive in a rapidly evolving field. Collaboration among researchers is essential too, as it enables the sharing of insights and resources.
We have come so far, so fast.
The UK Biobank recently reported that its data will grow to 15 petabytes by 2025, making downloading entirely unfeasible. To put this in perspective, downloading 15 petabytes of data over the fastest available retail fibre connection would take 7.6 years, and even the most advanced cloud transfers would take more than 14 days.
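The arithmetic behind those transfer times is easy to reproduce. The link speeds are not stated in the text, so the sketch below assumes a 500 Mbit/s retail fibre line and a 100 Gbit/s cloud link, values chosen only because they come close to the quoted figures. It also assumes decimal units and an idealised link with no protocol overhead.

```python
PETABYTE = 10**15          # bytes, decimal (SI) units assumed
SECONDS_PER_DAY = 86_400


def transfer_days(size_bytes: int, link_bits_per_second: float) -> float:
    """Idealised transfer time in days: no overhead, link fully utilised."""
    return size_bytes * 8 / link_bits_per_second / SECONDS_PER_DAY


dataset = 15 * PETABYTE

# Assumed link speeds (not given in the text).
retail_fibre = 500e6   # 500 Mbit/s
cloud_link = 100e9     # 100 Gbit/s

print(f"retail fibre: {transfer_days(dataset, retail_fibre) / 365.25:.1f} years")
print(f"cloud link:   {transfer_days(dataset, cloud_link):.1f} days")
```

Under these assumptions the retail line comes out at about 7.6 years and the cloud link at just under 14 days, before any real-world overhead is added.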
Currently, the majority of all genomics data on the planet is collected and stored in silos – in public and private biobanks, research facilities, DTC genetic testing companies, pharmaceutical organisations, and so on.
The pharmaceutical companies I regularly meet with typically have data distributed across their teams and across their organisations, spanning countries and jurisdictions. Even within organisations there is the major challenge of trying to combine disparate data sets to perform meaningful analyses, with regulations associated with cross-border patient data transfers further exacerbating the problem.
Now let’s add partnerships to the mix. Keeping with my previous example, large pharmaceutical companies often collaborate with biobanks that restrict data from leaving their environments, making integration of private and public datasets impossible. The lack of standardisation across these diverse datasets introduces yet another hurdle.
Organisations and consortia need a way to preserve the safety and integrity of their data while streamlining access and analyses.
Some biobanks now allow researchers to BYOD (bring your own data), essentially permitting users to upload their data to run their analyses over combined datasets. However, this is not practical for a number of reasons:
- Transfer costs are prohibitive.
- Copying massive datasets doubles the storage costs.
- Uploading genomics data takes forever (see above).
- Regulations and restrictions on moving sensitive data present roadblocks, especially for pharmaceutical companies concerned about patents.
And now we’re back to where we started. It’s all too problematic.
So what’s the answer?
I have had the privilege of seeing first-hand how pioneering organisations are, for the very first time, able to run their analyses across massively distributed datasets in a multi-party, collaborative way, yielding far more impactful results without the data ever moving.
Seems like magic. But it’s not.
Technology can significantly alleviate the big data challenges in genomics and streamline workflows, ultimately enhancing research outcomes and improving patient care. The approach that makes this possible is federated data analysis. It abstracts even the most complex, distributed, and fragmented IT landscapes into a user experience in which all the data appears to be in one place, much like working on a personal computer, whether that data sits locally in your HPC or hybrid cloud environment, in the public cloud, or in a biobank on the other side of the world.
The most beautiful part is that data never moves. Unnecessary data storage and duplication costs are eliminated, and painful data transfers are a thing of the past. Access and analyses that previously took days, weeks, or months are instantaneous, and all the while data compliance and security are assured.
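To make the idea of bringing computation to the data concrete, here is a deliberately tiny sketch of the pattern. It is not how Lifebit CloudOS or any particular platform is implemented: the `Site` class, its method names, and the toy genotype values are all hypothetical. Each site computes a summary locally, and only those aggregates are combined centrally. It requires Python 3.9 or later.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """A hypothetical data holder; individual genotypes stay inside this object."""
    name: str
    genotypes: list[int]  # alternate-allele count per participant: 0, 1 or 2

    def local_allele_counts(self) -> tuple[int, int]:
        """Run the analysis where the data lives and return only aggregates."""
        alt = sum(self.genotypes)
        total = 2 * len(self.genotypes)  # two alleles per diploid participant
        return alt, total


def federated_allele_frequency(sites: list[Site]) -> float:
    """Combine per-site summaries without reading individual-level records."""
    alt = total = 0
    for site in sites:
        site_alt, site_total = site.local_allele_counts()
        alt += site_alt
        total += site_total
    return alt / total


# Toy data, invented for illustration.
sites = [
    Site("biobank", [0, 1, 2, 0]),
    Site("pharma", [1, 1, 0, 0, 0, 2]),
]
print(federated_allele_frequency(sites))  # prints 0.35
```

A real deployment would add authentication, access control, and safeguards against disclosure from small counts. The core shape is the same, though: the query travels to each data holder and only summary results travel back.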
At Lifebit, our mission is to democratise the analysis and understanding of genetic big data to leap forward cures and enhance life. Recognising that the major problem impeding progress is massively distributed omics data, we built the end-to-end genomics cloud operating system that brings computation and analysis to the data, wherever it resides. Lifebit CloudOS is accelerating genomics research and delivering enriched insights in personalised medicine. Users are able to scale quickly while drastically reducing costs and speeding time to insights. We are seeing positively transformative impacts for our customers daily – the scientists and researchers who share the mission to radically change how we do healthcare.
If you would like to learn how Lifebit can help solve your genomics data access and analysis challenges or just want to chat about life sciences, please drop me a line at thorben@lifebit.ai.