Trusted Research Environments (TREs) are highly secure and controlled computing environments that allow researchers to gain access to data in a safe way. Also known as “Data Safe Havens” or “Secure Data Environments”, these secure digital environments enable approved researchers to remotely access, store, and analyse sensitive data in a single location.
Designed to protect the privacy and security of sensitive data, trusted research environments have been supporting the secure sharing of sensitive data in the UK since 2013. TREs are used by a range of organisations and industries, including research institutions, universities, health systems, charities and government bodies. [1][2][3][4] TREs can be fully open-source (e.g., OpenSAFELY), built in-house, or built by commercial companies, and each approach offers different benefits and features.
TREs support the highest level of data governance by removing the need to share data physically among researchers and organisations.
Data instead remains in a secure environment and is analysed in situ by authorised researchers with tools available in the TRE.
With clear evidence that the health, care, and research and development sectors require deep, linked health-related data, trusted research environments are increasingly recognised as a solution that can provide secure access and analytics functionality to authorised researchers, while also increasing public trust in data use. As such, the trusted research environment landscape and associated technology are evolving rapidly in the UK and further afield.
What is a Trusted Research Environment?
The opportunities for data-driven research and innovation today have never been larger. The availability of large-scale health data for research is immense. In the genomics field for example, there is now roughly 2 to 40 billion gigabytes of data generated each year. This health data holds huge potential to accelerate society’s understanding of how to detect, prevent, and treat disease.
Studying larger sample datasets can lead to increased insights, as shown in numerous genetic association studies. For example, the first schizophrenia-associated variant was identified using a cohort of 3000 individuals, yet subsequent analysis of a cohort 10x larger uncovered over 100x the variants. [5]
However, the potential of health data is far from being realised. To preserve patient privacy, much of the world’s health data is stored within institutional siloed environments that are unavailable to researchers or difficult to access. [6] Agreements to enable data sharing between organisations are complex, and even where researchers are eligible for access, approval can typically take six months or longer. [7]
Traditional modes of data access and sharing rely on sensitive datasets being copied, moved, or downloaded into personal/organisational devices or centralised platforms. With the sensitive nature and sheer scale of health and genomic data, this mode of data access is inefficient or unsustainable.
Further, with an alarming rise in reports of large-scale data breaches and data mining activities, and a long-overdue shift in public awareness towards personal data sovereignty, maintaining public trust in health data research is critical. [8][9][10]
TREs can address some of the concerns around data security and patient privacy - with multi-layered security controls and robust monitoring and auditing capabilities. Importantly, trusted research environments represent a shift in data access from a ‘lending library’ to a ‘reading library’ approach. In the TRE model, approved researchers can use the data within the library, but this information never leaves the library.
Further, trusted research environments provide the functionality and infrastructure to support research on sensitive health data at scale. They address the problem of authorised data sharing by enabling research progress without sacrificing data security, ensuring data are handled in a secure and responsible manner.
In order to power research and progress therapeutic development while maintaining public trust, trusted research environments must strike the delicate balance between usability and security. As trusted research environments are built and procured across industries, there are several important features needed to ensure safe data access:
A central feature recommended for trusted research environments is the Five Safes framework. Originating from the UK’s Office for National Statistics, it consists of five pillars - safe people, safe projects, safe settings, safe data and safe outputs. The framework’s pillars span all stages of data management to make data available for research, while protecting confidentiality at all times. This set of principles is widely regarded as the gold standard for sensitive data protection.
A recent white paper from the UK Health Data Research Alliance, convened by Health Data Research UK (HDR UK), built upon this framework to establish guidelines and best practices for building trusted research environments, ensuring data services (like trusted research environment providers) provide safe access to data. [2]
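To make the five pillars concrete, here is a minimal Python sketch of how an access request might be recorded against them. It is an illustration only: the class name, function names and the one-line reading of each pillar in the comments are assumptions made for this example, and treating each pillar as a simple pass or fail is a simplification of how the framework is applied in practice.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass(frozen=True)
class FiveSafesReview:
    """One yes/no judgement per pillar (hypothetical structure)."""
    safe_people: bool    # researcher is trained and approved
    safe_projects: bool  # project is an appropriate use of the data
    safe_settings: bool  # analysis happens inside the secure environment
    safe_data: bool      # data are de-identified as far as the project allows
    safe_outputs: bool   # results are checked for disclosure risk before release


def failed_pillars(review: FiveSafesReview) -> list[str]:
    """Return the names of the pillars that did not pass."""
    return [f.name for f in fields(review) if not getattr(review, f.name)]


def may_proceed(review: FiveSafesReview) -> bool:
    """In this sketch, access is granted only when all five pillars pass."""
    return not failed_pillars(review)


review = FiveSafesReview(
    safe_people=True,
    safe_projects=True,
    safe_settings=True,
    safe_data=True,
    safe_outputs=False,
)
print(may_proceed(review))     # False
print(failed_pillars(review))  # ['safe_outputs']
```

Listing the failed pillars, rather than returning a bare yes or no, gives a reviewer something specific to act on.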
Beyond the Five Safes, there are several key features and best practices of trusted research environments that are needed to enable researchers to safely and effectively access and analyse data - both in terms of safeguarding sensitive data and providing the analytics and infrastructure to support research at scale.
Custodians (e.g., biobanks and healthcare providers) of health data cohorts have been tasked with a critical role of safeguarding participants’ data. As part of an organisational-level data governance framework, trusted research environments need a multi-layered approach to safeguarding sensitive data, to ensure data are handled in a secure and responsible manner. Alongside ethical approval for data access that involves patients and the public in decision making, this governance framework can help to build public trust.
Well-defined governance frameworks lay out the roles and responsibilities of different stakeholders, including researchers, institutional review boards, and information security teams, to ensure that patient data is handled responsibly. However, this can become increasingly complex, with data governance standards rapidly changing across regions and between institutions. Working with a trusted research environment provider can alleviate these complications. When choosing a provider, certifications in industry-recognised standards including ISO 27001 and Cyber Essentials Plus signify that the provider is well equipped to manage private and sensitive data.
Biobanks holding hundreds of thousands of datasets can quickly scale to petabytes of data, which creates challenges with cost, computational resources and storage. Cloud-based Trusted Research Environments can form part of the solution - with the “elastic” nature of cloud computing, TRE owners only pay for the resources they need.
Integration
As data will be ingested into the TRE from a range of sources (e.g., electronic medical records and laboratory information management systems), TREs should be able to integrate with diverse sources and systems.
Federation
When integrating data from various sources, it is important to consider the risk and financial costs associated with physically moving data. Federation capabilities simplify the linking of disparate data sources without physically having to move the data itself. Within a federated architecture, data will remain within appropriate jurisdictional boundaries, while metadata is centralised and searchable.
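The federated pattern described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of any particular platform: the `Site` class, its method names, the jurisdiction codes and the example values are all hypothetical, and a real deployment would add authentication, query approval and disclosure checks that are omitted here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Site:
    """A data holder; in this sketch `records` are only read by its own methods."""
    name: str
    jurisdiction: str
    records: list[dict]

    def metadata(self) -> dict:
        # Only descriptive metadata is shared with the central catalogue.
        variables = sorted({key for row in self.records for key in row})
        return {"site": self.name, "jurisdiction": self.jurisdiction,
                "variables": variables}

    def local_sum_and_count(self, variable: str) -> tuple[float, int]:
        # Runs where the data live and returns aggregates, not rows.
        values = [row[variable] for row in self.records if variable in row]
        return float(sum(values)), len(values)


def federated_mean(sites: list[Site], variable: str) -> float:
    """Combine per-site aggregates into one pooled mean."""
    partials = [site.local_sum_and_count(variable) for site in sites]
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    if count == 0:
        raise ValueError(f"no site holds values for {variable!r}")
    return total / count


sites = [
    Site("site_a", "UK", [{"age": 40}, {"age": 60}]),
    Site("site_b", "DK", [{"age": 50}, {"age": 70}, {"age": 30}]),
]
catalogue = [site.metadata() for site in sites]  # centralised and searchable
holders = [entry["site"] for entry in catalogue if "age" in entry["variables"]]
print(holders)                       # ['site_a', 'site_b']
print(federated_mean(sites, "age"))  # 50.0
```

The coordinator only ever sees a sum and a count from each site, which is enough to compute the pooled mean without any individual record crossing a jurisdictional boundary.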
Automated data transformation
Health data comes from a wide range of sources. With this diversity comes wide variability in how data are described and stored, which creates challenges for researchers preparing data for analyses.
TREs need automated systems within the platform to efficiently convert raw data to standardised analysis-ready data. This includes established ETL (Extract, Transform, Load) pipelines and APIs for interfacing between TREs and the data source. FAIRification of data within the trusted research environment further makes data Findable, Accessible, Interoperable, and Reusable with the incorporation of unique identifiers for data and metadata management.
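As an illustration of the transform step, the following Python sketch maps records from two differently structured sources onto one common schema and attaches a stable unique identifier. Everything specific in it is an assumption made for the example: the source names, field names, code lists and the placeholder domain are hypothetical, and the sketch assumes every record carries a sex, birth year and weight field.

```python
import uuid

# Hypothetical field mappings: source column -> standard column.
FIELD_MAPS = {
    "hospital_emr": {"pt_sex": "sex", "dob_year": "birth_year", "wt_kg": "weight_kg"},
    "lab_lims": {"Gender": "sex", "YearOfBirth": "birth_year", "weight_lb": "weight_lb"},
}
SEX_CODES = {"m": "male", "male": "male", "f": "female", "female": "female"}
LB_TO_KG = 0.45359237
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "tre.example.org")  # placeholder domain


def transform(source: str, source_id: str, raw: dict) -> dict:
    """Rename fields, normalise codes and units, and attach a stable identifier."""
    mapping = FIELD_MAPS[source]
    row = {mapping[key]: value for key, value in raw.items() if key in mapping}
    row["sex"] = SEX_CODES.get(str(row.get("sex", "")).strip().lower(), "unknown")
    row["birth_year"] = int(row["birth_year"])
    if "weight_lb" in row:
        row["weight_kg"] = round(float(row.pop("weight_lb")) * LB_TO_KG, 1)
    else:
        row["weight_kg"] = float(row["weight_kg"])
    # The same source and source_id always produce the same identifier.
    row["record_id"] = str(uuid.uuid5(NAMESPACE, f"{source}:{source_id}"))
    row["source"] = source
    return row


a = transform("hospital_emr", "123", {"pt_sex": "F", "dob_year": "1980", "wt_kg": "70"})
b = transform("lab_lims", "A-9", {"Gender": "male", "YearOfBirth": 1975, "weight_lb": 154})
print(a["sex"], a["birth_year"], a["weight_kg"])  # female 1980 70.0
print(b["sex"], b["birth_year"], b["weight_kg"])  # male 1975 69.9
print(sorted(a) == sorted(b))                     # True: both share one schema
```

Deriving the identifier deterministically from the source and its local ID means that re-running the pipeline does not create duplicate identifiers for the same record.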
End-to-end solution
Once the data is in a usable format, trusted research environments should incorporate built-in analytics to transform the analysis-ready data into insights. Genomics England’s Trusted Research Environment includes integrated, open-source tools to enable researchers to analyse the data that is housed within the Trusted Research Environment.
Key Features of a Trusted Research Environment
Health and multi-omics data are of high value for research, yet the scale and sensitivity of this data bring unique challenges for enabling secure data access. Trusted research environments can solve many issues surrounding secure data access in healthcare settings. There are numerous benefits for researchers, organisations, and patients, compared to traditional methods where data is copied and moved.
Key advantages of using a trusted research environment in health data research and management:
8 Advantages of Using a Trusted Research Environment in Healthcare Research & Data Management
Across biobanking, governments and health providers, trusted research environments are being increasingly adopted as a means to achieve both data accessibility and security.
We highlight some case studies of how trusted research environments are being used across the life sciences industry:
National Health Service England (NHS England)
Recently, NHS Digital, in partnership with Health Data Research UK, developed a TRE that provides academic researchers access to cardiovascular and cancer data for COVID-19 research. As described in the British Medical Journal, this partnership with national health data custodians provides linked, nationally collated electronic health records for approved research within secure, privacy-protecting environments. [14]
By combining individual-level data across national healthcare settings, data on age, sex, and ethnicity are complete for around 95% of the population. This resource has already proven essential for accurate recording and thus research on cardiovascular disease, providing researchers across the UK with rapid access to data.
Secure Anonymised Information Linkage (SAIL) Databank
The SAIL Databank is a rich population databank whose TRE provides researchers worldwide with secure remote access to anonymised health and social care data records for the population of Wales. [1] In operation since 2007, the SAIL Databank operates on the UK Secure Research Platform, a private research cloud with customisable technology.
Hundreds of research publications have resulted from the databank. A recent example, the largest study of its kind, found that COVID-19 vaccines offer effective protection against infection for high-risk healthcare workers. [15]
Genomics England
The UK government’s public sector research endeavour, Genomics England currently hosts the data from over 135,000 NHS patients within a TRE for approved research use. The TRE is a cloud-based tool (powered by AWS and Lifebit) that approved researchers can use to access the clinical and genomic data from participants with cancer, rare disease, and COVID-19. Data access processes differ between the public and private sectors: researchers who want to access data must apply to become a member of either the Genomics England Clinical Interpretation Partnership (academics, students, and clinicians) or the Discovery Forum (industry partners).
Danish National Genome Center
A federated TRE deployed within the Danish National Genome Center’s supercomputing cluster will serve as the scalable and secure data management and analysis platform for Denmark’s national researchers, clinical scientists, and international collaborators. Powered by the Lifebit Platform, the TRE will deliver a next-generation computational infrastructure. The Danish National Genome Center and its collaborators will recruit and sequence whole genomes of 60,000 patients diagnosed with cancer, autoimmune disorders, and rare diseases by 2024.
Looking to the future, many governments, health systems, and biobanks see TREs as a secure long-term solution for research and clinical use of sensitive health data.
This is most apparent in the UK, as set out in recent national policy guidance. The UK government commissioned an independent review by Professor Ben Goldacre on the use of National Health Service (NHS) health data for research and analysis, which was published in 2022. This review, and others, have recommended that TREs, or ‘Secure Data Environments’, should be the default way to access health and social care data for R&D going forward.
Yet with a rapidly changing data, regulatory, legal, and technology landscape, TRE owners and suppliers must keep pace with developments to ensure TREs are sustainable into the future. We explore some key priorities and challenges for the future that relate to TREs for health data.
Conducting meaningful Patient and Public Involvement and Engagement (PPIE) in the design and use of trusted research environments is becoming a best practice to minimise the risks of data misuse and focus research on studies where there is a demonstrable public benefit.
There are widespread examples demonstrating how patient and public involvement in decision-making on trusted research environments can lead to improved research output. Maintaining transparency on trusted research environment design and governance procedures is vital to ensure that public trust is maintained to allow long-term success and growth of population health initiatives that will ultimately save lives.
Amid the widespread push for greater data protection and patient privacy, there is a need to factor in the knock-on effects on data access for research. Innovative technologies and approaches can bridge this gap and create trusted research environments that are sustainable in the longer term:
Federation is widely regarded as a key technology enabler for linking up disparate datasets, including data stored in TREs. [17] Federation across TREs means data can be virtually linked for combined analysis whilst remaining at its source. This means researchers can easily access, collaborate, and analyse disparate datasets without data movement.
With the ability to scale with increasing volumes of data, ensure data privacy and protection, and enable secure access for approved research, trusted research environments can serve all ends of the health research community. Enabling valuable research at scale to improve the lives of patients, trusted research environments represent a sustainable and secure long-term solution for managing and using big data.
Editor’s note: This post was originally published on March 28, 2023 and may be occasionally updated for accuracy and comprehensiveness.
1. Lyons, R. A. et al. The SAIL databank: linking multiple health and social care datasets. BMC Med. Inform. Decis. Mak. 9, 3 (2009).
2. UK Health Data Research Alliance & NHSX. Building Trusted Research Environments - Principles and Best Practices; Towards TRE ecosystems. https://zenodo.org/record/5767586 (2021) doi:10.5281/ZENODO.5767586.
3. Nik-Zainal, P. S. et al. Multi-party trusted research environment federation: Establishing infrastructure for secure analysis across different clinical-genomic datasets. https://zenodo.org/record/7085536 (2022) doi:10.5281/ZENODO.7085536.
4. Trusted Research Environment service for England. NHS Digital (2022).
5. Visscher, P. M. et al. 10 Years of GWAS Discovery: Biology, Function, and Translation. Am. J. Hum. Genet. 101, 5–22 (2017).
6. 4 ways data is improving healthcare. World Economic Forum (2019).
7. Learned, K. et al. Barriers to accessing public cancer genomic data. Sci. Data 6, 98 (2019).
8. Kilzi, Michel. The Anatomy Of Personal Data Sovereignty. Forbes (2021).
9. Thousands of patients hit by NHS data breaches. Independent https://www.independent.co.uk/news/health/data-nhs-patient-breaches-privacy-b1877154.html (2021).
10. Google reportedly mining millions of Americans personal health data. CBS News https://www.cbsnews.com/news/google-mining-millions-of-americans-personal-health-data-report-says/ (2019).
11. Cheah, P. Y. & Piasecki, J. Data Access Committees. BMC Med. Ethics 21, 12 (2020).
12. Denton, N. et al. Data silos are undermining drug development and failing rare disease patients. Orphanet J. Rare Dis. 16, 161 (2021).
13. Koutkias, V. From Data Silos to Standardized, Linked, and FAIR Data for Pharmacovigilance: Current Advances and Challenges with Observational Healthcare Data. Drug Saf. 42, 583–586 (2019).
14. Wood, A. et al. Linked electronic health records for research on a nationwide cohort of more than 54 million people in England: data resource. BMJ n826 (2021) doi:10.1136/bmj.n826.
15. Bedston, S. et al. COVID-19 vaccine uptake, effectiveness, and waning in 82,959 health care workers: A national prospective cohort study in Wales. Vaccine 40, 1180–1189 (2022).
16. Mitchell, C., Ordish, J., Johnson, E., Brigden, T. & Hall, A. The GDPR and genomic data. (2020).
17. Thorogood, A. et al. International federation of genomic medicine databases using GA4GH standards. Cell Genomics 1, 100032 (2021).