How Real World Data is Revolutionizing Clinical Evidence Generation in Oncology

RWD for Clinical Evidence in Oncology: Top 2025 Gains
The Shifting Landscape of Cancer Research
Real-world data (RWD) is changing how cancer treatments are developed and approved. By capturing the complexity of routine patient care, it extends clinical evidence generation in oncology beyond the limits of traditional clinical trials.
Key applications of real-world data in oncology include:
- Pharmacovigilance – Active surveillance for safety signals and rare adverse events
- External control arms – Supporting single-arm trials for breakthrough therapies
- Treatment effectiveness studies – Evaluating outcomes in diverse patient populations
- Natural history studies – Understanding disease progression patterns
- Post-marketing surveillance – Monitoring long-term safety and effectiveness
- Regulatory submissions – Supporting FDA and EMA approval decisions
The precision oncology era presents a challenge: traditional randomized controlled trials (RCTs), long the gold standard, are ill-suited to the growing number of rare molecular subgroups. An FDA analysis found that 176 oncology drug indications were approved on the basis of single-arm studies over a 20-year period, underscoring the need for additional sources of comparative evidence.
Conventional trials are slow and expensive, and they often exclude the diverse patients who will ultimately use the treatments, which limits their real-world applicability. Fewer than 5% of adult cancer patients participate in trials, and those who do tend to be younger and healthier than the general patient population.
This gap necessitates new, rigorous approaches to clinical evidence generation that reflect the complexity of routine cancer care.
I’m Maria Chatzou Dunford, CEO and Co-founder of Lifebit. For over 15 years, I’ve focused on developing computational platforms for secure, federated analysis of biomedical data. My work applies AI-powered platforms to real-world data for clinical evidence generation in oncology, protecting patient privacy while accelerating drug discovery and regulatory decisions.
Understanding Real-World Data (RWD) and Real-World Evidence (RWE)
Information from doctor visits, lab tests, and treatments forms the basis of real-world data in oncology. To understand its impact, we must first define some key terms.
Real-world data (RWD) is the information generated during routine healthcare. The FDA defines it as “data relating to patient health status and/or the delivery of health care routinely collected from a variety of sources.” This raw material is the foundation upon which evidence is built.
Real-world evidence (RWE) is the clinical insight derived from analyzing RWD. It’s the actionable picture created by assembling RWD pieces through rigorous scientific methods. This distinction is crucial; raw data, with all its inherent complexities and potential for bias, requires careful analysis, validation, and contextualization to become actionable evidence that can reliably inform clinical practice and improve cancer care.
Key RWD sources include:
- Electronic Health Records (EHRs): These are the digital chronicles of a patient’s medical journey, containing rich clinical detail. They include structured data like diagnoses (ICD codes), lab results, and prescriptions, as well as unstructured data like clinical notes and pathology reports. Unlocking the value of unstructured data often requires advanced techniques like Natural Language Processing (NLP) to extract meaningful information.
- Insurance claims and billing data: This data provides a longitudinal view of a patient’s interactions with the healthcare system, including diagnoses, procedures, and prescriptions. While it excels at tracking treatment pathways, healthcare resource utilization, and costs over time, it often lacks the granular clinical detail found in EHRs, such as tumor stage or biomarker status.
- Patient registries: These are organized systems that collect standardized information about a group of patients who share a certain characteristic, such as a specific cancer type (e.g., the NCI’s Surveillance, Epidemiology, and End Results (SEER) Program) or exposure to a particular treatment. Registries are invaluable for long-term follow-up and studying the natural history of diseases.
- Digital health solutions: The proliferation of wearable devices and mobile health apps has created a new stream of patient-generated health data (PGHD). These tools can capture real-time information on activity levels, vital signs, and patient-reported outcomes (PROs), offering a unique window into a patient’s quality of life and experience outside the clinical setting.
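Much of the oncology-specific detail in EHRs lives in free-text notes. The sketch below is a minimal illustration that pulls biomarker status out of such notes with regular expressions. It is a rule-based stand-in, not a full NLP pipeline. The biomarker list, the status vocabulary, and the function name `extract_biomarkers` are all illustrative assumptions. The sketch also does not handle general negation or hedged phrasing (for example, "HER2 is not positive"), which real clinical NLP systems must handle.

```python
import re

# Hypothetical biomarkers of interest; a real project would use a curated vocabulary.
BIOMARKERS = ["EGFR", "ALK", "KRAS", "HER2", "PD-L1"]

# Illustrative mapping from free-text result phrases to a normalized status.
STATUS_WORDS = {
    "not detected": "negative",
    "wild-type": "negative",
    "wild type": "negative",
    "negative": "negative",
    "positive": "positive",
    "detected": "positive",
    "mutated": "positive",
}

def extract_biomarkers(note: str) -> dict[str, str]:
    """Return {biomarker: status} when a status phrase follows the marker in the same sentence."""
    # Try longer phrases first so "not detected" wins over "detected".
    phrases = sorted(STATUS_WORDS, key=len, reverse=True)
    status_pattern = "|".join(re.escape(p) for p in phrases)
    results = {}
    for marker in BIOMARKERS:
        # Marker, then at most 30 characters that are not a period, then a status phrase.
        pattern = rf"\b{re.escape(marker)}\b[^.]{{0,30}}?\b({status_pattern})\b"
        match = re.search(pattern, note, flags=re.IGNORECASE)
        if match:
            results[marker] = STATUS_WORDS[match.group(1).lower()]
    return results

note = "Molecular testing: EGFR exon 19 deletion detected. ALK rearrangement negative. PD-L1 TPS 60%."
print(extract_biomarkers(note))  # {'EGFR': 'positive', 'ALK': 'negative'}
```

PD-L1 is left out of the result because the note gives a numeric score rather than a status word. Capturing values like that is one reason production pipelines go beyond simple patterns.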
For deeper scientific perspectives on how researchers define and categorize RWD, you can explore scientific research on RWD definitions.
How Regulatory Bodies Define RWD and RWE
Regulatory bodies increasingly support the use of real-world data for clinical evidence generation in oncology, a major shift from past skepticism. This evolution is driven by the need for more timely and representative evidence, especially in areas like precision oncology and rare cancers.
The FDA’s Oncology Center of Excellence (OCE) has a dedicated Real World Evidence Program, and the FDA has published its “Framework for FDA’s Real-World Evidence Program.” This framework outlines the agency’s approach to evaluating RWE for use in regulatory decisions, focusing on whether the RWD is fit for purpose and whether the study design and analysis meet regulatory standards. Similarly, the European Medicines Agency (EMA) has launched initiatives like the Data Analysis and Real World Interrogation Network (DARWIN EU®) to establish a network of data sources and provide timely RWE. Health Technology Assessment (HTA) bodies like the UK’s NICE and Germany’s IQWiG also increasingly rely on RWE to inform reimbursement decisions and bridge evidence gaps, particularly for therapies approved via accelerated pathways.
These frameworks acknowledge that RWE complements traditional trials by providing crucial insights into long-term safety, comparative effectiveness, and treatment outcomes in diverse, real-world populations that are often excluded from RCTs. This fills critical knowledge gaps and helps ensure that regulatory and reimbursement decisions are based on a holistic understanding of a treatment’s performance.
This global regulatory shift signals a move toward a more integrated evidence generation paradigm that better reflects the realities of modern cancer care. Navigating this complex environment requires robust technological and methodological support. Learn how our regulatory compliance solutions help teams generate compliant, high-quality RWE that meets the standards of global authorities.
Weighing the Pros and Cons: RWD vs. Traditional Clinical Trials
Real-world data does not replace traditional clinical trials in oncology; it complements them. Each approach has distinct strengths and weaknesses, and a modern evidence strategy uses both to understand how cancer treatments perform from bench to bedside.
Advantages of RWD
The primary value of RWD is its ability to capture the complex reality of everyday cancer care, providing insights that are often unattainable through the rigid structure of an RCT.
Key advantages include:
- Increased generalizability: RCTs often have strict inclusion and exclusion criteria, resulting in a study population that is younger, healthier, and less diverse than the patients who will ultimately use the treatment. RWD, by contrast, reflects the full spectrum of patients seen in routine practice, including the elderly, those with multiple comorbidities, and individuals from diverse ethnic and socioeconomic backgrounds. This helps ensure that evidence of a drug’s effectiveness applies to the actual patient population.
- Massive scale and longitudinality: RWD sources can encompass millions of patient records collected over many years. This scale is crucial for studying rare cancers, identifying rare adverse events that might not appear in a smaller trial population, and understanding long-term treatment outcomes and survival patterns.
- Faster insights and cost-effectiveness: Because the data has already been collected as part of routine care, RWD studies can often be conducted more quickly and at a fraction of the cost of a prospective RCT. This allows researchers to test hypotheses, explore new research questions, and generate evidence much more rapidly.
- Feasibility for rare diseases and ethical considerations: For many rare cancers or molecularly defined subgroups, recruiting enough patients for a traditional RCT is simply not feasible. In these cases, RWD can be used to create external control arms or conduct observational studies that provide the only viable path for evidence generation. It can also be more ethical than assigning patients to a placebo arm when an effective treatment is believed to exist.
Feature | Traditional RCTs | Real-World Data Studies |
---|---|---|
Patient Population | Carefully selected, homogeneous, often excludes complex patients | Diverse, heterogeneous, reflects everyday clinical practice |
Setting | Controlled, academic research centers | Routine clinical care settings |
Speed | Years to complete | Weeks to months for retrospective insights |
Cost | Very expensive, resource-intensive | Generally more affordable, leverages existing data |
Generalizability | Limited to trial populations (low external validity) | High – mirrors real-world patients (high external validity) |
Data Quality | High, standardized, collected for research | Variable, requires curation, cleaning, and validation |
Bias Control | Strong through randomization (high internal validity) | Requires advanced statistical methods to control for bias |
Key Challenges in Using Real-World Data for Clinical Evidence Generation in Oncology
Despite its immense potential, using real-world data as clinical evidence in oncology presents significant challenges that must be rigorously addressed to produce reliable results.
- Data quality and comprehensiveness: RWD is collected for clinical care, not research. This can lead to missing data (e.g., a lab test not ordered), inconsistent or non-standardized data entry, and errors in coding. Clinical details crucial for oncology research, such as cancer stage, performance status, or biomarker results, may be buried in unstructured clinical notes, requiring sophisticated methods like NLP to extract. Ensuring the data is accurate, complete, and fit-for-purpose is a critical first step.
- Bias and confounding variables: Unlike in an RCT, treatment assignment in the real world is not random. This introduces a high risk of bias. For example, confounding by indication occurs when sicker patients are more likely to receive a newer treatment, which can make the treatment appear less effective than it is. Selection bias can also occur if the available data does not represent the entire patient population. Addressing these biases requires advanced statistical methods like propensity score matching, inverse probability of treatment weighting, and instrumental variable analysis.
- Patient privacy and data security: RWD contains sensitive personal health information, making privacy and security paramount. Researchers must adhere to strict regulations like HIPAA in the US and GDPR in Europe. This requires robust de-identification techniques to remove direct identifiers and governance frameworks to control data access. Secure platforms, such as federated Trusted Research Environments (TREs), are essential to allow analysis without exposing or moving raw patient data.
- Interoperability and data harmonization: Patient data is often fragmented across multiple, disparate systems (e.g., different hospitals’ EHRs, labs, pharmacies) that do not speak the same language. A major technical hurdle is achieving interoperability—the ability to link and integrate these different data sources. This often requires mapping the data to a Common Data Model (CDM), such as the OMOP CDM, which standardizes the structure and vocabulary, enabling large-scale, multi-institutional analyses.
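Confounding by indication can be made concrete with a minimal sketch of inverse probability of treatment weighting (IPTW) on a toy dataset. The dataset has a single categorical confounder: disease severity. Because that confounder is categorical, the propensity score is estimated here as the treated fraction within each stratum. This is a simple saturated stand-in for the logistic regression typically used in practice. The data, field layout, and function names are hypothetical.

```python
from collections import defaultdict

def make_group(stratum, treated, n, responders):
    """Build n hypothetical (stratum, treated, outcome) records, `responders` of them with outcome 1."""
    return [(stratum, treated, 1)] * responders + [(stratum, treated, 0)] * (n - responders)

# Hypothetical cohort: sicker ("high") patients are more likely to get the new drug
# and also respond less often, i.e. confounding by indication.
patients = (
    make_group("low", 1, 10, 8)      # low severity, treated: 80% respond
    + make_group("low", 0, 30, 21)   # low severity, untreated: 70% respond
    + make_group("high", 1, 30, 12)  # high severity, treated: 40% respond
    + make_group("high", 0, 10, 3)   # high severity, untreated: 30% respond
)

def naive_difference(records):
    """Unadjusted response rate, treated minus untreated."""
    treated = [y for _, t, y in records if t == 1]
    control = [y for _, t, y in records if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

def propensity_by_stratum(records):
    """Estimate P(treated | stratum) as the treated fraction within each stratum."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [number treated, total]
    for stratum, t, _ in records:
        counts[stratum][0] += t
        counts[stratum][1] += 1
    return {s: n_treated / total for s, (n_treated, total) in counts.items()}

def iptw_difference(records):
    """Weighted response rate difference, each patient weighted by 1 / P(arm actually received)."""
    ps = propensity_by_stratum(records)
    sums = {1: [0.0, 0.0], 0: [0.0, 0.0]}  # arm -> [weighted outcomes, total weight]
    for stratum, t, y in records:
        e = ps[stratum]
        w = 1 / e if t == 1 else 1 / (1 - e)
        sums[t][0] += w * y
        sums[t][1] += w
    return sums[1][0] / sums[1][1] - sums[0][0] / sums[0][1]

print(f"naive: {naive_difference(patients):+.2f}")  # naive: -0.10 (drug looks harmful)
print(f"IPTW:  {iptw_difference(patients):+.2f}")   # IPTW:  +0.10 (within-stratum benefit)
```

The naive comparison makes the drug look harmful. Weighting recovers the 10-point benefit built into each stratum. The sketch assumes every stratum contains both treated and untreated patients (positivity). A propensity of exactly 0 or 1 would make a weight undefined. Weighting also corrects only for confounders that are actually measured.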
Methodologies for Generating Clinical Evidence from Real-World Data in Oncology
Generating meaningful and reliable clinical evidence from real-world data in oncology is not simply about accessing data; it is about applying the right research methodology to answer a specific question while accounting for the data’s inherent complexities.
- Observational studies are the cornerstone of RWE. These studies, which include cohort and case-control designs, observe patients in routine practice without assigning a specific intervention. A cohort study follows a group of patients (a cohort) over time to compare outcomes between those exposed and not exposed to a treatment. A case-control study works backward, identifying patients with a specific outcome (cases) and comparing their past exposures to a similar group without the outcome (controls). While powerful, these designs are susceptible to bias. To improve their validity, researchers increasingly use frameworks like target trial emulation, which involves explicitly designing an observational analysis to mimic the key components of a hypothetical randomized trial, thereby making potential biases more transparent and easier to address.
- Pragmatic Clinical Trials (PCTs) represent a hybrid approach that bridges the gap between traditional RCTs and purely observational research. While they retain the core strength of randomization, PCTs are designed to reflect real-world practice. They feature broader eligibility criteria, enroll a more diverse patient population, and often measure outcomes that are more relevant to patients and providers, such as quality of life or hospital admissions. They aim to determine a treatment’s effectiveness in a real-world setting, as opposed to its efficacy in a highly controlled one.
- External Control Arms (ECAs), also known as synthetic control arms, are a critical innovation for oncology. For many rare diseases or in cases of breakthrough therapies where it would be unethical to use a placebo, single-arm trials are common. An ECA, constructed from RWD, provides a comparator group for these trials. This involves carefully selecting patients from RWD sources (like EHRs or registries) whose baseline characteristics (age, disease stage, comorbidities) are matched to those of the patients in the single-arm trial. While ECAs are gaining acceptance from regulators, their creation requires rigorous methodology and transparent reporting to ensure a fair comparison.
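The matching step behind an external control arm can be sketched as follows. For each single-arm trial patient, the code picks the closest not-yet-used RWD patient with the same disease stage and the nearest age, within a caliper. This greedy 1:1 nearest-neighbour matching without replacement is only one of several options; propensity score matching and weighting are common alternatives. The `Patient` fields, the 5-year caliper, and the example records are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Patient:
    pid: str
    age: int
    stage: str  # hypothetical field, e.g. "III" or "IV"

def match_external_controls(trial, rwd_pool, caliper=5):
    """Greedy 1:1 matching: exact on stage, nearest age within `caliper` years, no reuse of controls."""
    available = list(rwd_pool)
    pairs = []
    for t in trial:
        candidates = [c for c in available
                      if c.stage == t.stage and abs(c.age - t.age) <= caliper]
        if not candidates:
            continue  # no acceptable control, so this trial patient stays unmatched
        best = min(candidates, key=lambda c: (abs(c.age - t.age), c.pid))
        pairs.append((t.pid, best.pid))
        available.remove(best)  # matching without replacement
    return pairs

trial = [Patient("T1", 62, "IV"), Patient("T2", 70, "III"), Patient("T3", 45, "IV")]
rwd = [Patient("R1", 60, "IV"), Patient("R2", 63, "IV"),
       Patient("R3", 71, "III"), Patient("R4", 80, "IV")]
print(match_external_controls(trial, rwd))  # [('T1', 'R2'), ('T2', 'R3')]
```

T3 stays unmatched because no stage IV control lies within five years of age 45. Greedy matching depends on the processing order, so covariate balance between the matched groups should be checked before outcomes are compared.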
The European Organisation for Research and Treatment of Cancer (EORTC) exemplifies this shift, prioritizing studies that produce Randomized Real-World Evidence to make research more relevant and accessible. This requires sophisticated analytical capabilities, such as advanced AI/ML analytics, to manage complex data and maintain scientific validity.
The Rise of Randomized Real-World Evidence (R²WE)
The field is moving beyond the simplistic RCT vs. RWD debate. The recognition that randomization can be embedded within real-world data collection has led to the rise of Randomized Real-World Evidence (R²WE).
R²WE combines the analytical rigor of randomization, the most reliable way to control for confounding, with the real-world relevance of RWD. Pragmatic trials are a key source of R²WE because they are designed to answer a practical question: does this treatment work in the messy, complex environment of real clinical settings? Designs such as cohort multiple RCTs (cmRCTs), also known as Trials within Cohorts (TwiCs), go a step further by embedding multiple randomized trials within a single large, existing observational patient cohort. Patients in the cohort consent to their data being used for research and to potentially being offered participation in future trials. This model streamlines recruitment, reduces costs, and makes research a more integrated part of routine care, as illustrated by the EORTC’s OligoCare cohort.
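The core mechanic of a Trials-within-Cohorts design can be sketched in a few lines. The sketch first restricts the cohort to members who consented to be approached about future trials and who meet a trial's eligibility rule. It then randomly selects a subset of them to be offered the intervention, while the rest continue usual care. The cohort fields, the eligibility rule, the 50% offer fraction, and the function name `twics_allocate` are illustrative assumptions, not details of the OligoCare protocol.

```python
import random

def twics_allocate(cohort, is_eligible, offer_fraction=0.5, seed=2025):
    """Randomly choose which eligible, consenting cohort members are offered the intervention."""
    eligible = [p for p in cohort if p["consented_to_approach"] and is_eligible(p)]
    rng = random.Random(seed)  # seeded so the allocation can be reproduced and audited
    rng.shuffle(eligible)
    n_offer = int(len(eligible) * offer_fraction)  # rounds down
    offered = eligible[:n_offer]     # approached and offered the experimental intervention
    usual_care = eligible[n_offer:]  # not approached; outcomes come from routine cohort follow-up
    return offered, usual_care

# Hypothetical cohort: consent status and metastasis count are made-up fields.
cohort = [{"id": i, "consented_to_approach": i % 4 != 0, "n_metastases": i % 7}
          for i in range(20)]
offered, usual_care = twics_allocate(cohort, lambda p: p["n_metastases"] <= 5)
print(len(offered), len(usual_care))  # 6 7
```

Only the offered group is contacted about the trial. The comparison group is analysed from data the cohort already collects, which is what keeps the design inexpensive. Because some offered patients will decline, results are usually analysed by allocation (intention to treat).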
R²WE does not replace traditional RCTs, which remain the gold standard for demonstrating efficacy for initial drug approval. Instead, it offers a powerful and practical path forward for answering a wide range of questions in oncology, especially for comparative effectiveness, label expansion into broader populations, and optimizing treatment strategies for fragmented patient subgroups.
From Lab to Clinic: RWD’s Impact on Drug Development and Regulatory Approval
Real-world data is no longer a niche application in oncology; it is becoming a critical asset that informs and streamlines the entire drug development lifecycle, from initial discovery to post-marketing surveillance and beyond.
RWD provides critical insights at every stage:
- Discovery and translation: In the earliest stages, RWD helps researchers understand the natural history of a disease, identifying patient populations with high unmet medical needs. By analyzing large-scale clinical and genomic datasets, scientists can validate new therapeutic targets, identify potential biomarkers for patient stratification, and understand the real-world burden of disease, which helps build the case for a new therapeutic program.
- Clinical study design and execution: RWD is transforming how clinical trials are planned. It can be used to model the impact of different inclusion/exclusion criteria on potential recruitment numbers, helping to design more realistic and efficient protocols. It also helps identify clinical trial sites with large pools of eligible patients and can be used to build more effective patient diversity plans by characterizing patient demographics at different locations, a key focus for both the FDA and EMA.
- Regulatory submission: RWE is increasingly being included in regulatory submissions to provide supportive evidence. Its most prominent role is in creating external control arms for single-arm trials in rare cancers, which has supported several oncology drug approvals. RWE is also used to support label expansion applications, demonstrating a drug’s effectiveness in a broader population than was studied in the pivotal trials. The FDA’s RWE Program provides clear guidance on how such evidence can be used to support regulatory decision-making.
- Value and Market Access: After a drug is approved, RWE is essential for demonstrating its value to payers and HTA bodies. This evidence can show how a treatment performs compared to the local standard of care, its impact on healthcare resource utilization (like hospitalizations), and its overall cost-effectiveness in a specific healthcare system. This real-world value demonstration is crucial for securing favorable reimbursement and ensuring patient access.
- Post-marketing surveillance and pharmacovigilance: Once a drug is on the market, RWD enables continuous, active safety monitoring. Instead of relying on passive reporting of adverse events, researchers can proactively query large datasets to detect rare or long-term safety signals much more quickly. RWD is also used to fulfill post-marketing requirements or commitments mandated by regulatory agencies as a condition of approval.
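The feasibility modelling described under clinical study design can be as simple as applying draft inclusion/exclusion criteria one at a time to an RWD cohort. At each step, the code counts how many patients remain. This is a minimal sketch: the criteria, thresholds, record fields, and patient rows below are hypothetical and not taken from any real protocol.

```python
def eligibility_funnel(patients, criteria):
    """Apply (name, rule) criteria in order and record how many patients remain after each."""
    remaining = list(patients)
    funnel = [("All patients", len(remaining))]
    for name, rule in criteria:
        remaining = [p for p in remaining if rule(p)]
        funnel.append((name, len(remaining)))
    return funnel

# Hypothetical draft criteria with illustrative thresholds.
criteria = [
    ("Stage IV", lambda p: p["stage"] == "IV"),
    ("ECOG 0-1", lambda p: p["ecog"] <= 1),
    ("Age <= 75", lambda p: p["age"] <= 75),
    ("CrCl >= 60 mL/min", lambda p: p["crcl"] >= 60),
]

# Hypothetical RWD extract: (stage, ECOG performance status, age, creatinine clearance).
rows = [("IV", 0, 60, 90), ("IV", 2, 55, 80), ("III", 1, 50, 70),
        ("IV", 1, 80, 65), ("IV", 1, 70, 50), ("IV", 0, 66, 75)]
cohort = [dict(zip(("stage", "ecog", "age", "crcl"), r)) for r in rows]

for name, n in eligibility_funnel(cohort, criteria):
    print(f"{name:<20}{n}")
```

Loosening a single threshold and re-running shows how many additional patients a protocol change would make eligible. Each step filters the survivors of the previous one, so the drop attributed to a criterion depends on where it sits in the order.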
How RWD Informs Regulatory and HTA Decisions
Globally, regulatory agencies and HTA bodies are systematically integrating RWE into their decision-making frameworks, marking a fundamental shift in how medical evidence is generated and valued.
RWE is used to:
- Support accelerated drug access and single-arm trial approvals, often by providing robust historical or external control data that contextualizes the results of a non-randomized study.
- Enable conditional reimbursement or coverage with evidence development (CED) schemes, where payers agree to cover a new therapy while further RWD is collected to confirm its long-term value and effectiveness.
- Fill evidence gaps left by RCTs, such as a drug’s performance in elderly patients or those with comorbidities, providing a more complete picture for clinicians and policymakers.
- Support HTA submissions by demonstrating a drug’s real-world value proposition and cost-effectiveness, tailored to the specific context of a national or regional healthcare system.
- Inform and update clinical treatment guidelines with evidence that reflects how different therapies are being used and how they perform across the full spectrum of clinical scenarios.
The goal is to create a more holistic and dynamic evidence ecosystem, modernizing evidence generation by combining the internal validity of trials with the external validity and breadth of RWD. This synergy leads to better-informed, faster decisions that ultimately help more patients access effective treatments.
Frequently Asked Questions about RWD in Oncology
Using real-world data to generate clinical evidence in oncology raises many important questions for researchers, clinicians, and patients. Here are answers to some of the most common ones.
How does RWD help study cancer in specific patient groups?
Traditional trials often have narrow eligibility criteria that exclude complex patients, such as the elderly, those with pre-existing conditions (comorbidities), or patients with poor performance status. Real-world data inherently includes these underrepresented groups, providing a more complete and generalizable picture of treatment effectiveness. It allows researchers to analyze outcomes in specific real-world scenarios, such as how a drug performs in patients with organ dysfunction or in those taking concomitant medications. This is especially vital for precision oncology. For targeted therapies aimed at rare genetic mutations, patient populations can be too small and too geographically dispersed for traditional trials. RWD allows data from numerous institutions to be aggregated into a cohort large enough to validate biomarkers, understand treatment patterns, and assess outcomes in these rare subgroups, helping to deliver on the promise of personalized medicine.
What are the main ethical considerations for using RWD?
The use of RWD, which is derived from personal health information, requires strict ethical oversight and robust governance. Key considerations include:
- Patient privacy and de-identification: Protecting patient identity is non-negotiable. This requires sophisticated de-identification techniques to remove direct identifiers (like name and address) and mitigate the risk of re-identification from quasi-identifiers. Regulations like GDPR and HIPAA provide a legal framework for this.
- Informed consent: For retrospectively collected data, obtaining specific consent for each research project is often impossible. This has led to models like “broad consent,” where patients agree upfront for their de-identified data to be used for future research. Regardless of the model, transparency with patients about how their data is being used is a crucial ethical principle.
- Data security and governance: Data must be stored in highly secure environments to prevent breaches. Governance structures, such as Data Access Committees, are needed to review and approve research requests, ensuring that the data is used responsibly and for scientifically valid purposes. Technologies like Lifebit’s Trusted Research Environment (TRE) provide a secure, auditable space for analysis, minimizing risk.
- Transparency and fairness: Research methodologies and findings must be transparent to ensure scientific validity and reproducibility. Furthermore, it’s critical that insights from RWD are applied equitably to avoid perpetuating or creating new healthcare disparities. Analyses must be designed to understand treatment effects across different demographic and socioeconomic groups.
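A minimal sketch of the de-identification step described above follows. It drops direct identifiers and coarsens quasi-identifiers: exact age becomes a 10-year band, and the postcode is cut to its first three characters. It then checks k-anonymity, meaning that every combination of quasi-identifiers is shared by at least k records. The field names, banding choices, example records, and k are illustrative assumptions. Real frameworks such as HIPAA and GDPR set their own specific requirements.

```python
from collections import Counter

DIRECT_IDENTIFIERS = {"name", "address", "mrn", "phone"}  # hypothetical field names

def deidentify(record):
    """Remove direct identifiers and generalize the age and postcode quasi-identifiers."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    decade = (out.pop("age") // 10) * 10
    out["age_band"] = f"{decade}-{decade + 9}"
    out["zip3"] = out.pop("zip")[:3]
    return out

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values appears in at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

raw = [
    {"name": "A", "mrn": "001", "age": 63, "zip": "02139", "diagnosis": "C50"},
    {"name": "B", "mrn": "002", "age": 67, "zip": "02141", "diagnosis": "C34"},
    {"name": "C", "mrn": "003", "age": 45, "zip": "10001", "diagnosis": "C18"},
]
deid = [deidentify(r) for r in raw]
print(is_k_anonymous(deid, ["age_band", "zip3"], k=2))  # False: patient C is unique
```

When the check fails, the usual options are to coarsen further (wider age bands, larger regions) or to suppress the outlying records before the data are released for analysis.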
How is data quality in RWD ensured for regulatory-grade evidence?
Turning raw RWD into regulatory-grade evidence requires a meticulous, multi-step process to ensure data quality, reliability, and relevance. This is a major focus for agencies like the FDA. Key steps include:
- Data Curation and Validation: This involves a rigorous process of cleaning the data, correcting errors, handling missing values, and validating key data points against source documents where possible. It’s about transforming messy, raw data into a research-ready dataset.
- Use of Common Data Models (CDMs): To enable analysis across different data sources (e.g., multiple hospitals), data is often mapped to a CDM like the Observational Medical Outcomes Partnership (OMOP). A CDM standardizes the structure, format, and vocabulary of the data, ensuring consistency and interoperability.
- Defining Data Provenance: It is essential to maintain a clear, auditable trail of where the data came from and every transformation it has undergone. This transparency allows regulators and other researchers to assess the data’s lineage and quality.
- Fitness-for-Purpose Assessment: Not all RWD is suitable for every research question. A critical step is to assess whether a given dataset has the necessary completeness, accuracy, and granularity (e.g., availability of specific biomarkers or staging information) to reliably answer the question at hand.
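Part of a fitness-for-purpose assessment can be automated. For the variables a given study question depends on, the code measures completeness in the candidate dataset and compares it with a pre-specified threshold. In this sketch, the variable names, the example records, and the 80% threshold are hypothetical. Completeness is only one dimension, alongside accuracy, provenance, and granularity.

```python
def completeness_report(records, required_fields, threshold=0.8):
    """Return {field: (fraction of records with a non-empty value, meets threshold?)}."""
    n = len(records)
    report = {}
    for field in required_fields:
        present = sum(1 for r in records if r.get(field) not in (None, ""))
        fraction = present / n if n else 0.0
        report[field] = (fraction, fraction >= threshold)
    return report

def fit_for_purpose(report):
    """This check passes only if every required field meets the threshold."""
    return all(ok for _, ok in report.values())

records = [
    {"stage": "IV", "egfr_status": "positive", "ecog": 1},
    {"stage": "IV", "egfr_status": None, "ecog": 0},
    {"stage": "III", "egfr_status": "negative", "ecog": None},
    {"stage": "IV", "egfr_status": "", "ecog": 2},
    {"stage": "IV", "egfr_status": "negative", "ecog": 1},
]
report = completeness_report(records, ["stage", "egfr_status", "ecog"])
print(report)                   # egfr_status is only 60% complete
print(fit_for_purpose(report))  # False
```

An ECOG value of 0 counts as present here, because only `None` and empty strings are treated as missing. That distinction matters for clinical scores where zero is a valid, meaningful result.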
What is the future of real-world data for clinical evidence generation in oncology?
The future is dynamic, integrated, and powered by AI. Key trends that are shaping the landscape include:
- Integration with multi-modal data: The true power of RWD will be unlocked by combining clinical RWD with other data types, including genomic, transcriptomic (‘omics’), medical imaging, and digital pathology data. This multi-modal view will provide unprecedented insights into disease biology and treatment response, fueling the next wave of precision oncology.
- AI and machine learning: Advanced analytics will be essential for extracting predictive insights from these vast and complex datasets. AI and ML, like the tools in our Real-time Evidence & Analytics Layer (R.E.A.L.), can identify complex patterns, predict patient outcomes, and automate the extraction of information from unstructured data like clinical notes and pathology reports.
- Federated data networks: To overcome data silos and privacy barriers, the future is federated. Our federated AI platform enables global collaboration by bringing the analysis to the data, rather than moving sensitive data. This ensures security and privacy while allowing researchers to query massive, international datasets.
- The Learning Healthcare System: Ultimately, these trends are converging toward the creation of a learning healthcare system. In this system, RWD is collected at the point of care, analyzed in near-real-time to generate new evidence, and the resulting insights are fed back to clinicians to inform treatment decisions for the next patient. This creates a continuous cycle of improvement, closing the gap between research and practice.
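The idea of bringing the analysis to the data can be illustrated with a federated mean. Each site computes only aggregate statistics locally, suppresses small counts, and sends those aggregates, never patient rows, to a coordinator that pools them. This is a generic sketch, not a description of any particular platform. The site data, the minimum cell size of 5, and the function names are hypothetical. Production systems add authentication, auditing, and often stronger privacy protections.

```python
def local_summary(values, min_cell_size=5):
    """Runs inside a site: return only (count, sum), or None if too few patients to share safely."""
    if len(values) < min_cell_size:
        return None  # small-cell suppression
    return len(values), sum(values)

def pooled_mean(summaries):
    """Runs at the coordinator: combine site aggregates without seeing any patient-level data."""
    shared = [s for s in summaries if s is not None]
    total_n = sum(n for n, _ in shared)
    total_sum = sum(t for _, t in shared)
    return total_sum / total_n if total_n else None

# Hypothetical per-site values (e.g. months on treatment); these never leave each site.
site_data = {
    "site_a": [4.0, 6.5, 8.0, 3.5, 9.0, 7.0],
    "site_b": [5.0, 5.5, 6.0, 12.0, 2.5],
    "site_c": [10.0, 11.0],  # fewer than 5 patients, so suppressed
}
summaries = [local_summary(v) for v in site_data.values()]
print(pooled_mean(summaries))  # 69 / 11, about 6.27
```

A mean decomposes into a count and a sum, so the pooled result equals what a centralized analysis of the non-suppressed sites would give. Statistics that do not decompose this simply, such as medians or regression models, need iterative or specialized federated algorithms.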
The Future is Real: Embracing RWD for a New Era in Oncology
The use of real-world data to generate clinical evidence in oncology is undergoing a revolution. We are moving beyond the limitations of traditional trials toward a more inclusive approach that captures the full complexity of cancer care.
Challenges like data quality and privacy are being actively addressed with innovative technology and methods. The rise of Randomized Real-World Evidence (R²WE) shows we can achieve both scientific rigor and real-world applicability.
With regulatory bodies like the FDA’s Oncology Center of Excellence championing this shift, it is clear that real-world data will play a central role in the future of cancer research.
At Lifebit, we are enabling this change. Our federated AI platform, featuring our Trusted Research Environment (TRE), Trusted Data Lakehouse (TDL), and R.E.A.L. (Real-time Evidence & Analytics Layer), allows researchers to securely analyze global biomedical data without moving it, breaking down data silos while ensuring privacy.
Our federated approach solves a core challenge in RWD research: enabling secure, global collaboration by bringing the analysis to the data.
The patient impact is already clear, with faster approvals and more personalized treatments. The future integration of RWD with genomic data, powered by AI, will create a learning healthcare system that continuously improves our understanding of cancer.
The future of oncology is data-driven and patient-centered, with real-time evidence generation that closes the gap between research and practice.
We are committed to making this future a reality. Our platform empowers researchers to harness the full potential of RWD. The revolution is here, and we are building a world where every cancer patient benefits from the collective wisdom in our health data.