The Pharma Executive’s Guide to Real World Data: Oncology and Beyond

Why Real World Data Pharma Is Reshaping Drug Development
In pharma, real world data (RWD) is clinical information collected from routine healthcare settings, such as electronic health records (EHRs), insurance claims, patient registries, and wearables, that pharmaceutical companies use to generate evidence about how drugs perform outside controlled trials. In the modern landscape of drug development, RWD has evolved from a supplementary tool to a primary driver of innovation, regulatory strategy, and market access.
Key Applications of Real World Data in Pharma:
- Drug Development – Identify patient populations, design trials, and fill evidence gaps where randomized controlled trials (RCTs) aren’t feasible. This includes optimizing trial protocols to ensure they reflect the actual patient journey.
- Regulatory Approvals – Support FDA submissions for new indications, accelerated approvals, and post-market requirements under the 21st Century Cures Act. This legislation has fundamentally changed how the FDA views evidence, allowing for more flexible pathways for life-saving treatments.
- Safety Surveillance – Monitor adverse events and long-term effectiveness across diverse patient populations in real-world settings. Unlike trials, which may only last months, RWD allows for years of longitudinal tracking.
- Precision Medicine – Enable oncology label expansions, rare disease natural history studies, and personalized treatment strategies by identifying specific biomarkers and genetic profiles that respond to therapy.
- Market Access – Demonstrate comparative effectiveness and real-world value for payers and health technology assessments (HTAs), ensuring that drugs are not only approved but also reimbursed and accessible to patients.
Traditional randomized controlled trials were the gold standard for decades. But they’re expensive, slow, and exclude the very patients who need treatments most—pregnant women, children, older adults, and people with multiple conditions. Real world data (RWD) captures what happens when real patients take real medications in real clinics. When analyzed rigorously, it becomes real-world evidence (RWE)—clinical proof that regulators like the FDA now accept for drug approvals. This shift is not just about efficiency; it is about equity and ensuring that medical research reflects the true diversity of the human population.
The numbers tell the story. The real-world evidence analytics market is projected to reach $2.93 billion by 2029. The FDA approved a new use for a transplant drug based entirely on real-world evidence in July 2021. And 79% of biopharma executives report their organizations are extremely or very committed to RWE investments—because it works. Furthermore, the integration of RWD into the drug lifecycle can potentially save hundreds of millions of dollars in development costs by reducing the need for massive, multi-year Phase III trials in certain indications.
But there’s a gap. While 90% of pharma leaders say RWE delivers measurable business outcomes, only 9% consider their programs extremely successful. The challenge isn’t whether to use real world data pharma—it’s how to do it right. Data quality issues, confounding bias, lack of standardization, and cybersecurity risks create barriers that slow progress and waste resources. Many organizations struggle with “data silos,” where valuable information is trapped in incompatible formats across different hospital systems or geographic regions.
I’m Maria Chatzou Dunford, CEO and Co-founder of Lifebit, where we’ve built a federated AI platform that powers real world data pharma research across secure, compliant environments for public sector institutions and pharmaceutical organizations globally. Over the past 15 years working in computational biology, genomics, and health-tech, I’ve seen how the right infrastructure can open up real world data pharma to accelerate drug discovery and improve patient outcomes. Our mission is to bridge the gap between raw data and actionable medical insights, ensuring that every patient’s data point contributes to the next breakthrough.

Why Real World Data Pharma Is Overtaking Outdated RCTs
For years, the pharmaceutical industry relied almost exclusively on Randomized Controlled Trials (RCTs). While RCTs are excellent for establishing efficacy in a vacuum, they often fail to reflect the complexity of everyday medicine. This creates what we call the efficacy-effectiveness gap: a drug might work perfectly in a pristine lab setting but perform differently when a 75-year-old patient with three other conditions takes it at home. This gap is particularly pronounced in chronic diseases like diabetes or cardiovascular conditions, where lifestyle factors and medication adherence play a massive role in outcomes.

The shift toward real world data pharma is driven by the need to understand the distinction between real-world data and real-world evidence. While RWD is the raw material, RWE is the actionable insight. By utilizing RWD, we can capture longitudinal patient journeys, tracking a patient from their first symptom through diagnosis, treatment, and long-term follow-up. This provides a much more granular view of the “patient experience” than the snapshot provided by a traditional trial.
| Feature | Randomized Controlled Trials (RCTs) | Real-World Data (RWD) |
|---|---|---|
| Environment | Highly controlled, protocol-driven | Routine clinical practice |
| Patient Diversity | Strict inclusion/exclusion criteria | Diverse, representative populations |
| Cost | Extremely high per patient ($40k+) | More cost-effective at scale |
| Duration | Fixed timeframe (months to years) | Can track patients for decades |
| External Validity | Limited to trial population | High (reflects general population) |
| Data Source | Case Report Forms (CRFs) | EHRs, Claims, Wearables, Registries |
A landmark moment occurred when the FDA approved a new use of a transplant drug based on real-world evidence. Instead of demanding a new multi-year trial, the FDA looked at data from a registry that tracked patients already using the drug. This showed that RWD isn’t just a “nice-to-have”; it’s a regulatory-grade asset that saves lives by getting treatments to patients faster. This case study has become a blueprint for how pharma companies can leverage existing data to expand labels without the prohibitive costs of new interventional studies.
Where Real World Data Pharma Comes From
Generating robust evidence requires pulling from a variety of sources. We typically categorize these into several primary buckets, each offering a unique piece of the clinical puzzle:
- Electronic Health Records (EHRs): These contain rich, clinical details like physician notes, lab results, and diagnostic images. However, they are often siloed within specific hospital systems and require advanced Natural Language Processing (NLP) to extract meaningful insights from unstructured text.
- Medical Claims: This is administrative data from insurers. It’s excellent for tracking the “where” and “when” of care, providing a high-level view of healthcare utilization, but lacks the “why” found in clinical notes. We often advocate linking claims and EHR data to get a 360-degree view of the patient.
- Product and Disease Registries: Organized systems that collect standardized clinical data for specific populations, such as cancer patients or those with rare genetic disorders. These are often the most “research-ready” sources of RWD.
- Wearable Devices and DHTs: Digital Health Technologies (DHTs) provide continuous monitoring of heart rate, sleep, and activity levels, offering insights into a patient’s daily functional status that are impossible to capture in a clinic visit.
- Pharmacy Data: Detailed records of prescriptions filled, dosage changes, and medication adherence. This is critical for understanding how patients actually take their medicine in the real world.
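The pharmacy-data point above can be made concrete with a small adherence calculation. The sketch below computes the proportion of days covered (PDC), a commonly used adherence measure, from a list of prescription fills. The function name, the fill records, and the observation window are all hypothetical, and real pharmacy feeds would need extra handling (early refills, drug switches, hospital stays) that is left out here.

```python
from datetime import date, timedelta


def proportion_of_days_covered(fills, period_start, period_end):
    """Fraction of days in [period_start, period_end] with drug supply on hand.

    `fills` is a list of (fill_date, days_supply) tuples. Overlapping
    supplies are counted once rather than carried forward.
    """
    total_days = (period_end - period_start).days + 1
    covered = set()
    for fill_date, days_supply in fills:
        for offset in range(days_supply):
            day = fill_date + timedelta(days=offset)
            if period_start <= day <= period_end:
                covered.add(day)
    return len(covered) / total_days


# Hypothetical fill history for one patient: three 30-day fills,
# with a 10-day gap before the second one.
fills = [
    (date(2024, 1, 1), 30),
    (date(2024, 2, 10), 30),
    (date(2024, 3, 11), 30),
]
pdc = proportion_of_days_covered(fills, date(2024, 1, 1), date(2024, 3, 30))
print(f"PDC: {pdc:.2f}")
```

For this illustrative patient, 80 of the 90 days in the window are covered. A PDC of 0.8 or higher is often treated as adherent, though the right threshold depends on the therapy.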
Navigating the FDA’s RWE Framework for Faster Approvals
The regulatory landscape has shifted dramatically in favor of real world data pharma. The 21st Century Cures Act of 2016 was the catalyst, mandating that the FDA create a framework for using RWE in regulatory decision-making. This was further bolstered by PDUFA VII, which introduced pilot programs specifically designed to advance the use of RWE for new drug indications and post-market study requirements. The FDA’s “Advancing Real-World Evidence Program” (ARWE) now provides a structured pathway for sponsors to gain feedback on their RWE proposals.
To succeed here, we recommend early engagement with the FDA through Type C meetings. These meetings allow sponsors to discuss their RWD sources and proposed analytical methods before submitting a formal application. The FDA’s guidance on submitting documents using real-world data and real-world evidence to FDA emphasizes that the data must be “fit-for-purpose”—meaning it must be both relevant to the clinical question and reliable in its quality. Reliability is assessed through data accrual, data curation, and the integrity of the data provenance.
Understanding US regulatory guidance on using real-world data is essential for any executive looking to shave months or years off their development timelines. It is not just about having the data; it is about proving that the data is robust enough to support a regulatory decision.
How Real World Data Pharma Accelerates Oncology and Rare Disease Approvals
Oncology is perhaps the most advanced field for RWD application. Because many cancer treatments target specific genetic mutations, finding enough patients for a traditional RCT can be nearly impossible. In these cases, RWD acts as a bridge to innovation.
- Precision Oncology: We use real-world data for clinical evidence generation in oncology to identify patient subgroups that respond best to targeted therapies. This allows for more personalized treatment plans and better patient outcomes.
- External Control Arms (ECAs): Instead of giving half the patients a placebo—which can be ethically challenging in terminal illnesses—we use RWD to create a “synthetic” control group of patients receiving the current standard of care. This is more ethical, faster, and often more representative of the actual standard of care.
- Label Expansion: RWD allows us to see how a drug approved for one type of cancer performs in another, supporting new indications without starting from scratch. This has been particularly successful in expanding the use of immunotherapies.
- Rare Diseases: In cases where only a few hundred people have a condition globally, RWD registries are often the only way to conduct natural history studies to understand how the disease progresses over time. This data is vital for establishing the baseline against which new treatments are measured.
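One common way to assemble an external control arm like the one described above is to pair each trial patient with a similar real-world patient on an estimated propensity score. The sketch below shows greedy 1:1 nearest-neighbour matching with a caliper. It assumes the propensity scores have already been estimated elsewhere (for example with a logistic regression on baseline covariates); the patient IDs, scores, and the 0.05 caliper are hypothetical values chosen for illustration.

```python
def match_external_controls(trial_scores, rwd_scores, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on propensity score, without replacement.

    Both arguments map patient id -> propensity score (assumed precomputed).
    Returns a dict mapping trial patient id -> matched real-world patient id.
    Trial patients with no candidate inside the caliper stay unmatched.
    """
    available = dict(rwd_scores)
    matches = {}
    # Process trial patients in a fixed order so results are reproducible.
    for trial_id in sorted(trial_scores):
        if not available:
            break
        score = trial_scores[trial_id]
        # Closest remaining real-world patient; ties broken by patient id.
        best_id = min(available, key=lambda rid: (abs(available[rid] - score), rid))
        if abs(available[best_id] - score) <= caliper:
            matches[trial_id] = best_id
            del available[best_id]
    return matches


# Hypothetical precomputed propensity scores.
trial = {"T1": 0.62, "T2": 0.35, "T3": 0.90}
real_world = {"R1": 0.60, "R2": 0.33, "R3": 0.40, "R4": 0.10}
matched = match_external_controls(trial, real_world)
print(matched)
```

Here T3 has no real-world patient within the caliper, so it is left unmatched rather than paired with a poor comparator. Matching of this kind only balances characteristics that were measured and included in the score.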
Overcoming the 3 Biggest Barriers to RWD Success
Despite the potential, real world data pharma isn’t a magic wand. There are significant problems that often stop projects in their tracks, requiring a combination of technical expertise and strategic planning to overcome.
- Data Quality and Confounding Bias: RWD is “dirty.” It was collected for treatment or billing, not research. Missing data points are common, and “confounding” occurs when factors that affect both treatment choice and outcome (like a patient’s lifestyle, socioeconomic status, or diet) distort the results. Addressing the challenges of using real-world data in research requires rigorous protocols, including statistical techniques like propensity score matching, which balance measured characteristics between groups so that comparisons are fairer.
- Lack of Standardization: Every hospital records data differently. One clinic might record a diagnosis as a code, while another uses a free-text note. Without a common language, you can’t compare data from New York with data from London. We solve this by mapping data to the OMOP Common Data Model (CDM). This framework, supported by the OHDSI community, ensures interoperability across global networks, allowing researchers to run the same analysis across multiple disparate datasets simultaneously.
- Cybersecurity and Privacy: Healthcare data is a prime target for cyberattacks. By some estimates, over 40% of healthcare organizations have been hit by the WannaCry cryptoworm, which first spread in 2017. Furthermore, strict privacy laws like GDPR in Europe and HIPAA in the US mean you can’t just move patient data across borders. This “data residency” requirement is one of the biggest hurdles for global pharma companies. To mitigate these risks, we follow the FDA’s best practices for safety studies using electronic healthcare data sets, which include maintaining strict audit trails, data encryption, and robust data provenance records.
Beyond these three, there is also the challenge of data silos. Many healthcare providers are hesitant to share data due to competitive concerns or lack of technical infrastructure. Overcoming this requires a shift toward collaborative research models where data owners retain control while allowing researchers to extract value.
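The standardization barrier above (one clinic records a code, another a free-text note) can be illustrated with a small mapping step. The sketch below converts both styles of record into a simplified row that borrows field names from the OMOP condition table. The lookup tables and the concept ID are placeholders invented for this example, not real OMOP vocabulary entries; a real pipeline would draw mappings from the OMOP vocabulary tables and would use NLP rather than exact string matching for free text.

```python
# Hypothetical lookup tables. The concept ID below is a placeholder,
# not a real OMOP concept.
CODE_TO_CONCEPT = {("ICD10CM", "E11.9"): 9000001}
TEXT_TO_CONCEPT = {"type 2 diabetes": 9000001}


def to_condition_row(record):
    """Map one site-specific diagnosis record to a simplified common row."""
    if "code" in record:
        key = (record["vocabulary"], record["code"])
        concept_id = CODE_TO_CONCEPT.get(key, 0)
        source_value = record["code"]
    else:
        text = record["note"].strip().lower()
        concept_id = TEXT_TO_CONCEPT.get(text, 0)
        source_value = record["note"]
    return {
        "person_id": record["patient_id"],
        "condition_concept_id": concept_id,  # 0 marks an unmapped value
        "condition_source_value": source_value,
    }


# Two hypothetical sites recording the same diagnosis differently.
new_york = {"patient_id": 1, "vocabulary": "ICD10CM", "code": "E11.9"}
london = {"patient_id": 2, "note": "Type 2 diabetes"}
rows = [to_condition_row(new_york), to_condition_row(london)]
print(rows)
```

Once both records carry the same concept ID, a single query can count or compare patients across sites, while the original value is kept alongside it for traceability.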
The Future of Pharma: AI, Federated Access, and Global Data Networks
The next frontier of real world data pharma is the integration of Generative AI and Federated Learning. We are moving away from the old model of “centralizing” data (moving it all to one place), which is a security nightmare, a legal minefield, and often technically impossible due to the sheer volume of data involved.
Instead, we use AI for real-world evidence to analyze data where it lives. This is the core of Lifebit’s philosophy. By using a Trusted Research Environment (TRE), researchers can send their algorithms to the data, rather than moving the data to the algorithms. This respects data sovereignty while allowing for massive, multi-country studies. This “data stays, code moves” paradigm is the only way to scale global health research in a post-GDPR world.
The combination of generative AI and OMOP allows us to automate the “cleaning” of data, turning unstructured doctor’s notes into standardized, research-ready variables in seconds. Generative AI can also help in creating synthetic data cohorts that preserve patient privacy while allowing for initial hypothesis testing.
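The “data stays, code moves” idea can be sketched in a few lines. In the example below, each site runs the same small function locally and returns only aggregate counts, and a coordinator combines those aggregates into a pooled mean without ever seeing patient-level rows. This is a simplified illustration of federated analysis, not a description of any particular platform; the function names, site names, and values are hypothetical, and a production setup would add access control, disclosure checks, and secure transport.

```python
def local_summary(values):
    """Runs inside a site's own environment and returns only aggregates."""
    return {"n": len(values), "total": sum(values)}


def combine(summaries):
    """Runs at the coordinator, which only ever sees the aggregates."""
    n = sum(s["n"] for s in summaries)
    total = sum(s["total"] for s in summaries)
    return total / n if n else None


# Hypothetical patient-level lab values that never leave each site.
site_data = {
    "site_a": [5.0, 7.0, 6.0],
    "site_b": [8.0, 4.0],
}

# Only the summaries cross the site boundary.
summaries = [local_summary(values) for values in site_data.values()]
pooled_mean = combine(summaries)
print(pooled_mean)
```

The pooled mean equals the mean that would be obtained from the combined rows, which is what makes this kind of aggregate sharing useful. More complex models need iterative schemes such as federated learning.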
Scaling Global Research with Real World Data Pharma
As we look toward 2026 and beyond, the focus is on multi-omic integration—combining EHR data with genomic, proteomic, and imaging data. This provides a truly holistic view of human health, moving from “what” happened to a patient to “why” it happened at a molecular level. This is the ultimate goal of precision medicine: the right drug for the right patient at the right time.
Leading organizations are now building their own real-world data strategies to ensure they have secure, real-time access to these global data networks. Whether you are in the USA, UK, Israel, or Canada, the ability to collaborate across five continents without compromising security is the ultimate competitive advantage. The future of pharma lies in these distributed networks, where the collective intelligence of global healthcare systems can be harnessed to solve the world’s most pressing medical challenges.
Furthermore, the rise of Digital Twins—virtual representations of patients built from RWD—will allow researchers to simulate drug effects before a single dose is given to a human. This could revolutionize Phase I and II testing, making drug development safer and more predictable than ever before.
Frequently Asked Questions about Real World Data Pharma
How does RWD differ from data collected in randomized controlled trials (RCTs)?
The main difference is the setting and the population. RCTs happen in a “lab-like” environment with strict rules to isolate a drug’s effect, often excluding patients with comorbidities. RWD happens in the “wild”: it’s the data generated by actual doctors and patients during routine care. While RCTs show that a drug can work (efficacy), RWD shows whether it does work in the general population (effectiveness). For more, see our real-world data definition.
What are the primary regulatory considerations for non-interventional studies?
Non-interventional studies (where the researcher doesn’t assign the treatment) generally don’t require an Investigational New Drug (IND) application. However, if you want the FDA to accept the results for a label expansion or approval, you must ensure:
- Transparency: Provide the protocol and statistical analysis plan (SAP) before you start to avoid “data dredging.”
- Data Access: The FDA must be able to review the patient-level data if requested.
- Audit Trails: You must prove where the data came from and how it was changed during the curation process.
Check the US regulatory guidance on using real-world data for more specifics.
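The audit-trail requirement above can be illustrated with a tamper-evident log, in which each curation step is hashed together with the hash of the previous step. This is a minimal sketch under the assumption that a simple hash chain is an acceptable stand-in for a full provenance system; the function names and log entries are hypothetical and are not drawn from any FDA guidance.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(action, detail, prev_hash):
    payload = json.dumps(
        {"action": action, "detail": detail, "prev": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log, action, detail):
    """Append a curation step whose hash also covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({
        "action": action,
        "detail": detail,
        "prev": prev_hash,
        "hash": _digest(action, detail, prev_hash),
    })


def verify(log):
    """Return True only if no entry was altered, removed, or reordered."""
    prev_hash = GENESIS
    for entry in log:
        expected = _digest(entry["action"], entry["detail"], prev_hash)
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical curation history for one dataset.
audit_log = []
append_entry(audit_log, "ingest", "loaded registry extract")
append_entry(audit_log, "curate", "removed duplicate patient rows")
print(verify(audit_log))
```

Because every hash depends on the one before it, editing an earlier entry after the fact causes verification to fail for the whole chain.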
How is AI being used to generate RWE from unstructured clinical text?
AI, specifically Natural Language Processing (NLP), is used to “read” through millions of pages of doctor’s notes. It can identify things like a patient’s smoking status, their specific symptoms, or why they stopped taking a medication, details that are often missing from structured forms. Using AI to accelerate the discovery of insights and evidence from clinical text in this way is a game-changer for data curation, turning unusable text into regulatory-grade evidence.
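To make the smoking-status example concrete, the sketch below uses a few hand-written patterns to turn free-text notes into a structured variable. It is a deliberately simple rule-based stand-in, not the machine-learning NLP described above; the patterns, labels, and sample notes are hypothetical, and real clinical text would need far broader coverage of negation, abbreviations, and context.

```python
import re

# Order matters: negated and former patterns are checked before the
# generic "smoker" pattern so "non-smoker" is not read as "current".
PATTERNS = [
    ("never", re.compile(r"\b(never smoked|non-?smoker|denies smoking)\b", re.I)),
    ("former", re.compile(r"\b(former smoker|ex-?smoker|quit smoking)\b", re.I)),
    ("current", re.compile(r"\b(current smoker|smokes|smoker)\b", re.I)),
]


def smoking_status(note):
    """Return 'never', 'former', 'current', or 'unknown' for one note."""
    for label, pattern in PATTERNS:
        if pattern.search(note):
            return label
    return "unknown"


# Hypothetical note snippets.
notes = [
    "Pt is a non-smoker, no alcohol.",
    "Former smoker, quit 2015.",
    "Patient smokes 10 cigarettes/day.",
    "No relevant social history recorded.",
]
statuses = [smoking_status(note) for note in notes]
print(statuses)
```

Returning "unknown" instead of guessing keeps missing information visible to downstream analyses.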
Can RWD replace Phase III clinical trials?
While RWD is increasingly used to support or even replace certain aspects of Phase III trials (like control arms), it is rarely a total replacement for the primary efficacy data required for a completely new drug class. However, for label expansions, rare diseases, and safety monitoring, RWD is becoming the primary source of evidence. The goal is a hybrid approach where RCTs and RWD complement each other to provide a fuller picture of drug performance.
Conclusion
The era of relying solely on “perfect” clinical trials is over. Real world data pharma is now a fundamental pillar of modern drug development, offering a path to faster approvals, more accurate safety monitoring, and better patient outcomes. The transition from traditional methods to RWD-driven strategies is not just a trend; it is a necessary evolution in a world where healthcare costs are rising and patient needs are becoming more complex.
At Lifebit, we believe that the biggest barrier to medical progress shouldn’t be data access. Our federated AI platform provides the secure, compliant infrastructure needed to link disparate data sources—from EHRs to multi-omics—across the globe. By keeping data where it resides and bringing the analysis to the source, we enable pharma executives to generate real-world evidence in real-time, without compromising on security or privacy.
The future of medicine is distributed, data-driven, and patient-centric. By embracing real world data pharma, we can move toward a healthcare system that learns from every patient interaction, turning the “messy” reality of clinical practice into the life-saving treatments of tomorrow. Are you ready to lead the charge?