Real World Evidence Generation: A Primer for the FDA Era

Real World Evidence Generation: How to Slash Drug Approval Timelines in 2026
Real world evidence generation is the process of transforming patient data collected outside traditional clinical trials (such as electronic health records, medical claims, and wearable device data) into actionable clinical insights that inform drug approvals, treatment decisions, and healthcare policy.
What you need to know about real world evidence generation:
- RWD sources include electronic health records (EHRs), medical claims, patient registries, wearable devices, and patient-reported data
- RWE generation requires rigorous data quality assessment across conformance, completeness, and plausibility dimensions
- Regulatory acceptance has expanded significantly since the 21st Century Cures Act, with FDA creating formal frameworks for RWE in drug approvals
- Key advantages over randomized controlled trials include broader patient representation, long-term outcome tracking, and faster, more cost-effective evidence generation
- Primary challenges involve managing bias, ensuring data quality across siloed sources, and meeting fitness-for-purpose standards for specific research questions
The healthcare industry is experiencing a fundamental shift in how we generate clinical evidence. Historically, the FDA relied almost exclusively on randomized controlled trials (RCTs) for drug approval decisions. But RCTs have limitations—they’re expensive, time-consuming, and often exclude the very patients who will ultimately use approved treatments. During the early days of the COVID-19 pandemic, real world evidence generation proved critical in accelerating vaccine rollout by providing rapid insights on safety and effectiveness across diverse populations that clinical trials couldn’t reach fast enough.
Today, regulators worldwide recognize that data from routine clinical practice can complement—and sometimes replace—traditional trial evidence. The 21st Century Cures Act required the FDA to expand the role of real-world evidence, leading to the 2018 Framework that now guides how RWE can support new drug indications and post-approval requirements. Health Canada, NICE, and other global regulators have followed suit with their own guidance documents.
Yet turning raw healthcare data into regulatory-grade evidence remains complex. Organizations struggle with fragmented data sources, inconsistent quality standards, and the technical barriers of bias mitigation and validation. Nine out of ten pharmaceutical executives report that their RWE investments deliver measurable business outcomes, but only 9% consider their programs extremely successful, which reveals a significant execution gap.
I’m Maria Chatzou Dunford, CEO and Co-founder of Lifebit. We’ve spent over a decade building federated AI platforms that enable secure, compliant real world evidence generation across siloed health datasets for pharmaceutical companies and public sector institutions worldwide, from our offices in London and New York to our teams in Singapore and Israel. This guide breaks down the fundamentals of RWE generation in the FDA era, from data quality principles to regulatory requirements, so you can navigate this complex landscape with confidence.

RWD vs. RWE: Turn Messy Data Into Regulatory-Grade Evidence
To master real world evidence generation, we must first distinguish between the raw ingredients and the finished product. This distinction is not merely semantic; it is the foundation of regulatory acceptance and scientific validity.
Real-World Data (RWD) refers to the data relating to patient health status and the delivery of healthcare routinely collected from a variety of sources. It is the “raw” information—every prescription filled, every heart rate recorded by a smartwatch, and every diagnosis code entered into a billing system. RWD is often messy, unstructured, and collected for purposes other than research (such as billing or clinical documentation).
Real-World Evidence (RWE) is the clinical evidence regarding the usage and potential benefits or risks of a medical product derived from the analysis of RWD. In other words, RWE is what happens when we apply rigorous scientific methods, statistical controls, and clinical context to RWD to answer a specific research question.
The distinction is vital: you can have mountains of data, but without proper analysis, validation, and context, it isn’t evidence. RWE can be generated through various study designs, including retrospective observational studies, prospective registries, and even pragmatic clinical trials. According to official FDA guidance, the goal is to use this evidence to support regulatory decision-making, such as approving new indications for a drug already on the market.
Primary Sources of Real-World Data
Where does this data come from? We generally categorize Real-World Data into several primary buckets, each with its own strengths and limitations:
- Electronic Health Records (EHRs): These contain a wealth of clinical information, including lab results, physician notes, and imaging. EHRs provide deep clinical granularity but are often fragmented across different health systems.
- Medical Claims and Billing Data: This is a massive source of information on healthcare utilization, costs, and longitudinal patient journeys. Because claims follow the patient across different providers, they are excellent for tracking long-term outcomes and total cost of care, though they lack the clinical “why” found in EHRs.
- Product and Disease Registries: Organized systems that collect standardized clinical data for specific populations, such as patients with rare diseases or those treated with a specific medical device. These are often higher quality than general EHR data because they are designed for research from the outset.
- Patient-Generated Data and Wearables: This includes data from in-home settings and wearable devices (like Apple Watches or Fitbits) that provide continuous monitoring of health metrics like heart rate, sleep patterns, and physical activity. This “off-site” data is becoming increasingly important for understanding the daily impact of chronic diseases.
- Social Determinants of Health (SDoH): Emerging RWD sources now include socioeconomic data, such as zip code-level information on food security, transportation access, and environmental exposures, which are critical for understanding health inequities in real world evidence generation.
Understanding these nuances in how real-world data is defined and sourced is the first step toward building a robust evidence-generation strategy that meets the high bar of regulatory scrutiny.
The Mechanics of Real World Evidence Generation: Proving Fitness-for-Purpose
Generating evidence that a regulator like the FDA or Health Canada will actually accept requires more than just running a query on a database. We have to prove the data is “fit for purpose.” This involves a two-pronged evaluation: Data Quality and Relevance.
Ensuring Data Quality in Real World Evidence Generation
Scientific research on data quality frameworks emphasizes three critical pillars that must be documented in any regulatory submission:
- Conformance: Does the data follow established standards and formats? This is where the OMOP Common Data Model (CDM) becomes essential. By mapping disparate data sources to a single standard, we ensure that a “myocardial infarction” in a London hospital is coded the same way as one in a New York clinic.
- Completeness: Are the necessary data elements present? For example, a dataset used for a cancer study is useless if it’s missing the date of diagnosis, the specific stage of the tumor, or the line of therapy. We must also account for “missingness”—is data missing at random, or is there a systematic reason why certain patients lack records?
- Plausibility: Does the data make sense? If a patient’s record shows a prostate cancer diagnosis for a biological female, or a blood pressure reading of 300/200, the data lacks plausibility. Automated data characterization tools are now used to flag these anomalies before analysis begins.
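
As a rough illustration of the three pillars above, the sketch below runs one conformance, one completeness, and two plausibility checks over a handful of made-up records. The field names, the allowed value set, and the blood pressure bounds are assumptions chosen for the example; they are not taken from OMOP or from any regulatory guidance.

```python
from datetime import date

# Hypothetical records; field names are illustrative, not an OMOP schema.
RECORDS = [
    {"patient_id": "p1", "sex": "F", "diagnosis": "breast_cancer",
     "diagnosis_date": date(2021, 3, 4), "systolic_bp": 128},
    {"patient_id": "p2", "sex": "M", "diagnosis": "prostate_cancer",
     "diagnosis_date": None, "systolic_bp": 135},
    {"patient_id": "p3", "sex": "F", "diagnosis": "prostate_cancer",
     "diagnosis_date": date(2020, 7, 19), "systolic_bp": 300},
    {"patient_id": "p4", "sex": "female", "diagnosis": "breast_cancer",
     "diagnosis_date": date(2022, 1, 9), "systolic_bp": 118},
]

ALLOWED_SEX = {"F", "M"}            # assumed value set for the example
REQUIRED_FIELDS = ("patient_id", "sex", "diagnosis_date")
SYSTOLIC_RANGE = (50, 260)          # assumed plausibility bounds, mmHg


def check_record(rec):
    """Return a list of data quality issues found in one record."""
    issues = []

    # Conformance: does the value follow the agreed coding?
    if rec.get("sex") not in ALLOWED_SEX:
        issues.append("conformance: sex not in allowed value set")

    # Completeness: are the required elements present?
    for field in REQUIRED_FIELDS:
        if rec.get(field) is None:
            issues.append(f"completeness: missing {field}")

    # Plausibility: does the value make clinical sense?
    bp = rec.get("systolic_bp")
    if bp is not None and not SYSTOLIC_RANGE[0] <= bp <= SYSTOLIC_RANGE[1]:
        issues.append("plausibility: systolic_bp outside expected range")
    if rec.get("diagnosis") == "prostate_cancer" and rec.get("sex") == "F":
        issues.append("plausibility: diagnosis inconsistent with recorded sex")

    return issues


def quality_report(records):
    """Map each patient_id to the issues flagged for that record."""
    return {rec["patient_id"]: check_record(rec) for rec in records}


report = quality_report(RECORDS)
for patient_id, issues in report.items():
    print(patient_id, issues or "ok")
```

In practice these rules would be far more numerous and would be documented alongside the study protocol, but the shape is the same: each check is explicit, named after the pillar it belongs to, and produces a record-level flag that can be reviewed before analysis begins.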

To bridge the gaps often found in single-source datasets, we frequently link claims and EHR data. Linking claims (which show what happened) with EHRs (which show why it happened) creates a more complete picture of the patient journey. For instance, a claim might show a patient stopped taking a drug, but only the EHR note reveals it was due to a specific side effect rather than treatment failure.
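
The discontinuation example can be sketched in a few lines. This assumes both sources already share a pseudonymized `patient_key`, which is a simplification: real linkage usually needs tokenization or probabilistic matching. All records and field names here are hypothetical.

```python
# Hypothetical claims: "refilled" False means the patient stopped filling the drug.
claims = [
    {"patient_key": "a1", "drug": "drug_x", "last_fill": "2023-02-01", "refilled": False},
    {"patient_key": "a2", "drug": "drug_x", "last_fill": "2023-02-03", "refilled": True},
    {"patient_key": "a3", "drug": "drug_x", "last_fill": "2023-04-10", "refilled": False},
]

# Hypothetical structured extracts from EHR notes.
ehr_notes = [
    {"patient_key": "a1", "note_date": "2023-02-20", "discontinuation_reason": "adverse_event"},
    {"patient_key": "a3", "note_date": "2023-01-05", "discontinuation_reason": "lack_of_efficacy"},
]


def link_discontinuations(claims, ehr_notes):
    """Attach an EHR-documented reason to each discontinuation seen in claims."""
    notes_by_patient = {}
    for note in ehr_notes:
        notes_by_patient.setdefault(note["patient_key"], []).append(note)

    linked = []
    for claim in claims:
        if claim["refilled"]:
            continue  # still on therapy, nothing to explain
        # Only notes written on or after the last fill can explain the stop.
        # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
        reasons = [
            note["discontinuation_reason"]
            for note in notes_by_patient.get(claim["patient_key"], [])
            if note["note_date"] >= claim["last_fill"]
        ]
        linked.append({
            "patient_key": claim["patient_key"],
            "drug": claim["drug"],
            "reason": reasons[0] if reasons else "not documented",
        })
    return linked


linked = link_discontinuations(claims, ehr_notes)
for row in linked:
    print(row)
```

Patient `a3` shows why the date filter matters: the only note on file predates the last fill, so it cannot explain the later discontinuation and the reason stays undocumented rather than being guessed.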
The Role of Data Provenance and Traceability
Regulators now require a clear “audit trail” for RWE. This means documenting every step of the data’s journey—from the initial extraction from the hospital system to the final statistical analysis. This is known as data provenance. If a regulator cannot see how a raw data point was transformed into a final result, the evidence may be deemed unreliable. Modern real world evidence generation platforms use containerized workflows to ensure that every analysis is reproducible and transparent.
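
One lightweight way to picture an audit trail is to fingerprint the data before and after every transformation. The sketch below is only an illustration of that idea using Python's standard `hashlib` and `json` modules; the step names and record layout are invented, and it is not a description of how any particular platform implements provenance.

```python
import hashlib
import json


def fingerprint(rows):
    """Stable SHA-256 hash of a list of JSON-serializable records."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def run_step(name, func, rows, trail):
    """Apply one transformation and append an audit entry describing it."""
    output = func(rows)
    trail.append({
        "step": name,
        "input_sha256": fingerprint(rows),
        "output_sha256": fingerprint(output),
        "rows_in": len(rows),
        "rows_out": len(output),
    })
    return output


# Hypothetical cleaning steps.
def drop_missing_age(rows):
    return [r for r in rows if r["age"] is not None]


def keep_adults(rows):
    return [r for r in rows if r["age"] >= 18]


raw = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 3, "age": 12},
]

trail = []
step1 = run_step("drop_missing_age", drop_missing_age, raw, trail)
analysis_set = run_step("keep_adults", keep_adults, step1, trail)

for entry in trail:
    print(entry["step"], entry["rows_in"], "->", entry["rows_out"])
```

Because each entry's output hash should equal the next entry's input hash, a reviewer can confirm that no undocumented transformation happened between steps.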
Key Elements of a Well-Designed Study Protocol
A “regulatory-grade” RWE study must be as carefully planned as a traditional trial. According to ENCePP (European Network of Centres for Pharmacoepidemiology and Pharmacovigilance) protocol checklists, a well-designed protocol should include:
- A Clearly Defined Research Question: Using the PICO framework (Population, Intervention, Comparator, Outcome).
- Appropriate Estimands: Precise definitions of the treatment effect being measured, accounting for intercurrent events like treatment discontinuation.
- Bias Mitigation Strategies: Detailed plans for how to account for the fact that patients weren’t randomly assigned to their treatments.
- Validation of Variables: Proving that a “diagnosis code” in the data actually corresponds to a real-world clinical event through medical record review or validation studies.
Why RCTs Fail: The Strategic Power of Real World Evidence
For decades, Randomized Controlled Trials (RCTs) were the undisputed “gold standard.” However, they have a major flaw: they are conducted in a “clinical vacuum.” While RCTs are excellent for establishing efficacy (can the drug work?), they often struggle to demonstrate effectiveness (does the drug work in the real world?).
RCTs often use strict inclusion and exclusion criteria, meaning they exclude elderly patients, pregnant women, or people with multiple comorbidities—the very people who often make up the majority of the real-world patient population. This creates a gap in generalizability.
| Feature | Randomized Controlled Trials (RCTs) | Real-World Evidence (RWE) |
|---|---|---|
| Setting | Controlled, experimental environment | Routine clinical practice |
| Population | Highly selected (homogeneous) | Broad and diverse (heterogeneous) |
| Cost | Extremely high ($100M+) | Lower (uses existing data) |
| Duration | Long (years of recruitment/follow-up) | Faster (can use retrospective data) |
| Safety | Good for common side effects | Excellent for rare adverse events |
| Flexibility | Rigid protocols | Adaptable to changing standards of care |
Overcoming the Limitations of Traditional Clinical Trials
With real world evidence generation, we can use real-world data for clinical evidence generation in oncology and other complex fields where long-term outcomes like “overall survival” might take a decade to manifest in a trial. RWE allows us to track these outcomes across millions of patients over many years, providing a level of statistical power that no single trial could achieve.
The Rise of External Control Arms (ECAs)
One of the most transformative applications of RWE is the creation of External Control Arms. In rare diseases or late-stage oncology, it is often unethical to give a patient a placebo. Instead of a traditional control group, researchers can use RWD to create a “synthetic” or “external” control arm of patients who received the current standard of care. This allows every patient in the actual trial to receive the experimental drug, accelerating recruitment and improving the ethical profile of the study. The FDA has already approved several drugs, such as Blincyto for leukemia, using evidence from external control arms.
Target Trial Emulation
To make RWE as robust as possible, researchers use a framework called “Target Trial Emulation.” This involves designing the observational study to mimic a hypothetical RCT as closely as possible. By defining a “time zero” at which eligibility is assessed, treatment strategies are assigned, and follow-up begins, and by applying strict eligibility criteria, researchers can minimize the “immortal time bias” that often plagues poorly designed observational studies. This methodology is becoming the standard for high-impact real world evidence generation.
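
To see why time zero matters, consider a toy cohort in which some patients start treatment weeks after becoming eligible. The sketch below compares a naive analysis, which labels those patients as treated from day 0, with one that assigns the waiting period to the untreated group. The patients and day counts are invented, and this is only a person-time illustration, not a full target trial emulation.

```python
# Days are counted from eligibility (day 0). treat_day None = never treated.
# All values are hypothetical.
patients = [
    {"id": "p1", "treat_day": 90, "end_day": 400},
    {"id": "p2", "treat_day": None, "end_day": 60},
    {"id": "p3", "treat_day": 30, "end_day": 200},
    {"id": "p4", "treat_day": None, "end_day": 365},
]


def naive_person_time(patients):
    """Misclassified: ever-treated patients count as treated from day 0."""
    treated = sum(p["end_day"] for p in patients if p["treat_day"] is not None)
    untreated = sum(p["end_day"] for p in patients if p["treat_day"] is None)
    return treated, untreated


def aligned_person_time(patients):
    """Time before treatment start is attributed to the untreated group."""
    treated = 0
    untreated = 0
    for p in patients:
        if p["treat_day"] is None:
            untreated += p["end_day"]
        else:
            untreated += p["treat_day"]                 # waiting period
            treated += p["end_day"] - p["treat_day"]    # time actually on treatment
    return treated, untreated


naive_treated, naive_untreated = naive_person_time(patients)
aligned_treated, aligned_untreated = aligned_person_time(patients)

# Person-days wrongly credited to the treated group in the naive analysis.
immortal_days = naive_treated - aligned_treated
print("naive:", naive_treated, naive_untreated)
print("aligned:", aligned_treated, aligned_untreated)
print("immortal person-days:", immortal_days)
```

The waiting period is “immortal” because a patient had to survive it in order to be classified as treated at all. Crediting it to the treated group inflates that group's follow-up time and makes the treatment look safer than the data support.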
Beat the Regulatory Maze: FDA, EMA, and Health Canada RWE Standards
The regulatory landscape has changed dramatically. The FDA’s 2018 Framework for RWE marked a turning point, signaling that the agency is open to using RWE to support new drug indications or satisfy post-approval study requirements. This was further bolstered by the 2023 guidance on using RWD to support regulatory decision-making for biologics and drugs.
In Canada, Health Canada’s R2D2 (Regulatory Review of Drugs and Devices) initiative is pushing for similar integration, focusing on how RWE can fill data gaps for drugs with conditional approvals. Meanwhile, the UK’s NICE RWE framework provides clear principles for using real-world data to inform health technology assessments (HTA) and pricing decisions.
The European Perspective: DARWIN EU
In Europe, the European Medicines Agency (EMA) has launched the DARWIN EU (Data Analysis and Real-World Interrogation Network). This is a federated network of data partners across Europe that allows the EMA to quickly generate RWE to support scientific committees. This initiative highlights the global shift toward a “data-first” regulatory approach, where RWE is used throughout the entire product lifecycle, from pre-authorization to post-market surveillance.
Regulatory Standards for Real World Evidence Generation
To be successful in a regulatory submission, we must adhere to US regulatory guidance on using real-world data. This includes:
- Transparency and Pre-specification: Protocols and statistical analysis plans (SAPs) should be finalized and often publicly registered (e.g., on ClinicalTrials.gov or the EU PAS Register) before the analysis begins. This prevents “data dredging” or “p-hacking,” where researchers keep changing the analysis until they find a significant result.
- Data Standards and Interoperability: Submissions should use standardized formats like CDISC to facilitate review. Regulators are increasingly comfortable with the OMOP CDM, but the mapping process must be thoroughly documented.
- Audit Trails and Data Access: Regulators may want to see the “provenance” of the data—how it was extracted, cleaned, and transformed. In some cases, they may even request access to the underlying patient-level data to verify the findings.
FDA registry assessments emphasize that while registries are powerful, their data must be validated against medical records to ensure accuracy before they can support a marketing application. This “source data verification” is a critical step in the real world evidence generation pipeline.
Stop Data Silos: Solving the Biggest Challenges in RWE
Despite the promise, real world evidence generation is not without significant hurdles. The challenges of using real-world data in research are primarily methodological and technical, requiring sophisticated solutions to overcome.
- Selection Bias and Confounding by Indication: This is the “Achilles’ heel” of RWE. In the real world, doctors don’t flip a coin to decide which drug to give a patient. They choose based on the patient’s severity, age, and comorbidities. This means patients who receive a certain drug might be inherently sicker than those who don’t, making the drug look less effective.
- Information Bias and Measurement Error: If one hospital records data differently than another, or if a specific side effect is under-reported in clinical notes, it can create “noise” in the results. Unlike RCTs, where every event is captured on a Case Report Form, RWD only captures what happens during a clinical encounter.
- Data Silos and Fragmentation: Patient data is often fragmented across different systems—pharmacy, primary care, specialist clinics, and hospitals—that don’t talk to each other. Creating a longitudinal view of a patient requires complex data linkage strategies.
Strategies for Bias Mitigation and Uncertainty
To combat these issues, modern real world evidence generation employs advanced statistical techniques that go far beyond simple regression models:
- Propensity Score Matching (PSM) and Weighting: This helps us create a “balanced” comparison group by matching patients who have similar characteristics but received different treatments. It essentially tries to “re-randomize” the observational data.
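- Instrumental Variable Analysis: A technique borrowed from economics that uses an “instrument,” a variable that influences which treatment a patient receives but affects the outcome only through that treatment (for example, a physician’s prescribing preference), to account for unmeasured confounding.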
- Sensitivity Analysis (E-values): We test how much our results change if we alter our assumptions. For example, we can calculate an “E-value” to determine how strong an unmeasured confounder would have to be to negate our observed treatment effect.
- Quantitative Bias Analysis (QBA): This allows us to estimate the potential impact of selection bias or misclassification on our findings, providing a range of possible truths rather than a single point estimate.
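
Two of the techniques above can be combined in a small worked example: inverse probability of treatment weighting (the “weighting” counterpart to matching) followed by an E-value for the adjusted estimate. The cohort counts are invented so that sicker patients are more likely to be treated, and the propensity score is estimated simply as the treated fraction within each severity stratum; a real study would typically fit a model on many covariates.

```python
import math
from collections import defaultdict


def build_cohort():
    """Expand hypothetical (severity, treated, n_patients, n_events) cells into rows."""
    cells = [
        ("mild", 1, 20, 1),
        ("mild", 0, 80, 8),
        ("severe", 1, 80, 16),
        ("severe", 0, 20, 8),
    ]
    rows = []
    for severity, treated, n_patients, n_events in cells:
        for i in range(n_patients):
            rows.append({"severity": severity, "treated": treated,
                         "event": 1 if i < n_events else 0})
    return rows


def crude_risk_ratio(rows):
    """Unadjusted risk ratio, treated vs untreated."""
    def risk(arm):
        group = [r for r in rows if r["treated"] == arm]
        return sum(r["event"] for r in group) / len(group)
    return risk(1) / risk(0)


def propensity_by_stratum(rows):
    """P(treated | severity), estimated as the treated fraction per stratum."""
    n = defaultdict(int)
    n_treated = defaultdict(int)
    for r in rows:
        n[r["severity"]] += 1
        n_treated[r["severity"]] += r["treated"]
    return {s: n_treated[s] / n[s] for s in n}


def iptw_risk_ratio(rows):
    """Risk ratio after inverse probability of treatment weighting."""
    ps = propensity_by_stratum(rows)  # assumes 0 < propensity < 1 in every stratum
    weight_sum = {0: 0.0, 1: 0.0}
    weighted_events = {0: 0.0, 1: 0.0}
    for r in rows:
        p = ps[r["severity"]]
        w = 1 / p if r["treated"] else 1 / (1 - p)
        weight_sum[r["treated"]] += w
        weighted_events[r["treated"]] += w * r["event"]
    risk_treated = weighted_events[1] / weight_sum[1]
    risk_untreated = weighted_events[0] / weight_sum[0]
    return risk_treated / risk_untreated


def e_value(rr):
    """E-value for a risk ratio point estimate: RR + sqrt(RR * (RR - 1)).

    Protective estimates (RR < 1) are inverted first.
    """
    rr = 1 / rr if rr < 1 else rr
    return rr + math.sqrt(rr * (rr - 1))


cohort = build_cohort()
crude = crude_risk_ratio(cohort)
adjusted = iptw_risk_ratio(cohort)
print(f"crude RR: {crude:.4f}")
print(f"IPTW RR: {adjusted:.4f}")
print(f"E-value: {e_value(adjusted):.3f}")
```

In this constructed cohort the crude comparison suggests the drug does nothing or slightly harms, because treated patients are mostly severe cases. After weighting, the halving of risk built into each stratum reappears, and the E-value then states how strongly an unmeasured confounder would need to be associated with both treatment and outcome to explain that adjusted estimate away.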
Addressing Missing Data
Missing data is inevitable in RWD. Rather than simply excluding patients with missing records (which introduces more bias), researchers use techniques like Multiple Imputation by Chained Equations (MICE). This uses the patterns in the existing data to “fill in” the missing values with a range of plausible estimates, ensuring the final analysis reflects the uncertainty inherent in the data.
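
The step that carries imputation uncertainty into the final result is the pooling of estimates across imputed datasets, commonly done with Rubin's rules. The sketch below shows only that pooling step; the imputation itself is omitted, and the five effect estimates and their variances are made-up numbers standing in for the output of five analyses.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates with Rubin's rules.

    estimates: point estimate from each of the m imputed datasets
    variances: squared standard error from each of the m analyses
    """
    m = len(estimates)
    pooled_estimate = mean(estimates)
    within = mean(variances)        # average sampling variance
    between = variance(estimates)   # variance across imputations (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled_estimate, total


# Hypothetical results from m = 5 imputed datasets.
estimates = [0.50, 0.54, 0.46, 0.52, 0.48]
variances = [0.010, 0.010, 0.010, 0.010, 0.010]

pooled_estimate, total_variance = pool_rubin(estimates, variances)
pooled_se = math.sqrt(total_variance)
print(f"pooled estimate: {pooled_estimate:.3f}")
print(f"pooled standard error: {pooled_se:.4f}")
```

The pooled variance is larger than the within-imputation variance alone. That extra term is how disagreement between imputed datasets shows up as wider uncertainty in the reported result.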
Federated AI: The Future of Real-Time Evidence Generation
We are entering an era of Real-Time Evidence Generation. The traditional model of RWE—where data is extracted, cleaned, and analyzed in a static batch—is being replaced by dynamic, continuous insights. The key to this shift is Federated AI.
Instead of waiting months for a retrospective study or struggling with the legal hurdles of moving sensitive patient data across borders, we can now use federated AI to query data across multiple global sites simultaneously. In this model, the data never leaves its original, secure location (e.g., a hospital’s server). Instead, the algorithm travels to the data, performs the analysis locally, and only sends the aggregated, anonymous results back to the researcher.
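
The pattern can be simulated in a single process to show which values cross the site boundary. In the sketch below each “site” computes a local summary and returns only counts and sums, and a coordinator pools them. The site names, measurements, and minimum cell size are assumptions for illustration; a real deployment would add authentication, governance, and network transport, none of which is shown here.

```python
# Hypothetical patient-level measurements held at each site.
# In a real federation these lists would never leave the site.
SITE_DATA = {
    "site_a": [5.0, 6.0, 7.0, 6.0],
    "site_b": [6.0, 5.0, 7.0],
    "site_c": [9.0, 4.0],
}

MIN_CELL_SIZE = 3  # assumed disclosure-control threshold


def local_summary(values, min_cell_size=MIN_CELL_SIZE):
    """Runs at the site: returns aggregates only, or None if the cell is too small."""
    if len(values) < min_cell_size:
        return None  # suppress small cells that could identify individuals
    return {"n": len(values), "sum": sum(values)}


def federated_mean(site_data):
    """Runs at the coordinator: pools the aggregates returned by each site."""
    summaries = {site: local_summary(values) for site, values in site_data.items()}
    released = {site: s for site, s in summaries.items() if s is not None}
    total_n = sum(s["n"] for s in released.values())
    total_sum = sum(s["sum"] for s in released.values())
    return total_sum / total_n, total_n, sorted(released)


pooled_mean, pooled_n, contributing_sites = federated_mean(SITE_DATA)
print(f"pooled mean: {pooled_mean:.2f} from {pooled_n} patients")
print("contributing sites:", contributing_sites)
```

The coordinator never sees an individual measurement, only a count and a sum per site, and a site with too few patients contributes nothing at all.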
This “federated” approach addresses one of the biggest problems in real world evidence generation: privacy and data sovereignty. By bringing the analysis to the data, we can respect strict data protection and residency rules (such as GDPR in Europe, HIPAA in the US, or the PDPA in Singapore) while still generating insights from millions of patients globally.
Revolutionizing Research with Generative AI and OMOP
The marriage of AI for real-world evidence and common data models like OMOP is changing the field. AI and real-world data are reshaping biopharma evidence generation through several key innovations:
- Automated Data Curation and NLP: Using Natural Language Processing (NLP) and Large Language Models (LLMs) to extract information from unstructured physician notes, pathology reports, and discharge summaries. This turns “dark data” into structured RWD that can be analyzed at scale.
- Generative AI for Synthetic Data: Generative AI can create “synthetic” patient datasets that mimic the statistical properties of real patients without containing any identifiable information. This allows researchers to develop and test their models in a safe environment before running them on real patient data.
- OMOP Mapping at Scale: Speeding up the mapping of disparate datasets to a common language. As explored in our guide on Generative AI and OMOP: Revolutionizing Real-World Evidence, AI can now automate the complex task of mapping local hospital codes to international standards, reducing the time for data preparation from months to days.
Trusted Research Environments (TREs)
The future of real world evidence generation also relies on Trusted Research Environments. These are highly secure, cloud-based platforms where researchers can access and analyze sensitive data under strict governance. TREs ensure that data is used only for approved purposes, providing a “safe haven” for collaboration between pharmaceutical companies, academic researchers, and healthcare providers.
Real World Evidence Generation: Your Top Questions Answered
How does RWE differ from traditional clinical trial data?
Traditional clinical trial data comes from a highly controlled, artificial environment with strict patient selection. RWE comes from routine clinical practice, reflecting a much broader and more diverse “real-world” population.
What are the most reliable sources of real-world data?
The “best” source depends on your question. EHRs are great for clinical depth; claims data is excellent for longitudinal tracking and cost; registries are often the gold standard for specific diseases or medical devices.
Can RWE be used to support new drug indications?
Yes. Under the FDA’s 2018 Framework and the 21st Century Cures Act, RWE is increasingly used to support label expansions, especially when a traditional RCT is not feasible or ethical.
Conclusion: Secure Your Competitive Edge with Real-Time Evidence
The “FDA Era” of real world evidence generation is no longer a future possibility—it is our current reality. The ability to transform messy, fragmented data into clear, regulatory-grade evidence is now a core competency for any biopharma company or healthcare agency.
At Lifebit, we provide the infrastructure to make this possible. Our federated AI platform, featuring the Trusted Research Environment (TRE) and Trusted Data Lakehouse (TDL), allows you to optimize real-world evidence generation by connecting to a global network of biomedical data securely and compliantly across the 5 continents we serve.
Whether you are looking to link claims and EHR data or perform complex survival analyses on multi-omic datasets, our platform delivers the real-time insights you need to bring life-saving treatments to patients faster.