Real world data clinical trials: Essential 2025
Bridging the Gap Between Clinical Trials and Real-World Patient Care
Real world data clinical trials use information from routine healthcare to understand how treatments perform in everyday settings. Here’s a quick breakdown:
- Real-World Data (RWD): Health data gathered from sources like electronic health records (EHRs), insurance claims, disease registries, and mobile health devices.
- Real-World Evidence (RWE): The clinical insights derived from analyzing RWD, revealing a medical product’s usage, benefits, or risks in a real-world scenario.
- Key Difference: Unlike the selected patient groups in traditional randomized controlled trials (RCTs), RWD captures a broader, more diverse population, showing how treatments work for people with various health conditions and medications.
- Benefits in Clinical Trials: RWD/RWE improves trial efficiency by refining design, speeding up patient recruitment, informing subgroup analysis, and providing long-term safety and effectiveness data post-approval.
For decades, randomized controlled trials (RCTs) have been the gold standard for testing new medicines, offering clear answers under ideal conditions. However, these controlled settings often don’t reflect the diverse patient populations and treatment patterns of routine clinical practice.
Real-world data (RWD) and real-world evidence (RWE) bridge this gap. By analyzing health information generated daily, we gain insights into how treatments perform in patients’ actual lives. This approach is changing drug development and regulatory decisions, with bodies like the FDA increasingly using RWD for post-market safety monitoring and to support new drug indications.
This guide explores the impact of RWD and RWE, showing how they complement traditional research and accelerate medical innovation.
I’m Dr. Maria Chatzou Dunford, and my expertise lies in using advanced computing and AI to transform healthcare, particularly in real world data clinical trials. My work at Lifebit focuses on enabling data-driven drug discovery and precision medicine through secure, federated data analysis.
Understanding Real-World Data (RWD) and Real-World Evidence (RWE)
In real world data clinical trials, instead of creating artificial scenarios, we observe what’s already happening in hospitals, clinics, and patients’ homes.
Real-World Data (RWD) is health information collected during routine medical care. It’s the raw material—millions of data points from doctor visits, prescriptions, and fitness trackers that capture how healthcare works.
Real-World Evidence (RWE) is the insight gained from analyzing RWD. It’s the complete picture that shows how treatments perform when patients use them in their daily lives.
This data-to-evidence pipeline is a major shift in medical research, allowing us to understand how treatments work for real people with complex health profiles.
RWD vs. Traditional Clinical Trial Data: A Core Comparison
Traditional randomized controlled trials (RCTs) are conducted under carefully controlled conditions with specific patient types to prove a treatment works in an ideal setting. In contrast, real world data clinical trials accept the complexity of everyday healthcare, including diverse patient populations with multiple health conditions and medications.
Attribute | Real-World Data (RWD) | Randomized Controlled Trials (RCTs) |
---|---|---|
Patient Population | Diverse, inclusive, representative of general patient populations (varying ages, comorbidities, additional medications) | Highly selective, narrow, with strict inclusion/exclusion criteria |
Setting | Routine clinical practice, uncontrolled environments (“real world”) | Highly controlled, experimental settings |
Data Collection | Observational; routinely collected from various sources (EHRs, claims, registries, wearables) | Experimental; prospectively collected under strict protocols |
Primary Strengths | High generalizability, understanding of long-term outcomes/risks, insights into specific patient subgroups, cost-effective | High internal validity, strong causal inference (proof of efficacy), minimizes bias |
Data Timeliness | Can be retrospective (historical data) or prospective (ongoing collection), offering immediate access to large datasets. | Always prospective; data collection begins with the trial, which can take years. |
Cost | Generally lower cost as data is a byproduct of care delivery. | Extremely high cost due to protocol administration, patient monitoring, and site management. |
Endpoint Flexibility | Can explore a wide range of outcomes, including those not originally planned (e.g., long-term side effects, quality of life). | Endpoints are pre-specified; changing them requires a formal protocol amendment. |
These approaches are complementary. RCTs prove a treatment can work, while RWE shows how it does work in the real world.
Where Does Real-World Data Come From?
RWD is generated continuously from numerous sources as part of normal medical care. Each source provides a unique piece of the patient puzzle.
- Electronic Health Records (EHRs): These digital patient charts are a treasure trove of clinical information, containing diagnoses (coded with systems like ICD-10), lab results (e.g., HbA1c levels for diabetes), medication orders, and rich, unstructured provider notes. They offer deep clinical granularity but can be fragmented across different healthcare systems.
- Insurance claims and billing data: This administrative data provides a longitudinal view of a patient’s journey through the healthcare system. It captures diagnoses, procedures, prescriptions filled, and healthcare resource utilization (e.g., hospital stays, specialist visits). While lacking deep clinical detail, its scale is immense, making it ideal for studying treatment patterns, adherence, and economic outcomes across millions of lives.
- Disease and product registries: These are specialized, curated databases that track patients with a specific condition (e.g., the Surveillance, Epidemiology, and End Results (SEER) Program for cancer) or those using a particular medical device or drug. Registries are invaluable for studying the natural history of rare diseases and for long-term safety and effectiveness monitoring.
- Wearables and mobile devices: A rapidly growing source of RWD, devices like smartwatches, continuous glucose monitors, and health apps generate high-frequency data streams on activity levels, heart rate, sleep patterns, and blood glucose. This offers an unprecedented window into a patient’s health status and behaviors between clinical visits, capturing data that was previously inaccessible.
- Patient-Reported Outcomes (PROs): This is data that comes directly from the patient without interpretation by a clinician. Collected via validated surveys or apps (e.g., the EQ-5D quality of life questionnaire), PROs capture critical information on symptoms, functioning, and overall well-being. The FDA guidance on PROs helps standardize their use to support labeling claims, ensuring the patient’s voice is central to the evaluation of a treatment.
Other key sources include pharmacy data, which details medication dispensing and adherence, and data on social determinants of health (e.g., socioeconomic status, environment), which provides context for health outcomes. The growth of diverse RWD, combined with powerful analytics, creates unprecedented opportunities to make clinical research more efficient and relevant.
The Expanding Role of Real-World Data in Clinical Trials
Real world data clinical trials represent a fundamental shift towards more efficient, patient-centric, and representative research. This approach offers increased efficiency, shorter timelines, and better access to breakthrough treatments, prompting drug makers, clinicians, and payers to use RWD to answer questions traditional trials cannot.
Optimizing Trial Design and Recruitment
RWD helps design smarter clinical trials from the start. By analyzing large-scale data on disease burden, standard-of-care treatment patterns, and disease progression, researchers can conduct robust feasibility analyses. This allows them to model the impact of different inclusion/exclusion criteria on potential recruitment numbers, preventing costly protocol amendments down the line. RWD also helps select clinically meaningful endpoints that reflect patient needs and real-world outcomes.
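To make the feasibility idea concrete, here is a minimal sketch of an attrition funnel: it applies eligibility criteria one at a time to a patient list and reports how many candidates remain after each step. The `Patient` fields, the thresholds, and the six-patient cohort are all invented for illustration; a real analysis would run against an actual RWD source with clinically justified criteria.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    age: int
    hba1c: float   # most recent HbA1c, in percent
    egfr: float    # kidney function, mL/min/1.73 m^2


def attrition(patients, criteria):
    """Return [(label, remaining_count)] after applying each criterion in order."""
    remaining = list(patients)
    steps = [("Starting cohort", len(remaining))]
    for label, keep in criteria:
        remaining = [p for p in remaining if keep(p)]
        steps.append((label, len(remaining)))
    return steps


# Hypothetical cohort pulled from an RWD source.
cohort = [
    Patient(54, 8.1, 75.0),
    Patient(71, 9.4, 42.0),
    Patient(38, 7.2, 95.0),
    Patient(66, 8.8, 58.0),
    Patient(80, 10.1, 35.0),
    Patient(47, 7.9, 88.0),
]

# Two candidate protocols (illustrative thresholds only).
strict = [
    ("Age 18-65", lambda p: 18 <= p.age <= 65),
    ("HbA1c >= 7.5%", lambda p: p.hba1c >= 7.5),
    ("eGFR >= 60", lambda p: p.egfr >= 60),
]
relaxed = [
    ("Age 18-75", lambda p: 18 <= p.age <= 75),
    ("HbA1c >= 7.5%", lambda p: p.hba1c >= 7.5),
    ("eGFR >= 45", lambda p: p.egfr >= 45),
]

for name, criteria in (("strict", strict), ("relaxed", relaxed)):
    print(name)
    for label, count in attrition(cohort, criteria):
        print(f"  {label:<16} {count}")
```

On this toy cohort the strict protocol leaves 2 eligible patients and the relaxed one leaves 3. Running the same comparison on a large database before finalizing a protocol is what lets teams spot an overly narrow criterion early.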
This data-driven approach enables more inclusive eligibility criteria and facilitates faster participant recruitment. Instead of relying on slow, site-by-site identification, researchers can query large healthcare databases to quickly find and contact potentially eligible candidates, dramatically accelerating trial timelines.
The Rise of Synthetic Control Arms
In situations where a placebo or standard-of-care control group is unethical or impractical—such as in rare disease research or oncology trials for targeted mutations—RWD can be used to create synthetic control arms (SCAs), also known as external control arms. These are virtual comparison groups, carefully constructed from historical patient data (e.g., from EHRs or previous clinical trials) to mirror the baseline characteristics of the patients in the trial’s treatment arm.
Advanced statistical methods, such as propensity score matching, are used to select patients for the SCA who are as similar as possible to the treated patients in terms of age, disease severity, comorbidities, and other key factors. This reduces bias from measured differences between the groups, although it cannot adjust for factors that were never recorded, and it allows researchers to estimate the new treatment’s effectiveness without enrolling a concurrent control group. Regulatory bodies like the FDA are increasingly open to well-designed studies using SCAs, particularly for diseases with high unmet medical need.
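As a rough illustration of the matching step, the sketch below pairs each treated patient with the closest available external control by propensity score. It assumes the scores have already been estimated elsewhere (for example, with a logistic regression on baseline covariates) and shows only one simple variant: greedy 1:1 nearest-neighbor matching without replacement, within a caliper. The patient IDs, scores, and the 0.05 caliper are made up.

```python
def match_controls(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, controls: dicts mapping patient id -> propensity score.
    Returns a list of (treated_id, control_id) pairs. Treated patients with
    no remaining control inside the caliper are left unmatched.
    """
    available = dict(controls)
    pairs = []
    # Visit treated patients in score order so the result is reproducible.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control; ties are broken by control id.
        c_id = min(available, key=lambda c: (abs(available[c] - t_score), c))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # each control is used at most once
    return pairs


# Hypothetical pre-computed propensity scores.
treated = {"T1": 0.62, "T2": 0.35, "T3": 0.90}
external = {"C1": 0.30, "C2": 0.60, "C3": 0.37, "C4": 0.10, "C5": 0.70}

pairs = match_controls(treated, external)
print(pairs)  # [('T2', 'C3'), ('T1', 'C2')]
```

Here T3 stays unmatched because no external patient has a score within 0.05 of 0.90. Dropping such patients trades sample size for comparability, which is a design choice each study has to justify.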
Generating Post-Launch Evidence and Improving Patient Outcomes
After a medicine is approved, RWD becomes vital for understanding its long-term performance and value in routine clinical practice.
- Long-term safety monitoring: The FDA’s Sentinel Initiative is a prime example of using RWD for ongoing pharmacovigilance. By actively monitoring insurance claims and EHR data, regulators can identify rare or long-term side effects that may not have appeared in shorter, pre-market trials.
- Comparative effectiveness research: When multiple treatments exist for a condition, RWD can show how they compare head-to-head in real-world settings. For example, two hypertension drugs may have similar efficacy in an RCT, but RWE might reveal that one has better patient adherence due to a simpler dosing regimen, leading to superior long-term blood pressure control in the community. This evidence is critical for clinical guidelines and physician decision-making.
- Informing treatment for subgroups and label expansion: RWD reveals how medicines work in specific patient groups often excluded from RCTs, such as the elderly, pregnant women, or patients with multiple comorbidities. This provides crucial insights for personalizing medicine and ensuring equitable care. Furthermore, positive RWE can support label extensions, expanding a medicine’s approved use to new populations or indications without the need for a completely new set of RCTs. For instance, data from pediatric registries could support the use of a drug previously approved only for adults.
The Impact of RWE on Payers and Healthcare Economics
RWE significantly influences the financial aspects of healthcare. Payers, including both government bodies and private insurers, use RWD to assess a treatment’s real-world impact on key economic outcomes, such as hospitalization rates, emergency room visits, and overall healthcare costs. This evidence is the bedrock of value-based care models, where payment is tied to performance.
RWE studies inform cost-effectiveness analysis and play a key role in pricing and reimbursement negotiations. By demonstrating a product’s real-world value beyond its clinical efficacy—for example, by showing it reduces the need for more costly interventions—pharmaceutical companies can build a stronger case for market access and favorable reimbursement. As noted in payer perspectives on RWE, this evidence is no longer a ‘nice-to-have’ but an essential component of their decision-making process.
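One common summary measure in cost-effectiveness analysis is the incremental cost-effectiveness ratio (ICER): the extra cost of a new treatment divided by the extra health benefit it delivers. The sketch below computes it for invented per-patient figures; the costs, the QALY values, and the function name are illustrative assumptions, not numbers from any study.

```python
def icer(cost_new, cost_old, effect_new, effect_old):
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect."""
    delta_effect = effect_new - effect_old
    if delta_effect == 0:
        raise ValueError("No difference in effect; the ICER is undefined.")
    return (cost_new - cost_old) / delta_effect


# Hypothetical per-patient averages: costs in dollars, effects in QALYs.
ratio = icer(cost_new=42_000, cost_old=30_000, effect_new=6.5, effect_old=6.0)
print(f"${ratio:,.0f} per QALY gained")  # $24,000 per QALY gained
```

RWD typically feeds the inputs here, for example observed hospitalization costs, rather than the formula itself.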
Navigating the Challenges and Governance of RWD
While real world data clinical trials offer immense promise, working with RWD requires careful planning, robust methods, and strong governance to navigate its unique challenges.
Overcoming Data Quality and Methodological Problems
RWD is often described as “messy” because it is not collected for research purposes. This leads to several significant challenges:
- Data reliability and gaps: Data can be inconsistent across different sources, duplicated, or missing key information (e.g., a lab result or a patient’s smoking status). Understanding and documenting this “missingness” is a critical first step.
- Interoperability issues: Different healthcare systems often use different coding standards (e.g., ICD-9 vs. ICD-10, or proprietary lab codes). This lack of a common language makes it difficult to combine and analyze data from multiple sources without extensive data cleaning and mapping.
- Unstructured data: A vast amount of valuable clinical information is locked away in free-text formats like doctors’ notes, pathology reports, and discharge summaries. Extracting this information requires advanced techniques like Natural Language Processing (NLP).
- Data Provenance: It is crucial to track the lineage of the data—where it came from, how it was collected, and what changes it has undergone. Without clear provenance, the reliability of the evidence generated is questionable.
- Inherent Bias: Since RWD is observational and not randomized, it is susceptible to numerous biases. These include selection bias (e.g., only patients who seek care are in the database), confounding by indication (e.g., sicker patients may be more likely to receive a newer, more aggressive treatment, making the treatment appear less effective), and measurement errors.
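Documenting missingness, the first step mentioned in the list above, can begin with a very simple profile of how often each field is empty. The sketch below is one minimal way to do that; the record layout and field names are hypothetical.

```python
def missingness(records, fields):
    """Return {field: fraction of records where the field is absent or empty}."""
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for r in records if r.get(field) in (None, "")) / total
        for field in fields
    }


# Hypothetical extract from an EHR system.
records = [
    {"patient_id": "a", "hba1c": 7.9, "smoking_status": "never"},
    {"patient_id": "b", "hba1c": None, "smoking_status": ""},
    {"patient_id": "c", "hba1c": 8.4},  # smoking status never recorded
    {"patient_id": "d", "hba1c": 6.8, "smoking_status": "current"},
]

report = missingness(records, ["hba1c", "smoking_status"])
print(report)  # {'hba1c': 0.25, 'smoking_status': 0.5}
```

A profile like this only says how much is missing, not why. Whether the gaps are random or related to patient characteristics still needs to be investigated before choosing how to handle them.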
To address these issues, robust analytical methods that combine advanced statistics and machine learning are essential. One such technique is target trial emulation, which involves designing and analyzing an observational study to mimic a hypothetical randomized trial as closely as possible, helping to minimize bias and draw more reliable causal inferences.
The Regulatory Landscape for Real-World Data Clinical Trials
Regulatory bodies worldwide are creating frameworks to guide the use of RWD/RWE in decision-making. In the US, the 21st Century Cures Act mandated the creation of the FDA’s RWE Program. This program has since released multiple guidance documents, including the foundational “Framework for FDA’s Real-World Evidence Program,” which outlines the agency’s approach to evaluating RWE for use in regulatory decisions. The key principle is that the data must be “fit-for-purpose,” meaning its quality and relevance must be sufficient to answer the specific regulatory question at hand.
Globally, other agencies are following suit. The European Medicines Agency (EMA) has established the DARWIN EU® (Data Analysis and Real World Interrogation Network) to provide timely and reliable evidence from across Europe. These initiatives aim to standardize methodologies and ensure that RWE studies submitted for regulatory review are transparent, reproducible, and trustworthy.
Ethical Considerations: Privacy, Security, and Fairness
The power of RWD comes with significant responsibilities for protecting patients and ensuring equitable outcomes.
- Patient privacy: Protecting patient information is the cornerstone of trust. This is primarily achieved through data de-identification, where direct personal identifiers (name, address, etc.) are removed or scrambled to comply with strict regulations like HIPAA in the US and GDPR in Europe. It’s important to distinguish this from true anonymization, which is often impossible to guarantee.
- Patient consent: While de-identified data can often be used for research without specific consent, studies requiring identifiable data rely on clear and transparent consent models. Emerging concepts like dynamic consent allow patients to have more granular, ongoing control over how their data is used for different research purposes.
- Algorithmic fairness: As AI and machine learning are increasingly used to analyze RWD, it is crucial to proactively identify and mitigate potential biases in the data and algorithms. If a dataset underrepresents a certain demographic, a model trained on it may perform poorly for that group, perpetuating health disparities. Auditing algorithms for fairness is a critical step.
- Data security: Strong cybersecurity measures, including end-to-end encryption, strict access controls, and secure computing environments (like Trusted Research Environments), are non-negotiable to protect sensitive RWD from unauthorized access or cyberattacks.
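A fairness audit like the one mentioned in the list above can start by comparing a model's performance across demographic groups. The sketch below computes recall (the share of true cases the model caught) per group; the group labels and predictions are invented, and recall is only one of several metrics an audit might compare.

```python
from collections import defaultdict


def recall_by_group(rows):
    """rows: iterable of (group, actual, predicted) with boolean labels.

    Returns {group: recall} for every group with at least one true case.
    """
    hits = defaultdict(int)
    positives = defaultdict(int)
    for group, actual, predicted in rows:
        if actual:
            positives[group] += 1
            if predicted:
                hits[group] += 1
    return {group: hits[group] / positives[group] for group in positives}


# Hypothetical model output: (group, patient truly had the event, model flagged it).
rows = [
    ("group_a", True, True), ("group_a", True, True), ("group_a", True, False),
    ("group_a", False, False), ("group_a", True, True),
    ("group_b", True, False), ("group_b", True, True), ("group_b", False, False),
    ("group_b", True, False), ("group_b", True, False),
]

recall = recall_by_group(rows)
print(recall)  # {'group_a': 0.75, 'group_b': 0.25}
```

A gap this large would suggest the model misses far more true cases in group_b, which is the kind of signal that should trigger a closer look at how that group is represented in the training data.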
Technology and Best Practices for RWE Strategy
Advanced technology and smart strategies are essential to harness the vast amount of information generated in real world data clinical trials.
How AI and NLP are Unlocking Deeper Insights from RWD
Artificial Intelligence (AI) and Machine Learning (ML) are critical for processing massive, complex RWD sets. These algorithms excel at identifying subtle patterns and correlations that are invisible to human analysis, enabling them to predict patient responses to treatments, forecast disease progression, and identify novel patient subgroups.
Natural Language Processing (NLP) is a specialized branch of AI that reads and understands human language. This technology is a game-changer for RWD, as it can transform unstructured data (such as clinical notes, pathology reports, and radiology summaries) into structured, analyzable information. For example, an NLP model can scan thousands of oncology notes to extract crucial data points like cancer stage, tumor size, and biomarker status, unlocking insights that were previously hidden in text.
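To give a feel for what turning text into structured fields means, here is a deliberately simple, rule-based sketch that pulls a cancer stage and a tumor size out of a free-text note using regular expressions. Production clinical NLP relies on trained models that handle negation, abbreviations, and context; the patterns and the sample note below are illustrative assumptions only.

```python
import re

# Matches e.g. "stage II", "Stage IIIA", "stage IV" (longest numeral first).
STAGE = re.compile(r"\bstage\s+(IV|III|II|I)([ABC])?\b", re.IGNORECASE)
# Matches a number followed by cm or mm, e.g. "2.5 cm" or "18mm".
SIZE = re.compile(r"(\d+(?:\.\d+)?)\s*(cm|mm)\b", re.IGNORECASE)


def extract(note):
    """Return the first stage and first size found in the note, if any."""
    result = {"stage": None, "tumor_size_mm": None}
    stage = STAGE.search(note)
    if stage:
        result["stage"] = (stage.group(1) + (stage.group(2) or "")).upper()
    size = SIZE.search(note)
    if size:
        value = float(size.group(1))
        # Normalize everything to millimeters.
        result["tumor_size_mm"] = value * 10 if size.group(2).lower() == "cm" else value
    return result


note = "Pt with stage IIIA NSCLC. CT shows a 2.5 cm mass in the right upper lobe."
print(extract(note))  # {'stage': 'IIIA', 'tumor_size_mm': 25.0}
```

Even this toy version shows the payoff: once stage and size sit in named fields, they can be filtered, counted, and joined with other structured data.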
Predictive analytics, powered by AI, enables a shift from reactive to proactive healthcare by anticipating health trends and identifying at-risk patients before a critical event occurs. To accomplish this while upholding patient privacy, federated learning has emerged as a key technology. This approach allows AI models to be trained across multiple institutions without the sensitive patient data ever leaving its secure, local source. It’s like sending the algorithm to the data, rather than pooling all the data in one place, which dramatically improves security and facilitates collaboration.
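The idea of sending the algorithm to the data can be sketched with federated averaging, one widely used federated learning scheme (the text above does not name a specific algorithm, so this choice is an assumption). Each site runs a few training steps on its own records and returns only model parameters and a record count; a coordinator averages them. The three sites, their data points, and the tiny linear model are invented for illustration.

```python
def local_update(w, b, data, lr=0.05, steps=5):
    """Run a few gradient-descent steps for y = w*x + b on one site's private data."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum((w * x + b - y) * x for x, y in data) * 2 / n
        grad_b = sum((w * x + b - y) for x, y in data) * 2 / n
        w -= lr * grad_w
        b -= lr * grad_b
    # Only the parameters and the sample count leave the site, never the rows.
    return w, b, n


def federated_average(updates):
    """Combine site updates, weighting each site by its number of records."""
    total = sum(n for _, _, n in updates)
    w = sum(w_k * n for w_k, _, n in updates) / total
    b = sum(b_k * n for _, b_k, n in updates) / total
    return w, b


# Hypothetical private datasets; every point lies on y = 2x + 1.
sites = {
    "hospital_a": [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)],
    "hospital_b": [(1.0, 3.0), (3.0, 7.0)],
    "hospital_c": [(0.5, 2.0), (1.5, 4.0), (2.5, 6.0), (3.5, 8.0)],
}

w, b = 0.0, 0.0
for _ in range(200):  # communication rounds
    updates = [local_update(w, b, data) for data in sites.values()]
    w, b = federated_average(updates)

print(round(w, 3), round(b, 3))  # 2.0 1.0
```

The coordinator recovers the shared relationship without ever seeing a single patient-level row. Real deployments add further safeguards, such as secure aggregation, on top of this basic loop.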
Best Practices for Implementing Real-World Data Clinical Trials
A successful RWE strategy requires more than just technology. It demands a disciplined, strategic approach.
- Start early in development: Integrate RWE planning from the earliest stages of drug development. RWD can inform go/no-go decisions, trial design, and market access strategy long before a product reaches the launch phase.
- Define clear research questions: Begin with a specific, well-defined, and compelling question. A vague objective will lead to an unfocused analysis. The question should guide the choice of data sources, methods, and endpoints.
- Develop a Formal RWE Generation Plan: Treat observational studies with the same rigor as clinical trials. Create a formal protocol that pre-specifies the research question, data sources, patient cohorts, variables, analytical methods, and sensitivity analyses. This is essential for transparency and reproducibility.
- Use Common Data Models (CDMs): To enable large-scale analysis across different databases, data must be standardized. A CDM, such as the OMOP (Observational Medical Outcomes Partnership) Common Data Model, transforms data from various sources into a single, uniform structure and vocabulary. This allows researchers to run the same analysis code across a network of databases, generating more robust and generalizable evidence.
- Ensure transparency and reliability: Carefully document all processes, data sources, analytical methods, and code. This transparency is critical for meeting regulatory expectations and reporting standards (such as the STROBE guidelines for observational studies) and for building trust with stakeholders.
- Foster cross-functional collaboration: RWE generation is a team sport. Success requires bringing together experts from different fields (including data scientists, clinicians, epidemiologists, statisticians, and regulatory affairs specialists) to unlock the full potential of RWD.
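The Common Data Model practice from the list above can be illustrated with a small harmonization sketch: two sources with different layouts and coding systems are mapped into one shared row format, after which a single piece of analysis code works on both. The field names loosely echo OMOP's condition table but are simplified (real OMOP uses numeric concept IDs and a much richer schema), and the source layouts and the three-entry vocabulary map are hypothetical.

```python
from datetime import date

# Hypothetical vocabulary map: (coding system, source code) -> shared concept label.
CONCEPT_MAP = {
    ("ICD9CM", "250.00"): "type_2_diabetes",
    ("ICD10CM", "E11.9"): "type_2_diabetes",
    ("ICD10CM", "I10"): "essential_hypertension",
}


def from_ehr(row):
    """Map a row from a hypothetical EHR export (ISO dates, mixed code systems)."""
    return {
        "person_id": row["mrn"],
        "condition_concept": CONCEPT_MAP.get((row["code_system"], row["dx_code"])),
        "condition_start_date": date.fromisoformat(row["onset"]),
        "condition_source_value": row["dx_code"],
    }


def from_claims(row):
    """Map a row from a hypothetical claims feed (MM/DD/YYYY dates, ICD-9 codes)."""
    month, day, year = (int(part) for part in row["service_date"].split("/"))
    return {
        "person_id": row["member_id"],
        "condition_concept": CONCEPT_MAP.get(("ICD9CM", row["icd9"])),
        "condition_start_date": date(year, month, day),
        "condition_source_value": row["icd9"],
    }


def count_patients(rows, concept):
    """The same analysis code runs on every harmonized source."""
    return len({r["person_id"] for r in rows if r["condition_concept"] == concept})


ehr_rows = [
    {"mrn": "E-001", "code_system": "ICD10CM", "dx_code": "E11.9", "onset": "2023-04-12"},
    {"mrn": "E-002", "code_system": "ICD10CM", "dx_code": "I10", "onset": "2022-11-03"},
]
claims_rows = [
    {"member_id": "C-104", "icd9": "250.00", "service_date": "09/15/2014"},
]

harmonized = [from_ehr(r) for r in ehr_rows] + [from_claims(r) for r in claims_rows]
print(count_patients(harmonized, "type_2_diabetes"))  # 2
```

Keeping the original code in `condition_source_value` preserves provenance, so any mapping decision can be audited later.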
Success in this field requires a holistic strategy that combines cutting-edge technology with scientific rigor and collaborative partnerships, ultimately changing not just scientific knowledge but the delivery of healthcare itself.
Frequently Asked Questions about Real-World Data
Here are answers to some common questions about real world data clinical trials.
Can real-world evidence replace randomized clinical trials?
No, not entirely. Randomized Controlled Trials (RCTs) remain the gold standard for proving a drug’s efficacy under ideal, controlled conditions. Real-World Evidence (RWE) complements RCTs by showing how a treatment performs in diverse, real-world populations and settings. Together, they provide a more complete picture of a medical product’s true performance.
What is the main benefit of using RWD in research?
The main benefit of using Real-World Data (RWD) is its ability to show how treatments work in routine clinical practice. This leads to increased generalizability, as it includes a wider variety of patients than typical trials. It also provides insights into long-term outcomes and can improve research efficiency and cost-effectiveness. This helps clinicians make more informed, personalized decisions.
How is patient privacy protected when using RWD?
Patient privacy is paramount and is protected through rigorous measures. The primary method is de-identification, where personal identifiers are removed or scrambled from datasets to comply with regulations like HIPAA. For studies requiring identifiable data, explicit patient consent is obtained.
Furthermore, all RWD is handled on highly secure, compliant platforms with strict access controls and encryption. Privacy-preserving technologies like federated approaches are also key, allowing analysis to be performed on data where it resides without moving it, which significantly improves security.
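As a rough picture of the de-identification step described above, the sketch below drops direct identifiers from a record and replaces the patient ID with a keyed hash, so records for the same patient can still be linked without exposing the original ID. The field list, key, and record are hypothetical, and this alone would not satisfy HIPAA or GDPR: quasi-identifiers such as dates and postal codes need handling too.

```python
import hashlib
import hmac

# Hypothetical set of direct identifiers to strip.
DIRECT_IDENTIFIERS = {"name", "address", "phone", "email"}


def deidentify(record, secret_key):
    """Drop direct identifiers and replace the patient ID with a keyed hash."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    token = hmac.new(secret_key, str(record["patient_id"]).encode(), hashlib.sha256)
    # The same ID and key always give the same pseudonym, so linkage still works.
    cleaned["patient_id"] = token.hexdigest()[:16]
    return cleaned


key = b"illustrative-key-kept-by-the-data-custodian"
record = {
    "patient_id": 48213,
    "name": "Jane Example",
    "address": "1 Sample Street",
    "diagnosis": "E11.9",
    "hba1c": 8.1,
}

safe = deidentify(record, key)
print(sorted(safe))  # ['diagnosis', 'hba1c', 'patient_id']
```

Using a keyed hash rather than a plain one means someone without the key cannot rebuild the pseudonyms simply by hashing a list of known patient IDs.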
Conclusion: From Data to Discovery with Real-World Evidence
The rise of real world data clinical trials is reshaping how we develop treatments and deliver care. By complementing the foundational insights from randomized controlled trials, RWD and RWE provide a crucial window into the realities of patient experiences, creating a more complete picture that better serves patients.
This integration accelerates innovation by optimizing trial design, speeding up recruitment, and providing vital post-launch insights. It helps us understand treatment effects in diverse subgroups and demonstrate real-world value to payers.
While challenges in data quality, standardization, and governance exist, advancements in AI, NLP, and federated learning are providing powerful solutions. These technologies help unlock the full potential of vast data resources while ensuring security and privacy.
At Lifebit, we are at the forefront of this change. Our next-generation federated AI platform provides secure, real-time access to global biomedical and multi-omic data. We empower large-scale, compliant research and pharmacovigilance with advanced capabilities for harmonization, AI/ML analytics, and federated governance.
Our platform components—the Trusted Research Environment (TRE) for security, the Trusted Data Lakehouse (TDL) for data management, and R.E.A.L. (Real-time Evidence & Analytics Layer) for insights—work together to enable real-time analytics, AI-driven safety surveillance, and secure collaboration. This makes the journey from clinic to real-world application clearer and more beneficial for researchers, clinicians, and patients.