How the FDA Uses Real-World Data to Keep Us Safe

How Real-World Data Examples Cut Drug Approval Times and Costs
Real-world data examples include electronic health records from hospitals, insurance claims, patient registries, wearable device readings, genomic databases, pharmacy dispensing records, and mobile health apps. These diverse sources capture what happens to patients in everyday clinical practice—not just in controlled research settings. In the modern pharmaceutical landscape, the ability to harness this data is no longer a luxury; it is a regulatory and economic necessity.
10 Essential Real-World Data Examples:
- Electronic Health Records (EHRs) – Detailed longitudinal data including diagnoses, lab results, medication orders, and physician notes.
- Medical Claims and Billing Data – Administrative data from insurance providers, including procedure and diagnosis codes (CPT/ICD-10) and healthcare utilization costs.
- Patient and Disease Registries – Specialized datasets like the Dutch Scalp Cooling Registry or the Cystic Fibrosis Foundation Patient Registry.
- Patient-Generated Data – High-frequency data from wearables (Apple Watch, Fitbit) and mobile health apps used for symptom tracking.
- Radiographic and Medical Imaging – Unstructured data from CT scans, MRIs, and X-rays, often processed via machine learning for pattern recognition.
- In Vitro Diagnostics (IVD) Data – Results from laboratory tests performed on biological samples, critical for companion diagnostic development.
- Genomic Data – Large-scale biobanks and DNA sequencing databases that link genetic variants to clinical phenotypes.
- Pragmatic Clinical Trials (PCTs) – Trials conducted within the routine clinical workflow to maximize external validity.
- Pharmacy Data – Real-time records of medication dispensing, refills, and adherence patterns across diverse populations.
- Socioeconomic and Environmental Data – Social Determinants of Health (SDOH) such as air quality, housing stability, and food security metrics.
Traditional clinical trials, while the gold standard for establishing causality, often suffer from the “efficacy-effectiveness gap.” They exclude patients with multiple health conditions (comorbidities), cost hundreds of millions of dollars, and take years to complete. Furthermore, the recruitment process for traditional trials often fails to represent the diversity of the actual patient population, leading to results that may not generalize to the “real world.” That’s where real-world data steps in—showing how treatments actually perform across diverse populations in everyday settings.
The FDA has reviewed 90 examples of real-world evidence submissions from 2012 to 2019, all supporting medical device decisions. These included 18 510(k) notifications, 14 De Novo requests, and 57 premarket approval applications. Over 90% of life science organizations now use real-world data in clinical development to optimize trial design, identify patient cohorts, and monitor post-market safety. This shift is driven by the need to reduce the “Valley of Death” in drug development: the period where promising compounds fail due to lack of recruitment or unforeseen safety issues in broader populations.
The 21st Century Cures Act of 2016 was a landmark piece of legislation that directed the FDA to develop frameworks for evaluating real-world evidence (RWE). This opened the door for data from Medicare claims (such as an analysis of 180,000 cataract surgeries), mobile health apps, and patient registries to support regulatory decisions. Real-world evidence doesn’t replace traditional trials; it complements them by revealing long-term outcomes, rare side effects, and effectiveness in populations often excluded from research, such as the elderly, pregnant women, and those with complex chronic conditions.
I’m Maria Chatzou Dunford, CEO and Co-founder of Lifebit, and with over 15 years in genomics and biomedical data, I’ve seen how analyzing real-world data examples from federated environments can unlock breakthrough insights for precision medicine. My work focuses on helping organizations securely leverage diverse data sources to accelerate drug discovery and improve patient outcomes. By moving the analysis to the data, rather than the data to the analysis, we can overcome the traditional barriers of data silos and privacy concerns.

To understand why the FDA is so excited about real-world data examples, we first have to look at the “old way” of doing things. For decades, the gold standard for medical research has been the Randomized Controlled Trial (RCT). In an RCT, everything is perfectly controlled. Patients are carefully selected based on strict inclusion and exclusion criteria, they take their medicine at the exact same time under supervision, and they are often excluded if they have other complicating diseases. This creates a “clean” signal but an “artificial” environment.
But here is the problem: the “real world” is messy. People forget to take their pills. They have high blood pressure and diabetes and an allergy to cats. They might be taking five other medications that interact with the study drug. This is why we need to distinguish between efficacy (how a drug works in a perfect lab setting) and effectiveness (how it works in your neighborhood clinic). RWD provides the bridge between these two concepts, allowing researchers to observe the “natural history” of a disease and the true impact of an intervention.
RCTs vs. Real-World Data: A Deep Dive Comparison
| Feature | Randomized Controlled Trials (RCTs) | Real-World Data (RWD) |
|---|---|---|
| Setting | Controlled, artificial research sites | Routine clinical practice, hospitals, homes |
| Population | Highly selected, homogeneous, narrow | Diverse, heterogeneous, inclusive (real patients) |
| Cost | Extremely high (recruitment, monitoring) | Lower (leveraging existing digital footprints) |
| Duration | Short-term (months to few years) | Long-term (decades of longitudinal follow-up) |
| Focus | Efficacy (Can it work under ideal conditions?) | Effectiveness (Does it work in practice?) |
| Data Collection | Proactive, specific to study protocol | Reactive, captured during routine care |
| Bias Control | Randomization minimizes confounding | Requires advanced statistical modeling (e.g., Propensity Scores) |
As the FDA’s explanation of RWD and RWE points out, RWD is the raw material. When we analyze that raw material to get clinical insights, it becomes Real-World Evidence (RWE). You can think of RWD as the ingredients and RWE as the finished cake. Understanding the distinction between real-world data and real-world evidence is crucial for anyone in the life sciences today. Without rigorous analysis, RWD is just a collection of noise; with the right methodology, it becomes a powerful tool for regulatory approval.
By using RWD, the FDA can see how a drug affects pregnant women, the elderly, or different ethnic groups who might have been left out of the original trials. This is where the benefits of real-world data in clinical research become concrete: faster approvals, better safety monitoring, and the ability to expand drug labels to new indications without always requiring a new, multi-year RCT.
10 Real-World Data Examples to Solve Your Clinical Evidence Gaps
We aren’t just talking about one type of data. The “real world” is vast and multi-dimensional. To truly understand a patient’s journey, researchers must synthesize data from multiple touchpoints. Here are the 10 most critical real-world data examples that are currently reshaping how we treat disease.
1. Electronic Health Records (EHRs)
EHRs are the digital backbone of modern medicine. They contain structured data (like ICD-10 codes and lab values) and unstructured data (like physician notes and discharge summaries). However, because they were built for clinical care and billing, not research, they can be fragmented. A patient might visit three different hospital systems, leaving a trail of partial records. That is why the FDA guidance on using EHRs in clinical investigations is so important—it provides a roadmap for ensuring data integrity, traceability, and quality when turning these messy notes into high-quality evidence. Natural Language Processing (NLP) is now frequently used to extract valuable insights from the “free text” in EHRs, such as specific symptoms or social factors not captured in drop-down menus.
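To make the “free text” point concrete, here is a minimal sketch of pulling one fact (smoking status) out of a physician note. It uses plain keyword rules rather than a real clinical NLP model, and the phrase list and the `smoking_status` function are illustrative assumptions, not part of any FDA guidance or vendor tool.

```python
import re

# Hypothetical keyword rules; a production pipeline would use a validated
# clinical NLP model and handle negation, abbreviations, and misspellings.
SMOKING_PATTERNS = [
    ("never", re.compile(r"\b(never smoker|never smoked|non-?smoker|denies smoking)\b", re.I)),
    ("former", re.compile(r"\b(former smoker|ex-?smoker|quit smoking)\b", re.I)),
    ("current", re.compile(r"\b(current smoker|smokes)\b", re.I)),
]


def smoking_status(note: str) -> str:
    """Return the first matching smoking label found in a free-text note."""
    for label, pattern in SMOKING_PATTERNS:
        if pattern.search(note):
            return label
    return "unknown"


notes = [
    "Pt is a former smoker, quit in 2015.",
    "Denies smoking or alcohol use.",
    "Presents with cough for 3 weeks.",
]
for note in notes:
    print(smoking_status(note), "|", note)
```

Rules this simple will misread phrases such as “no longer smokes,” which is one reason extracted values need validation before they count as evidence.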
2. Medical Claims and Billing Data
Every time you visit a doctor, an insurance claim is filed. These claims provide a massive “bird’s-eye view” of healthcare utilization across millions of lives. While they lack the clinical depth of an EHR (they don’t tell you why a doctor chose a specific code), they are excellent for tracking long-term patient journeys and costs. For instance, Medicare claims allowed researchers to study 180,000 cataract surgeries to find patterns in post-operative complications that a small trial would never see. Claims data is also vital for Health Economics and Outcomes Research (HEOR), helping payers determine the value of a new therapy compared to the standard of care.
3. Patient and Disease Registries
Registries are organized systems that use observational study methods to collect uniform data (clinical and other) to evaluate specified outcomes for a population defined by a particular disease, condition, or exposure. They are like a specialized, high-quality library. A great example is the Dutch Scalp Cooling Registry, which tracked 1,411 cancer patients to prove that scalp cooling systems actually help prevent hair loss during chemotherapy. Registries are particularly powerful for rare diseases where the total patient population is small, allowing for the creation of “natural history” studies that serve as a baseline for new treatments. You can learn more about how registries help us understand patient outcomes and their vital role in long-term safety monitoring.
4. Wearables and Mobile Apps
Your Apple Watch, Fitbit, or continuous glucose monitor (CGM) is a walking data generator. This “patient-generated health data” (PGHD) provides a continuous, high-resolution look at a patient’s health outside the clinic. Traditional trials rely on “snapshot” data (a blood pressure reading once a month at the clinic), but wearables provide the full movie. The FDA even authorized marketing of a mobile contraception app based on data from 15,000 women who used it in their daily lives. This data is especially useful for neurological conditions like Parkinson’s, where tremors can be monitored 24/7 to see if a drug is truly working throughout the day.
5. Radiographic and Medical Imaging
Medical imaging is one of the fastest-growing sources of RWD. AI and deep learning algorithms are now being used to scan millions of historical X-rays, MRIs, and CT scans to find early signs of disease. These real-world data examples help us catch diseases months or even years earlier than traditional methods. In oncology, imaging RWD is used to measure tumor shrinkage (RECIST criteria) in real-world settings, providing evidence of a drug’s effectiveness in patients who might not meet the strict criteria of a Phase III trial.
6. In Vitro Diagnostics (IVD)
IVDs are tests done on samples such as blood, urine, or tissue taken from the human body. They can detect diseases or other conditions and can be used to monitor a person’s overall health to help cure, treat, or prevent disease. The FDA has already used IVD data in eight different regulatory decisions to approve new diagnostic tools. This data is essential for the development of “companion diagnostics,” which are tests that identify which patients are most likely to benefit from a specific biological product.
7. Genomic and Multi-omic Data
At Lifebit, we are particularly passionate about this. By linking your DNA (genomics), RNA (transcriptomics), and proteins (proteomics) with your health records, we can practice true “precision medicine.” This means moving away from a “one-size-fits-all” approach and giving you a drug that is specifically designed for your genetic makeup. Genomic RWD from large-scale initiatives like the UK Biobank or the All of Us Research Program is helping scientists identify new drug targets and understand why some patients develop resistance to certain therapies.
8. Pragmatic Clinical Trials (PCTs)
These are trials that happen inside the healthcare system rather than at specialized research sites. They use RWD to identify participants and often use the hospital’s own EHR system to collect outcomes. One study across 40 U.S. dialysis centers used RWD to compare different treatment strategies in real-time, without disrupting the patients’ normal care. PCTs are the ultimate hybrid, combining the randomization of an RCT with the real-world setting of RWD, providing high-quality evidence that is immediately applicable to clinical practice.
9. Pharmacy Dispensing Records
Do patients actually take their meds? In a clinical trial, adherence is nearly 100% because patients are monitored. In the real world, adherence can drop below 50%. Pharmacy data tells the truth about “primary non-adherence” (not picking up the prescription) and “secondary non-adherence” (not refilling it). It helps us understand “real-world adherence” and why some treatments fail outside the clinic. This data is crucial for pharmaceutical companies to understand the “real-world persistence” of their drugs compared to competitors.
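One common way to turn dispensing records into an adherence number is the proportion of days covered (PDC): the share of days in a period on which the patient had medication on hand. The article does not name a specific metric, so treat this as one reasonable choice; the fill dates below are made up for illustration.

```python
from datetime import date, timedelta


def proportion_of_days_covered(fills, period_start, period_end):
    """fills: list of (fill_date, days_supply) tuples.

    Returns the fraction of days in [period_start, period_end] on which
    the patient had medication available. Overlapping supply is counted
    once; this simple version does not carry early refills forward.
    """
    covered = set()
    for fill_date, days_supply in fills:
        for offset in range(days_supply):
            day = fill_date + timedelta(days=offset)
            if period_start <= day <= period_end:
                covered.add(day)
    total_days = (period_end - period_start).days + 1
    return len(covered) / total_days


# Hypothetical history: three 30-day fills, with a one-month gap before the third.
fills = [
    (date(2024, 1, 1), 30),
    (date(2024, 1, 31), 30),
    (date(2024, 3, 31), 30),
]
pdc = proportion_of_days_covered(fills, date(2024, 1, 1), date(2024, 4, 29))
print(f"PDC = {pdc:.2f}")  # 90 covered days out of 120
```

A prescription that never shows up in the fills list at all is the “primary non-adherence” case; gaps between fills, like the missing month here, are the “secondary” case.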
10. Socioeconomic and Environmental Data
Where you live, the air you breathe, and your access to healthy food (Social Determinants of Health) are often better predictors of health outcomes than your DNA. Researchers used RWD during COVID-19 to see how local lockdowns and air quality affected mental health and respiratory outcomes. This is particularly relevant when using real-world data for clinical evidence generation in oncology, where environmental exposures and socioeconomic barriers to care significantly impact survival rates. Integrating SDOH data into RWE allows for a more holistic understanding of health equity.

Leveraging real-world data examples for Medical Device Approvals
Medical devices—from heart valves to contact lenses—rely heavily on RWE because it is often unethical or impractical to conduct traditional blinded trials for surgical implants. Between 2012 and 2019, the FDA reviewed 90 submissions that used RWE to support their case. This included everything from new heart valves to robotically assisted surgical tools.
A major player here is the NEST Coordinating Center (National Evaluation System for health Technology), which helps manufacturers pool data from different sources like registries and EHRs. For example, a pediatric contact lens study used insurance claims to prove that the rate of infection was incredibly low (less than 0.4%), which helped the device stay on the market safely. This is a perfect illustration of US regulatory guidance on using real-world data in action, where RWD provides the “post-market surveillance” necessary to ensure long-term safety.
How real-world data examples Support Rare Disease Research
If you have a disease that only affects 100 people in the world, you can’t run a 10,000-person clinical trial. For rare diseases, RWD isn’t just “nice to have”—it’s a lifeline. In many cases, it is the only way to generate enough evidence for approval.
Researchers use “external control arms” (also known as synthetic control arms). Instead of giving half the patients a placebo (which can be unethical in terminal rare diseases), they compare the patients receiving the new drug to historical real-world data examples of how the disease usually progresses. This approach was used for blinatumomab, which received accelerated approval for a type of leukemia because RWD showed it was significantly better than existing treatments. Using real-world evidence in this way is changing clinical research for the most vulnerable patients, allowing for “single-arm” trials that are faster and more compassionate.
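The arithmetic behind an external control comparison can be sketched in a few lines: take the response rate in the single-arm trial, take the response rate in the historical cohort, and put a confidence interval around the difference. The counts below are invented for illustration and are not the blinatumomab data. A real submission would also adjust for baseline differences between the two groups, which this sketch does not do.

```python
import math


def risk_difference(resp_treated, n_treated, resp_control, n_control, z=1.96):
    """Unadjusted difference in response rates with a normal-approximation CI."""
    p_t = resp_treated / n_treated
    p_c = resp_control / n_control
    diff = p_t - p_c
    se = math.sqrt(p_t * (1 - p_t) / n_treated + p_c * (1 - p_c) / n_control)
    return diff, (diff - z * se, diff + z * se)


# Hypothetical numbers: 20 of 50 trial patients respond,
# versus 60 of 300 patients in the historical real-world cohort.
diff, (low, high) = risk_difference(20, 50, 60, 300)
print(f"Difference in response rate: {diff:.2f} (95% CI {low:.2f} to {high:.2f})")
```

Because the historical cohort is usually much larger than the trial, most of the uncertainty in the interval comes from the small treated group.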
How to Use the FDA’s RWE Framework to Get Approved Faster
The FDA isn’t just “winging it” when it comes to RWD. They have developed a comprehensive Framework for its Real-World Evidence Program to ensure that the evidence generated is as robust as that from a traditional trial. This framework was born out of the 21st Century Cures Act, a law designed to bring medical innovations to patients faster by modernizing the regulatory process.
To be used by the FDA for a regulatory decision (such as adding a new indication to a drug label), the data must be “fit-for-purpose.” This is a high bar that involves two main pillars:
- Relevance: Does the data actually answer the clinical question? This includes having a sufficient number of patients, a representative population, and the specific outcomes (endpoints) needed to measure success. For example, if you are studying a drug for lung cancer, the RWD must include specific staging information and smoking history, which might be missing from basic billing claims.
- Reliability: Was the data collected accurately? Is it complete? This looks at the “data provenance”—where did the data come from, and how was it handled? The FDA evaluates the quality of the data accrual process and the integrity of the analysis. This is why having a clear “statistical analysis plan” (SAP) before looking at the data is essential to prevent “data dredging” or cherry-picking results.
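A first-pass relevance check can be as simple as asking how often the fields you need are actually filled in. The sketch below does that for the lung cancer example above; the records, field names, and the 80% threshold are all hypothetical, since the FDA framework does not prescribe a numeric cutoff.

```python
def field_completeness(records, required_fields):
    """Share of records with a non-missing value for each required field."""
    total = len(records)
    result = {}
    for field in required_fields:
        present = sum(1 for r in records if r.get(field) not in (None, ""))
        result[field] = present / total if total else 0.0
    return result


# Hypothetical extract from a lung cancer dataset.
records = [
    {"patient_id": "A", "stage": "IIIB", "smoking_history": "former"},
    {"patient_id": "B", "stage": None, "smoking_history": "never"},
    {"patient_id": "C", "stage": "IV", "smoking_history": ""},
    {"patient_id": "D", "stage": "II"},  # smoking_history never recorded
]

MIN_COMPLETENESS = 0.8  # illustrative threshold, not an FDA requirement
completeness = field_completeness(records, ["stage", "smoking_history"])
gaps = [field for field, share in completeness.items() if share < MIN_COMPLETENESS]

print(completeness)
print("Fields below threshold:", gaps)
```

Completeness is only one part of the picture: a field can be fully populated and still be wrong, which is why reliability also looks at provenance and how the data was handled.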
We take regulatory compliance for RWE seriously. Through initiatives like the Sentinel Initiative, the FDA monitors the safety of drugs already on the market by scanning the records of over 100 million patients. This system acts as an early-warning radar. If a safety signal (like an unexpected increase in heart attacks) pops up in the RWD, the FDA can act immediately to protect the public, often years before a formal study could be completed. This proactive approach to safety is one of the most significant shifts in 21st-century medicine.
Fix Privacy and Bias: How Federated AI Secures Real-World Data
If RWD is so great, why don’t we use it for everything? Because it’s hard! There are three massive hurdles that have historically prevented the widespread use of real-world data, and solving them requires cutting-edge technology.
- Data Quality and Standardization: As we mentioned, EHRs are often incomplete or recorded in different formats (e.g., one hospital uses Epic, another uses Cerner). If a doctor forgets to log a patient’s weight or uses a non-standard term for a symptom, that “missing data” can ruin a study. We use Common Data Models (CDMs) like OMOP to “translate” different data sources into a single, unified language.
- Analytical Bias (Confounding by Indication): In the real world, doctors don’t flip a coin to decide which drug to give. They give the strongest drugs to the sickest patients. If we just look at the raw data, it might look like the drug “caused” the patients to get worse, when really they were just sicker to begin with. To fix this, we use advanced statistical techniques like “target trial emulation” and “propensity score matching” to create a fair comparison between groups.
- Privacy and Security: Your health data is your most private information. We must follow HIPAA (in the USA) and GDPR (in Europe) to the letter. Traditionally, this meant “de-identifying” data and moving it to a central server, but this process is slow, risky, and often strips the data of its clinical value.
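Confounding by indication is easier to see with numbers. In the made-up cohort below, the drug helps in both mild and severe patients, yet the raw comparison makes it look harmful because most treated patients are severe. With a single confounder, the propensity score is simply the share of patients treated within each severity group. The sketch uses inverse probability weighting, a close relative of propensity score matching, because it fits in a few lines; it is an illustration, not a full analysis.

```python
from collections import defaultdict


def build_cohort():
    """Hypothetical cohort: (severity, treated, recovered, number of patients)."""
    cells = [
        ("severe", 1, 1, 40), ("severe", 1, 0, 40),
        ("severe", 0, 1, 8), ("severe", 0, 0, 12),
        ("mild", 1, 1, 18), ("mild", 1, 0, 2),
        ("mild", 0, 1, 64), ("mild", 0, 0, 16),
    ]
    return [
        {"severity": s, "treated": t, "recovered": r}
        for s, t, r, n in cells
        for _ in range(n)
    ]


def naive_difference(cohort):
    """Raw recovery rate in treated minus untreated, ignoring severity."""
    def rate(flag):
        group = [p for p in cohort if p["treated"] == flag]
        return sum(p["recovered"] for p in group) / len(group)
    return rate(1) - rate(0)


def ipw_difference(cohort):
    """Recovery rate difference after inverse probability weighting."""
    n = defaultdict(int)
    n_treated = defaultdict(int)
    for p in cohort:
        n[p["severity"]] += 1
        n_treated[p["severity"]] += p["treated"]
    # Propensity score: probability of treatment given severity.
    propensity = {s: n_treated[s] / n[s] for s in n}

    sums = {1: [0.0, 0.0], 0: [0.0, 0.0]}  # [weighted recoveries, total weight]
    for p in cohort:
        ps = propensity[p["severity"]]
        weight = 1 / ps if p["treated"] else 1 / (1 - ps)
        sums[p["treated"]][0] += weight * p["recovered"]
        sums[p["treated"]][1] += weight
    return sums[1][0] / sums[1][1] - sums[0][0] / sums[0][1]


cohort = build_cohort()
naive = naive_difference(cohort)
adjusted = ipw_difference(cohort)
print(f"Naive difference:    {naive:+.2f}")
print(f"Weighted difference: {adjusted:+.2f}")
```

The naive estimate comes out negative and the weighted one positive, which is exactly the trap described above. Real studies have many confounders, so the propensity score is usually estimated with a regression model rather than read off a single table.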
This is where Lifebit comes in. We use federated learning and Trusted Research Environments (TREs). Instead of moving sensitive data to a central server (which is risky and often legally impossible across borders), our AI goes to the data. The data stays safely behind the hospital’s or the national biobank’s firewall. The researcher only sees the aggregated “insights,” never the individual patient records. This “compute-to-data” model is the gold standard for using real-world population data on disease to train AI models while keeping everyone’s identity 100% private. It allows for global collaboration on diseases like COVID-19 or rare cancers without ever compromising patient confidentiality.
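The “compute-to-data” idea can be shown with a toy example: each site runs the same function locally and returns only counts and sums, and the coordinator combines them. This is a simplified sketch of federated aggregation, not Lifebit’s platform and not full federated model training; the site names, lab values, and minimum cell size are assumptions for illustration.

```python
def local_summary(patient_values, min_count=5):
    """Runs inside a site's firewall and returns aggregates only.

    Sites with fewer than min_count patients return nothing, so that
    very small groups cannot be singled out (hypothetical policy).
    """
    if len(patient_values) < min_count:
        return None
    return {"n": len(patient_values), "total": sum(patient_values)}


def pooled_mean(summaries):
    """Coordinator side: combines site aggregates without seeing any row."""
    usable = [s for s in summaries if s is not None]
    n = sum(s["n"] for s in usable)
    return sum(s["total"] for s in usable) / n if n else None


# Hypothetical HbA1c values held at three sites; they never leave the site.
site_data = {
    "hospital_a": [7.1, 6.8, 7.4, 8.0, 6.5, 7.2],
    "hospital_b": [6.9, 7.0, 7.3, 6.6, 7.7],
    "hospital_c": [9.1, 8.7],  # too few patients, so the site returns nothing
}

summaries = [local_summary(values) for values in site_data.values()]
mean_hba1c = pooled_mean(summaries)
print(f"Pooled mean across reporting sites: {mean_hba1c:.2f}")
```

The same pattern extends to model training, where sites send back model updates instead of sums, but the principle is unchanged: patient-level rows stay where they are.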
Real-World Data FAQ: Stop Guessing and Start Using RWE
What is the main difference between RWD and RWE?
RWD (Real-World Data) is the raw information collected during routine care—like a lab result, a pharmacy receipt, or a heart rate log from a watch. RWE (Real-World Evidence) is the clinical insight you get after you analyze that data using rigorous scientific methods. RWD is the data; RWE is the proof. You cannot have RWE without RWD, but RWD without analysis is just noise.
Can Real-World Evidence replace Randomized Controlled Trials (RCTs)?
Usually, no. RCTs are still the best way to prove a drug can work in a controlled setting (efficacy). RWE is a “complement.” It tells us how the drug works in the “wild” (effectiveness) and helps find rare side effects that only show up after a million people take the medicine. However, in rare diseases, RWE is increasingly being used as a “synthetic control arm” to replace the placebo group.
Why is real-world data so important for rare diseases?
Because there aren’t enough patients to fill a traditional trial. If a disease only affects 200 people nationwide, you can’t put 100 of them in a placebo group. RWD allows us to pool data from across the globe to understand how a rare disease behaves and whether a new treatment is making a statistically significant difference compared to the natural history of the disease.
How does the FDA ensure RWD is high quality?
The FDA uses the “Fit-for-Purpose” standard, which evaluates the Relevance (does it contain the right data points?) and Reliability (was the data collected accurately and consistently?). They also require a pre-specified statistical analysis plan to ensure researchers don’t just look for the results they want to find.
What is Federated Learning in the context of RWD?
Federated learning is a machine learning technique that trains an algorithm across multiple decentralized edge servers (like different hospitals) holding local data samples, without exchanging them. This allows researchers to gain insights from massive datasets across different countries while keeping the data behind local firewalls to comply with privacy laws like GDPR.
Start Leveraging Real-World Data for Precision Medicine Today
The future of medicine isn’t just in the lab—it’s in the digital footprints we generate every single day. From the watch on your wrist to the insurance claim from your last check-up, real-world data examples are providing the FDA and life science organizations with the evidence they need to make healthcare safer, faster, and more personalized.
As we move toward a world of precision medicine, the ability to analyze these massive, complex datasets securely will be the key to the next great medical breakthrough. We are moving away from a world of “average” medicine for the “average” patient and toward a world where every treatment is informed by the real-world experiences of millions of similar patients. At Lifebit, we are proud to provide the federated AI platform and Trusted Research Environments that make this research possible across five continents, ensuring that data stays private while insights move the world forward.
Find out how to securely leverage Real-World Data for your research