OMOP CDM in Federal Health: Why CMS, NIH, and the VA Standardize on It
The Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) is the database schema that US federal health programs use to harmonize electronic health record (EHR), claims, and outcomes data across hundreds of institutions. The current version (OMOP CDM v5.4, maintained by the OHDSI community) is the analytic substrate for All of Us, the FDA Sentinel System, the VA’s Million Veteran Program, the NIH NCATS N3C national COVID cohort, and the PCORnet research network. If you’re working in federal health data infrastructure, OMOP is the default — not an option.
This guide explains what OMOP CDM is, why federal programs converged on it, when to use OMOP vs FHIR, and what a production OMOP deployment actually looks like.
What OMOP CDM is
OMOP CDM is a person-centric relational schema that re-represents heterogeneous clinical data — EHR encounters, prescriptions, lab results, claims, vital signs, demographics — in a single uniform format. The model has about 15 core tables organized into four groups:
- Person — de-identified patient demographics
- Observation period — spans during which a patient was observed in the data source
- Clinical events — condition_occurrence (diagnoses), drug_exposure (medications), procedure_occurrence (procedures), measurement (labs, vitals), observation (everything else)
- Cost & administrative — cost, visit_occurrence, visit_detail, death, payer_plan_period
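The person-centric layout above can be sketched as a tiny relational schema. This is an illustrative subset (SQLite, trimmed columns, made-up rows), not the full v5.4 DDL:

```python
import sqlite3

# Minimal, illustrative subset of the OMOP person-centric layout.
# Column lists are trimmed for brevity; the real v5.4 DDL has many more fields.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    gender_concept_id INTEGER,
    year_of_birth INTEGER
);
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id INTEGER REFERENCES person(person_id),
    condition_concept_id INTEGER,   -- standard OMOP concept
    condition_start_date TEXT
);
""")
conn.executemany("INSERT INTO person VALUES (?, ?, ?)",
                 [(1, 8507, 1954), (2, 8532, 1967)])
conn.executemany("INSERT INTO condition_occurrence VALUES (?, ?, ?, ?)",
                 [(10, 1, 4329847, "2021-03-02"),   # myocardial infarction
                  (11, 2, 201826, "2020-11-15")])   # a second, unrelated condition
# Cohort-style query: how many distinct persons have an MI record?
n = conn.execute("""
    SELECT COUNT(DISTINCT person_id) FROM condition_occurrence
    WHERE condition_concept_id = 4329847
""").fetchone()[0]
print(n)  # 1
```

Because every table keys on person_id and every event carries a standard concept ID, cohort queries like this one are the same SQL at every site.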
Every clinical concept in OMOP — every diagnosis, lab, medication, procedure — is mapped to a standard concept in the OMOP vocabulary, which is itself a curated overlay of LOINC, SNOMED CT, RxNorm, UCUM, ICD-10, CPT, HCPCS, and other reference terminologies. The vocabulary is maintained by OHDSI and distributed through the Athena tool. This is the part that makes cross-institution analytics possible: a “myocardial infarction” coded as I21.0 in one site and 410.91 in another both resolve to OMOP concept 4329847, and queries operate on the standard concept.
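That resolution step can be sketched in a few lines. The source-concept IDs and the in-memory tables below are illustrative placeholders; in a real deployment the concept and concept_relationship tables come from an Athena vocabulary download:

```python
# Hypothetical in-memory slice of the OMOP vocabulary tables.
# The two source-concept IDs here are made up for illustration.
concept = {
    90000001: {"code": "410.91",   "vocabulary": "ICD9CM",  "standard": False},
    90000002: {"code": "I21.0",    "vocabulary": "ICD10CM", "standard": False},
    4329847:  {"code": "22298006", "vocabulary": "SNOMED",  "standard": True},
}
# "Maps to" relationships: non-standard source concept -> standard concept.
maps_to = {90000001: 4329847, 90000002: 4329847}

def to_standard(concept_id: int) -> int:
    """Resolve any concept to its standard equivalent (identity if already standard)."""
    if concept[concept_id]["standard"]:
        return concept_id
    return maps_to[concept_id]

# Both the ICD-9 and ICD-10 MI codes resolve to the same standard concept,
# so a single query against 4329847 covers both coding systems.
print(to_standard(90000001), to_standard(90000002))  # 4329847 4329847
```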
Why federal programs converged on OMOP
Federal health data programs hit the same structural problem every multi-site study faces: every contributing institution stores its data differently (Epic vs Cerner, ICD-9 vs ICD-10, custom lab codes, vendor-specific medication catalogs), and writing site-specific analytic code doesn’t scale. OMOP solves this in a way that’s auditable and federation-friendly.
Five federal programs that standardized on OMOP:
| Program | Sponsor | OMOP role | Scale |
|---|---|---|---|
| All of Us Researcher Workbench | NIH | Primary analytic representation for EHR + survey data | 633K+ participants with EHR data |
| FDA Sentinel System / FDA BEST | FDA | Sentinel Common Data Model (SCDM) is OMOP-aligned; BEST uses native OMOP | 350M+ patients across data partners |
| VA Million Veteran Program | VA | OMOP for cross-cohort analytic queries | 1M+ veterans enrolled |
| N3C — National COVID Cohort Collaborative | NIH NCATS | OMOP as the harmonized substrate across 75+ institutions | 22M+ patients |
| PCORnet | PCORI | PCORnet CDM is OMOP-compatible | 80M+ patients |
The convergence is not coincidence. OMOP gives federal programs four properties they need simultaneously:
- A documented, versioned schema. OMOP CDM v5.4 is a stable specification with reproducible vocabulary releases. That meets federal audit and reproducibility expectations.
- A mature open-source ecosystem. OHDSI HADES — Health Analytics Data-to-Evidence Suite — gives federal teams pre-built tools for cohort definition (ATLAS), characterization, population-level effect estimation, and patient-level prediction. They don’t have to build analytic primitives from scratch.
- Cross-site portability without sharing raw data. Studies designed against the OMOP schema run on any OMOP-compliant data partner. The FDA Sentinel design — analytic code travels, data stays — depends on this.
- Community curation. The OMOP vocabulary is maintained as a public good. When SNOMED CT releases an update, the OHDSI vocabulary working group propagates the changes; downstream programs inherit the maintenance.
OMOP vs FHIR — when to use each
This is the single most common question federal health teams ask. The short answer: they solve different problems.
| Dimension | OMOP CDM v5.4 | FHIR R4 |
|---|---|---|
| Primary use | Observational analytics across populations | Real-time interoperability between systems |
| Optimization | Cohort discovery, statistical queries, ML | Single-patient API calls, EHR-to-EHR exchange |
| Data shape | Person-centric flat relational tables | Resource-centric REST API |
| Vocabulary | Standard concepts from curated OHDSI vocabulary | Native terminologies (LOINC, SNOMED CT, RxNorm) bound per element |
| Maintained by | OHDSI community | HL7 International |
| Used in | All of Us, FDA Sentinel, VA MVP, N3C, PCORnet | EHR exchange, USCDI, 21st Century Cures Act API mandate |
| Best for federal AP1 / data-platform work | ✅ The default analytic representation | Complementary wire format for clinical events |
In practice, federal data platforms maintain both representations of the same underlying data. The ARPA-H CIRCLE program’s AP1 Clinical Data & Analysis Platform — and our work on the CHORDS proposal led by Regenstrief Institute — uses FHIR R4 as the wire format for clinical-event ingestion and OMOP CDM v5.4 as the analytic representation TA performers query against.
What an OMOP harmonization pipeline actually does
For each contributing institution’s source data, an OMOP pipeline runs through six stages:
- Source-format parsing. HL7v2 messages, FHIR R4 resources, X12 claims, lab files in proprietary vendor formats — each parsed into a normalized intermediate representation.
- Vocabulary mapping. Each source code (vendor lab code, internal procedure code, free-text drug name) mapped to its OMOP standard concept ID. The OHDSI Athena vocabulary release is the source of truth; per-site concept_map overlays handle the long tail.
- Person resolution. All records about the same patient unified into a single person_id — within a site, often deterministic; across sites, typically via privacy-preserving record linkage like Datavant or the N3C Linkage Honest Broker.
- Time-domain alignment. Encounters, prescriptions, and observations placed on a coherent timeline. For ICU cohorts, this includes “N hours since ICU admission” panels critical to digital-twin modeling.
- Quality validation. Per the Kahn et al. data-quality framework — conformance (does the data fit the schema?), completeness (is expected data present?), plausibility (are values in clinically realistic ranges?).
- Materialization. Validated OMOP tables landed in the analytic store, version-tagged, and exposed to researchers through OHDSI ATLAS or direct SQL.
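The quality-validation stage in the Kahn framing is the easiest to make concrete. A minimal sketch, with an assumed record shape and an illustrative plausibility range (field names follow OMOP’s measurement table; the threshold is invented for the example):

```python
from datetime import date

# Illustrative plausibility bound for a heart-rate measurement, deliberately wide.
PLAUSIBLE_HEART_RATE = (20, 300)  # beats/min

def validate_measurement(rec: dict) -> list:
    """Return a list of Kahn-style data-quality findings for one record."""
    issues = []
    # Conformance: does the record fit the expected schema?
    for field in ("person_id", "measurement_concept_id",
                  "value_as_number", "measurement_date"):
        if field not in rec:
            issues.append(f"conformance: missing {field}")
    # Completeness: is the expected value present?
    if rec.get("value_as_number") is None:
        issues.append("completeness: null value_as_number")
    # Plausibility: is the value clinically realistic?
    v = rec.get("value_as_number")
    if v is not None and not (PLAUSIBLE_HEART_RATE[0] <= v <= PLAUSIBLE_HEART_RATE[1]):
        issues.append(f"plausibility: heart rate {v} outside {PLAUSIBLE_HEART_RATE}")
    return issues

good = {"person_id": 1, "measurement_concept_id": 3027018,
        "value_as_number": 72, "measurement_date": date(2024, 5, 1)}
bad = {"person_id": 1, "measurement_concept_id": 3027018,
       "value_as_number": 900, "measurement_date": date(2024, 5, 1)}
print(validate_measurement(good))  # []
print(validate_measurement(bad))   # ['plausibility: heart rate 900 outside (20, 300)']
```

Production validators run checks like these per table and per concept, and gate materialization on the aggregate results.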
Time to onboard a new data source matters. Industry baseline is 6–18 months per site for manual ETL. Production platforms with AI-assisted mapping — like Lifebit’s Trusted Data Factory, deployed across NIH National Library of Medicine, Genomics England, and the Danish National Genome Center — deliver 1-day source ingestion and 2–10-day OMOP transformation. That’s the difference between hitting Phase I milestones and missing them.
AI-assisted OMOP mapping — current state
Two AI-assisted mapping techniques are in production today for OMOP harmonization:
- BGLM-style LOINC mapping for laboratory codes (Liu et al., JAMIA Open 2022) — big-data-guided mapping of long-tail vendor lab codes to LOINC with multi-language support. Achieves >99% coverage on previously untranslatable lab catalogs.
- LLM-assisted SNOMED CT and RxNorm mapping — retrieval-augmented large language model proposals gated by UMLS semantic-network compatibility checks. Auto-applied for high-confidence mappings; human-review queue for the rest.
Critically, every applied mapping in a well-designed system carries a mapping_method attribute (exact, athena_default, bglm, llm_assisted, human_review) so downstream analyses can stratify quality by mapping path. That auditability is what distinguishes AI-assisted mapping from black-box automation — and what federal IV&V reviewers will expect.
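A minimal sketch of what that stratification looks like downstream; the records, source codes, and concept IDs are made up for illustration:

```python
from collections import Counter

# Illustrative mapped-record stream; mapping_method values follow the
# taxonomy above (exact, athena_default, bglm, llm_assisted, human_review).
records = [
    {"source_code": "K-NA",         "concept_id": 3019550, "mapping_method": "bglm"},
    {"source_code": "I21.0",        "concept_id": 4329847, "mapping_method": "athena_default"},
    {"source_code": "GLUC-POC",     "concept_id": 3004501, "mapping_method": "llm_assisted"},
    {"source_code": "HB A1C LOCAL", "concept_id": 3004410, "mapping_method": "human_review"},
]

# Stratify record volume by mapping path for the quality report.
by_method = Counter(r["mapping_method"] for r in records)
print(dict(by_method))

# A downstream sensitivity analysis might rerun a result on the
# high-confidence subset only:
high_conf = [r for r in records
             if r["mapping_method"] in {"exact", "athena_default"}]
print(len(high_conf))  # 1
```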
Frequently asked questions
What is the OMOP Common Data Model?
The OMOP (Observational Medical Outcomes Partnership) Common Data Model is a person-centric relational schema for harmonizing EHR, claims, and outcomes data across institutions. The current version is OMOP CDM v5.4, maintained by the OHDSI community. Federal programs including All of Us, FDA Sentinel, the VA Million Veteran Program, N3C, and PCORnet use OMOP as their analytic substrate.
Is OMOP CDM the same as FHIR?
No. OMOP CDM is optimized for observational analytics across populations — cohort discovery, statistical queries, machine learning. FHIR R4 is optimized for real-time interoperability between systems and single-patient API calls. They are complementary, and most federal data platforms maintain both representations of the same underlying clinical events.
Who uses OMOP CDM?
Federal: All of Us Researcher Workbench (NIH), FDA Sentinel and BEST systems, VA Million Veteran Program, N3C national COVID cohort (NIH NCATS), PCORnet (PCORI). Academic and industry: most major academic medical centers participating in OHDSI, pharmaceutical companies running real-world evidence studies, and CRO platforms supporting federal research.
How long does it take to convert source data to OMOP?
Industry baseline is 6–18 months per institution for manual ETL. Production federated TRE platforms with AI-assisted mapping (BGLM for LOINC, LLM-assisted for SNOMED/RxNorm) deliver 1-day source ingestion and 2–10-day OMOP transformation per site. The speed depends almost entirely on the harmonization automation layer.
What’s the difference between OMOP CDM v5.3 and v5.4?
v5.4 (current) added the episode and episode_event tables for grouping clinical events into care episodes, expanded the device_exposure table, and improved the vocabulary representation. Federal programs are migrating from v5.3 to v5.4 — ensure new deployments target v5.4 from the start.
What is OHDSI?
OHDSI is the Observational Health Data Sciences and Informatics consortium — an open international collaboration that maintains the OMOP CDM specification, the OMOP vocabulary, and the HADES analytic tool suite. Federal programs participate in OHDSI working groups that govern the data model’s evolution.
Can I use OMOP without OHDSI tools?
Yes — OMOP is a schema specification, and any platform can build against it. In practice most teams adopt at least the Athena vocabulary tool (for terminology mappings) and ATLAS (for cohort definition) because rebuilding those primitives is uneconomical.
How does federated analytics work over OMOP?
Studies are defined as OHDSI HADES analytic packages or DataSHIELD-pattern federated queries. The package is distributed to each contributing institution’s OMOP database; per-site results (summary statistics, model gradients, cohort counts) are returned and combined centrally. Raw patient-level data never leaves the source institution. This is the architecture used in N3C, PCORnet, and the FDA Sentinel system.
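The pattern can be sketched end to end. Site data, the concept being counted, and the suppression threshold are all illustrative; real networks enforce much higher small-cell thresholds before any aggregate leaves a site:

```python
# Sketch of the federated pattern: the same cohort count runs at each site,
# only aggregates travel, and the coordinator combines them.
sites = {
    "site_a": [{"person_id": 1, "condition_concept_id": 4329847},
               {"person_id": 2, "condition_concept_id": 201826}],
    "site_b": [{"person_id": 7, "condition_concept_id": 4329847}],
}

MIN_CELL = 1  # real deployments suppress small cells (e.g. counts < 20)

def local_cohort_count(rows, concept_id):
    """Runs inside the site's enclave; returns only an aggregate count."""
    persons = {r["person_id"] for r in rows
               if r["condition_concept_id"] == concept_id}
    n = len(persons)
    return n if n >= MIN_CELL else 0  # small-cell suppression before release

# Coordinator: distribute the query, collect per-site aggregates, combine.
per_site = {name: local_cohort_count(rows, 4329847)
            for name, rows in sites.items()}
total = sum(per_site.values())
print(per_site, total)  # {'site_a': 1, 'site_b': 1} 2
```

Note that a person enrolled at two sites would be double-counted here; that is exactly the gap the cross-site person-resolution step (privacy-preserving record linkage) closes.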
How Lifebit fits into federal OMOP deployments
Lifebit’s federated trusted research environment is the platform layer for federal AP1-shaped data infrastructure: ingestion connectors for HL7v2, FHIR R4, X12 claims, VCF genomics, mass-spec proteomics, and DICOM imaging; AI-assisted harmonization to FHIR R4 + OMOP CDM v5.4 + LOINC + SNOMED CT + UCUM + RxNorm; federated analytics workbench with OHDSI HADES tooling, JupyterLab + RStudio Pro, and LLM-assisted natural-language query; NIST SP 800-53 r5 + NIST SP 800-188 + HIPAA §164.514(b) compliance.
The same platform is in production today at NIH National Library of Medicine (under FedRAMP ATO), Genomics England, CanPath (Canada), the Danish National Genome Center, and Cambridge Biomedical Research Centre. Across deployments: 275M+ patient records, 1,500+ research projects, six government deployments on three continents.
If you’re evaluating OMOP CDM infrastructure for a federal program — ARPA-H, NIH, FDA, VA, CMS — book a 30-minute scoping call and we’ll walk through the architecture that fits your scale and timeline.
Sources:
– OMOP Common Data Model — OHDSI
– OHDSI Athena vocabulary tool
– OHDSI HADES analytic tools
– All of Us Researcher Workbench
– FDA Sentinel System
– N3C — National COVID Cohort Collaborative
– PCORnet
– VA Million Veteran Program
– Kahn et al. data-quality framework — EGEMS 2016
– BGLM LOINC mapping — Liu et al., JAMIA Open 2022
Last updated: May 9, 2026
