
OMOP CDM in Federal Health: Why CMS, NIH, and the VA Standardize on It

The Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) is the database schema that US federal health programs use to harmonize electronic health record (EHR), claims, and outcomes data across hundreds of institutions. The current version (OMOP CDM v5.4, maintained by the OHDSI community) is the analytic substrate for All of Us, the FDA Sentinel System, the VA’s Million Veteran Program, the NIH NCATS N3C national COVID cohort, and the PCORnet research network. If you’re working in federal health data infrastructure, OMOP is the default — not an option.

This guide explains what OMOP CDM is, why federal programs converged on it, when to use OMOP vs FHIR, and what a production OMOP deployment actually looks like.


What OMOP CDM is

OMOP CDM is a person-centric relational schema that re-represents heterogeneous clinical data — EHR encounters, prescriptions, lab results, claims, vital signs, demographics — in a single uniform format. The model has about 15 core tables organized into four groups:

  • Person — de-identified patient demographics
  • Observation period — spans during which a patient was observed in the data source
  • Clinical events — condition_occurrence (diagnoses), drug_exposure (medications), procedure_occurrence (procedures), measurement (labs, vitals), observation (everything else)
  • Cost & administrative — cost, visit_occurrence, visit_detail, death, payer_plan_period
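A minimal sketch of how these tables are queried in practice, using an in-memory SQLite stand-in for a tiny slice of the schema (column sets are abbreviated here; real OMOP tables carry many more fields, and the sample rows are illustrative):

```python
import sqlite3

# Tiny in-memory stand-in for two core OMOP tables (columns abbreviated).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    gender_concept_id INTEGER,
    year_of_birth INTEGER
);
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id INTEGER,
    condition_concept_id INTEGER,   -- standard OMOP concept, not a source code
    condition_start_date TEXT
);
""")
con.executemany("INSERT INTO person VALUES (?, ?, ?)",
                [(1, 8507, 1950), (2, 8532, 1962), (3, 8507, 1978)])
con.executemany("INSERT INTO condition_occurrence VALUES (?, ?, ?, ?)",
                [(10, 1, 4329847, "2024-01-05"),   # myocardial infarction
                 (11, 2, 4329847, "2024-03-12"),
                 (12, 3, 201826,  "2024-02-20")])  # a different condition

# Cohort: distinct persons with a myocardial infarction diagnosis.
(n,) = con.execute("""
    SELECT COUNT(DISTINCT person_id)
    FROM condition_occurrence
    WHERE condition_concept_id = 4329847
""").fetchone()
print(n)  # 2
```

Because every site exposes the same table and column names, this exact query runs unchanged against any OMOP-compliant database.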

Every clinical concept in OMOP — every diagnosis, lab, medication, procedure — is mapped to a standard concept in the OMOP vocabulary, which is itself a curated overlay of LOINC, SNOMED CT, RxNorm, UCUM, ICD-10, CPT, HCPCS, and other reference terminologies. The vocabulary is maintained by OHDSI and distributed through the Athena tool. This is the part that makes cross-institution analytics possible: a “myocardial infarction” coded as I21.0 in one site and 410.91 in another both resolve to OMOP concept 4329847, and queries operate on the standard concept.

Why federal programs converged on OMOP

Federal health data programs hit the same structural problem every multi-site study faces: every contributing institution stores its data differently (Epic vs Cerner, ICD-9 vs ICD-10, custom lab codes, vendor-specific medication catalogs), and writing site-specific analytic code doesn’t scale. OMOP solves this in a way that’s auditable and federation-friendly.

Five federal programs that standardized on OMOP:

| Program | Sponsor | OMOP role | Scale |
| --- | --- | --- | --- |
| All of Us Researcher Workbench | NIH | Primary analytic representation for EHR + survey data | 633K+ participants with EHR data |
| FDA Sentinel System / FDA BEST | FDA | Sentinel Common Data Model (SCDM) is OMOP-aligned; BEST uses native OMOP | 350M+ patients across data partners |
| VA Million Veteran Program | VA | OMOP for cross-cohort analytic queries | 1M+ veterans enrolled |
| N3C — National COVID Cohort Collaborative | NIH NCATS | OMOP as the harmonized substrate across 75+ institutions | 22M+ patients |
| PCORnet | PCORI | PCORnet CDM is OMOP-compatible | 80M+ patients |

The convergence is not coincidence. OMOP gives federal programs four properties they need simultaneously:

  1. A documented, versioned schema. OMOP CDM v5.4 is a stable specification with reproducible vocabulary releases. That meets federal audit and reproducibility expectations.
  2. A mature open-source ecosystem. OHDSI HADES — Health Analytics Data-to-Evidence Suite — gives federal teams pre-built tools for cohort definition (ATLAS), characterization, population-level effect estimation, and patient-level prediction. They don’t have to build analytic primitives from scratch.
  3. Cross-site portability without sharing raw data. Studies designed against the OMOP schema run on any OMOP-compliant data partner. The FDA Sentinel design — analytic code travels, data stays — depends on this.
  4. Community curation. The OMOP vocabulary is maintained as a public good. When SNOMED CT releases an update, the OHDSI vocabulary working group propagates the changes; downstream programs inherit the maintenance.

OMOP vs FHIR — when to use each

This is the single most common question federal health teams ask. The short answer: they solve different problems.

| Dimension | OMOP CDM v5.4 | FHIR R4 |
| --- | --- | --- |
| Primary use | Observational analytics across populations | Real-time interoperability between systems |
| Optimization | Cohort discovery, statistical queries, ML | Single-patient API calls, EHR-to-EHR exchange |
| Data shape | Person-centric flat relational tables | Resource-centric REST API |
| Vocabulary | Standard concepts from curated OHDSI vocabulary | Native terminologies (LOINC, SNOMED CT, RxNorm) bound per element |
| Maintained by | OHDSI community | HL7 International |
| Used in | All of Us, FDA Sentinel, VA MVP, N3C, PCORnet | EHR exchange, USCDI, 21st Century Cures Act API mandate |
| Best for federal AP1 / data-platform work | ✅ The default | Mirror representation for clinical events |

In practice, federal data platforms maintain both representations of the same underlying data. The ARPA-H CIRCLE program’s AP1 Clinical Data & Analysis Platform — and our work on the CHORDS proposal led by Regenstrief Institute — uses FHIR R4 as the wire format for clinical-event ingestion and OMOP CDM v5.4 as the analytic representation TA performers query against.

What an OMOP harmonization pipeline actually does

For each contributing institution’s source data, an OMOP pipeline runs through six stages:

  1. Source-format parsing. HL7v2 messages, FHIR R4 resources, X12 claims, lab files in proprietary vendor formats — each parsed into a normalized intermediate representation.
  2. Vocabulary mapping. Each source code (vendor lab code, internal procedure code, free-text drug name) mapped to its OMOP standard concept ID. The OHDSI Athena vocabulary release is the source of truth; per-site concept_map overlays handle the long tail.
  3. Person resolution. All records about the same patient unified into a single person_id — within a site, often deterministic; across sites, typically via privacy-preserving record linkage like Datavant or the N3C Linkage Honest Broker.
  4. Time-domain alignment. Encounters, prescriptions, and observations placed on a coherent timeline. For ICU cohorts, this includes “N hours since ICU admission” panels critical to digital-twin modeling.
  5. Quality validation. Per the Kahn et al. data-quality framework — conformance (does the data fit the schema?), completeness (is expected data present?), plausibility (are values in clinically realistic ranges?).
  6. Materialization. Validated OMOP tables landed in the analytic store, version-tagged, and exposed to researchers through OHDSI ATLAS or direct SQL.

Time to onboard a new data source matters. Industry baseline is 6–18 months per site for manual ETL. Production platforms with AI-assisted mapping — like Lifebit’s Trusted Data Factory, deployed across NIH National Library of Medicine, Genomics England, and the Danish National Genome Center — deliver 1-day source ingestion and 2–10-day OMOP transformation. That’s the difference between hitting Phase I milestones and missing them.

AI-assisted OMOP mapping — current state

Two AI-assisted mapping techniques are in production today for OMOP harmonization:

  • BGLM-style LOINC mapping for laboratory codes (Liu et al., JAMIA Open 2022) — big-data-guided mapping of long-tail vendor lab codes to LOINC with multi-language support. Achieves >99% coverage on previously untranslatable lab catalogs.
  • LLM-assisted SNOMED CT and RxNorm mapping — retrieval-augmented large language model proposals gated by UMLS semantic-network compatibility checks. Auto-applied for high-confidence mappings; human-review queue for the rest.

Critically, every applied mapping in a well-designed system carries a mapping_method attribute (exact, athena_default, bglm, llm_assisted, human_review) so downstream analyses can stratify quality by mapping path. That auditability is what distinguishes AI-assisted mapping from black-box automation — and what federal IV&V reviewers will expect.
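The gating and stratification pattern can be sketched in a few lines. The records and the 0.9 auto-apply threshold are illustrative assumptions, not values from any specific system:

```python
from collections import Counter

# Each proposed mapping carries its method and a confidence score.
# High-confidence mappings are auto-applied; the rest go to human review.
proposed = [
    {"code": "K-NA",   "concept_id": 3019550, "method": "bglm",         "confidence": 0.99},
    {"code": "GLU-X",  "concept_id": 3004501, "method": "llm_assisted", "confidence": 0.62},
    {"code": "2160-0", "concept_id": 3016723, "method": "exact",        "confidence": 1.00},
]

AUTO_APPLY_THRESHOLD = 0.9  # illustrative policy choice
applied      = [m for m in proposed if m["confidence"] >= AUTO_APPLY_THRESHOLD]
review_queue = [m for m in proposed if m["confidence"] <  AUTO_APPLY_THRESHOLD]

# Stratify applied mappings by mapping path for downstream quality analysis.
print(dict(Counter(m["method"] for m in applied)))  # {'bglm': 1, 'exact': 1}
print([m["code"] for m in review_queue])            # ['GLU-X']
```

Persisting the mapping_method tag alongside each record is what lets an analyst later ask whether a result holds when LLM-assisted mappings are excluded.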

Frequently asked questions

What is the OMOP Common Data Model?
The OMOP (Observational Medical Outcomes Partnership) Common Data Model is a person-centric relational schema for harmonizing EHR, claims, and outcomes data across institutions. The current version is OMOP CDM v5.4, maintained by the OHDSI community. Federal programs including All of Us, FDA Sentinel, the VA Million Veteran Program, N3C, and PCORnet use OMOP as their analytic substrate.

Is OMOP CDM the same as FHIR?
No. OMOP CDM is optimized for observational analytics across populations — cohort discovery, statistical queries, machine learning. FHIR R4 is optimized for real-time interoperability between systems and single-patient API calls. They are complementary, and most federal data platforms maintain both representations of the same underlying clinical events.

Who uses OMOP CDM?
Federal: All of Us Researcher Workbench (NIH), FDA Sentinel and BEST systems, VA Million Veteran Program, N3C national COVID cohort (NIH NCATS), PCORnet (PCORI). Academic and industry: most major academic medical centers participating in OHDSI, pharmaceutical companies running real-world evidence studies, and CRO platforms supporting federal research.

How long does it take to convert source data to OMOP?
Industry baseline is 6–18 months per institution for manual ETL. Production federated TRE platforms with AI-assisted mapping (BGLM for LOINC, LLM-assisted for SNOMED/RxNorm) deliver 1-day source ingestion and 2–10-day OMOP transformation per site. The speed depends almost entirely on the harmonization automation layer.

What’s the difference between OMOP CDM v5.3 and v5.4?
v5.4 (current) added the episode and episode_event tables for grouping clinical events into care episodes, expanded the device_exposure table, and improved the vocabulary representation. Federal programs are migrating from v5.3 to v5.4 — ensure new deployments target v5.4 from the start.

What is OHDSI?
OHDSI is the Observational Health Data Sciences and Informatics consortium — an open international collaboration that maintains the OMOP CDM specification, the OMOP vocabulary, and the HADES analytic tool suite. Federal programs participate in OHDSI working groups that govern the data model’s evolution.

Can I use OMOP without OHDSI tools?
Yes — OMOP is a schema specification, and any platform can build against it. In practice most teams adopt at least the Athena vocabulary tool (for terminology mappings) and ATLAS (for cohort definition) because rebuilding those primitives is uneconomical.

How does federated analytics work over OMOP?
Studies are defined as OHDSI HADES analytic packages or DataSHIELD-pattern federated queries. The package is distributed to each contributing institution’s OMOP database; per-site results (summary statistics, model gradients, cohort counts) are returned and combined centrally. Raw patient-level data never leaves the source institution. This is the architecture used in N3C, PCORnet, and the FDA Sentinel system.
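A minimal sketch of that pattern, with two toy sites and an illustrative small-cell suppression threshold of 10 (real programs use HADES study packages and site-specific disclosure rules):

```python
# Federated counting sketch: each site runs the same cohort logic locally
# and releases only an aggregate; the coordinator sums per-site counts.
# Site data and the threshold of 10 are illustrative.

def site_cohort_count(local_condition_rows, concept_id):
    """Runs inside a site's environment; only the count leaves."""
    n = len({r["person_id"] for r in local_condition_rows
             if r["condition_concept_id"] == concept_id})
    return n if n >= 10 else 0  # suppress small cells before release

SITES = {
    "site_a": [{"person_id": i, "condition_concept_id": 4329847} for i in range(25)],
    "site_b": [{"person_id": i, "condition_concept_id": 4329847} for i in range(4)],
}

total = sum(site_cohort_count(rows, 4329847) for rows in SITES.values())
print(total)  # 25  (site_b's count of 4 is suppressed)
```

The same shape generalizes from counts to summary statistics and model updates; only the released aggregate changes.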

How Lifebit fits into federal OMOP deployments

Lifebit’s federated trusted research environment is the platform layer for federal AP1-shaped data infrastructure:

  • Ingestion connectors for HL7v2, FHIR R4, X12 claims, VCF genomics, mass-spec proteomics, and DICOM imaging
  • AI-assisted harmonization to FHIR R4 + OMOP CDM v5.4 + LOINC + SNOMED CT + UCUM + RxNorm
  • Federated analytics workbench with OHDSI HADES tooling, JupyterLab + RStudio Pro, and LLM-assisted natural-language query
  • NIST SP 800-53 r5 + NIST SP 800-188 + HIPAA §164.514(b) compliance

The same platform is in production today at NIH National Library of Medicine (under FedRAMP ATO), Genomics England, CanPath (Canada), the Danish National Genome Center, and Cambridge Biomedical Research Centre. Across deployments: 275M+ patient records, 1,500+ research projects, six government deployments on three continents.

If you’re evaluating OMOP CDM infrastructure for a federal program — ARPA-H, NIH, FDA, VA, CMS — book a 30-minute scoping call and we’ll walk through the architecture that fits your scale and timeline.


Sources:
OMOP Common Data Model — OHDSI
OHDSI Athena vocabulary tool
OHDSI HADES analytic tools
All of Us Researcher Workbench
FDA Sentinel System
N3C — National COVID Cohort Collaborative
PCORnet
VA Million Veteran Program
Kahn et al. data-quality framework — EGEMS 2016
BGLM LOINC mapping — Liu et al., JAMIA Open 2022

Last updated: May 9, 2026

