Considerations for De-identification of the OMOP Common Data Model #OHDSISocialShowcase Lead: Jose D. Posada, PhD Team: Natasha Flowers, Priya Desai 🔗 https://lnkd.in/eHN-xBKe Background: De-identification, defined by NIST as "any process of removing the association between a set of identifying data and the data subject," is essential for OMOP-CDM datasets because healthcare data is subject to privacy regulations worldwide that protect patients from disclosure of sensitive information that may affect their reputation or livelihood. The primary driver for de-identification in the OHDSI community is the secondary use of data for research (that is, using healthcare data for purposes other than direct patient care), which requires preserving patient privacy while still enabling valuable research. HIPAA's Safe Harbor method, which requires removal of 18 specific identifiers, has become the most common approach for de-identification in healthcare datasets. The OMOP-CDM contains several types of personally identifiable information (PII) that must be addressed during de-identification. Direct identifiers like person_source_value require random identifier generation or deterministic encryption with secure key management, while dates typically need consolidation or patient-level shifting to preserve event timelines. Source_value fields throughout the OMOP-CDM tables present particular challenges because they may contain free-form text with embedded PII, especially in fully populated implementations where provenance data is preserved for concept_id mapping and measurement details. #JoinTheJourney
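The two techniques named in the post (stable replacement of direct identifiers and patient-level date shifting) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a keyed hash (HMAC-SHA256) as a stand-in for deterministic encryption, and the key, the 30-day window, and all function names are hypothetical.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical secret; in practice it would come from a managed key store.
SECRET_KEY = b"replace-with-a-managed-secret"
MAX_SHIFT_DAYS = 30  # illustrative window, not a regulatory recommendation


def pseudonymize(person_source_value: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable keyed pseudonym (HMAC-SHA256 hex digest) for an identifier."""
    return hmac.new(key, person_source_value.encode("utf-8"), hashlib.sha256).hexdigest()


def patient_shift_days(person_source_value: str, key: bytes = SECRET_KEY) -> int:
    """Derive one fixed offset per patient, between 1 and MAX_SHIFT_DAYS days."""
    digest = hmac.new(
        key, b"date-shift:" + person_source_value.encode("utf-8"), hashlib.sha256
    ).digest()
    return int.from_bytes(digest[:4], "big") % MAX_SHIFT_DAYS + 1


def shift_date(person_source_value: str, original: date, key: bytes = SECRET_KEY) -> date:
    """Move a date back by the patient's fixed offset.

    Every date for the same patient moves by the same amount, so the
    intervals between that patient's events are preserved.
    """
    return original - timedelta(days=patient_shift_days(person_source_value, key))


if __name__ == "__main__":
    visit = shift_date("patient-001", date(2020, 1, 10))
    followup = shift_date("patient-001", date(2020, 1, 25))
    print(pseudonymize("patient-001")[:12], (followup - visit).days)
```

Deriving the offset from the key rather than storing it means the shift can be recomputed without a lookup table, at the cost of making the key the single secret that protects both the identifiers and the dates.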
About us
The Observational Health Data Sciences and Informatics (or OHDSI, pronounced "Odyssey") program is a multi-stakeholder, interdisciplinary collaborative to bring out the value of health data through large-scale analytics. All our solutions are open-source. OHDSI has established an international network of researchers and observational health databases with a central coordinating center housed at Columbia University.
Our Mission: To improve health by empowering a community to collaboratively generate the evidence that promotes better health decisions and better care.
Our Vision: A world in which observational research produces a comprehensive understanding of health and disease.
Our Objectives
- Innovation: Observational research is a field that will benefit greatly from disruptive thinking. We actively seek and encourage fresh methodological approaches in our work.
- Reproducibility: Accurate, reproducible, and well-calibrated evidence is necessary for health improvement.
- Community: Everyone is welcome to actively participate in OHDSI, whether you are a patient, a health professional, a researcher, or someone who simply believes in our cause.
- Collaboration: We work collectively to prioritize and address the real-world needs of our community’s participants.
- Openness: We strive to make all our community’s proceeds open and publicly accessible, including the methods, tools and the evidence that we generate.
- Beneficence: We seek to protect the rights of individuals and organizations within our community at all times.
- Website: http://www.ohdsi.org/
- Industry: Research Services
- Company size: 51-200 employees
- Headquarters: New York, NY
- Type: Nonprofit
- Specialties: Collaborative Research, Software Development, Data Network, Observational Research, Health Data, Open Science, and Real World Evidence
Locations
- Primary: 622 West 168th Street, PH-20, New York, NY 10032, US
Updates
-
Powering a Personal Health Record Analytic Environment using the OHDSI CDM and Google Colab #OHDSISocialShowcase Lead: Janos Hajagos 🔗 https://lnkd.in/eCAnF7fa Background: A personal health record (PHR) should empower a user to better understand and manage their health by combining multiple sources of information. The OHDSI CDM (Common Data Model) is an analytic data model for combining information from multiple sources by aligning elements to a standardized vocabulary. CDA XML documents are a baseline information exchange format supported by electronic health record vendors in the United States. CCDAs, a CDA dialect, can be manually downloaded from patient portals, and a single CDA document can be exported with Apple’s iOS Health app. This work demonstrates that CDAs from multiple sources for a single person can be converted to the OHDSI schema and analyzed in the interactive Google Colab Python environment. #JoinTheJourney
-
Agentic conversation on OMOP CDM: the OMCP-A2A foundation library #OHDSISocialShowcase Lead: Niko Möller-Grell Team: Shihao Shenzhang, Dr. Zhangshu Joshua Jiang, Vishnu V Chandrabalan, Richard Dobson 🔗 https://lnkd.in/eUGQ4hiK Background: The Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) has transformed healthcare analytics through the standardisation of electronic health record (EHR) data, enabling interoperability across institutions and large-scale observational studies. Large Language Models (LLMs) are transforming healthcare data access by replacing specialized SQL skills with natural language interfaces. Nevertheless, there remain challenges in linking specialised medical AI models in collaborative frameworks to tackle intricate healthcare inquiries adequately. Foundation agents have been presented as a paradigm shift from regular language models, featuring brain-inspired modular architectures with advanced cognition, perception, and execution abilities. These systems contain five cognitive modules as a mental state: memory, world modeling, reward processing and emotion-like systems. The framework's emphasis on specialist domain agents working together mirrors clinical team dynamics, providing advantages through parallel processing of complex medical information and the inclusion of validation mechanisms. Google's A2A protocol provides standardized communication between independent AI agents through vendor-neutral tools for information sharing and task management. However, healthcare's unique requirements, such as stringent regulatory compliance (MHRA, HIPAA, GDPR), patient safety assurance, and domain-specific medical expertise, necessitate custom augmentations beyond generic agent frameworks. We introduce the OMCP-A2A-foundation library, a foundation-agent-based A2A protocol extension for medical use.
Our system embodies this modular cognitive architecture through domain-specific agents for medical concept extraction, natural-language-to-SQL translation, and research execution with OMOP CDM data. Agents communicate via A2A protocols and execute tasks via Model Context Protocol (MCP) servers, delivering healthcare-required safety, compliance, and reliability while enabling democratized access to OMOP CDM analytics. #JoinTheJourney
-
Evaluating the Quality of Positive Unlabeled Learning Methods if Unlabeled Instances Cannot be Validated #OHDSISocialShowcase Lead: Praveen Kumar Team: Kristan A Schneider, Fariha Moomtaheen, Rajesh Upadhayaya, Scott Alexander Malec, Jeremy Yang, Cristian Bologa, Yiliang Zhu, Mauricio Tohen, Gerardo Villarreal, Douglas J. Perkins, Elliot Fielstein, Sharon Davis, Michael Matheny, Christophe Lambert 🔗 https://lnkd.in/ethbEk6j Background: While clinical diagnoses can confirm the presence of a medical condition, the absence of a diagnosis does not necessarily imply the absence of disease. This extends to electronic healthcare records: the absence of an International Classification of Diseases (ICD) code does not imply the absence of the underlying medical condition. When attempting to automatically classify medical conditions in the growing field of health informatics, this asymmetry in diagnostics/coding – or more generally labeling – creates a fundamental problem for traditional supervised learning, which assumes the availability of both positive and negative examples. The absence of the latter leads to the problem of positive unlabeled (PU) learning. Recently, novel methods to estimate the fraction of positive unlabeled instances were proposed. However, the performance of such methods is difficult to ascertain if it is impractical or even impossible to validate unlabeled instances, as in the case of psychological conditions such as depression, bipolar disorder, and post-traumatic stress disorder (PTSD). #JoinTheJourney
-
Creating a Standardized EHR Analytics Data Source for the National Cancer Institute’s Connect for Cancer Prevention Study #OHDSISocialShowcase Lead: Edward Frankenberger Team: Jake M Peters, Nicole Gerlanc 🔗 https://lnkd.in/ezecE3EK Background: The Connect for Cancer Prevention Study (“Connect”) is a new, multi-site prospective cohort study enrolling up to 200,000 cancer-free adults receiving care within integrated U.S. healthcare systems. The cohort will be followed for 25+ years, during which time comprehensive data and biospecimens are continuously collected. These data will be used to enhance our understanding of cancer etiology and inform precision strategies for prevention and early detection. The Connect Coordinating Center (CCC) manages electronic health record (EHR) data in the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) to enable harmonized, high-throughput epidemiologic analyses. Each participating site independently develops and executes its own ETL processes, managing a local OMOP database and submitting data to the CCC in CSV or Parquet formats via Google Cloud Storage. To lower the barrier for sites to contribute, the CCC does not mandate specific OMOP CDM or vocabulary versions and requires only a core set of tables (Clinical Data and Vocabulary Tables) most often populated in EHR-only OMOP instances. Despite adopting a standardized data model, variations in CDM and vocabulary versions, site-specific code mappings, differing ETL strategies, and implementation errors have complicated the creation of a unified OMOP analytics database for Connect. To address these challenges, we developed an API-driven, file-centric, cloud-based pipeline designed to harmonize incoming OMOP data, ensuring consistent structural and semantic content across all contributing sites. #JoinTheJourney
-
Bridging FHIR and OMOP: Data Lineage for Observational Data Conversion #OHDSISocialShowcase Lead: Benjamin Berk MD Team: Melissa Benzie, Bindu Bolisetty, Scott Favre, Jeremy Fortune, Jeremy Goslin, Vik Kheterpal, MD, Kathleen Marinan, Anne Marsan, Ed Ramos, Sunanda Venumuddula, John Wyderko 🔗 https://lnkd.in/eSxzF7sX (includes multiple applicable links to this research) Background: The interoperability between Fast Healthcare Interoperability Resources (FHIR) and the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) has become increasingly important for healthcare data research. While FHIR excels as a clinical data exchange standard, OMOP provides a standardized format optimized for observational research. Converting data between these models presents significant challenges in maintaining data integrity and traceability. The HL7 Vulcan FHIR→OMOP Working Group was established to develop a standard implementation guide for this transformation. Previous work, including the FHIR-to-OMOP Cookbook, CAMP FHIR, and NACHC's fhir-to-omop, has focused primarily on mapping between resources and tables. Here we implement comprehensive data lineage tracking throughout the transformation process. This paper presents our experience implementing a FHIR→OMOP transformation API that captures data lineage, enabling tracing of data from source to destination across four dimensions:
● Data Source Provenance: Tracking source system provenance of each element.
● Entity Lineage: Maintaining traceability from FHIR resources to resulting OMOP table rows.
● Concept Standardization: Recording source codes to OMOP standard concepts.
● Processing Events: Capturing comprehensive logs of transformation decisions, including warnings, informational messages, errors, and documentation of FHIR resources that do not result in OMOP rows with detailed explanations.
#JoinTheJourney
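The four lineage dimensions listed in the post can be pictured as fields on a record that travels with each converted resource. The Python sketch below is only an illustration of that idea, not the API described in the post. The `Lineage` class, the `convert_condition` function, the `CODE_TO_CONCEPT` lookup, and the single toy mapping entry are all hypothetical; a real pipeline would resolve codes against the OMOP vocabulary tables.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Toy source-code -> standard-concept lookup (illustrative only; verify any
# real mapping against the OMOP vocabulary tables).
CODE_TO_CONCEPT = {("http://snomed.info/sct", "44054006"): 201826}


@dataclass
class Lineage:
    source_system: str                         # data source provenance
    fhir_reference: str                        # entity lineage: originating resource
    omop_table: Optional[str] = None           # entity lineage: resulting table, if any
    source_code: Optional[str] = None          # concept standardization: input code
    standard_concept_id: Optional[int] = None  # concept standardization: output concept
    events: List[str] = field(default_factory=list)  # processing events


def convert_condition(resource: dict, source_system: str) -> Tuple[Optional[dict], Lineage]:
    """Convert a FHIR-like Condition dict; return (OMOP row or None, lineage)."""
    lineage = Lineage(source_system=source_system,
                      fhir_reference=f"Condition/{resource['id']}")
    coding = resource["code"]["coding"][0]
    lineage.source_code = coding["code"]
    concept_id = CODE_TO_CONCEPT.get((coding["system"], coding["code"]))
    if concept_id is None:
        # Resources that produce no OMOP row are still documented, with a reason.
        lineage.events.append(
            f"WARNING: no standard concept for {coding['code']}; no OMOP row written")
        return None, lineage
    lineage.omop_table = "condition_occurrence"
    lineage.standard_concept_id = concept_id
    lineage.events.append("INFO: mapped to condition_occurrence")
    row = {"condition_concept_id": concept_id,
           "condition_source_value": coding["code"]}
    return row, lineage


if __name__ == "__main__":
    mapped = {"id": "c1", "code": {"coding": [
        {"system": "http://snomed.info/sct", "code": "44054006"}]}}
    row, lin = convert_condition(mapped, source_system="ehr-a")
    print(row, lin.events)
```

Returning the lineage record even when no row is produced is the point of the design: the unmapped resource leaves a trace instead of silently disappearing.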
-
Please join us for the March 24 community call (11 am ET) as we host our first OHDSI/OMOP Research Spotlight session of 2026. The OHDSI community has published more than 950 peer-reviewed studies highlighting the OMOP CDM or OHDSI tools. Lead authors from three recent studies will share insights from their work.
Berta Cuyàs Roca (University of Barcelona) Trends in incidence, prevalence, and survival of primary liver cancer in the United Kingdom (2000–2021) (https://lnkd.in/etQ7ZqRg) • European Journal of Public Health
Cindy Cai (Johns Hopkins University) Semaglutide and diabetic retinopathy: an OHDSI network study (https://lnkd.in/gV8T4_4M) • BMJ Open Diabetes Research & Care
Matthew Spotnitz (National Institutes of Health) Assessing data quality of rheumatoid and psoriatic arthritis patients in the All of Us Research Program (https://lnkd.in/etAJuQqd) • JAMIA Open
You can find the meeting link on our community calls page: https://lnkd.in/eZgFWrEz. #JoinTheJourney
-
"𝐖𝐞'𝐫𝐞 𝐚𝐥𝐥 𝐚𝐟𝐭𝐞𝐫 𝐭𝐡𝐞 𝐬𝐚𝐦𝐞 𝐭𝐡𝐢𝐧𝐠, 𝐰𝐞 𝐡𝐚𝐯𝐞 𝐭𝐡𝐞 𝐬𝐚𝐦𝐞 𝐦𝐢𝐬𝐬𝐢𝐨𝐧 𝐚𝐧𝐝 𝐢𝐭'𝐬 𝐭𝐫𝐮𝐥𝐲 𝐟𝐚𝐧𝐭𝐚𝐬𝐭𝐢𝐜." From the UK to the global stage, Sarah Seager has seen the power of #OHDSI firsthand for nearly a decade. In our newest "My Journey" feature, Sarah discusses the "amazing" opportunity of working with data across multiple countries and systems. She also shares her excitement for how OHDSI is evolving alongside technology and AI to open up research to everyone who wants to get involved. "𝘛𝘩𝘦 𝘰𝘱𝘱𝘰𝘳𝘵𝘶𝘯𝘪𝘵𝘺 𝘰𝘧 𝘩𝘢𝘷𝘪𝘯𝘨 𝘢𝘷𝘢𝘪𝘭𝘢𝘣𝘭𝘦 𝘥𝘢𝘵𝘢 𝘢𝘤𝘳𝘰𝘴𝘴 𝘮𝘶𝘭𝘵𝘪𝘱𝘭𝘦 𝘴𝘺𝘴𝘵𝘦𝘮𝘴, 𝘮𝘶𝘭𝘵𝘪𝘱𝘭𝘦 𝘤𝘰𝘶𝘯𝘵𝘳𝘪𝘦𝘴, 𝘸𝘢𝘴 𝘫𝘶𝘴𝘵 𝘢𝘮𝘢𝘻𝘪𝘯𝘨... 𝘐𝘵 𝘳𝘦𝘢𝘭𝘭𝘺 𝘪𝘴 𝘵𝘩𝘦 𝘤𝘰𝘮𝘮𝘶𝘯𝘪𝘵𝘺, 𝘪𝘵'𝘴 𝘵𝘩𝘦 𝘤𝘶𝘭𝘵𝘶𝘳𝘦, 𝘵𝘩𝘦 𝘦𝘯𝘷𝘪𝘳𝘰𝘯𝘮𝘦𝘯𝘵, 𝘪𝘵'𝘴 𝘦𝘷𝘦𝘳𝘺𝘰𝘯𝘦." Check out the video below and see why the journey is only getting started. #OHDSI #Innovation #AI #HealthTech #GlobalResearch #RWE
-
Building the OHDSI Evidence Network – A Global, Open, Federated Collaboration #OHDSISocialShowcase Lead: Clair Blacketer, MPH Team: Haeun Lee, Benjamin Martin, Evanette Burrows, Patricia Mabry, Deran McKeen, Sam Patnoe, MPH, Ben Gerber, Pantelis Natsiavas, Aamirah Vadsariya, Hanieh R., Paul Nagy 🔗 https://lnkd.in/edqM6USR Background: Federated healthcare data networks have emerged as a powerful solution for enabling real-world evidence (RWE) generation across diverse populations while respecting data privacy and institutional autonomy. Prominent initiatives such as PCORnet and the FDA's Sentinel System have demonstrated the value of shared analytical methods and centralized governance for post-marketing surveillance, pragmatic trials, and comparative effectiveness research. However, these federated networks present new challenges that can limit their scale. These challenges can include the use of proprietary software, requirements for formal contractual participation, and overdependence on single, limited funding opportunities. As the demand for timely, reproducible, and generalizable evidence continues to grow globally, there is an urgent need for flexible, open-source solutions that promote inclusivity and scale. European initiatives such as the European Health Data and Evidence Network (EHDEN) and the Data Analysis and Real World Interrogation Network (DARWIN EU) have seen success in this space, and OHDSI is building on that framework through the establishment of an open-source, global, federated network. The OHDSI Evidence Network was launched in 2024 to meet this need. It offers a fully open-source, international, and federated model for collaborative research using the OMOP Common Data Model. Unlike traditional models, participation in the OHDSI Evidence Network does not require formal data use agreements or centralized data transfers.
Instead, partners retain full control over their data and contribute only summary-level information when they choose to opt into a given study or data characterization effort. This lightweight framework enables broader participation while still supporting high-quality, large-scale evidence generation. #JoinTheJourney
-
dqdbt: Continuous Data Quality Testing for OMOP ETL with dbt #OHDSISocialShowcase Lead: Katy Sadowski Team: Lawrence Adams, Thomas Wylie More Info: https://lnkd.in/eZaC2BHh Background: High-quality real-world data is the foundation of reliable observational research. It is therefore critical to identify and address data quality issues before publishing a dataset for research use. Data quality issues can stem from multiple sources, including inherent characteristics of the data source (e.g., data entry errors in an electronic health record system) and errors in data processing. The conversion of a dataset into the OMOP Common Data Model (CDM) may introduce such errors if data quality is not properly assessed throughout the extract, transform, and load (ETL) process. The OHDSI community makes available several tools for monitoring ETL quality, including the DataQualityDashboard (DQD). DQD is an R package offering a comprehensive suite of quality checks based on the Kahn framework, many of which can be used to expose errors in the ETL process. OMOP ETL developers and data users are encouraged to run DQD on their OMOP CDM to ensure the CDM is high-quality and compliant with the OMOP specification. However, due to its design as a standalone R package, DQD is generally only used after a full run of the ETL is complete. This approach risks allowing quality issues to propagate through the ETL pipeline, delaying detection and complicating remediation. Inspired by established best practices for shifting testing “left” in software and data pipelines, with dqdbt we aim to integrate DQD’s standard quality checks natively within the OMOP ETL pipeline. To demonstrate this concept, we leverage Data Build Tool (dbt), an open-source data transformation tool with built-in testing features that enable the detection of anomalies at runtime throughout the ETL development process. This project represents a continuation of “dbt for OMOP Phase I”, in which we introduced dbt-synthea. 
dbt-synthea is an open-source OMOP ETL project which showcases how dbt can be used to develop an OMOP ETL and promotes a set of ETL development principles we believe will make OMOP ETL easier and more robust. #JoinTheJourney https://lnkd.in/emXk69xx
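To make the idea of testing during the ETL more concrete, here is a minimal Python sketch of the pattern dbt data tests follow: each check is a query that selects failing rows, and a check passes when it returns none. This is not dqdbt or DataQualityDashboard code. The two checks, the `CHECKS` and `run_checks` names, and the in-memory SQLite tables are assumptions chosen only to keep the example self-contained.

```python
import sqlite3

# Each check is a query that returns the rows violating a rule.
# Zero rows returned means the check passes.
CHECKS = {
    "person.year_of_birth is not null":
        "SELECT person_id FROM person WHERE year_of_birth IS NULL",
    "condition_occurrence.person_id exists in person":
        """SELECT c.condition_occurrence_id
           FROM condition_occurrence c
           LEFT JOIN person p ON p.person_id = c.person_id
           WHERE p.person_id IS NULL""",
}


def run_checks(conn: sqlite3.Connection) -> dict:
    """Run every check and return {check name: number of failing rows}."""
    return {name: len(conn.execute(sql).fetchall()) for name, sql in CHECKS.items()}


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with one violation per check."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE person (person_id INTEGER PRIMARY KEY, year_of_birth INTEGER);
        CREATE TABLE condition_occurrence (
            condition_occurrence_id INTEGER PRIMARY KEY, person_id INTEGER);
        INSERT INTO person VALUES (1, 1980), (2, NULL);
        INSERT INTO condition_occurrence VALUES (10, 1), (11, 99);
    """)
    return conn


if __name__ == "__main__":
    for name, failures in run_checks(build_demo_db()).items():
        print("PASS" if failures == 0 else f"FAIL ({failures} rows)", name)
```

Running such checks after each transformation step, rather than once on the finished CDM, is what lets a failure point at the step that introduced it.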
OHDSI 2025 Demo: dqdbt: Continuous Data Quality Testing for OMOP ETL with dbt
https://www.youtube.com/