Observational Health Data Sciences and Informatics (OHDSI)

The mission of the Observational Health Data Sciences and Informatics (OHDSI) consortium is to improve health by empowering a community to collaboratively generate the evidence that promotes better health decisions and better care. OHDSI is a multi-stakeholder, interdisciplinary, international collaborative with a coordinating center at Columbia University. With 200 researchers from 25 countries and health records on about half a billion unique patients, OHDSI is able to answer important questions at scale. It is an open-science effort dedicated to generating reproducible research, and it develops and maintains a rich array of artifacts needed to carry out that research at large scale:

  • a mature common data model known as OMOP CDM (Observational Medical Outcomes Partnership Common Data Model) that covers medicine broadly
  • mappings from over 80 international vocabularies to a standard set that includes SNOMED, RxNorm, and LOINC
  • advanced methods to control bias and draw causal conclusions from observational data
  • open-source tools to convert, curate, visualize, and analyze observational data
  • a voluntary data network that carries out observational research

Its treatment pathways study [PNAS 2016] used 240 million records to show how type 2 diabetes, hypertension, and depression are currently treated around the world. It reported three main findings. First, metformin is used as a first-line drug a majority of the time in all countries except Japan (which may have a genetic reason for the difference). Second, 25% of patients on hypertension therapy took a sequence of drugs that no one else in the cohort took. Third, claims databases and electronic health records produced similar results for similar cohorts, despite differences in how drug exposures are gathered.

Its large-scale depression study [Phil Trans A 2018] demonstrated the bias inherent in the observational research literature, including censoring of negative results (missing points in the figure). It also illustrated a new way of doing observational research that is open and large-scale, with modern techniques to address confounding and full diagnostics for each study. Those techniques have been adopted by OHDSI's Large-Scale Evidence Generation and Evaluation in a Network of Databases (LEGEND) study, beginning with hypertension. Based on hundreds of millions of records from around the world, OHDSI studied 58 drugs and all their combinations on 58 clinical outcomes to augment the evidence base for the 2017 US hypertension guideline. For the results, see data.ohdsi.org/LegendBasicViewer/. See also OHDSI.org, and see github.com/OHDSI for all the software related to this study.


George Hripcsak


United States

Areas of Focus

Health Systems and Bioinformatics