Program Snapshot

The overarching goal of the NIH Data Commons is to accelerate new biomedical discoveries by developing and testing a cloud-based platform where investigators can store, share, access, and interact with digital objects (data, software, etc.) generated from biomedical and behavioral research. By connecting the digital objects and making them accessible, the Data Commons is intended to allow novel scientific research that was not possible before, including hypothesis generation, discovery, and validation.

Researchers funded as part of the NIH Data Commons Pilot Phase Consortium (DCPPC) are working out the best ways to build and implement the cloud-based platform described above. They are iteratively building and testing a series of key capabilities (fundamental computational units) needed for the Commons to operate and to meet the FAIR standards: findable, accessible, interoperable, and reusable. Engaging the biomedical research community in developing the Data Commons helps ensure that the community's needs are met. Three distinct, high-value test case data sets help in setting policies, processes, and architecture for the Data Commons Pilot Phase, with the aim of being able to use all three data sets simultaneously in analyses. The tools and best practices developed by the DCPPC will help researchers discover and interpret connections between human genes and traits and those of model organisms like fruit flies or mice.

Data Commons Pilot Phase and the NIH Strategic Plan for Data Science

The Data Commons Pilot Phase is part of the New Models of Data Stewardship program, a trans-NIH endeavor. The program supports the NIH Strategic Plan for Data Science goal of developing new, cutting-edge methods for storing, sharing, and analyzing NIH-derived datasets in the cloud environment.

Data Commons Pilot Phase Vision

The lessons learned from the Data Commons Pilot Phase will inform the development of best practices, guidelines and standards, and cohesive approaches to Data Commons architecture and principles. The adoption of such standards and guidelines by all NIH Commons-like efforts will result in an interoperable NIH Data Commons consistent with the NIH Strategic Plan for Data Science.

Data Commons Pilot Phase Implementation

A multidisciplinary NIH Data Commons Pilot Phase Consortium (DCPPC), which includes data scientists, computer scientists, information technology engineers, cloud service providers, biomedical researchers, and the stewards of the test case data sets, is charged with setting community-endorsed processes and metrics for FAIR data management. Consortium members are developing a plan for building the Data Commons through key capabilities (fundamental computational components) that support access, use, and sharing of the test data sets.

The NIH Data Commons is being implemented as a pilot phase over Fiscal Years 2017-2020.

This page last reviewed on July 24, 2018