NIH Data Commons Pilot Phase Explores Using the Cloud to Access and Share *FAIR Biomedical Big Data

*Findable, Accessible, Interoperable and Reusable

What is a Data Commons?

A data commons is a shared virtual space where scientists can work with the digital objects of biomedical research, such as data and analytical tools. The NIH Data Commons Pilot will test ways to store, access, and share biomedical data and associated tools in the cloud so that they are FAIR. The goal of the NIH Data Commons is to accelerate biomedical discovery by providing a cloud-based platform where investigators can store, share, access, and compute on digital objects (data, software, etc.) generated from biomedical research, and can perform novel research including hypothesis generation, discovery, and validation.

How will the NIH Data Commons be implemented? 

The NIH Data Commons will be implemented initially as a Pilot Phase in which three high-value datasets will serve as test cases for the principles, policies, processes, and architectures that need to be developed. NIH expects the Pilot Phase to last 4 years. The test cases are the Genotype-Tissue Expression (GTEx) dataset, the Trans-Omics for Precision Medicine (TOPMed) dataset, and several Model Organism Databases (MODs) that are working as a consortium to create an integrated resource known as the Alliance of Genome Resources. These datasets were selected both because they are highly valuable to many users in the biomedical research community and because of the diversity of the data they contain. NIH envisions that the Data Commons will expand to include other data resources once the Pilot Phase has achieved its primary objectives.

A multidisciplinary NIH Data Commons Pilot Phase Consortium (DCPPC), including data scientists, computer scientists, information technology engineers, cloud service providers, and biomedical researchers, will be established in December 2017. This group will be charged with setting community-endorsed processes and metrics for FAIR data management and with developing a consortium roadmap for building the Data Commons. The NIH will also engage external consultants from industry and academia, who will volunteer their time and expertise, to help ensure that the Data Commons is maximally useful to users with varying degrees of expertise. A key focus will be safeguarding the security of data through state-of-the-art user authentication and authorization protocols; interoperability with existing data resources such as the NCI Genomic Data Commons, the AHA Precision Medicine Platform, and the European Data Commons will also be emphasized.

Assessment will be a key component of the NIH Data Commons Pilot. In addition to the external consultants described above, the NIH will engage an organization to assess business models for investigators' deposition of data and tools into the Data Commons and for their access to and use of those resources. This assessment will enable the NIH to negotiate the best prices for storage and computing services with cloud service providers at a trans-NIH level. The Pilot Phase will also inform future data management models based on how frequently particular datasets are used, who uses them, and whether and how end users might ultimately contribute to the costs of sustaining the resource.

As the work of the DCPPC, the external consultants, and the assessment organization gets underway, gaps will be identified. These may include technical gaps that require a new tool or method to be developed rapidly, policy gaps that require Ethical, Legal, and Social Implications (ELSI) research, or other issues that must be addressed for the Data Commons to succeed. The DCPPC will be structured to resolve gaps rapidly by adding contributors to the consortium, and also to allow the NIH to discontinue elements of the consortium that do not prove useful.

What are the status and plans for the NIH Data Commons?

The NIH released Research Opportunity Announcement RM-17-026 to support the DCPPC. This announcement solicited applications under the Other Transactions (OT) mechanism, which differs from traditional grants, cooperative agreements, and contracts. NIH expects to make several OT awards under this funding opportunity. The OT mechanism gives NIH the flexibility it needs to keep pace with the fast-changing field of data science and to encourage non-traditional research partners, such as industry, to participate in this project.

In addition to the recipients of OT awards, the DCPPC will include investigators from the repositories housing the test case datasets, independent contractors engaged to assess the project, and NIH staff. In the coming years, investigators managing additional datasets will be included, and the broader data science community will be engaged to help test the utility of the Data Commons as it is developed. NIH will also seek input from entities with knowledge of applicable research ethics and information privacy laws and regulations. The DCPPC will work collectively toward NIH’s comprehensive vision for an interoperable, FAIR-compliant, multi-cloud NIH Data Commons founded on open-source tools and open standards.

The Data Commons Pilot Phase will include the following activities:

  • Establishing community-endorsed unifying principles and guidelines to govern how the Data Commons operates and what it means for digital objects in the Commons to be FAIR
  • Developing and testing cloud-based platforms to store, manage and interact with biomedical data and tools
  • Enabling access to controlled-access data through appropriate authentication and authorization protocols
  • Harnessing and further developing community-based tools and services that support interoperability between existing biomedical data and tool repositories and portability between cloud service providers
  • Creating portals where users with all levels of expertise can access and interact with data and tools
  • Learning by doing, which involves developing agile, iterative pilots of the Data Commons architecture/platform, testing its utility, troubleshooting, and retesting
  • Analyzing and evaluating the products and processes of the Data Commons Pilot Phase for cost, utility, efficiency, usability, and adherence to FAIR data principles

The NIH Data Commons Pilot Phase is expected to span fiscal years 2017-2020, with an estimated total budget of $95.5 million, pending available funds.

For questions about the NIH Data Commons Pilot Phase, please contact commonspilot@od.nih.gov.

This page was last reviewed on November 17, 2017.