As biomedical tools and technologies improve, researchers are producing and analyzing a rapidly growing volume of complex biological data, often called "big data." The Big Data to Knowledge (BD2K) program was launched in 2014 to facilitate broad use of biomedical big data, develop and disseminate analysis methods and software, enhance training for large-scale data analysis, and establish centers of excellence for biomedical big data. The BD2K program also supported initial efforts to make data sets "FAIR": Findable, Accessible, Interoperable, and Reusable.
NIH Data Commons Pilot Phase Explores Using the Cloud to Access and Share FAIR Biomedical Big Data
The NIH, under the BD2K program, will be launching a Data Commons Pilot Phase to test ways to store, access and share FAIR biomedical data and associated tools in the cloud.
A data commons is a way to share and provide access to digital objects, like the data generated during biomedical research, or the software and other tools needed to use the data. A commons would help NIH extract more value from the digital products of biomedical research by making them available to more researchers. Using a cloud-based data commons model will empower researchers to find and interact with data directly in the cloud without spending time and resources downloading large datasets to their own servers.
The Data Commons Pilot Phase will include the following activities:
- Establishing community-endorsed unifying principles and guidelines to govern how the Data Commons operates and what it means for digital objects in the Commons to be FAIR
- Developing and testing cloud-based platforms to store, manage and interact with biomedical data and tools
- Enabling access to controlled-access data through appropriate authentication and authorization protocols
- Harnessing and further developing community-based tools and services that support interoperability between existing biomedical data and tool repositories and portability between cloud service providers
- Creating portals where users with all levels of expertise can access and interact with data and tools
- Learning by doing, which involves developing agile, iterative pilots of the Data Commons architecture/platform, testing its utility, troubleshooting, and retesting
- Analyzing and evaluating the products and processes of the Data Commons Pilot Phase for cost, utility, efficiency, usability, and adherence to FAIR data principles
The NIH Data Commons Pilot Phase is expected to span fiscal years 2017-2020, with an estimated total budget of $55.5 million, pending available funds.
Big Data to Knowledge Phase II
BD2K is a trans-NIH initiative established to enable biomedical research as a digital research enterprise. In its first phase (FY2014-FY2017), it invested $200 million in grant awards to address major data science challenges and to stimulate data-driven discovery. These awards will continue through their scheduled end dates, and lessons from this initial investment will help inform the second phase of the program (FY2018-FY2021).
In its second phase, the program will continue to pursue approaches to making biomedical big data Findable, Accessible, Interoperable, and Reusable or “FAIR.” BD2K will fund a Data Commons Pilot Phase to test ways to store, share, and use biomedical data and associated tools in a cloud environment.
BD2K is one of many data science-related programs at NIH. To learn more about NIH data science efforts, visit the Data Science at NIH website.
This page last reviewed on June 23, 2017