As biomedical tools and technologies rapidly improve, researchers are producing and analyzing a growing amount of complex biological data called “big data.” The Big Data to Knowledge (BD2K) program was launched in 2014 to facilitate broad use of biomedical big data, develop and disseminate analysis methods and software, enhance training relevant for large-scale data analysis, and establish centers of excellence for biomedical big data. The BD2K program also supported initial efforts toward making data sets “FAIR”: Findable, Accessible, Interoperable, and Reusable.
BD2K is entering a second program phase and will continue to pursue approaches to making biomedical datasets “FAIR.” It will test the feasibility of, and develop best practices for, making NIH-funded datasets and computational tools available in a shared space that multiple scientists can access remotely.
Big Data to Knowledge Phase II
BD2K is a trans-NIH initiative established to enable biomedical research as a digital research enterprise. In its first phase (FY2014-FY2017), it invested $200 million in grant awards to address some major data science challenges and to stimulate data-driven discovery. These awards will continue through award end dates, and lessons from this initial investment will help inform the second phase of the program (FY2018-FY2021).
In its second phase, the program will continue to pursue approaches to making biomedical big data Findable, Accessible, Interoperable, and Reusable, or “FAIR.” This will include a pilot program to test the feasibility and cost of, and develop best practices for, making multiple NIH-funded datasets and associated computational tools FAIR in a shared space that multiple scientists can access remotely, such as the cloud.
This approach of managing data, referred to as a Data “Commons,” will be piloted through BD2K, with the expectation that it will inform future data management strategies across the NIH.
Data Science at NIH and the Next Phase of the Big Data to Knowledge Program
The amount and diversity of data generated by NIH-funded research programs continue to grow rapidly; safe, scalable storage solutions, new analytic approaches, and an adaptable workforce are urgently needed. As this unprecedented revolution in biologic information unfolds and NIH looks to the future of data science, we must ensure that researchers have the ability to make meaningful use of this increasingly massive biomedical data resource. It will be critical for NIH to identify and implement new strategies to improve data discoverability, utility, and sustainability, including moving many large data sets into the cloud and making them adhere to the FAIR Principles – Findable, Accessible, Interoperable, and Reusable. Success in meeting this challenge will require a major infusion of resources.
After Phil Bourne’s decision to move to an academic position, Patricia Flatley Brennan, Director of the National Library of Medicine (NLM), agreed to take on the additional role of interim Associate Director for Data Science. Over the next few months, Dr. Brennan will work with all of NIH’s 27 Institutes and Centers to develop strategies to improve data discoverability, utility, and sustainability for the biomedical research community.
As part of this effort, the second phase of the Big Data to Knowledge program will include expanded investments to accelerate progress in the development of these new strategies. Grants from the first phase of the program will continue through award end dates, and lessons learned from the first phase will help inform the second. Previously issued BD2K funding opportunities with application due dates in August 2017 and beyond will be re-scoped to include funding opportunities that will be issued to support the implementation of the new strategies.
At the same time, the NLM is devising its strategic plan to become the intellectual and programmatic hub for data science at the NIH, per the recommendations from the Advisory Committee to the (NIH) Director.
For more information, visit the DataScience@NIH blog.
Version 2.0 of the DataMed prototype biomedical Data Discovery Index (DDI) is now available. Developed through the BD2K biomedical and healthCAre Data Discovery Indexing Ecosystem project (bioCADDIE), the prototype allows users to find and access biomedical datasets from multiple sources based on key attributes. The bioCADDIE development team welcomes your feedback on the DataMed prototype!