We appreciate your interest in the New Models of Data Stewardship Program. This list of frequently asked questions will be updated periodically as new questions are received. The list was last updated on 10/22/18.
The New Models of Data Stewardship (NMDS) program is designed to enhance biomedical discovery and improve efficiency through new digital data management strategies. These strategies contribute to NIH efforts to develop and sustain a modern biomedical data ecosystem as described in the NIH Strategic Plan for Data Science. They also aim to make data for research findable, accessible, interoperable, and reusable (FAIR) in the cloud. The New Models of Data Stewardship program began in fiscal year 2017 and will run through fiscal year 2020. It consists of two initiatives, the Data Commons Pilot Phase and the Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) initiative, working together to establish best practices, standards, and infrastructure to address the challenges above.
The NIH Data Commons Pilot Phase is testing ways to store, access, and share biomedical data and associated tools in the cloud so they are findable, accessible, interoperable, and reusable (FAIR). The STRIDES Initiative is establishing partnerships with commercial cloud service providers (CSPs) to reduce economic and technological barriers to accessing and computing on large biomedical data sets to accelerate biomedical advances.
Through the STRIDES Initiative, NIH is establishing a series of innovative partnerships to provide NIH biomedical researchers access to the most advanced and cost-effective cloud-based computational infrastructure, tools, and services. The STRIDES Initiative is a key enabler for the establishment of an interconnected biomedical research data ecosystem that will enable data to be FAIR and help researchers build a digital ecosystem by working more collaboratively in the cloud environment. The partnerships established through STRIDES will provide a cost-efficient framework for NIH researchers, as well as researchers at more than 2,500 academic institutions across the nation receiving NIH support, to make use of storage, computing, and other computational tools and services. In addition, it will allow NIH researchers to take advantage of emergent data-management and technological expertise, computational platforms, and tools available from the private sector. This will support both NIH internal research and any external institutions receiving NIH support, enabling cutting-edge science in the most cost-effective ways possible. The STRIDES Initiative will also facilitate broad-based training on cloud products and services for the biomedical research community, as well as explore unique opportunities for NIH and Cloud Service Providers (CSPs) to collaborate. The STRIDES Initiative will be piloting with the Data Commons Pilot Phase to develop best practices for moving and storing large datasets in the cloud.
Researchers and staff from the NIH Data Commons Pilot Phase and the STRIDES Initiative will work with cloud service providers (CSPs) to learn how to provide a sustainable cloud infrastructure to support NIH-derived data sets, including the Data Commons Pilot Phase test case datasets. The NIH Data Commons will provide computational services built on top of STRIDES cloud services. STRIDES’ first test will be to work with staff and investigators of the NIH Data Commons Pilot Phase to provide cloud storage and services for three test case data sets (Alliance of Genome Resources (AGR), Genotype-Tissue Expression (GTEx), and Trans-Omics for Precision Medicine (TOPMed)) being used to develop principles, standards, policies, and processes for the Data Commons. Other NIH-funded data sets will move to the cloud via STRIDES agreements, and best practices stemming from the NIH Data Commons Pilot Phase will be applied.
An initial partnership for the STRIDES Initiative was established with Google Cloud in July 2018. The agreement with Google Cloud creates a cost-efficient framework for NIH researchers, as well as researchers at more than 2,500 academic institutions across the nation receiving NIH support, to make use of Google Cloud’s storage, computing, and machine learning technologies. In addition, the partnership will involve collaborations with NIH’s Data Commons Pilot, a group of innovative projects testing new tools and methods for working with and sharing data in the cloud, and enable the establishment of training programs for researchers at NIH-funded institutions on how to use Google Cloud Platform. Additional partnerships for the program are anticipated in the future and will be announced to the public. NIH and Google Cloud are working with Carahsoft, a Google Cloud distributor, to establish billing relationships to support NIH’s and NIH-funded investigators’ use of Google Cloud services. More information on the Google Cloud implementation of STRIDES can be found in Google’s blog post.
An initial partnership for the STRIDES Initiative was established with Google Cloud in July 2018. A second partnership for the STRIDES Initiative was established with Amazon Web Services in September 2018. NIH and Amazon are working with Four Points Technology, an Amazon Web Services distributor, to establish billing relationships to support NIH’s and NIH-funded investigators’ use of Amazon Web Services. Both partnerships will enable NIH to make high-value data sets more accessible to researchers, help optimize technology-intensive research, and lower economic barriers for research. Going forward, we expect to leverage different capabilities of these commercial cloud providers, and other partners we add, to help enhance NIH’s research mission. These partnerships give researchers the choice of the cloud platform that best supports their research.
Not yet, but stay tuned! NIH currently is in the process of establishing procedures and doing key foundational work around the STRIDES Initiative, so we can offer researchers its capabilities in the future. We’ll provide more specifics over time.
Initial efforts to pilot various aspects of these services will inform how and when access to cloud storage, computing, and related services will become more broadly available to NIH and NIH-funded researchers. For example, early-stage pilots will test methods related to cloud account setup and administration to establish more streamlined processes that are able to scale with the growth of the program. They will also measure broader researcher adoption of cloud technologies. Details regarding broader availability of the services created by these partnerships are still evolving and will be shared in FY19.
Currently, three test datasets are being used to develop principles, standards, policies, and processes for the Data Commons. Best practices will be developed and distributed on how to use the Data Commons for data storage and analysis. Coupled with ongoing lessons learned from the STRIDES Initiative, the Common Fund NMDS program will allow for the uploading and usage of new datasets in the cloud environment in the future, potentially by the middle of 2019.