The Common Fund Data Ecosystem

Common Fund programs are intended to provide resources that accelerate discovery across many different biomedical research fields. Often these resources include large data sets and associated digital tools needed to mine and analyze the data. To maximize impact, these data sets and tools must be leveraged by researchers from different disciplines, with varying expertise in bioinformatics and large-scale data analysis. Additionally, these data sets must be usable together across interoperable platforms. However, current approaches to data storage, management, and analysis mean that data are often not Findable, Accessible, Interoperable, and Reusable (FAIR).

To address this challenge, the Common Fund is supporting the Common Fund Data Ecosystem (CFDE), an ongoing investment in data management infrastructure that will support past, current, and future Common Fund data sets.

The goals of the CFDE are:

  • Enhance the ability to ask scientific questions across data sets
  • Enable the uptake, reuse, and addition of Common Fund data and tools
  • Support the storage, sharing, and sustainability of Common Fund data sets
  • Provide training that maximizes scientists’ ability to upload data and use Common Fund data and other resources

The CFDE includes several integrated efforts:

  1. CFDE Coordinating Center – The CFDE Coordinating Center will manage and organize CFDE activities, engage with participating Common Fund programs, connect with user communities, support training, develop tools and standards, and provide technical expertise. These activities will be conducted in close partnership with relevant Common Fund programs.
  2. Participating Common Fund data coordinating centers (DCCs) – DCCs will work with the CFDE Coordinating Center to understand their program’s unique requirements for data storage and analysis, adopt/adapt guidelines and best practices, share resources and tools with other DCCs, establish and enable use cases for cross-data analyses, and provide training. In September 2019, the NIH Council of Councils approved a proposed concept to provide support starting in fiscal year 2020 for Common Fund DCCs to participate in the CFDE (video of presentation, slides, and concept).
  3. Leveraging the Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) Initiative – A key component of the CFDE is making sure that data are onboarded to the cloud environment in a consistent manner. Working with the STRIDES Initiative from the NIH Office of Data Science Strategy (ODSS), the CFDE will develop guidelines to ensure data are stored and organized optimally for proper data versioning and upkeep. Working with the STRIDES Initiative will also provide favorable pricing for cloud data storage and use of Common Fund data sets.

The ultimate goal of the CFDE is for Common Fund data to be more usable and useful both within a single program and among data sets from multiple programs. By connecting the data sets and making them more accessible, the CFDE is intended to enable novel scientific research that was not possible before, including hypothesis generation, discovery, and validation.

This page last reviewed on November 6, 2019