The Common Fund Data Ecosystem
The Common Fund Data Ecosystem (CFDE) aims to enable broad use of Common Fund (CF) data sets to accelerate discovery. CF programs generate a wide range of diverse and valuable data sets designed to be used by the research community. However, these data sets reside in different locations, and it is challenging or even impossible to work with multiple data sets in an accessible and user-friendly way. To help remedy this problem, the CFDE has created an online discovery portal (CFDE portal) that helps make CF data sets FAIR (Findable, Accessible, Interoperable, and Reusable) and enables researchers to search across CF data sets to ask scientific and clinical questions from a single access point. The CFDE is also supporting pilot projects that are:
- Engaging a broad community of end-users and collecting feedback on the utility of CF data resources
- Enabling novel cross-cutting biological questions to be formulated and addressed
The CFDE Coordinating Center oversees CFDE activities and works closely with participating data coordinating centers from other CF programs on an initial subset of data sets, with plans to expand to additional CF data sets in the future. The CFDE is also developing and deploying resources and tools, including training materials, to empower the research community to use CF data sets for scientific research that was not previously possible. Such research may include hypothesis generation, discovery, or validation that leads to new insights in health and disease.
The CFDE includes several integrated efforts:
- CFDE Coordinating Center – The CFDE Coordinating Center oversees CFDE activities, engages with participating Common Fund programs, connects with user communities, supports training, develops tools and standards, and provides technical expertise. These activities are conducted in close partnership with relevant Common Fund programs.
- Participating Common Fund data coordinating centers (DCCs) – The DCCs are working with the CFDE Coordinating Center to understand their programs’ unique requirements for data storage and analysis, adopt/adapt guidelines and best practices, share resources and tools with other DCCs, develop use cases for cross-data analyses, and provide training. In January 2020, the Common Fund released an Engagement Opportunity Announcement for eligible DCCs to engage with the CFDE Coordinating Center and other DCCs to establish the CFDE. For more details, please view the Engagement Opportunity Announcement and Process for Rolling Submission of Engagement Opportunity Award Plans.
- Enhancing the Utility of Common Fund Data Sets (pilot projects) – The CFDE is supporting pilot projects where researchers develop new tools or methods to enhance the value of CF data sets, add useful information to existing CF data sets, or combine multiple CF data sets to answer biologically relevant questions that could not be answered with just one data set. These projects will enhance the use of CF data sets by engaging new end-users, obtaining feedback to help improve the CFDE portal (and DCC portals), and enabling novel cross-cutting biological questions to be formulated and addressed. Please see the Funded Research page for more details about funded pilot projects.
- Leveraging the Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) Initiative – The CFDE is working with the STRIDES Initiative from the NIH Office of Data Science Strategy (ODSS) to provide favorable pricing for cloud data storage and to develop guidelines that ensure data are stored and organized optimally for proper data versioning and upkeep.
Using STRIDES for in-kind cloud services for new Common Fund applicants: To fully leverage the Common Fund's investment in STRIDES, Common Fund award applicants will be asked to outline the anticipated type, direct cost, and justification for cloud computing activities in the Budget Justification section. These activities include, but are not limited to, data storage, computing, data movement/egress (see below), professional services, and training. To foster a cloud-centric model that keeps data in the cloud, data egress fees (i.e., charges for outgoing traffic from cloud environments) should be minimized, and any request to support egress fees incurred by large-scale data download functionalities should have strong justification. If the application is funded, NIH will use this cost estimate to provide in-kind services via STRIDES. The amount requested for cloud services will not be added to the requested budget total and will not count toward the direct cost limit for the award. Upon award, NIH staff will coordinate with awardees to work through logistical details associated with STRIDES accounts. For more information, please see Notice of Information: Leveraging STRIDES for Cloud Computing Activities in Common Fund Awards (NOT-RM-20-009).
Thirteen Common Fund programs are currently eligible to engage with the CFDE Coordinating Center to establish the CFDE: 4D Nucleome (4DN), Acute to Chronic Pain Signatures (A2CPS), Extracellular RNA Communication (ExRNA), Gabriella Miller Kids First Pediatric Research (Kids First), Genotype-Tissue Expression (GTEx), Human BioMolecular Atlas Program (HuBMAP), Illuminating the Druggable Genome (IDG), Knockout Mouse Phenotyping Program (KOMP2), Library of Integrated Network-based Cellular Signatures (LINCS), Metabolomics, Molecular Transducers of Physical Activity Consortium (MoTrPAC), Stimulating Peripheral Activity to Relieve Conditions (SPARC), and Undiagnosed Diseases Network (UDN). These programs offer different perspectives that will enable a deeper understanding of the issues involved in using and integrating diverse data types, identifying needs shared across Common Fund programs, and collaborating across programs to enhance data utility. Applying best practices and lessons learned from these partnerships, the CFDE Coordinating Center will expand its activities to engage with future Common Fund programs as well.
More information can be found in presentations from the May NIH Council of Councils meeting where the CFDE and the ODSS efforts were discussed, as well as the September Council of Councils meeting where the concept for the upcoming DCC funding opportunity to establish the CFDE was approved.
This page last reviewed on May 2, 2022