Enhancing the Utility of Common Fund Data Sets: Maximizing Data Set Usage
To maximize the impact of Common Fund-generated data, engage a broader community of end users for wider adoption of these data sets, and obtain feedback to enhance the data resources, the Common Fund Data Ecosystem (CFDE) supported small research projects (R03) that encourage the use of Common Fund data sets. The projects are intended to enable novel and compelling biological questions to be formulated and addressed, and/or to generate cross-cutting hypotheses for future research. Projects currently supported include:

  • Methods to maximize the utility of Common Fund functional genomic data in multi-ethnic genetic studies. Data sets used: GTEx, 4DN

This study will maximize the utility of Common Fund functional genomic data for multi-ethnic studies of smoking and drinking addiction. By integrating GTEx, 4DN, and ENCODE data sets with other, non-European functional genomic data sets, the project aims to improve gene expression prediction accuracy across different tissue types and ancestries.

  • Durable Common Fund Data Interfaces and Tutorials with Bioconductor. Data sets used: 4DN, IDG, GTEx 

Bioconductor/R is a widely used set of tools for high-throughput genomic data. This project will produce Common Fund datasets (4DN, IDG, GTEx) that the Bioconductor/R environment can use for genomic science with workspaces on NHGRI's AnVIL. 

  • Constructing High-Resolution Ensemble Models of 3D Single-Cell Chromatin Conformations of eQTL Loci from Integrated Analysis of 4DN-GTEx Data towards Structural Basis of Differential Gene Expression. Data sets used: 4DN, GTEx 

This project aims to develop novel computational tools for understanding the relationship between gene expression and gene topology, based on data sets from the 4D Nucleome (4DN) and Genotype-Tissue Expression (GTEx) programs.

  • Deep Phenotyping of 3D Data for Candidate Gene Selection from Kids First Studies. Data sets used: KOMP2, Kids First

This study will examine the relationship between asymmetry and susceptibility to developmental disorders in a model organism (mouse) using quantitative analysis of KOMP2 and Kids First data sets.

  • Using Phosphorylation Signatures of Drug Perturbagens to Identify Exercise-Mimetic Compounds. Data sets used: LINCS, MoTrPAC

This study will explore whether known compounds can mimic the effects of physical activity. To accomplish this, the PTM signatures database (PTMsigDB) will be significantly expanded using LINCS data. These signatures will then be correlated with the phosphoproteomic changes induced by physical activity, as measured by MoTrPAC, to suggest exercise-mimicking drugs.

  • Using Common Fund Datasets for Xenobiotic Localization. Data sets used: LINCS, IDG, Metabolomics

This project aims to develop a novel platform with computational tools for better understanding the subcellular localization of xenobiotic molecules in the body. The researchers will use IDG, LINCS, and Metabolomics data sets to predict how the subcellular localization of specific xenobiotic molecules can be improved so that they are more efficacious and less toxic.

  • Interrogation and Interpretation of Common Fund Data Sets to Identify Novel Ocular Disease Genes. Data sets used: KOMP2, GTEx

This project aims to identify all mouse retinal disease genes in KOMP2 and to find their human homologs using GTEx and an aligned database, EyeGEx. It will then leverage ocular GWAS, pathway analysis, and literature searches to add biological evidence to a list of novel candidate genes that may impact blindness.

  • Unraveling the Topological Architecture and Phenotypic Contexture of Structural Variation. Data sets used: 4DN, Epigenomics, GTEx, Kids First

This project aims to integrate 4DN, Epigenomics, and GTEx data to provide an architecture (germline variation, genome topology, and chromatin structure) for exploring gene expression in pediatric and adult cancer tumor samples.

  • Using three-dimensional genome structure to refine eQTL detection. Data sets used: 4DN, GTEx 

This project aims to use 4DN and GTEx datasets to generate a list of cis-regulatory elements (CRE)-gene linkages to improve the identification of eQTLs, reducing the search space, decreasing computational loads, and increasing the statistical power for eQTL detection. 

  • Investigating Systems Physiology with Multi-Omics Data. Data sets used: MoTrPAC, GTEx

This project will leverage GTEx and MoTrPAC data to compute cross-tissue correlations of gene expression and protein expression. These correlations will be used to generate testable hypotheses about protein endocrine signals that act across tissues and organs.


This page last reviewed on August 23, 2023