Big Data to Knowledge Program Resources

Centers of Excellence for Big Data Computing

The Big Data to Knowledge (BD2K) Centers of Excellence have developed new approaches, methods, software tools and related resources including publications, data standards, and educational resources to advance Big Data Science in their relevant biomedical area of focus.



BD2K Training and Education

The Big Data to Knowledge (BD2K) Training activities were designed to improve big data skills of biomedical scientists and increase the number of biomedical data scientists. BD2K-funded grants have produced a number of educational resources to strengthen the role of data science in modern biomedical research.




Resource Indexing

The data discovery index (DataMed) prototype was developed through the BD2K biomedical and healthCAre Data Discovery Indexing Ecosystem project (bioCADDIE), and allows users to find and access biomedical datasets from multiple sources based on key attributes.


Software & Analysis

Big Data to Knowledge (BD2K) supported the development of software tools and methods to tackle data management, transformation, and analysis challenges in areas of high need to the biomedical research community.

Forums for Integrative Phenomics

BD2K supported the development of community-based data and metadata standards. The Forums for Integrative Phenomics combine data across species to illuminate challenges in genomics, human health and disease.

  • Phenotype Ontology Reconciliation Effort: A community effort to reconcile logical definitions across a number of important phenotype ontologies. The outcome of this effort will be an integrated ecosystem of phenotype ontologies that can be leveraged in clinical diagnostics and disease mechanism discovery in humans.

Interactive Digital Media & Crowdsourcing

Big Data to Knowledge (BD2K) supported the development of interactive media tools for analyzing biomedical data via crowdsourcing.



  • Fold It: A revolutionary crowdsourcing computer game enabling users to contribute to important scientific research. Users solve puzzles to help researchers find out if humans' pattern-recognition and puzzle-solving abilities make them more efficient than existing computer programs at pattern-folding tasks, informing models of protein structure prediction.
  • EyeWire II: A 3D puzzle game that allows players to solve puzzles of neuron configurations to help researchers map the brain.
  • Quorum: A flexible gaming platform that will crowdsource the analysis of visual data, such as microscopic images or graphical charts, provided by research scientists.
  • Cancer Crusade: A game in which players can help improve scientific understanding of combination therapies that fight cancer.
  • GraphSpace: An easy-to-use web-based platform that collaborating research groups can use for storing, interacting with, and sharing networks.
  • StarGEO: The Search Tag Analyze Resource for the Gene Expression Omnibus (STARGEO) project aims to crowdsource annotations of open genomics big data, allowing users to discover the functional genes and biological pathways that are defective in disease.

This page last reviewed on June 18, 2020