The HMP ‘Healthy Human’ Reference Dataset

The World’s Most Comprehensive Baseline Dataset of Microbiome and Human Host Sequence Data

The Human Microbiome Project has transitioned from Common Fund support. For more information, please visit

Please note that because the Human Microbiome Project is no longer supported by the Common Fund, the program website is being maintained as an archive and will not be updated regularly.

The mission of the NIH’s Human Microbiome Project (HMP) was to create the foundational research resources needed to support growing scientific interest in the role of the microbiome in human health and disease. A key resource created by the project is a complete dataset of microbiome and host sequence data from a cohort of 300 adults who were verified to be free of disease and were therefore considered healthy. This reference dataset includes over 2,000 metagenomes and over 10 terabytes (TB) of DNA sequence data, making it the largest set of microbiome data from humans or any other habitat.

This resource is the world’s most comprehensive reference host/microbiome dataset because it includes both the microbial community composition of five major body regions (nasal, oral, skin, gastrointestinal tract, and urogenital tract) of these subjects and the predicted metabolic pathways of the microbial communities in these regions. All microbial members (bacterial, archaeal, bacteriophage, viral, and fungal) are included in this baseline dataset, and both phylogenetic marker gene sequence data [e.g., 16S rRNA, 18S rRNA, and the internal transcribed spacer region (ITS)] and metagenomic whole genome shotgun sequence data were generated. To complete the overall dataset, the human genome sequences of these subjects have also been analyzed.

Additional attention has been paid to the gut microbiome in this cohort: the research community has used the HMP reference dataset to analyze the mobile gene content, the antibiotic resistome, the bacteriophage composition, and the presence of putative pathogens in this key subset of the human microbiome reference dataset. Many widely used databases have incorporated the complete HMP reference dataset; two notable examples are the HMP Data Analysis and Coordination Center and the Qiita web-based microbial study management platform.
The human sequence data are under controlled access but can be requested for appropriate research purposes from NIH’s database of Genotypes and Phenotypes (dbGaP).

The following nine key papers describe the various datasets that make up the complete HMP ‘healthy human’ microbiome reference dataset:


This page last reviewed on April 12, 2024