Welcome and Introduction

David Relman opened the workshop by describing the Human Microbiome Project (HMP) as a hypothesis-generating endeavor that will begin to establish tools allowing researchers to further explore the role of the human microbiome in human health. He described the production of a reference data set as one step toward this goal and set out the following issues that must be resolved to generate it:

  • The definition of a normal or healthy human being;
  • The variability of the donor;
  • The relevant spatial scale for sampling;
  • The relevant time scale for sampling;
  • The importance of rare community members;
  • Relevant techniques for statistical analyses.

Fiscal Year 2007 Jumpstart Funding Plan

The sequencing centers provided an overview of their current efforts directed towards sequencing reference genomes from human-associated microbes. The current goal for the HMP is to produce 1000 sequenced human-associated bacterial genomes and to deposit those data in public databases as a reference against which whole genome shotgun metagenomic sequence data can be compared. The centers are currently receiving NHGRI support to sequence 300 bacterial genomes plus 16S rDNA for a limited sample set. Under HMP funding, the centers plan to sequence another 200 bacterial genomes and to generate 16S rDNA sequence survey data from a set of body regions defined by this workshop.

Ethical Issues of Risk Associated with HMP Sampling

Participants were strongly urged to consider the ethical, legal, and social implications of this research, particularly issues of respect and disrespect across social boundaries that could result from this work. The prospect of using previously collected specimens to avoid the challenges of de novo sampling was raised, with the warning that few existing samples are sufficiently consented to place the data on the web as this study requires.

GI Tract Sampling Discussion

Experts in microbial studies of the human gastrointestinal tract provided an overview of sampling and a discussion of current data. Recent studies have shown significant variation between individuals at the level of species and strains, but less variation at coarser taxonomic levels. Within an individual, the consistency of community composition over long periods of time, despite changes in relative abundance, was emphasized. Though gut sampling can be quite complex, it was noted that stool samples provide excellent material for a high-level survey and are easy to obtain. Because samples are so easy to obtain, sampling a large number of individuals for an overview of microbial diversity was suggested.

Several alternatives to fecal sampling were also outlined, with descriptions of various levels of invasiveness for sample retrieval. Investigating microbial diversity in multiple generations through the use of twin studies was also suggested.

Oral Cavity Sampling Discussion

The discussion of sampling the oral cavity began with reflections on the difference between a normal and a healthy individual. Because of the prevalence of various forms of oral disease, many influenced by bacterial communities, orally healthy individuals are not typical of the United States population. For this reason, though careful metadata collection by a dental specialist was encouraged, experts suggested focusing on representative individuals, not those with perfect oral health.

Because ensuring sample integrity raises major difficulties, the practice in the field is to extract nucleic acids prior to sample storage. Efforts are currently underway to create draft sequences of large numbers of oral microbes as a database against which these nucleic acid samples can be searched. As correlation between oral and gut microbiota has been previously demonstrated, those in the field strongly supported sampling the same individuals who were sampled for gut microbes. On the whole, because the goal of the HMP is to generate a reference data set, the suggestion was made to sample as many young adults as possible rather than to sample a smaller cohort more heavily and at multiple time points.

Vagina Sampling Discussion

Experts in vaginal sampling discussed a tiered sampling approach where high-level characterization of microbial diversity allows the use of statistical analyses to determine when sampling has revealed all microbes present at or above a set frequency. Once the sampling reaches this point, clustering by the initial data identifies the most representative samples for follow-up sequencing, thus reducing the burden of the more detailed analysis. It was suggested that this approach be implemented in the sampling for the HMP instead of establishing a set number of individuals to sample at the outset.
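The stopping rule described above can be illustrated with a simple calculation. The sketch below is not the method presented at the workshop; it assumes that each sampled unit (an individual or a sequence read) is an independent draw, so that a community member present at a given frequency is missed by n draws with probability (1 - frequency)^n. The function names and the example values of 1% frequency and 95% confidence are hypothetical.

```python
import math


def detection_probability(frequency: float, n_draws: int) -> float:
    """Probability that at least one of n independent draws hits a member
    present at the given frequency (assumes simple random sampling)."""
    if not 0.0 < frequency <= 1.0:
        raise ValueError("frequency must be in (0, 1]")
    if n_draws < 0:
        raise ValueError("n_draws must be non-negative")
    return 1.0 - (1.0 - frequency) ** n_draws


def draws_needed(frequency: float, confidence: float) -> int:
    """Smallest number of independent draws that detects a member at the
    given frequency with at least the requested confidence."""
    if not 0.0 < frequency <= 1.0:
        raise ValueError("frequency must be in (0, 1]")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    if frequency == 1.0:
        return 1  # a member present in every draw is seen immediately
    n = math.ceil(math.log(1.0 - confidence) / math.log(1.0 - frequency))
    return max(n, 1)


if __name__ == "__main__":
    # Hypothetical threshold: members at 1% frequency, 95% confidence.
    n = draws_needed(0.01, 0.95)
    print(n, detection_probability(0.01, n))
```

Under these assumptions, lowering the frequency threshold raises the required sampling effort roughly in inverse proportion, which is why the threshold has to be fixed before the stopping point can be determined. Real samples are rarely independent draws, so a figure from this sketch is a lower bound rather than a target.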

Several specific criteria for sampling individuals were also discussed, particularly the need to link sampling the vagina with the GI tract and oral cavity. The challenge of defining healthy individuals, especially since reported symptoms often do not correlate with clinical measures of vaginal disease, was also emphasized. Though self-sampling of the vagina is effective in clinical settings, the consensus was that it would be suboptimal for this project because of its inability to define specific regions within the vagina.

Skin Sampling Discussion

Experts on sampling the skin emphasized the diversity of the skin as a microbial habitat, both within an individual and between individuals. In particular, an individual's skin microbiota is strongly dependent upon external environmental factors such as the workplace, with healthcare workers expected to show very different microbial profiles from outdoor laborers. Interestingly, when comparing sampling mechanisms it has been found that the dominant species returned by each of the three main techniques, swabbing, shaving, and punching, were the same, though rare species were differentially represented in samples obtained using the three methods. For this reason, simple swabbing techniques were suggested for their ease of screening large numbers of individuals.

Statistical Considerations

From a statistical perspective, key factors for sampling communities were discussed, particularly the need to take a rigorous and thoughtful approach. Rigorous sampling requires satisfying and testing statistical assumptions of independence and lack of bias, while thoughtful sampling requires more fundamental decisions about experimental design, particularly knowing the goal of the project and developing a strategy to meet that goal. Sampling techniques will depend on whether this project seeks to discover as many novel microorganisms as possible or to understand the broader ecology of human-associated microbial ecosystems. For this reason, the group needs to be explicit in defining the goal and to follow through consistently. The group was also reminded that the smaller the scope of the question being asked, the greater the statistical power to provide an answer. Since it is a fundamental tradeoff that the smaller the target of inference, the more that can be known about the target, the suggestion was made to involve plant and animal ecologists familiar with balancing breadth with depth when studying ecological metagenomics.

Breakout Group Discussions

Following the morning presentations, workshop attendees divided into two breakout sessions charged with assimilating the region-specific recommendations into a sampling plan for establishing a data resource for the Human Microbiome Project. Their task was to address how many samples to collect, what samples to collect, and how to obtain the necessary samples. Following the sessions, the attendees reconvened to provide their recommendations.

Breakout Group I

In defining the scope of the project, the first breakout group saw the greatest scientific relevance in defining overall human-associated microbial representation, not in seeking microbial novelty. In accordance with this goal, the members described a stratified cohort sampling plan of three nested sampling levels as follows:

  • A large number of individuals (>100) who would undergo basic non-invasive sampling in all regions;
  • A subset of individuals (~100) who would undergo more invasive sampling;
  • And a final subset of ~10 individuals who would undergo the most extensive sampling.
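The nesting of the three levels can be sketched in code. This is an illustration only: the breakout group did not say how each subset would be chosen, so the random selection, the function name nested_cohorts, and the tier sizes of 300, 100, and 10 are all assumptions made for the example.

```python
import random


def nested_cohorts(participants, tier_sizes, seed=0):
    """Return a list of tiers in which each tier is a random subset of the
    one before it. Random selection is an assumption of this sketch."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    pool = list(participants)
    tiers = []
    for size in tier_sizes:
        if size > len(pool):
            raise ValueError("each tier must fit inside the previous one")
        pool = rng.sample(pool, size)  # draw without replacement
        tiers.append(pool)
    return tiers


if __name__ == "__main__":
    # Hypothetical participant IDs and tier sizes.
    ids = [f"P{i:04d}" for i in range(300)]
    basic, invasive, extensive = nested_cohorts(ids, [300, 100, 10])
    print(len(basic), len(invasive), len(extensive))
```

Because every individual in the most extensively sampled tier also belongs to the two broader tiers, the detailed data can always be compared against the same individuals' basic survey data.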

This approach would require the development of specific sampling protocols for each subset of individuals in each region, a project to be completed by region-specific working groups.

Though the breakout group did not have the opportunity to discuss the definition of a healthy versus a normal individual, they agreed that this distinction was critical for the project. Though self-reporting would be the least expensive way to collect this data, they were uncomfortable with the limitations it introduces in areas such as the oral cavity where disease often goes unrecognized.

Breakout Group II

Instead of focusing on the details of sampling in specific regions, the second breakout group reached the following conclusions regarding major decisions affecting the structure of sampling:

  • All individuals should be consented equally and sampled from as many regions as possible, both for scientific and practical reasons.
  • The sample archive should preserve DNA and RNA, not raw sample material.
  • Host genetic material should be collected to allow for genotyping or sequencing at a later date even if there are no immediate plans for its use.
  • Professional sampling of subjects, as opposed to self-sampling, should be done in order to maintain aseptic technique and proper sample tracking.
  • Relevant metadata will be critical to the project, and it was suggested that the most important are age, gender, occupation, social and economic status (including zip code), diet, medications, smoking status, family health history, BMI, and asymptomatic pathology.
  • Experts in sampling each region should be convened as "islands of expertise" to define standard operating procedures for sample collection.
  • Though no definitive numbers were discussed for sampling, there was broad consensus that 100 individuals would be insufficient.

This page last reviewed on August 27, 2013