Frequently Asked Questions (FAQs) for the Kids First Program


FAQs for Kids First Data Sharing

1.    Why is data sharing important for the Kids First program?

This program is Congressionally mandated to provide resources that will drive discovery in pediatric research (see Gabriella Miller Kids First Research Act).  Datasets and resources generated by this program must be made as broadly shareable and accessible as possible while abiding by informed consent language and protecting participants.

In line with NIH’s mission and the Kids First program’s goals, increasing accessibility to data through broad sharing practices empowers researchers and accelerates scientific progress, which leads to more refined diagnostic capabilities and ultimately more targeted therapies.

To view a recent presentation about Data Sharing in the Kids First program, click here.

2.    What are the general benefits of data sharing?

  • Enables data generated for one or more studies to be used to explore a wide range of additional research questions
  • Increases statistical power by combining separate datasets and increasing sample size 
  • Allows validation of research results
  • Promotes innovation of methods and tools for research
  • Facilitates development of improved therapeutic and diagnostic strategies for patients

3.    What is the National Institutes of Health (NIH) Genomic Data Sharing (GDS) policy?

Effective January 25, 2015, the NIH Genomic Data Sharing Policy (NOT-OD-14-124) replaces the NIH GWAS Data Sharing Policy (NOT-OD-07-088). Under terms and conditions consistent with the informed consent provided by individual participants, the GDS policy seeks to make genomic data broadly available to the research community in a timely manner.  Information on the NIH Genomic Data Sharing Policy can be found on: 

4.    Who can I go to for data sharing questions?

Each NIH Institute and Center has a Genomic Program Administrator (GPA) who serves as a point of contact for GDS Policy implementation within the IC. GPAs involved with the Kids First program are indicated on our Working Group members page: 
You can find a full list of GPAs here: 

5.    What is an Institutional Certification and what role does it play in data sharing?

Individual consent forms signed by study participants are the legal foundation for how controlled-access data from enrolled participants can be shared through dbGaP. Institutional Certifications assure that:

  • the data submission is consistent with all applicable national, tribal, and state laws and regulations as well as relevant institutional policies; 
  • an Institutional Review Board (IRB or equivalent) has reviewed the investigator’s proposal for data submission, and 
  • the data submission and subsequent data sharing for research purposes is consistent with the informed consent of study participants from whom the data were obtained, including any data use limitations (DULs) or modifiers. 

The Institutional Certification is submitted to a Genomic Program Administrator (GPA) who uses this to register the study in dbGaP and generate a Data Use Certification (DUC).  DULs and DUL modifiers are to be selected only if they reflect the language of the consent form, not PI or even IRB preference. “Preferences” for data use that are not expressly stated in patient consents can be written into other sections of the DUC. Secondary users and their supporting Institution must agree to the conditions of the DUC when applying to access data (see “FAQs for accessing Kids First data” below).
Please refer to the following resources for more information about consent language, Institutional Certifications, and the dbGaP registration process:

6.    What are the data sharing expectations for Kids First projects? 

In line with GDS policy, the consent forms should contain language that reflects broad sharing of genomic data. For guidance on developing consent language, visit  Projects that allow for the broadest level of sharing (i.e. “General Research Use” with no additional restrictions) will be prioritized for sequencing and incorporation into the Kids First Data Resource.  

In addition to complying with the NIH Genomic Data Sharing Policy (NOT-OD-14-124), sequence data generated through Kids First are expected to be consented for broad data sharing that allows comparing and combining datasets for analyses, consistent with the goals of the program. Therefore, we ask that applicant PIs obtain an Institutional Certification following these steps:

1)    Download the current NIH Institutional Certification template from: 
2)    Fill out the first page of the Institutional Certification to include the sites that would contribute samples for sequencing.  One document can list multiple sites; alternatively, multiple Institutional Certifications, one for each site, can be submitted.  Note: Investigators should not fill out the second page, which indicates the data use limitations and modifiers (these are for the IRB to determine). 
3)    Provide the Institutional Certification to the IRB along with the participant consent forms for each site and any other pertinent information (e.g. protocols). 
4)    The IRB reviews the consent form(s) and supporting information to determine whether there are any data use limitations (DULs) and/or DUL modifiers for each “consent group”.  Unless the intent of the consent form language is determined to prohibit specific uses of the data generated from the samples collected from the participants, it is expected that the IRB will designate a dataset as “General Research Use”. 
5)    After IRB review, the Institutional Certification needs to be counter-signed by the applicant PI and a senior official at the PI’s institution who is authorized to enter the institution into a legally binding contract and sign on behalf of the investigator who plans to submit the data to NIH, e.g. Dean, Vice President for Research.

While a full Institutional Certification is preferred for submission with the X01 application, a Provisional Certification is acceptable if there is not enough time to obtain a full Institutional Certification before submitting the application.  However, approval to access the Kids First X01 sequencing capacity is conditional on the submission of a full Institutional Certification covering all samples to be submitted for sequencing. If the document does not meet the Kids First program’s expectation for broad data sharing (i.e. General Research Use), another cohort with broader sharing may be selected instead.  

To download a letter that explains the data sharing expectations of the Kids First program, click here


FAQs for Accessing Kids First data

7.    Where are Kids First data being stored?

Individual level sequence data (BAM/FASTQ/VCF files) and associated clinical/phenotype data and metadata generated for Kids First cohorts are stored in NIH–approved repositories. In the near future, Kids First structural birth defects and childhood cancer datasets will be accessible through the Kids First Data Resource Center (

Some data and metadata from structural birth defect projects are currently stored in the National Center for Biotechnology Information’s (NCBI) Sequence Read Archive (SRA) and/or the database of Genotypes and Phenotypes (dbGaP).

8.    When will Kids First data be publicly available?

Kids First X01 data sets are scheduled to be released to the public via dbGaP six months after the X01 investigator team receives access to the entire data set.  Sometimes this “pre-release period” can be longer than six months due to administrative delays, but the data will not get released prior to six months unless specifically requested by the X01 PI.
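
As a rough illustration of the timing described above, the earliest routine release date can be estimated by adding six months to the date on which the X01 team received access to the entire data set. The Python sketch below is hypothetical: the function name and the rule for clamping month-end dates are assumptions, and actual release dates depend on administrative processing.

```python
import calendar
from datetime import date

PRE_RELEASE_MONTHS = 6  # six-month pre-release period described above


def earliest_release(access_date: date, months: int = PRE_RELEASE_MONTHS) -> date:
    """Return access_date shifted forward by `months` calendar months."""
    month_index = access_date.month - 1 + months
    year = access_date.year + month_index // 12
    month = month_index % 12 + 1
    # Assumption: clamp the day when the target month is shorter
    # (for example, August 31 maps to the last day of February).
    day = min(access_date.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


print(earliest_release(date(2018, 2, 23)))
```

The result is only a lower bound, since the pre-release period can run longer than six months.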

Visit our X01 projects page to see projects that have been released and estimated release dates for pending projects: 

9.    How do I access Kids First data?

The first step is to find the Kids First data.  In addition to our X01 projects page, you can find a list of all released Kids First projects on our Umbrella BioProject Page: To see the dbGaP pages, go to the “Project Data” section and select the link to the right of “Genotype and Phenotype (dbGaP)” under “Resource Name”.
The next step is to submit a Data Access Request (DAR) through dbGaP for each project: 
Secondary users and their supporting Institution’s Signing Official and IT Director must agree to the conditions of the Data Use Certification (sample agreement:, including any DULs or DUL modifiers pertinent to the requested dataset and the Genomic User Code of Conduct (  All internal and external collaborators must be listed on the application, with the exception of technicians, graduate students, and postdoctoral fellows who are under the requestor’s direct supervision.  External collaborators are required to submit separate DAR(s) for approved access to the same dataset(s). The DAR(s) will be reviewed by an NIH IC’s Data Access Committee (DAC).    
To learn more about the dbGaP data access procedure: 

FAQs for the Discovery of the Genetic Basis of Childhood Cancers and of Structural Birth Defects: Gabriella Miller Kids First Pediatric Research Program (X01 Clinical Trial Not Allowed) (PAR-18-583) Funding Opportunity Announcement (FOA) 

The Kids First program staff hosted a pre-application webinar for PAR-18-583 on February 8th, 2018. We encourage applicants to view the webinar slides and video.

1.  What are some major features of PAR-18-583?

  • Supports whole genome sequencing (WGS) of existing cohorts to elucidate the genetic contribution to childhood cancers and the genetic etiology of structural birth defects.
  • Whole genome, exome, and transcriptome sequencing are available for tumor or affected tissue when justified.  Cohort participants must have given consent to allow sharing of individual-level sequence and relevant phenotype data through an NIH-approved repository (see question 3 below).
  • Cohort samples that have consents that allow for broad data sharing (i.e. for General Research Use) are of higher priority. For more information, please see our FAQs on data sharing.
  • Investigators with small cohort sizes are encouraged to collaborate with other investigators and pool samples together to increase statistical power.
  • Investigators who have probands that have previously undergone WGS and who have unsequenced nucleic acids from their parents, siblings, tumor, and/or affected tissue are encouraged to apply to have those samples sequenced.
  • Kids First is requesting that sample, phenotype, family structure, and data sharing information for the proposed cohorts be provided as "Other Attachments". See question 2 below for a downloadable form.
  • The newly established Kids First Data Resource will receive and process sequence data generated under this FOA and make genomic and phenotypic data accessible to the research community to facilitate comparative analyses.
  • This list is not exhaustive. Applicants are strongly encouraged to read the funding announcement closely and to contact program staff in case of any questions.

2.  What information is required as "Other Attachments"?

  • Kids First is asking for specific information to be summarized and included as attachments. This is described in detail under Section IV. Application and Submission Information under the subheading SF424(R&R) Other Project Information. Applicants must include:
  • Institutional Certification – Institutional Certifications specify the data use limitations and data use limitation modifiers, as determined by the institution’s IRB based on the informed consent agreed to by the participants.  
    • In order to obtain the Institutional Certification, you can submit a cover letter that explains the data sharing expectations of the Kids First program (to download cover letter click here), along with the current NIH Institutional Certification template (please leave DULs and DUL modifier blank for your IRB to fill out), consent forms, and any other pertinent information (protocols etc.) to your IRB. 
    • If the IRB has not completed its review and therefore the institution cannot attest to all of the elements of the formal Institutional Certification, a provisional Institutional Certification is acceptable but the applicant is asked to describe the anticipated data use limitations and data use limitation modifiers. For institutional and/or provisional certifications, please use the current template:  
  • Sample Information, including type (e.g., DNA, RNA), tissue source, fixation method (when appropriate), and other details
  • Description of Phenotype Data that is available to be shared through the Kids First Data Resource.  Applications that propose submitting rich phenotypic data sets will be looked upon favorably.  
  • Optional – Family Structure

Kids First has developed a downloadable form  that applicants can use to summarize the samples, phenotype data, and data use limitations (if needed) for the proposed cohort. While applicants are required to provide this information, the use of this form is optional. Applicants may submit the required information in whatever format meets their individual purposes as long as it provides, at a minimum, the information requested in the FOA.

3.  Do the cohorts have to be properly consented before applying for the X01?

Participants in cohorts selected under this FOA must have given consent to allow sharing of individual-level genome sequence and relevant phenotype data through dbGaP or other NIH-approved repositories. Applicants must provide documentation of this by submitting an Institutional Certification (or Provisional Certification with a description of anticipated data use limitations) that covers all sites contributing samples, as an attachment (see question 2 above).
Cohort samples that have consents allowing for broad data sharing (e.g. for General Research Use with no data use limitation modifiers) will be given highest priority. No funds will be provided for obtaining new consent for existing samples. Consent to re-contact participants for additional phenotyping or collection of additional samples is strongly encouraged. Applicants are required to describe any data use limitations.

For research teams planning to start recruiting cohorts and/or collecting samples for a future application to the X01 program, please see FAQs for Kids First Data Sharing for more information. 

4.  What biospecimen information and phenotype data elements are expected?

Certain biospecimen and clinical/phenotype data are expected in order to process and analyze datasets; however, deep phenotyping is preferred. For phenotype data, the following clinical data elements (CDE) are expected, where available:

Primary Tumor Site or Type of Birth Defect, Sex, Race, Ethnicity, Age at Diagnosis in Days, Overall Survival Time in Days, Family History of the Childhood Cancer or Birth Defect
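
Before submission, a team might check a phenotype table against the elements listed above. The Python sketch below is illustrative only: the helper name is hypothetical, and it assumes the table's column headers match the element names exactly (ignoring case), which real submissions may not do.

```python
# Clinical data elements expected "where available", as listed above.
EXPECTED_ELEMENTS = [
    "Primary Tumor Site or Type of Birth Defect",
    "Sex",
    "Race",
    "Ethnicity",
    "Age at Diagnosis in Days",
    "Overall Survival Time in Days",
    "Family History of the Childhood Cancer or Birth Defect",
]


def missing_elements(columns):
    """Return the expected elements absent from the given column names."""
    present = {c.strip().lower() for c in columns}
    return [e for e in EXPECTED_ELEMENTS if e.lower() not in present]


# Example with a hypothetical, incomplete phenotype table header.
for element in missing_elements(["Sex", "Race", "Ethnicity"]):
    print("missing:", element)
```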

For templates and additional resources related to information required or suggested for the cancer projects visit:

5.  If investigators have already registered a project in dbGaP, and are seeking WGS through Kids First for samples from the same cohort, is a new Institutional Certification required? 

As long as the Institutional Certification for the registered project complies with NIH Genomic Data Sharing policy and covers all of the participants whose samples will  be sequenced through Kids First, a new certification is not required. We strongly advise working with the Genomic Program Administrator from the relevant institute ( to ensure that the appropriate samples are covered by the existing registration.  

6.  Is it important to know the source of the DNA for samples being submitted for WGS through Kids First?

It is important to know the source of the DNA for samples provided to Kids First Sequencing Centers. We ask that applicants provide a description of the samples, such as collection site; number of samples included in the study; a detailed inventory of the sources of the DNA (e.g., number of samples from blood, number of samples from saliva); and previous genotyping or sequencing. DNA from fresh/frozen blood or tissue is ideal for sequencing, as DNA from saliva can be contaminated with microbial DNA, which may result in higher costs (and therefore reduce the number of total samples that can be sequenced). Cell lines are also usually a poor choice because they often have significant genomic differences compared to the original germline which could complicate analysis. There are circumstances where studies might include induced pluripotent stem cells (iPSCs), but even then, a normal sample for comparison may be desirable.

7.  What file types will be provided by the sequencing center? 

The sequencing center will generate Variant Call Format (VCF) and BAM or CRAM files. A BAM file (.bam) is the binary version of a SAM file. A SAM file (.sam) is a tab-delimited text file that contains sequence alignment data. A CRAM is a compressed version of a BAM. FASTQ files may be provided for RNA transcriptomic data. 
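
To make the relationship between these file types concrete, the Python sketch below guesses a file's type from its first bytes. It relies on the publicly documented signatures (CRAM files begin with "CRAM"; BAM files are gzip-compatible BGZF streams whose decompressed content begins with "BAM\1"; VCF text begins with "##fileformat=VCF"). The function names are hypothetical and the text heuristics for SAM and FASTQ are simplifying assumptions, so treat this as a sketch rather than a validator.

```python
import gzip
import zlib

SAM_HEADER_TAGS = (b"@HD\t", b"@SQ\t", b"@RG\t", b"@PG\t", b"@CO\t")


def _sniff_text(head: bytes) -> str:
    """Classify uncompressed text formats from their leading bytes."""
    if head.startswith(b"##fileformat=VCF"):
        return "VCF"
    if head.startswith(SAM_HEADER_TAGS):
        return "SAM"
    if head.startswith(b"@"):
        return "FASTQ"  # assumption: any other '@' line is a read name
    return "unknown"


def sniff_format(head: bytes) -> str:
    """Guess the file type from the first bytes of a file."""
    if head.startswith(b"CRAM"):
        return "CRAM"
    if head.startswith(b"\x1f\x8b"):  # gzip or BGZF container
        try:
            # wbits=31 selects gzip framing; read at most 64 payload bytes.
            inner = zlib.decompressobj(31).decompress(head, 64)
        except zlib.error:
            return "unknown"
        if inner.startswith(b"BAM\x01"):
            return "BAM"
        kind = _sniff_text(inner)
        return kind + ".gz" if kind != "unknown" else "unknown"
    return _sniff_text(head)


# Synthetic example: plain gzip stands in for BGZF, which shares its framing.
print(sniff_format(gzip.compress(b"BAM\x01" + b"\x00" * 8)))
```

In practice `head` would be the first few kilobytes read from a file opened in binary mode.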

8. What is the role of the Kids First Data Resource Center (DRC)?

The goal of the Kids First Data Resource is to accelerate discovery of genetic etiology and shared biologic pathways by building a collection of curated genomic and phenotypic data from Kids First X01 projects and providing a central portal where these data and analysis tools will be readily accessible to the research community. The Kids First DRC is charged with re-processing and “harmonizing” data generated by the sequencing centers to facilitate analyses across all Kids First datasets. Sequence data will be re-aligned and re-called with every iteration of the Human Genome Reference.  X01 investigators will be able to access the data files generated by the sequencing center, as well as the harmonized version of the data generated by the Data Resource Center.

DRC activities and implementations will form an integral part of the emerging landscape of the NIH’s data commons environments and will support the establishment of a cross-searchable pediatric data commons reference with shared common standards.

Additionally, the DRC roadmap includes building out a collaborative platform for integrating, distributing, and collaborating over higher level analyses.  X01 investigators are encouraged to utilize the DRC’s resources and work collaboratively with the Kids First Data Resource Center to develop additional research projects and/or pursue specific analyses. 

Applicants can contact the Kids First Data Resource Center to enquire about including a bioinformatics analysis collaboration as part of the X01 application:
- visit, or  
- email

9.  It seems that no funds will be awarded to investigators but a detailed analytic plan is requested. Given that is the case, are investigators expected to obtain funds to support analysis separately?

There are no direct funds available under PAR-18-583 to support analysis of sequence data. The request for applicants to provide an analysis plan is intended to increase the likelihood that the samples to be sequenced are of high quality, that the number of specimens is appropriate for the stated aims, and that those submitting X01 applications will be prepared to do the analyses. Those investigators providing the samples are likely to have a significant advantage in conducting analyses, because they are familiar with the cohort, they will be interacting directly with NIH, sequencing centers, and the Kids First Data Resource Center throughout the process, and lastly, because each X01 investigator team has six months of proprietary access to the sequence data before it is released to the public for controlled access via dbGaP. 

10.  Are there other opportunities for obtaining analysis funding?

A funding opportunity, “Small Research Grants for Analyses of Data for the Gabriella Miller Kids First Data Resource (R03)”, supported by six (6) NIH Institutes (see below) is soliciting applications intended to promote meritorious research projects focused on analyses of childhood cancer and/or structural birth defects datasets that are or could be included in the Kids First Data Resource. Development of statistical methodology appropriate for analyzing genome-wide data relevant to childhood cancer and/or structural birth defects may also be proposed.

Participating Institutes in the funding announcement:

Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD); National Cancer Institute (NCI); National Institute on Alcohol Abuse and Alcoholism (NIAAA); National Institute of Dental and Craniofacial Research (NIDCR); National Institute of Neurological Disorders and Stroke (NINDS); National Heart, Lung, and Blood Institute (NHLBI)

In addition, X01 recipients may apply for funds to support analyses using the R01 mechanism (  

11.  Is it possible to submit an application with multiple PIs from different Institutions in order to reach an adequate sample size or create a larger, more compelling cohort? Alternatively, is it possible to reach an adequate sample size by adding trios or families with a different childhood cancer or structural birth defect?

Efforts to increase sample number by collaborations across institutions are acceptable and encouraged. Strong justification for the proposed sample size is expected in each application. Increasing sample numbers by aggregating across related conditions is acceptable. However, applicants doing this should be prepared to provide a coherent description of the analyses that will be performed across the aggregated cohort, and it may be easier to do this for sets of samples with related phenotypes or suspected underlying pathways. In addition, investigators should state how aggregating samples won’t slow the process of sending samples to the Kids First Sequencing Center. 
Applicants are also encouraged to partner with current X01 recipients to extend existing cohorts. For a full list of current X01 projects, visit:

12.  Is there also a maximum that will be considered? Our combined cohorts for example have nearly 5000 trios.

We encourage the submission of a large number of trios, but ask that the samples be organized into tranches that make analytic/scientific sense to provide flexibility in the review process. The available budget for sequencing services associated with this FOA allows for roughly 4,000 genomes total. Depending on the quality and number of applications received, the Kids First program management will determine how many total samples each X01 recipient will have approved for sequencing, while taking study design and sample size into consideration.

Additionally, applicants who propose sequencing large numbers of samples should describe their capacity and plan to prepare such a large number of samples for sequencing within the one-year timeframe.
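
The scale of the question can be checked with simple arithmetic: a trio is three genomes, so 5,000 trios amount to 15,000 genomes, nearly four times the roughly 4,000 genomes budgeted for the whole FOA. The Python sketch below shows the calculation. The names are hypothetical, and the assumption that every family unit is a complete trio is a simplification.

```python
GENOMES_PER_TRIO = 3   # proband plus two parents
GENOME_BUDGET = 4000   # "roughly 4,000 genomes total" for this FOA


def genomes_needed(n_trios: int) -> int:
    """Number of genomes required to sequence n_trios complete trios."""
    return n_trios * GENOMES_PER_TRIO


def max_full_trios(genome_budget: int = GENOME_BUDGET) -> int:
    """Largest number of complete trios that fits in a genome budget."""
    return genome_budget // GENOMES_PER_TRIO


requested = genomes_needed(5000)
print(requested, "genomes requested;", requested / GENOME_BUDGET, "times the budget")
print(max_full_trios(), "complete trios would use the entire budget")
```

This is why organizing samples into tranches matters: the program can approve a scientifically coherent subset rather than the full request.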

13.  Should we propose quality metrics for the genome sequencing?

No, this is not necessary. You should note the quality of the samples being proposed for submission. 

14.  Will agreeing to share additional genomic data through the Kids First Data Resource be looked upon favorably?

Investigators with existing childhood cancer and/or structural birth defect genomic data are encouraged to submit these data to be part of the Kids First Data Resource. Willingness to share additional genomic data through the Kids First Data Resource will likely be looked upon favorably. A willingness to contribute additional data can be indicated in the analysis plan.  This could be done if the additional genomic data adds power to the analyses that are planned using data from the cohort that the PI is proposing. For instance, if the plan is to submit samples that have already undergone exome sequencing, then sharing the exome data might provide additional value to the cohort. Aggregating larger amounts of data through Kids First will strengthen our goal of facilitating comprehensive and cross-cutting research.

15.  Do applicants need to describe the capacity to store BAM files?

If your group plans to download data to a local server as part of the data management plan, it is important to make clear that your team has the capacity (including equipment, security infrastructure, and physical resources) at your institution to securely accept and store large data files. However, as cloud-based resources become available through the Data Resource Center, local download and processing of data may not be necessary for interacting with Kids First datasets. If your group plans to make use of cloud-based workspaces, please describe a plan for analyzing data in such spaces. For information about the DRC cloud-based workspaces, visit

Data may be stored/hosted on local or cloud-based platforms. For more information see “NIH Security Best Practices for Controlled-Access Data Subject to the NIH Genomic Data Sharing (GDS) Policy”. 

16.  Although the maximum project period is 1 year, could one propose to sequence 70 trios now and then add 50 trios next year after additional collections?

All samples must be extracted, properly consented, and ready to send off to the sequencing center shortly after the review date. Please refer to the FOA for a more detailed timeframe.

17.  Who is responsible for data deposition?

The sequencing center is responsible for deposition of the sequence data into a NIH approved data repository (e.g., the Kids First Data Resource). The study Principal Investigator will be responsible for directly submitting the clinical/phenotypic data to the Kids First Data Resource Center.

18. For tumor specimens, is there an opportunity for applying whole genome sequencing (WGS) to DNA extracted from formalin-fixed paraffin-embedded (FFPE) tissue?

Fresh frozen samples for tumors are preferred. However, proposals that include FFPE samples will be accepted. If such a proposal is successful in review, there may be technical issues to resolve before good results can be obtained.

19.  What amount and concentration of DNA will be required and what will be the coverage?

Whole genome sequencing (WGS) of germline DNA will be done at 30X mean coverage using paired end sequencing. Tumors will be sequenced at 30X mean coverage using paired end sequencing combined with whole exome sequencing (WES) and RNA sequencing both at 100X also using paired end sequencing. If RNA from the affected site is not available, the sequencing center staff will work with each project to determine the best coverage and approach for sequencing and analysis of tumors and/or affected tissue.

Amount of DNA/RNA and coverage
Assay | Amount of DNA or RNA required/recommended | Concentration | Coverage | Additional info
WGS | ~2 ug DNA | 20-50 ng/ul preferred | 30X | paired end reads
WES | 275 ng DNA (minimum); 1 ug recommended | 20 ng/ul (minimum) | 100X, greater than 80% coding exons covered at 20X | paired end reads
RNA-Seq | 750 ng total RNA (minimum); 1 ug recommended | 20 ng/ul (minimum) | 100X, greater than 40% coding exons covered at 20X | paired end reads
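
The figures in the table above can be turned into a quick pre-shipment check. The Python sketch below is a hypothetical helper, not a sequencing center tool: it computes total mass as volume times concentration and compares it with the listed minimums. Treating the WGS values ("~2 ug", "20-50 ng/ul preferred") as hard minimums is an assumption made only for illustration.

```python
# Thresholds transcribed from the table above. Treating the WGS figures
# as hard minimums is an assumption; the table lists them as approximate
# and preferred values.
MINIMUMS = {
    "WGS": {"amount_ng": 2000, "conc_ng_per_ul": 20},
    "WES": {"amount_ng": 275, "conc_ng_per_ul": 20},
    "RNA-Seq": {"amount_ng": 750, "conc_ng_per_ul": 20},
}


def check_sample(assay, volume_ul, conc_ng_per_ul):
    """Return a list of problems; an empty list means the minimums are met."""
    req = MINIMUMS[assay]
    amount_ng = volume_ul * conc_ng_per_ul  # total mass = volume x concentration
    problems = []
    if amount_ng < req["amount_ng"]:
        problems.append(
            f"amount {amount_ng:g} ng is below {req['amount_ng']} ng"
        )
    if conc_ng_per_ul < req["conc_ng_per_ul"]:
        problems.append(
            f"concentration {conc_ng_per_ul:g} ng/ul is below "
            f"{req['conc_ng_per_ul']} ng/ul"
        )
    return problems


# Hypothetical RNA sample: 20 ul at 25 ng/ul is 500 ng in total.
print(check_sample("RNA-Seq", 20, 25))
```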

20.  Can I propose long-read whole genome sequencing?

Yes. Both of the designated Kids First sequencing centers, the Broad Institute and the HudsonAlpha Institute for Biotechnology in collaboration with St. Jude Children’s Research Hospital, offer long-read WGS services using the 10X Genomics platform. However, the benefits of using this technology for birth defects and pediatric cancer cohorts are not yet well understood. Additionally, long-read WGS is more expensive than standard WGS and results in fewer samples being sequenced and loss of statistical power. (For example, for a fixed set of funding, approximately 15-20% fewer samples can be sequenced using long-read WGS compared to standard WGS). Thus, applicants should justify the added benefits that long-read WGS will provide versus standard WGS for their particular cohort.
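
The trade-off quoted above can be made concrete with a small calculation. The Python sketch below applies the stated 15-20% reduction to a cohort size that a fixed budget would cover with standard WGS. The function name is hypothetical, and rounding down to whole samples is an assumption.

```python
def long_read_sample_range(standard_samples: int) -> tuple:
    """Return (low, high) sample counts under a 20% and a 15% reduction."""
    low = standard_samples * (100 - 20) // 100   # 20% fewer samples
    high = standard_samples * (100 - 15) // 100  # 15% fewer samples
    return low, high


# A budget that covers 1,000 standard genomes covers roughly 800 to 850
# genomes with the long-read option.
print(long_read_sample_range(1000))
```

Applicants weighing long-read WGS can use an estimate like this to state how much statistical power they are giving up.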

21.  Are applicants expected to describe how results will be returned to study participants or how incidental findings will be reported?

Decisions about returning individual results and incidental findings to study participants lie with the institution and their IRBs and are outlined in the consent form agreed to by participants. NIH does not require that X01 applicants describe a plan for return of results for the Kids First sequencing program.  Investigators and participants should keep in mind that the technology used to generate sequence data in this program is designed for research purposes, not for identifying clinical results, and is not CLIA-certified.   

22. Who should I contact for additional questions? 

You can email Valerie Cotton with additional questions. Please use the subject line: “X01 inquiry”. 
Potential applicants may also contact any Program Officer listed in the FOA.

FAQs for Small Research Grants for Analyses of Data for the Gabriella Miller Kids First Data Resource (R03) (PAR-16-348)

1.  When is the next receipt date for this opportunity?  

Due to new requirements for clinical trials, many Funding Opportunity Announcements are being reissued across NIH, including those that are not for clinical trials.  For this reason, PAR-16-348, “Small Research Grants for Analyses of Data for the Gabriella Miller Kids First Data Resource (R03)” recently expired and the reissue will be posted in time for the next standard receipt cycle (Cycle II). 

The changes to this FOA are minor and do not impact the scope or overall expectations for this funding opportunity.  

2.  How can secondary data and analytic pipelines from the R03 be shared with the pediatric research community?  

Since the goal of the Kids First program is to facilitate new discoveries and novel ways of thinking about childhood cancers and structural birth defects, researchers funded through PAR-16-348 are strongly encouraged to submit the results of data analyses as well as the analytic pipeline used to the Kids First Data Resource to make the data more broadly available to the pediatric research community.

Applicants may contact the Kids First Data Resource Center (DRC) to learn more about how secondary data and analytical pipelines can be submitted:
- visit, or  
- email

3.  Since the DRC is charged with processing Kids First data, can we work with them to develop an analysis plan?  

Applicants are encouraged to communicate with the Kids First Data Resource Center to avoid redundant efforts and to collaborate on the analytical approach. To contact them:
- visit, or  
- email

4.  If I am proposing to use the R03 to analyze data obtained outside of Kids First, what are the requirements? 

Applicants who already have other whole genome sequence data relevant to the Kids First program, but generated through other resources, must be willing and able to submit these data to the Kids First Data Resource. These applicants must confirm that the consent used to obtain samples allows sharing of individual level genome sequence and relevant phenotype data through a controlled access, NIH-approved repository such as dbGaP. Applicants should also describe whether the consent requires any data use limitations; however, datasets allowing for broad data sharing (e.g. for General Research Use) are strongly preferred. Information such as the ability to recontact participants for additional phenotyping or collection of additional samples should be included, although this FOA is not providing support for additional sample collection.  

5.  Who can I contact for additional information?  

You can email Valerie Cotton with additional questions. Please use the subject line: “R03 inquiry”.  

Researchers funded by PAR-16-348 may contact the Program Officer listed on the Notice of Award for the R03 grant, which is available through your NIH eRA user account.  


This page last reviewed on February 23, 2018