Frequently Asked Questions (FAQs) for the Kids First Program

FAQs for Kids First Genomic Data Sharing

1. Why is data sharing important for the Kids First program?

This program is Congressionally mandated to provide resources that will drive discovery in pediatric research (see Gabriella Miller Kids First Research Act). Datasets and resources generated by this program must be made as broadly shareable and accessible as possible while abiding by informed consent language and protecting participants.

In accordance with NIH’s mission and the Kids First program’s goals, increasing accessibility to data through broad sharing practices empowers researchers and accelerates scientific progress that can lead to improved diagnostic capabilities and targeted therapies.

2. What are the general benefits of data sharing?

  • Enables data generated for a given study or studies to be used to explore a wide range of additional research questions
  • Increases statistical power by combining separate datasets and increasing sample size 
  • Allows validation of research results
  • Promotes innovation of methods and tools for research
  • Facilitates development of improved therapeutic and diagnostic strategies for patients

3. What is the National Institutes of Health (NIH) Genomic Data Sharing (GDS) policy?

Effective January 25, 2015, the NIH Genomic Data Sharing Policy (NOT-OD-14-124) replaced the NIH GWAS Data Sharing Policy (NOT-OD-07-088). Under terms and conditions consistent with the informed consent provided by individual participants, the GDS policy seeks to make genomic data broadly available to the research community in a timely manner. Information on the NIH Genomic Data Sharing Policy can be found on: 

4. What is an Institutional Certification and what role does it play in genomic data sharing?

Individual consent forms signed by study participants are the legal foundation for how genomic data from enrolled participants can be shared through dbGaP. Institutional Certifications assure that:

  • The data submission is consistent with all applicable national, tribal, and state laws and regulations as well as relevant institutional policies; 
  • An Institutional Review Board (IRB or equivalent) has reviewed the investigator’s proposal for data submission, and 
  • The data submission and subsequent data sharing and use for research purposes are consistent with the informed consent of study participants from whom the data were obtained, including any data use limitations (DULs) or modifiers. 

The Institutional Certification is submitted to a Genomic Program Administrator (GPA), who uses it to register the study in dbGaP and generate a Data Use Certification (DUC). Data Use Limitations reflect the language of the consent form and not PI or IRB preferences. Secondary users and their supporting Institution must agree to the conditions of the DUC when applying to access data (see “FAQs for accessing Kids First data” below).

    5. What is the process for obtaining an Institutional Certification?

    We suggest that applicant PIs obtain Institutional Certifications following these steps:

    1)    Download the current NIH Institutional Certification template from: 
    2)    Fill out the first page of the Institutional Certification to include the sites that would contribute samples for sequencing. One document can list multiple sites; alternatively, multiple Institutional Certifications, one for each site, can be submitted. 
    3)    Provide the Institutional Certification to the IRB, or equivalent body, along with the participant consent forms for each site and any other pertinent information (e.g. protocols), to complete the second and third pages:

    a.    At the top of the second page, it is anticipated that the individual-level genomic data will be made available through controlled access.

    b.    The lower section of the second page addresses “genomic summary results (GSR).” This box is to be left unchecked unless unrestricted access to GSR is not permitted due to the study’s designation as “sensitive” by the institution. Please note that it is not anticipated that a “sensitive” designation will apply to current Kids First studies; unrestricted access to GSR is therefore expected to be appropriate for the majority of Kids First genomic datasets. For additional information see “Update to NIH Management of Genomic Summary Results Access” (NOT-OD-19-023).

    4)    On the third page, the IRB, or equivalent body, is to select the appropriate data use limitations (DULs) and/or DUL modifiers based on the language of each site’s consent form. Unless the intent of the consent form language is determined to prohibit specific uses of the data generated from the samples collected from the participants, it is expected that the dataset will be designated as “General Research Use (GRU)”. Please note that cohorts with data use limitations and/or modifiers that impede the ability to access, use, combine, or cross-analyze data will not be prioritized for sequencing by the Kids First program (e.g., datasets consented for disease-specific research only, datasets that require a letter of collaboration (“COL”), or datasets that require local “IRB” approval).
    5)    Finally, the Institutional Certification needs to be counter-signed by the applicant PI and the Institution Signing Official who is authorized to enter the institution into a legally binding contract and sign on behalf of the investigator who plans to submit the data to NIH, e.g. Dean, Vice President for Research.

    An Institutional Certification must be provided with an application to the Kids First X01 sequencing opportunity; a Provisional Certification is acceptable if there is not adequate time to obtain a full Institutional Certification before submitting the application. However, approval to access the Kids First X01 sequencing capacity is conditional on the submission of a full Institutional Certification covering all samples to be submitted for sequencing. Cohort selection will be based, in part, on the Kids First program’s expectation for broad data sharing (i.e. General Research Use).

    6. What are the genomic data sharing expectations for Kids First projects? 

    Consistent with the NIH Genomic Data Sharing Policy (NOT-OD-14-124), consent forms should contain language that reflects broad sharing of genomic data. Additionally, Kids First takes seriously its responsibility to ensure data can be broadly accessed, used, combined, and/or cross-analyzed across childhood cancer and structural birth defects. Projects that allow for the broadest level of sharing (i.e., “General Research Use” with no additional restrictions) will be prioritized for Kids First support (i.e., the X01 sequencing opportunity). The following data use consent groups and modifiers limit broad data access and impede the ability of the Kids First program to accomplish its goals.

    -    Disease Specific Consent Group: When data use is restricted to a specific disease area, the data cannot be combined with a dataset that has a different disease-specific data use limitation. Combining and cross-analyzing datasets is a primary goal of Kids First; therefore, datasets that are consented for General Research Use and/or Health/Medical/Biomedical purposes will be prioritized over datasets restricted to Disease Specific use.

    -    IRB modifier: With this box checked, the Requester must provide documentation of their local IRB’s approval for the proposed research when submitting a Data Access Request (DAR). We find that it is rare for consent language to include such a requirement and that this modifier is often included in error. As a reminder, when submitting a DAR, every requester and their institution must agree to the terms of the Data Use Certification (DUC), which verifies that the requesting PI is accredited within the institution, that the institution is aware of the project for which the PI is proposing to use the data, and that the institution has all appropriate security measures in place to manage and maintain the controlled-access dataset(s) being retrieved. For a sample DUC, see: 

    -    COL modifier: This box is checked when the consent form states that collaboration with the original/submitting investigator is required in order to use the dataset; therefore, the Requester must provide a collaboration agreement document in order to be approved for access to the dataset. This can limit the number of end users who are able to use the dataset.

    Please note that under the recent guidance, “Update to NIH Management of Genomic Summary Results Access (NOT-OD-19-023)”, it is anticipated that unrestricted access to Genomic Summary Results will be appropriate for the majority of Kids First genomic datasets (i.e., the new box on page 2 should remain unchecked).
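    For teams that track consent information for several sites, the rules in this answer can be sketched in a few lines of Python. This is only an illustration, not an NIH or Kids First tool: the class name, the function names, and the short labels "GRU", "HMB", "DS", "IRB", and "COL" are assumptions chosen for the example, and the IRB's reading of each consent form remains the authority.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Modifiers that this FAQ describes as limiting broad access.
# The short labels are illustrative, not an official vocabulary.
LIMITING_MODIFIERS = frozenset({"IRB", "COL"})


@dataclass(frozen=True)
class ConsentGroup:
    base: str                       # "GRU", "HMB", or "DS" (hypothetical labels)
    disease: Optional[str] = None   # only meaningful when base == "DS"
    modifiers: FrozenSet[str] = frozenset()


def impedes_broad_sharing(group: ConsentGroup) -> bool:
    """True if the group is disease-specific or carries an IRB/COL modifier."""
    return group.base == "DS" or bool(group.modifiers & LIMITING_MODIFIERS)


def ds_conflict(a: ConsentGroup, b: ConsentGroup) -> bool:
    """True if both datasets are disease-specific for different diseases.

    This encodes only the single combining rule stated in this answer;
    it does not model any other restriction.
    """
    return a.base == "DS" and b.base == "DS" and a.disease != b.disease


gru = ConsentGroup("GRU")
gru_irb = ConsentGroup("GRU", modifiers=frozenset({"IRB"}))
ds_one = ConsentGroup("DS", disease="condition A")
ds_two = ConsentGroup("DS", disease="condition B")

print(impedes_broad_sharing(gru))      # General Research Use, no modifiers
print(impedes_broad_sharing(gru_irb))  # IRB modifier limits access
print(ds_conflict(ds_one, ds_two))     # different disease-specific limits
```

    A check like this can help flag sites whose consent language may need a closer look before the Institutional Certification is completed.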

    7. Where can I find additional resources about genomic data sharing? 

    Please refer to the following resources for more information about Genomic Data Sharing, consent language, Institutional Certifications, and the dbGaP registration process:

    • NIH Office of Science Policy: NIH Genomic Data Sharing:
    • NIH GDS Policy pdf, 4. Informed Consent and 5. Institutional Certification:
    • NIH Guidance on Consent for Future Research Use and Broad Sharing of Human Genomic and Phenotypic Data Subject to the NIH Genomic Data Sharing Policy:
    • National Institutes of Health Points to Consider in Drafting Effective Data Use Limitation Statements (Institutional Certifications):
    • NHGRI: The Informed Consent Resource:
    • Points to Consider for Institutions and Institutional Review Boards in Submission and Secondary Use of Human Genomic Data under the National Institutes of Health Genomic Data Sharing Policy:
    • Institutional Certification Template (note that Data Use Limitations (DULs) and modifiers must be selected only according to the language of the participant consent forms):
    • dbGaP Registration Flow Chart:

    8. Who can I contact for additional information and questions regarding data sharing?
    Jaime M. Guidry Auvil, Ph.D.
    Genomic Program Administrator (GPA)
    Director, NCI Office of Data Sharing 

    Vivian Ota Wang, Ph.D.        
    Kids First Data Access Committee (DAC) Chair 
    NCI Office of Data Sharing

    General NIH Genomic Data Sharing questions: 

    dbGaP (NCBI) helpdesk:

    FAQs for Accessing Kids First data

    1. Where can I access Kids First data?

    Individual-level sequence data (BAM/FASTQ/VCF files) and associated clinical/phenotype data and metadata generated for Kids First cohorts can be accessed through the Kids First Data Resource Portal. Before accessing individual-level genomic sequence data, you will need to submit a Data Access Request through dbGaP for approval from the NIH Kids First Data Access Committee (see FAQ #3 below).

    Some genomic datasets from structural birth defect projects are currently stored in the National Center for Biotechnology Information’s (NCBI) Sequence Read Archive (SRA), but all datasets can be accessed through the Kids First Data Resource Portal and require dbGaP approval.  

    2. When will Kids First data be publicly available?

    Kids First X01 datasets are scheduled to be released to the public via dbGaP six months after the X01 investigator team receives access to the sequence data. Sometimes this “pre-release period” can be longer than six months due to procedural delays, but the data will not be released prior to six months unless specifically requested by the X01 PI.

    Visit our X01 projects page to see projects that have been released and estimated release dates for pending projects: 

    3. How do I access Kids First data?

    The first step is to find the Kids First data. In addition to our X01 projects page and the website, you can find a list of all released Kids First projects on our Umbrella BioProject page. To see the dbGaP pages, go to the “Project Data” section and select the link to the right of “Genotype and Phenotype (dbGaP)” under “Resource Name.”

    The next step is to submit a Data Access Request (DAR) through dbGaP for each project. Secondary users and their supporting Institution’s Signing Official and IT Director must agree to the conditions of the Data Use Certification, including any DULs or DUL modifiers pertinent to the requested dataset, and to the Genomic User Code of Conduct.

     All internal and external collaborators must be listed on the application, with the exception of technicians, graduate students, and postdoctoral fellows who are under the requestor’s direct supervision. External collaborators from other institutions are required to submit separate DAR(s) for approved access to the same dataset(s). The DAR(s) will be reviewed by the NIH Kids First Data Access Committee (DAC), which is run out of the NCI Office of Data Sharing.    

    To learn more about the dbGaP data access procedure, visit: and watch a presentation about requesting access to genomic datasets through dbGaP at

    4. Who can apply to access individual level sequence data from dbGaP? 

    For extramural researchers, the Principal Investigator (PI) must be a tenure-track professor, senior scientist, or equivalent to submit a data access request (DAR), and must have a valid NIH eRA Commons account for logging in to the dbGaP system. Please see here for more about how to set up a new eRA Commons account or how to make changes to an existing eRA Commons account.

    FAQs for the Discovery of the Genetic Basis of Childhood Cancers and of Structural Birth Defects: Gabriella Miller Kids First Pediatric Research Program (X01 Clinical Trial Not Allowed) (PAR-19-390) Funding Opportunity Announcement (FOA)

    A pre-application webinar for PAR-19-390 is tentatively scheduled for Friday, November 15, 2019 from 12:00 - 1:00 pm EST. 

    1.  What are some major features of PAR-19-390?

    •  Supports whole genome sequencing (WGS) of existing cohorts to elucidate the genetic (germline or somatic) contribution to childhood cancers and the genetic etiology of structural birth defects.
    •  Whole genome, exome, and transcriptome sequencing may be available for tumor or affected tissue when justified.  With justification, complementary sequencing approaches, such as long-read sequencing, may also be proposed for a cohort or a subset of a cohort. Project design will be finalized in discussions among the X01 investigators, the sequencing centers, and NIH program staff.
    • Cohort participants must have given consent to allow sharing of individual-level sequence and relevant phenotypic data through an NIH-approved repository (see question 4 below). Cohort samples that have consents that allow for broad data sharing (i.e., for General Research Use) are of higher priority, and cohorts with data use limitations that impede program goals will not be prioritized (e.g., datasets consented for disease-specific research only, datasets that require a letter of collaboration (“COL”) for access, or datasets that require local IRB approval for access). For more information, please see our FAQs on data sharing (Genomic Data Sharing FAQ #6).
    • Cohorts proposed for sequencing must include a minimal amount of associated clinical and phenotypic data sufficient to enable association with genomic variants and analysis. Proposals with rich clinical and phenotypic data that can be shared to facilitate cross-disease research among the pediatric research community will be prioritized.   
    • Investigators with small cohort sizes are encouraged to collaborate with other investigators and pool samples together to increase statistical power.
    • Investigators who have probands that have previously undergone WGS and who have unsequenced nucleic acids from their parents, siblings, tumor, and/or affected tissue are encouraged to apply to have those samples sequenced.
    • Kids First is requesting that sample, phenotype, family structure, and data sharing information for the proposed cohorts be provided as "Other Attachments." See question 3 below for a downloadable set of tables that can be used.
    • The Kids First Data Resource Center will receive and process sequence data generated under this FOA and make genomic and phenotypic data accessible to the research community to facilitate comparative analyses.
    • This list is not exhaustive. Applicants are strongly encouraged to read the funding announcement closely and to contact program staff in case of any questions.

    2. How will X01 projects/cohorts be selected?

    Investigators whose projects are selected for this opportunity will be notified by NIH Kids First program staff with the estimated number of samples approved for sequencing. Since there is no “award” associated with the X01 mechanism, X01 decisions are not finalized by an NIH Institute or Center (IC) Council. Rather, following initial peer review, recommended applications will receive a second level of review by the Common Fund and NIH staff involved in the Kids First Program, and decisions are approved by the NIH Gabriella Miller Kids First Working Group Co-Chairs. The following will be considered in making cohort selections:

    • Scientific and technical merit of the proposed project as determined by scientific peer review
    • Availability of funds
    • Relevance of the proposed project to program priorities
    • Value of incorporating the dataset into the Data Resource to empower research among the pediatric research community
    • Program balance: Kids First seeks to ensure that a broad diversity of both childhood cancer studies and structural birth defects studies are well represented. The program prioritizes cohorts with conditions not previously sequenced under Kids First.
    • Informative study design and sufficient clinical and phenotypic data
    • Availability of samples in a timely manner
    • Sample quality in terms of suitability for whole genome sequencing
    • Compliance with resource sharing policies as appropriate and ability to broadly share and use data from the cohort in line with the goals of the program (i.e., combining and cross-analyzing genomic datasets). Approval to access the sequencing capacity is conditional on the submission of a completed Institutional Certification covering all samples to be submitted for sequencing. If the document does not meet the Kids First program's expectation for broad data sharing (i.e., General Research Use), another cohort with broader sharing may be selected instead. For more information, please see our FAQs on data sharing (Genomic Data Sharing FAQ #6).

    3. What information is required as "Other Attachments"?

    Kids First is asking for specific information to be summarized and included as attachments. This is described in the FOA under Section IV. Application and Submission Information, under the subheading SF424(R&R) Other Project Information. Applicants must include:

    • Institutional Certification – Institutional Certifications specify the data use limitations and data use limitation modifiers, as determined by the institution’s IRB based on the informed consent agreed to by the participants. 
      •  In order to obtain the Institutional Certification, you can submit a cover letter that explains the data sharing expectations of the Kids First program (to download cover letter click here), along with the current NIH Institutional Certification template (please leave DULs and DUL modifiers blank for your IRB to fill out), consent forms, and any other pertinent information (e.g., protocols) to your IRB. 
      •  If the IRB has not completed its review and therefore the institution cannot attest to all of the elements of the formal Institutional Certification, a provisional Institutional Certification is acceptable but the applicant is asked to describe the anticipated data use limitations and data use limitation modifiers. For institutional and/or provisional certifications, please use the current template:
    • Sample Information, including type (e.g., DNA, RNA), tissue source, fixation method (when appropriate), and other details. Please note that DNA from patient-derived cell lines will not be accepted due to the possible introduction of mutations that could confound the identification of disease-causing rare variants.  
    • Description of clinical and phenotypic data that are available to be shared through the Kids First Data Resource. Applications that propose submitting rich phenotypic data sets will be looked upon favorably.
    • Optional – Family Structure or Pedigrees
    • Kids First has developed a downloadable table that applicants can use to summarize the samples, phenotype data, and data use limitations (if needed) for the proposed cohort. While applicants are required to provide this information, the use of this form is optional. Applicants may submit the required information in whatever format meets their individual purposes as long as it provides, at a minimum, the information requested in the FOA.

    4.  Do the cohorts have to be properly consented before applying for the X01?

    Participants in cohorts selected under this FOA must have given consent to allow sharing of individual-level genome sequence and relevant phenotype data through dbGaP or other NIH-approved repositories. Applicants must provide documentation of this by submitting, as an attachment, an Institutional Certification (or Provisional Certification with a description of anticipated data use limitations) that covers all sites' samples (see question 3 above).

    Cohort samples that have consents allowing for broad data sharing (e.g. for General Research Use with no data use limitation modifiers) will be given highest priority. No funds will be provided for obtaining new consent for existing samples. Consent to re-contact participants for additional phenotyping or collection of additional samples is strongly encouraged. Applicants are required to describe any data use limitations.

    For research teams planning to start recruiting cohorts and/or collecting samples for a future application to the X01 program, please see FAQs for Kids First Genomic Data Sharing for more information.

    5.  What biospecimen information and phenotype data elements are expected?

    Certain biospecimen and clinical/phenotype data are expected in order to process and analyze datasets; however, deep phenotyping is preferred. For phenotype data, the following data elements are expected, where available:
    sex, race, ethnicity, age at enrollment and/or diagnosis, diagnoses (e.g., type of birth defect, primary tumor type), phenotypes for affected cases and unaffected family members, vital status, age at last known vital status, clinical information, and family medical history (e.g., family history of cancer or birth defects).

    For templates and additional resources related to information required or suggested for the cancer projects, visit: You can also view the Kids First DRC's "Clinical Phenotype Data Element", which describes the minimal expected data.

    6.  If investigators have already registered a project in dbGaP, and are seeking WGS through Kids First for samples from the same cohort, is a new Institutional Certification required? 

    As long as the Institutional Certification for the registered project complies with NIH Genomic Data Sharing policy and covers all of the participants whose samples will be sequenced through Kids First, a new certification is not required with the application. However, the Genomic Program Administrator (GPA) may ask for an Institutional Certification using the most recent NIH template (published on November 1, 2018), if needed, prior to registering the study in dbGaP.

    7.  Is it important to know the source of the DNA for samples being submitted for WGS through Kids First?

    It is important to know the source of the DNA for samples provided to Kids First Sequencing Centers. We ask that applicants provide a description of the samples, such as collection site; number of samples included in the study; a detailed inventory of the sources of the DNA (e.g., number of samples from blood, number of samples from saliva); and previous genotyping or sequencing. DNA from fresh/frozen blood or tissue is ideal for sequencing, as DNA from saliva can be contaminated with microbial DNA, which may result in higher costs (and therefore reduce the number of total samples that can be sequenced). Cell lines will not be accepted because they often have significant genomic differences compared to the original germline which could complicate analysis. There are circumstances where studies might include induced pluripotent stem cells (iPSCs), but even then, a normal sample for comparison may be desirable.

    8.  What file types will be provided by the sequencing center? 

    The sequencing center will generate Variant Call Format (VCF) and BAM or CRAM files for genomic data. A SAM file (.sam) is a tab-delimited text file that contains sequence alignment data, and a BAM file (.bam) is the binary version of a SAM file. A CRAM file (.cram) is a more compact, compressed alternative to a BAM file. FASTQ files may be provided for RNA transcriptome data.
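    To make the "tab-delimited text" description of SAM concrete, the sketch below parses a single SAM alignment line using only the Python standard library. It is a minimal illustration, not part of any Kids First pipeline: the function name `parse_sam_line` and the example read are made up for this FAQ, and real BAM or CRAM files are binary and are normally read with dedicated tools rather than by hand.

```python
from typing import Dict, NamedTuple


class SamRecord(NamedTuple):
    """The 11 mandatory SAM alignment fields, plus optional tags."""
    qname: str   # read name
    flag: int    # bitwise flag (0x4 set means the read is unmapped)
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment description, e.g. "4M"
    rnext: str   # reference name of the mate/next read
    pnext: int   # position of the mate/next read
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # base qualities (ASCII-encoded)
    tags: Dict[str, str]


def parse_sam_line(line: str) -> SamRecord:
    """Parse one alignment line; header lines (starting with '@') are rejected."""
    if line.startswith("@"):
        raise ValueError("header line, not an alignment record")
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has at least 11 fields")
    tags = {}
    for opt in fields[11:]:
        tag, _type, value = opt.split(":", 2)  # optional fields are TAG:TYPE:VALUE
        tags[tag] = value
    return SamRecord(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]),
        fields[9], fields[10], tags,
    )


# A made-up example read, for illustration only.
example_line = "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0"
record = parse_sam_line(example_line)
print(record.rname, record.pos, record.cigar)
print("unmapped" if record.flag & 0x4 else "mapped")
```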

    9. What is the role of the Kids First Data Resource Center (DRC)?

    The goal of the Kids First Data Resource is to accelerate discovery of genetic etiology and shared biologic pathways by building a collection of curated genomic and phenotypic data from Kids First X01 projects and providing a central portal where these data and analysis tools will be readily accessible to the research community. The Kids First DRC is charged with re-processing and “harmonizing” sequence data generated by the sequencing centers  and clinical and phenotypic data provided by X01 investigators to facilitate analyses across all Kids First datasets. Sequence data will be re-aligned and re-called with every iteration of the Human Genome Reference. X01 investigators will be able to access the data files generated by the sequencing center, as well as the harmonized version of the data generated by the Data Resource Center.

    DRC activities and implementations will form an integral part of the emerging landscape of the NIH’s data environments and will support the establishment of cross-searchable pediatric data with shared common standards.

    Additionally, the DRC roadmap includes building out a collaborative platform for integrating, distributing, and collaborating over higher level analyses. X01 investigators are encouraged to utilize the DRC’s resources and work collaboratively with the Kids First Data Resource Center to develop additional research projects and/or pursue specific analyses. 

    X01 applicants can contact the Kids First Data Resource Center to inquire about current data processing procedures, tools, pipelines, and workspaces available through the data resource, and about potential analysis collaborations:
    - visit, or  
    - email

    10.  It seems that no funds will be awarded to investigators but a detailed analytic plan is requested. Given that is the case, are investigators expected to obtain funds to support analysis separately?

    There are no direct funds available under PAR-19-390 to support analysis of sequence data or other activities. The request for applicants to provide an analysis plan is intended to increase the likelihood that the samples to be sequenced are of high quality, that the number of specimens is appropriate for the stated aims, and that those submitting X01 applications will be prepared to do the analyses. Those investigators providing the samples are likely to have a significant advantage in conducting analyses because they are familiar with the cohort, because they will be interacting directly with NIH, sequencing centers, and the Kids First Data Resource Center throughout the process, and because each X01 investigator team has six months of proprietary access to the sequence data before it is released to the public for controlled access via dbGaP.

    11.  Are there other opportunities for obtaining analysis funding?

    A funding opportunity, “Small Research Grants for Analyses of Data for the Gabriella Miller Kids First Pediatric Research Data Resource (R03 Clinical Trial Not Allowed) (PAR-19-375)”, supported by five (5) NIH Institutes (see below), is soliciting applications intended to promote meritorious small research projects focused on analyses of childhood cancer and/or structural birth defects genomic datasets generated by the Kids First program and/or associated phenotypic datasets. Development of approaches, tools, or algorithms appropriate for analyzing genomic, phenotypic, and/or clinical data relevant to Kids First may also be proposed.

    NIH Institutes and Centers participating in the R03 funding announcement:

    Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD); National Cancer Institute (NCI); National Institute on Alcohol Abuse and Alcoholism (NIAAA); National Institute of Dental and Craniofacial Research (NIDCR); National Heart, Lung, and Blood Institute (NHLBI)

    In addition, X01 recipients may apply for funds to support analyses using the R01 mechanism.

    12.  Is it possible to submit an application with multiple PIs from different Institutions in order to build an adequate sample size or create a larger, more compelling cohort? Alternatively, is it possible to reach an adequate sample size by adding trios or families with a different childhood cancer or structural birth defect?

    Efforts to increase sample number by collaborations across institutions are acceptable and encouraged. Strong justification for the proposed sample size is expected in each application. Increasing sample numbers by aggregating across related conditions is acceptable. However, applicants doing this should be prepared to provide a coherent description of the analyses that will be performed across the aggregated cohort, and it may be easier to do this for sets of samples with related phenotypes or suspected underlying pathways. In addition, investigators should state how aggregating samples won’t slow the process of sending samples to the Kids First Sequencing Center. 
    Applicants are also encouraged to partner with current X01 recipients to extend existing cohorts. For a full list of current X01 projects, visit:

    13.  Is there also a maximum that will be considered? Our combined cohorts for example have nearly 5000 trios.

    We encourage the submission of a large number of trios, but ask that the samples be organized into tranches that make analytic/scientific sense to provide flexibility in the review process. The available budget for sequencing services associated with this FOA allows for roughly 84,000 genomes total. Depending on the quality and number of applications received, the Kids First program management will determine how many total samples each X01 recipient will have approved for sequencing, while taking study design and sample size into consideration.

    Additionally, applicants who propose sequencing large numbers of samples should describe their capacity and plan to prepare such a large number of samples for sequencing within the year timeframe.
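    As a rough worked example of the arithmetic behind this answer, the sketch below converts trio counts into genome counts and compares them with the approximate sequencing budget quoted above. It assumes three genomes per trio (the proband plus two parents); the tranche names and sizes are hypothetical placeholders, and real tranches should be defined by what makes analytic or scientific sense for the cohort.

```python
GENOMES_PER_TRIO = 3          # proband plus two parents
TOTAL_GENOME_BUDGET = 84_000  # approximate figure quoted in this answer


def summarize_tranches(tranches):
    """Return genomes and share of the total budget for each tranche of trios."""
    summary = {}
    for name, n_trios in tranches.items():
        genomes = n_trios * GENOMES_PER_TRIO
        summary[name] = {
            "trios": n_trios,
            "genomes": genomes,
            "budget_share": genomes / TOTAL_GENOME_BUDGET,
        }
    return summary


# Hypothetical split of a cohort of 5,000 trios into three tranches.
example_tranches = {"tranche_A": 2000, "tranche_B": 1800, "tranche_C": 1200}
summary = summarize_tranches(example_tranches)
total_genomes = sum(row["genomes"] for row in summary.values())

for name, row in summary.items():
    print(f"{name}: {row['trios']} trios, {row['genomes']} genomes, "
          f"{row['budget_share']:.1%} of budget")
print(f"total: {total_genomes} genomes, "
      f"{total_genomes / TOTAL_GENOME_BUDGET:.1%} of budget")
```

    Under these assumptions, a cohort of 5,000 trios corresponds to 15,000 genomes, a bit under one fifth of the total capacity, which is one reason organizing samples into tranches gives reviewers flexibility.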

    14.  Should we propose quality metrics for the genome sequencing?

    No, this is not necessary. You should note the quality of the samples being proposed for submission. 

    15.  Will agreeing to share additional genomic data through the Kids First Data Resource be looked upon favorably?

    Investigators with existing childhood cancer and/or structural birth defect genomic data are encouraged to submit these data to be part of the Kids First Data Resource, if the data are not currently accessible to the research community through other data repositories and are consented for broad sharing and use by the research community. Willingness to share additional genomic data through the Kids First Data Resource will likely be looked upon favorably, and can be indicated in the analysis plan. This is most relevant when the additional genomic data add power to the analyses planned for the cohort that the PI is proposing. For instance, if the plan is to submit samples that have already undergone exome sequencing, then sharing the exome data might provide additional value to the cohort. Aggregating larger amounts of data through Kids First will strengthen our goal of facilitating comprehensive and cross-cutting research. 

    16.  Do applicants need to describe the capacity to store BAM files?

    Applicants are encouraged to make use of the cloud-based workspace that will be provided by the Kids First Data Resource Center. Therefore, local download and processing of data may not be necessary for interacting with Kids First datasets. If your group plans to download data to a local server as part of the data management plan, it is important to make clear that your team has the capacity (including equipment, security infrastructure, and physical resources) at your institution to securely accept and store large data files. If your group plans to make use of cloud-based workspaces, please describe a plan for analyzing data in such spaces. For information about the DRC cloud-based workspaces, visit

    Data may be stored/hosted on local or cloud-based platforms. For more information see “NIH Security Best Practices for Controlled-Access Data Subject to the NIH Genomic Data Sharing (GDS) Policy” 

    17.  Although the maximum project period is 1 year, could one propose to sequence 70 trios now and then add 50 trios next year after additional collections?

    All samples must be extracted, properly consented, and ready to send off to the sequencing center shortly after the review date. Please refer to the FOA for a more detailed timeframe.

    18.  Who is responsible for data deposition?

    The sequencing center is responsible for deposition of the sequence data into an NIH-approved data repository (e.g., dbGaP or the Kids First Data Resource). The study Principal Investigator is responsible for submitting the clinical/phenotypic data directly to the Kids First Data Resource Center.

    19. For tumor specimens, is there an opportunity for applying whole genome sequencing (WGS) to DNA extracted from formalin-fixed paraffin-embedded (FFPE) tissue?

    Fresh frozen samples for tumors are preferred. However, proposals that include FFPE samples will be accepted. If such a proposal is successful in review, there may be technical issues to resolve before good results can be obtained.

    20.  What amount and concentration of DNA will be required and what will be the coverage?

    Whole genome sequencing (WGS) of germline DNA will be done at 30X mean coverage using paired end sequencing. Depending on the sequencing center’s protocol, tumors may be sequenced at 60X or 30X mean coverage using paired end sequencing combined with whole exome sequencing (WES) and RNA sequencing both at 100X also using paired end sequencing. The sequencing center staff will work with each project to determine the best coverage and approach for sequencing and analysis of tumors and/or affected tissue.

    Amount of DNA/RNA and coverage
    Assay | Amount of DNA or RNA required/recommended | Concentration | Coverage | Additional info
    WGS | ~2 ug DNA | 20-50 ng/ul preferred | 30X | paired end reads
    WES | 275 ng DNA (minimum); 1 ug recommended | 20 ng/ul (minimum) | 100X, greater than 80% coding exons covered at 20X | paired end reads
    RNA-Seq | 750 ng total RNA (minimum); 1 ug recommended | 20 ng/ul (minimum) | 100X, greater than 40% coding exons covered at 20X | paired end reads
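
    The minimums in the table can be checked against a sample inventory before shipping. The following is a minimal sketch, assuming the amount and concentration of each sample are already recorded in ng and ng/ul. The names Minimum, MINIMUMS, and meets_minimum are hypothetical. WGS is left out because the table gives only approximate and preferred values for it, and the sequencing center remains the authority on whether a sample is acceptable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Minimum:
    amount_ng: float
    concentration_ng_per_ul: float


# Minimum values copied from the WES and RNA-Seq rows of the table above.
# WGS is omitted: the table lists only "~2 ug" and a preferred concentration range.
MINIMUMS = {
    "WES": Minimum(amount_ng=275, concentration_ng_per_ul=20),
    "RNA-Seq": Minimum(amount_ng=750, concentration_ng_per_ul=20),
}


def meets_minimum(assay: str, amount_ng: float, concentration_ng_per_ul: float) -> bool:
    """Return True if a sample meets both tabulated minimums for the given assay."""
    minimum = MINIMUMS[assay]
    return (
        amount_ng >= minimum.amount_ng
        and concentration_ng_per_ul >= minimum.concentration_ng_per_ul
    )


# Hypothetical samples: (assay, amount in ng, concentration in ng/ul).
samples = [("WES", 300, 25), ("RNA-Seq", 700, 25), ("WES", 1000, 15)]
for assay, amount, concentration in samples:
    print(assay, amount, concentration, meets_minimum(assay, amount, concentration))
```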

    21.  Can I propose alternative or complementary sequencing approaches, such as long-read whole genome sequencing?

    Yes. Kids First sequencing centers are equipped to provide advanced sequencing technology; however, some approaches may require higher quality DNA or be more expensive than the typical Kids First pipeline (currently 30X WGS on the NovaSeq platform). With justification, applicants may propose alternative or complementary sequencing for their cohort, or a subset of their cohort, to further inform the value of these technologies for structural birth defects and childhood cancer research. However, this may reduce the number of total samples that can be sequenced for your project within the program’s limited funds. Project design will be finalized in discussions among the X01 investigators, the sequencing centers, and NIH program staff. 

    22.  Are applicants expected to describe how results will be returned to study participants or how incidental findings will be reported?

    Decisions about returning individual results and incidental findings to study participants lie with the institution and its IRB and are outlined in the consent form agreed to by participants. NIH does not require that Kids First X01 applicants describe a plan for return of results. Investigators and participants should keep in mind that the technology used to generate sequence data in this program is designed for research purposes, not for identifying clinical results. Communicating clinically meaningful results to participants requires sequencing and analysis by a CLIA-approved laboratory. Since the Kids First program is focused on research and discovery, CLIA sequencing is not provided.

    23. Who should I contact for additional questions? 

    You can email Valerie Cotton with additional questions. Please use the subject line: “X01 inquiry.” 
    Potential applicants may also contact any Program Officer listed in the FOA.

    FAQs for X01 Cohorts Selected for Sequencing

     1. What information should be included on the shipping manifest? 

    Please include the following information on the shipping manifest (column headers): 

    Participant ID    
    Sample ID    
    Aliquot ID
    Tissue Type
    Anatomical Site    
    Age at Collection
    Tumor Descriptor
    Analyte Type

    You may add any additional fields relevant to the biospecimens and/or the operational needs of shipping (e.g. well/box location). 

    You may download this spreadsheet as a starting point. Please contact the DRC for recommendations or questions.
    Please include support@kidsfirstdrc.org when emailing shipping manifests to the sequencing center. 
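
    Before sending a manifest, it can help to confirm that all of the column headers listed above are present. The sketch below is one way to do that, assuming the manifest has been exported as CSV text with the headers in the first row. The function name missing_columns and the extra "Box Location" column are hypothetical, and the DRC or sequencing center may expect a different file format.

```python
import csv
import io

# Column headers listed in the answer above.
REQUIRED_COLUMNS = [
    "Participant ID",
    "Sample ID",
    "Aliquot ID",
    "Tissue Type",
    "Anatomical Site",
    "Age at Collection",
    "Tumor Descriptor",
    "Analyte Type",
]


def missing_columns(manifest_text: str) -> list[str]:
    """Return the required headers that are absent from the first row of a CSV manifest."""
    reader = csv.reader(io.StringIO(manifest_text))
    header = next(reader, [])
    present = {name.strip() for name in header}
    return [column for column in REQUIRED_COLUMNS if column not in present]


# Hypothetical manifest header with one additional operational column.
example = ",".join(REQUIRED_COLUMNS + ["Box Location"]) + "\n"
print(missing_columns(example))
print(missing_columns("Participant ID,Sample ID\n"))
```

    Additional columns are ignored by the check, so fields such as well or box location can be added freely.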

    2. What clinical and phenotypic information do X01 investigators need to submit to the DRC in order to be approved for access to the genomic dataset?

    It is expected that each X01 group will provide the clinical and phenotypic data described in the original X01 proposal to the DRC for sharing with the broader research community upon release of the dataset. Kids First strongly encourages the submission of detailed/deep clinical and phenotypic data, including longitudinal data and family histories. Please provide the information described in the "Clinical Phenotype Data Element" spreadsheet for the DRC and program staff to review: 
    The DRC will accept this information in another format, such as the REDCap dbGaP submission files, as long as all the necessary information is provided. 

    Please contact the DRC to discuss the best format for submitting further information.

    Upon receipt of the required information, Kids First NIH program staff will work with the DRC and/or sequencing centers to enable the X01 team to have access to the associated sequence data. Once the investigator team has access to the sequence data, they have six months of proprietary access before it is released to the public. 

    The DRC is working closely with investigators who have expertise in specific areas to address how to best capture clinical and phenotypic data moving forward. If you have an interest in engaging in this process or providing suggestions, please contact or visit

    3. How will the DRC harmonize phenotypes across Kids First projects?  Which data ontologies will be used?

    The DRC is leveraging existing community standards to harmonize clinical and phenotypic data, which facilitates searching, analysis, and interoperability with other data efforts. If you are currently collecting phenotypic data or working to map such data to existing standards, we suggest you use one of the following ontologies, since these are what the DRC plans to use for phenotype harmonization: 

    Also recommended: 

    Uberon ( for tissue/anatomy, including but not limited to tumors. 
    Monarch Disease Ontology (MONDO,
    ICD-10 (

    Other helpful resources: 

    Ontology Lookup Service:

    The DRC is working closely with investigators who have expertise in specific areas to address how to best capture clinical and phenotypic data moving forward. If you have an interest in engaging in this process or providing suggestions, please contact or visit

    4. What should X01 investigators include in their acknowledgement statement when publishing research findings from Kids First generated data?

    In addition to listing the PHS Accession Number(s) of the datasets used for a particular analysis and the databases from which they are accessible to the research community, X01 investigator teams (i.e. “Contributing Investigator(s)”) are asked to describe support for the project, including NIH grant numbers. 
    A sample statement for the acknowledgment of Kids First dataset(s) follows:

    The results analyzed and <published or shown> here are based in whole or in part upon data generated by the Gabriella Miller Kids First Pediatric Research Program (Kids First) projects <insert phs accession number(s)>, and are accessible through the Kids First Data Resource Portal ( and/or dbGaP (  Kids First was supported by the Common Fund of the Office of the Director of the National Institutes of Health ( The <insert Kids First Sequencing Center> was awarded a U24 (<enter grant number>) to sequence [childhood cancer and/or structural birth defect cohort samples] submitted by investigators through the Kids First program (<enter X01 grant number>). Additional funds from <enter relevant NIH institute grant number(s)> supported the assembling of the cohorts, and the collection of the phenotypic data and samples, and/or data analysis. Contributing investigators include: <enter names>*.
    *If there are many collaborators/consortium members, you can use a ‘corporate authorship’ with a link to a website that lists everyone.

    Kids First Sequencing Center Grants
    Sequencing Center | Grant Number
    BROAD INSTITUTE | U24 HD090743-01

    5. What should secondary users (a.k.a. “end-users” or approved data requestors) include in acknowledgement statements when publishing research findings from Kids First generated data?

    Secondary users, or “end users”, must acknowledge all datasets used in a publication or analysis by listing all relevant dbGaP PHS Accession Numbers, as well as the URLs of the databases where the datasets were accessed. The Data Use Certification (DUC) agreed to by secondary users outlines how to use and acknowledge each approved dataset.  

    6. Are there opportunities for collaborating with other efforts for functional validation of variants? 

    •  KOMP2 is receptive to considering genes of interest identified through X01 analyses as candidates for targeting by KOMP2 Centers. This includes reviewing the literature for existing models, prioritizing specific genes for generating new knockout mice, and mapping resulting phenotypes to animal model ontologies. All KOMP data are publicly available at If you are interested in collaborating with KOMP, please
    •  Projects that fall within the categorical interests of two or more NIH institutes/centers may also consider applying for ORIP’s R21 program announcement for Development of Animal Models and Related Biological Materials for Research (PA-16-141). Investigators considering applying to PA-16-141 are strongly encouraged to consult with ORIP program staff (see Scientific/Research Contacts in Section VII. Agency Contacts) to be advised whether their research plans are appropriate for this FOA.
    • Researchers interested in exploring the gene by environment interactions of conditions such as craniofacial diseases may be interested in these funding opportunity announcements:
      •       Mechanistic Studies of Gene-Environment Interplay in Dental, Oral, Craniofacial, and Other Diseases and Conditions (R01) (PAR-19-292)
      •       Development of Novel and Robust Systems for Mechanistic Studies of Gene-Environment Interplay in Dental, Oral, Craniofacial, and Other Diseases and Conditions (R21) (PAR-19-293)

    FAQs for Small Research Grants for Analyses of Gabriella Miller Kids First Pediatric Research Data (R03) (PAR-19-375) 

    1.  When is the next receipt date for this opportunity?  

    “Small Research Grants for Analyses of Gabriella Miller Kids First Pediatric Research Data (R03 Clinical Trial Not Allowed)” has been reissued and follows the standard receipt cycle, beginning with the first standard receipt date after the “Open Date.” Please note that reissuing an FOA may occasionally interfere with a standard receipt date; however, we try our best to avoid this.

    2. Do I need to suggest a scientific review group or study section in my cover letter?

    No. The study section will be assigned by CSR, and program staff will request that “Kids First” R03 applications be reviewed in the same study section. 

    3. What are the data sharing expectations for this opportunity?  

    It is expected that data (including resultant raw, derived, aggregated, and summary data), tools, workflows, and/or pipelines created or used with support from this FOA will be provided to the Kids First Data Resource Center to be shared with the wider scientific community, if not already part of the Data Resource, in a timely manner that would enable other researchers to replicate and build on the analyses for future research efforts. 

    Applicants may contact at the Kids First Data Resource Center (DRC) to learn more about how secondary data and analytical pipelines can be submitted:

    4. What should be included in a data sharing plan?
    In the Data and Resource Sharing Plan, applicants should describe the anticipated timeline, formats, and methods of providing the data and other products used or created under this FOA to the Data Resource Center. Examples of resultant data types include variant call files from multi-sample comparisons, plots or graphs of variant associations, lists or tables of gene summaries, network/pathway analysis results, and other summary statistics. Where applicable, applicants should describe how they plan to share any analytical tools, pipelines, or workflows used or created through open access channels (e.g. public GitHub links). 

    Here are two examples of acceptable data sharing plans for Kids First R03s: 

            Kids First Example A (external researcher analyzing Kids First X01 genomic datasets)
            The proposed research will compare genomic data from two Gabriella Miller Kids First X01 datasets (phs00XXX.v1.p1 and phs00XXX.v2.p2) which will be accessed via the Kids First Data Resource Portal and analyzed in associated cloud-based workspaces after dbGaP approval. Data and documentation related to this analysis will be provided to the Kids First Data Resource Center (DRC) upon acceptance by a journal for publication or sooner, via secure cloud-based transfer and/or other sharing methods in consultation with the DRC. Submitted data types will include resultant multi-sample VCFs as well as spreadsheets of raw and derived input data which will be processed to yield resulting summary tables and graphs for publication. We will also provide documentation to explain the analytical approach and will use pipelines that have been previously published. Code and other information related to these pipelines are currently available at the following open access links: [insert public GitHub links]. We recognize that the DRC may then make these data and documentation available to the research community in line with the appropriate parameters and/or policies, such as data use limitations of the original datasets described in dbGaP. We will cite the dbGaP accession numbers of both genomic datasets used in this analysis in any associated presentations or publication.

               Kids First Example B (Kids First X01 investigator analyzing deep phenotypic datasets)
               The proposed research aims to analyze phenotypic data from a variety of cohorts that overlap with phenotypes that will be extracted from our Kids First X01 dataset, phs00XXX.v1.p1, but are currently not shared through the Kids First Data Resource. We will mine the Kids First Data Resource Portal for deep phenotypic data and compare those with deep phenotypic data elements collected from patients represented in our X01 dataset to guide priorities for extracting, organizing, curating, and harmonizing data elements that would enable phenotype comparisons across cohorts. The newly extracted phenotypic data will be mapped to Human Phenotype Ontology (, where applicable, and all harmonized data will be provided immediately thereafter (estimate: 1 year after award start date) to the Data Resource Center to add to and strengthen the existing data already available in the Kids First Data Resource Portal. All data provided will be de-identified. We will develop tools to facilitate phenotype data mining, extraction, and harmonization procedures, and these algorithms and associated metadata and data dictionaries will be provided to the Data Resource Center for sharing with the wider research community. We will mine all datasets available through the portal whose data use limitations allow comparing and combining with datasets of the same phenotype as our project, to include the following: 

    • phs00XXX.v1.p1
    • phs00YYY.v2.p2
    • phs00ZZZ.v3.p3
    • [etc…]

                All dbGaP accession numbers and/or other study identifiers will be cited in any associated publication.  

    5. Since the DRC is charged with processing Kids First data, can we work with them to develop an analysis plan?  
       Applicants are encouraged to communicate with the Kids First Data Resource Center to avoid redundant efforts and to collaborate on analytical approaches, tool development, or data sharing. To contact them:

    6. If I am proposing to use the R03 to analyze genomic data obtained outside of Kids First, what are the data sharing requirements? 
         For proposals that aim to co-analyze Kids First data with non-Kids First genomic datasets that are currently accessible through an NIH-approved repository (e.g., dbGaP) or some other public controlled access database (e.g., European Genome-phenome  Archive), applicants must describe the database through which the proposed data are accessible to the research community and the details of the dataset including any data use limitations based on the associated consent form.

        For proposals that aim to co-analyze Kids First data with non-Kids First genomic datasets that are not currently accessible through an NIH-approved repository (e.g., dbGaP) or some other public controlled access database (e.g., European Genome-phenome Archive), applicants must describe their ability and willingness to submit the individual-level sequence data to an NIH-approved repository (e.g., dbGaP) and provide an associated Institutional Certification using the current NIH template ( If the Institutional Certification is not available, provide a Provisional Certification and describe the anticipated data use limitations and associated modifiers separately. If submitting a Provisional Certification with the application, please note that a completed Institutional Certification may be required prior to award. Note that an NIH Institute may consider whether and/or how the external genomic dataset can be shared with the broader research community in line with the goals of Kids First and the mission of the corresponding institute, before making a decision about funding the proposal. 

    7. Who can I contact for additional information?  

    Researchers may contact the Program Officer listed on the Notice of Award for the R03 grant, which is available through your NIH eRA user account, or the Scientific/Research Contacts listed in the FOA.  


    This page last reviewed on October 22, 2019