Click a link below to jump to the corresponding FAQs
- Kids First Data Sharing
- Accessing Kids First data
- Discovery of the Genetic Basis of Childhood Cancers and of Structural Birth Defects: Gabriella Miller Kids First Pediatric Research Program (X01) (PAR-17-063)
- Small Research Grants for Analyses of Data for the Gabriella Miller Kids First Data Resource (R03) (PAR-16-348)
Why is data sharing important for the Kids First program?
This program is Congressionally mandated to provide resources that will drive discovery in pediatric research (see Gabriella Miller Kids First Research Act). Datasets and resources generated by this program must be made as broadly shareable and accessible as possible while abiding by informed consent language and protecting participants.
To view a recent presentation about Data Sharing in the Kids First program, click here.
What are the general benefits of data sharing?
- Enables data generated for a given study(s) to be used to explore a wide range of additional research questions
- Increases statistical power by combining separate datasets and increasing sample size
- Allows validation of research results
- Promotes innovation of methods and tools for research
- Facilitates development of improved therapeutic and diagnostic strategies for patients
What is the National Institutes of Health (NIH) Genomic Data Sharing (GDS) policy?
Effective January 25, 2015, the NIH Genomic Data Sharing Policy (NOT-OD-14-124) replaces the NIH GWAS Data Sharing Policy (NOT-OD-07-088). Under terms and conditions consistent with the informed consent provided by individual participants, the GDS policy seeks to make genomic data broadly available to the research community in a timely manner. Information on the NIH Genomic Data Sharing Policy can be found on:
- NIH GDS homepage: https://osp.od.nih.gov/scientific-sharing/genomic-data-sharing/
- NCI GDS Policy Page: https://www.cancer.gov/grants-training/grants-management/nci-policies/genomic-data
- GDS FAQs: https://osp.od.nih.gov/scientific-sharing/genomic-data-sharing-faqs/
Who can I go to for data sharing questions?
Each NIH Institute and Center has a Genomic Program Administrator (GPA) who serves as a point of contact for GDS Policy implementation within the IC. GPAs involved with the Kids First program are indicated on our Working Group members page: https://commonfund.nih.gov/kidsfirst/members
You can find a full list of GPAs here: https://osp.od.nih.gov/wp-content/uploads/IC_GPAs.pdf
What is an Institutional Certification and what role does it play in data sharing?
The language of the individual consent forms signed by study participants provides the legal foundation for how controlled-access tier data from enrolled participants can be shared. The Institutional Certification assures that: the data submission is consistent with all applicable national, tribal, and state laws and regulations as well as relevant institutional policies; and that an IRB (or equivalent) has reviewed the investigator’s proposal for data submission and ensures that data submission and subsequent data sharing for research purposes are consistent with the informed consent of study participants from whom the data were obtained, including any data use limitations (DULs) or data use limitation (DUL) modifiers.
The Institutional Certification is submitted to a Genomic Program Administrator (GPA), who uses it to register the study in dbGaP and generate a Data Use Certification (DUC). DULs and DUL modifiers are to be selected only if they reflect the language of the consent form, not PI or even IRB preference. “Preferences” for data use that are not expressly stated in patient consents can be written into other sections of the DUC. Secondary users and their supporting Institution must agree to the conditions of the DUC when applying to access data (see “FAQs for accessing Kids First data” below).
Please refer to the following resources for more information about consent language, Institutional Certifications, and the dbGaP registration process:
- NIH GDS Policy pdf, 4. Informed Consent and 5. Institutional Certification:
- National Institutes of Health Points to Consider in Drafting Effective Data Use Limitation Statements (in Consent Forms): https://gds.nih.gov/pdf/nih_ptc_in_drafting_dul_statements.pdf
- Institutional Certification Template (note that Data Use Limitations (DULs) and modifiers must only be selected according to the language of the participant consent forms): https://osp.od.nih.gov/scientific-sharing/institutional-certifications/
- dbGaP Registration Flow Chart: https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/GetPdf.cgi?document_name=HowToSubmit.pdf
What are the data sharing expectations for Kids First projects?
In line with GDS policy, the consent forms should contain language that reflects broad sharing of genomic data. For guidance on developing consent language, visit https://osp.od.nih.gov/wp-content/uploads/NIH_Guidance_on_Elements_of_Consent_under_the_GDS_Policy_07-13-2015.pdf. Projects that allow for the broadest level of sharing (i.e. “General Research Use” with no additional restrictions) will be prioritized for sequencing and incorporation into the Kids First Data Resource.
Where are Kids First data being stored?
Individual level sequence data (BAM/FASTQ files) and associated clinical/phenotype data and metadata generated for Kids First cohorts are stored in NIH–approved repositories. Data and metadata from structural birth defect projects are currently stored in the National Center for Biotechnology Information’s (NCBI) Sequence Read Archive (SRA) and/or the database of Genotypes and Phenotypes (dbGaP). Equivalent data for all childhood cancer projects will be stored in the National Cancer Institute’s Genomic Data Commons (GDC).
When will Kids First data be publicly available?
Visit our X01 projects page to see projects that have been released and estimated release dates for pending projects: https://commonfund.nih.gov/kidsfirst/x01projects
How do I access Kids First data?
The first step is to find the Kids First data. In addition to our X01 projects page, you can find a list of all released Kids First projects on our Umbrella BioProject Page: https://www.ncbi.nlm.nih.gov/bioproject/338775. To see the dbGaP pages, go to the “Project Data” section and select the link to the right of “Genotype and Phenotype (dbGaP)” under “Resource Name”.
The next step is to submit a Data Access Request (DAR) through dbGaP for each project: https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?page=login.
Secondary users and their supporting Institution’s Signing Official and IT Director must agree to the conditions of the Data Use Certification (sample agreement: https://osp.od.nih.gov/wp-content/uploads/Model_DUC.pdf), including any DULs or DUL modifiers pertinent to the requested dataset and the Genomic User Code of Conduct (https://dbgap.ncbi.nlm.nih.gov/aa/Code_of_Conduct.html). All internal and external collaborators must be listed on the application, with the exception of technicians, graduate students, and postdoctoral fellows who are under the requestor’s direct supervision. External collaborators are required to submit separate DAR(s) for approved access to the same dataset(s). The DAR(s) will be reviewed by an NIH IC’s Data Access Committee (DAC).
To learn more about the dbGaP data access procedure: https://dbgap.ncbi.nlm.nih.gov/aa/dbgap_request_process.pdf
FAQs for the Discovery of the Genetic Basis of Childhood Cancers and of Structural Birth Defects: Gabriella Miller Kids First Pediatric Research Program (X01) (PAR-17-063) FOA
The Kids First program staff hosted a pre-application webinar for PAR-17-063 on January 23, 2017. The webinar slides are available, and the video recording of the webinar can be viewed by clicking the image below.
What are some major features of PAR-17-063?
- Supports whole genome sequencing (WGS) of existing cohorts to elucidate the genetic contribution to childhood cancers and the genetic etiology of structural birth defects.
- Cancer cohorts will receive genome (WGS), exome (WES), and transcriptome (RNAseq) sequencing of tumor tissue, when it is available, to more accurately elucidate the oncogenic role of variants.
- Investigators with small cohort sizes are encouraged to collaborate with other investigators and pool samples together to increase statistical power.
- Investigators who have probands that have previously undergone WGS and who have unsequenced nucleic acids from their parents/siblings/tumor are encouraged to apply to have those samples sequenced.
- Cohort participants must have given consent to allow sharing of individual-level sequence and relevant phenotype data through an NIH-approved repository (see question 3 below).
- Kids First is requesting that sample information and phenotype data for the proposed cohorts be provided as "Other Attachments". See question 2 below for a downloadable form.
- Kids First highly encourages all investigators with a cohort that meets the minimum requirements to consider applying.
- This list is not exhaustive. Applicants are strongly encouraged to read the funding announcement closely and to contact program staff in case of any questions.
What information is required as "Other Attachments"?
Kids First is asking for information to be summarized and included as attachments. This is described in detail under Section IV. Application and Submission Information under the subheading SF424(R&R) Other Project Information. Applicants must include:
- Institutional Certification (a provisional Institutional Certification could be used in a situation where the IRB has not completed its review of the protocol and therefore the institution cannot attest to all of the elements of the formal Institutional Certification). For institutional and/or provisional certifications, please use the current template: https://osp.od.nih.gov/scientific-sharing/institutional-certifications/
- Sample Information, including type (e.g., DNA, RNA) and tissue source and other details
- Description of Phenotype Data that is available
- Optional – Family Structure
Kids First has developed a downloadable form that applicants can use to describe their samples and phenotype data. While applicants are required to provide this information, the use of this form is optional. Applicants may submit the required information in whatever format meets their individual purposes as long as it provides, at a minimum, the information requested in the FOA.
Do the cohorts have to be properly consented before applying for the X01?
Participants in cohorts selected under this FOA must have given consent to allow sharing of individual-level genome sequence and relevant phenotype data through dbGaP or other NIH-approved repositories. Applicants must provide documentation of this by submitting an Institutional Certification (or Provisional Certification) that covers all sites contributing samples, as an attachment (see question 2 above).
Cohort samples that have consents allowing for broad data sharing (e.g. for general research purposes with no data use limitation modifiers) will be considered highest priority. No funds will be provided for obtaining new consent for existing samples. Consent to re-contact participants for additional phenotyping or collection of additional samples is strongly encouraged. Applicants are required to describe any data use limitations.
For research teams planning to start recruiting patients and/or collecting samples for a future application to the X01 program, please see FAQs for Kids First Data Sharing for more information.
If investigators have already registered a project in dbGaP, and are seeking WGS through Kids First for samples from the same cohort, is a new Institutional Certification required?
As long as the institutional certification for the registered project complies with NIH Genomic Data Sharing policy and covers all of the samples that will receive WGS through Kids First, a new certification is not required. We strongly advise working with the Genomic Program Administrator from the relevant institute (https://gds.nih.gov/pdf/ic_gpas.pdf) to ensure that the appropriate samples are covered by the existing registration.
Is it important to know the source of the DNA for samples being submitted for WGS through Kids First?
It is important to know the source of the DNA for samples provided to Kids First sequencing centers. We ask that applicants provide a description of the samples, such as collection site; number of samples included in the study; a detailed inventory of the sources of the DNA (e.g., number of samples from blood, number of samples from saliva); and previous genotyping or sequencing. DNA from fresh/frozen blood or tissue is ideal for sequencing, as DNA from saliva can be contaminated with microbial DNA, which may result in higher costs (and therefore reduce the number of total samples that can be sequenced). Cell lines are also usually a poor choice because they often have significant genomic differences compared to the original germline, which could complicate analysis. There are circumstances where studies might include induced pluripotent stem cells (iPSCs), but even then, a normal sample for comparison may be desirable.
It seems that no funds will be awarded to investigators but a detailed analytic plan is requested. Given that is the case, are investigators expected to obtain funds to support analysis separately?
There are no direct funds available under PAR-17-063 to support analysis of sequencing data; however, in collaboration with X01 recipients, up to 2% of the sequencing center’s budget can be used to perform custom analyses and validation of a selected set of variants after the generation of Variant Call Format (VCF) files.
The request for applicants to provide an analysis plan is intended to increase the likelihood that the samples to be sequenced are of high quality, that the number of specimens is appropriate for the stated aims, and that those submitting X01 applications will be prepared to do the analyses. Note that the sequencing data will be released to an NIH data repository (e.g., the NCBI Sequence Read Archive or the NCI Genomic Data Commons) according to the NIH Genomic Data Sharing Policy (http://gds.nih.gov/03policy2.html). Those investigators providing the samples are likely to have a significant advantage in conducting analyses, both because they are familiar with the samples and because they will be interacting directly with NIH and the sequencing center throughout the process.
Are there other opportunities for obtaining analysis funding?
A funding opportunity, “Small Research Grants for Analyses of Data for the Gabriella Miller Kids First Data Resource (R03)”, supported by six (6) NIH Institutes (see below), is soliciting applications intended to promote meritorious research projects focused on analyses of childhood cancer and/or structural birth defects datasets that are or could be included in the Kids First Data Resource. Development of statistical methodology appropriate for analyzing genome-wide data relevant to childhood cancer and/or structural birth defects may also be proposed.
Participating Institutes in the funding announcement:
Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD); National Cancer Institute (NCI); National Institute on Alcohol Abuse and Alcoholism (NIAAA); National Institute of Dental and Craniofacial Research (NIDCR); National Institute of Neurological Disorders and Stroke (NINDS); National Heart, Lung, and Blood Institute (NHLBI)
Is it possible to submit an application with multiple PIs from different Institutions in order to meet the requirement for at least 100 structural birth defect trios, or in order to create a larger, more compelling cohort? Alternatively, is it possible to reach the minimal sample number by adding trios with a different structural birth defect?
Efforts to increase sample number by collaborations across institutions are acceptable and encouraged. Structural birth defect cohorts, unless scientifically justified, are expected to have a minimum sample set of 100 trios. Increasing sample numbers by aggregation across related conditions is acceptable. However, applicants doing this should be prepared to provide a coherent description of the analysis that will be performed across the aggregated cohort, and it may be easier to do this for sets of samples with related phenotypes or suspected underlying pathways. In addition, investigators should state how aggregating samples won’t slow the process of sending samples to the sequencing center. Exceptional justification will be required for cohorts for which fewer than 100 trios are proposed.
What is the minimum sample size for cancer cohorts?
For cancer cohorts, there is no prescribed minimum sample size, although larger cohorts are encouraged. The sample size proposed for cancer cohorts must be justified by its adequacy in allowing reliable conclusions to be drawn about the contribution of genetic or genomic alterations to the condition under study. Efforts to increase sample number through collaborations across institutions are acceptable and encouraged.
Can we propose more than 100 trios or samples for sequencing?
In the Funding Opportunity Announcement, it mentions that there should be a minimum of 100 trios. Is there also a maximum that will be considered? Our combined cohorts for example have nearly 5000 trios.
We encourage the submission of a large number of trios, but ask that the samples be organized into tranches that make analytic/scientific sense to provide flexibility in the review process. The available budget for sequencing services associated with this FOA allows for roughly 4,000 genomes total. Depending on the quality and number of applications received, the Kids First program management will determine how many total samples (adhering to a minimum of 100) each X01 recipient will be able to have sequenced.
Additionally, applicants who propose sequencing large numbers of samples should describe their capacity and plan to prepare such a large number of samples for sequencing within the one-year timeframe.
Should we propose quality metrics for the genome sequencing?
No, this is not necessary. You should note the quality of the samples being proposed for submission.
Will agreeing to share additional genomic data through the Kids First Data Resource be looked upon favorably?
Investigators with existing childhood cancer and/or structural birth defect genomic data are encouraged to submit these data to the NCBI Sequence Read Archive or the NCI Genomic Data Commons to be part of the forthcoming Kids First Data Resource. Willingness to share additional genomic data through the Kids First Data Resource will likely be looked upon favorably. One way to incorporate this into the application is through the analysis plan, if the additional genomic data add power to the analyses that are planned using data from the cohort that the PI is proposing. For instance, if the plan is to submit samples that have already undergone exome sequencing, then sharing the exome data might provide additional value to the cohort. Aggregating larger amounts of data through Kids First will strengthen our goal of driving comprehensive and cross-cutting research.
Will the sequencing center provide VCF files or raw BAM files?
The sequencing center will generate and transfer Variant Call Format (VCF) and BAM files. A BAM file (.bam) is the binary version of a SAM file. A SAM file (.sam) is a tab-delimited text file that contains sequence alignment data. In some cases, FASTQ files will also be delivered to investigators.
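To illustrate the tab-delimited layout of a SAM alignment line, here is a minimal Python sketch. It assumes the eleven mandatory alignment fields defined in the SAM format specification; the `SamRecord` type, the `parse_sam_line` helper, and the example read are hypothetical and are not part of any Kids First tooling. In practice, BAM files are read with dedicated tools rather than parsed by hand.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    """The eleven mandatory fields of one SAM alignment line."""
    qname: str   # read (query) name
    flag: int    # bitwise flag (e.g. paired, reverse strand)
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment description string
    rnext: str   # reference name of the mate
    pnext: int   # position of the mate
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # per-base quality string


def parse_sam_line(line: str) -> SamRecord:
    """Parse one alignment line; header lines (starting with '@') are rejected."""
    if line.startswith("@"):
        raise ValueError("header line, not an alignment record")
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("expected at least 11 tab-separated fields")
    return SamRecord(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]),
        fields[9], fields[10],
    )


# Hypothetical example read, for illustration only.
example_line = "read1\t99\tchr1\t100\t60\t5M\t=\t150\t55\tACGTA\tIIIII"
record = parse_sam_line(example_line)
is_paired = bool(record.flag & 0x1)  # lowest flag bit marks a paired read
```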
Do applicants need to describe the capacity to store BAM files?
As part of the data management plan, it is important to make clear that your team has the capacity (including equipment, security infrastructure, and physical resources) at your institution to securely accept and store large data files (investigators will receive BAMs and/or FASTQs as well as VCFs).
Data may be stored/hosted on local or cloud-based platforms. For more information see “NIH Security Best Practices for Controlled-Access Data Subject to the NIH Genomic Data Sharing (GDS) Policy”.
Although the maximum project period is 1 year, could one propose to sequence 70 trios now and then add 50 trios next year after additional collections?
All samples must be extracted, properly consented, and ready to send off to the sequencing center shortly after the review date. Please refer to the FOA for a more detailed timeframe.
Who is responsible for data deposition?
The sequencing center is responsible for deposition of the sequence data into an NIH data repository (e.g., the NCBI Sequence Read Archive or the NCI Genomic Data Commons). The study Principal Investigator will be responsible for directly depositing the clinical/phenotypic data with the NIH through dbGaP or the NCI Genomic Data Commons.
For tumor specimens, is there an opportunity for applying whole genome sequencing (WGS) to DNA extracted from formalin-fixed paraffin-embedded (FFPE) tissue?
Fresh frozen samples for tumors are preferred. However, proposals that include FFPE samples will be accepted. If such a proposal is successful in review, there may be technical issues to resolve before good results can be obtained, but we are willing to be flexible.
What amount and concentration of DNA will be required and what will be the coverage?
Whole genome sequencing (WGS) of germline DNA will be done at 30X mean coverage using paired end sequencing. Tumors will receive WGS at 30X mean coverage, combined with whole exome sequencing (WES) and RNA sequencing, both at 100X; all of these use paired end sequencing. If tumor RNA is not available, the sequencing center staff will work with each project to determine the best coverage and approach for sequencing and analysis of tumors.
Amount of DNA/RNA and coverage

| Sequencing type | Amount of DNA or RNA required/recommended | Concentration | Coverage | Additional info |
|---|---|---|---|---|
| WGS | ~2 µg DNA | 20-50 ng/µL preferred | 30X | paired end reads |
| WES | 275 ng DNA (minimum); 1 µg recommended | 20 ng/µL (minimum) | 100X, greater than 80% coding exons covered at 20X | paired end reads |
| RNA-Seq | 750 ng total RNA (minimum); 1 µg recommended | 20 ng/µL (minimum) | 100X, greater than 40% coding exons covered at 20X | paired end reads |
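As a rough illustration of how the amounts in the table above might be used to pre-screen a sample inventory, here is a minimal Python sketch. The `THRESHOLDS` dictionary and `meets_minimum` helper are hypothetical. The WES and RNA-Seq values are the stated minimums; for WGS the table gives only an approximate amount and a preferred concentration range, so treating 2000 ng and 20 ng per microliter as cutoffs is an assumption. Sequencing center staff remain the authority on sample suitability.

```python
# (minimum amount in ng, minimum concentration in ng per microliter)
THRESHOLDS = {
    # Assumption: "~2 micrograms" and the low end of the preferred
    # 20-50 range are treated as cutoffs; the table does not call them minimums.
    "WGS": (2000.0, 20.0),
    "WES": (275.0, 20.0),      # stated minimums
    "RNA-Seq": (750.0, 20.0),  # stated minimums
}


def meets_minimum(assay: str, amount_ng: float, conc_ng_per_ul: float) -> bool:
    """Return True if a sample meets both thresholds for the given assay."""
    min_amount, min_conc = THRESHOLDS[assay]
    return amount_ng >= min_amount and conc_ng_per_ul >= min_conc


# Hypothetical samples, for illustration only.
wes_ok = meets_minimum("WES", amount_ng=300.0, conc_ng_per_ul=25.0)
rna_ok = meets_minimum("RNA-Seq", amount_ng=500.0, conc_ng_per_ul=25.0)
```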
Can I propose long-read whole genome sequencing?
Yes. Both of the designated Kids First sequencing centers, the Broad Institute and the HudsonAlpha Institute for Biotechnology in collaboration with St. Jude Children’s Research Hospital, offer long-read WGS services using the 10X Genomics platform. However, the benefits of using this technology for birth defects and pediatric cancer cohorts are not yet well understood. Additionally, long-read WGS is more expensive than standard WGS and results in fewer samples being sequenced and loss of statistical power. (For example, for a fixed set of funding, approximately 15-20% fewer samples can be sequenced using long-read WGS compared to standard WGS.) Thus, applicants should justify the added benefits that long-read WGS will provide versus standard WGS for their particular cohort.
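The trade-off described above can be made concrete with a short calculation. This Python sketch applies the stated 15-20% reduction to a hypothetical number of samples that a fixed budget would cover with standard WGS; the function name and the figure of 1,000 samples are illustrative assumptions, not program numbers.

```python
from typing import Tuple


def long_read_sample_range(standard_samples: int) -> Tuple[int, int]:
    """Samples affordable with long-read WGS, given a 15-20% reduction.

    Integer arithmetic is used so the result is rounded down to whole samples.
    """
    fewest = standard_samples * (100 - 20) // 100  # 20% fewer samples
    most = standard_samples * (100 - 15) // 100    # 15% fewer samples
    return fewest, most


# Hypothetical budget covering 1,000 standard WGS samples.
fewest, most = long_read_sample_range(1000)
```

Under these assumptions, a budget that covers 1,000 standard genomes would cover roughly 800 to 850 long-read genomes, which is the loss in sample size that applicants are asked to justify.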
What biospecimen information and phenotype data elements are expected for childhood cancer projects?
Childhood cancer data will be submitted to NCI’s Genomic Data Commons (GDC), which requires submission of certain biospecimen and clinical/phenotype data. For phenotype data, the following clinical data elements (CDE) are expected, where available:
Gender, Race, Ethnicity, Age at Diagnosis in Days, First Event, Event Free Survival Time in Days, Vital Status, Overall Survival Time in Days, Year of Diagnosis, Year of Last Follow Up, Treatment Protocol
For templates and additional resources related to information required or suggested for the GDC visit: https://docs.gdc.cancer.gov/Data_Dictionary/viewer/
FAQs for Small Research Grants for Analyses of Data for the Gabriella Miller Kids First Data Resource (R03) (PAR-16-348)
How can data and analytic pipelines from the R03 be shared with the pediatric research community?
Since the goal of the Kids First program is to facilitate new discoveries and novel ways of thinking about childhood cancers and structural birth defects, researchers funded through PAR-16-348 to analyze Kids First data or data that can be added to the Kids First Data Resource are encouraged to share the results of their analyses with the pediatric research community through the Kids First Data Resource. During the period when the Kids First Data Resource is being funded and the Kids First Portal is under construction, researchers funded by PAR-16-348 should contact their Program Officer if they have questions about how to submit data analyses and analytic pipelines to the Data Resource. Program Officer contact information is available on the Notice of Award for the R03 grant, which is available through your NIH eRA user account.
This page last reviewed on August 9, 2017