Kids First Program Frequently Asked Questions (FAQs)
FAQs for Kids First Data Sharing
1. Why is data sharing important for the Kids First program?
This program is Congressionally mandated to provide resources that will drive discovery in pediatric research (see Gabriella Miller Kids First Research Act). Datasets and resources generated by this program must be made as broadly shareable and accessible as possible while abiding by informed consent language and protecting participants.
In accordance with NIH’s mission and the Kids First program’s goals, increasing accessibility to data through broad sharing practices empowers researchers and accelerates scientific progress that can lead to improved diagnostic capabilities and targeted therapies.
2. What are the general benefits of data sharing?
- Enables data generated for a given study(s) to be used to explore a wide range of additional research questions
- Increases statistical power by combining separate datasets and increasing sample size
- Allows validation of research results
- Promotes innovation of methods and tools for research
- Facilitates development of improved therapeutic and diagnostic strategies for patients
3. What is the National Institutes of Health (NIH) Genomic Data Sharing (GDS) policy?
Effective January 25, 2015, the NIH Genomic Data Sharing Policy (NOT-OD-14-124) replaced the NIH GWAS Data Sharing Policy (NOT-OD-07-088). Under terms and conditions consistent with the informed consent provided by individual participants, the GDS policy seeks to make genomic data broadly available to the research community in a timely manner. Information on the NIH Genomic Data Sharing Policy can be found on:
4. What is an Institutional Certification and what role does it play in genomic data sharing?
Individual consent forms signed by study participants are the legal foundation for how genomic data from enrolled participants can be shared through dbGaP. Institutional Certifications assure that:
The Institutional Certification is submitted to a Genomic Program Administrator (GPA), who uses it to register the study in dbGaP and generate a Data Use Certification (DUC). Data Use Limitations reflect the language of the consent form and not PI or IRB preferences. Secondary users and their supporting Institution must agree to the conditions of the DUC when applying to access data (see “FAQs for accessing Kids First data” below).
5. What is the process for obtaining an Institutional Certification?
We suggest that applicant PIs obtain Institutional Certifications following these steps:
1) Download the current NIH Institutional Certification template from: https://osp.od.nih.gov/scientific-sharing/institutional-certifications/
2) Fill out the first page of the Institutional Certification to include the sites that would contribute samples for sequencing. One document can list multiple sites; alternatively, multiple Institutional Certifications, one for each site, can be submitted.
3) Provide the Institutional Certification to the IRB, or equivalent body, along with the participant consent forms for each site and any other pertinent information (e.g. protocols), to complete the second and third pages:
a. At the top of the second page, it is anticipated that the individual-level genomic data will be made available through controlled-access. Regarding “genomic summary results (GSR),” this box is to be left unchecked, unless unrestricted access to GSR is not permitted due to the study’s designation as “sensitive” by the institution. Please note that it is not anticipated that a “sensitive” designation will apply to current Kids First studies; therefore, GSR from Kids First data would not require controlled-access.
b. The lower section of the second page addresses “genomic summary results (GSR).” This box is to be left unchecked, unless unrestricted access to GSR is not permitted due to the study’s designation as “sensitive” by the institution. Please note that it is anticipated that unrestricted access to GSR will be appropriate for the majority of Kids First genomic datasets. For additional information see “Update to NIH Management of Genomic Summary Results Access” (https://grants.nih.gov/grants/guide/notice-files/NOT-OD-19-023.html).
4) On the third page, the IRB, or equivalent body, is to select the appropriate data use limitations (DULs) and/or DUL modifiers based on the language of each site’s consent form. Unless the intent of the consent form language is determined to prohibit specific uses of the data generated from the samples collected from the participants, it is expected that the dataset will be designated as “General Research Use (GRU)”. Please note that cohorts with data use limitations and/or modifiers that impede the ability to access, use, combine, or cross-analyze data will not be prioritized for sequencing by the Kids First program (e.g., datasets consented for disease-specific research only, datasets that require a letter of collaboration (“COL”), or datasets that require local “IRB” approval).
5) Finally, the Institutional Certification needs to be counter-signed by the applicant PI and the Institution Signing Official who is authorized to enter the institution into a legally binding contract and sign on behalf of the investigator who plans to submit the data to NIH, e.g. Dean, Vice President for Research.
An Institutional Certification must be provided with an application to the Kids First X01 sequencing opportunity; a Provisional Certification is acceptable if there is not adequate time to obtain a full Institutional Certification before submitting the application. However, approval to access the Kids First X01 sequencing capacity is conditional on the submission of a full Institutional Certification covering all samples to be submitted for sequencing. Cohort selection will be based, in part, on the Kids First program’s expectation for broad data sharing (i.e. General Research Use).
6. What are the genomic data sharing expectations for Kids First projects?
Consistent with the NIH Genomic Data Sharing Policy (NOT-OD-14-124), consent forms should contain language that reflects broad sharing of genomic data. Additionally, Kids First takes seriously its responsibility to ensure data can be broadly accessed, used, combined, and/or cross-analyzed across childhood cancer and structural birth defects. Projects that allow for the broadest level of sharing (i.e. “General Research Use” with no additional restrictions) will be prioritized for Kids First support (i.e., the X01 sequencing opportunity). The following data use consent groups and modifiers limit broad data access and impede the ability of the Kids First program to accomplish its goals.
- Disease Specific Consent Group: When data use is restricted to a specific disease area, the data cannot be combined with a dataset with a different disease-specific data use limitation. Combining and cross-analyzing datasets is a primary goal of Kids First; therefore, datasets that are consented for General Research Use and/or Health/Medical/Biomedical purposes will be prioritized over datasets restricted to Disease Specific use.
- IRB modifier: With this box checked, the Requester must provide documentation of their local IRB’s approval for the proposed research when submitting a Data Access Request (DAR). We find that it is rare for consent language to include such a requirement and that this modifier is often included in error. As a reminder, when submitting a Data Access Request, every requester and their institution must agree to the terms of the Data Use Certification (DUC), which verifies that the requesting PI is accredited within the institution, that the institution is aware of the project for which the PI is proposing to use the data, and that the institution has all appropriate security measures in place to manage and maintain the controlled-access dataset(s) being retrieved. For a sample DUC, see: https://osp.od.nih.gov/wp-content/uploads/Model_DUC.pdf
- COL modifier: This box is checked when the consent form states that collaboration with the original/submitting investigator is required in order to use the dataset; therefore, the Requestor must provide a collaboration agreement document in order to be approved to access the dataset. This can limit the number of end-users who are able to use the dataset.
Please note that under the recent guidance, “Update to NIH Management of Genomic Summary Results Access (NOT-OD-19-023)”, it is anticipated that unrestricted access to Genomic Summary Results will be appropriate for the majority of Kids First genomic datasets (i.e. the new box on page 2 should remain unchecked).
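The prioritization rules described in this answer can be sketched as a small check. This is only an illustration, not an NIH tool: the function name `sharing_concerns` and the short labels `"GRU"`, `"HMB"`, `"DS"`, `"IRB"`, and `"COL"` are assumptions chosen here to stand for the consent groups and modifiers named above.

```python
# Illustrative sketch only. The labels below are assumed shorthand for the
# consent groups and modifiers discussed in this FAQ, not an official schema.
BROAD_CONSENT_GROUPS = {"GRU", "HMB"}   # General Research Use, Health/Medical/Biomedical
LIMITING_MODIFIERS = {"IRB", "COL"}     # local IRB approval, letter of collaboration


def sharing_concerns(consent_group: str, modifiers: set[str]) -> list[str]:
    """List the reasons a dataset's data use limitations may restrict broad sharing."""
    concerns = []
    if consent_group == "DS":
        concerns.append("Disease Specific: cannot be combined across disease areas")
    elif consent_group not in BROAD_CONSENT_GROUPS:
        concerns.append(f"Unrecognized consent group: {consent_group}")
    for modifier in sorted(modifiers & LIMITING_MODIFIERS):
        concerns.append(f"{modifier} modifier limits the number of eligible requesters")
    return concerns


print(sharing_concerns("GRU", set()))            # no concerns for unrestricted GRU
print(sharing_concerns("DS", {"IRB", "COL"}))    # three concerns
```

An empty list corresponds to the case this FAQ says will be prioritized: General Research Use with no additional restrictions. The actual designation is always made by the IRB, or equivalent body, from the consent form language.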
7. Where can I find additional resources about genomic data sharing?
Please refer to the following resources for more information about Genomic Data Sharing, consent language, Institutional Certifications, and the dbGaP registration process:
•NIH Office of Science Policy: NIH Genomic Data Sharing: https://osp.od.nih.gov/scientific-sharing/genomic-data-sharing/
•NIH GDS Policy pdf, 4. Informed Consent and 5. Institutional Certification: https://osp.od.nih.gov/wp-content/uploads/NIH_GDS_Policy.pdf
•NIH Guidance on Consent for Future Research Use and Broad Sharing of Human Genomic and Phenotypic Data Subject to the NIH Genomic Data Sharing Policy: https://osp.od.nih.gov/wp-content/uploads/NIH_Guidance_on_Elements_of_Consent_under_the_GDS_Policy_07-13-2015.pdf
•National Institutes of Health Points to Consider in Drafting Effective Data Use Limitation Statements (Institutional Certifications): https://osp.od.nih.gov/wp-content/uploads/NIH_PTC_in_Developing_DUL_Statements.pdf
•NHGRI: The Informed Consent Resource: https://www.genome.gov/27565449/the-informed-consent-resource/
•Points to Consider for Institutions and Institutional Review Boards in Submission and Secondary Use of Human Genomic Data under the National Institutes of Health Genomic Data Sharing Policy: https://osp.od.nih.gov/wp-content/uploads/GDS_Points_to_Consider_for_Institutions_and_IRBs.pdf
•Institutional Certification Template (note that Data Use Limitations (DULs) and modifiers must only be selected according to the language of the participant consent forms): https://osp.od.nih.gov/scientific-sharing/institutional-certifications/
•dbGaP Registration Flow Chart: https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/GetPdf.cgi?document_name=HowToSubmit.pdf
8. Who can I contact for additional information and questions regarding data sharing?
Jaime M. Guidry Auvil, Ph.D.
Genomic Program Administrator (GPA)
Director, NCI Office of Data Sharing
NCIOfficeofDataSharing@mail.nih.gov
Vivian Ota Wang, Ph.D.
Kids First Data Access Committee (DAC) Chair
NCI Office of Data Sharing
KidsFirstDAC@nih.gov
General NIH Genomic Data Sharing questions: GDS@mail.nih.gov
dbGaP (NCBI) helpdesk: dbgap-sp-help@ncbi.nlm.nih.gov
FAQs for Accessing Kids First Data
1. Where can I access Kids First data?
Individual level sequence data (BAM/FASTQ/VCF files) and associated clinical/phenotype data and metadata generated for Kids First cohorts can be accessed through the Kids First Data Resource Portal (to learn more, visit https://kidsfirstdrc.org/support/studies-and-access/). Before accessing individual level genomic sequence data, you will need to submit a Data Access Request through dbGaP for approval from the NIH Kids First Data Access Committee (see FAQ #3 below).
Some genomic datasets from structural birth defect projects are currently stored in the National Center for Biotechnology Information’s (NCBI) Sequence Read Archive (SRA), but all datasets can be accessed through the Kids First Data Resource Portal and require dbGaP approval.
2. When will Kids First data be publicly available?
Kids First X01 datasets are scheduled to be released to the public via dbGaP six months after the X01 investigator team receives access to the sequence data. Sometimes this “pre-release period” can be longer than six months due to procedural delays, but the data will not be released prior to six months unless specifically requested by the X01 PI.
Visit our X01 projects page to see projects that have been released and estimated release dates for pending projects: https://commonfund.nih.gov/kidsfirst/x01projects
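The six-month pre-release period described above can be turned into an earliest expected release date. The following is a minimal sketch, assuming the period is counted in whole calendar months from the day the X01 team receives access (the exact counting convention is not stated here). The `add_months` helper and the access date are hypothetical.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Shift a date forward by whole calendar months, clamping to the month's last day."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


# Hypothetical access date, chosen only to show the end-of-month clamp.
access_granted = date(2021, 8, 31)
earliest_release = add_months(access_granted, 6)
print(earliest_release)  # 2022-02-28
```

Because procedural delays can lengthen the pre-release period, a date computed this way is a lower bound rather than a scheduled release date.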
3. How do I access Kids First data?
The first step is to find the Kids First data. You can find Kids First datasets at the following links:
- Directly search Kids First and other interoperable datasets in the Kids First Data Resource Portal: https://portal.kidsfirstdrc.org/
- The Kids First X01 projects page lists all projects selected for sequencing (including those that have not yet been released): https://commonfund.nih.gov/kidsfirst/x01projects
- Learn more at the Kids First DRC’s Studies & Access Page: https://www.notion.so/d3b/Studies-and-Access-a5d2f55a8b40461eac5bf32d9483e90f
The next step is to submit a Data Access Request (DAR) through dbGaP for each project: https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?page=login. Secondary users and their supporting Institution’s Signing Official and IT Director must agree to the conditions of the Data Use Certification (sample agreement: https://osp.od.nih.gov/wp-content/uploads/Model_DUC.pdf), including any DULs or DUL modifiers pertinent to the requested dataset and the Genomic User Code of Conduct (https://dbgap.ncbi.nlm.nih.gov/aa/Code_of_Conduct.html).
All internal and external collaborators must be listed on the application, with the exception of technicians, graduate students, and postdoctoral fellows who are under the requestor’s direct supervision. External collaborators from other institutions are required to submit separate DAR(s) for approved access to the same dataset(s). The DAR(s) will be reviewed by the NIH Kids First Data Access Committee (DAC), which is run out of the NCI Office of Data Sharing.
To learn more about the dbGaP data access procedure, visit: https://dbgap.ncbi.nlm.nih.gov/aa/dbgap_request_process.pdf and watch a presentation about requesting access to genomic datasets through dbGaP at https://www.youtube.com/watch?v=39cba0gF2tw&t=3s.
Once you have dbGaP approval, you can learn how to push data from the Kids First Data Resource Portal to the CAVATICA analysis platform here: https://kidsfirstdrc.org/support/analyze-data/
4. Who can apply to access individual level sequence data from dbGaP?
For extramural researchers, the Principal Investigator (PI) must be a tenure-track professor, senior scientist, or equivalent to submit a data access request (DAR), and must have a valid NIH eRA Commons account for logging in to the dbGaP system. Please see here for more about how to set up a new eRA Commons account or how to make changes to an existing eRA Commons account.
FAQs for X01 Cohort Selection for Sequencing
1. What information should be included on the shipping manifest?
Please include the following information on the shipping manifest (column headers):
Participant ID
Sample ID
Aliquot ID
Composition
Tissue Type
Anatomical Site
Age at Collection
Tumor Descriptor
Analyte Type
Concentration
Volume
You may add any additional fields relevant to the biospecimens and/or the operational needs of shipping (e.g. well/box location).
You may download this spreadsheet as a starting point. Please contact the DRC for recommendations or questions.
Please include support@kidsfirstdrc.org and valerie.cotton@nih.gov when emailing shipping manifests to the sequencing center.
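Before emailing a manifest, it can help to confirm that every column header listed above is present. The following is a minimal sketch, assuming the manifest is saved as CSV with the headers in the first row. The function name `missing_columns` and the example manifest are hypothetical, and the DRC may recommend a different format.

```python
import csv
import io

# Column headers requested in this FAQ for the shipping manifest.
REQUIRED_COLUMNS = [
    "Participant ID", "Sample ID", "Aliquot ID", "Composition", "Tissue Type",
    "Anatomical Site", "Age at Collection", "Tumor Descriptor", "Analyte Type",
    "Concentration", "Volume",
]


def missing_columns(manifest_csv: str) -> list[str]:
    """Return the required column headers absent from the manifest's header row."""
    reader = csv.reader(io.StringIO(manifest_csv))
    header = next(reader, [])
    present = {name.strip() for name in header}
    return [name for name in REQUIRED_COLUMNS if name not in present]


# Hypothetical manifest header: adds an extra "Box Location" field, omits "Volume".
example = ",".join(REQUIRED_COLUMNS[:-1] + ["Box Location"]) + "\n"
print(missing_columns(example))  # ['Volume']
```

Extra columns such as a well or box location are ignored by the check, which matches the allowance for additional fields noted above.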
2. What clinical and phenotypic information do X01 investigators need to submit to the DRC in order to be approved for access to the genomic dataset?
It is expected that each X01 group will provide the clinical and phenotypic data described in the original X01 proposal to the DRC for sharing with the broader research community upon release of the dataset. Kids First strongly encourages the submission of detailed/deep clinical and phenotypic data, including longitudinal data and family histories. Please provide the information described in the "Clinical Phenotype Data Element" spreadsheet for the DRC and program staff to review:
The DRC will accept this information in another format, such as the REDCap dbGaP submission files, as long as all the necessary information is provided.
Please contact support@kidsfirstdrc.org to discuss the best format for submitting further information.
Upon receipt of the required information, Kids First NIH program staff will work with the DRC and/or sequencing centers to enable the X01 team to have access to the associated sequence data. Once the investigator team has access to the sequence data, they have six months of proprietary access before it is released to the public.
The DRC is working closely with investigators who have expertise in specific areas to address how to best capture clinical and phenotypic data moving forward. If you have an interest in engaging in this process or providing suggestions, please contact support@kidsfirstdrc.org or visit kidsfirstdrc.org.
3. How will the DRC harmonize phenotypes across Kids First projects? Which data ontologies will be used?
The DRC is leveraging existing community standards to harmonize clinical and phenotypic data, which facilitates searching, analysis, and interoperability with other data efforts. If you are currently collecting phenotypic data or working to map such data to existing standards, we suggest you use one of the following ontologies, since these are what the DRC plans to use for phenotype harmonization:
- For structural birth defects: Human Phenotype Ontology (http://human-phenotype-ontology.github.io/)
- For childhood cancers: NCI Thesaurus (https://ncit.nci.nih.gov)
Also recommended:
- Uberon (https://github.com/obophenotype/uberon) for tissue/anatomy, including but not limited to tumors
- Monarch Disease Ontology (MONDO, http://obofoundry.org/ontology/mondo.html)
- ICD-10 (http://www.who.int/classifications/icd/en/)
Other helpful resources:
- Ontology Lookup Service: https://www.ebi.ac.uk/ols/index
The DRC is working closely with investigators who have expertise in specific areas to address how to best capture clinical and phenotypic data moving forward. If you have an interest in engaging in this process or providing suggestions, please contact support@kidsfirstdrc.org or visit kidsfirstdrc.org.
4. What should X01 investigators include in their acknowledgement statement when publishing research findings from Kids First generated data?
In addition to listing the PHS Accession Number(s) of the datasets used for a particular analysis and the databases from which they are accessible to the research community, X01 investigator teams (i.e. “Contributing Investigator(s)”) are asked to describe support for the project, including NIH grant numbers.
A sample statement for the acknowledgment of Kids First dataset(s) follows:
The results analyzed and <published or shown> here are based in whole or in part upon data generated by Gabriella Miller Kids First Pediatric Research Program (Kids First) projects <insert phs accession number(s)>, and are accessible from the Kids First Data Resource Portal (kidsfirstdrc.org) and/or dbGaP (www.ncbi.nlm.nih.gov/gap). Kids First was supported by the Common Fund of the Office of the Director of the National Institutes of Health (www.commonfund.nih.gov/KidsFirst). The <insert Kids First Sequencing Center> was awarded a U24 (<enter grant number>) to sequence [childhood cancer and/or structural birth defect cohort samples] submitted by investigators through the Kids First program (<enter X01 grant number>). Additional funds from <enter relevant NIH institute grant number(s)> supported the assembling of the cohorts, the collection of the phenotypic data and samples, and/or data analysis. Contributing investigators include: <enter names>*.
*If there are many collaborators/consortium members, you can use a ‘corporate authorship’ with a link to a website that lists everyone.
Kids First Sequencing Center Grants
Sequencing Center | Grant Number |
---|---|
BROAD INSTITUTE | U24 HD090743-01 |
HUDSON-ALPHA INSTITUTE FOR BIOTECHNOLOGY | U24 HD090744-01 |
BAYLOR COLLEGE OF MEDICINE | 3U54HG003273-12S1 |
WASHINGTON UNIVERSITY | 3U54HG003079-12S2 |
5. What should secondary users (a.k.a. “end-users” or approved data requestors) include in acknowledgement statements when publishing research findings from Kids First generated data?
Secondary users, or “end users”, must acknowledge all datasets used in a publication or analysis by listing all relevant dbGaP PHS Accession Numbers, as well as the urls of the databases where the datasets were accessed. The Data Use Certification (DUC) agreed to by secondary users outlines how to use and acknowledge each approved dataset.
6. Are there opportunities for collaborating with other efforts for functional validation of variants?
- KOMP2 is receptive to considering genes of interest identified through X01 analyses as candidates for targeting by KOMP2 Centers. This includes reviewing the literature for existing models, prioritizing specific genes for generating new knockout mice, and mapping resulting phenotypes to animal model ontologies. All KOMP data are publicly available at www.mousephenotype.org. If you are interested in collaborating with KOMP, please contact: KidsFirstKomp@nih.gov
- Projects that fall within the categorical interests of two or more NIH institutions/centers may also consider applying for ORIP’s R21 program announcement for Development of Animal Models and Related Biological Materials for Research (PA-16-141). Investigators considering applying to PA-16-141 are strongly encouraged to consult with ORIP program staff (see Scientific/Research Contacts in Section VII. Agency Contacts) to be advised whether their research plans are appropriate for this FOA.
- Researchers interested in exploring the gene by environment interactions of conditions such as craniofacial diseases may be interested in these funding opportunity announcements:
- Mechanistic Studies of Gene-Environment Interplay in Dental, Oral, Craniofacial, and Other Diseases and Conditions (R01) (PAR-19-292)
- Development of Novel and Robust Systems for Mechanistic Studies of Gene-Environment Interplay in Dental, Oral, Craniofacial, and Other Diseases and Conditions (R21) (PAR-19-293)
This page last reviewed on January 5, 2021