of Program Coordination, Planning, and Strategic Initiatives
Title of proposed idea: Molecular Phenotypes for Genome Function and Disease (see "Beyond Genome-Wide Association Studies (GWAS)" in Innovation Brainstorm ideas)
Nominator: NIH Institutes/Centers
Major obstacle/challenge to overcome: Understanding how the human genome functions and how it is influenced by genetic variation in health and disease are major challenges of wide interest across NIH. The Innovation Brainstorm meeting suggested this area in “Beyond GWAS”: “Establish a functional genome project that leverages functional information to find causal variants — employing ENCODE, epigenomics, and functional genomics strategies”. Several projects are addressing pieces of these challenges but none in the comprehensive manner required. GWAS have found thousands of human genomic regions associated with disease, but definitively identifying which genomic variants and elements in these regions are causal, rather than simply correlated, is a major challenge for the field. Mapping GWAS hits to functional elements catalogued by ENCODE and other efforts is providing some insights, but determining the causal links and understanding the mechanistic underpinnings are still very difficult with current resources.
Several critical gaps exist, including limited knowledge of variability between individuals for a range of molecular phenotypes; the correlations in molecular phenotypes across tissues; variability in somatic genomic changes/mosaicism among tissues within individuals; the influence of environmental exposures (e.g., diet, toxins, stress) on molecular phenotypes; and the molecular phenotypes of cell types in vivo. Furthermore, integration of data across these and other projects (ENCODE, CF Epigenomics, CF GTEx, etc.) and with GWAS and other disease studies is lacking.
The field needs experimentally tractable systems to generate integrated and comprehensive data resources to study gene function and how genetic variation leads to differences in function and disease.
Emerging scientific opportunity ripe for Common Fund investment: Recent improvements in high-throughput molecular assays and the availability of rich model organism resources provide an opportunity to interrogate gene function in vivo at an unprecedented level of detail. The cost of this project is much lower than it would have been even a few years ago since many of the technologies for molecular phenotyping, such as RNA-seq, ChIP-seq, and DNase-seq, are based on sequencing, the cost of which continues to decline rapidly.
This project would be synergistic with existing and new projects, some of which are already supported by the Common Fund and by ICs:
Common Fund investment that could accelerate scientific progress in this field: The Common Fund could invest in the generation and analysis of multiple molecular phenotypes in model systems such as mice, rats, and flies. This resource would include measurement of gene expression and multiple additional molecular phenotypes (epigenomic marks, chromatin accessibility, transcription factor binding sites, etc.) in completely sequenced strains of model organisms. Using model organisms would allow access to a full range of tissues in different developmental, environmental, and disease states. The mouse Collaborative Cross (CC) and Knock-out Mouse Project (KOMP) are two resources upon which one could build this project, but they are not the only ones. The data set would show the correlations among the molecular phenotypes across tissues, to allow predictions based on the more accessible tissues.
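As a rough illustration of the last point, the sketch below measures how strongly one molecular phenotype correlates between an accessible tissue and a less accessible one across strains, then uses a least-squares line to predict the latter from the former. It is a minimal sketch under stated assumptions, not a proposed analysis method: the tissue names, the per-strain values, and the helper functions `pearson` and `fit_line` are all hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting ys from xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up illustrative values (not real measurements): expression of one
# hypothetical gene in five strains, in an accessible tissue (blood) and
# a less accessible tissue (liver).
blood = [1.0, 2.0, 3.0, 4.0, 5.0]
liver = [2.1, 3.9, 6.2, 8.1, 9.7]

r = pearson(blood, liver)                 # cross-tissue correlation
slope, intercept = fit_line(blood, liver)

# Predict the liver value for a new strain from its blood value alone.
new_blood_value = 6.0
predicted_liver = slope * new_blood_value + intercept

print(f"correlation r = {r:.3f}")
print(f"predicted liver value = {predicted_liver:.2f}")
```

In practice such an analysis would span many phenotypes, tissues, and strains, and would need models that account for genetic background and environment; the single-gene linear fit here only shows the shape of the idea.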
The product of this project would be a public data resource to support work to interpret how variants, genomic elements, environmental factors, and molecular phenotypes are related, as well as proof-of-principle examples for predictive models of gene function. With this resource one could predict which genes and genomic elements are causal for phenotypes and how the elements interact. Experiments could test these predictions and determine the response to additional genetic or environmental perturbations in vivo. Relevance to humans could then be examined with focused studies, using resources like the Common Fund Genotype-Tissue Expression (GTEx) project. For example, mouse studies might show that correlated pancreatic, liver, and muscle chromatin states are associated with risk for Type 2 diabetes in specific genetic strains and dietary environments. These states and associations could then be examined in humans with efficient, narrowly focused molecular studies in the relevant tissues and donors.
Many strains, cell types, and developmental stages, in a range of environments (such as various diets, smoking, environmental toxins, sun, and psychosocial stress) could be studied. The molecular phenotypes that would be surveyed in the model organisms include:
The data would be freely available to the scientific community. The project would also require the development of improved computational analysis methods for integrating the multiple data types, predicting functional elements, and understanding how variation in function arises from sequence differences. Although the main data production effort would be generating the sequence and molecular phenotypes, some pilot projects would focus on using these data to predict which genomic elements are causal for some diseases or traits that are shared by humans and the model organisms.
This proposal is related to, but distinct from, the GTEx, ENCODE, and CF Epigenomics projects. While GTEx directly studies human tissues, it has limited ability to control for post-mortem effects, can study only a limited range of developmental stages, and cannot control or manipulate environmental and genetic factors. The animal models proposed here, on the other hand, allow great flexibility to control and manipulate the genomes and environment in many animals, in order to identify mechanistic relationships between the genome and multiple phenotypes. The ENCODE and CF Epigenomics efforts are focused on developing the reagents and standards for characterizing functional elements in the genome and cataloging them in a small set of reference cell lines and tissues. The project proposed here leverages these efforts by applying them to experimental organisms in which causal inferences can be made and hypotheses of genome function tested, across a large set of tissues in many individuals, at many developmental stages, and in several environments. This proposed project is much more extensive and comprehensive than the current reference projects.
Possible extensions to this project:
This project could expand to include:
Potential impact of Common Fund investment: This project would produce a valuable resource of data sets and tools for understanding genome function, disease biology, and risk prediction in experimentally manipulable systems. Having these data sets in model organisms would allow researchers to study which genomic elements mechanistically cause phenotypes, rather than merely correlate with them. Once causal mechanisms in the model organisms are discovered, focused studies in humans could be carried out to test the predictions. Knowing the causal genomic elements and variants would allow researchers to study how they function in health and disease, to make accurate risk predictions, and to develop therapies based on this mechanistic understanding.
Tags: genetics/genomics, epigenomics, molecular phenotype, animal, model organism, database, data integration
Title of proposed idea: Meeting the Challenge of Big Data in Biomedical and Translational Science (see “Cross-Cutting Issues in Computation and Informatics” in Innovation Brainstorm ideas)
Major obstacle/challenge to overcome: The complexity of human biology in health and illness is increasingly reflected in research design, with individual studies collecting genomic, image, biosensor, and clinical data, along with information about sociocultural and environmental factors. These large amounts of diverse data are almost always collected in digital form. Thus, modern biomedicine is confronted at once by great opportunity and great challenge. The opportunity presented by collecting multiple measures is to understand disease and gain insight into its prevention, treatment, and cure from a broad, encompassing perspective that is more likely to bear fruit than studies limited to a small number of measures. The opportunity presented by collecting digital data is the ability to share, compare, reaggregate, reuse, and integrate data, as well as to use these data for models and simulations in ways that have heretofore been impossible. The challenge, however, is to organize, present, analyze, and manage these data to fully realize such opportunities. The challenge is one of “big data,” where handling and working with complex data at large scale is both quantitatively and qualitatively different from doing so at a smaller scale.
Emerging scientific opportunity ripe for Common Fund investment: As the translation of biomedical research results into improved human health accelerates, and as the diversity of clinically relevant measures grows to include those of basic biology, new approaches to big data, drawing from information science, informatics, computer science, and computational biology, must be developed and used to maximize the return on the research investment. Advancing the science of big data and developing associated tools requires test-beds to stimulate and shape conceptual progress and its reduction to practice. While this has happened in other fields, such as astronomy (which has benefited greatly as a consequence), a concerted effort to move these ideas and tools forward has not yet been made in translational biomedicine. The number, size, and scope of biomedical and translational research projects collecting large amounts of different types of data are now sufficient to offer numerous test-beds that would be demanding enough to move forward the science of big data and to develop a big data research environment. The time is right for seizing the opportunity these test-beds offer to drive the development of big data approaches in the context and service of translational science.
Common Fund investment that could accelerate scientific progress in this field: This initiative would support the research and development of a big data research environment for each of several sites hosting translational science projects, collectively spanning analyses that might include genomic, image, sensor, clinical, sociocultural, environmental, and electronic medical record data. Some examples of elements likely to be developed under a given award include, but are not limited to:
Awards would be made to support such integrated efforts to advance the science of big data and build a big data research environment associated with sites at which large clinical research projects and clinical trials are typically ongoing. While this initiative could be implemented in any of a number of ways, one possible implementation would be the use of cooperative agreements. The methods, results, progress, setbacks, and lessons learned would be shared among all of these cooperative agreements in an ongoing way so as to allow for an adaptive project process.
Potential impact of Common Fund investment: Informatics approaches currently used have largely been developed in the context of more limited data types and amounts than large translational science projects are now producing. A big data research environment built around, and assuming, such large, multidimensional studies producing very large amounts of complex data would enable an understanding of the basic biology of health and disease that differs not only quantitatively but also qualitatively from what is possible today. This new understanding, based on an integrative perspective from omics to the environment, would, in turn, provide new insights to improve human health, as well as clinical and public health decision-making.
Tags: new tools, computational, database, biomedical data, translational, workforce
Title of proposed idea: Molecular Classification of Disease
Nominator: Innovation Brainstorm participants
Major obstacle/challenge to overcome: Currently, “clinical syndromes” are often used to classify disease. The problem with this approach is that a given patient syndrome may contain significant heterogeneity with regard to molecular mechanisms of pathogenesis. As a result, the ability to identify pathogenic mechanisms in population studies is limited, as is the ability to quickly and efficiently identify who will benefit from therapeutic interventions. Thus, new approaches are needed for classifying patients and disease states that are more tied to the molecular basis of disease. Intermediate markers or “endophenotypes” may be helpful in this regard.
Another obstacle to translation is a general lack of willingness to challenge dogma, which can perpetuate stale thinking and practice.
Emerging scientific opportunity ripe for Common Fund investment: Progress in this area promises to fill gaps between molecular characterization and patient disease states, as well as to identify heterogeneity in classical clinical syndrome classifications. Recent advances in technologies that allow comprehensive profiling of patients at the molecular level and association of these profiles with clinical data provide an opportunity to completely redefine the way we think about and understand disease. However, these capabilities need to be developed further and expanded for regular use in the clinic.
Common Fund investment that could accelerate scientific progress in this field: Innovation is needed in the way in which we classify patients. Examples include:
The NIH could establish a well-characterized, central sample database to encourage data sharing and integration. New approaches to finding “lenses” to view complex biomedical problems could include funding coherent, high-risk programs, as well as considering the relevance and ability of existing networks to pursue this work [e.g., Clinical and Translational Science Awards (CTSAs)].
Potential impact of Common Fund investment: Molecular characterization of disease would benefit the diagnosis and treatment of a wide range of diseases. In addition, progress in this area would catalyze the transition from one-size-fits-all medicine to personalized medicine. Clinical trials could be done more quickly and efficiently, and the resources held by population studies could be better utilized.
Finally, encouraging a mandate to challenge dogma would likely introduce broader thinking and open new avenues for exploration.
Tags: computational, genetics/genomics, database, disease phenotype, clinical, diagnostics