Title of proposed idea: Meeting the Challenge of Big Data in Biomedical and Translational Science (see “Cross-Cutting Issues in Computation and Informatics” in Innovation Brainstorm ideas)
Nominator: NIH Institutes/Centers
Major obstacle/challenge to overcome: Research designs increasingly take into account the complexity of human biology in health and illness, with individual studies collecting genomic, image, biosensor, and clinical data, along with information about sociocultural and environmental factors. These large amounts of diverse data are almost always collected in digital form. Thus, modern biomedicine is confronted at once by great opportunity and great challenge. The opportunity presented by collecting multiple measures is to understand disease and gain insight into its prevention, treatment, and cure from a broad, encompassing perspective that is more likely to bear fruit than studies limited to a small number of measures. The opportunity presented by collecting digital data is the ability to share, compare, reaggregate, reuse, and integrate data, as well as to use these data for models and simulations in ways that have heretofore been impossible. The challenge, however, is to organize, present, analyze, and manage these data so as to fully realize such opportunities. The challenge is one of “big data,” where handling and working with complex data at large scale is both quantitatively and qualitatively different from doing so at a smaller scale.
Emerging scientific opportunity ripe for Common Fund investment: As the translation of biomedical research results into improved human health accelerates, and as the diversity of clinically relevant measures grows to include those of basic biology, new approaches to big data, drawing from information science, informatics, computer science, and computational biology, must be developed and used to maximize the return on the research investment. Advancing the science of big data and developing associated tools requires test beds to stimulate and shape conceptual progress and its reduction to practice. While this has happened in other fields, such as astronomy (which has benefited greatly as a consequence), a concerted effort to move these ideas and tools forward has not yet been made in translational biomedicine. The number, size, and scope of biomedical and translational research projects collecting large amounts of different types of data are now sufficient to offer numerous test beds demanding enough to move forward both the science of big data and the development of a big data research environment. The time is right to seize the opportunity these test beds offer to drive the development of big data approaches in the context and service of translational science.
Common Fund investment that could accelerate scientific progress in this field: This initiative would support the research and development of a big data research environment for each of several sites hosting translational science projects, collectively spanning analyses that might include genomic, image, sensor, clinical, sociocultural, environmental, and electronic medical record data. Some examples of elements likely to be developed under a given award include, but are not limited to:
- Scientific foundation – e.g., meaningful analysis of multiple diverse data sets, scalable algorithms
- Informational foundation – e.g., identification/development of vocabularies, ontologies, metadata
- Technology and technical infrastructure – e.g., to move personal biosensor data to the environment
- Approaches for management of the data – e.g., semi-automated annotation, data compression and decompression, crawlers to associate particular data points
- Approaches for the use of the data – e.g., a synthesis platform that could be used to conduct “preliminary clinical trials” in silico to be used with adaptive trial methods, methods to evaluate the contribution of multidimensional measures to particular clinical and health outcomes
- Approaches, technical and cultural, to share and compare data across research groups
- Training in the science, development, and use of big data and its technology
Awards would be made to support such integrated efforts to advance the science of big data and build a big data research environment associated with sites at which large clinical research projects and clinical trials are typically ongoing. While this initiative could be implemented in any of a number of ways, one possible implementation would be the use of cooperative agreements. The methods, results, progress, setbacks, and lessons learned would be shared among all of these cooperative agreements in an ongoing way so as to allow for an adaptive project process.
Potential impact of Common Fund investment: Informatics approaches currently in use have largely been developed in the context of more limited data types and amounts than large translational science projects are now producing. A big data research environment built around, and assuming, such large, multidimensional studies producing gargantuan amounts of complex data would enable an understanding of the basic biology of health and disease that is not only quantitatively but also qualitatively different. This new understanding, based on an integrative perspective from omics to the environment, would, in turn, provide new insights to improve human health, as well as clinical and public health decision-making.