Division of Program Coordination, Planning, and Strategic Initiatives
Potential benefits of, and challenges in, incorporating informatics approaches and/or integrating large protein datasets into the development of proteomic technologies.
3/23/2012 7:55:45 AM #
I think that one of the most important lessons from the DNA sequencing revolution is that technology may make acquiring data very inexpensive, but the cost of interpreting and analyzing data does not get cheaper. Large-scale DNA sequencing programs like TCGA are constantly discussing how to analyze and interpret their data, how to do meaningful filtering and normalization, and how to draw defensible and statistically sound biological conclusions. These are important computational topics, and the quality of the entire program’s results depends as much on good bioinformatics as it does on new instrumentation. There must be a significant and sustained investment in bioinformatics concurrent with the investment in technology. The volume and complexity of data that we hope to achieve with disruptive proteomics technologies will require agile and intelligent computational research.
Sam Payne, PNNL
3/23/2012 2:18:37 PM #
One of the major realizations of the DNA sequencing revolution is that although the cost of generating data has decreased, the cost of data analysis has not. Large-scale sequencing projects (like TCGA) are constantly discussing their need for rigorous bioinformatics: how to filter and normalize the data, and how to draw statistically robust conclusions. Bioinformatics research is a critical part of technology development, and advances in new instrumentation, particularly disruptive advances, will require sophisticated bioinformatics. In light of this, I hope that the NIH will make a clear and significant investment in bioinformatics research alongside the proposed investment in proteomics instrumentation. I think the listed goals are wonderful and should be supported.
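To make "filter and normalize" concrete, here is a minimal sketch of two steps such projects routinely debate: dropping peptides quantified in too few samples, then median-normalizing each sample so intensity differences reflect biology rather than loading. The data, function names, and thresholds below are hypothetical illustrations, not any program's actual pipeline.

```python
def filter_peptides(intensities, min_observed=2):
    """Keep peptides observed (non-None) in at least `min_observed` samples."""
    return {pep: vals for pep, vals in intensities.items()
            if sum(v is not None for v in vals) >= min_observed}

def median(xs):
    s = sorted(xs)
    n = len(s)
    return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2

def median_normalize(intensities):
    """Scale each sample so its median intensity matches the global median."""
    n_samples = len(next(iter(intensities.values())))
    sample_medians = []
    for j in range(n_samples):
        col = [vals[j] for vals in intensities.values() if vals[j] is not None]
        sample_medians.append(median(col))
    target = median(sample_medians)
    return {pep: [None if v is None else v * target / sample_medians[j]
                  for j, v in enumerate(vals)]
            for pep, vals in intensities.items()}

# Toy peptide-by-sample intensity table (hypothetical values).
raw = {
    "PEPTIDEA": [100.0, 220.0,  95.0],
    "PEPTIDEB": [ 50.0,  None,  None],  # observed once -> filtered out
    "PEPTIDEC": [ 80.0, 160.0,  90.0],
    "PEPTIDED": [120.0, 240.0, 110.0],
}

kept = filter_peptides(raw)
normed = median_normalize(kept)
```

Even this toy version forces the kinds of statistical choices the comment alludes to: how many observations make a peptide trustworthy, and whether the median is the right anchor for a given experiment.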
3/26/2012 5:39:43 PM #
The development of "Informatics Tools" that enable new ways to 1) store, 2) analyze, and 3) exchange mass spectrometry data has tremendous potential as a "Disruptive Proteomics Technology." In fact, informatics tools (e.g., protein database search engines like SEQUEST and MASCOT) have been among the key drivers in the field, and it's my belief that informatics will be the biggest factor in the future as MS instruments become more of a commodity.

Investments in information technology that lead to the development and adoption of a common "Cloud Computing Framework for Mass Spectrometry" could be a unifying force that advances best practices and promotes the exchange of data and information between research groups. Currently, many labs use and support a menagerie of informatics tools that have been cobbled together in various ways to analyze MS data. What's needed is the development of shared analytics platforms that allow multiple users and groups to work together on data analysis problems using a systematic approach. Rather than every proteomics group building its own "informatics tree fort," the NIH could support a distributed, cloud-based environment that would allow multiple groups to work together and develop a sustainable, industrial-grade solution that benefits the greater good.

For example, the NIH could dedicate $2M toward the purchase of cloud-based computing resources (e.g., Amazon, Azure, etc.) and make them available to as many research groups as possible, with the requirement that any tools developed be open source and freely available to the mass spectrometry community. New tools that can demonstrate adoption by a large number of users and a clear benefit would gain additional resources. Survival of the fittest. In this way, the NIH can prime the pump for the development of disruptive proteomics informatics technology.