Little Mice, Big Data

Analyzing large amounts of data can be a daunting task in any research field; this includes applying appropriate statistical methods to ensure robust conclusions. With many areas of biomedical research now generating massive datasets, there is a growing need for easy-to-use and freely available statistical tools. Increasingly, researchers are working toward making their research data and analyses follow the “FAIR” principles: findable, accessible, interoperable, and reusable.

The NIH Common Fund Knockout Mouse Phenotyping Program (KOMP2), as part of the International Mouse Phenotyping Consortium (IMPC), is leading the way in understanding different biological processes and diseases, and in making its data FAIR. The researchers are part of an ambitious project to genetically silence, or “knock out,” and characterize all genes that code for proteins in the mouse genome. Generating “knockout mice” for every protein-coding gene is the first step; the researchers then systematically carry out a range of tests to understand each gene’s biological function, or “phenotype.”

The IMPC has developed a software package called “OpenStats,” specifically designed for the type of high-throughput data generated by large research programs like the IMPC, although it can also be tailored for smaller-scale projects. The software package has been tested and implemented by the IMPC, which is increasingly focused on reproducibility, studying both sexes, and using appropriate statistical tools and methodologies. OpenStats builds on the current IMPC statistical computing software, called PhenStat. When compared with PhenStat, OpenStats used far less computing time and obtained consistently similar results. One important way OpenStats contributes to FAIR data is by assessing input data for completeness, redundancy, and mismatched variables or formatting. It also provides automated ways to consistently label commonly used sex and gender terms as a single term (“sex”) to promote interoperability and reusability of data. Importantly, OpenStats is freely available (www.bioconductor.org/packages/OpenStats), allowing any researcher to reproduce and reuse analyses from others’ research while ensuring their own analysis is FAIR.
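To make the idea of input checking concrete, here is a minimal Python sketch of the kind of harmonization described above: renaming sex and gender columns to a single “sex” field, then flagging incomplete and repeated rows. It is not the OpenStats API (OpenStats is an R package distributed through Bioconductor). The function names, column aliases, value mappings, and sample records are all assumptions made up for illustration.

```python
# Illustrative only: this is not the OpenStats API (OpenStats is an R package).
# The column aliases, value mappings, and field names below are assumptions.

SEX_COLUMN_ALIASES = {"sex", "gender"}
SEX_VALUE_MAP = {"m": "male", "male": "male", "f": "female", "female": "female"}


def harmonize_record(record):
    """Return a copy of the record with any sex/gender column renamed to 'sex'
    and its value mapped to a single spelling."""
    cleaned = {}
    for key, value in record.items():
        if key.strip().lower() in SEX_COLUMN_ALIASES:
            label = str(value).strip().lower()
            cleaned["sex"] = SEX_VALUE_MAP.get(label)  # None if unrecognized
        else:
            cleaned[key] = value
    return cleaned


def check_records(records, required=("sex", "genotype", "value")):
    """Harmonize records, then report incomplete and duplicate rows by index."""
    harmonized = [harmonize_record(r) for r in records]
    # A row is incomplete if any required field is absent or None.
    incomplete = [
        i for i, r in enumerate(harmonized)
        if any(r.get(field) is None for field in required)
    ]
    # A row is a duplicate if an identical harmonized row appeared earlier.
    seen, duplicates = set(), []
    for i, r in enumerate(harmonized):
        fingerprint = tuple(sorted((k, str(v)) for k, v in r.items()))
        if fingerprint in seen:
            duplicates.append(i)
        seen.add(fingerprint)
    return harmonized, incomplete, duplicates


# Hypothetical sample data, not real IMPC measurements.
records = [
    {"Gender": "M", "genotype": "knockout", "value": 21.4},
    {"sex": "Female", "genotype": "control", "value": 19.8},
    {"sex": "Female", "genotype": "control", "value": 19.8},  # repeated row
    {"Sex": "male", "genotype": "control", "value": None},    # missing value
]
harmonized, incomplete, duplicates = check_records(records)
```

In this sketch the rows are harmonized before they are checked, so records that differ only in how sex was labeled are still compared on equal terms.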

Reference:

OpenStats: A robust and scalable software package for reproducible analysis of high-throughput phenotypic data. Haselimashhadi H, Mason JC, Mallon AM, Smedley D, Meehan TF, et al. (2020) PLOS ONE 15(12): e0242933. https://doi.org/10.1371/journal.pone.0242933

This page last reviewed on August 8, 2023