Non-small cell lung cancer (NSCLC) is the most common type of lung cancer, and identifying its subtypes helps doctors choose the most effective treatment. Current approaches, such as examining tissue samples under a microscope or using imaging, can be time-intensive, invasive, or inconclusive, especially when tumor features are difficult to distinguish. At the same time, many researchers struggle to make full use of large and complex datasets of genetic information where important patterns can remain hidden. To address these challenges, Dr. Shibiao Wan, a researcher supported by the NIH Common Fund Data Ecosystem (CFDE), and colleagues developed RPSLearner, a computer-based tool that turns complex genetic data into clearer information to help identify major non-small cell lung cancer subtypes.
RPSLearner analyzes which genes are turned on or off in a tumor. To make these complex datasets easier to work with, the method first simplifies them using “random projection,” shrinking thousands of gene measurements into a smaller set of values while preserving key patterns. It then combines results from multiple computer models using a “stacking” approach to produce a more accurate final prediction.
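For readers who want to see the two ideas in code, the following is a minimal sketch of random projection followed by stacking, written with scikit-learn. It is not the RPSLearner implementation: the data are synthetic stand-ins for gene expression measurements, and the choice of base models (a random forest and a support vector machine), the logistic regression combiner, and all sizes and settings are assumptions made for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.random_projection import GaussianRandomProjection
from sklearn.svm import SVC

# Synthetic stand-in for an expression matrix: 300 "patients" x 2,000 "genes",
# with two made-up classes. These are NOT real NSCLC data.
X, y = make_classification(
    n_samples=300, n_features=2000, n_informative=40,
    n_classes=2, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Step 1: random projection compresses 2,000 features into 100.
# Step 2: stacking trains several base models, then a final model
# learns how to combine their cross-validated predictions.
model = make_pipeline(
    GaussianRandomProjection(n_components=100, random_state=0),
    StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
            ("svm", SVC(probability=True, random_state=0)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    ),
)
model.fit(X_train, y_train)

reduced = model[0].transform(X_test)   # test data after projection
accuracy = model.score(X_test, y_test)  # fraction of correct predictions
print("Projected shape:", reduced.shape)
print("Held-out accuracy:", round(accuracy, 3))
```

The accuracy printed here reflects only the synthetic data and says nothing about how RPSLearner performs on patient samples.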
When tested on data from more than 1,300 patients, RPSLearner outperformed existing methods and more accurately identified different NSCLC types. The model also identified biological signals linked to specific cancer types, offering insights that may help guide diagnosis and treatment decisions.
By enabling faster and more accurate identification of lung cancer types, this work shows how improved data analysis methods can unlock the value of complex biomedical datasets, one of the key goals of the CFDE program. With early-stage funding to design, build, and test RPSLearner, CFDE is helping advance tools and approaches that make data more usable while supporting emerging researchers in generating insights that lead to better health outcomes and more personalized care.
Reference: Wu X, Wang J, Wan S. RPSLearner: A Novel Approach Based on Random Projection and Deep Stacking Learning for Categorizing Non-Small Cell Lung Cancer. Adv Intell Syst. 2025 Oct 5:e202500635. doi: 10.1002/aisy.202500635. Epub ahead of print. PMID: 41347120; PMCID: PMC12674606.