Integrating Protein-protein Interaction Networks and Machine Learning to Identify Biomarkers of Cancer Onset

Hongyue Chen; Min Ma; Haoyu Liu; Qian Yang; Jiqiu Wang; Jie Zheng

doi:10.1101/2025.11.21.25340742

Abstract

Recent large-scale plasma proteomic studies have identified biomarkers for diagnosing early cancer onset, but achieving strong predictive performance remains challenging. Most existing studies have treated proteins as independent markers, ignoring their functional interdependencies within biological networks. We hypothesized that protein–protein interaction (PPI) networks capture coordinated biological signals that can enhance predictive performance. We identified 1,605 high-confidence PPI pairs (involving 1,155 unique proteins) from the STRING database (confidence score > 0.9). Plasma proteomic data for these pairs were extracted from a subset of 38,585 UK Biobank participants with Olink measurements (the UK Biobank Pharma Proteomics Project, UKB-PPP). Univariate Cox regression (p < 0.05) combined with elastic-net machine learning models was used to build PPI-based predictive models for 23 cancer types, of which seven cancer types with robust PPI associations were included in the final predictive models. In general, models that included proteomic features outperformed those based on age, sex, and lifestyle factors. Incorporating PPI-derived interaction features further improved predictive performance in three of the seven cancer models, with melanoma showing a significant improvement in C-index over the base model (ΔC-index = 0.13). In summary, integrating PPI networks into proteomic models may provide predictive gains for specific cancer types and underscores the value of molecular interaction patterns as complementary biomarkers for cancer onset.
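The pipeline in the abstract (STRING filtering, univariate Cox screening, discrimination measured by C-index) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' code: it assumes STRING scores already rescaled to 0-1, uses a Wald test for the univariate Cox p-value with Breslow handling of ties, and uses a simplified Harrell's C that skips pairs with tied follow-up times. The function names are hypothetical, and the elastic-net step is omitted. ΔC-index is taken as the C-index of the PPI-augmented model minus that of the base model.

```python
import math
from typing import Dict, Iterable, List, Sequence, Set, Tuple


def filter_string_pairs(
    edges: Iterable[Tuple[str, str, float]], min_score: float = 0.9
) -> Set[Tuple[str, str]]:
    """Keep undirected protein pairs whose confidence score exceeds min_score.

    Assumes scores on a 0-1 scale; STRING download files store combined_score
    as 0-1000, so those values would be divided by 1000 first.
    """
    pairs: Set[Tuple[str, str]] = set()
    for a, b, score in edges:
        if a != b and score > min_score:
            pairs.add((min(a, b), max(a, b)))  # A-B and B-A collapse to one pair
    return pairs


def univariate_cox_pvalue(
    x: Sequence[float], times: Sequence[float], events: Sequence[int],
    n_iter: int = 50,
) -> float:
    """Two-sided Wald p-value for a single covariate in a Cox model.

    Newton-Raphson on the partial likelihood, Breslow handling of ties.
    Assumes no perfect separation (otherwise beta diverges).
    """
    n = len(times)
    beta, info = 0.0, 0.0
    for _ in range(n_iter):
        score, info = 0.0, 0.0
        for i in range(n):
            if not events[i]:
                continue
            # Risk set: subjects still under observation at this event time.
            risk = [j for j in range(n) if times[j] >= times[i]]
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            s1 = sum(wj * x[j] for wj, j in zip(w, risk))
            s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
            score += x[i] - s1 / s0
            info += s2 / s0 - (s1 / s0) ** 2
        if info <= 0.0:
            return 1.0  # covariate carries no information (e.g. constant)
        step = score / info
        beta += step
        if abs(step) < 1e-10:
            break
    z = beta * math.sqrt(info)  # beta / SE, with SE = 1 / sqrt(information)
    return math.erfc(abs(z) / math.sqrt(2.0))


def screen_features(
    features: Dict[str, Sequence[float]], times: Sequence[float],
    events: Sequence[int], alpha: float = 0.05,
) -> List[str]:
    """Keep features whose univariate Cox p-value is below alpha."""
    return [name for name, x in features.items()
            if univariate_cox_pvalue(x, times, events) < alpha]


def harrell_c_index(
    times: Sequence[float], events: Sequence[int], risk: Sequence[float]
) -> float:
    """Share of comparable pairs in which the higher-risk subject fails first.

    Simplification: pairs with tied follow-up times are skipped.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(times)):
            if times[j] > times[i]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable
```

With risk scores from a base model and a PPI-augmented model evaluated on the same participants, ΔC-index is `harrell_c_index(t, e, ppi_risk) - harrell_c_index(t, e, base_risk)`. The screened features would then feed an elastic-net Cox model, for example lifelines' `CoxPHFitter(penalizer=..., l1_ratio=...)` or scikit-survival's `CoxnetSurvivalAnalysis`; which tool the authors used is not stated here.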

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This research was funded by the Noncommunicable Chronic Diseases-National Science and Technology Major Project (2024ZD0531500, 2024ZD0531502), the National Key Research and Development Program of China (2022YFC2505200, 2022YFC2505201, 2022YFC2505203), and the National Natural Science Foundation of China (32500519, 32570728). These funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

We thank the participants, contributors, and researchers of the UKB for making data available for this study. We thank the research and development teams at the 13 participating UKB-PPP companies (Alnylam Pharmaceuticals, Amgen, AstraZeneca, Biogen, Calico, Bristol-Myers Squibb, Genentech, GlaxoSmithKline (GSK), Janssen Pharmaceuticals, Novo Nordisk, Pfizer, Regeneron, and Takeda) for funding the study. All 13 companies listed as part of the UKB-PPP were involved in the generation of the proteomic data used in the present study.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data availability

Full information on how to access UKB data can be found at its website (https://www.ukbiobank.ac.uk/use-our-data/).

The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.