medRxiv

Biomarker Identification in Pancreatic Cancer Through Concordant Differential Expression and Interpretable Machine Learning Analyses

Sonia Maciá-Escalante, Rubén López-Aladid, Rebeca Muñoz-Tovar, Manuel López-Herrero, Ana Navarro-Sellés, Leonor Garmendia, Carolina Puerto, Mariana Fossati-Vázquez, Pablo Parente-Muñoz
doi: https://doi.org/10.64898/2026.02.13.26346263
Sonia Maciá-Escalante
1 Bioinformatics Research Department, SMED Clinical Research, Alicante, Spain
For correspondence: smacia{at}smedcr.com
Rubén López-Aladid
1 Bioinformatics Research Department, SMED Clinical Research, Alicante, Spain
2 Data Science Dept., Codex Mythos, Barcelona, Spain
Rebeca Muñoz-Tovar
1 Bioinformatics Research Department, SMED Clinical Research, Alicante, Spain
Manuel López-Herrero
1 Bioinformatics Research Department, SMED Clinical Research, Alicante, Spain
Ana Navarro-Sellés
1 Bioinformatics Research Department, SMED Clinical Research, Alicante, Spain
Leonor Garmendia
1 Bioinformatics Research Department, SMED Clinical Research, Alicante, Spain
Carolina Puerto
1 Bioinformatics Research Department, SMED Clinical Research, Alicante, Spain
Mariana Fossati-Vázquez
1 Bioinformatics Research Department, SMED Clinical Research, Alicante, Spain
Pablo Parente-Muñoz
1 Bioinformatics Research Department, SMED Clinical Research, Alicante, Spain

Abstract

Background Pancreatic ductal adenocarcinoma is one of the most aggressive and lethal malignancies of the gastrointestinal tract. Its poor prognosis is largely attributed to late-stage diagnosis, pronounced tumor heterogeneity, and limited therapeutic efficacy. These challenges underscore the urgent need to identify robust molecular biomarkers and novel therapeutic targets.

Methods Gene expression data from 146 pancreatic tissue samples (72 normal and 74 tumor specimens) obtained from the Pan-Cancer Atlas (TCGA) were analyzed. Differential gene expression analysis was performed with the DESeq2 package, followed by functional enrichment analysis based on Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) annotations. A classification model was developed using the XGBoost algorithm and evaluated with 500 bootstrap iterations and 5-fold cross-validation to assess its robustness and generalizability. Model interpretability was assessed using SHAP (SHapley Additive exPlanations) values to identify the genes with the highest predictive contribution.
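The evaluation protocol described in the Methods can be sketched in Python. This is a minimal, standard-library-only illustration under stated assumptions, not the authors' code: it assumes each bootstrap iteration trains on a resample drawn with replacement and scores the out-of-bag samples (the abstract does not say how bootstrap test sets were formed), computes AUC through the Mann-Whitney rank formulation, and substitutes a hypothetical nearest-centroid scorer for XGBoost so that the example runs without third-party packages. All function names (`oob_bootstrap`, `stratified_folds`, `cross_validated_auc`, `nearest_centroid_scores`) are illustrative.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

# Shape shared by any model wrapper: (X_train, y_train, X_test) -> scores, higher = tumor (1).
FitPredict = Callable[[List[List[float]], List[int], List[List[float]]], List[float]]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the Mann-Whitney probability that a positive outranks a negative."""
    pos = [s for t, s in zip(labels, scores) if t == 1]
    neg = [s for t, s in zip(labels, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    # Each positive/negative pair counts 1 if ordered correctly, 0.5 if tied.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(y: Sequence[int], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for label in sorted(set(y)):
        members = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[j % k].append(i)  # deal each class round-robin across folds
    return folds


def cross_validated_auc(X, y, fit_predict: FitPredict, k: int = 5, seed: int = 0) -> List[float]:
    """Train on k-1 folds, score the held-out fold, and return one AUC per fold."""
    aucs = []
    for fold in stratified_folds(y, k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        scores = fit_predict([X[i] for i in train], [y[i] for i in train], [X[i] for i in fold])
        aucs.append(roc_auc([y[i] for i in fold], scores))
    return aucs


def oob_bootstrap(X, y, fit_predict: FitPredict, n_boot: int = 500,
                  threshold: float = 0.5, seed: int = 0) -> List[Tuple[float, float]]:
    """Fit on a bootstrap resample and record (AUC, accuracy) on the out-of-bag samples."""
    rng = random.Random(seed)
    n = len(y)
    results = []
    for _ in range(n_boot):
        bag = [rng.randrange(n) for _ in range(n)]  # n indices drawn with replacement
        in_bag = set(bag)
        oob = [i for i in range(n) if i not in in_bag]
        y_bag, y_oob = [y[i] for i in bag], [y[i] for i in oob]
        if len(set(y_bag)) < 2 or len(set(y_oob)) < 2:
            continue  # skip resamples in which a class is absent
        scores = fit_predict([X[i] for i in bag], y_bag, [X[i] for i in oob])
        preds = [1 if s >= threshold else 0 for s in scores]
        accuracy = sum(p == t for p, t in zip(preds, y_oob)) / len(y_oob)
        results.append((roc_auc(y_oob, scores), accuracy))
    return results


def nearest_centroid_scores(X_train, y_train, X_test) -> List[float]:
    """Hypothetical stand-in for XGBoost: relative closeness to the tumor (1) centroid."""
    def centroid(label):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        return [mean(col) for col in zip(*rows)]

    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

    c0, c1 = centroid(0), centroid(1)
    scores = []
    for x in X_test:
        d0, d1 = dist(x, c0), dist(x, c1)
        # Score >= 0.5 exactly when x is at least as close to the tumor centroid.
        scores.append(0.5 if d0 + d1 == 0 else d0 / (d0 + d1))
    return scores
```

Any callable with the same `(X_train, y_train, X_test) -> scores` shape can be plugged in; for instance, a wrapper that fits `xgboost.XGBClassifier` on the bag and returns `predict_proba(X_test)[:, 1]`. Out-of-bag evaluation is one common bootstrap design; a fixed held-out test set resampled 500 times would be another, and the two can give different spreads.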

Results A comprehensive transcriptomic analysis revealed significant dysregulation of multiple genes between normal and tumor pancreatic tissues. Genes such as GJB3, S100A2, MSLN, and SLC2A1 were markedly overexpressed in tumor tissue, whereas DEFA6, APOB, and RBP2 were markedly downregulated, indicative of impaired exocrine function and aberrant epithelial reprogramming. The XGBoost classification model achieved an average area under the curve (AUC) of 0.9868 and an overall accuracy of 98.6%. SHAP analysis identified GJB3, LINC02086, and TSPAN1 as key predictive features. Six genes were both differentially expressed and highly influential within the model, supporting their potential utility as robust biomarkers for pancreatic tumor characterization.
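The concordance criterion behind the six shared genes amounts to intersecting a differential-expression filter with a SHAP importance ranking. The sketch below is an illustration under assumptions, not the authors' pipeline: it ranks genes by mean absolute SHAP value (a common global-importance summary), keeps the top `k`, and retains those that pass DESeq2-style cut-offs. The thresholds (`padj < 0.05`, `|log2FC| >= 1`), the default `k = 20`, and every function name here are hypothetical, since the abstract does not specify them.

```python
from typing import Dict, List, Mapping, Optional, Sequence, Set, Tuple

# DESeq2-style result per gene: (log2FoldChange, padj); either may be NA (None).
DeRow = Tuple[Optional[float], Optional[float]]


def mean_abs_shap(shap_rows: Sequence[Sequence[float]], genes: Sequence[str]) -> Dict[str, float]:
    """Global importance of each gene: mean absolute SHAP value across samples (rows)."""
    n = len(shap_rows)
    return {g: sum(abs(row[j]) for row in shap_rows) / n for j, g in enumerate(genes)}


def top_k(importance: Mapping[str, float], k: int) -> List[str]:
    """The k most important genes, highest first (ties broken alphabetically)."""
    ranked = sorted(importance.items(), key=lambda kv: (-kv[1], kv[0]))
    return [g for g, _ in ranked[:k]]


def de_genes(table: Mapping[str, DeRow], padj_max: float = 0.05,
             min_abs_lfc: float = 1.0) -> Set[str]:
    """Genes passing assumed significance (padj) and effect-size (|log2FC|) cut-offs."""
    return {
        g for g, (lfc, padj) in table.items()
        if lfc is not None and padj is not None
        and padj < padj_max and abs(lfc) >= min_abs_lfc
    }


def concordant(table: Mapping[str, DeRow], importance: Mapping[str, float], k: int = 20,
               padj_max: float = 0.05, min_abs_lfc: float = 1.0) -> List[str]:
    """Top-k SHAP genes that are also differentially expressed, kept in SHAP order."""
    de = de_genes(table, padj_max, min_abs_lfc)
    return [g for g in top_k(importance, k) if g in de]
```

In practice the SHAP rows could come from `shap.TreeExplainer(model).shap_values(X)` on a fitted XGBoost model, and the table from the `log2FoldChange` and `padj` columns of an exported DESeq2 results table. The size of the concordant set depends directly on `k` and the cut-offs, so those choices should be reported alongside the resulting gene list.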

Conclusions Pancreatic ductal adenocarcinoma is marked by extensive transcriptomic reprogramming. The integration of differential gene expression analysis with interpretable machine learning enabled the identification of a molecular signature with potential diagnostic and therapeutic relevance.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This study did not receive any funding.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data and Code Availability

De-identified gene-level resources and analytic artifacts generated for this study will be made publicly available upon publication at a persistent repository (DOI to be provided upon acceptance). Subject to data-use permissions, the released files will include gene-level normalized expression matrices; the full differential expression table (deseq2_results_annotated.csv); Gene Ontology and KEGG over-representation results (downregulated_GO_Biological_Process_2021_results.csv and upregulated_GO_Biological_Process_2021_results.csv); SHAP summaries aggregated across evaluation runs (shap_values_bootstrap_summary.csv); and per-bootstrap classification metrics (Classification Metrics Across 500 Bootstraps.csv). Reproducible code will be deposited alongside the data, comprising R scripts for the DESeq2 and clusterProfiler workflows and Python scripts for model training and explainability with XGBoost and SHAP.
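Per-bootstrap metrics such as those listed above are typically condensed into a mean and a percentile interval. The sketch below shows one way to do that with the standard library only; the column names (`AUC`, `Accuracy`) are assumptions, because the layout of the metrics file is not described, and the rows in the usage example are synthetic. The percentile uses linear interpolation between order statistics.

```python
import csv
from statistics import mean
from typing import IO, Sequence, Tuple


def percentile(sorted_vals: Sequence[float], q: float) -> float:
    """Linearly interpolated percentile (q in [0, 100]) of already-sorted values."""
    if not sorted_vals:
        raise ValueError("no values to summarize")
    pos = (len(sorted_vals) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def summarize(handle: IO[str], column: str) -> Tuple[float, float, float]:
    """Mean and 2.5th/97.5th percentiles of one metric column in a per-bootstrap CSV."""
    vals = sorted(float(row[column]) for row in csv.DictReader(handle))
    return mean(vals), percentile(vals, 2.5), percentile(vals, 97.5)
```

For example, `summarize(io.StringIO("AUC,Accuracy\n0.90,0.85\n1.00,0.95\n0.95,0.90\n"), "AUC")` returns a mean of about 0.95 with an interval of roughly 0.9025 to 0.9975; with 500 bootstrap rows the interval becomes a more meaningful description of run-to-run variability than the mean alone.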

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
Posted February 16, 2026.
Subject Area

  • Oncology