Abstract
Microsatellite instability (MSI) is a key biomarker for immunotherapy response and prognosis across multiple cancers, yet its identification from routine Hematoxylin and Eosin (H&E) slides remains challenging. Current deep learning predictors often operate as black-box, weakly supervised models trained on individual slides, limiting interpretability, biological insight, and generalization; particularly in low-data regimes. Importantly, systematic quantitative analysis of shared MSI-associated characteristics across different cancer types has not been performed, representing a major gap in understanding conserved tumor microenvironment (TME) patterns linked to MSI. Here, we present a multi-cancer MSI prediction model that leverages pathology foundation models for robust feature extraction and cell-level social network analysis (SNA) to uncover TME patterns associated with MSI. For the MSI prediction task, we introduce a novel transformer-based embedding aggregation method, leveraging attention-guided, multi-case batch training to improve learning efficiency, stability, and interpretability. Our method achieves high predictive performance, with mean AUROCs of 0.86±0.06 (colorectal cancer), 0.89±0.06 (stomach adenocarcinoma), and 0.73±0.06 (uterine corpus endometrial carcinoma) in internal cross-validation on TCGA dataset and AUROC of 0.99 on external PAIP dataset, outperforming state-of-the-art weakly supervised methods (particularly in AUPRC with an average of 0.65 across three cancers). Multi-cancer training further improved generalization (by 3%) via exposing the model to diverse MSI manifestations, enabling robust learning of transferable, domain-invariant histological patterns. To investigate the TME, we constructed cell graphs from high-attention regions, classifying cells as epithelial, inflammatory, mitotic, or connective, and applied SNA metrics to quantify spatial interactions. Across cancers, MSI tumors exhibited increased epithelial cell density and stronger epithelial–inflammatory connectivity, with subtle, context-dependent changes in stromal organization. These features were consistent across univariate and multivariate analyses and supported by expert pathologist review, suggesting the presence of a conserved MSI-associated microenvironmental phenotype. Our proposed prediction algorithm and SNA-driven interpretation advance MSI prediction and uncover interpretable, biologically meaningful MSI signatures shared across colorectal, gastric, and endometrial cancers.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
NR reports financial support provided by UK Research and Innovation (UKRI), outside of the submitted work. NR reports financial support from GlaxoSmithKline, outside of the submitted work. NR is a co-founder of Histofy Ltd.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
All data produced in the present study are available upon reasonable request to the authors. All the data used in this work are publicly available: TCGA: https://portal.gdc.cancer.gov/ PAIP: https://paip2020.grand-challenge.org/Dataset/ HISTOPANTUM: https://zenodo.org/records/14555794
https://portal.gdc.cancer.gov/





