Aligning computational pathology with clinical practice for colorectal cancer
=============================================================================

* Elias Baumann
* José F Carreño-Martinez
* Ana Leni Frei
* Javier Garcia-Baroja
* Mauro Gwerder
* Amjad Khan
* Rina Mehmeti
* Jacob Hanimann
* Philipp Zens
* Heather Dawson
* Alessandro Lugli
* Inti Zlobec

## Abstract

Pathology reporting of colorectal cancer (CRC) follows the International Collaboration on Cancer Reporting (ICCR) guidelines which define a set of 25 diagnostic report elements. To further develop the CRC diagnostic routine, multiple computational tools have been proposed in the last years. Despite the excellent sensitivity and potential advantages, many tools do not reach clinical deployment, suggesting that there are critical challenges to address when developing these algorithms. To summarize existing efforts in deep learning for ICCR CRC elements and highlight existing gaps between development and clinical deployment, this systematic review collected studies on computational tools for colorectal cancer histopathology analysis published between 2015 and 2024. Most of the 66 included studies focus on a subset of just three ICCR elements, namely mismatch repair status, BRAFV600E mutation testing, and lymph node status. Moreover, many of the studies did not include clinically relevant and validated results. These results show the gap between research and clinical practice in pathology with the example of CRC diagnosis. There is an unmet need for publicly available datasets, and a stronger focus on clinically important tasks. This review will contribute to aligning computation pathology with the clinic to increase the translational potential of developed tools.

## 1. Introduction

Digital pathology is shaping the future of the fast-evolving field of pathology and digitalized pathology laboratories are an increasing trend worldwide1. Once these laboratories have the digital infrastructure in place (e.g. slide scanner and viewer, laboratory information systems or image management system etc.), they can benefit from AI-based tools such as AI algorithms in supporting pathologists to analyze tissue slides. Leveraging AI is potentially helpful for time-consuming cases that require assessment of long lists of criteria and are prone to interobserver variability. Moreover, several studies have shown increased diagnosis confidence among pathologists when assisted with AI2–4.

AI-based tools have been developed in numerous different pathologies, including colorectal cancer (CRC). CRC is the third most common cancer worldwide with a raising incidence in younger ages due to sedentary lifestyle and consumption of processed food 5. As a result of this, and widespread screening, CRC constitutes a considerable portion of cases in a pathologist’s diagnostic routine.

On one hand, the World Health Organization (WHO) Classification covers all generally accepted criteria6 for standardized disease definition and classification. On the other hand, the College of American Pathologists (CAP) and ICCR have developed evidence-based guidelines for standardizing reporting in pathology that are implemented globally7,8. In this review, we focused on the ICCR CRC guideline7 which covers both the macroscopic as well as microscopic evaluation of CRC resection specimen. As automatic methods have been predominantly developed for microscopic assessment, we restrict the review to only include reporting elements from this category. These elements are classified into “core” elements that are key for disease management, staging and prognosis, such as histological tumor type and grade, lymphatic invasion, TNM staging; and “non-core” elements, recommended but not validated by all levels of evidence, including tumor budding, mismatch repair (MMR) status, coexistent pathology, etc. While various AI algorithms have been developed to assess individual core and non-core elements, it remains unclear whether they can be combined for an AI-based evaluation of all elements on the digitized whole slide image (WSI) and produce an automated CRC pathology report. With this in mind, through this systematic review we aimed to reflect on: 

1.  The current landscape of deep learning (DL) in computational pathology (CPath) for CRC

2.  Published work on the 25 elements of the ICCR CRC diagnostic report

3.  Gaps towards automated ICCR-based reporting

4.  The potential avenues and future steps to close these gaps

## 2. Methods

### 2.1 Structure of this review

Given the breadth of this review, the Results section is divided into three major parts, starting with general meta-results on gathered papers, and followed by meta-analysis of each element of the ICCR CRC guideline grouped here as core or non-core. The elements are sorted by number of included publications. After this analysis, and a general discussion on the most important takeaways, we include recommendations for elements where currently no readily usable computational approach has been published (Fig. 1).

![Figure 1:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2025/09/01/2025.06.10.25329328/F1.medium.gif)

[Figure 1:](http://medrxiv.org/content/early/2025/09/01/2025.06.10.25329328/F1)

Figure 1: Graphical overview of the review.

#### Review Protocol

We perform a systematic literature review by following the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) guidelines9.

#### Search strategy

For the review, we defined the following search criteria, with slight adaptations specific to the search engine. Essentially, search terms included a term related to cancer, to colon/colorectal, and a term related to histopathology to focus on the specific disease. Furthermore, deep learning, artificial intelligence, computer vision, and computer assistance-related search terms were added to cover all papers that develop computer-aided approaches. To include a wide range of publications across disciplines in our study, we included Pubmed, Web of Science, Embase, and IEEE Xplore as databases. Only papers published between January 1st 2015 and September 13th 2024 were eligible for inclusion in this study. Only abstracts in English language were included. All article abstracts were screened in covidence10. The full search terms for each database can be found in Supplementary Table 1.

#### Article Screening and eligibility

Articles were screened by at least two raters, with a third rater reviewing the abstract in case of disagreement. Duplicates were removed automatically by covidence10, or manually, if the automatic method failed to recognize the duplicate. Initially, abstracts were screened to assess relevance for the review according to a list of exclusion criteria. Specifically, articles should include an evaluation on human CRC, and results on histopathology image data. Articles that did not include original research (such as reviews, letters and commentaries) as well as conference abstracts were also excluded. Moreover, multimodal studies were excluded if they did not include results using exclusively histopathology/clinical data. If the approach evaluated non-hematoxylin and eosin (H&E) images, did not report slide-level or patient-level results, or only worked in a semi-automatic fashion they were also excluded. Finally, the article had to be related to or directly address one of the elements, e.g. if a study concerned itself with tissue type segmentation but is not usable for or applicable to one of the elements, it was excluded. Moreover, the developed model had to produce an output closely related to how the respective element is specified in the ICCR CRC guideline. In a second round, full texts are assessed for eligibility. The same exclusion criteria were applied. The full list of exclusion criteria can be found in Supplementary Table 2.

#### Adding references from other sources

Additionally, some articles from other sources highly relevant to this review were included. These papers were identified via references or by using search engines such as google scholar or semantic scholar for the specific ICCR CRC guideline element. Moreover, some articles did not contain the necessary information of the search criteria in their abstract, but were still relevant to this review, so they were manually included as well.

#### Data extraction

To thoroughly assess individual studies, information was extracted from each study. Specifically, data concerning the cohorts used (including staining, number of patients, cancer stage distribution, target distribution), the training setup (dataset splits, cross validation and involvement of pathologists), the DL pipeline (input size, pre-processing, model, post-processing, training task and target) and reported results (AUC, F1, … if available). Additionally, we assessed for each study whether there were slide-level or patient-level results, if the published model is clinically approved, whether data, code, or model weights were available and whether there is any information on inference time. Data were extracted by a single reviewer and verified by a second reviewer. Many studies report multiple results across several cohorts and in multiple configurations. To properly represent those studies in our analysis, the best results were extracted from each paper unless specified otherwise in the specific chapter. For multimodal models which use data that would normally not be used in a pathology report, we report the results where only histopathology data is used. Papers which cover more than one ICCR element will be included multiple times in the analysis but evaluated specifically for the respective topic.

## 3. Results

The search across databases yielded 4863 papers in total, of which 1636 were identified as duplicates. Additionally, 20 papers from citation search or additional manual search were included, yielding 3247 studies to be screened. In accordance with the exclusion criteria, 3066 studies were subsequently excluded, leaving 181 studies for full text screening. After full text screening, 66 studies were included in the final review (Figure 2).

![Figure 2:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2025/09/01/2025.06.10.25329328/F2.medium.gif)

[Figure 2:](http://medrxiv.org/content/early/2025/09/01/2025.06.10.25329328/F2)

Figure 2: Flowchart following the PRISMA guidelines

### 3.1 Meta Results

Although half of the ICCR CRC report elements are covered by the 66 selected studies, most publications concern only three topics. Specifically, 39/66 (59%) of studies showed prediction results on MMR status, 16/66 (24%) on BRAF V600E mutation, and 9/66 on pN (14%). Coexistent pathologies (5/66), Grade (4/66), and TNM Stage (3/66) are covered by five or less publications. For tumor budding and perineural invasion two studies were included each, whereas one study each for pT and response to neoadjuvant therapy was extracted. Publications can cover multiple topics at once, for example, 14 publications cover both MMR status and BRAF V600E mutation. For the remaining elements of the ICCR CRC guidelines the employed search and exclusion criteria did not yield any publications (Figure 3 A).

![Figure 3:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2025/09/01/2025.06.10.25329328/F3.medium.gif)

[Figure 3:](http://medrxiv.org/content/early/2025/09/01/2025.06.10.25329328/F3)

Figure 3: 
A) Number of articles covering a specific element of the ICCR guidelines. B) Number of published articles over time, aggregated by half-year. Studies for the second half of 2024 were only retrieved until 13.09.2024. C) Number of papers with accessible data, code and model weights respectively.

Following the general trend of increased attention and impactful contributions in CPath11, we also observe an overall increase in publications over time, even when applying strict exclusion criteria (Figure 2 B). MSI and BRAF mutation prediction studies remain the most common over the last four years. Lymph node metastasis prediction studies were increasingly published starting in the second half of 2022. Overall, the topic variety among published studies also increased in the past five years, but without a substantial growth in the number of publications in popular topics since 2022.

Considering data, code and model weight availability (Figure 2 C), more than half of the published papers (37/66, 56%) use some publicly available dataset or make their data available, and 31/66 (47%) use The Cancer Genome Atlas (TCGA) in some form. Code was available for less publications (25/66, 38%) and publications that made model weights available were scarce (8/66, 12%). It should be noted that resources which are available “upon reasonable request” were considered unavailable, although we did not investigate whether the information could have indeed been obtained in these cases.

This initial analysis has shown substantial variation in coverage across all reporting elements. In the next step, we will analyze published work related to the core elements of the ICCR CRC guideline.

### 3.2 ICCR CRC Guidelines: Core Elements

Elements classified as “Core” within the ICCR CRC guidelines are considered essential for clinical management and staging of CRC patients. Excluding clinical and macroscopically retrieved information, the microscopic core elements comprise histological tumor type, tumor grade, extent of invasion (and pT), lymphatic, venous and perineural invasion, lymph node status (and pN), tumor deposits, response to neoadjuvant therapy, margin status, margin status, histologically confirmed distant metastasis (and pM), the aggregation into pTNM, and confirmatory evaluation for neuroendocrine neoplasms. Of these twelve core elements, six have been addressed in 20 published articles, with pN (9/20) and tumor grade (4/20) receiving the most attention. 5/20 publications have used publicly available data or released their datasets, 6/20 made their code available, but none published model weights (Fig. 4). Next, we present the state of research of each individual core element in order of the number of included publications.

![Figure 4:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2025/09/01/2025.06.10.25329328/F4.medium.gif)

[Figure 4:](http://medrxiv.org/content/early/2025/09/01/2025.06.10.25329328/F4)

Figure 4: 
A) Number of articles covering a specific core element of the ICCR guidelines, split by the data, code, and weight availability for each topic. B) Number of published articles over time, aggregated by half-year. Not shown are elements where no publications passed exclusion criteria.

#### Regional Lymph Nodes (pN) and Lymph Node Status

Regional lymph node (LN) status is crucial for determining adjuvant chemotherapy and cancer staging. The TNM classification of regional LNs (pN) includes: NX (nodes cannot be assessed), N0 (no metastasis), N1a (1 positive node), N1b (2-3 positive nodes), N1c (tumor deposits in soft tissue without nodal metastasis), N2a (4–6 positive nodes), and N2b (≥7 positive nodes)12. A LN is considered positive if it contains at least one metastatic lesion larger than 2 mm12. Pathologists aim to evaluate at least twelve nodes per case, though this number can be influenced by factors such as specimen length, patient age, or neoadjuvant therapy7,13.

##### Data and Methods

Several approaches have been used to automatically screen LN slides for metastases, listed in Table 1. Most methods are focused on metastasis detection/segmentation or based on metastasis detection further predicting the slide-level labels, whether positive or negative. Model evaluation was performed on both external (30% of the studies) and internal cohorts (70% of the studies). Clinical validation or pathologist involvement was reported in ∼30% of studies.

View this table:
[Table 1:](http://medrxiv.org/content/early/2025/09/01/2025.06.10.25329328/T1)

Table 1: Extracted studies for lymph node status and pN.
*187 CRC WSI, ** Pathologist performance assisted by model.

Huang et al.22 used a human-in-the-loop AI system (nuclei.io) to personalize ML models for colorectal LN metastasis detection. The approach improved accuracy, F1 score, sensitivity, and especially detection of isolated tumor cells, while significantly reducing evaluation time for fellows and negative cases. Similarly, Kindler et al.20 trained a Deep Neural Network tool achieving high pixel-level accuracy. Clinical testing showed high sensitivity (0.990), strong interobserver agreement (κ = 0.94), and significantly reduced review times for pathologists. Furthermore, Khan et al.18 demonstrated 100% agreement between their ensemble model and expert pathologists in a study of 217 cases.

##### Clinical Employability

The reviewed studies show high sensitivity and specificity across the board for metastasis detection. However, the automation of pN staging requires the additional step of counting the number of positive lymph nodes in a slide. This can be difficult due to tissue preparation steps where some large lymph nodes may be cut in half and may be present twice in the same slide, or on multiple different slides. Correctly re-identifying the same lymph node and ensuring positive lymph nodes not being counted twice therefore remains an unsolved problem. One approach to addressing this issue could be a better link between macroscopic tissue preparation and AI-based image analysis. For example, lymph node placement information could be additionally stored, but consistent tissue inking may also be an avenue to solve the problem automatically.

#### Histological Tumor Grade

Tumor grade describes the level of differentiation of tumor cells where a higher grade indicates a loss of glandular organization6,23. In accordance with the WHO classification, the ICCR CRC guidelines classify the grade by the least differentiated component of the neoplastic lesion which then results in either low or high tumor grade.

View this table:
[Table 2:](http://medrxiv.org/content/early/2025/09/01/2025.06.10.25329328/T2)

Table 2: Extracted studies for tumor grade. Normal (NORM), Well-differentiated (WD), Moderately-differentiated (MD), Poorly-differentiated (PD).

##### Data and Methods

The retrieved studies use both private and public datasets, but with the exception of one study25, none validate their results externally. Given the subjectivity and high interobserver variabililty of tumor grade assessment28, a consensus label based on multiple pathologists might be more reliable as ground truth. Some studies26,27 used two raters, with a third pathologist to resolve any conflicts. Yet, none of the retrieved studies reported agreement between pathologists and deep learning models. Most studies rely on tile classification25–27 to directly predict tumor grade. Rathore et al.24 use morphological and texture-based image features aggregated in a support vector machine classifier and majority voting. They achieve comparable performance to tile classifiers on their dataset. For tile classification methods, the overall strategies include multiple instance learning25 for end-to-end training, and supervised tile-based training26,27. No studies aggregate at the patient level, although clinical practice typically considers the highest grade across multiple slides. This is particularly relevant for TCGA-based studies26, as WSIs were not selected for grading and may not be representative of the overall tumor grade.

##### Clinical Employability

Data accessibility varies significantly across studies. While Rathore et al.24 use the publicly available GLaS dataset and Chen et al.26 TCGA-COAD and TCGA-READ, and Soldatov et al.27 only make their dataset available upon request. Beyond TCGA and challenge datasets, independent validation is limited due to restricted access to additional clinical data.

Reproducibility is another concern, as most studies do not share code or model weights. Only Schrammen et al.25 have a publicly available repository. Despite these limitations, the availability of public datasets (e.g., GLaS) enables further research and tool development.

#### pTNM

Pathological TNM staging is the combined assessment of pT, pN, and pM. The TNM stage directly informs treatment decisions including whether a resection is necessary, or whether additional adjuvant therapy needs to be performed. It remains the most important prognostic factor in CRC diagnosis, with Stage I patients’ five-year survival being over 90% for colon and rectum, but Stage IV 11-15% respectively29.

View this table:
[Table 3:](http://medrxiv.org/content/early/2025/09/01/2025.06.10.25329328/T3)

Table 3: Extracted studies for pTNM. *Results from Image-only approach, paper reports improved results when integrating genomic data.

While TNM Staging could be algorithmically constructed out of evaluated T, N, and M, all three retrieved approaches directly predict TNM stage from the primary, with only one publication also considering lymph node slides30. Two publications rely on TCGA COAD (and READ) for the evaluation, and Kumar et al.31 additionally include an internal cohort for training and validation. Two methods used graph convolution to spatially integrate tissue tile embeddings into slide level predictions30,32, whereas the last followed the more traditional method of tumor area detection, and subsequent tile classification and averaging based on morphological features of nuclei31.

##### Clinical Employability

Only Kumar et al.31 generate clinically actionable results, but they use their model for the specific task to differentiate between metastatic colon cancer and local colon cancer. While Pei et al.30 also show slide-level results, they never evaluate them or report a tile-aggregation strategy. Levy et al.30 predict low (I, II) vs high (III, IV) stage, but their approach does evaluate the slide for invasion depth so further development could lead to a comprehensive TNM analysis.

Taken together, none of the retrieved papers demonstrate an end-to-end pipeline that can reliably assess the TNM stage (I-IV) from histological slides. While there are interesting cases where directly predicting the risk of metastasis from the primary might be sensible (e.g. what is the risk of a missed metastasis), simpler approaches following the UICC TNM guidelines by aggregating information about local, regional and distant spread may be more clinically relevant for routine reporting.

#### Perineural Invasion

Perineural Invasion (PNI) is defined as tumor growth along a nerve, where the tumor surrounds at least one-third of the nerve’s perimeter and invades any of its layers33. PNI is reported as either present or absent, and its presence has been linked to poor prognosis, particularly in Stage II34.

View this table:
[Table 4:](http://medrxiv.org/content/early/2025/09/01/2025.06.10.25329328/T4)

Table 4: Extracted studies for perineural invasion.

##### Data and Methods

Both included studies employ a similar approach for detecting PNI. First, nerve and tumor regions are separately segmented, and subsequently a boundary between the two regions is identified. While Jung et al.36 use a private dataset for training, Han et al.35 trained on the PAIP2021 dataset comprising of 240 WSIs from multiple cancer types. The dataset was used for the PAIP2021 challenge37 to detect PNI.

##### Clinical Employability

Neither of the included studies provide the code or the weights of the DL algorithms that produced the reported results. Moreover, the validation cohorts, compared to studies on other topics, e.g. the previously discussed lymph node metastasis detection, are comparatively small. Beyond this reproducibility gap, the methods do report a binary output for the perineural invasion in accordance with the ICCR CRC guidelines.

#### pT and Extent of invasion

The extent of invasion of the primary tumor is the first part of the TNM staging and reflected in pT. It is assessed by considering the different components of the colon wall and the presence of tumor within them. pT ranges from pT0 (no tumor) to pT4 (tumor invades other organs or perforation into the visceral peritoneum) and has a direct impact on the treatment of a patient12.

View this table:
[Table 5:](http://medrxiv.org/content/early/2025/09/01/2025.06.10.25329328/T5)

Table 5: Extracted studies for extent of invasion and pT.

Only one study that automates the prediction of pT in CRC passed the exclusion criteria38. In this study, the authors use two validation cohorts, one internal with 38 patients, one external with 42 patients. For classification into pT1-4, they first segment the tumor area, then use a patch-based classifier to predict pT for each tile to finally choose the highest pT value as the final prediction. The model reaches AUCs up to 0.93 on the internal test set and 0.90 on the external validation cohort.

Yet the validation sets only include few patients, while data, code, and weights are not publicly available. They do show results with explicit cutoffs, but only in 1-vs-all settings without multiclass evaluation. Therefore, more work is needed to develop robust approaches for automatic pT assessment with larger validation cohorts and investigating challenges in real world datasets.

#### Response to Neoadjuvant Therapy

Tumor regression gradings (TRGs) are used to assess the response to neoadjuvant chemoradiotherapy (nCRT) by measuring the extent of tissue changes caused by the treatment39. This is a common assessment step in locally advanced rectal cancer treated with neoadjuvant therapies. Multiple TRGs exist but all rely on the assessment of tumor to fibrosis ratio within the tumor bed39.

View this table:
[Table 6:](http://medrxiv.org/content/early/2025/09/01/2025.06.10.25329328/T6)

Table 6: Extracted studies for response to neoadjuvant therapy.

##### Data and Methods

We retrieved only one study that predicted regression after nCRT in a fully automated way40. The authors used an MIL approach with gated attention weight normalization with a final bilinear attention for multi-scale feature fusion to classify responders from non-responders according to slide-level labels. Yet, they validated the model on a breast cancer metastasis dataset rather than rectal cancer which reduces the value of the validation.

##### Clinical Employability

There exist many different systems for TRG, and in the included study, the AJCC system41 was used. However, the four AJCC TRG categories were merged into a binary classification task which does not correspond to the ICCR CRC guidelines. The reviewed study trained for slide-level predictions but does not discuss aggregation strategies for patient-level predictions. However, this post-processing aggregation step is needed for successful clinical deployment, as pathologists report a patient-level TRG.

Due to the challenges caused by the TRG definitions, we did not find any study that presents a clinically ready algorithm to assess tumor regression in rectal cancer, yet a tissue type segmentation model fine-tuned to predict remaining tumor and fibrosis could replicate every TRG scheme.

### 3.3 ICCR CRC Guidelines: Non-Core Elements

The non-core elements of the ICCR CRC dataset assess additional features that provide valuable prognostic insights and are clinically relevant but are not frequently used in patient management or not yet validated. They include measurement beyond the muscularis propria, MLH1 promoter methylation, tumor budding, coexistent pathology, MMR status, and BRAF V600E mutation. Of these six elements, only four have been addressed in the published studies included in this review. MMR status and BRAF mutation, in particular, have received significant attention, with most MMR-related (28/39) and BRAF-related (11/16) studies using TCGA as their primary data source. For these elements, code to reproduce results is often available (21/48, 44%), and some studies provide access to trained model weights (8/48, 17%). Interestingly, since similar methods are applied for MMR– and BRAF-status prediction, 14 studies investigate both biomarkers simultaneously. In contrast, tumor budding and coexistent pathology remain comparatively unexplored and overall lack publicly available data, code, or access to weights (Fig. 5). In the following section, we examine the current research landscape for each of the non-core elements.

![Figure 5:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2025/09/01/2025.06.10.25329328/F5.medium.gif)

[Figure 5:](http://medrxiv.org/content/early/2025/09/01/2025.06.10.25329328/F5)

Figure 5: 
A) Number of articles covering a specific non-core element of the ICCR guidelines, split by the data, code, and weight availability for each topic. B) Number of published articles over time, aggregated by half-year. Not shown are elements where no publications passed exclusion criteria.

#### MMR status

The DNA mismatch repair (MMR) system is responsible to correct DNA replication mismatches42. The mutation of its major genes and the resulting instability of microsatellite repeat sequences is referred to Microsatellite Instability (MSI)43. MSI is a biomarker in CRC, guiding clinical decisions for intermediate risk Stage II and Stage IV CRCs44,45, and represents an indicator for lynch syndrome screening – a hereditary condition characterized by mismatch repair gene mutations. Routine diagnostic testing is typically performed using Immunohistochemistry (IHC) and both ASCO and ESMO guidelines mandate dMMR testing for all new CRC diagnoses 46. Recent DL models have demonstrated that they can predict MSI status directly from H&E slides without the need for IHC.

View this table:
[Table 7:](http://medrxiv.org/content/early/2025/09/01/2025.06.10.25329328/T7)

Table 7: Extracted studies for microsatellite instability.
*Publication uses multimodal data to improve results, reported are pathology only, ** Results based on some form of training only on TCGA if available.

##### Data and Methods

Common datasets for evaluation are the publicly available TCGA COAD/READ cohorts (80.4%), the PAIP2020 challenge dataset (26.3%), and DACHS (21.05%). Of note here is that different publications use different subsets of TCGA COAD/READ so while results have been generated on some of the same slides metrics cannot be properly compared. At the same time, many papers only used one (33.3%) or two (35.3%) cohorts in their analysis and the remainder used more (29.4%).

There are mixed approaches to ground truth, as for MSI/dMMR testing, several different methods are clinically appropriate. 33.3\% relied on both IHC and PCR for the same or different datasets, 49.0% only rely on PCR, 11.8% only on IHC, 3.9% on other methods. None of the papers using IHC as gold standard are reporting whether these were single observer results, even though there is some disagreement in manual IHC dMMR protein analysis86.

Most MSI prediction models rely on tile level embeddings that are aggregated to a slide-level score. Before that step, many approaches pre-detect the tumor area (43.6%), and then subsequently usually add an average (41.1%), majority (23.5%) or top-k (17.6%) aggregation on top. Models that did not pre-segment the tumor tissue (56.4%) commonly relied on an attention mechanism (50%) to weight tile results for the final slide-level MSI score.

Foundation models have included MSI prediction in CRC on datasets like TCGA-CRC-DX47,87 as a benchmark task 80,84,85, which will lead to more results in this topic, though the small scope of the evaluation may not bring additional insights on clinical translatability.

##### Clinical Employability

In MSI/dMMR prediction, the DL model either needs to predict MSI as well as a diagnostic test (e.g. at least as well as IHC) or needs to be used as a screening tool. In both cases, a cutoff needs to be defined. In this review, 47.1% of papers defined some cutoff at least on a subset of the validation data, but only 13.7% demonstrated some applicable cutoff viable for screening (e.g. at least 90% sensitivity) in a clinical setting. One method is clinically approved as a pre-screening tool with CE-IVD certification78, yet so far, no tool can reach the same specificity as IHC or PCR testing while maintaining 95% sensitivity (Saillard et al.78 46% Spec. at 98% Sens., Wagner et al.75 61% Spec. at 95% Sens.).

The current state of papers therefore indicates that the MSI category is advanced, yet tools can only be employed as screening tools with gold-standard testing necessarily following the screening as the specificity is comparatively low. Therefore, either DL models can be further improved to have similar sensitivity and specificity as gold-standard testing, or this step necessarily needs to be deferred to a non-H&E evaluation.

#### BRAF V600E

BRAF mutation testing plays an important role in Lynch syndrome identification after MSI testing. On the other hand, microsatellite stable BRAF mutant colorectal cancers are particularly aggressive, yet the mutation also increases responsiveness to EGFR inhibitor therapy88. The non-core element in the CRC ICCR guidelines specifically concerns the oncogenic V600E mutation, which represents approximately 90% of BRAF mutant cases in a large trial89. Notably, BRAF and MSI cases share similarities in both morphological features and epidemiological patterns90.

View this table:
[Table 8:](http://medrxiv.org/content/early/2025/09/01/2025.06.10.25329328/T8)

Table 8: Extracted studies for BRAF V600E mutation.

##### Data and Methods

As there is considerable overlap between the setup, task, and patients with MSI and BRAF mutation, publications frequently predict both. Therefore, different subsets of the TCGA cohort are the also most prevalent datasets (75%), followed by the DACHS cohort (31.3%).

Most approaches apply stain normalization (68.8%) and tile extraction (100%) as preprocessing steps. Many workflows (31.3%) subsequently identify tumor tissue using a tissue classification algorithm56,65,70,76,79. The majority of publications implement MIL-based approaches (81.3%), differing in their tile aggregation approaches (averaging: 23.1%, top k tiles: 30.7%, other: 30.9%). These MIL-based approaches are commonly built on either CNN-based (31.3%) or attention-based (31.3%) architectures. Recent publications are showcasing foundation models (18.8%) with either averaging-based 91,92 or attention-based85 aggregation strategies.

##### Clinical Employability

Only 43.8% of publications conducted an external validation of their algorithm, while 56.3% utilized more than one cohort. Openly available datasets were included in 68.8% of all studies. The code is available in 62% of all publications, and model weights are only available in 31.25% of studies. Furthermore, most studies did not differentiate between BRAFV600E mutations and all other BRAF mutations (87.5%). In contrast, Fuji et al. utilized a BRAFV600E-specific cohort, whereas Tsai et al. reported both BRAFV600E-specific and unspecific results.

Only four studies reported actionable results54,61,70,74, meaning that a label (BRAFWT/BRAFmut) rather than a mutation probability was assigned to each WSI. Only two studies report prediction sensitivities higher than 90% 54. Of those, only one study reached an accuracy higher than 0.5 54.

Of all 16 included publications, only Guo et al.70 and Schrammen et al.25 report fully integrated pipelines with results based on a clinically appropriate cutoff (sensitivity >90%). However, Guo et al.70 neither shared code nor weights, and Schrammen et al.25 did not use any openly available datasets on which results could be reproduced.

The current state of BRAFV600E prediction is unsolved and mostly serves as a validation task for multi-purpose models. Almost all (88%) studies reporting BRAFmut –status report MSI-status as well. As those features are highly correlated, it is expected that models will perform similarly for both tasks. However, there exist cases that are non-correlating, which are especially interesting to pathologists95. None of the studies investigated performance in such cases. Furthermore, most studies do not utilize a clinically appropriate cutoff, making it hard to judge their performance in a clinical setting.

#### Coexistent Pathology

Under coexistent pathology, all other identified lesions should be listed. This could be additional synchronous carcinomas, polyps or e.g. IBD, other dysplastic normal tissue, subsumed under other lesions. Published work usually focuses on precursor lesions or the correct classification of polyps, but those models could be applied to e.g. correctly classify additional polyps in the same case. Additionally, because of the diversity of possible findings within this section, the models that have been developed so far only focus on one of the items that should be reported.

View this table:
[Table 9:](http://medrxiv.org/content/early/2025/09/01/2025.06.10.25329328/T9)

Table 9: Extracted studies for response to coexistant pathology.
NORM, Normal tissue; HP, Hyperplastic Polyp; TA, Tubular Adenoma; TVA, Tubulo-Villous Adenoma; SSP, Sessile Serrated Polyp; SA, Serrated Adenoma; SSA, Sessile Serrated Adenoma; LG, low grade dysplasia; HG, high grade dysplasia,

##### Data and Methods

The reviewed studies utilized datasets derived from biopsy or polyp resections rather than oncologic resection specimens. Across studies, there is lack of access to the development datasets.

It is important to highlight that the same task has different GT across studies. While only two studies address dysplasia grading, they incorporate different classes. Perlo et al.98 add Hyperplastic Polyp (HP) to their dysplasia categories, which is not used by Neto et al.99. Similar observations apply for polyp classification.

Slide-level classification is achieved by Multiple Instance Learning by Neto et al.99, while Perlo et al.98 aggregated tile classification results from a CNN. Wei et al.97 defined class-specific thresholds to categorize whole slide images (WSIs).

##### Clinical Employability

Variability in ground truth definitions limits model comparability and clinical integration. Differences in annotation standards hinder consistent evaluation, while a lack of publicly available datasets and independent validation cohorts restricts reproducibility. Only Neto et al.99 provide accessible validation data, but broader external validation remains scarce. Existing studies focus on polypectomy specimens rather than oncologic resection samples, since the studies are designed for application to polyp classification in the colorectal cancer screening workflow. While existing models can assess isolated polyps, their performance on primary tissue slides, which encompass all layers of the colon, and thereby unseen tissue types, is uncertain.

Furthermore, several coexisting pathologies remain unexplored. For instance, no studies were identified that automatically evaluate (IBD) using H&E-stained slides. While Rymarczyk et al.100 present a model to score Crohn’s disease and ulcerative colitis, it relies on prior classification into one of these two categories, highlighting a gap to achieve automatic reporting of this section of the guidelines.

#### Tumor Budding

Tumor budding (TB) is defined as presence of small clusters of up to four cancer cells at the tumor invasive front in a tissue section101. TB assessment in cancer follows the standardized guidelines of the International Tumor Budding Consensus Conference (ITBCC) and is performed in hotspot regions (0.785 mm²) of H&E-stained tissue sections102. This scoring has clinical implications, particularly for pT1 and stage II CRC patients, as high TB is associated with poor prognosis 101,103. Identifying and counting TBs on H&E slides is time-consuming and prone to inter-observer variability making it an ideal task to be automated by DL models.

##### Data and Methods

Most published studies on AI-assisted TB assessment rely on IHC staining, patch-level, single-center training and validation of the model. After exclusion criteria, two studies following the standardized ITBCC assessment pipeline were left (Table 10).

View this table:
[Table 10:](http://medrxiv.org/content/early/2025/09/01/2025.06.10.25329328/T10)

Table 10: Extracted studies for tumor budding.

The study by Lu, et al.104 uses faster RCNN 106 for bud detection. Ground truth was established manually on H&E tissue slides and the model’s evaluation time was significantly faster than manual assessment.

On the other hand, Bokhorst et al.105 created a pipeline integrating a U-Net107 model for tissue segmentation and HoVer-Net108 for nuclei detection at the invasive front. The tumor bulk and invasive front were identified using a convex-hull algorithm. Their strategy was validated on data from four different medical centers.

##### Clinical Employability

Both models present different approaches in TB detection and counting in H&E WSIs that seem promising to reduce time and inter-observer variability among pathologists. However, there are a number of limitations to be addressed before being able to deploy such models in the diagnostic routine, such as small validation datasets, an overestimation of TB count especially in necrotic areas, and an inability to distinguish pseudo budding 109.

## 4. Discussion

In this large systematic review, we have analyzed published work on deep– and machine learning for digital pathology image analysis by putting each study into context of the ICCR CRC guidelines. Particular focus was put on whether studies could be implemented in a clinical workflow to assist pathologists by automating the CRC pathology report. This work has shown substantial gaps and numerous elements of the ICCR CRC guidelines which have not been addressed by any study.

Moreover, studies often do not publish code or weights such that they could be reproduced or implemented in an experimental workflow, and data is rarely shared as well.

One of the main observations from our analysis is that despite the critical role of core elements of the ICCR CRC guidelines for clinical decision making, most research efforts focus on automating the reporting of non-core elements. This discrepancy highlights the challenge in aligning the research objectives of the CPath community with the practical needs of the clinic. Many core tasks are not simple, and some are not investigated at all. Instead, they are often overlooked due to a perceived lack of methodological novelty, despite being central to real-world diagnostic workflows.

In this context, it is important to acknowledge that the recent advances in AI have powered the development of chatbots110, or computer vision tools that can work as an end-to-end diagnostic system or even predict patient survival111. While these models offer the appeal of simplified outputs by bypassing intermediate diagnostic variables, they risk overlooking the interpretability and clinical validation that the elements of the ICCR guidelines offer. In contrast, building trustworthy and reliable models for ICCR-defined elements aligns more closely with current diagnostic practice. Such an approach may facilitate adoption by pathologists and laboratories, as more transparent model outputs can be verified and causally understood112. On the other hand, in established diagnostic criteria, it can take decades to alter established criteria or add additional ones, such that end-to-end prognostic models may not have a chance to ever be adopted as standard-of-care.

Another pressing issue is the generalized lack of multi-cohort evaluation. Models are often developed and validated on single-institution datasets or within closed consortia, which limits generalizability. Differences in staining protocols, scanning equipment, and population demographics are known to significantly affect performance, and are a source of bias113,114, yet multi-site evaluation that would better reflect real-world conditions is rarely done, mostly due to data access barriers. As a result, many studies use TCGA data, but H&E slides in TCGA were never selected for diagnostic evaluation. Therefore, validation results on TCGA have only limited transferability to other real-world cohorts. This review excluded studies which could not report slide level results, but frequently research groups do not have access to clinical data or whole slide annotations to verify their results. Efforts such as BIGPICTURE 115 could help facilitate more clinically relevant research by making such data available.

Challenges have been crucial in pushing the boundaries of deep learning research in computational pathology37,116,117. Even after challenges have been completed, a number of studies still use the provided benchmark for further evaluation of new approaches. Yet, these challenges and benchmarks focus on a specific problem which may not be translatable to a diagnostic element and often do not require methods to develop deployable tools but rather highly specific methods to beat the benchmark. Moreover, inference time, economic considerations and required infrastructure are usually not addressed either, such that these challenges often are only useful for deep learning researchers, but not for clinicians and laboratory staff. On a positive note, challenges provide a standardized evaluation dataset such that methods can be compared, whereas even validation on TCGA is often done non-transparently without publishing the used data split, list of included slides in train and test set, or the exclusion criteria.

This review focused on the microscopic evaluation of CRC resection specimen and exclusively on H&E based evaluation, yet a diagnostic report would also contain macroscopic information or use other data modalities. Macroscopic analysis includes taking photos of the specimen and textual information exists, so theoretically, deep learning models could be developed for this purpose, too. Deep learning-based analysis of IHC, while more expensive since additional stainings are necessary, may be more reproducible and would still be automatically assessed118. Novel machines analyzing tissue before embedding could also further pave the way for additional deep learning tools improving diagnostic routine by creating different imaging data119.

Regardless, the major results of this review are the identification of several gaps in the ICCR criteria that current research has not yet covered. Table 11 gives an overview over all topics not yet comprehensively addressed by current work. For each topic, a potential approach is outlined, together with the required data and the potential benefit from automatically analyzing this element.

View this table:
[Table 11:](http://medrxiv.org/content/early/2025/09/01/2025.06.10.25329328/T11)

Table 11: Addressing the remaining ICCR CRC guideline elements.

This overview could be a starting point for a number of investigations, yet it is important to mention that each element may have its own set of specific problems that only arise once researchers investigate the element.

## 5. Limitations

Limiting the scope of the review to papers with slide-level results and automated analysis ignores a sizable number of papers that could likely be translated to working WSI level approaches with little additional work. Yet, these excluded articles are also not less important for the progress in the field. Moreover, the employed search terms potentially excluded relevant papers if they did not include the indicated terms in the abstract. By selecting ICCR CRC guidelines over alternatives like CAP, we emphasized certain topics while omitting emerging biomarkers such as tumor-stroma ratio, tumor border configuration, and immune response. Additionally, the focus on H&E-based algorithms ignores immunohistochemistry-based approaches, which represent promising alternatives in CRC diagnostics.

## 6. Strengths

This systematic review includes publications on deep– and machine-learning AI-models for image analysis in pathology according to elements of the ICCR CRC dataset7 – a globally recognized, evidence-based and standardized pathology reporting guideline. It provides the readers with the current landscape of research in CPath towards automating routine tasks of pathologists during CRC assessment according to ICCR guideline. In the results section we provide a general analysis on the trend of publications over the years and per each element of the guideline. We also present information on the data, code and weight availability among included publications.

Following the general overview, we dissect publications individually, summarize their key results in separate tables as well as critically evaluate the clinical readiness of the proposed models, for each core and non-core element of the ICCR CRC guideline.

Moreover, to guide future research efforts, we present a dedicated table containing all underexplored core and non-core elements, where we propose a training approach, the dataset types and the anticipated benefits of utilizing such models.

Finally, we emphasize practical aspects for researchers, such as the necessity for transparency in computational requirements, availability of code, weights and data – essential for reproducibility and improvement of AI tools with clinical applicability.

All in all, this review should facilitate alignment between future research efforts in CPath and clinical practices as well as encourage new studies to fill current gaps and underexplored research areas.

## 7. Conclusion

This systematic review provides a comprehensive overview of the current state of CPath research to address elements of the diagnostic report in CRC. It reveals the limited maturity of the field and lack of alignment between research and clinical practices and emphasizes the need for developing approaches for various gaps while evaluating existing methods in clinical contexts. The findings could also lead to new benchmarks for novel models that assess entire diagnostic workflows rather than isolated tasks. Finally, this review underscores the remarkable complexity of CRC diagnosis performed by pathologists, which represents just one of many diseases they routinely evaluate every day.

## Supporting information

Supplementary Material [[supplements/329328_file02.docx]](pending:yes)

## Data Availability

All data produced in the present study are available upon reasonable request to the authors.

## 9. Competing interests

Inti Zlobec acts as scientific advisor for Aiforia.

## 8. Acknowledgements

This work was made possible via funding from several sources. E.B. and A.L.F. were funded by the Swiss National Science Foundation (CRSII5_193832). A.K. was funded by the Center for Artificial Intelligence in Medicine (CAIM), University of Bern. M.G. and J.F.C. was funded by the Swiss cancer league (KFS-5786-02-2023-R). J.H. was funded by the Institute of Tissue Medicine and Pathology, University of Bern. R.M. was funded by Swiss National Science Foundation (31003A_166578/1) and the Swiss Government Excellence Scholarship (ESKAS, nr. 2021.0019 / Kosovo / OP). J.G.B. was funded by the Swiss National Science Foundation (10.000.619).

We would also like to thank Tanya Karrer, University Library of Bern, for her valuable help in the conceptual development of this review.

## Footnotes

*   Changed wrong ORCID in author list.

*   Received June 10, 2025.
*   Revision received September 1, 2025.
*   Accepted September 1, 2025.


*   © 2025, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), CC BY-NC 4.0, as described at [http://creativecommons.org/licenses/by-nc/4.0/](http://creativecommons.org/licenses/by-nc/4.0/)

## References

1.  1.Hanna, M. G., et al. Integrating digital pathology into clinical practice. Modern Pathology 2021 35:2 35, 152–164 (2021).
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=34599281&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

2.  2.Eloy, C., et al. Artificial intelligence–assisted cancer diagnosis improves the efficiency of pathologists in prostatic biopsies. Virchows Archiv 482, 595–604 (2023).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1007/s00428-023-03518-5&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=36809483&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

3.  3.Frei, A. L., et al. Pathologist Computer-Aided Diagnostic Scoring of Tumor Cell Fraction: A Swiss National Study. Modern Pathology 36, 100335 (2023).
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=37742926&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

4.  4.Bertram, C. A., et al. Computer-assisted mitotic count using a deep learning– based algorithm improves interobserver reproducibility and accuracy. Vet Pathol 59, 211–226 (2022).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1177/03009858211067478&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=34965805&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

5.  5.Sung, H. et al. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin 71, 209–249 (2021).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.3322/caac.21660&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=33538338&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

6.  6.Nagtegaal, I. D. et al. The 2019 WHO classification of tumours of the digestive system. Histopathology 76, 182 (2019).
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=31433515&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

7.  7.Loughrey, M. B., et al. Dataset for Pathology Reporting of Colorectal Cancer: Recommendations from the International Collaboration on Cancer Reporting (ICCR). Ann Surg 275, 549–561 (2022).
    
    
8.  8.Burgart, L. J., Chopp, W. V & Jain, D. Protocol for the Examination of Resection Specimens From Patients With Primary Carcinoma of the Colon and Rectum With guidance from the CAP Cancer and CAP Pathology Electronic Reporting Committees. (2022).
    
    
9.  9.Moher, D., et al. Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. Ann Intern Med 151, 264–269 (2009).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.7326/0003-4819-151-4-200908180-00135&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=19622511&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000269038900006&link_type=ISI) 

10. 10.Kellermeyer, L., Harnke, B. & Knight, S. Covidence and Rayyan. J Med Libr Assoc 106, 580 (2018).
    
    
11. 11.Song, A. H., et al. Artificial intelligence for digital and computational pathology. Nature Reviews Bioengineering 2023 1:12 1, 930–949 (2023).
    
    
12. 12.Brierley, J., Gospodarowicz, M. D. & Wittekind, C. T. TNM Classification of Malignant Tumors International Union Against Cancer. 8th. Oxford, England: Wiley; 2017. Wiley pp57–62 (2017).
    
    
13. 13.Ong, M. L. H. & Schofield, J. B. Assessment of lymph node involvement in colorectal cancer. World J Gastrointest Surg 8, 179 (2016).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.4240/wjgs.v8.i3.179&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=27022445&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

14. 14.Chuang, W.-Y., et al. Identification of nodal micrometastasis in colorectal cancer using deep learning on annotation-free whole-slide images. Modern Pathology 34, 1901–1911 (2021).
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=34103664&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

15. 15.Mainovskaya, O. A., et al. Development and application of artificial intelligence for the detection of the lymph nodes metastases in colorectal cancer. Koloproktologia 21, 49–59 (2022).
    
    
16. 16.Tsuneki, M. & Kanavati, F. Deep learning models for poorly differentiated colorectal adenocarcinoma classification in whole slide images using transfer learning. Diagnostics 11, (2021).
    
    
17. 17.Bándi, P., et al. Continual learning strategies for cancer-independent detection of lymph node metastases. Med Image Anal 85, (2023).
    
    
18. 18.Khan, A., et al. Computer-Assisted Diagnosis of Lymph Node Metastases in Colorectal Cancers Using Transfer Learning With an Ensemble Model. Modern Pathology 36, (2023).
    
    
19. 19.Tan, L., et al. Colorectal cancer lymph node metastasis prediction with weakly supervised transformer-based multi-instance learning. Med Biol Eng Comput 61, 1565–1580 (2023).
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=36809427&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

20. 20.Kindler, C., Elfwing, S., Öhrvik, J. & Nikberg, M. A Deep Neural Network– Based Decision Support Tool for the Detection of Lymph Node Metastases in Colorectal Cancer Specimens. Modern Pathology 36, (2023).
    
    
21. 21.Giammanco, A., et al. Fast-Track Development and Multi-institutional Clinical Validation of an Artificial Intelligence Algorithm for Detection of Lymph Node Metastasis in Colorectal Cancer. Modern Pathology 37, (2024).
    
    
22. 22.Huang, Z., et al. A pathologist–AI collaboration framework for enhancing diagnostic accuracies and efficiencies. Nat Biomed Eng (2024) doi:10.1038/s41551-024-01223-5.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41551-024-01223-5&link_type=DOI) 

23. 23.Chandler, I. & Houlston, R. S. Interobserver agreement in grading of colorectal cancers-findings from a nationwide web-based survey of histopathologists. Histopathology 52, 494–499 (2008).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/j.1365-2559.2008.02976.x&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=18315602&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

24. 24.Rathore, S., et al. Segmentation and grade prediction of colon cancer digital pathology images across multiple institutions. Cancers (Basel*)* 11, (2019).
    
    
25. 25.Schrammen, P. L., et al. Weakly supervised annotation-free cancer detection and prediction of genotype in routine histopathology. J Pathol 256, 50–60 (2022).
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=34561876&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

26. 26.Chen, S., et al. Automatic Tumor Grading on Colorectal Cancer Whole-Slide Images: Semi-Quantitative Gland Formation Percentage and New Indicator Exploration. Front Oncol 12, (2022).
    
    
27. 27.Soldatov, S. A., et al. Deep Learning Classification of Colorectal Lesions Based on Whole Slide Images. Algorithms 15, (2022).
    
    
28. 28.Ueno, H., et al. New criteria for histologic grading of colorectal cancer. American Journal of Surgical Pathology 36, 193–201 (2012).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1097/PAS.0b013e318235edee&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=22251938&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

29. 29.Surveillance Research Program, N. C. I. SEER*Explorer: An interactive website for SEER cancer statistics. SEER Incidence Data, November 2024 Submission (1975-2022), SEER 21 registries (excluding Illinois). Expected Survival Life Tables by Socio-Economic Standards. [https://seer.cancer.gov/statistics-network/explorer/](https://seer.cancer.gov/statistics-network/explorer/) (2025).
    
    
30. 30.Levy, J., Haudenschild, C., Barwick, C., Christensen, B. & Vaickus, L. Topological Feature Extraction and Visualization of Whole Slide Images using Graph Neural Networks. Pac Symp Biocomput 26, 285–296 (2021).
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=33691025&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

31. 31.Kumar, N., et al. Computer-extracted features of nuclear morphology in hematoxylin and eosin images distinguish stage II and IV colon tumors. J Pathol 257, 17–28 (2022).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/path.5864&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=35007352&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

32. 32.Pei, Z., Zhang, D. & Shao, W. Efficient Metric Learning with Graph Transformer for Accurate Colorectal Cancer Staging. in 2022 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI) 1–6 (2022). doi:10.1109/BHI56158.2022.9926858.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1109/BHI56158.2022.9926858&link_type=DOI) 

33. 33.Liebig, C., et al. Perineural invasion is an independent predictor of outcome in colorectal cancer. Journal of Clinical Oncology 27, 5131–5137 (2009).
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MzoiamNvIjtzOjU6InJlc2lkIjtzOjEwOiIyNy8zMS81MTMxIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjUvMDkvMDEvMjAyNS4wNi4xMC4yNTMyOTMyOC5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 

34. 34.Huh, J. W., Kim, H. R. & Kim, Y. J. Prognostic value of perineural invasion in patients with stage II colorectal cancer. Ann Surg Oncol 17, 2066–2072 (2010).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1245/s10434-010-0982-7&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20182809&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000279653600016&link_type=ISI) 

35. 35.Han, C. H. & Kwak, J. T. A hybrid computational pathology method for the detection of perineural invasion junctions. doi:10.1117/12.2610756 12039, 215–219 (2022).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1117/12.2610756&link_type=DOI) 

36. 36.Jung, J., Kim, E., Lee, H., Lee, S. H. & Ahn, S. Automated Hybrid Model for Detecting Perineural Invasion in the Histology of Colorectal Cancer. APPLIED SCIENCES-BASEL 12, (2022).
    
    
37. 37.Choi, J., Lee, K., Jeong, W.-K. & Chun, S. Y. PAIP2021: Perineural Invasion in Multiple Organ Cancer (Colon, Prostate, and Pancreatobiliary tract). in International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI*)* (2021). doi:10.5281/ZENODO.4575424.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.5281/ZENODO.4575424&link_type=DOI) 

38. 38.Ju, Y. H., et al. Artificial intelligence recognition of pathological T stage and tumor invasion in rectal cancer based on large panoramic pathological sections. INTELLIGENT MEDICINE 2, 141–151 (2022).
    
    
39. 39.Langer, R. & Becker, K. Tumor regression grading of gastrointestinal cancers after neoadjuvant therapy. Virchows Archiv 472, 175–186 (2018).
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=28918544&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

40. 40.Zhang, D., et al. Using Multi-Scale Convolutional Neural Network Based on Multi-Instance Learning to Predict the Efficacy of Neoadjuvant Chemoradiotherapy for Rectal Cancer. IEEE J Transl Eng Health Med 10, (2022).
    
    
41. 41.Amin, M. B. et al. AJCC Cancer Staging Manual. vol. 1024 (Springer, 2017).
    
    
42. 42.Kunkel, T. A. & Erie, D. A. DNA mismatch repair. Annu Rev Biochem 74, 681– 710 (2005).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1146/annurev.biochem.74.082803.133243&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=15952900&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000231235100023&link_type=ISI) 

43. 43.Boland, C. R. & Goel, A. Microsatellite Instability in Colorectal Cancer. Gastroenterology 138, 2073–2087.e3 (2010).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1053/j.gastro.2009.12.064&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20420947&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000276993600005&link_type=ISI) 

44. 44.Argilés, G., et al. Localised colon cancer: ESMO Clinical Practice Guidelines for diagnosis, treatment and follow-up y on behalf of the ESMO Guidelines Committee. Annals of Oncology 31, 1291–1305 (2020).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.annonc.2020.06.022&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=32702383&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

45. 45.Cervantes, A., et al. Metastatic colorectal cancer: ESMO Clinical Practice Guideline for diagnosis, treatment and follow-up 5 behalf of the ESMO Guidelines Committee. Annals of Oncology 34, 10–32 (2023).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.annonc.2022.10.003&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=36307056&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

46. 46.Stoffel, E. M., et al. Hereditary Colorectal Cancer Syndromes: American Society of Clinical Oncology Clinical Practice Guideline Endorsement of the Familial Risk–Colorectal Cancer: European Society for Medical Oncology Clinical Practice Guidelines. Journal of Clinical Oncology 33, 209 (2014).
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25452455&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

47. 47.Kather, J. N., et al. Deep learning can predict microsatellite instability directly from histology in gastrointestinal cancer. Nature Medicine 2019 25:7 25, 1054–1056 (2019).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41591-019-0462-y&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=31160815&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

48. 48.Schmauch, B., et al. A deep learning model to predict RNA-Seq expression of tumours from whole slide images. Nature Communications 2020 11:1 11, 1–15 (2020).
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=31911652&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

49. 49.Lee, H., et al. Two-Stage Classification Method for MSI Status Prediction Based on Deep Learning Approach. APPLIED SCIENCES-BASEL 11, (2021).
    
    
50. 50.Yamashita, R., et al. Deep learning model for the prediction of microsatellite instability in colorectal cancer: a diagnostic study. Lancet Oncol 22, 132–141 (2021).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/S1470-2045(20)30535-0&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=33387492&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

51. 51.Ke, J., et al. Identifying patch-level MSI from histological images of Colorectal Cancer by a Knowledge Distillation Model. in 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 1043–1046 (2020). doi:10.1109/BIBM49941.2020.9313141.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1109/BIBM49941.2020.9313141&link_type=DOI) 

52. 52.Pai, R. K., et al. Development and initial validation of a deep learning algorithm to quantify histological features in colorectal carcinoma including tumour budding/poorly differentiated clusters. Histopathology 79, 391–405 (2021).
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=33590485&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

53. 53.Lee, S. H., Song, I. H. & Jang, H.-J. Feasibility of deep learning-based fully automated classification of microsatellite instability in tissue slides of colorectal cancer. Int J Cancer 149, 728–740 (2021).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/ijc.33599&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=33851412&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

54. 54.Schrammen, P. L., et al. Weakly supervised annotation-free cancer detection and prediction of genotype in routine histopathology. Journal of Pathology 256, 50–60 (2022).
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=34561876&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

55. 55.Saillard, C., et al. Validation of MSIntuit as an AI-based pre-screening tool for MSI detection from colorectal cancer histology slides. Nat Commun 14, (2023).
    
    
56. 56.Bilal, M., et al. Development and validation of a weakly supervised deep learning framework to predict the status of molecular pathways and key mutations in colorectal cancer from routine histology images: a retrospective study. Lancet Digit Health 3, e763–e772 (2021).
    
    
57. 57.Bustos, A., et al. XDEEP-MSI: Explainable bias-rejecting microsatellite instability deep learning system in colorectal cancer. Biomolecules 11, (2021).
    
    
58. 58.Echle, A., et al. Artificial intelligence for detection of microsatellite instability in colorectal cancer—a multicentric analysis of a pre-screening tool for clinical application. ESMO Open 7, (2022).
    
    
59. 59.Saldanha, O. L., et al. Swarm learning for decentralized artificial intelligence in cancer histopathology. Nat Med 28, 1232–1239 (2022).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41591-022-01768-5&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=35469069&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

60. 60.Ghaffari Laleh, N., et al. Benchmarking weakly-supervised deep learning pipelines for whole slide classification in computational pathology. Med Image Anal 79, (2022).
    
    
61. 61.Fujii, S., et al. Rapid Screening Using Pathomorphologic Interpretation to Detect BRAFV600E Mutation and Microsatellite Instability in Colorectal Cancer. Clinical Cancer Research 28, 2623–2632 (2022).
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=35363302&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

62. 62.Jiang, W., et al. Clinical actionability of triaging DNA mismatch repair deficient colorectal cancer from biopsy samples using deep learning. EBioMedicine 81, (2022).
    
    
63. 63.Guo, Y., et al. Learn to Estimate Genetic Mutation and Microsatellite Instability with Histopathology H&E Slides in Colon Carcinoma. Cancers (Basel*)* 14, (2022).
    
    
64. 64.Lou, J., et al. PPsNet: An improved deep learning model for microsatellite instability high prediction in colorectal cancer from whole slide images. Comput Methods Programs Biomed 225, (2022).
    
    
65. 65.Teichmann, M., Aichert, A., Bohnenberger, H., Ströbel, P. & Heimann, T. End-to-end learning for image-based detection of molecular alterations in digital pathology. in International Conference on Medical Image Computing and Computer-Assisted Intervention 88–98 (2022).
    
    
66. 66.Lv, Z., Yan, R., Lin, Y., Wang, Y. & Zhang, F. Joint region-attention and multi-scale transformer for microsatellite instability detection from whole slide images in gastrointestinal cancer. in International Conference on Medical Image Computing and Computer-Assisted Intervention 293–302 (2022).
    
    
67. 67.Rubinstein, J. C., et al. Deep learning image analysis quantifies tumor heterogeneity and identifies microsatellite instability in colon cancer. J Surg Oncol 127, 426–433 (2023).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/jso.27118&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=36251352&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

68. 68.Ding, K., Zhou, M., Wang, H., Zhang, S. & Metaxas, D. N. Spatially aware graph neural networks and cross-level molecular profile prediction in colon cancer histopathology: a retrospective multi-cohort study. Lancet Digit Health 4, e787–e795 (2022).
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=36307192&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

69. 69.Cao, R., Gu, Q., Tan, D., Wei, P. & Zheng, C. Prediction of microsatellite instability of colorectal cancer using multi-scale pathological images based on deep learning. in 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 1461–1466 (2022). doi:10.1109/BIBM55620.2022.9995576.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1109/BIBM55620.2022.9995576&link_type=DOI) 

70. 70.Guo, B., et al. Predicting microsatellite instability and key biomarkers in colorectal cancer from H&E-stained images: achieving state-of-the-art predictive performance with fewer data using Swin Transformer. Journal of Pathology: Clinical Research 9, 223–235 (2023).
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=36723384&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

71. 71.Chang, X., et al. Predicting colorectal cancer microsatellite instability with a self-attention-enabled convolutional neural network. Cell Rep Med 4, (2023).
    
    
72. 72.Niehues, J. M., et al. Generalizable biomarker prediction from cancer pathology slides with self-supervised deep learning: A retrospective multi-centric study. Cell Rep Med 4, (2023).
    
    
73. 73.Yuan, X., Ruan, J. & Yue, J. Classification of Microsatellite Instability Status in Slide-level Annotated Colorectal Tumors by Weakly Supervised Deep Learning. in *2023 IEEE 6th Information Technology,Networking*,Electronic and Automation Control Conference (ITNEC*)* vol. 6 1306–1310 (2023).
    
    
74. 74.Tsai, P.-C., et al. Histopathology images predict multi-omics aberrations and prognoses in colorectal cancer patients. Nat Commun 14, (2023).
    
    
75. 75.Wagner, S. J., et al. Transformer-based biomarker prediction from colorectal cancer histology: A large-scale multicentric study. Cancer Cell 41, 1650–1661.e4 (2023).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ccell.2023.08.002&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=37652006&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

76. 76.Cen, M., et al. A Novel and Efficient Digital Pathology Classifier for Predicting Cancer Biomarkers Using Sequencer Architecture. American Journal of Pathology 193, 2122–2132 (2023).
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=37775043&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

77. 77.Kim, K. et al. PAIP 2020: Microsatellite instability prediction in colorectal cancer. Med Image Anal 89, 102886 (2023).
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=37494811&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

78. 78.Saillard, C., et al. Validation of MSIntuit as an AI-based pre-screening tool for MSI detection from colorectal cancer histology slides. Nature Communications 2023 14:1 14, 1–11 (2023).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41467-023-41540-y&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=36596776&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

79. 79.Zamanitajeddin, N., Jahanifar, M., Bilal, M., Eastwood, M. & Rajpoot, N. Social network analysis of cell networks improves deep learning for prediction of molecular pathways and key mutations in colorectal cancer. Med Image Anal 93, 103071 (2024).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.media.2023.103071&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=38199068&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

80. 80.Roth, B., et al. Low-Resource Finetuning of Foundation Models Beats State-of-the-Art in Histopathology. in 2024 IEEE International Symposium on Biomedical Imaging (ISBI) 1–5 (2024). doi:10.1109/ISBI56570.2024.10635695.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1109/ISBI56570.2024.10635695&link_type=DOI) 

81. 81.Liu, Q., et al. M 2 Fusion: Bayesian-Based Multimodal Multi-level Fusion on Colorectal Cancer Microsatellite Instability Prediction. in International Conference on Medical Image Computing and Computer-Assisted Intervention 125–134 (2023).
    
    
82. 82.Shen, Y., Zhou, B., Xiong, X., Gao, R. & Wang, Y. G. How GNNs Facilitate CNNs in Mining Geometric Information from Large-Scale Medical Images. in 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 2227–2230 (2023). doi:10.1109/BIBM58861.2023.10385379.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1109/BIBM58861.2023.10385379&link_type=DOI) 

83. 83.Gustav, M., et al. Deep learning for dual detection of microsatellite instability and POLE mutations in colorectal cancer histopathology. NPJ Precis Oncol 8, (2024).
    
    
84. 84.Vorontsov, E., et al. A foundation model for clinical-grade computational pathology and rare cancers detection. Nature Medicine 2024 30:10 30, 2924–2935 (2024).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41591-024-03141-0&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=39039250&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

85. 85.Wang, X., et al. A pathology foundation model for cancer diagnosis and prognosis prediction. Nature 634, 21 (2024).
    
    
86. 86.Klarskov, L., et al. Interobserver variability in the evaluation of mismatch repair protein immunostaining. Hum Pathol 41, 1387–1396 (2010).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.humpath.2010.03.003&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20573374&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

87. 87.Kather, J. N. Histological image tiles for TCGA-CRC-DX, color-normalized, sorted by MSI status, train/test split. Preprint at doi:10.5281/ZENODO.3832231 (2020).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.5281/ZENODO.3832231&link_type=DOI) 

88. 88.Sharma, S. G. & Gulley, M. L. BRAF Mutation Testing in Colorectal Cancer. Arch Pathol Lab Med 134, 1225–1228 (2010).
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20670148&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

89. 89.Kopetz, S., et al. Molecular profiling of BRAF-V600E-mutant metastatic colorectal cancer in the phase□3 BEACON CRC trial. Nature Medicine 2024 30:11 30, 3261–3271 (2024).
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=39313594&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

90. 90.Nicolò Fanelli, G., et al. The heterogeneous clinical and pathological landscapes of metastatic Braf-mutated colorectal cancer. doi:10.1186/s12935-020-1117-2.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/s12935-020-1117-2&link_type=DOI) 

91. 91.Fu, Y., et al. Pan-cancer computational histopathology reveals mutations, tumor composition and prognosis. Nat Cancer 1, 800–810 (2020).
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=35122049&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

92. 92.Chen, Z. H., Li, X. Y., Yang, M. M., Zhang, H. & Xu, X. S. Optimization of deep learning models for the prediction of gene mutations using unsupervised clustering. JOURNAL OF PATHOLOGY CLINICAL RESEARCH 9, 3–17 (2023).
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=36376239&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

93. 93.Ghaffari Laleh, N., et al. Benchmarking weakly-supervised deep learning pipelines for whole slide classification in computational pathology. Med Image Anal 79, (2022).
    
    
94. 94.Saldanha, O. L., et al. Swarm learning for decentralized artificial intelligence in cancer histopathology. Nat Med 28, 1232–1239 (2022).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41591-022-01768-5&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=35469069&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

95. 95.Tran, B., et al. Impact of BRAF mutation and microsatellite instability on the pattern of metastatic spread and prognosis in metastatic colorectal cancer. Cancer 117, 4623–4632 (2011).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/cncr.26086&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21456008&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000295797400009&link_type=ISI) 

96. 96.Korbar, B., et al. Deep Learning for Classification of Colorectal Polyps on Whole-slide Images. J Pathol Inform 8, 30 (2017).
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=28828201&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

97. 97.Wei, J. W., et al. Evaluation of a Deep Neural Network for Automated Classification of Colorectal Polyps on Histopathologic Slides. JAMA Netw Open 3, e203398–e203398 (2020).
    
    
98. 98.Perlo, D., Tartaglione, E., Bertero, L., Cassoni, P. & Grangetto, M. Dysplasia Grading of Colorectal Polyps Through Convolutional Neural Network Analysis of Whole Slide Images. in 325–334 (2022). doi:10.1007/978-981-16-3880-0_34.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1007/978-981-16-3880-0_34&link_type=DOI) 

99. 99.Neto, P. C., et al. iMIL4PATH: A Semi-Supervised Interpretable Approach for Colorectal Whole-Slide Images. Cancers 2022, Vol. 14, Page 2489 **14**, 2489 (2022).
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=35626093&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

100.100.Rymarczyk, D., et al. Deep Learning Models Capture Histological Disease Activity in Crohn’s Disease and Ulcerative Colitis with High Fidelity. J Crohns Colitis 18, 604–614 (2024).
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=37814351&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

101.101.Lugli, A., Zlobec, I., Berger, M. D., Kirsch, R. & Nagtegaal, I. D. Tumour budding in solid cancers. Nature Reviews Clinical Oncology 2020 18:2 18, 101–115 (2020).
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=32901132&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

102.102.Lugli, A., et al. Recommendations for reporting tumor budding in colorectal cancer based on the International Tumor Budding Consensus Conference (ITBCC) 2016. Modern Pathology 30, 1299–1311 (2017).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/modpathol.2017.46&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=28548122&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

103.103.Grigore, A. D., Jolly, M. K., Jia, D., Farach-Carson, M. C. & Levine, H. Tumor Budding: The Name is EMT. Partial EMT. Journal of Clinical Medicine 2016, Vol. 5, Page 51 **5**, 51 (2016).
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=27136592&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

104.104.Lu, J. Q., et al. Development and application of a detection platform for colorectal cancer tumor sprouting pathological characteristics based on artificial intelligence. INTELLIGENT MEDICINE 2, 82–87 (2022).
    
    
105.105.Bokhorst, J.-M., et al. Fully Automated Tumor Bud Assessment in Hematoxylin and Eosin-Stained Whole Slide Images of Colorectal Cancer. Modern Pathology 36, (2023).
    
    
106.106.Ren, S., He, K., Girshick, R. & Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans Pattern Anal Mach Intell 39, 1137–1149 (2017).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1109/TPAMI.2016.2577031&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=27295650&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

107.107.Ronneberger, O., Fischer, P. & Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 9351, 234–241 (2015).
    
    
108.108.Graham, S., et al. Hover-Net: Simultaneous segmentation and classification of nuclei in multi-tissue histology images. Med Image Anal 58, 101563 (2019).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.media.2019.101563&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=31561183&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

109.109.Haddad, T. S., et al. Pseudobudding: ruptured glands do not represent true tumor buds. J Pathol 261, 19–27 (2023).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/path.6146&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=37403270&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

110.110.Lu, M. Y., et al. A multimodal generative AI copilot for human pathology. Nature 634, 466–473 (2024).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41586-024-07618-3&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=38866050&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

111.111.Kleppe, A., et al. A clinical decision support system optimising adjuvant chemotherapy for colorectal cancers by integrating deep learning and pathological staging markers: a development and validation study. Lancet Oncol 23, 1221–1232 (2022).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/S1470-2045(22)00391-6&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=35964620&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

112.112.Plass, M., et al. Explainability and causability in digital pathology. J Pathol Clin Res 9, 251–260 (2023).
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=37045794&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

113.113.Asilian Bidgoli, A., Rahnamayan, S., Dehkharghanian, T., Grami, A. & Tizhoosh, H. R. Bias reduction in representation of histopathology images using deep feature selection. Scientific Reports 2022 12:1 12, 1–12 (2022).
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=34992227&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

114.114.Vaidya, A., et al. Demographic bias in misdiagnosis by computational pathology models. Nature Medicine 2024 30:4 30, 1174–1190 (2024).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41591-024-02885-z&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=38641744&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

115.115.Moulin, P., Grünberg, K., Barale-Thomas, E. & der Laak, J. van. IMI— Bigpicture: A Central Repository for Digital Pathology. Toxicol Pathol 49, 711– 713 (2021).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1177/0192623321989644&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=33571073&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

116.116.Graham, S., et al. CoNIC: Colon Nuclei Identification and Counting Challenge 2022. (2021).
    
    
117.117.Litjens, G., et al. 1399 H&E-stained sentinel lymph node sections of breast cancer patients: the CAMELYON dataset. Gigascience 7, (2018).
    
    
118.118.Lems, C. M., et al. Color Deconvolution for Color-Agnostic and Cross-Modality Analysis of Immunohistochemistry Whole-Slide Images with Deep Learning. Proceedings – International Symposium on Biomedical Imaging (2024) doi:10.1109/ISBI56570.2024.10635258.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1109/ISBI56570.2024.10635258&link_type=DOI) 

119.119.Jorns, J. M., et al. Shortwave infrared imaging increases the number of lymph nodes and the number of positive lymph nodes in surgical pathology specimens – a 104-patient study. doi:10.1101/2024.11.24.24317845.
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoibWVkcnhpdiI7czo1OiJyZXNpZCI7czoyMToiMjAyNC4xMS4yNC4yNDMxNzg0NXYxIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjUvMDkvMDEvMjAyNS4wNi4xMC4yNTMyOTMyOC5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 

120.120.Kirsch, R., et al. Venous invasion in colorectal cancer: Impact of an elastin stain on detection and interobserver agreement among gastrointestinal and nongastrointestinal pathologists. American Journal of Surgical Pathology 37, 200–210 (2013).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1097/PAS.0b013e31826a92cd&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=23108018&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

121.121.Ronnow, C. F. et al. Lymphovascular Infiltration, Not Depth of Invasion, is the Critical Risk Factor of Metastases in Early Colorectal Cancer: Retrospective Population-based Cohort Study on Prospectively Collected Data, including Validation. Ann Surg 275, E148–E154 (2022).
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=32187031&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

122.122.Skancke, M., et al. Lymphovascular invasion and perineural invasion negatively impact overall survival for stage II adenocarcinoma of the colon. Dis Colon Rectum 62, 181–188 (2019).
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=30640833&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

123.123.Kooyker, A. I., Verbeek, W. H. M., van den Berg, J. G., Tesselaar, M. E. T. & van Leerdam, M. E. Change in incidence, characteristics and management of colorectal neuroendocrine tumours in the Netherlands in the last decade. United European Gastroenterol J 8, 59–67 (2020).
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=32213058&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

124.124.Shafqat, H., Ali, S., Salhab, M. & Olszewski, A. J. Survival of patients with neuroendocrine carcinoma of the colon and rectum: A population-based analysis. Dis Colon Rectum 58, 294–303 (2015).
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25664707&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

125.125.Merkel, S., et al. The prognostic inhomogeneity in pT3 rectal carcinomas. Int J Colorectal Dis 16, 298–304 (2001).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1007/s003840100309&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=11686527&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2025%2F09%2F01%2F2025.06.10.25329328.atom) 

126.126.Bori, R., et al. Heterogeneity of pT3 Colorectal Carcinomas According to the Depth of Invasion. doi:10.1007/s12253-009-9149-x.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1007/s12253-009-9149-x&link_type=DOI)