medRxiv

Assessing Genotype-Phenotype Correlations with Deep Learning in Colorectal Cancer: A Multi-Centric Study

Marco Gustav, Marko van Treeck, Nic G. Reitsam, Zunamys I. Carrero, Chiara M. Loeffler, Asier Rabasco Meneghetti, Bruno Märkl, Lisa A. Boardman, Amy J. French, Ellen L. Goode, Andrea Gsur, Stefanie Brezina, Marc J. Gunter, Neil Murphy, Pia Hönscheid, Christian Sperling, Sebastian Foersch, Robert Steinfelder, Tabitha Harrison, Ulrike Peters, Amanda Phipps, Jakob Nikolas Kather
doi: https://doi.org/10.1101/2025.02.04.25321660
Marco Gustav
(1) Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, 01307 Dresden, Germany
Marko van Treeck
(1) Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, 01307 Dresden, Germany
Nic G. Reitsam
(2) Pathology, Faculty of Medicine, University of Augsburg, Augsburg, Germany
Zunamys I. Carrero
(1) Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, 01307 Dresden, Germany
Chiara M. Loeffler
(1) Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, 01307 Dresden, Germany
(3) Department of Medicine I, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, 01307 Dresden, Germany
Asier Rabasco Meneghetti
(1) Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, 01307 Dresden, Germany
Bruno Märkl
(2) Pathology, Faculty of Medicine, University of Augsburg, Augsburg, Germany
Lisa A. Boardman
(4) Division of Gastroenterology and Hepatology, Mayo Clinic, Rochester, Minnesota, USA
Amy J. French
(5) Division of Laboratory Genetics, Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota, USA
Ellen L. Goode
(6) Department of Quantitative Health Sciences, Division of Epidemiology, Mayo Clinic, Rochester, Minnesota, USA
Andrea Gsur
(7) Center for Cancer Research, Medical University of Vienna, Vienna, Austria
Stefanie Brezina
(7) Center for Cancer Research, Medical University of Vienna, Vienna, Austria
Marc J. Gunter
(8) Nutrition and Metabolism Branch, International Agency for Research on Cancer, World Health Organization, Lyon, France
(9) Cancer Epidemiology and Prevention Research Unit, School of Public Health, Imperial College London, London, United Kingdom
Neil Murphy
(8) Nutrition and Metabolism Branch, International Agency for Research on Cancer, World Health Organization, Lyon, France
Pia Hönscheid
(10) Institute of Pathology, University Hospital Carl Gustav Carus (UKD), Technical University Dresden (TUD), Dresden, Germany
(11) National Center for Tumor Diseases (NCT), Partner Site Dresden, German Cancer Research Center Heidelberg, Dresden, Germany
(12) German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Heidelberg, Germany
Christian Sperling
(10) Institute of Pathology, University Hospital Carl Gustav Carus (UKD), Technical University Dresden (TUD), Dresden, Germany
Sebastian Foersch
(13) Institute of Pathology, University Medical Center Mainz, Mainz, Germany
Robert Steinfelder
(14) Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
Tabitha Harrison
(14) Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
(15) Department of Epidemiology, University of Washington, Seattle, WA, USA
Ulrike Peters
(14) Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
(15) Department of Epidemiology, University of Washington, Seattle, WA, USA
Amanda Phipps
(14) Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
(15) Department of Epidemiology, University of Washington, Seattle, WA, USA
Jakob Nikolas Kather
(1) Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, 01307 Dresden, Germany
(3) Department of Medicine I, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, 01307 Dresden, Germany
(16) Medical Oncology, National Center for Tumor Diseases (NCT), University Hospital Heidelberg, Heidelberg, Germany
(17) Pathology & Data Analytics, Leeds Institute of Medical Research at St James’s, University of Leeds, Leeds, United Kingdom
For correspondence: jakob-nikolas.kather{at}alumni.dkfz.de

Abstract

Background Deep Learning (DL) has emerged as a powerful tool to predict genetic biomarkers directly from digitized Hematoxylin and Eosin (H&E) slides in colorectal cancer (CRC). However, few studies have systematically investigated the predictability of biomarkers beyond routinely available alterations such as microsatellite instability (MSI), and BRAF and KRAS mutations.

Methods Our primary dataset comprised H&E slides of CRC tumors from five cohorts totaling 1,376 patients who underwent comprehensive panel sequencing, with an additional 536 patients from two public datasets for validation. We developed a DL approach in which a single transformer model predicts multiple genetic alterations directly from the slides. The model’s performance was compared against conventional single-target models, and potential confounders were analyzed.
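The abstract does not state how one model is trained on several targets at once. A common pattern, shown here only as a minimal sketch under our own assumptions, is one output per biomarker with a binary cross-entropy loss that skips targets whose status is unknown for a given patient. The function name `masked_multitarget_bce` and the use of `None` for a missing label are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import Optional, Sequence


def masked_multitarget_bce(
    logits: Sequence[float],
    labels: Sequence[Optional[int]],
) -> float:
    """Mean binary cross-entropy over the targets that carry a label.

    logits: one raw output per biomarker from a shared model (hypothetical layout).
    labels: 1 = altered, 0 = wild type, None = status unknown (ignored).
    """
    if len(logits) != len(labels):
        raise ValueError("expected one logit per target")
    losses = []
    for z, y in zip(logits, labels):
        if y is None:
            continue  # unlabelled target: contributes nothing to the loss
        # Numerically stable BCE computed on logits:
        # max(z, 0) - z*y + log(1 + exp(-|z|))
        losses.append(max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z))))
    # A sample with no labelled targets contributes zero loss.
    return sum(losses) / len(losses) if losses else 0.0


# Toy example: three biomarker outputs, the second label is missing for this patient.
print(masked_multitarget_bce([2.0, -1.0, 0.5], [1, None, 0]))
```

Masking lets every slide contribute to training even when its sequencing panel did not cover all targets; whether the study handled missing labels this way is not stated in the abstract.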

Findings The multi-target model predicted numerous biomarkers from pathology slides, matching and in part exceeding single-target transformers. The Area Under the Receiver Operating Characteristic curve (AUROC, mean ± std) on the primary external validation cohorts was: BRAF (0·78 ± 0·01), hypermutation (0·88 ± 0·01), MSI (0·93 ± 0·01), and RNF43 (0·86 ± 0·01). This biomarker predictability was mirrored across other metrics and in co-occurrence analyses. However, biomarkers with high AUROCs were largely correlated with MSI, and pathological examination showed that model predictions depended considerably on MSI-associated morphology.
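The findings report AUROC as mean ± std. The abstract does not say how the spread was obtained (for example, repeated training runs or cross-validation folds), so the following is only a minimal sketch under that assumption: it computes AUROC per biomarker with the Mann-Whitney pairwise formulation and summarizes it across several hypothetical runs. The names `auroc` and `auroc_mean_std`, the `runs` layout, and the toy scores are ours, not the authors' evaluation code or data.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of a random positive > score of a random negative).

    Mann-Whitney formulation; tied pairs count as one half.
    O(n_pos * n_neg): fine for a sketch, slow for large cohorts.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical input: one dict per trained model, mapping target -> (labels, scores).
Run = Dict[str, Tuple[Sequence[int], Sequence[float]]]


def auroc_mean_std(runs: List[Run]) -> Dict[str, Tuple[float, float]]:
    """Per-target AUROC summarised as (mean, sample standard deviation) across runs."""
    summary = {}
    for target in runs[0]:  # assumes every run covers the same targets
        values = [auroc(*run[target]) for run in runs]
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[target] = (mean(values), spread)
    return summary


# Toy data (illustrative only): two runs scoring the same four patients for MSI.
runs = [
    {"MSI": ([1, 1, 0, 0], [0.9, 0.7, 0.4, 0.2])},
    {"MSI": ([1, 1, 0, 0], [0.8, 0.3, 0.6, 0.1])},
]
for target, (m, s) in auroc_mean_std(runs).items():
    print(f"{target}: {m:.2f} ± {s:.2f}")
```

The pairwise form makes the meaning of the metric explicit (how often an altered case outranks a wild-type case); for real cohorts, a rank-based implementation such as scikit-learn's `roc_auc_score` is faster.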

Interpretation Our study demonstrates that multi-target transformers can predict the biomarker status for numerous genetic alterations in CRC directly from H&E slides. However, the predictability of these biomarkers is mainly associated with the MSI phenotype, despite indications of slight biomarker-inherent contributions to a phenotype. Our findings underscore the need to analyze confounders in AI-based oncology biomarkers. To enable this, we developed a validated model applicable to other cancers and to larger, diverse datasets.

Funding The German Federal Ministry of Health, the Max-Eder-Programme of German Cancer Aid, the German Federal Ministry of Education and Research, the German Academic Exchange Service, and the EU.

Competing Interest Statement

JNK declares consulting services for Bioptimus, France; Owkin, France; DoMore Diagnostics, Norway; Panakeia, UK; AstraZeneca, UK; Mindpeak, Germany; and MultiplexDx, Slovakia. Furthermore, he holds shares in StratifAI GmbH, Germany, and Synagen GmbH, Germany; has received a research grant from GSK; and has received honoraria from AstraZeneca, Bayer, Daiichi Sankyo, Eisai, Janssen, Merck, MSD, BMS, Roche, Pfizer, and Fresenius. MG has received honoraria for lectures sponsored by Techniker Krankenkasse (TK) and AstraZeneca. SF has received honoraria for lectures from BMS and MSD. UP declares consulting services for AbbVie, and her husband holds individual stocks in the following companies: BioNTech SE - ADR, Amazon, CureVac BV, NanoString Technologies, Google/Alphabet Inc Class C, NVIDIA Corp, and Microsoft Corp. No other potential conflicts of interest are reported by any of the authors.

Funding Statement

JNK is supported by the German Federal Ministry of Health (DEEP LIVER, ZMVI1-2520DAT111), the Max-Eder-Programme of German Cancer Aid (grant #70113864), the German Federal Ministry of Education and Research (PEARL, 01KD2104C; CAMINO, 01EO2101; SWAG, 01KD2215A; TRANSFORM LIVER, 031L0312A; TANGERINE, 01KT2302 through ERA-NET Transcan), the German Academic Exchange Service (SECAI, 57616814), the German Federal Joint Committee (Transplant.KI, 01VSF21048), the European Union's Horizon Europe research and innovation programme (ODELIA, 101057091; GENIAL, 101096312) and the National Institute for Health and Care Research (NIHR, NIHR213331) Leeds Biomedical Research Centre. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care. SF is supported by the German Federal Ministry of Education and Research (SWAG, 01KD2215C), the German Cancer Aid (DECADE, 70115166 and TargHet, 70115995) and the German Research Foundation (504101714). The Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO) is funded by the National Cancer Institute, National Institutes of Health, U.S. Department of Health and Human Services (U01 CA137088, R01 CA488857, P20 CA252733). Genotyping/sequencing services were provided by the Center for Inherited Disease Research (CIDR) under contract number HHSN268201700006I. This research was funded in part through the NIH/NCI Cancer Center Support Grant P30 CA015704. Scientific Computing Infrastructure at Fred Hutch was funded by ORIP grant S10OD028685. The CORSA study was funded by the Austrian Research Funding Agency (FFG) BRIDGE (grant 829675, to Andrea Gsur) and the Herzfeldersche Familienstiftung (grant to Andrea Gsur), and was supported by COST Action BM1206. CRA was supported by the National Institutes of Health grant R01 CA068535.
The coordination of EPIC is financially supported by the International Agency for Research on Cancer (IARC) and also by the Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, which has additional infrastructure support provided by the NIHR Imperial Biomedical Research Centre (BRC). The national cohorts are supported by: Danish Cancer Society (Denmark); Ligue Contre le Cancer, Institut Gustave Roussy, Mutuelle Generale de l'Education Nationale, Institut National de la Sante et de la Recherche Medicale (INSERM) (France); German Cancer Aid, German Cancer Research Center (DKFZ), German Institute of Human Nutrition Potsdam-Rehbruecke (DIfE), Federal Ministry of Education and Research (BMBF) (Germany); Associazione Italiana per la Ricerca sul Cancro-AIRC-Italy, Compagnia di SanPaolo and National Research Council (Italy); Dutch Ministry of Public Health, Welfare and Sports (VWS), Netherlands Cancer Registry (NKR), LK Research Funds, Dutch Prevention Funds, Dutch ZON (Zorg Onderzoek Nederland), World Cancer Research Fund (WCRF), Statistics Netherlands (The Netherlands); Health Research Fund (FIS) - Instituto de Salud Carlos III (ISCIII), Regional Governments of Andalucia, Asturias, Basque Country, Murcia and Navarra, and the Catalan Institute of Oncology - ICO (Spain); Swedish Cancer Society, Swedish Research Council and Region Skane and Region Vaesterbotten (Sweden); Cancer Research UK (14136 to EPIC-Norfolk; C8221/A29017 to EPIC-Oxford), Medical Research Council (1000143 to EPIC-Norfolk; MR/M012190/1 to EPIC-Oxford) (United Kingdom). The IWHS study was supported by NIH grants CA107333 (R01 grant awarded to P.J. Limburg) and HHSN261201000032C (N01 contract awarded to the University of Iowa). The WHI program is funded by the National Heart, Lung, and Blood Institute, National Institutes of Health, U.S. Department of Health and Human Services through contracts 75N92021D00001, 75N92021D00002, 75N92021D00003, 75N92021D00004, 75N92021D00005.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

This study was performed in accordance with the Declaration of Helsinki. It is a retrospective analysis of scanned images of anonymized tissue samples from various cohorts of cancer patients. Data were collected and anonymized, and ethical approval was obtained. The overall analysis was approved by the Ethics Board of the Medical Faculty of Technical University Dresden under the ID BO-EK-444102022.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

List of abbreviations

AUROC: Area Under the Receiver Operating Characteristic Curve
AUPRC: Area Under the Precision-Recall Curve
CIN: Chromosomal instability
CPTAC: Clinical Proteomic Tumor Analysis Consortium
CRC: Colorectal cancer
DL: Deep Learning
H&E: Hematoxylin and eosin
Mb: Megabases
MSI: Microsatellite instability
MSS: Microsatellite stable
MUT: Mutated
NOS: Not otherwise specified
px: Pixel
ROC: Receiver Operating Characteristic Curve
TCGA: The Cancer Genome Atlas
TILs: Tumor infiltrating lymphocytes
ViT: Vision Transformer
WSI: Whole Slide Image
WT: Wild type
Copyright

The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Posted February 08, 2025.

medRxiv 2025.02.04.25321660; doi: https://doi.org/10.1101/2025.02.04.25321660

Subject Area: Pathology