medRxiv

Novel deep learning algorithm predicts the status of molecular pathways and key mutations in colorectal cancer from routine histology images

Mohsin Bilal, Shan E Ahmed Raza, Ayesha Azam, Simon Graham, Muhammad Ilyas, Ian A. Cree, David Snead, Fayyaz Minhas, Nasir M. Rajpoot
doi: https://doi.org/10.1101/2021.01.19.21250122
Mohsin Bilal
1Department of Computer Science, University of Warwick, UK
Shan E Ahmed Raza
1Department of Computer Science, University of Warwick, UK
Ayesha Azam
1Department of Computer Science, University of Warwick, UK
2University Hospitals Coventry and Warwickshire, UK
Simon Graham
1Department of Computer Science, University of Warwick, UK
Muhammad Ilyas
3Faculty of Medicine & Health Sciences, University of Nottingham, UK
Ian A. Cree
4International Agency for Research on Cancer (IARC), Lyon, France
David Snead
2University Hospitals Coventry and Warwickshire, UK
Fayyaz Minhas
1Department of Computer Science, University of Warwick, UK
Nasir M. Rajpoot
1Department of Computer Science, University of Warwick, UK
2University Hospitals Coventry and Warwickshire, UK
  • For correspondence: n.m.rajpoot@warwick.ac.uk

Summary

Background Determining which molecular pathways are involved in the development of colorectal cancer (CRC) and knowing the status of key mutations are crucial for selecting the optimal targeted therapy. The goal of this study is to explore machine learning for predicting the status of the three main CRC molecular pathways, namely microsatellite instability (MSI), chromosomal instability (CIN), and CpG island methylator phenotype (CIMP), for detecting BRAF and TP53 mutations, and for predicting hypermutated (HM) CRC tumors from whole-slide images (WSIs) of CRC slides stained with Hematoxylin and Eosin (H&E).

Methods We propose a novel iterative draw-and-rank sampling (IDaRS) algorithm that selects representative sub-images, or tiles, from a WSI given only a single WSI-level label, without needing any detailed annotations at the cell or region level. IDaRS is used to train a deep convolutional network to predict key molecular parameters in CRC (HM tumors, the status of the three main CRC molecular pathways MSI, CIN, and CIMP, and the two key mutations BRAF and TP53) from digitized images of routine H&E-stained tissue slides of CRC patients (n=497 cases for the TCGA cohort and n=47 cases for the Pathology AI Platform, or PAIP, cohort). The visual fields that IDaRS identifies as most predictive of each pathway and of HM tumors are analyzed both to verify known histological features and, for the first time, to reveal novel histological features. This is achieved by a systematic, data-driven analysis of the cellular composition of strongly predictive tiles.
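
The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of what a draw-and-rank loop for a single slide might look like. It assumes that each iteration keeps the `k` top-ranked tiles from the previous round, draws `r` further tiles at random, and re-ranks the pool by a model score. The function name `draw_and_rank`, the parameters `k`, `r`, and `iterations`, and the stand-in `score_fn` are all hypothetical and are not taken from the paper.

```python
import random


def draw_and_rank(num_tiles, score_fn, k=2, r=3, iterations=5, seed=0):
    """Hypothetical draw-and-rank loop over the tiles of one slide.

    num_tiles : number of tiles in the slide (tiles are referred to by index)
    score_fn  : stand-in for the network's tile-level score (assumption)
    k         : number of top-ranked tiles carried over between iterations
    r         : number of tiles drawn at random in each iteration
    """
    rng = random.Random(seed)
    top = []  # indices of the currently top-ranked tiles
    for _ in range(iterations):
        # Draw r new tiles from those not already retained.
        remaining = [i for i in range(num_tiles) if i not in top]
        drawn = rng.sample(remaining, min(r, len(remaining)))
        pool = top + drawn
        # In a real pipeline the network would be trained on `pool` with the
        # slide-level label before scoring; score_fn replaces that step here.
        pool.sort(key=score_fn, reverse=True)
        top = pool[:k]
    return top


# Toy usage: the tile index doubles as its score, purely for illustration.
selected = draw_and_rank(num_tiles=10, score_fn=lambda i: i, k=2, r=3)
print(selected)
```

Under these assumptions only `k + r` tiles per slide are scored in any iteration, which is one plausible way to train from a single slide-level label without region-level annotations.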

Findings IDaRS predicts the three main CRC genetic pathways and key mutations with high accuracy through deep learning-based analysis of WSIs of H&E-stained slides. It achieves state-of-the-art AUROC values of 0.90, 0.83, and 0.81 for predicting the status of MSI, CIN, and HM tumors, respectively, on the TCGA cohort, which is significantly higher than any other currently published method on that cohort. We also report prediction of the status of the CIMP pathway (CIMP-High and CIMP-Low) from H&E slides, with an AUROC of 0.79. A key feature of the proposed method is that it enables a systematic, data-driven analysis of the histological features associated with HM tumors and each molecular pathway, via automated quantitative analysis of the cellular composition of the tiles most strongly predictive of the corresponding molecular status. We found that relatively high proportions of tumor-infiltrating lymphocytes and necrosis are strongly associated with HM and MSI, and moderately associated with CIMP-H and genome-stable (GS) cases, whereas relatively high proportions of neoplastic epithelial type 2 (NEP2), mesenchymal, and neoplastic epithelial type 1 (NEP1) cells are associated with CIN cases.
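
For readers less familiar with the metric, AUROC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it by direct pairwise comparison. The labels and scores are toy values made up for illustration, not data from the study, and the function name `auroc` is an assumption.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    labels : iterable of 0/1 ground-truth labels (e.g. MSS = 0, MSI = 1)
    scores : iterable of predicted scores, one per case
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 positive/negative pairs are ordered correctly.
print(auroc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1]))  # 0.75
```

The pairwise form is quadratic in the number of cases, which is acceptable for cohorts of a few hundred slides; a rank-based formulation gives the same value more efficiently.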

Interpretation Automated, high-accuracy prediction of genetic pathways and key mutations from image analysis of simple H&E-stained sections can provide time- and cost-effective decision support. This work shows that a deep learning algorithm can mine, in a data-driven manner, both visually recognizable and sub-visual histological patterns associated with molecular pathways and key mutations in CRC.

Funding This study was funded by the UK Medical Research Council (award MR/P015476/1).

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

The research reported in this publication was supported by the UK Medical Research Council (award MR/P015476/1).

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Anonymized scanned whole slide images were retrieved from The Cancer Genome Atlas (TCGA) project through the Genomics Data Commons Portal (https://portal.gdc.cancer.gov/). De-identified pathology images and annotations in the PAIP (Pathology AI Platform) cohort (used as the external validation cohort in this study) were prepared and provided by the Seoul National University Hospital through a grant from the Korea Health Technology R&D Project via the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (grant number: HI18C0316). The organizing committee of the PAIP 2020 Challenge: MSI Prediction in Colorectal Cancer made the PAIP cohort available for this research study, as permitted by its institutional review board (Seoul National University Hospital IRB No. H-1808-035-964).

All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Footnotes

  • Fig. 9 and Fig. 10 improved with the sharper version for a better view; Table 2 updated; Abstract revised.

Data Availability

All images and the associated pathway/mutation status for the TCGA cohort (COAD and READ) used in this study are publicly available at https://portal.gdc.cancer.gov/ and cBioPortal. A link to the TCGA manifest file that can be used to download all images for the TCGA cohort can be found in the Supplementary Materials document. The ground truth labels of TCGA-CRC-DX for HMD/LMD, MSI/MSS, CIN/GS, and CIMP-H/L were obtained from Liu et al. A link to the spreadsheet containing the corresponding clinical and molecular data, including cancer stages, subtypes, and the status of mutations and pathways, can also be found in the Supplementary Materials. De-identified pathology images and annotations from the Pathology AI Platform (PAIP), used in this study with institutional permissions, can be obtained via appropriate data access requests through the URL: http://www.wisepaip.org/paip.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted February 04, 2021.
Subject Area

  • Pathology