RT Journal Article SR Electronic T1 PathProfiler: Automated Quality Assessment of Retrospective Histopathology Whole-Slide Image Cohorts by Artificial Intelligence – A Case Study for Prostate Cancer Research JF medRxiv FD Cold Spring Harbor Laboratory Press SP 2021.09.24.21263762 DO 10.1101/2021.09.24.21263762 A1 Maryam Haghighat A1 Lisa Browning A1 Korsuk Sirinukunwattana A1 Stefano Malacrino A1 Nasullah Khalid Alham A1 Richard Colling A1 Ying Cui A1 Emad Rakha A1 Freddie C. Hamdy A1 Clare Verrill A1 Jens Rittscher YR 2021 UL http://medrxiv.org/content/early/2021/09/27/2021.09.24.21263762.abstract AB Research using whole slide images (WSIs) of scanned histopathology slides for the development of artificial intelligence (AI) algorithms has increased exponentially over recent years. Glass slides from large retrospective cohorts with patient follow-up data are digitised for the development and validation of AI tools. Such resources, therefore, become very important, with the need to ensure that their quality is of the standard necessary for downstream AI development. However, manual quality control of such large cohorts of WSIs by visual assessment is unfeasible, and whilst quality control AI algorithms exist, these focus on bespoke aspects of image quality, e.g. focus, or use traditional machine-learning methods such as hand-crafted features, which are unable to classify the range of potential image artefacts that should be considered. In this study, we have trained and validated a multi-task deep neural network to automate the process of quality control of a large retrospective cohort of prostate cases from which glass slides have been scanned several years after production, to determine both the usability of the images for research and the common image artefacts present. Using a two-layer approach, quality overlays of WSIs were generated from a quality assessment undertaken at patch-level at 5X magnification. 
From these quality overlays the slide-level quality scores were predicted and then compared to those generated by three specialist urological pathologists, with a Pearson correlation of 0.89 for overall ‘usability’ (at a diagnostic level), and 0.87 and 0.82 for focus and H&E staining quality scores respectively. We subsequently applied our quality assessment pipeline to the TCGA prostate cancer cohort and to a colorectal cancer cohort, for comparison. Our model, designated as PathProfiler, indicates comparable predicted usability of images from the cohorts assessed (86-90%) and, perhaps more significantly, is able to predict WSIs that could benefit from re-scanning or re-staining for quality improvement. We have shown in this study that AI can be used to automate the process of quality control of large retrospective cohorts to maximise research outputs and conclusions. Competing Interest Statement: JR and KS are co-founders of Ground Truth Labs. PathLAKE has received in kind industry investment from Philips. University of Oxford, Oxford University Hospitals NHS Foundation Trust and University of Nottingham are part of the PathLAKE consortium. CV is the principal investigator of a study evaluating Paige Prostate. Funding Statement: This paper is supported by the PathLAKE Centre of Excellence for digital pathology and AI which is funded from the Data to Early Diagnosis and Precision Medicine strand of the government's Industrial Strategy Challenge Fund, managed and delivered by Innovate UK on behalf of UK Research and Innovation (UKRI). PathLAKE funding reference: 104689 / Application number: 18181. The national ProMPT (Prostate cancer Mechanisms of Progression and Treatment) collaborative (grant G0500966/75466) supported sample collections. The University of Oxford sponsors ProMPT. 
We acknowledge the contribution to this study made by the Oxford Centre for Histopathology Research and the Oxford Radcliffe Biobank (ORB) which is supported by the National Institute for Health Research (NIHR) Oxford Biomedical Research Centre. Computation used the Oxford Biomedical Research Computing (BMRC) facility, a joint development between the Wellcome Centre for Human Genetics and the Big Data Institute supported by Health Data Research UK and the NIHR Oxford Biomedical Research Centre. JR is adjunct professor of the Ludwig Oxford Branch and the Wellcome Centre for Human Genetics. CV, LB and FCH are part funded by the National Institute for Health Research (NIHR) Oxford Biomedical Research Centre (BRC). Funding is via the Molecular Diagnostics Theme. The views expressed are those of the author(s) and not necessarily those of the PathLAKE Consortium members, the NHS, Innovate UK, UKRI, the NIHR or the Department of Health. The results shown here are in whole or part based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga. Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The retrospective study was conducted under the ProMPT ethics (reference MREC 01/4/61). The prospective study was conducted under the Pathology image data Lake for Analytics, Knowledge and Education (PathLAKE) research ethics committee approval (reference 19/SC/0363) and Oxford Radcliffe Biobank research ethics committee approval (reference 19/SC/0173). Patients were not identifiable from the material. 
The research was performed in accordance with the Declaration of Helsinki. All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes. The datasets generated during and/or analysed during the current study are not publicly available due to the terms of the PathLAKE Consortium Agreement and other agreements in place, but a subset of the data could be made available via the corresponding author on reasonable request. The software is open source: https://github.com/MaryamHaghighat/PathProfiler