PT - JOURNAL ARTICLE
AU - J.J.J. Condon
AU - L. Oakden-Rayner
AU - K.A. Hall
AU - M. Reintals
AU - A. Holmes
AU - G. Carneiro
AU - L.J. Palmer
TI - Replication of an open-access deep learning system for screening mammography: Reduced performance mitigated by retraining on local data
AID - 10.1101/2021.05.28.21257892
DP - 2021 Jan 01
TA - medRxiv
PG - 2021.05.28.21257892
4099 - http://medrxiv.org/content/early/2021/06/01/2021.05.28.21257892.short
4100 - http://medrxiv.org/content/early/2021/06/01/2021.05.28.21257892.full
AB - Aim: To assess the generalisability of a deep learning (DL) system for screening mammography developed at New York University (NYU), USA (1, 2) in a South Australian (SA) dataset. Methods and Materials: Clients with pathology-proven lesions (n=3,160) and age-matched controls (n=3,240) were selected from women screened at BreastScreen SA from January 2010 to December 2016 (n clients=207,691) and split into training, validation and test subsets (70%, 15%, 15% respectively). The primary outcome was area under the curve (AUC), in the SA Test Set 1 (SATS1), differentiating invasive breast cancer or ductal carcinoma in situ (n=469) from age-matched controls (n=490) and benign lesions (n=44). The NYU system was tested statically, after training without transfer learning (TL), after retraining with TL and without (NYU1) and with (NYU2) heatmaps. Results: The static NYU1 model AUCs in the NYU test set (NYTS) and SATS1 were 83.0% (95% CI = 82.4%-83.6%) (2) and 75.8% (95% CI = 72.6%-78.8%), respectively. Static NYU2 AUCs in the NYTS and SATS1 were 88.6% (95% CI = 88.3%-88.9%) (2) and 84.5% (95% CI = 81.9%-86.8%), respectively. Training of NYU1 and NYU2 without TL achieved AUCs in the SATS1 of 65.8% (95% CI = 62.2%-69.1%) and 85.9% (95% CI = 83.5%-88.2%), respectively.
Retraining of NYU1 and NYU2 with TL resulted in AUCs of 82.4% (95% CI = 79.7%-84.9%) and 86.3% (95% CI = 84.0%-88.5%), respectively. Conclusion: We did not fully reproduce the reported performance of NYU on a local dataset; local retraining with TL approximated this level of performance. Optimising models for local clinical environments may improve performance. The generalisation of DL systems to new environments may be challenging. Key Contributions: In this study, the original performance of deep learning models for screening mammography was reduced in an independent clinical population. Deep learning (DL) systems for mammography require local testing and may benefit from local retraining. An openly available DL system approximates human performance in an independent dataset. There are multiple potential sources of reduced deep learning system performance when deployed to a new dataset and population. Competing Interest Statement: JC previously held less than USD$4,000 of shares in Micron Technology Ltd (a computer memory and storage manufacturer) until Sep 2019. Micron did not provide funding for the study and they were not involved in any way. Funding Statement: No external funding was received. Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The current study was approved by the Central Adelaide Local Health Network Institutional Review Board (HREC/16/RAH/229, R20160601), with a waiver of consent for the retrospective use of deidentified clinical data. All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov.
I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes. The BSSA dataset is private and not able to be released. The data that support the results presented will be made available at https://github.com/jamesjjcondon