PT - JOURNAL ARTICLE
AU - McKinney, Scott Mayer
TI - Comparing human and AI performance in medical machine learning: An open-source Python library for the statistical analysis of reader study data
AID - 10.1101/2022.05.06.22274773
DP - 2022 Jan 01
TA - medRxiv
PG - 2022.05.06.22274773
4099 - http://medrxiv.org/content/early/2022/05/07/2022.05.06.22274773.short
4100 - http://medrxiv.org/content/early/2022/05/07/2022.05.06.22274773.full
AB - In seeking to understand the potential effects of artificial intelligence (AI) on the practice of diagnostic medicine, many investigations involve collecting interpretations from several human experts on a common set of cases. In an effort to standardize the process of analyzing the data emerging from such studies, we have released an open-source Python library to perform applicable statistical procedures. The software implements the industry-standard Obuchowski-Rockette-Hillis (ORH) method for multi-reader multi-case (MRMC) studies. The tools can be used to compare a standalone algorithm against a panel of readers, or compare readers operating in two modalities (for example, with and without algorithmic assistance). The software supports both nonequivalence and noninferiority tests. Functions are also provided to simulate reader and model scores, useful for Monte Carlo power analysis.
The code is publicly available in our GitHub repository at https://github.com/Google-Health/google-health/tree/master/analysis.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: This work was funded by Google.

reader: a human annotator, usually an expert trained to interpret medical signals or images and identify signs of pathology.
case: a single unit of analysis in a reader study. It usually corresponds to the data (e.g., a collection of images) from one patient, associated with a known disease status, which readers attempt to assess.
modality: one of typically two arms in a multi-reader multi-case study. Example modality pairs are (CT, MRI); (film X-ray, digital X-ray); (fundus imaging, OCT); (assisted read, unassisted read).
assisted read study: a multi-reader multi-case study in which readers interpret cases with and without computer assistance (e.g., from a machine learning algorithm). The two reading conditions constitute different “modalities.”
standalone evaluation: a study in which the output of a machine learning algorithm is compared to the judgments from a panel of readers.
figure of merit: any valid measure of performance, usually defined on the set of suspicion scores and ground truth labels. Examples are AUC-ROC and accuracy.
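
To make the notion of a figure of merit concrete, the following is a minimal sketch in plain Python that computes AUC-ROC and accuracy from suspicion scores and ground truth labels, first for each reader in a small panel and then for a standalone model on the same cases. It does not use the released library and is not its API: the function names, the toy scores, and the decision threshold are all hypothetical and chosen only for illustration. The final difference is a point estimate only; an MRMC analysis such as the ORH method is what would attach a test and confidence interval to it.

```python
from typing import Sequence


def auc_roc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Empirical AUC-ROC: the fraction of (positive, negative) case pairs in
    which the positive case gets the higher score, counting ties as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("Need at least one positive and one negative case.")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(scores: Sequence[float], labels: Sequence[int],
             threshold: float) -> float:
    """Fraction of cases classified correctly when a score at or above
    `threshold` is read as a positive call."""
    correct = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return correct / len(labels)


# Hypothetical toy data: six cases, 1 = disease present, 0 = disease absent.
labels = [1, 1, 1, 0, 0, 0]

# Hypothetical suspicion scores on a 1-5 scale from a two-reader panel.
reader_scores = {
    "reader_a": [5, 4, 2, 1, 3, 1],
    "reader_b": [4, 3, 2, 2, 3, 1],
}

# Hypothetical continuous scores from a standalone model on the same cases.
model_scores = [0.9, 0.7, 0.3, 0.2, 0.4, 0.1]

# Figure of merit per reader, then averaged over the panel.
reader_aucs = {name: auc_roc(s, labels) for name, s in reader_scores.items()}
mean_reader_auc = sum(reader_aucs.values()) / len(reader_aucs)

# Standalone evaluation: model figure of merit minus the reader average.
model_auc = auc_roc(model_scores, labels)
auc_difference = model_auc - mean_reader_auc

# Accuracy needs an operating point; 3 is an arbitrary illustrative cutoff.
reader_a_accuracy = accuracy(reader_scores["reader_a"], labels, threshold=3)

print(f"Reader AUCs: {reader_aucs}")
print(f"Mean reader AUC: {mean_reader_auc:.3f}")
print(f"Model AUC: {model_auc:.3f}")
print(f"Difference (model - readers): {auc_difference:.3f}")
print(f"Reader A accuracy at threshold 3: {reader_a_accuracy:.3f}")
```

AUC-ROC is threshold-free and so can be computed directly from ordinal reader ratings and continuous model scores alike, whereas accuracy requires choosing an operating point for each scorer.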