Demystifying AI in healthcare

Laure Wynants; Luc J M Smits; Ben Van Calster

doi:10.1136/bmj.m3505

Editorials

BMJ 2020; 370 doi: https://doi.org/10.1136/bmj.m3505 (Published 09 September 2020) Cite this as: BMJ 2020;370:m3505

Linked RMR

Guidelines for clinical trial protocols for interventions involving artificial intelligence

Linked RMR

Reporting guidelines for clinical trial reports for interventions involving artificial intelligence

Laure Wynants, assistant professor¹ ² ³,
Luc J M Smits, professor¹,
Ben Van Calster, associate professor² ³ ⁴

¹Department of Epidemiology, CAPHRI Care and Public Health Research Institute, Maastricht University, Maastricht, Netherlands
²Department of Development and Regeneration, KU Leuven, Leuven, Belgium
³EPI-Centre, KU Leuven, Leuven, Belgium
⁴Department of Biomedical Data Sciences, Leiden University Medical Centre, Leiden, Netherlands

Correspondence to: L Wynants laure.wynants@maastrichtuniversity.nl

Well conducted and transparently reported trials would be an excellent start

In academia and society at large, attention on artificial intelligence (AI) in healthcare is tremendous. Although many researchers and commentators claim that AI improves screening, diagnosis, and prognostication, those who delve deeper will notice a scarcity of external validation studies and randomised controlled trials evaluating the true impact of AI on healthcare.1 2 3 Findings from the few published randomised controlled trials are mixed. In one trial, endoscopy assisted by an automatic AI detection system found more colorectal adenomas than did unassisted endoscopy.4 In another, an AI platform for diagnosing childhood cataracts was less accurate than a senior consultant.5 To gauge the quality of such evidence, readers need a detailed account of study methods and results. Systematic reviews, however, show that studies on AI are often poorly reported.2 6

Reporting guidelines

New extensions of the SPIRIT (Standard Protocol Items: Recommendations for Interventional Trials) (doi:10.1136/bmj.m3210) and CONSORT (Consolidated Standards of Reporting Trials) (doi:10.1136/bmj.m3164) reporting guidelines, published in The BMJ, encourage authors to be transparent and comprehensive when writing protocols for trials that evaluate AI interventions,7 and when reporting the results of such trials.8 They cover important issues specific to AI interventions, such as specifying the level of expertise required for researchers interacting with the study’s AI (for example, to identify a region of interest on an image, or to translate AI output into clinical decisions). The operational requirements for integrating AI into the study’s clinical setting also must be clear, as well as any need to fine tune an AI algorithm using data from the local environment.

We can anticipate a positive effect of these reporting guidelines on the quality (and perhaps quantity) of trial reports in this rapidly developing area. Registering a trial protocol improves transparency and discourages research practices that might yield misleading results, such as switching the primary outcome after the results are known.9 Similarly, empirical research suggests that the CONSORT guidelines have improved the quality of reporting, but that reporting remains suboptimal.10 11 Funders, scientific publishers, and peer reviewers have an important responsibility to enforce protocol registration and the adoption of appropriate guidelines.11

But even a transparently reported study can lead to misguided conclusions if the trial is poorly designed, if it targets an inappropriate primary outcome, or if the AI system is not well embedded in the clinician’s digital environment and workflow. In addition, owing to the difficulty and cost of running randomised controlled trials, it is important to evaluate the performance of AI algorithms in external validation studies first.1 2 3

One example of a primary outcome that could lead to unjustified claims about AI’s benefits is the number of detected cases in a trial comparing clinicians’ diagnostic performance with or without AI support. Such a trial is likely to show that AI helps detect more cases, even if the AI’s alerts are completely random. A balanced evaluation must weigh up the increase in detected cases against the risk of false alerts.
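The point about random alerts can be illustrated with a small simulation. This is a minimal sketch under assumptions that are ours, not the editorial's: the function name, the prevalence, the clinician sensitivity, and the alert rate are all hypothetical, alerts fire independently of disease status, and every alert is assumed to trigger a work-up that reveals disease whenever it is present. For simplicity the same simulated cohort is assessed with and without alerts, rather than in two randomised arms.

```python
import random


def simulate_detection(n_patients=10_000, prevalence=0.10,
                       clinician_sensitivity=0.70, alert_rate=0.30, seed=0):
    """Count detected cases with and without random AI alerts.

    Illustrative assumptions (not from the editorial): alerts fire at
    random, independently of disease status, and every alert triggers a
    work-up that reveals disease whenever it is present.
    """
    rng = random.Random(seed)
    detected_unassisted = 0
    detected_assisted = 0
    false_alerts = 0
    for _ in range(n_patients):
        diseased = rng.random() < prevalence
        # The clinician alone finds a true case with fixed probability.
        found_by_clinician = diseased and rng.random() < clinician_sensitivity
        # The "AI" is pure noise: it ignores the patient entirely.
        alerted = rng.random() < alert_rate
        if found_by_clinician:
            detected_unassisted += 1
        if diseased and (found_by_clinician or alerted):
            detected_assisted += 1
        if alerted and not diseased:
            false_alerts += 1
    return {
        "detected_unassisted": detected_unassisted,
        "detected_assisted": detected_assisted,
        "false_alerts": false_alerts,
    }


if __name__ == "__main__":
    print(simulate_detection())
```

Under these assumptions the assisted count can never fall below the unassisted count, so a primary outcome of detected cases alone would favour the random alerts. The `false_alerts` count shows what that apparent gain costs.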

Another example of the potential for misleading results is a trial of a very accurate AI system that has poor user adherence as a result of the way it is embedded in the clinician’s environment. Poor adherence might be an important reason why clinical decision support systems have largely failed to improve patient health or reduce healthcare costs in trials.12 13 Factors that have been shown to improve outcomes associated with clinical decision support systems include user friendliness, involving stakeholders in implementation, and using systems that give actionable recommendations, nudge users to comply (for example, by asking for a reason to overrule a recommendation), and target clinicians and patients simultaneously in a shared decision making context.12 13

Reporting harm

Similar to the monitoring of drug side effects, AI errors and other associated harms must be monitored and reported—both during trials and later in clinical practice. The new CONSORT and SPIRIT extensions encourage transparent reporting of errors, such as errors in diagnosing rare tumour subtypes or diagnostic errors in certain population subgroups.

One particularly worrying type of error arises from underrepresentation of minorities in the training data for AI systems—such as an application for detecting melanoma that is trained only on white skin. Another is the replication of social biases such as delayed lung cancer diagnosis in patients of low socioeconomic status.14 15 By mechanisms such as these, AI replicates and could even exacerbate health inequities. This is particularly harmful when an AI system is wrongly perceived as objective and free from bias. Using large and diverse samples that allow subgroup analyses provides an opportunity to tackle these problems.
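As a rough illustration of what a subgroup analysis involves, the sketch below computes an AI system's sensitivity separately for each subgroup. The function name and the record layout (subgroup label, true disease status, AI result) are hypothetical, and the toy records are invented purely to show the calculation. They are not data from any study.

```python
from collections import defaultdict


def sensitivity_by_subgroup(records):
    """Return the AI's sensitivity within each subgroup.

    records: iterable of (subgroup, has_disease, ai_positive) tuples.
    Only patients with disease contribute to sensitivity.
    """
    true_positives = defaultdict(int)
    diseased = defaultdict(int)
    for subgroup, has_disease, ai_positive in records:
        if has_disease:
            diseased[subgroup] += 1
            if ai_positive:
                true_positives[subgroup] += 1
    return {group: true_positives[group] / diseased[group] for group in diseased}


if __name__ == "__main__":
    # Invented toy records, for illustration only.
    toy_records = [
        ("A", True, True), ("A", True, True), ("A", True, False), ("A", True, True),
        ("B", True, True), ("B", True, False),
        ("B", False, False),
    ]
    print(sensitivity_by_subgroup(toy_records))
```

A pooled sensitivity would hide the gap between the two groups. With small subgroups the estimates are also imprecise, which is why large and diverse samples matter.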

Despite the above considerations, we have an exciting new era to look forward to, in which the true potential of AI will gradually emerge. Sceptics might become enthusiasts, enthusiasts might be disappointed. But whatever happens, well designed trials, registered and published protocols, and transparent reporting will help ensure that a nuanced appraisal of all AI interventions is based on robust evidence instead of fears or aspirations.

Footnotes

Research methods and reporting: doi:10.1136/bmj.m3210; doi:10.1136/bmj.m3164
Competing interests: The BMJ has judged that there are no disqualifying financial ties to commercial companies. The authors declare the following other interests: LW and BVC acknowledge funding from Internal Funds KU Leuven, KOOR, and the COVID-19 Fund, and Research Foundation-Flanders (FWO), all unrelated to the current editorial. LW and LJMS acknowledge funding from ZonMw, unrelated to the current editorial.
Provenance and peer review: Commissioned; not peer reviewed.

References

1. Wiens J, Saria S, Sendak M, et al. Do no harm: a roadmap for responsible machine learning for health care. Nat Med 2019;25:1337-40. doi:10.1038/s41591-019-0548-6 pmid:31427808
2. Nagendran M, Chen Y, Lovejoy CA, et al. Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies. BMJ 2020;368:m689. doi:10.1136/bmj.m689 pmid:32213531
3. Van Calster B, Wynants L, Timmerman D, Steyerberg EW, Collins GS. Predictive analytics in health care: how can we know it works? J Am Med Inform Assoc 2019;26:1651-4. doi:10.1093/jamia/ocz130 pmid:31373357
4. Wang P, Berzin TM, Glissen Brown JR, et al. Real-time automatic detection system increases colonoscopic polyp and adenoma detection rates: a prospective randomised controlled study. Gut 2019;68:1813-9. doi:10.1136/gutjnl-2018-317500 pmid:30814121
5. Lin H, Li R, Liu Z, et al. Diagnostic Efficacy and Therapeutic Decision-making Capacity of an Artificial Intelligence Platform for Childhood Cataracts in Eye Clinics: A Multicentre Randomized Controlled Trial. EClinicalMedicine 2019;9:52-9. doi:10.1016/j.eclinm.2019.03.001 pmid:31143882
6. Wynants L, Van Calster B, Collins GS, et al. Prediction models for diagnosis and prognosis of covid-19 infection: systematic review and critical appraisal. BMJ 2020;369:m1328. doi:10.1136/bmj.m1328 pmid:32265220
7. Rivera SC, Liu X, Chan A-W, Denniston AK, Calvert MJ, SPIRIT-AI and CONSORT-AI Working Group. Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI Extension. BMJ 2020;370:m3210.
8. Liu X, Rivera SC, Moher D, Calvert MJ, Denniston AK, SPIRIT-AI and CONSORT-AI Working Group. Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI Extension. BMJ 2020;370:m3164.
9. Odutayo A, Altman DG, Hopewell S, Shakir M, Hsiao AJ, Emdin CA. Reporting of a Publicly Accessible Protocol and Its Association With Positive Study Findings in Cardiovascular Trials (from the Epidemiological Study of Randomized Trials [ESORT]). Am J Cardiol 2015;116:1280-3. doi:10.1016/j.amjcard.2015.07.046 pmid:26282722
10. Moher D, Jones A, Lepage L, CONSORT Group (Consolidated Standards for Reporting of Trials). Use of the CONSORT statement and quality of reports of randomized trials: a comparative before-and-after evaluation. JAMA 2001;285:1992-5. doi:10.1001/jama.285.15.1992 pmid:11308436
11. Vassar M, Jellison S, Wendelbo H, Wayant C, Gray H, Bibens M. Using the CONSORT statement to evaluate the completeness of reporting of addiction randomised trials: a cross-sectional review. BMJ Open 2019;9:e032024.
12. Bright TJ, Wong A, Dhurjati R, et al. Effect of clinical decision-support systems: a systematic review. Ann Intern Med 2012;157:29-43. doi:10.7326/0003-4819-157-1-201207030-00450 pmid:22751758
13. Kawamoto K, Houlihan CA, Balas EA, Lobach DF. Improving clinical practice using clinical decision support systems: a systematic review of trials to identify features critical to success. BMJ 2005;330:765. doi:10.1136/bmj.38398.500764.8F pmid:15767266
14. Parikh RB, Teeple S, Navathe AS. Addressing Bias in Artificial Intelligence in Health Care. JAMA 2019;322:2377-8. doi:10.1001/jama.2019.18058 pmid:31755905
15. Rajkomar A, Hardt M, Howell MD, Corrado G, Chin MH. Ensuring Fairness in Machine Learning to Advance Health Equity. Ann Intern Med 2018;169:866-72. doi:10.7326/M18-1990 pmid:30508424

