1. ABSTRACT
Background and rationale
Knee osteoarthritis (OA) is a common disease characterized by reduced function, stiffness, and pain. The clinical diagnosis is commonly supported by radiography of the weight-bearing knee. Radiographic grading systems, such as the Kellgren-Lawrence (KL) grade, are used as eligibility criteria for clinical studies, while other radiographic features, such as the OARSI grades and minimal joint space width, are used as endpoints for structural OA progression. A higher preoperative KL grade has been correlated with better pain and functional outcomes after knee arthroplasty. Consequently, the KL grade is a common requirement for approval of knee arthroplasty among American health insurance providers, and it is commonly used by orthopedic surgeons when determining candidacy for knee arthroplasty.
Historically, a radiologist was required to annotate and grade knee radiographs to extract these features. With increasing computational power and the growing use of deep convolutional neural networks, off-the-shelf artificial intelligence (AI) tools for automatic extraction of these features have become available. These tools have received regulatory approval for commercialization, but more rigorous external validation is still required. Finally, as AI tools mature, new versions are released, and it is important to assess how each new version changes the performance of the tool.
Objectives
The aim of this analysis is to evaluate the performance of a commercially available AI tool for tibiofemoral OARSI grading, KL grading, and patellar osteophyte classification, as well as the accuracy of its joint space width measurements. Additionally, a change impact analysis will be performed in which the performance of the current version of the AI tool is compared with that of the previous version.
Methods
This study is a secondary analysis of data from the AutoRayValid-RBknee study, a retrospective observer performance study. The dataset consists of non-fixed-flexion radiographs retrieved from the production picture archiving and communication systems (PACS) of three European centers. Root mean square error (RMSE) will be used to estimate the accuracy of minimal and fixed-location joint space width (JSW) measurements. Ordinal receiver operating characteristic (ROC) analysis will be used to estimate the area under the ROC curve (AUC) for ordinal OARSI grade and KL grade classification. The AUC will be used to estimate classification performance for binary OARSI grades and patellar osteophytes.
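The three accuracy measures named above can be sketched in Python as follows. This is an illustrative, non-authoritative sketch. The analysis itself is planned in R, and the function names are hypothetical. The binary AUC assumes that the AI output for each knee can be treated as a numeric score. The ordinal AUC shown is one common pairwise-concordance generalization with ties counted as one half, and it may differ from the exact ordinal ROC estimator used in the study.

```python
import math
from itertools import combinations


def rmse(measured, reference):
    """Root mean square error between paired continuous measurements (e.g. JSW)."""
    if len(measured) != len(reference) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(m - r) ** 2 for m, r in zip(measured, reference)]
    return math.sqrt(sum(squared) / len(squared))


def binary_auc(scores, labels):
    """Mann-Whitney AUC: probability that a positive case (label 1) scores
    higher than a negative case (label 0), with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ordinal_auc(predicted, reference):
    """Assumed ordinal generalization: over all pairs of cases with different
    reference grades, score 1 if the predicted grades are ordered the same
    way, 0.5 if they are tied, and 0 otherwise; return the mean score."""
    total = 0.0
    pairs = 0
    for i, j in combinations(range(len(reference)), 2):
        if reference[i] == reference[j]:
            continue  # equal reference grades carry no ordering information
        pairs += 1
        # orient the pair so that `hi` has the higher reference grade
        hi, lo = (i, j) if reference[i] > reference[j] else (j, i)
        if predicted[hi] > predicted[lo]:
            total += 1.0
        elif predicted[hi] == predicted[lo]:
            total += 0.5
    if pairs == 0:
        raise ValueError("need at least two distinct reference grades")
    return total / pairs
```

For example, `rmse([4.1, 3.0, 5.2], [4.0, 3.4, 5.0])` is about 0.265 in the units of the input. `ordinal_auc([0, 1, 2, 2], [0, 1, 2, 3])` is 5.5/6, about 0.917, because the only imperfect pair is the tie between the knees with reference grades 2 and 3.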
Population
Patients with knee pain referred for radiography on suspicion of knee osteoarthritis.
Index test
RBknee-2.2.0 (CE version; KL grading, OARSI grading, patellar osteophytes) and RBknee-fda-1.0.1 (FDA version; joint space width measurement). RBknee-2.1.0 (previous CE version; KL grading, OARSI grading, patellar osteophytes) will be compared with RBknee-2.2.0 in the change impact analysis of product development.
Reference test
The readers will be three board-certified musculoskeletal radiologists with substantial clinical and research experience. For all discrete variables, the reference value will be their majority vote, with cases in which grades differ by 2 or more arbitrated by consensus. For continuous variables, annotations will be made by a single radiologist trained in the task and reviewed by a board-certified musculoskeletal radiologist with substantial clinical and research experience.
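The reference rule for discrete variables could be expressed as below. This sketch assumes that "grades differ by 2 or more" means the highest and lowest of the three reader grades are at least 2 apart. The function name and the use of `None` to flag a case for consensus arbitration are hypothetical.

```python
from collections import Counter


def reference_grade(grades):
    """Majority grade of three readers, or None when the case needs
    consensus arbitration (assumed: max and min grade are >= 2 apart)."""
    if len(grades) != 3:
        raise ValueError("expected grades from exactly three readers")
    if max(grades) - min(grades) >= 2:
        return None  # hypothetical flag: resolve by consensus reading
    # with a spread of at most one grade, at least two readers agree
    grade, _ = Counter(grades).most_common(1)[0]
    return grade
```

With three readers and a spread of at most one grade, at least two readers necessarily agree. A majority therefore always exists whenever consensus is not required.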
Further statistical details
Sample size
Not applicable, as this is a secondary analysis.
Framework
This is a diagnostic test accuracy study assessing the performance of a commercially available AI tool for radiographic evaluation of knee osteoarthritis according to established grading systems. Additionally, a change impact analysis will be performed where multiple versions of the AI tool are available.
Confidence intervals and P values
All confidence intervals will be 95% intervals, and P values will be evaluated at a significance level (alpha) of 5%.
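The interval method is not specified in this plan. As one possible way of obtaining a 95% confidence interval (alpha = 5%), for example for the AUC difference between two versions of the tool in the change impact analysis, a paired percentile bootstrap is sketched below. The bootstrap itself, the number of resamples, the seed, and the function names are assumptions for illustration only.

```python
import random


def binary_auc(scores, labels):
    """Mann-Whitney AUC: probability that a positive case scores higher
    than a negative case, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def paired_bootstrap_auc_diff_ci(scores_a, scores_b, labels,
                                 n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap CI for AUC(version a) - AUC(version b).

    Cases (knees) are resampled with replacement, and the same resample is
    used for both versions, which preserves the pairing between them."""
    if 0 not in labels or 1 not in labels:
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        lab = [labels[i] for i in idx]
        if 0 not in lab or 1 not in lab:
            continue  # AUC is undefined without both classes; draw again
        auc_a = binary_auc([scores_a[i] for i in idx], lab)
        auc_b = binary_auc([scores_b[i] for i in idx], lab)
        diffs.append(auc_a - auc_b)
    diffs.sort()
    k = int(round(alpha / 2 * n_boot))  # resamples cut from each tail
    return diffs[k], diffs[n_boot - 1 - k]
```

Resampling knees jointly for both versions reflects that the versions are evaluated on the same radiographs. An unpaired resampling scheme would ignore that correlation.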
Multiplicity
No explicit multiplicity correction will be performed. Instead, a hierarchical approach will be taken, in which hypotheses are tested in the order in which they are tabulated.
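One standard reading of a hierarchical approach based on tabulated order is fixed-sequence testing. It is sketched below under that assumption: each hypothesis is tested at the full alpha in the pre-specified order, and once one hypothesis is not rejected, none of the later ones are rejected. The function name is hypothetical.

```python
def fixed_sequence_test(p_values, alpha=0.05):
    """Fixed-sequence (hierarchical) testing.

    p_values must be given in the pre-specified (tabulated) order. Each
    hypothesis is tested at the full alpha; testing stops at the first
    non-rejection, and all subsequent hypotheses are also not rejected."""
    decisions = []
    for p in p_values:
        if p >= alpha:
            break  # first non-significant result ends the sequence
        decisions.append(True)
    # hypotheses at and after the stopping point are not rejected
    return decisions + [False] * (len(p_values) - len(decisions))
```

For example, P values of 0.01, 0.03, 0.20 and 0.001 in tabulated order lead to rejection of only the first two hypotheses, even though the fourth P value is small.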
Statistical software
R version 4.2.2 (or newer).
Competing Interest Statement
One author, Mikael Boesen, is a medical advisor for and shareholder of Radiobotics ApS.
Funding Statement
This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 954221 for the EIC SME Instrument project AutoRay. The work only reflects the authors' view and the European Commission is not responsible for any use that may be made of the information it contains.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The Danish Patient Safety Authority waived ethical approval for this work. The IRB of Charité – Universitätsmedizin Berlin waived ethical approval for this work. The IRB of Erasmus Medical Center waived ethical approval for this work.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
All data produced in the present study are available upon reasonable request to the corresponding author.