RT Journal Article
SR Electronic
T1 Predicting Total Knee Replacement in Knee Osteoarthritis Using a Machine-Learning–Guided Approach in patients of the Osteoarthritis Initiative (OAI)
JF medRxiv
FD Cold Spring Harbor Laboratory Press
SP 2025.10.28.25338966
DO 10.1101/2025.10.28.25338966
A1 Blanco, Francisco J.
A1 Oreiro, Natividad
A1 Vázquez-García, Jorge
A1 Morano-Torres, Antonio
A1 Balboa-Barreiro, Vanesa
A1 Rodríguez-Valle, Isabel
A1 Relaño, Sara
A1 Veronese, Nicola
A1 de Andrés, María C.
A1 Rego-Pérez, Ignacio
YR 2025
UL http://medrxiv.org/content/early/2025/10/29/2025.10.28.25338966.abstract
AB Objective: To develop a pragmatic model to predict total knee replacement (TKR) in knee osteoarthritis (OA) using non-imaging clinical, genetic, and lifestyle data with machine-learning (ML)–guided feature selection.
Methods: We analyzed 3,790 Osteoarthritis Initiative (OAI) participants. Nested ML feature selection on the training set identified 15 informative variables. Classifiers were benchmarked, then a multivariable logistic regression was fit on the full cohort. Performance was summarized by discrimination (AUC with 95% CI) and calibration (Brier score). To assess the incremental value of genetics, we refit an otherwise identical Clinical model excluding the polygenic risk score (PRS) and compared specificity at fixed sensitivities using Bonferroni-adjusted McNemar tests. A pre-specified analysis examined performance by baseline Kellgren–Lawrence (KL) grade (KL 0–1 vs KL ≥2).
Results: On the test set, classifier AUCs ranged 0.716–0.748, with Elastic Net and XGBoost performing best. The final logistic model fit on the full cohort achieved AUC 0.765 (95% CI 0.736–0.793) with acceptable calibration (Brier 0.097). Performance remained robust by disease stage, with higher discrimination in pre-radiographic knees (KL 0–1: AUC 0.827) and moderate discrimination in KL ≥2 (AUC 0.720); decile plots indicated broadly aligned observed vs predicted risks.
PRS added modest, statistically significant gains in specificity at several fixed sensitivities without materially changing AUC.
Conclusions: We present a pragmatic, non-imaging, ML-informed model that predicts TKR with clinically acceptable discrimination and calibration using routinely collected data. This framework provides a practical basis for individualized risk stratification and decision support without reliance on imaging.
What is already known on this topic: Risk of total knee replacement (TKR) in knee osteoarthritis (OA) is multifactorial, and many existing models depend on imaging markers such as Kellgren–Lawrence grade or MRI findings. Established non-imaging predictors include symptoms and function (WOMAC), age, BMI, knee alignment, and prior injury. Genetic scores have been explored in OA but, to date, have shown limited standalone utility compared with routine clinical factors.
What this study adds: This study presents a clinic-friendly, non-imaging prediction model guided by a transparent ML pipeline (nested random-forest feature selection with in-fold preprocessing and SMOTE, repeated cross-validation, and SHAP-based interpretation) that achieves acceptable discrimination and calibration in the OAI cohort. It reinforces the relevance of routine clinical factors, identifies an inverse association between Mediterranean-diet adherence and TKR risk, and evaluates the incremental, though limited, contribution of genetic risk via a polygenic risk score (PRS), with a signal that persists in pre-radiographic knees despite few events.
How this study might affect research, practice or policy: The model offers a practical pathway for risk stratification where imaging is unavailable or costly, supporting shared decision-making and prioritization of follow-up.
It encourages precision-medicine workflows that integrate clinical and genetic information cautiously and transparently, and it sets clear directions for future work: external validation across settings, assessment in early-stage OA populations, and refinement of genetic predictors before any policy or guideline incorporation.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: This study has been funded by Instituto de Salud Carlos III (ISCIII) through the projects RD21/0002/0009, RD24/0007/0026, PMP22/00101, PMPTA22/00115, PI17/00210, PI22/01165, PI22/01155 and PI23/00913 and co-funded by the European Union. This work was also funded by grants IN607A 2021/07 and IN607D 2021/13 from Axencia Galega de Innovacion-Xunta de Galicia. IRP is supported by Contrato Miguel Servet-II Fondo de Investigacion Sanitaria (CPII17/00026) SERGAS-stabilized. JVG is supported by grant IN606A 2022/048 from Xunta de Galicia, Spain.
Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes.
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: We used data from the Osteoarthritis Initiative (OAI), a well-characterized prospective cohort of knee OA patients with publicly available data and biospecimens (https://nda.nih.gov/oai).
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes.
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov.
I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes.
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes.
Data are available upon reasonable request.
Abbreviations: AUC, Area Under the Curve; aMED, Alternate Mediterranean Diet Score; BMI, Body Mass Index; CI, Confidence Interval; dbGAP, Database of Genotypes and Phenotypes; GBM, Gradient Boosting Machine; GeCKO, Genetic Components of Knee Osteoarthritis; glmnet, Generalized Linear Model Net; GWAS, Genome-Wide Association Study; KL, Kellgren & Lawrence; MAF, Minor Allele Frequency; MD, Mediterranean Diet; ML, Machine Learning; MRI, Magnetic Resonance Imaging; mtDNA, Mitochondrial Deoxyribonucleic Acid; NSAIDs, Non-Steroidal Anti-Inflammatory Drugs; OA, Osteoarthritis; OAI, Osteoarthritis Initiative; OR, Odds Ratio; PCA, Principal Component Analysis; PRS, Polygenic Risk Score; QC, Quality Control; RF, Random Forest; ROC, Receiver Operating Characteristic; SHAP, SHapley Additive exPlanations; SNP, Single Nucleotide Polymorphism; SMOTE, Synthetic Minority Oversampling Technique; SVM, Support Vector Machine; TKR, Total Knee Replacement; VCF, Variant Call Format; WOMAC, Western Ontario and McMaster Universities Osteoarthritis Index; XGBoost, Extreme Gradient Boosting.