@article {Chan2020.06.01.20119552, author = {Lili Chan and Girish N. Nadkarni and Fergus Fleming and James R. McCullough and Patti Connolly and Gohar Mosoyan and Fadi El Salem and Michael W. Kattan and Joseph A. Vassalotti and Barbara Murphy and Michael J. Donovan and Steven G. Coca and Scott Damrauer}, title = {Derivation and validation of a machine learning risk score using biomarker and electronic patient data to predict rapid progression of diabetic kidney disease}, elocation-id = {2020.06.01.20119552}, year = {2020}, doi = {10.1101/2020.06.01.20119552}, publisher = {Cold Spring Harbor Laboratory Press}, abstract = {Importance: Diabetic kidney disease (DKD) is the leading cause of kidney failure in the United States, and predicting progression is necessary for improving outcomes. Objective: To develop and validate a machine-learned, prognostic risk score (KidneyIntelX{\texttrademark}) combining data from electronic health records (EHR) and circulating biomarkers to predict DKD progression. Design: Observational cohort study. Setting: Two EHR-linked biobanks, the Mount Sinai BioMe Biobank and the Penn Medicine Biobank. Participants: Patients with prevalent DKD (G3a-G3b with all grades of albuminuria (A1-A3) and G1 \& G2 with A2-A3 level albuminuria) and banked plasma. Main outcomes and measures: Plasma biomarkers soluble tumor necrosis factor receptor 1/2 (sTNFR1, sTNFR2) and kidney injury molecule-1 (KIM-1) were measured at baseline. Patients were divided into derivation (60\%) and validation (40\%) sets. The composite primary end point was progressive decline in kidney function, comprising rapid kidney function decline (RKFD) (estimated glomerular filtration rate (eGFR) decline of >=5 ml/min/1.73 m2/year), >=40\% sustained decline, or kidney failure within 5 years. 
A machine learning model (random forest) was trained and its performance assessed using standard metrics. Results: In 1146 patients with DKD, the median age was 63 years, 51\% were female, median baseline eGFR was 54 ml/min/1.73 m2, urine albumin to creatinine ratio (uACR) was 61 mg/g, and follow-up was 4.3 years. 241 patients (21\%) experienced progressive decline in kidney function. On 10-fold cross-validation in the derivation set (n=686), the risk model had an area under the curve (AUC) of 0.77 (95\% CI 0.74-0.79). In validation (n=460), the AUC was 0.77 (95\% CI 0.76-0.79). By comparison, the AUC for an optimized clinical model was 0.62 (95\% CI 0.61-0.63) in derivation and 0.61 (95\% CI 0.60-0.63) in validation. Using cutoffs from derivation, KidneyIntelX stratified 46\%, 37\% and 16.5\% of the validation cohort into low-, intermediate- and high-risk groups, with a positive predictive value (PPV) of 62\% (vs. PPV of 37\% for the clinical model and 40\% for KDIGO; p $<$ 0.001) in the high-risk group and a negative predictive value (NPV) of 91\% in the low-risk group. The net reclassification index for events into the high-risk group was 41\% (p $<$ 0.05). Conclusions and Relevance: A machine-learned model combining plasma biomarkers and EHR data improved prediction of progressive decline in kidney function within 5 years over KDIGO and standard clinical models in patients with early DKD. Competing Interest Statement: GNN, MD, and SGC receive financial compensation as consultants and advisory board members for RenalytixAI, and own equity in RenalytixAI. GNN and SGC are scientific co-founders of RenalytixAI. GM, MWK and JAV are consultants for RenalytixAI. FF and JRM are Executive Directors and BM is a Non-Executive Director of RenalytixAI. SGC has received consulting fees from CHF Solutions, Relypsa, Bayer, Boehringer Ingelheim, and Takeda Pharmaceuticals in the past three years. 
GNN has received operational funding from Goldfinch Bio and consulting fees from BioVie Inc, AstraZeneca, Reata and GLG consulting in the past three years. This research was supported by RenalytixAI. GNN is supported by a career development award from the National Institutes of Health (NIH) (K23DK107908) and is also supported by R01DK108803, U01HG007278, U01HG009610, and 1U01DK116100. SGC and GNN are members of, and are supported in part by, the Chronic Kidney Disease Biomarker Consortium (U01DK106962). SGC is also supported by the following grants: R01DK106085, R01HL85757, R01DK112258, and U01OH011326. Funding Statement: This project was funded by RenalytixAI plc. 
Author Declarations: The study protocol was approved by institutional review boards at both Icahn School of Medicine at Mount Sinai and University of Pennsylvania; all participants had provided broad written informed consent for research and were not specifically compensated for participation in the current study. Data is available on request and subject to institutional approvals due to patient health information.}, URL = {https://www.medrxiv.org/content/early/2020/06/11/2020.06.01.20119552}, eprint = {https://www.medrxiv.org/content/early/2020/06/11/2020.06.01.20119552.full.pdf}, journal = {medRxiv} }