Data Driven Approach to Characterize Rapid Decline in Autosomal Dominant Polycystic Kidney Disease

Background Autosomal dominant polycystic kidney disease (ADPKD) is a genetic kidney disease with high phenotypic variability. Insights into ADPKD progression could lead to earlier detection and management prior to end stage kidney disease (ESKD). We sought to identify patients with rapid decline (RD) in kidney function and to determine clinical factors associated with RD using a data-driven approach. Methods A retrospective cohort study was performed among patients with incident ADPKD (1/1/2002-12/31/2018). Latent class mixed models were used to identify RD patients using rapidly declining eGFR trajectories over time. Predictors of RD were selected based on agreements among feature selection methods, including logistic, regularized, and random forest modeling. The final model was built on the selected predictors and clinically relevant covariates. Results Among 1,744 patients with incident ADPKD, 125 (7%) were identified as RD. Feature selection included 42 clinical measurements for adaptation with multiple imputations; mean (SD) eGFR was 85.2 (47.3) and 72.9 (34.4) in the RD and non-RD groups, respectively. Multiple imputed datasets identified variables as important features to distinguish RD and non-RD groups with the final prediction model determined as a balance between area under the curve (AUC) and clinical relevance which included 6 predictors: age, sex, hypertension, cerebrovascular disease, hemoglobin, and proteinuria. Results showed 72%-sensitivity, 70%-specificity, 70%-accuracy, and 0.77-AUC in identifying RD. 5-year ESKD rates were 38% and 7% among RD and non-RD groups, respectively. Conclusion Using real-world routine clinical data among patients with incident ADPKD, we observed that six variables highly predicted RD in kidney function.


Background
2][3][4] It is the fourth leading cause of end stage kidney disease (ESKD) accounting for 5% of ESKD in the United States. 2,5,6 Te natural progression of the disease is characterized by pathogenic genetic mutations that lead to fluid filled cysts resulting in irreversible damage to kidney parenchyma and loss of kidney function. 7,8 DPKD is genetically heterogenous and is caused by pathogenic mutations most commonly in the PKD1 (85% of cases) and PKD2 (15% of cases) genes.0][11][12][13] While 50% of patients progress to ESKD by their sixth decade of life, the decline in eGFR typically occurs rapidly later in adulthood due to initial compensatory glomerular hyperfiltration.[16] Current methods to identify rapid progression in ADPKD have relied on resource intensive prognostic approaches and tools.One of the first methods was developed using data from the Consortium for Radiologic Imaging Studies of Polycystic Kidney Disease (CRISP) cohort and Mayo database.Mayo Imaging Classification (MIC) necessitates imaging to determine height adjusted total kidney volume (htTKV), along with age, race, and sex to predict time to decline in eGFR. 7While race is used to calculate eGFR, it is not factored into determining risk of progression to ESKD.The Predicting Renal Outcome in PKD (PROPKD) score uses genotype, sex, and the .presence of a urologic event and/or hypertension before the age of 35 to predict onset of ESKD. 17,18 he PROPKD score relies on genotype results limiting its utility.Both the MIC and the PROPKD score were derived from primarily Caucasian populations and thus may be limited in generalizability. 8,19 e previously characterized and described an ADPKD cohort and observed racial/ethnic differences in the proportion of patients with kidney failure, age of kidney failure onset, and likelihood of having had kidney transplantation. 20Thus, approaches to identify rapid progressors early in the disease course with readily available clinical data and considering population diversity are warranted.
Using a large diverse population of patients with ADPKD and longitudinal eGFR data collected within a routine clinical care environment, we sought to identify rapid decline (RD) in kidney function and determine clinical factors associated with RD using a datadriven approach.

Study Population
A retrospective cohort study of Kaiser Permanente Southern California (KPSC) members between January 1, 2002 to December 31, 2018 was performed.KPSC is an integrated health system comprised of 15 medical centers and over 230 satellite clinics providing care to over 4.8 million members.The membership population is racially, ethnically, and socioeconomically diverse, reflecting the general population of Southern .California. 21Complete healthcare encounters are tracked using a comprehensive electronic health record (EHR) from which all study information were extracted.
The study population from which this cohort was identified has been previously described. 4,18 n brief, the study population included patients of any age with a minimum of 1-year continuous membership in the health plan.This time requirement was used to reliably capture incident ADPKD diagnoses and comorbidities.Inpatient and outpatient
Incident ADPKD was defined as newly diagnosed and not having a prior diagnosis of ADPKD.Patients were required to have ≥ 2 diagnosis codes on 2 separate encounter dates (which may have been consecutive days) from inpatient, emergency department (ED), or ambulatory care settings.Patients were excluded if they had a prior diagnosis of ADPKD, ≥ 2 diagnosis codes for autosomal recessive polycystic kidney disease (ARPKD) or did not have 1-year of continuous KPSC membership.Patients with ESKD (defined as treatment with dialysis or kidney transplant), eGFR <15mL/min/1.73m 2 , or no eGFR information at baseline and during follow-up were also excluded (Supplementary Figure 1).Patients were followed until they experienced ESKD, death, disenrollment from the healthcare plan, or until the end of the observation period (January 31, 2020).

Data Collection
Information on demographics, clinical characteristics, and medications were obtained for patients with incident ADPKD in the 1 year prior to the index date.All laboratory data, vital sign assessments (including blood pressure measurements and body mass index), and diagnostic and procedure codes were extracted from the EHR.Kidney function was expressed as eGFR calculated from serum creatinine levels using the 2009 Chronic Kidney Disease Epidemiology Collaboration equation. 22Proteinuria was defined as urinalysis positive for protein, urine protein/creatinine ratio >0.2 g/g, urine albumin/creatinine ratio >30mg/g, or a 24-hour urine collection with >200mg total protein or >30mg of albumin.ESKD was defined as treatment with hemodialysis, peritoneal dialysis, or kidney transplant.
Medication use was retrieved from internal pharmacy dispensing records.Health care utilization and an ever history of comorbidities (diabetes mellitus (DM), hypertension (HTN), hyperlipidemia, ischemic heart disease (ICH), congestive heart failure (CHF), cerebrovascular disease, urologic diseases, abdominal pain, and liver disease) were extracted from the EHR.The Elixhauser Comorbidity Index was also extracted from the EHR categorizing 31 comorbidities using diagnosis codes with each comorbidity being assigned a value of '1'. 23Data on hospitalizations and diagnoses that occurred outside of the KPSC healthcare system were available through administrative claims records.KPSC death records were obtained by identifying death that occurred within KPSCowned facilities. 24

Analysis
Descriptive statistics stratified by the RD vs. non-rapid decliner groups were used to report demographic and clinical characteristics among patients with ADPKD.The standardized mean difference (SMD) was used to test for distance between group means (i.e., effect size).An SMD > 0.1 indicates the distributions between the two groups are unbalanced.
Serial eGFRs were evaluated for each patient to determine the pattern of eGFR change over time.Group-based trajectory analysis was applied to identify possible patterns in potential distinct trajectory groups based on the longitudinal eGFR data using a latent class mixed model (LCMM).LCMM was performed by regressing eGFR on the measurement time assuming random effects of time between individuals.The appropriate assumptions of trajectories were explored with linear, beta, and spline distributions.For each distribution assumption, the number of distinct trajectory groups were evaluated from 2 to 5 in consideration of the sufficient sample size in each group.
The best fitted results were determined based on the smallest model Akaike Information Criterion (AIC).After distinct trajectory groups were identified in the analysis, the trajectory curves of each group were visualized using a locally weighted scatterplot smoothing (LOWESS) approach.Patients with ADPKD showing the steepest decline in kidney function were categorized in one (or more) trajectory curve(s) and were defined as the RD group.Accordingly, the other group(s) was defined as the non-rapid decliner (non-RD).
. After RD patients were identified based on the longitudinal eGFR data using LCMM, the next steps explored potential associations included baseline demographics, vital signs, comorbidities, and laboratory data.Given the numerous baseline variables that were considered as potential predictors for RD, several feature selection steps were performed to help identify the variables with higher importance.Variables with missingness above 30% were initially included, and the missing data were addressed using multiple imputation by chained equations (MICE).To achieve better reliability, three feature selection methods were applied: logistic regression, Least Absolute Shrinkage and Selection Operator (LASSO) regression, and random forest.The important features were determined based on mutual agreements between the three methods.In the logistic model, variables were selected if the p-values were less than 0.05 using likelihood ratio test.In LASSO, lambda parameter was tuned to find the top 80th percentile of important features as the variables with non-zero coefficients.In random forest, variables at top 80th percentile importance were selected.After performing the three methods in ten sets of the imputed data, pooled statistics were calculated to determine the selected variables in each method (pooled p-values in logistic model, grouped adaptive LASSO, and averaged-scaled importance in random forest).Finally, variables with mutual agreements of being selected between the methods were included as the predictors in building the prediction model based on the non-missing data.
The logistic model was used to build the prediction model for RD due to its accessibility and simplicity.Variables selected from the feature selection were included in the model, and the prediction performance was evaluated using area under the curve (AUC).A few additional clinical variables of interest were included in the model to see if the prediction performance could be improved, even if they were not initially selected in the feature selection steps.The final prediction model was determined based on the balance between model simplicity, AUC, and clinical relevance.

Study Population
A total of 1,744 patients with incident ADPKD were included in the study.The mean (SD) age of the study population was 50.6 (19.1) years, 52.4% males, 41.6% White, 13.8% Black, 31.5% Hispanic, and 9.9% Asian/Pacific Islander (Table 1).Mean blood pressure was 129/76 mm Hg and mean (SD) eGFR was 73.8 (35.6) mL/min/1.73m 2 .At baseline, 60.0% of patients had a history of HTN, 33.0% had a history of urologic disease, and 12.6% had a history of DM.

RD vs non-RD
Based on the best AIC model, we identified two groups in the longitudinal eGFR data: 125 (7%) patients in the RD group and 1,619 (93%) patients in the non-RD group (Figure 1).Patients in the RD group had a mean (SD) age of 41.5 (18.0) years vs. 51.3(19.0) years in the non-RD group.Compared to non-RD, the RD group had a higher proportion of patients who were male (55.2% vs 52.2%), Hispanic (39.2% vs 30.9%), had higher baseline BP (131/79 mm Hg vs 129/76 mm Hg), and were more likely to have HTN (64.0%vs 59.7%) and cerebrovascular disease (7.2% vs 3.1%) (Table 1).
History of urologic disease and abdominal pain were lower in the RD group (28.0% vs. 33.4%,40.8% vs. 51.1%,respectively).Mean (SD) baseline eGFR was 85.2 (47.3) and 72.9 (34.4) mL/min/1.73m 2 in the RD and non-RD groups, respectively.Baseline sodium, bicarbonate, BUN, vitamin D, HDL, CRP, PTH and albumin were lower in RD compared to non-RD (Supplemental Table 1).Baseline hemoglobin was 13.1 (1.6) g/dL and 13.6 (1.7) g/dL the in RD and non-RD groups, respectively.The RD group was found to have higher urinary protein, lower urinary phosphorus, and lower urinary calcium compared to non-RD group.

Predictors of Rapid Decline
Of the 42 variables assessed, 7 were found to have significant agreement in identifying RD including: age, hemoglobin, creatinine, proteinuria, hypertension, cerebrovascular disease, and liver disease (Table 2).In addition, 4 pre-selected clinically relevant variables were identified and included in the model: sex, race/ethnicity, diabetes, and history of urologic events.Based on a balance between clinical relevance and AUC assessed using non-missing data, creatinine and history of liver disease were excluded and sex was included in the final model.3).
Results showed 72% sensitivity, 70% specificity, 70% accuracy, and 0.77 AUC in identifying the RD group.AUC results were considered fair for values between 0.7 and 0.8 (Figure 2).

ESKD and Mortality Outcomes
Among the patients in the RD group, 59 (47.2%) progressed to ESKD with an incidence rate of 91.7 (95% CI, 75.2-111.8)per 1,000 person-years with a medium follow-up time of 4.8 years (Supplemental Table 2).Among patients in the non-RD group, 169 (10.4%) .progressed to ESKD with an incidence rate of 17.0 (95% CI, 14.7-19.6)per 1,000 person-years and median follow up time of 6.2 years.The RD group had higher incidence ESKD rates across ages compared to non-RD.After excluding patients >60 years of age, 1,169 patients progressed to ESKD before age 60 with incidence rates 87.2 (95% CI, 70.4-107.9)for RD and 13.3 (95% CI, 10.7-16.6)for non-RD.A total of 19 (15.2%) and 232 (14.3%) patients died in the RD and non-RD groups, respectively.Similar mortality rates were observed between the RD and non-RD (20.5 per 1000patient years and 21.8 per 1000-person years, respectively).
After excluding those who disenrolled or died within 5-years of incident ADPKD, 976 were identified for analysis of ESKD within 5-years.A total of 31 (38.3%)developed ESKD within 5-years among 81 in the RD group compared with 63 (7.0%) among non-RD (Supplemental Table 3).Cumulative incidence plots for ESKD, ESKD before age 60, and mortality were calculated.RD was highly associated with progression to ESKD during all follow-up years whereas cumulative index plots for mortality were similar between both RD and non-RD (Supplementary Figure 2).

Discussion
Among patients with ADPKD, there remains a need to identify aggressive phenotypes that can help guide preventative and therapeutic considerations along with resource allocation.Using 42 baseline measurements from routine clinical practice along with longitudinal eGFR data in a large population with ADPKD, we performed multiple data set analyses to identify factors associated with RD and determine a model to further .predict patients at risk for RD.Our study found 6 variables that highly predicted RD among patients with ADPKD.We observed younger age of onset, male sex, lower hemoglobin, the presence of proteinuria, hypertension, and cerebrovascular disease to be clinical predictors of RD.We demonstrated 72% sensitivity, 70% specificity, and 70% accuracy in identifying RD.
The European Renal Association-European Dialysis and Transplant Association (ERA-EDTA) algorithm for RD identification amongst ADPKD relies on historical eGFR decline to identify RD.However, using historical eGFR decline alone to identify RD is thought to be of limited value in predicting disease progression in early stages given eGFR decline in ADPKD is nonlinear and often occurs rapidly late in the disease. 14In our study, baseline eGFR was higher in the RD group compared to the non-RD group (85.2 vs 72.9 respectively).These findings are consistent with increased hyperfiltration thought to happen first in patients with ADPKD and RD prior to rapid eGFR decline.Baseline creatinine was initially identified as one of the 7 significant variables in identifying RD.
However, after including only non-missing data and balancing the clinical relevance and AUC, creatinine and history of liver disease were both excluded in our final modeling tool.
We found age at diagnosis to be significant factors in identifying RD.The CRISP observational study used a large cohort of patients and assessed several prognostic indicators among patients with ADPKD over a six year follow up-period.They similarly found younger age at diagnosis along with total kidney volume (TKV) to be an indicator . of early disease progression.Several smaller cohort studies found younger age to be a predictor of RD. 16,25 We found presence of proteinuria to also be a significant in identifying RD vs non-RD.[27][28] Total kidney volume (TKV) typically increases continuously from the early stages of the disease and is associated with decline in kidney function.Kidney enlargement progresses significantly in the early stages of the disease making TKV a significant early indicator of disease progression. 16,29,30 Oe of the most commonly used tools for identifying progression amongst ADPKD is the Mayo Imaging Classification (MIC).The MIC uses height adjusted total kidney volume (htTKV) indexed for age to predict future decline in eGFR.MRIs remain expensive and often difficult to access in many clinical settings.
The PROPKD score uses 4 variables to predict RD: male sex, history of HTN before 35 years of age, first urologic event before 35 years, and PKD mutation type.Similarly, we found male sex and history of hypertension to be 2 of the 6 significant variables in identifying RD in our final modeling tool.PROPKD further relies on identifying PKD mutation type.Genetic factors play an important role in determining severity of ADPKD.
PKD1 truncating mutations are associated with the most severe disease with average age of ESKD onset being 56 years whereas PKD1 non-truncating has an average age of ESKD onset of 68 years.PKD2 mutations are associated with the least severe .][10][11][12][13] Although genetic testing provides prognostic information for ADPKD, it is not often used in routine clinical practice and not widely available on all patients.In addition, ADPKD shows significant phenotypic variability among family members which suggests that disease modifiers including clinical and environmental factors should be considered in evaluating disease progression and prognosis. 8,31  PROPKD score uses both clinical and genetic data to predict likelihood of reaching kidney failure before the age of 60.By assigning points to patients with ADPKD, a score > 6 indicates risk of rapid progression with a 92% chance of reaching kidney failure before age 60.Use of genetic mutation in the scoring system does not take into consideration the intrafamilial variability.A recent long-term follow-up of the CRISP cohort found that while ADPKD genotype was associated with CKD outcomes, it was not considered an independent prognostic factor after adjusting for htTKV. 13Similarly, a retrospective study of 164 patients with ADPKD in Spain found the PROPKD score very specific but had low sensitivity in identifying patients with high risk for progression. 15nally, the PROPKD and MIC classification tools were both developed in primarily Caucasian populations.Our clinical risk prediction model is not only an additional tool to help determine RD across a diverse population but also relies on data from routine clinical practice which is often readily available.
Previously, treatment options for ADPKD were limited to management of symptoms and complications. 17Management was focused on hypertension, hydration, dietary changes, .treatment of pain, urinary tract infections, and ephrolithiasis. 18Most patients with ADPKD are diagnosed more than two decades before they reach ESKD. 32,33 s novel therapies are developed, there is an increased need for tools that accurately identify higher risk populations which are most likely to benefit from treatment. 34While there is no cure for ADPKD, tolvaptan, a selective vasopressin V 2 receptor blocker, has been FDA approved as the first treatment to slow kidney function in ADPKD.Tolvaptan reduced kidney growth by 45%, reduced eGFR decline by 26% in early ADPKD, and reduced eGFR decline by 35% in advanced ADPKD 19,18 .Newer therapeutic developments further illustrate the need for clinical tools and resources that reliably identify patients eligible for treatment to and guide clinical decision making to help improve outcomes.

Limitations
There are several potential limitations that may confound the interpretation of our findings.ADPKD was identified using diagnosis codes only which may have misclassified and resulted in over or under-reporting of the ADPKD incidence.However, use of EHRs to identify to rare diseases within KPSC has been described to have modestly high positive predictive values. 35Studies evaluating the accuracy of ADPKD diagnosis by ICD codes have demonstrated high sensitivity and positive predictive values exceeding 85%. 36,37 n our study, we did not have genetic information nor abdominal imaging with age adjusted TKV total kidney volume for the entire study population.Finally, our study was conducted in a single, integrated healthcare system in one geographic region and may not be generalizable to all patients with ADPKD.
. Despite these potential limitations, our ADPKD cohort remains one of the largest to date with detailed clinical information capturing laboratory results, medication use, and health care utilizations.

Conclusion
Among a large, diverse population using real-world data from routine clinical practice, we observed that 6 variables (age, hemoglobin, proteinuria, hypertension, cerebrovascular disease, and sex) highly predict RD among patients with incident ADPKD.Clinical prediction tools may serve as a practical screening tool to capture and manage high-risk patients with ADPKD who may need earlier and more intensive management strategies. .
Time to ESKD and/or mortality were the primary outcomes.ESKD rates overall and before age 53 (50 th percentile), 60 (based in the PROPKD score), 62 (based on Mayo Clinic Research), and 63 (75 th percentile) were evaluated.ESKD rates within 5 years of incident ADPKD were evaluated for risk assessment among incident ADPKD patients.Descriptive statistics and cumulative incidence plots were used to describe time to mortality and ESKD stratified by RD status for each of the outcomes of interest.The incidence rate (IR) per 1,000 person-years and the 95% confidence intervals (CI) were estimated using Poisson regression with robust standard error.All statistical analyses were generated using the SAS Enterprise Guide (version 9.4; SAS Institute, Cary, North Carolina, USA).The KPSC Institutional Review Board (IRB) reviewed and approved the protocol of this study (#11823).A waiver of informed consent was obtained due to the retrospective nature of this study.Data were accessed for research purposes for this study during the period 01/06/2019 through 01/12/2022.

Figure 1 :Figure 2 :
Figure 1: Trajectory groups using latent class mixed model assuming spline distribution The 6 final predictors of RD were younger age

Table Legend :
Table1.Baseline Characteristics and Comorbidities Among Patients in the Non-Rapid Decliner vs. Rapid Decliner Group The threshold value for top 80% importance was 1.40.†The final important features were determined based on the common agreement among the results.Important features were defined as p-values less than 0.05 in either Wald t-test or likelihood ratio test in logistic model, 10 non-zero coefficients in lasso given the proper lambda, and top 80 percentile importance in random forest.The symbol "-" is used to indicate empty, reference, or 0 coefficients. *

Table 3 .
Odds Ratio of Selected Important Features and Pre-selected Clinically Relevant Variables* The reference group for hypertension, cerebrovascular disease, positive urine protein are no history of the condition. *