Modeling and Interpreting Patient Subgroups in Hospital Readmission: Visual Analytical Approach

Background A primary goal of precision medicine is to identify patient subgroups and infer their underlying disease processes, with the aim of designing targeted interventions. However, few methods automatically identify both patient subgroups and their co-occurring characteristics simultaneously, measure their significance, and visualize the results. Such methods could enhance the interpretability of patient subgroups, and inform the design of classification and predictive models. Objectives To analyze patient subgroups in hospital readmitted patients using a three-step modeling approach. (1) Visual analytical modeling to automatically identify patient subgroups and their co-occurring comorbidities, and determine their statistical significance and clinical interpretability. (2) Classification modeling to classify patients into subgroups and measure its accuracy. (3) Prediction modeling to predict a patient's risk of readmission and compare its accuracy with and without patient subgroup information. Methods We extracted 2013-2014 Medicare data related to hospital readmission in three conditions: chronic obstructive pulmonary disease (COPD), congestive heart failure (CHF), and total hip/knee arthroplasty (THA/TKA). For each condition, we extracted cases defined as patients readmitted within 30 days of hospital discharge, and controls defined as patients not readmitted within 90 days of discharge, matched by age, gender, race, and Medicaid eligibility (n[COPD]=29,016, n[CHF]=51,550, n[THA/TKA]=16,498). These data were analyzed using: (1) bipartite networks to identify patient subgroups based on frequently co-occurring high-risk comorbidities; (2) multinomial logistic regression to classify patients into subgroups; and (3) hierarchical logistic regression to predict the risk of hospital readmission using subgroup membership, compared to standard logistic regression without subgroup membership. Results In each condition, the visual analytical model identified patient subgroups that were statistically significant (Q=0.17, 0.17, 0.31; P<.001, <.001, <.05), were significantly replicated (RI=0.92, 0.94, 0.89; P<.001, <.001, <.01), and were clinically meaningful to clinicians. (2) In each condition, the classification model had high accuracy in classifying patients into subgroups (mean accuracy=99.60%, 99.34%, 99.86%). (3) In two conditions (COPD, THA/TKA), the hierarchical prediction model had a small but statistically significant improvement in discriminating between the readmitted and not readmitted patients as measured by net reclassification improvement (NRI=.059, .11), but not as measured by the C-statistic or integrated discrimination improvement (IDI). Conclusions While the visual analytical models identified statistically and clinically significant patient subgroups, the results pinpoint the need to analyze subgroups at different levels of granularity for improving the interpretability of intra- and inter-cluster associations. The high accuracy of the classification models reflects the strong separation of the patient subgroups despite the size and density of the datasets. Finally, the small improvement in predictive accuracy suggests that comorbidities alone were not strong predictors for hospital readmission, and the need for more sophisticated subgroup modeling methods. Such advances could improve the interpretability and predictive accuracy of patient subgroup models for reducing the risk of hospital readmission and beyond.


22
A primary goal of precision medicine is to identify patient subgroups and infer their underlying disease processes, 23 with the aim of designing targeted interventions. However, few methods automatically identify both patient 24 subgroups and their co-occurring characteristics simultaneously, measure their significance, and visualize the 25 results. Such methods could enhance the interpretability of patient subgroups, and inform the design of 26 classification and predictive models. 27

Objectives 28
To analyze patient subgroups in hospital readmitted patients using a three-step modeling approach. (1) Visual 29 analytical modeling to automatically identify patient subgroups and their co-occurring comorbidities, and 30 determine their statistical significance and clinical interpretability.
(2) Classification modeling to classify patients 31 into subgroups and measure its accuracy.
(3) Prediction modeling to predict a patient's risk of readmission and 32 compare its accuracy with and without patient subgroup information. THA/TKA), the hierarchical prediction model had a small but statistically significant improvement in 49 discriminating between the readmitted and not readmitted patients as measured by net reclassification 50 improvement (NRI=.059, .11), but not as measured by the C-statistic or integrated discrimination improvement 51 (IDI). 52

Conclusions 53
While the visual analytical models identified statistically and clinically significant patient subgroups, the results 54 pinpoint the need to analyze subgroups at different levels of granularity for improving the interpretability of 55 intra-and inter-cluster associations. The high accuracy of the classification models reflects the strong separation 56 of the patient subgroups despite the size and density of the datasets. Finally, the small improvement in predictive 57 accuracy suggests that comorbidities alone were not strong predictors for hospital readmission, and the need 58 for more sophisticated subgroup modeling methods. Such advances could improve the interpretability and 59 predictive accuracy of patient subgroup models for reducing the risk of hospital readmission and beyond. 60 Background 62 A wide range of studies [1][2][3][4][5][6][7][8][9] on topics ranging from molecular to environmental determinants of health have 63 shown that most humans tend to share a subset of characteristics (e.g., comorbidities, symptoms, genetic 64 variants), forming distinct patient subgroups. A primary goal of precision medicine is to identify such patient 65 subgroups and infer their underlying disease processes to design interventions targeted to those processes [2, 66 10]. For example, recent studies in complex diseases such as breast cancer [3,4], asthma [5-7] and COVID-19 67 [11] have revealed patient subgroups, each with different underlying mechanisms precipitating the disease, and 68 therefore each requiring different interventions. 69 A critical requirement for designing such interventions is the clinical interpretability of patient subgroups. Such 70 interpretability requires clinicians to understand (a) how characteristics (e.g., comorbidities, symptoms, genetic 71 variants) frequently and significantly co-occur across patients, and (b) the risk for adverse outcomes (e.g., 72 mortality, hospital readmission) of patient subgroups that have those co-occurrences. An integration of the co-73 occurrence of characteristics, with the risk of outcomes in patient subgroups, is critical to infer the disease 74 processes underlying each patient subgroup, and to design precision interventions targeted to those patient 75 subgroups. However, few methods automatically identify both patient subgroups and their co-occurring 76 characteristics simultaneously, which is important for measuring the risk for adverse outcomes and inferring 77 their mechanisms. Such integrated methods could enhance the interpretability of patient subgroups by clinicians 78 for designing interventions, and for informing the design of classification and predictive models that provide 79 clinical decision support. 80 To address this need, we used a visual analytical method to identify and analyze patient subgroups in hospital 81 readmitted patients. While we have previously demonstrated [12] the use of visual analytics to identify patient 82 subgroups and their characteristics in hospital readmission, here we explore how the approach generalizes 83 across three hospital readmission conditions and its use in classification and predictive modeling. This was done 84 through an analytical framework for Modeling and Interpreting Patient Subgroups (MIPS) which used a three-85 step modeling approach: (1) Visual analytical modeling through bipartite networks to automatically identify 86 patient subgroups and their co-occurring characteristics, and determine their statistical significance and clinical 87 interpretability.
(2) Classification modeling through multinomial logistic regression to classify patients into 88 subgroups.
(3) Prediction modeling through logistic regression with and without subgroup information to predict 89 the risk of hospital readmission. Application of the MIPS analytical framework to three datasets helped pinpoint 90 methodological and data limitations in our approach, which provided implications for improving the 91 interpretability of patient subgroups in large and dense datasets, and for the design of clinical decision support 92 systems to prevent adverse outcomes such as hospital readmissions. 93

94
A patient subgroup is a subset of patients drawn from a population (e.g., older adults) that share one or more 95 characteristics (e.g., renal failure and diabetes). Patients have been divided into subgroups by using (a) 96 investigator-selected variables such as race for developing hierarchical regression models [ nodes represent one or more types of 126 entities (e.g., patients or comorbidities), and 127 edges between the nodes represent a 128 specific relationship between the entities. 129 Figure 1A shows a unipartite network, where 130 nodes are of the same type (often used to 131 analyze co-occurrence of comorbidities [23]). 132 In contrast, Figure 1B  However, this improved accuracy trade-offs simplicity as the evaluation requires several additional steps: build 156 multiple predictive models, predict the outcomes for each patient using the appropriate model (predicted by a classifier), aggregate the accuracy of predictions across all patients, and compare it to the predictive accuracy of 158 all patients generated from the Standard Model. Given this complexity, we used the simpler Hierarchical 159 Modeling approach as a preliminary step for leveraging patient subgroups. 160 predictive models, we used the same independent variables for our predictive models. 182 2. Exclusion of Patient Subgroups. None of the CMS models used information related to patient subgroups. 183

The Need for Automatic Identification of Patient Subgroups in Hospital Readmission
Therefore, while such models provide the risk of readmission for an individual patient, they do not leverage 184 the existence of patient subgroups known to be present among patients with hospital readmission [12]. Such 185 patient subgroups could be used in hierarchical regression models to potentially achieve higher predictive 186 accuracy. Furthermore, while the primary focus of the CMS models was on predicting the risk of readmission 187 of a patient, they provide little clinical guidance for the design of clinical interventions to address that risk.

188
In contrast, if a patient belongs to a previously-identified patient subgroup with a comorbidity profile (often 189 referred to as a phenotype), such information could be leveraged to classify patients into the best-fitting 190 phenotype, and then to use that classification as a starting point to design clinical interventions targeted to 191 the patient. As shown, the visual analytical model identifies the patient subgroups, and visualizes them through a network. 202 The classification model predicts subgroup membership for cases and controls, and uses it to measure the risk 203 of readmission within each subgroup based on its proportion of cases. This risk information is juxtaposed with 204 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) for the Race attribute, so we grouped "unknown race" and "other race" and ensured that there was an equal 242 number of them in the cases and control datasets. The low rate of missing data on race had too low a risk for 243 bias to warrant a sensitivity analysis. Appendix-1 shows the detailed inclusion and exclusion criteria used to 244 extract cases and controls for COPD, CHF, and THA/TKA, and the respective numbers of patients extracted at 245 each step, in addition to the International Classification of Diseases, Ninth Version codes (ICD-9) codes for each 246 of the three index conditions selected for analysis. Each modeling method used appropriate subsets of the above 247 data described in the sections below. 248 Variables. The dependent variable (outcome) was whether a patient with an index admission (COPD, CHF, 249 THA/TKA) had an unplanned readmission to an acute-care hospital within 30 days of discharge, as was recorded 250 in the MEDPAR file (inpatient claims) in the Medicare database. 251  . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted February 28, 2022. ; https://doi.org/10.1101/2022.02.27.22271534 doi: medRxiv preprint The independent variables included comorbidities, and patient demographics (age, gender, race). Comorbidities 252 common in older adults were derived from three established comorbidity indices: Charlson Comorbidity Index 253 (CCI) [50], Elixhauser Comorbidity Index (ECI) [51], and the Center for Medicare and Medicare Services Condition 254 Categories (CMS-CC) used in the CMS readmission models [52] (the variables in the CMS models varied across 255 the index conditions). As these indices had overlapping comorbidities, we derived a union of them, which was 256 verified by the clinician stakeholders. They recommended that we also include the following additional variables 257 as they were pertinent to each index condition: COPD (history of sleep apnea, mechanical ventilation); CHF 258 (history of coronary artery bypass graft surgery); THA/TKA (congenital deformity of the hip joint, post-traumatic 259 osteoarthritis). For each patient in our cohort, we extracted the above comorbidities and variables from the 260 physicians, outpatient, and inpatient Medicare claims data in the 6 months before (to guard against miscoding), 261 and on the day of the index admission. 262

263
Overview of the MIPS Framework. Table 1 provides a summary of the inputs and outputs of the three-step 264 modeling approach in the MIPS framework, which was applied across the three index conditions. 265 Visual Analytical Modeling. The data used to build the visual analytical model consisted of 100% cases, and an 266 equal number of 1:1 matched controls extracted by randomly selecting a control without replacement to match 267 each case based on age, gender, race/ethnicity, and Medicaid eligibility [53]. The resulting dataset was divided 268 randomly into a training (50%) and replication (50%) dataset (we use the term replication to avoid confusion 269 with the term validation typically used in classification and prediction models). We used a bipartite network to 270 Random sample of 25% of cases and controls (with case/control labels used to validate the model) • Model Training Predicted Risk: Each patient's probability of being readmitted.

• Model Comparison
Accuracy: Net Reclassification Improvement (NRI) and Integrated Discrimination Improvement (IDI) Table 1. Inputs used to train and replicate/validate the three models, and the analytical outputs they produced.
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted February 28, 2022. ; https://doi.org/10.1101/2022.02.27.22271534 doi: medRxiv preprint model the cases (30-day readmitted patients) and significant comorbidities in each index condition using the 271 following steps: 272 A. Model Training. The training of the bicluster network model consisted of the following two steps: 273 I. Feature Selection. Given the large number of patients and comorbidities in the dataset, we used feature 274 selection to identify comorbidities with the strongest signal and therefore interpretability for readmission 275 using the following steps: (1) excluded comorbidities with prevalence less than 1% (as is commonly done 276 in studies to reduce noise [54]); (2) selected significant comorbidities in the training dataset based on a 277 2-way interaction test using odds ratio (OR) with directionality, and correcting for multiple testing using 278 Bonferroni, and (3) tested the surviving comorbidities for replication in the replication dataset, and 279 selected those that were significant in both datasets. Appendix-2 shows the number of comorbidities, 280 and variables that were included in the analysis for each of the three index conditions. The above feature 281 selection generated a single set of significant and replicated comorbidities used for the following bipartite 282 network analysis. 283 II. Biclustering. We used bipartite networks on the training dataset to analyze heterogeneity in readmission 284 using the following steps.
(1) Removed all cases that did not have any comorbidities (as the modularity 285 maximization algorithm will trivially put disconnected nodes into a separate cluster).
(2) Represented the 286 cases (30-day readmitted patients in the training dataset) and their significant and replicated 287 comorbidities (selected in Step A) as a bipartite network. As shown in Fig. 1, the nodes represented cases 288 or comorbidities, and edges represented which case had which comorbidity.
(3) Used a bipartite 289 modularity maximization algorithm to identify the number of biclusters, their boundaries, and degree of 290 biclusteredness using modularity. Modularity is defined as the fraction of edges falling within a cluster, 291 minus the expected fraction of such edges in a network of the same size with randomly assigned edges 292 [20]. Modularity ranges from -0.5 to +1, with values >0 indicating biclustering that is higher than can be 293 expected by chance. We used the bipartite version of modularity [55, 56] to find biclusters in the network.

294
(4) Measured the significance of the bicluster modularity by comparing it to a distribution of the same 295 quantity generated from 1000 random permutations of the network, by preserving the network size 296 (number of nodes) and the network density (number of edges). 297 B. Model Replication. Repeated the above biclustering steps 1-4 to identify subgroups in the replication 298 dataset, and compared the comorbidity co-occurrence in the training dataset, to that in the replication 299 dataset using the Rand index (RI) [57]. RI measures the proportion of comorbidity pairs that co-occurred and 300 did not co-occur in a cluster in the training and replication datasets (where 0=no inter-network cluster 301 similarity, and 1=total inter-network cluster similarity). The significance of RI was measured by comparing it 302 to a distribution of the same quantity generated from 1000 random permutations of the training and 303 replication networks. All tests of statistical significance in Steps A and B were 2-sided. 304 C. Model Interpretation. The model interpretation consisted of the following steps: 305 I. Visualization. We used the following steps to visualize the network generated from the training dataset.

306
(1) Used Fruchterman-Reingold (FR) [58], a force-directed algorithm to lay out the bipartite network. 307 This layout algorithm pulls together nodes that are strongly connected, and pushes apart nodes that are 308 not. This results in nodes with a similar pattern of connections to be placed close to each other in 309 Euclidean space, and those that are dissimilar are pushed apart.
(2) As the FR algorithm often cannot 310 entirely separate clusters in large and dense networks, the network layout needs to be visually enhanced 311 before it is interpretable by clinician stakeholders. Therefore, we used the ExplodeLayout algorithm [28, 312 29] to separate the biclusters to reduce their visual overlap. This algorithm preserves the distances of 313 nodes within a bicluster, but increases the distance of nodes between clusters to improve 314 interpretability.
(3) Juxtaposed the risk of readmission with the network visualization (in response to a 315 request from the clinical stakeholders). This was done by (a) displaying comorbidity labels with their 316 univariable ORs for readmission (measured in Step A) ranked by their odds ratios (ORs) for each 317 subgroup, and (b) measuring the readmission risk for each patient subgroup based on the full case-318 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted February 28, 2022. ; https://doi.org/10.1101/2022.02.27.22271534 doi: medRxiv preprint control population (explained in more detail in the section on predictive modeling), and juxtaposing it 319 with the respective subgroup. 320 II. Clinical Interpretation. We used the following steps to solicit clinical interpretations of the above 321 bipartite network. (1) Recruited a pulmonologist specializing in COPD and hospital readmission to 322 interpret the COPD results, and a geriatrician with expertise in treating older adults in CHF and THA/TKA 323 to interpret the respective results.
(2) Requested each clinician stakeholder to interpret the patient 324 subgroups, their mechanisms, and potential interventions to reduce the risk for readmission. 325 Classification Modeling. As shown in the bipartite network example in Fig. 1, the biclusters identified through 326 the modularity maximization algorithm contain patient subgroups and their most frequently co-occurring 327 comorbidities with respect to other patients in the network. However, there are often many edges between 328 biclusters, revealing that many patients within a bicluster have comorbidities that exist in other biclusters. As is 329 true for most partitioning cluster methods, including modularity, membership of a new patient to each bicluster 330 is therefore probabilistic. The classification of a patient into a cluster is therefore not defined by the inclusion or 331 exclusion of comorbidities (e.g., hypertension and diabetes), but rather by the probability of being in a patient 332 subgroup. Patients are therefore similar or different, not just in a handful of carefully-selected comorbidities 333 while ignoring others, but based on all of their recorded comorbidities. This overall profile of patients reflects 334 the reality of comorbid conditions. 335 To model the above complexity, we used multinomial logistic regression [17] to develop classification models in 336 each index condition. This approach has the advantage of generating probabilities ("soft labels") for a patient to 337 belong to each patient subgroup. The models were trained, internally validated, and then applied to generate 338 information for the other two modeling methods, as described below: 339 A. Model Training. The data used to build the classification model consisted of the training dataset and 340 subgroup membership from the visual analytical model. We trained a multinomial logistic regression model 341 using the above data, with independent variables that included comorbidities identified through feature 342 selection done for the visual analytical modeling. Accuracy of the trained model was measured by calculating 343 the percentage of times the model correctly classifed the cases into the subgroups, using the highest 344 predicted probability across the subgroups ("hard labels"). 345 B. Model Internal Validation. To internally validate the classifier, we randomly split the above data into training 346 (75%) and testing (25%) datasets, 1000 times. For each iteration, we trained a model using the training 347 dataset, and measured its accuracy on the testing dataset. This was done by predicting the subgroup 348 membership using the highest predicted probability among all the subgroups. The overall predicted accuracy 349 was then estimated by calculating the mean accuracy across the 1000 models. 350 C. Model Application. Using the 100% cases, in addition to the 100% controls from July 2013-August 2014 351 (representing the entire Medicare population of each index condtion from those years), we generated the 352 following two types of information for use in the other models. (1) Used the classifier trained in Step A above, 353 to classify 100% cases and 100% controls into a subgroup. This information was used by the subsequent 354 predictive modeling.
(2) While the visual analytical model used the 1:1 matched controls for feature 355 selection, this cohort did not represent the entire population. Therefore, to accurately measure the 356 subgroup risk, we used the entire case-control population classified into the subgroups (as described in the 357 above step), and measured the proportion of cases in each subgroup. Furthermore, as requested by the 358 clinicians, we juxtaposed these subgroup risks next to the respective subgroups in the bipartite network 359 visualization, to improve their interpretability. 360 Predictive Modeling. The data used to build the predictive models consisted of 100% cases and 100% controls, 361 in addition to their subgroup membership generated from the above classification models. These data were 362 randomly spilt into a training (75%) and validation (25%) dataset. The predictive models were trained, internally 363 validated, and compared for predictive accuracy, as described below: 364 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted February 28, 2022. I. Discrimination (model's ability to distinguish readmitted patients from those not readmitted) was 373 measured using the C-statistic, which is identical to the area under the receiver operating characteristic 374 (ROC) curve. Model discrimination was examined using box plots to show the average risk prediction for 375 patients with and without readmission. 376 II. Calibration (model's agreement of the predicted probabilities with the observed risk) was measured using 377 calibration-in-the-large, and calibration slope, which was examined through a calibration plot showing the 378 proportion of patients actually admitted, versus deciles of predicted probability of having readmission. 379 Good calibration is when calibration-in-the-large is close to zero, and the calibration slope is close to one. 380 Since the large sample size overpowered the study, we did not measure the calibration based on statistical 381 significance (e.g., P values of the Hosmer-Lemeshow and calibration indices). 382 C. Model Comparisons. We used the chi-squared test to compare the C-statistic of the Standard Model to that 383 of the Hierarchical Model. We also measured the C-statistic of the Standard Model applied to each subgroup 384 separately. This enabled examination of how the Standard Model performed on patient subgroups to 385 identify, for example, which subgroups underperformed when using the current Standard Model. 386 Because the above models used the feature selection step to select comorbidities for use as independent 387 variables, they differed from those used in the published CMS models. Therefore, to perform a head-to-head 388 comparison with the published CMS models, we additionally developed a logistic regression model using 389 independent variables that were identical to the published CMS model (CMS Standard Model), which was 390 compared to the same model that included subgroup membership (CMS Hierarchical Model). We used the 391 chi-squared test to compare the C-statistic of the CMS Standard Model to that from the CMS Hierarchical 392 Model, in addition to the following measures of model accuracy: 393 I. Net Reclassification Improvement (NRI) measured the proportion of patients whose predicted probability 394 of readmission improved with reference to actual readmission status. We used two NRI statistics: (a) 395 categorical NRI, which predicted readmission probabilities divided into 10 sequential categories ranging 396 from 0-1, with improvement requiring a shift between categories; and (b) continuous NRI which is based 397 on the proportions of patients with any improved predicted probability of readmission, regardless of the 398 size of that improvement. 399 II. Integrated Discrimination Improvement (IDI) measured the difference in the average improvement in 400 predicted risks between the CMS Standard Model and the CMS Hierarchical Model. 401

402
Data 403 Table 2 provides a summary of the number of cases and/or controls used to develop the three models in each 404 condition. 405

406
The visual analytical modeling of readmitted patients in all three index conditions produced statistically and 407 clinically significant patient subgroups and their most frequently co-occurring comorbidities, which were 408 significantly replicated. Results from each condition are described below: 409 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted February 28, 2022.    Readmi�ed COPD Pa�ent (n=14,457) Comorbidity (d=30) Associa�on . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted February 28, 2022. The pulmonologist inspected the visualization and noted that the readmission risk of the patient subgroups had 438 a wide range (12.7% to 19.6%) with clinical (face) validity. Furthermore, the co-occurrence of comorbidities in 439 each patient subgroup was clinically meaningful with interpretations for each subgroup. Subgroup-1 had a low 440 disease burden with uncomplicated hypertension leading to the lowest risk (12.7%). This subgroup represented 441 patients with early organ dysfunction and would benefit from using checklists such as regular monitoring of 442 blood pressure in pre-discharge protocols to reduce the risk of readmission. Subgroup-3 had mainly psychosocial 443 comorbidities, which could lead to aspiration precipitating pneumonia leading to an increased risk for 444 readmission (15.9%). This subgroup would benefit from early consultation with specialists (e.g., psychiatrists, 445 therapists, neurologists, and geriatricians) that had expertise in psycho-social comorbidities, with a focus on the 446 early identification of aspiration risks and precautions. Subgroup-2 had diabetes with complications, renal failure 447 and heart failure and therefore had higher disease burden leading to an increased risk for readmission (17.8%) 448 compared to Subgroup-1. This subgroup had metabolic abnormalities with greater end-organ dysfunction and 449 would therefore benefit from case management from advanced practice providers (e.g., nurse practitioners) 450 with rigorous adherence to established guidelines to reduce the risk of readmission. Subgroup-4 had diseases 451 with end-organ damage including gastro-intestinal disorders, and therefore had the highest disease burden and 452 risk for readmission (19.6%). This subgroup would also benefit from case management with rigorous adherence 453 to established guidelines to reduce the risk of readmission. Furthermore, as patients in this subgroup typically 454 experience complications that could impair their ability to make medical decisions, they should be provided with 455 early consultation with a palliative care team to ensure that care interventions align with patients' preferences 456 and values. The geriatrician inspected the visualization and noted that the readmission risk of the patient subgroups, ranging 472 from 15.1% to 19.9%, was wide with clinical (face) validity. Furthermore, the co-occurrence of comorbidities in 473 each patient subgroup was clinically meaningful. Subgroup-1 had chronic but stable conditions, and therefore 474 had the lowest risk for readmission (15.1%). Subgroup-3 had mainly psychosocial comorbidities, but were not as 475 clinically unstable or fragile compared to subgroups 2 and 4, and therefore had medium risk (16.6%). Subgroup-476 2 had severe chronic conditions, making them clinically fragile (with potential benefits from early palliative and 477 hospice care referrals), and were therefore at high risk for readmission if non-palliative approaches were used 478 (19.9%). Subgroup-4 had severe acute conditions which were also clinically unstable, associated with substantial 479 disability and care debility, and therefore at high risk for readmission and recurrent intensive care unit (ICU) use 480 (19.9%). 481 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted February 28, 2022. Appendix-2) used 39 unique comorbidities identified from the three comorbidity indices plus 2 condition-specific 486 comorbidities. Of these, 11 comorbidities were removed because of <1% prevalence. Of the remaining, 11 487 survived the significance and replication testing with Bonferroni correction. The visual analytical model (Fig. 5) 488 used these surviving comorbidities (d=11), and cases consisting of readmitted patients with at least one of those 489 comorbidities (n=7,010). 490 As shown in Fig. 5 we juxtaposed a ranked list of comorbidities based on their ORs for readmission in each bicluster, in addition to 495 the risk for each of the patient subgroups. 496 The geriatrician inspected the network and noted that TKA patients, in general, were healthier compared to THA 497 patients, and therefore the network was difficult to interpret when the two index conditions were merged 498 together. While our analysis was constrained because we were using the conditions as defined by CMS, these 499 results nonetheless suggest that the interpretations did not suffer from a confirmation bias (manufactured 500 interpretations to fit the results). However, he noted that the range of readmission risk had clinical (face) validity. 501  . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted February 28, 2022. ; https://doi.org/10.1101/2022.02.27.22271534 doi: medRxiv preprint Furthermore, subgroups 2, 4, and 5 had more severe comorbidities related to lung, heart, and kidney, and 502 therefore had a higher risk for readmission compared to subgroups 1, 6, and 7 that had less severe comorbidities 503 with a lower risk for readmission. In addition, subgroups 2, 5, 6 and 7 would benefit from chronic care case 504 management from advanced practice providers (e.g., nurse practitioners). Finally, subgroups 2 and 5 could 505 benefit from using well-established guidelines for CHF and COPD, subgroup 7 would benefit from mental health 506 care and management of psycho-social comorbidities, and subgroup 6 would benefit from care for obesity and 507 metabolic disease management. 508

509
The classification model used multinomial logistic regression in each index condition (see Appendix-3 for the 510 model coefficients) to predict the membership of patients using subgroups (identified from the above visual 511 analytical models). The results revealed that in each index condition, the classification model had high accuracy 512 in classifying all the cases in the full dataset (training dataset used in the visual analytical modeling). Similarly, 513 the internal validation results using a 75%-25% split of the above dataset also had high classification accuracy 514 (Table 3 with classification accuracy divided into quantiles). We report both results for each index condition: 515 COPD. The model correctly predicted subgroup membership for 99.90% of the cases (14443/14457) in the full 516 dataset. Furthermore, as shown in Table 3, the internal validation results revealed that the percentage of COPD 517 cases correctly assigned to a subgroup in the testing dataset, ranged from 99.10% to 100.00%, with a median 518 (Q.50) of 99.60%, and with 95% being in the range from 99.30% to 99.80%. 519 CHF. The model correctly predicted subgroup membership for 99.20% of the cases (25476/25672) in the full 520 dataset. Furthermore, as shown in Table 3, the internal validation results revealed that the percentage of CHF 521  Readmi�ed THA/TKA Pa�ent (n=7,010) Comorbidity (d=11) Associa�on . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted February 28, 2022. model was used to classify 100% cases and 100% controls for use in the prediction model (described below). 529 Furthermore, the proportion of cases and controls classified into each subgroup was used to calculate the risk 530 of readmission for each subgroup (see Appendix 3). As this subgroup risk information was requested by the 531 clinicians to improve interpretability of the visual analytical model, the values were juxtaposed next to the 532 respective subgroups in the bipartite network visualizations (see blue text in Fig. 3-5). that both models had a slope close to 1, and an 549 intercept close to 0 (see Appendix-4). 550 As shown in Fig. 6B, the Standard Model was used to 551 measure the predictive accuracy of patients in each 552 subgroup separately. The results showed that 553 Subgroup-1 had a lower C-statistic compared to 554 Subgroup-3 and Subgroup-4. While the C-statistics in 555 Fig. 6A and Fig. 6B cannot be compared as they are 556  Table 3. Internal validation results showing the percentage of COPD, CHF, and THA/TKA patients correctly-assigned to a subgroup by the classification models in each condition.  revealed that all models had a slope close to 1, and 571 an intercept close to 0 (see Appendix-4). 572 As shown in Fig. 7B, the Standard Model was used to 573 measure the predictive accuracy of patients in each 574 subgroup separately. The results showed that 575 Subgroup-1 had a lower C-statistic compared to 576 Subgroup-4. While the C-statistics in Fig. 7A and Fig.  577 7B cannot be compared as they are based on models developed from different populations, but similar to the 578 results in COPD, these results reveal that the current CMS readmission model for CHF might be underperforming 579 for one CHF patient subgroup, pinpointing which one might benefit by a Subgroup-Specific Model. Model that had a C-statistic of 0.638 (95% CI: 0.629-0.647). The calibration plots (see Appendix-4) revealed that 584 both models had a slope close to 1, and an intercept 585 close to 0 (see Appendix-4). 586 As shown in Fig. 8B, the Standard Model was used to 587 measure the predictive accuracy of patients in each 588 subgroup separately. The results showed that 589 Subgroup-1 had a lower C-statistic compared to 590 Subgroup-4. Again, while the C-statistics in Fig. 8A  591 and Fig. 8B cannot be compared as they are based 592 on models developed from different populations, 593 similar to the results in COPD, these results reveal 594 that the current CMS readmission model for 595 THA/TKA might be underperforming for 4 patient 596 subgroups, pinpointing which ones might benefit by 597 Subgroup-Specific Models. 598

CMS Standard Model vs. CMS Hierarchical Model. 599
Unlike the CMS published models, the above models 600 used only the comorbidities that survived feature 601 selection. Therefore, to perform a head-to-head 602 comparison with the published CMS models, we also 603 developed a CMS Standard Model (using the same 604 variables from the published CMS model), and 605 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted February 28, 2022. ; https://doi.org/10.1101/2022.02.27.22271534 doi: medRxiv preprint compared it to the corresponding CMS Hierarchical Model (with an additional variable for subgroup 606 membership) in each condition. Similar to the models in Fig. 6-8, there were no significant differences in the C-607 statistics between the two modeling approaches in any condition (see Appendix-4). However, as shown in Table  608 4 Model was used to predict readmission separately in subgroups within each index condition, it identified 613 subgroups that underperformed, pinpointing which ones might benefit by a Subgroup-Specific Model (See 614 Appendix-4). In summary, the comparisons between the CMS Standard Models and the respective CMS 615 Hierarchical Models showed that in two conditions (COPD and THA/TKA), there was a small but statistically 616 significant improvement in discriminating between the readmitted and not readmitted patients as measured by 617 NRI, but not as measured by the C-statistic or IDI, and that a subgroup in each index condition might be 618 underperforming when using the CMS Standard Model. 619

620
Overview 621 Our overall approach of using the MIPS framework for identifying patient subgroups through visual analytics, 622 and using those subgroups to build classification and prediction models, revealed strengths and limitations for 623 each modeling approach, and for our data source. This examination led to insights for developing future clinical 624 decision support systems, and a methodological framework for improving the clinical interpretability of 625 subgroup modeling results. 626

Strengths and Limitations of Modeling Methods and Data Source 627
Visual Analytical Modeling. The results revealed three strengths of the visual analytical modeling: (1) the use of 628 bipartite networks to simultaneously model patients and comorbidities, enabled the automatic identification of 629 patient-comorbidity biclusters, and the integrated analysis of co-occurrence and risk; (2) the use of a bipartite 630 modularity maximization algorithm to identify the biclusters enabled the measurement of the strength of the 631 biclustering, critical for gauging its significance; and (3) the use of a graph representation enabled the results to 632 be visualized through a network. Furthermore, the request from the domain experts to juxtapose the risk of 633 each subgroup with their visualizations appeared to be driven by a need to reduce working memory loads (from 634 having to remember that information spread over different outputs), which could have enhanced their ability to 635 match bicluster patterns with chunks (previously-learned patterns of information) stored in long-term memory. 636 The resulting visualizations enabled them to recognize subtypes based on co-occurring comorbidities in each 637 subgroup, reason about the processes that precipitate readmission based on the risk of each subtype relative to 638 the other subtypes, and propose interventions that were targeted to those subtypes and their risks. Finally, the 639 fact that the geriatrician could not fully interpret the THA/TKA network because it mixed two fairly different 640 conditions, suggests that the clinical interpretations were not the result of a confirmation bias (interpretations 641 leaning towards fitting the results). 642 However, the results also revealed two limitations: (1) while modularity is estimated using a closed-form 643 equation (formula), no closed-form equation exists to estimate the modularity variance, which is necessary to 644 measure its significance. To estimate modularity variance, we therefore used a permutation test by generating 645  . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted February 28, 2022. ; https://doi.org/10.1101/2022.02.27.22271534 doi: medRxiv preprint 1000 random permutations of the data, and then compared the modularity generated from the real data to the 646 mean modularity generated from the permuted data. Given the size of our datasets (ranging from 7K-25K 647 patients), this computationally-expensive test took approximately 7 days to complete, despite the use of a 648 dedicated server with multiple cores; and (2) while bicluster modularity was successful in identifying significant 649 and meaningful patient-comorbidity biclusters, the visualizations themselves were extremely dense, and 650 therefore potentially concealed patterns within and between the subgroups. Future research should explore a 651 closed-form equation to estimate modularity variance, with the goal of accelerating the estimation of modularity 652 significance, and more powerful analytical and visualization methods to reveal intra-and inter-cluster 653 associations in large and dense networks. Model to measure predictive accuracy across the subgroups helped to pinpoint which subgroups tend to have 671 lower predictive accuracy compared to the rest, and therefore which of them could benefit from a more complex 672 but accurate subgroup-specific model; and (2) despite the use of a simple Hierarchical Model with a 673 dichotomized membership label for each patient, the predictive CMS models detected significant differences in 674 the prediction accuracy as measured by NRI in two of the conditions, when compared to the CMS Standard 675 Models. However, the results also revealed that the differences in predictive accuracy as measured by the C-676 statistic and NDI were small, suggesting that comorbidities on their own were potentially insufficient for 677 accurately predicting readmission. Future research should explore the use of electronic health records, and 678 multiple subgroup-specific models targeted to each subgroup, to improve the predictive accuracy of the models. underrepresented segment of the US population; and (4) data used by CMS to build predictive readmission 683 models, which enabled a head-to-head comparison with the hierarchical modeling approach. 684 However, the data had two critical limitations. (1) As we compared our models with the CMS models, we had to 685 use the same definition for controls (90 days with no readmission) that had been used, which introduced a 686 selection bias that exaggerates the separation between cases and controls. Similarly, by excluding patients who 687 died, this exclusion criterion potentially biased the results towards healthier patients.
(2) Administrative data 688 have known limitations such as the lack of comorbidity severity and test results, which could strongly impact the 689 accuracy of predictive models. Future research should consider the use of national-level electronic health record 690 (EHR) data such as those being assembled by the National COVID Cohort Collaborative (N3C) [59], and the 691 TriNetX [60] initiatives, which could overcome the above limitations by providing laboratory values and 692 comorbidity severity, but could also introduce new as yet unknown limitations. 693 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Implications for Clinical Decision-Support Systems that Leverage Patient Subgroups 694
While the focus of this project was to develop and evaluate the MIPS framework, its application to three index 695 conditions coupled with extensive discussions with clinicians led to insights for designing a future clinical decision 696 support system. Such a system could integrate outputs from all three models. As we have shown, the visual 697 analytical model automatically identified and visualized the patient subgroups, which enabled the clinicians to 698 comprehend the co-occurrence and risk information in the visualization, reason about the processes that lead 699 to readmission in each subgroup, and design targeted interventions. The classification model leveraged the 700 observation that many patients have comorbidities in other biclusters (shown by a large number of edges 701 between biclusters), and accordingly generated a membership probability of a patient belonging to each 702 bicluster, from which the highest was chosen for bicluster membership. Finally, the predictive model predicts 703 the risk for readmission for a patient, by using in the future the most accurate model designed for the bicluster 704 to which the patient belongs. 705 The outputs from the above models could be integrated into a clinical decision support system to provide 706 recommendations for a specific patient using the following algorithm: (1) use the classifier to generate the 707 membership probability (MP) of a new patient belonging to each subgroup; (2) multiply the MP in each subgroup 708 with the patient's risk (R) for readmission provided by the predictive model for that subgroup, to generate an 709 importance score [IS = f(MP) X g(R)] for the respective intervention; (3) rank the subgroups and their respective 710 interventions using IS; and (4) use the ranking to display in descending order, the subgroup comorbidity profiles 711 along with their respective potential mechanisms, recommended treatments, and the respective IS. Such model-712 based information, displayed through a user-friendly interface, could enable a clinician to rapidly scan the ranked 713 list to (a) determine why a specific patient's profile fits into one or more subgroups, (b) review the potential 714 mechanisms and interventions ranked by their importance, and (c) use the combined information to design a 715 treatment that is customized for the real-world context of the patient. Consequently, such a clinical decision 716 support system could not only provide a quantitative ranking of membership to different subgroups, and the 717 importance score for the associated interventions, but also enable the clinician to understand the rationale 718 underlying those recommendations, making the system interpretable and explainable. Comparative evaluation 719 of such a system to standard care could determine its clinical efficacy. 720

Implications for Analytical Granularity to Enhance the Interpretability of Patient Subgroups 721
While the visual analytical model enabled the clinicians to interpret the patient subgroups, they were unable to 722 interpret the associations within and between the subgroups due to the large number of nodes in each bicluster 723 and the dense edges between them. Several network filtering methods [61, 62] have been developed to "thin 724 out" such dense networks such as by dropping or bundling nodes and edges based on user-defined criteria, to 725 improve visual interpretation. However, such filtering could bias the results, or modify the clusters resulting 726 from the reduced data. 727 An alternate approach that preserves the full dataset leverages the notion of analytical granularity, where the 728 data is progressively analyzed at different levels. For example, we have analyzed COVID-19 patients [11] at the 729 cohort, subgroup, and patient levels, and we are currently using the same approach to examine symptom co-730 occurrence and risk at each level in Long COVID patients. Our preliminary results suggest that analyzing data at 731 different levels of granularity enables clinicians to progressively interpret patterns such as within and between 732 subgroups, in addition to guiding the systematic development of new algorithms. For example, at the subgroup 733 level, we have designed an algorithm that identifies which patient subgroups have a significantly higher 734 probability for having characteristics that are clustered in another subgroup, providing critical information to 735 clinicians about how to design interventions for such overlapping subgroups; at the patient level, we have 736 identified patients that are outliers to their subgroups based on their pattern of characteristics inside and outside 737 their subgroup. Such patient outliers could be flagged to examine if they need individualized interventions versus 738 those recommended for the rest of their subgroup. Such analytical granularity could therefore inform the design 739 of interventions by clinicians, in addition to the design of decision support systems that provide targeted and 740 interpretable recommendations to physicians, who can then customize them to fit the real-world context of a 741 patient. 742 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted February 28, 2022. ; https://doi.org/10.1101/2022.02.27.22271534 doi: medRxiv preprint

Implications of the MIPS Framework for Precision Medicine 743
While we have demonstrated the application of the MIPS framework across multiple readmission conditions, its 744 architecture has three properties that should enable its generalizability across other medical conditions. First, 745 as shown in Fig. 2, the framework is modular with explicit inputs and outputs, enabling the use of other methods 746 at each of the three modeling steps. For example, the framework could use other biclustering (e.g., Non-747 negative Matrix Factorization [63]), classification (e.g., deep learning [64]), and prediction methods (e.g., 748 subgroup-specific modeling [17]). Second, the framework is extensible, enabling an elaboration of the methods 749 at each modeling step to improve the analysis and interpretation of subgroups. For example, as discussed above, 750 the analytical granularity at the cohort, subgroup, and patient levels could improve the interpretability of 751 subgroups in large and dense datasets. Third, the framework is integrative as it systematically combines the 752 strengths of machine learning, statistical, and precision medicine approaches. For example, the visual analytical 753 modeling leverages search algorithms to discover co-occurrence in large datasets; the classification and 754 prediction modeling leverages probability theory to measure the risk of co-occurrence patterns; and clinicians 755 leverage medical knowledge and human cognition to interpret patterns of co-occurrence and risk for designing 756 precision-medicine interventions.

763
Although a primary goal of precision medicine is to identify patient subgroups and to design targeted 764 interventions, few methods automatically identify both patient subgroups and their co-occurring characteristics 765 simultaneously, measure their significance, and visualize the results. Here we demonstrated the use of the MIPS 766 framework, which used a three-step approach to model and interpret patient subgroups. A visual analytical 767 method automatically identified statistically significant and replicated patient subgroups and their frequently 768 co-occurring comorbidities. Next, a multinomial logistic regression classifier had high accuracy in correctly 769 classifying patients into the patient subgroups identified by the visual analytical model. Finally, despite using a 770 simple hierarchical logistic regression model to incorporate subgroup information, the predictive models had a 771 statistically significant improvement in discriminating between the readmitted and not readmitted patients in 772 two of the three readmission conditions, and further analysis pinpointed for which patient subgroups the current 773 CMS model might be underperforming. Finally, by integrating the co-occurrence and risk patterns in a 774 visualization, the MIPS framework enabled clinicians to interpret the patient subgroups, reason about 775 mechanisms precipitating hospital readmission, and design targeted interventions. 776 However, evaluation of the methods across three readmission index conditions also helped to identify 777 limitations of the models and the data. The visual analytical model was too dense to enable the clinicians to 778 interpret the associations within and between the subgroups, and the absence of a closed-form equation to 779 measure modularity variance required a computationally-expensive process to measure the significance of the 780 biclustering. Furthermore, the small improvement in predictive accuracy suggested that comorbidities on their 781 own were insufficient for predicting hospital readmission. 782 By leveraging the modular and extensible nature of the MIPs framework, future research should address the 783 above limitations by developing more powerful algorithms which analyze subgroups at different levels of 784 granularity to improve the interpretability of intra-and inter-cluster associations, and the evaluation of 785 subgroup-specific models to predict outcomes. Furthermore, EHR data made available through national-level 786 data initiatives such as N3C and TriNetX now provide access to critical variables including laboratory results and 787 comorbidity severity, which should lead to higher predictive power for predicting adverse outcomes. Finally, 788 extensive discussions with clinicians provided implications for the design of future decision support systems, 789 which could integrate outputs from the three models to provide for a specific patient, predicted subgroup 790 memberships, ranked interventions, along with associated subgroup profiles and mechanisms. Such 791 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted February 28, 2022. ; https://doi.org/10.1101/2022.02.27.22271534 doi: medRxiv preprint interpretable and explainable systems could enable clinicians to use patient subgroup information for informing 792 the design of precision medicine interventions, with the goal of reducing adverse outcomes such as unplanned 793 hospital readmissions and beyond. 794 AKNOWLEDGEMENTS 795 We thank Tianlong Chen, Clark Andersen, Yu-Li Lin, and Emmanuel Santillana for performing the analyses on this 796 project. This study was supported in part by the Patient-Centered Outcomes Research Institute (ME-1511-797 33194), the Clinical and Translational Science Award (UL1 TR001439) from the National Center for Advancing 798 Translational Sciences at the National Institutes of Health, and by the National Library of Medicine (R01 799 LM012095) at the National Institutes of Health. The content is solely the responsibility of the authors, and does 800 not necessarily represent the official views of the Patient-Centered Outcomes Research Institute, or the National 801 Institutes of Health. The Medicare data were analyzed using a CMS data-use agreement (CMS DUA RSCH-2017-802 51404). 803 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted February 28, 2022. ; https://doi.org/10.1101/2022.02.27.22271534 doi: medRxiv preprint