Use of machine learning to predict hypertension-related complication outcomes of varying severity

Jasmine M. McCammon; Sricharan Bandhakavi; Doreen Salek; Zhipeng Liu; Xianglian Ni; Natalie Benner; Ronda Rogers; Hollie Yoder; Shelley Riser; Farbod Rahmanian

doi:10.1101/2020.10.30.20169615

ABSTRACT

Objective A challenge in hypertension-related risk management is identifying which people are likely to develop future complications. To address this, we present administrative-claims based predictive models for hypertension-related complications.

Materials and Methods We used a national database to select 1,767,559 people with hypertension and extracted 112 features from past claims data based on their ability to predict hypertension complications in the next year. Complications affecting kidney, brain, and heart were grouped by clinical severity into three stages. Extreme gradient boosting binary classifiers for each stage were trained and tuned on 75% of the data, and performance on predicting outcomes for the remaining data and an independent dataset was evaluated.

Results In the cohort under study, 6%, 17%, and 7% of people experienced a hypertension-related complication of stage 1, stage 2, or stage 3 severity, respectively. On an independent dataset, models for all three stages performed competitively with other published algorithms by the most commonly reported metric, area under the receiver operating characteristic curve, which ranged from 0.82-0.89. Features that were important across all models for predictions included total medical cost, cost related to hypertension, age, and number of outpatient visits.

Discussion The model for stage 1 complications, such as left ventricular hypertrophy and retinopathy, is in contrast to other offerings in the field, which focus on more serious issues such as heart failure and stroke, and affords unique opportunities to intervene during earlier stages.

Conclusion Predictive analytics for hypertension outcomes can be leveraged to help mitigate the immense healthcare burden of uncontrolled hypertension.

LAY SUMMARY As the leading preventable risk factor for morbidity and mortality in the world, identifying which people with hypertension are likely to exacerbate is critically important for development of effective intervention strategies. Here we present a suite of predictive models that can predict future risk of development of hypertension-related complications. To have utility for triaging as well as identifying mild cases before they progress to critical end phases, the models predict three different stages of severity of hypertension-related complications. Our algorithms utilize variables calculated for the most recent 12 months, and predict probability of a hypertension-related complication for the next 12 months using administrative claims as the data source. Because the types of complications that have been analyzed can also result from comorbidities besides hypertension, such as diabetes and hyperlipidemia, these diagnoses are included as variables. Other variables pertain to demographic characteristics, prescription information, relevant procedures, and utilization patterns. Overall, all three models exhibited strong predictive performance. The ability to use straightforward variables found in claims data to predict future risk of disease-related complications, complemented with targeted clinical intervention strategies, has the potential to reduce cost of care and improve health outcomes for the many people living with hypertension.

BACKGROUND AND SIGNIFICANCE

Hypertension (HTN) is the chronic elevation of blood pressure due to changes in cardiac output and/or systemic vascular resistance, which can lead to organ damage and mortality if left unmanaged. The overall prevalence of HTN is high, nearly a third of the global adult population, making HTN one of the basic cost drivers in healthcare systems [1]. Despite continuing advances in treatment options, HTN remains uncontrolled in more than half of the population on medication [2]. As a result of its high prevalence and difficulty to adequately manage, HTN is the leading preventable risk factor globally for premature death and disability [3].

The contribution of HTN to morbidity and mortality is due to the long term effects of chronic, unmanaged high blood pressure. HTN itself is without symptoms, but left uncontrolled, will lead to systemic organ damage over time [4]. Initial stages of damage such as microalbuminuria and left ventricular hypertrophy are subtle and may go unrecognized until they’ve progressed to more serious complications such as chronic kidney disease and heart failure [5]. Estimates suggest that preventing hypertension could reduce cardiovascular death by 38% in females and 30% among males [6]. Another study found that 48% of population risk for stroke could be attributed to HTN, the highest of all factors examined [7]. After diabetes, HTN is the second leading cause of end-stage renal disease [8]. Therefore, the ability to predict who with HTN is likely to exacerbate, or present with HTN-related complications, before such events occur is a powerful opportunity to mitigate some of this morbidity and mortality burden.

We found a number of studies that used machine learning algorithms to predict various complications associated with HTN. Diverse data sources, including EHR, clinical trial data, and national healthcare databases from the US, Korea, China, and Europe were utilized to predict complications such as cardiovascular disease, dementia, myocardial infarction, acute coronary syndrome, cerebrovascular disease, stroke, kidney disease, heart failure, coronary heart disease, ischemic heart disease, and cardiovascular death [9-17]. In general, the number of people included in these cohorts ranged from 3,395 to 133,176; only three included people below the age of 40 and one did not report age ranges [9, 10, 16, 17]. The area under the receiver operating characteristic (AUROC) was the most commonly reported evaluation metric, ranging from 0.56-0.84, with an average of 0.71 across seven studies [9-13, 15, 17].

From this, we saw an opportunity to contribute to this strong predictive modeling work by exploiting a larger dataset, a national collection of administrative claims from commercially-insured individuals, with hypertensive members in the millions. We refer to this source as the CCAE (Commercial Claims and Encounters) database within this manuscript. Including people in the age range of 18-64 represents a lower risk population for complications, but more opportunities for targeted intervention before the risks may become unmanageable. Our independent test set did include people over 65 to assess the model’s generalizing ability to this demographic. In addition, rather than focusing on a specific type of complications, a specific organ system, or a composite of severe complication outcomes as has been done in previous studies, we chose to build separate models to predict the complications that reflect the progression of chronic HTN. Composite outcomes across organ systems were grouped into three stages of increasing clinical severity as described by Messerli and colleagues [18]. Interventions during earlier signs and symptoms of HTN complications are more likely to be successful than interventions at later stages and may help to prevent later stage HTN complications as well [19]. Features thought to correlate directly or indirectly with HTN-related complications were calculated for the intake window, a 12-month period before the year for prediction, and extreme gradient boosting (XGBoost) classifiers were trained to predict the likelihood of these outcomes in the following year. Overall, model performance was strong on the independent dataset (AUROC = 0.82-0.89), comparable or better than the previously published algorithms mentioned above, in addition to having the benefits of providing early intervention opportunities. These models exhibit practical utility from a care management perspective to triage and manage hypertensive populations, which are inherently large, for different levels of HTN complication severity.

MATERIALS AND METHODS

Study design and sample data

We used administrative claims data generated between January 1, 2016 to December 31, 2017 from commercially-insured individuals in the CCAE database. The cohort of study was selected based on data from 2016 and HTN-related complication events were identified in 2017 data. To be included in the cohort, members were presumed to have HTN based on their medical and pharmacy claims: they either had to have at least two claims 14 or more days apart with a HTN diagnosis or at least one claim with a HTN diagnosis and at least two pharmacy claims with anti-HTN medications (for medical codes used see Supplementary Table S1). They also had to be continuously enrolled in 2016 and 2017 in medical and pharmacy benefits and be at least 18 years old. We also explored cohorts that did and did not include members that had complication events during the intake window of 2016.

Features for each cohort member were then calculated for 2016 to use for predicting hypertension complications in 2017. An independent dataset of administrative claims from an independent payer not in the CCAE database was used to process data in the exact same manner and validate final model performance.

Outcome variable: Defining a hypertension complication

Outcomes were identified during the prediction window of January 1, 2017 to December 31, 2017. In order to maximize applications and utility, we chose to build three different models predicting different levels of complication severity across different target organ systems (Figure 1). Stage 1 complications are the least severe, and include proteinuria, left-ventricular hypertrophy, and retinopathy. Stage 2 complications, the next step in progression of unmanaged hypertension, include chronic kidney disease, coronary artery disease, and transient ischemic attack. The most severe complications, stage 3, include end-stage renal disease, heart failure, and stroke. As a result, stage 1 and stage 2 models can be used to target people in the early stages of HTN progression, while predictions for stage 3 complications can be used to triage for likelihood of severe HTN exacerbations for those that have already progressed to an advanced disease state. ICD-10-CM diagnosis codes [20] used to identify these complication events can be found in Supplementary Table S2. These labels were treated as non-mutually exclusive when training each model. In other words, the same cohort was used to fit all three models and members were not excluded if they had multiple stages of complications during the prediction window.

Figure 1.

Hypertension complications. Target organs affected are kidney, heart, and brain. Complications progress with increasing severity and can ultimately result in mortality. Adapted from [18].

Feature candidates

We compiled 112 different features based on previous literature and clinical information, all during the intake window of January 1, 2016 to December 31, 2016. To ascertain the feature of whether each member was taking anti-hypertensives for the first time in 2016, what we call a “progressor” [21], pharmacy data from 2015 was also used, but we allowed for progressor status as unknown if they were not enrolled in 2015. Demographic features were age and gender. Diagnoses included comorbidities common with hypertension and known to contribute to increased likelihood of disease exacerbation, such as diabetes and hyperlipidemia, as well as other conditions that could contribute to the complication set. For example, heart valve disorders could also lead to arrhythmias. The Charlson Comorbidity Index was calculated as well [22]. Prescription features pertained to anti-hypertensive drug classes: diuretics, calcium channel blockers, agents acting on the renin-angiotensin system, beta blockers, vasodilators, and adrenergic antagonists. Mono- and polytherapies as well as medication changes were evaluated [21]. Medications indicative of comorbidities, such as metformin for diabetes, were included, as well as medications known to interfere with anti-hypertensives, like steroids [23]. In the absence of clinical data, counts of unique dates with particular procedure codes were used in place of laboratory results, including tests for HbA1c and creatinine levels. Finally, utilization was assessed as counts of unique visit dates in different care settings: outpatient (OP), inpatient (IP), and emergency department (ED), as well as two cost features: total medical spending for the year and spending for claims with hypertension diagnoses.

Machine learning approaches and prediction performance evaluation

Several algorithm architectures were tested, and we decided to use the XGBoost classifier [24], which had the best performance during initial data exploration phases. Three separate models were constructed for each stage to allow flexibility in fine-tuning for the most accurate predictions of each stage of complication. For model training and evaluation, data were split 75%-25% for training and test sets using stratified sampling. Model hyperparameters tuned with random grid search and cross validation were learning rate, maximum depth, minimum child weight, and subsample rate.

Model performance was evaluated with several metrics, including area under the receiver operating characteristic (AUROC), area under the precision-recall curve (AUPRC) [25], sensitivity, specificity, negative predictive value (NPV), positive predictive (PPV) value, F1 score, and number needed to evaluate (NNE) – or the number of people needed to evaluate to get one person with the true outcome – on the test set and an independent dataset. The optimal threshold for prediction cutoff was chosen to balance sensitivity and specificity as identified by the Youden index [26]. We also report feature importance for the top 20 most important predictors for each model.

Analyses and visualizations were done using Python, version 2.7, with the scikit-learn, numpy, pandas, and seaborn libraries (Python Software Foundation).

RESULTS

Descriptive statistics of the case population

There were 1,767,559 members in the cohort of study; of these, 396,607 (22.4%) had at least one HTN-related complication of any stage. Stage 2 complications were most common with 303,150 members (17.1% of cohort), while stage 1 (99,829) and stage 3 (131,126) were less so (5.6% and 7.4%, respectively). Some members experienced complications from two or all three stages (6.5%) during the prediction window. There was a lower percentage of females (41.3% vs. 49.0%) as well as a higher average age (54.7 vs. 51.6 years old) among those that experienced a HTN-related complication versus those that did not (Figure 2). Because more than half the people who experienced a stage 1 complication also experienced a stage 2 and/or stage 3 complication, and more than a quarter of people who had a stage 2 complication also had a stage 3 complication, the summary statistics for the features below pertain to mutually exclusive roll-up categories where a person was counted for his or her most severe complication stage.

Figure 2.

Hypertension complication prevalence and demographic summary. (A) Member counts by complication stage during prediction window. Members can experience complications from multiple stages, but are only counted as ‘no complication’ status if they do not experience complications of any stage. (B) Distribution of ages by complication status shows people that do not experience a complication tend to be slightly younger. This study design did not include people below the age of 18, and because it utilized administrate data from commercially insured individuals, does not include anyone over the age of 65. (C) Proportion of males and females by complication status shows roughly equal percentages of both sexes in the no complication group, but higher proportions of males in all three complication stage groups. Comp, complications; HTN, hypertension.

Select diagnosis and prescription features as they relate to HTN complication stages are shown in Figure 3. These were chosen as they are among the top 20 most important features for one or more of the models. Summary statistics of all diagnosis- and prescription-related features are in Tables 1 and 2, respectively. Charlson Comorbidity Index [22], a discrete risk score, increased as complication severity increased (Figure 3B). For diagnosis-related features scored categorically by the presence of at least one code (n=60), all but anxiety disorder and hemochromatosis were proportionally higher in all three of the complication stage groups compared to those that did not have any complications. Thirty-one of these diagnoses increased proportionally with increasing complication severity (Table 1). Discrete prescription features, such as number of different anti-hypertensive classes and number of anti-hypertensive treatment adjustment rounds, positively correlated with complication severity (Figure 3C). For prescription features scored categorically by the presence of at least one code (n=35), 20 were proportionally higher in all complication stage groups compared to those without any hypertension-related complications. Six prescription predictors decreased with complication severity (Table 2).

View this table:

Table 1.

Diagnosis-related features for four groups: those that experienced no hypertension-related complications, stage 1, stage 2, or stage 3 complications. Member counts and proportions for people in each group are shown for most features, scored as presence or absence of at least one code, unless the feature is discrete, in which case mean and standard deviation are presented. CCI, Charlson Comorbidity Index; COPD, chronic obstructive pulmonary disease; SD, standard deviation.

View this table:

Table 2.

Predictors pertaining to pharmacy claim data. An anti-hypertensive “type” refers to its 5 digit ATC4 hierarchy code, whereas “class” refers more broadly to beta blockers, diuretics, calcium channel blockers, etc., or its 3 digit ATC4 hierarchy code. Predictors are scored categorically and presented as proportions of total number of members in that complication class, unless denoted with “mean (SD),” in which case they are discrete features and summary statistics of mean and SD are reported. Abbreviations: ACE, angiotensin converting enzyme inhibitor; ARB, angiotensin II receptor blocker; Ca, calcium; HTN, hypertension; NSAID, nonsteroidal anti-inflammatory drug; ROTO, rounds of treatment options; SD, standard deviation.

Figure 3.

Selected diagnosis and medication features. (A) Features during intake year (2016) shown as proportion of members in each group with the presence of at least one diagnosis or drug code belonging to that category. A person is grouped according to his or her most severe complication stage experienced in 2017. (B) The Charlson Comorbidity Index, a composite weighted risk score of 19 different chronic diseases, increases with increasing complication severity. (C) The count of different anti-hypertensive drug classes (diuretics, calcium channel blockers, agents acting on the renin-angiotensin system, beta blockers, vasodilators, and adrenergic antagonists) increases with increasing complication severity. Bars represent 95% confidence intervals.

Counts of relevant procedure codes (n=6 predictors) and utilization visits and costs (n=5 predictors) were all discrete features, and means and SDs of these predictors by complication stage can be found in Table 3. The mean for all of these features was lowest in the complication group versus all other complication stages. Mean utilization across all categories positively correlated with complication stage severity, as did all procedure code counts except for HbA1c and at home blood pressure monitor (Table 3, Figure 4).

View this table:

Table 3.

Features related to utilization and procedure counts. For each complication group, mean and standard deviation are reported, except for pay features, where median and interquartile range are reported. For the procedures and visit types listed, unique claim dates were counted per person during the intake window. BP, blood pressure; EKG, electrocardiogram; HbA1c, hemoglobin A1c; ED, emergency department; IP, inpatient; OP, outpatient; HTN, hypertension.

Figure 4.

Selected procedure and utilization features. A person is grouped according to his or her most severe complication stage experienced in 2017. (A) Count of procedures for hemoglobin-A1c tests is higher if HTN-related complications experienced in the following year, particularly stage 1 complications. (B-C) Count of procedures for serum uric acid and electrocardiogram procedures increase with increasing complication severity. (D) Total medical spending in previous year increases with complication severity the follow year, particularly for stage 3 complications. (E-F) Number of outpatient visits and inpatient admissions, counted by unique dates of service per member, also increase as HTN-related complications become more serious. Bars represent 95% confidence intervals.

Reducing the overall feature space to two dimensions with principal component analysis for 500 randomly selected cases shows some clustering of no complication cases, with greater spread for stage 2 and stage 3 complications (Supplemental Figure 1).

As some of the complications we are trying to predict, such as chronic kidney disease and dementia, are chronic, we evaluated their presence in our cohort during the intake window and how that may affect model performance. We found that 298,774 (17%) of the cohort had a diagnosis for one of the chronic complication events during the intake year, 216,817 of whom had a complication during the prediction year, accounting for 55% of members of the total cohort that experienced a HTN-related complication during this time. One concern is that the models are benefitting from predicting continued presence of chronic complications; however, not everyone from this group had a complication during the prediction window. Furthermore, models trained on cohorts with and without complication events during the intake window performed less well than when both groups are combined (Supplemental Table S3), so we elected to continue with the latter.

Prediction performance of ML algorithms

After calculating the features and labels, the data were split randomly into train and test sets with stratified sampling. XGBoost models were fitted and tuned on the training data, and the performance evaluated on the test set, summarized in Supplemental Figure 2. On the independent dataset (Supplemental Figure 3), the model to predict stage 2 complications exhibited the best performance across thresholds (AUROC of 0.89, AUPRC of 0.76). The model to predict stage 3 complications was next best (AUROC of 0.87, AUPRC of 0.58), outperforming the model to predict stage 1 complications (AUROC of 0.81, AUPRC of 0.36) (Figure 5).

Figure 5.

Model performance metrics on independent dataset. (A) Receiving operating characteristic and (B) precision-recall curves for models predicting stage 1, stage 2, and stage 3 complications across thresholds. Dashed lines indicate baseline random guess performance. (C) Confusion matrices based on optimized threshold cutoffs for stage 1, stage 2, and stage 3 models. AUC, area under curve; comp, complication.

At the optimized threshold for specificity and sensitivity as determined by the Youden index, the model for stage 1 complications had a specificity of 66.8%, sensitivity of 79.5%, NPV of 97.2%, PPV of 18.4%, F1 score of 0.30, and NNE of 5.5. Meanwhile, the model for stage 2 complications exhibited a specificity of 81.4%, sensitivity of 93.1%, NPV of 92.5%, PPV of 63.4%, F1 score of 0.72, and NNE of 1.6. Finally, the model for stage 3 complications had a specificity of 71.0%, sensitivity of 84.4%, NPV of 96.4%, PPV of 33.2%, F1 score of 0.48, and NNE of 3.0 (Table 4).

View this table:

Table 4.

Model performance metrics on independent dataset. Numbers calculated based on threshold cutoffs to optimize specificity and sensitivity on training data. NNE, number needed to evaluate; NPV, negative predictive value; PPV, positive predictive value.

Feature importances were extracted from the XGBoost classifier attributes and the top 20 for each model are shown in Figure 6. The top four predictors across models for all three complication stages were total medical cost, medical cost related to hypertension, outpatient visits, and age, although order varied depending on stage. Other features common to all three models were Charlson Comorbidity Index, count of anti-hypertensive drug classes, count of anti-hypertensive drugs, ED visits, IP visits, EKG count, round of treatment changes, and HbA1c count. Meanwhile, serum uric acid count (stages 1 and 2); valvular disorders and anticoagulants (stages 1 and 3); and sex, statins, beta blockers, breathing difficulties, and thiazide diuretics (stages 2 and 3) were common in two of the three models. Uniquely in the top 20 most important features for predicting stage 1 complications was previous stage 2 and stage 3 complications, edema, diabetes, and atherosclerosis. Unique to stage 2’s top 20 was hyperlipidemia and previous stage 1 complications. For stage 3, dizziness was a top 20 predictor not found for the other two models.

Figure 6.

Top 20 most important features for predicting (A) stage 1, (B) stage 2, and (C) stage 3 complications. CCI, Charlson Comorbidity Index; ED, emergency department; EKG, electrocardiogram; HbA1c, hemoglobin A1c; HTN, hypertension; IP, inpatient; OP, outpatient.

DISCUSSION

There is already a substantial body of literature predicting HTN onset (see [27] for a review), but there remains much value to be added from forecasting HTN-related complications, of which there are more limited studies [9-17]. This would serve the substantial already-diagnosed population, and serve to mitigate the HTN morbidity and mortality burden due to complications. While HTN is associated with major complications such as chronic kidney disease, heart failure, and stroke [28-30], one area that has not been explored to our knowledge is algorithms to predict earlier stage complications. These subtler, sometimes non-specific symptoms, such as microalbuminuria and cognitive dysfunction, may not be recognized as signs of HTN progression like the later stage major complications will be, and yet it is a more effective time to intervene and review treatment and management options.

Findings

To the above end, we built three separate models to predict different stages of complications that reflect the severity of HTN progression across the target organs of brain, heart, and kidney. On a large commercial population of medical and pharmacy administrative data, we identified prevalence rates of 6%, 17%, and 7%, for stage 1, stage 2, and stage 3 complications, respectively. We trained three XGBoost classifiers using 112 features to predict these different complication stages and found they performed as well or better on an independent dataset than previously published models by AUROC in predicting outcomes most similar to our stage 2 and stage 3 composites [9-13, 15, 17]. The precision or PPV of the models for stages 1 and 3 is low at the threshold chosen to optimize specificity and sensitivity, and could be improved by adjusting the threshold at the sacrifice of other metrics. PPV will always be inherently low for imbalanced datasets, but the NNE of 5.5 and 3.0 for these models is encouraging from an implementation perspective as they reflect a low number of people needed to evaluate to detect one outcome [31].

When looking at the most important features for these models, we found that all five utilization calculated pertaining to cost and visit types (IP, ED, OP) were in the top 20 across models, perhaps because they are good proxies for overall health status. Diagnosis-related features seemed more important in discriminating stage 1 complications (7 features in top 20) than for stage 2 and stage 3 complications (4 features each in the top 20). Conversely, predictors related to medications featured more heavily in the top 20 for stage 2 and stage 3 models (6 and 7 predictors, respectively) than for stage 1 (4 predictors in top 20). A feature to incorporate in the future for all of these models is to look at medication adherence, which has previously been found to increase odds of HTN complications (ischemic heart disease and cerebrovascular disease) [32].

Care management use cases

Care management for HTN involves educational programs encouraging self-management, including checking blood pressure regularly, understanding the importance of medication adherence even in the absence of symptoms, managing weight and diet, and recognizing signs indicative of HTN-related complications. Should a member become unstable, care managers can coordinate with physicians to schedule appointments, adjust medications, and recommend further lifestyle modifications. Because of the time needed to assess each patient’s unique case, identify and eliminate his or her barriers to care, organize education and resources, and build a trusting relationship, care management assets must be allocated wisely.

Because the number of people with HTN is large and likelihood for HTN-related disease exacerbation is variable, these models can be leveraged to help identify and stratify risk for individuals of a HTN population for care management. People can be rank-sorted by the algorithms’ prediction probability for each complication stage and integrated with other clinical information to assess overall acuity. These predicted complication probabilities could add additional layers for prioritization of outreach in terms of surfacing unique members that may be lower risk by other assessments. The models can also identify patients without previous complication events. Stratification of care management resources could include identifying candidates for remote monitoring programs at home [33, 34], and designating outreach resources ranging from digital strategies to personalized contact with a population health consultant, an engagement specialist, or a clinician. In this way, leveraging these predicative models can help improve the financial and health outcomes impact of care management.

Limitations

In addition to prevalence issues, part of improving the performance of the stage 1 complication model could entail specialized cohort selection. These models, which were intended to be applied to a hypertension population at large, utilized cohort selection criteria that would have included people at any stage of HTN progression. Since 58% of people that experienced a stage 1 complication also experienced a stage 2 and/or stage 3 complication, the model to predict these early complications is not necessarily fine-tuned to identify progression to these early signs and symptoms. In fact, the presence of stage 1 complications could be continued reporting of these symptoms as people progress to stages 2 and 3. Future iterations of the model will include longer longitudinal data to ensure selection of newly diagnosed HTN patients to improve stage 1 complication detection.

Another limitation of this study is that outside of the diagnoses codes pertaining specifically to hypertension complications (hypertensive heart disease, hypertensive retinopathy, and hypertensive kidney disease), other complication labels selected as part of the composite outcomes cannot necessarily be attributed directly to a person’s hypertension using only administrative data. While using only claims data could be another limitation, even in a clinical setting, determining whether one’s heart attack was due to hypertension or hyperlipidemia or any other combination of factors is not always possible, and why these comorbidities were included as features for prediction. We maintain these models predict risk of events, and their relation to hypertension is important.

CONCLUSION

Hypertension is a highly prevalent chronic disease, which left unmanaged can progress through increasingly severe complication outcomes, resulting in significant morbidity and mortality. We developed algorithms that could accurately predict different stages of HTN-related complications, grouped by severity level. The ability to predict severity of these outcomes rather than a single composite as most previous models have done allows for more nuanced use cases when these algorithms are utilized in clinical settings.

Data Availability

The data used for this manuscript are commercially licensed from the CCAE database and not publically available. The independent dataset contains sensitive personal health information and is therefore also not publically available.

COMPETING INTERESTS

All authors are employees of Geneia LLC.

DATA AVAILABILITY

SUPPLEMENTARY MATERIAL

View this table:

Supplementary Table S1:

HT diagnosis codes and HT medications

View this table:

Supplementary Table S2:

Complication diagnosis codes

View this table:

Supplementary Table S3:

Model performance on holdout data for cohorts with and without complications during the intake window Combined refers to cohort including people both with and without complications during the intake window. AUROC, area under the receiver operating characteristic curve; comp, complication; NPV, negative predictive value; PPV, positive predictive value.

SUPPLEMENTAL FIGURES

Figure S1.

Two dimensional representation of 112 feature inputs with principal component analysis.

Figure S2. Model performance on Truven holdout dataset.

(A) Receiving operating characteristic and (B) precision-recall curves for models predicting stage 1, stage 2, and stage 3 complications across thresholds. Dashed lines indicate baseline performance. (C) Other performance metrics and confusion matrices for stage 1, stage 2, and stage 3 models. AUC, area under curve; comp, complication; NNE, number needed to evaluate; NPV, negative predictive value; PPV, positive predictive value.

Figure S3.

Demographic characteristics of independent dataset population

ACKNOWLEDGMENTS

The authors thank Andrew Fairless, Mackenzie DeBoer, and Cindy Meck for helpful discussion.

Footnotes

* Certain data used in this study were supplied by International Business Machines Corporation. Any analysis, interpretation, or conclusion based on these data is solely that of the authors and not International Business Corporation.

REFERENCES

1.↵
Mills, K.T., A. Stefanescu, and J. He, The global epidemiology of hypertension. Nat Rev Nephrol, 2020. 16(4): p. 223–237.
OpenUrl
2.↵
Dorans, K.S., et al., Trends in Prevalence and Control of Hypertension According to the 2017 American College of Cardiology/American Heart Association (ACC/AHA) Guideline. J Am Heart Assoc, 2018. 7(11).
3.↵
GBD 2015 Disease and Injury Incidence and Prevalence Collaborators, Global, regional, and national incidence, prevalence, and years lived with disability for 310 diseases and injuries, 1990-2015: a systematic analysis for the Global Burden of Disease Study 2015. Lancet, 2016. 388(10053): p. 1545–1602.
OpenUrl CrossRef PubMed
4.↵
Benjamin, E.J., et al., Heart Disease and Stroke Statistics-2018 Update: A Report From the American Heart Association. Circulation, 2018. 137(12): p. e67–e492.
OpenUrl FREE Full Text
5.↵
Mennuni, S., et al., Hypertension and kidneys: unraveling complex molecular mechanisms underlying hypertensive renal damage. J Hum Hypertens, 2014. 28(2): p. 74–9.
OpenUrl CrossRef PubMed
6.↵
Patel, S.A., et al., Cardiovascular mortality associated with 5 leading risk factors: national and state preventable fractions estimated from survey data. Ann Intern Med, 2015. 163(4): p. 245–53.
OpenUrl CrossRef PubMed
7.↵
O’Donnell, M.J., et al., Global and regional effects of potentially modifiable risk factors associated with acute stroke in 32 countries (INTERSTROKE): a case-control study. Lancet, 2016. 388(10046): p. 761–75.
OpenUrl CrossRef PubMed
8.↵
US Renal Data System, USRDS 2009 Annual Data Report: Atlas of chronic kidney disease and end-stage renal disease in the United States. 2009, National Institutes of Health, National Institute of Diabetes and Digestive and Kidney Diseases: Bethesda, MD, USA.
9.↵
Chen, R., et al., 3-year risk prediction of Coronary Heart Disease in hypertension patients: A preliminary study. Conf Proc IEEE Eng Med Biol Soc, 2017. 2017: p. 1182–1185.
10.↵
de Simone, G., et al., Does information on systolic and diastolic function improve prediction of a cardiovascular event by left ventricular hypertrophy in arterial hypertension? Hypertension, 2010. 56(1): p. 99–104.
OpenUrl CrossRef
11.
Lacson, R.C., et al., Use of machine-learning algorithms to determine features of systolic blood pressure variability that predict poor outcomes in hypertensive patients. Clin Kidney J, 2019. 12(2): p. 206–212.
OpenUrl
12.
Lee, W., et al., Prediction of hypertension complications risk using classification techniques. Industrial Engineering & Management Systems, 2014. 13(4): p. 449–453.
OpenUrl
13.↵
Mehta, H.B., et al., Development and Validation of the RxDx-Dementia Risk Index to Predict Dementia in Patients with Type 2 Diabetes and Hypertension. J Alzheimers Dis, 2016. 49(2): p. 423–32.
OpenUrl
14.
Park, J., et al., Patient-Level Prediction of Cardio-Cerebrovascular Events in Hypertension Using Nationwide Claims Data. J Med Internet Res, 2019. 2(2): p. e11757.
OpenUrl
15.↵
Prieto-Merino, D., et al., ASCORE: an up-to-date cardiovascular risk score for hypertensive patients reflecting contemporary clinical practice developed using the (ASCOT-BPLA) trial data. J Hum Hypertens, 2013. 27(8): p. 492–6.
OpenUrl CrossRef PubMed
16.↵
Ren, Y., et al., A hybrid neural network model for predicting kidney disease in hypertension patients based on electronic health records. BMC Med Inform Decis Mak, 2019. 19(Suppl 2): p. 51.
OpenUrl
17.↵
Wu, X., et al., Value of a Machine Learning Approach for Predicting Clinical Outcomes in Young Patients With Hypertension. Hypertension, 2020. 75(5): p. 1271–1278.
OpenUrl
18.↵
Messerli, F.H., B. Williams, and E. Ritz, Essential hypertension. Lancet, 2007. 370(9587): p. 591–603.
OpenUrl CrossRef PubMed Web of Science
19.↵
Nadler, D.L. and A. Poppas, Can early management of hypertension by general practitioners improve outcome? US Cardiology Review, 2019. 13(1): p. 58–60.
OpenUrl
20.↵
World Health Organization, ICD-10: international statistical classification of diseases and related health problems. 2 ed. Vol. Tenth revision. 2004: World Health Organization.
21.↵
Bandhakavi, S., et al., Real world hypertension treatment patterns analysis identifies putative gaps and risk-associated patterns. Manuscript in preparation, 2020.
22.↵
Quan, H., et al., Coding algorithms for defining comorbidities in ICD-9-CM and ICD-10 administrative data. Med Care, 2005. 43(11): p. 1130–9.
OpenUrl CrossRef PubMed Web of Science
23.↵
Hwang, A.Y., C.V. Dave, and S.M. Smith, Use of Prescription Medications That Potentially Interfere With Blood Pressure Control in New-Onset Hypertension and Treatment-Resistant Hypertension. Am J Hypertens, 2018. 3(12): p. 1324–1331.
OpenUrl
24.↵
Chen, T. and C. Guestrin, XGBoost: A Scalable Tree Boosting System. KDD ‘16: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, 2016. August 2016: p. 785–794.
25.↵
Saito, T. and M. Rehmsmeier, The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS One, 2015. 10(3): p. e0118432.
OpenUrl CrossRef PubMed
26.↵
Fluss, R., D. Faraggi, and B. Reiser, Estimation of the Youden Index and its associated cutoff point. Biom J, 2005. 47(4): p. 458–72.
OpenUrl CrossRef PubMed Web of Science
27.↵
Krittanawong, C., et al., Future Direction for Using Artificial Intelligence to Predict and Manage Hypertension. Curr Hypertens Rep, 2018. 20(9): p. 75.
OpenUrl
28.↵
Cipolla, M.J., D.S. Liebeskind, and S.L. Chan, The importance of comorbidities in ischemic stroke: Impact of hypertension on the cerebral circulation. J Cereb Blood Flow Metab, 2018. 38(12): p. 2129–2149.
OpenUrl
29.
Mahmoodi, B.K., et al., Associations of kidney disease measures with mortality and end-stage renal disease in individuals with and without hypertension: a meta-analysis. Lancet, 2012. 380(9854): p. 1649–61.
OpenUrl CrossRef PubMed Web of Science
30.↵
Slivnick, J. and B.C. Lampert, Hypertension and Heart Failure. Heart Fail Clin, 2019. 15(4): p. 531–541.
OpenUrl
31.↵
Romero-Brufau, S., et al., Why the C-statistic is not informative to evaluate early warning scores and what metrics to use. Crit Care, 2015. 19: p. 285.
OpenUrl CrossRef
32.↵
Kim, H.J., et al., Medication Adherence and the Occurrence of Complications in Patients with Newly Diagnosed Hypertension. Korean Circ J, 2016. 46(3): p. 384–93.
OpenUrl
33.↵
Fisher, N.D.L., et al., Development of an entirely remote, non-physician led hypertension management program. Clin Cardiol, 2019. 42(2): p. 285–291.
OpenUrl
34.↵
Kim, Y.N., et al., Randomized clinical trial to assess the effectiveness of remote patient monitoring and physician care in reducing office blood pressure. Hypertens Res, 2015. 38(7): p. 491–7.
OpenUrl PubMed

View the discussion thread.

Posted November 03, 2020.

Download PDF

Data/Code

Citation Tools

Subject Area

Health Informatics

Subject Areas

All Articles

Addiction Medicine (357)
Allergy and Immunology (681)
Anesthesia (182)
Cardiovascular Medicine (2697)
Dentistry and Oral Medicine (319)
Dermatology (231)
Emergency Medicine (406)
Endocrinology (including Diabetes Mellitus and Metabolic Disease) (956)
Epidemiology (12336)
Forensic Medicine (10)
Gastroenterology (773)
Genetic and Genomic Medicine (4181)
Geriatric Medicine (393)
Health Economics (690)
Health Informatics (2704)
Health Policy (1011)
Health Systems and Quality Improvement (1005)
Hematology (366)
HIV/AIDS (865)
Infectious Diseases (except HIV/AIDS) (13780)
Intensive Care and Critical Care Medicine (805)
Medical Education (401)
Medical Ethics (110)
Nephrology (446)
Neurology (3962)
Nursing (216)
Nutrition (587)
Obstetrics and Gynecology (752)
Occupational and Environmental Health (704)
Oncology (2090)
Ophthalmology (597)
Orthopedics (245)
Otolaryngology (309)
Pain Medicine (254)
Palliative Medicine (76)
Pathology (474)
Pediatrics (1136)
Pharmacology and Therapeutics (474)
Primary Care Research (464)
Psychiatry and Clinical Psychology (3503)
Public and Global Health (6596)
Radiology and Imaging (1431)
Rehabilitation Medicine and Physical Therapy (837)
Respiratory Medicine (879)
Rheumatology (416)
Sexual and Reproductive Health (415)
Sports Medicine (347)
Surgery (458)
Toxicology (56)
Transplantation (192)
Urology (170)

[1] 1.↵
Mills, K.T., A. Stefanescu, and J. He, The global epidemiology of hypertension. Nat Rev Nephrol, 2020. 16(4): p. 223–237.
OpenUrl

[2] 2.↵
Dorans, K.S., et al., Trends in Prevalence and Control of Hypertension According to the 2017 American College of Cardiology/American Heart Association (ACC/AHA) Guideline. J Am Heart Assoc, 2018. 7(11).

[3] 3.↵
GBD 2015 Disease and Injury Incidence and Prevalence Collaborators, Global, regional, and national incidence, prevalence, and years lived with disability for 310 diseases and injuries, 1990-2015: a systematic analysis for the Global Burden of Disease Study 2015. Lancet, 2016. 388(10053): p. 1545–1602.
OpenUrl CrossRef PubMed

[4] 4.↵
Benjamin, E.J., et al., Heart Disease and Stroke Statistics-2018 Update: A Report From the American Heart Association. Circulation, 2018. 137(12): p. e67–e492.
OpenUrl FREE Full Text

[5] 5.↵
Mennuni, S., et al., Hypertension and kidneys: unraveling complex molecular mechanisms underlying hypertensive renal damage. J Hum Hypertens, 2014. 28(2): p. 74–9.
OpenUrl CrossRef PubMed

[6] 6.↵
Patel, S.A., et al., Cardiovascular mortality associated with 5 leading risk factors: national and state preventable fractions estimated from survey data. Ann Intern Med, 2015. 163(4): p. 245–53.
OpenUrl CrossRef PubMed

[7] 7.↵
O’Donnell, M.J., et al., Global and regional effects of potentially modifiable risk factors associated with acute stroke in 32 countries (INTERSTROKE): a case-control study. Lancet, 2016. 388(10046): p. 761–75.
OpenUrl CrossRef PubMed

[8] 8.↵
US Renal Data System, USRDS 2009 Annual Data Report: Atlas of chronic kidney disease and end-stage renal disease in the United States. 2009, National Institutes of Health, National Institute of Diabetes and Digestive and Kidney Diseases: Bethesda, MD, USA.

[9] 9.↵
Chen, R., et al., 3-year risk prediction of Coronary Heart Disease in hypertension patients: A preliminary study. Conf Proc IEEE Eng Med Biol Soc, 2017. 2017: p. 1182–1185.

[10] 10.↵
de Simone, G., et al., Does information on systolic and diastolic function improve prediction of a cardiovascular event by left ventricular hypertrophy in arterial hypertension? Hypertension, 2010. 56(1): p. 99–104.
OpenUrl CrossRef

[11] 11.
Lacson, R.C., et al., Use of machine-learning algorithms to determine features of systolic blood pressure variability that predict poor outcomes in hypertensive patients. Clin Kidney J, 2019. 12(2): p. 206–212.
OpenUrl

[12] 12.
Lee, W., et al., Prediction of hypertension complications risk using classification techniques. Industrial Engineering & Management Systems, 2014. 13(4): p. 449–453.
OpenUrl

[13] 13.↵
Mehta, H.B., et al., Development and Validation of the RxDx-Dementia Risk Index to Predict Dementia in Patients with Type 2 Diabetes and Hypertension. J Alzheimers Dis, 2016. 49(2): p. 423–32.
OpenUrl

[14] 14.
Park, J., et al., Patient-Level Prediction of Cardio-Cerebrovascular Events in Hypertension Using Nationwide Claims Data. J Med Internet Res, 2019. 2(2): p. e11757.
OpenUrl

[15] 15.↵
Prieto-Merino, D., et al., ASCORE: an up-to-date cardiovascular risk score for hypertensive patients reflecting contemporary clinical practice developed using the (ASCOT-BPLA) trial data. J Hum Hypertens, 2013. 27(8): p. 492–6.
OpenUrl CrossRef PubMed

[16] 16.↵
Ren, Y., et al., A hybrid neural network model for predicting kidney disease in hypertension patients based on electronic health records. BMC Med Inform Decis Mak, 2019. 19(Suppl 2): p. 51.
OpenUrl

[17] 17.↵
Wu, X., et al., Value of a Machine Learning Approach for Predicting Clinical Outcomes in Young Patients With Hypertension. Hypertension, 2020. 75(5): p. 1271–1278.
OpenUrl

[18] 18.↵
Messerli, F.H., B. Williams, and E. Ritz, Essential hypertension. Lancet, 2007. 370(9587): p. 591–603.
OpenUrl CrossRef PubMed Web of Science

[19] 19.↵
Nadler, D.L. and A. Poppas, Can early management of hypertension by general practitioners improve outcome? US Cardiology Review, 2019. 13(1): p. 58–60.
OpenUrl

[20] 20.↵
World Health Organization, ICD-10: international statistical classification of diseases and related health problems. 2 ed. Vol. Tenth revision. 2004: World Health Organization.

[21] 21.↵
Bandhakavi, S., et al., Real world hypertension treatment patterns analysis identifies putative gaps and risk-associated patterns. Manuscript in preparation, 2020.

[22] 22.↵
Quan, H., et al., Coding algorithms for defining comorbidities in ICD-9-CM and ICD-10 administrative data. Med Care, 2005. 43(11): p. 1130–9.
OpenUrl CrossRef PubMed Web of Science

[23] 23.↵
Hwang, A.Y., C.V. Dave, and S.M. Smith, Use of Prescription Medications That Potentially Interfere With Blood Pressure Control in New-Onset Hypertension and Treatment-Resistant Hypertension. Am J Hypertens, 2018. 3(12): p. 1324–1331.
OpenUrl

[24] 24.↵
Chen, T. and C. Guestrin, XGBoost: A Scalable Tree Boosting System. KDD ‘16: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, 2016. August 2016: p. 785–794.

[25] 25.↵
Saito, T. and M. Rehmsmeier, The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS One, 2015. 10(3): p. e0118432.
OpenUrl CrossRef PubMed

[26] 26.↵
Fluss, R., D. Faraggi, and B. Reiser, Estimation of the Youden Index and its associated cutoff point. Biom J, 2005. 47(4): p. 458–72.
OpenUrl CrossRef PubMed Web of Science

[27] 27.↵
Krittanawong, C., et al., Future Direction for Using Artificial Intelligence to Predict and Manage Hypertension. Curr Hypertens Rep, 2018. 20(9): p. 75.
OpenUrl

[28] 28.↵
Cipolla, M.J., D.S. Liebeskind, and S.L. Chan, The importance of comorbidities in ischemic stroke: Impact of hypertension on the cerebral circulation. J Cereb Blood Flow Metab, 2018. 38(12): p. 2129–2149.
OpenUrl

[29] 29.
Mahmoodi, B.K., et al., Associations of kidney disease measures with mortality and end-stage renal disease in individuals with and without hypertension: a meta-analysis. Lancet, 2012. 380(9854): p. 1649–61.
OpenUrl CrossRef PubMed Web of Science

[30] 30.↵
Slivnick, J. and B.C. Lampert, Hypertension and Heart Failure. Heart Fail Clin, 2019. 15(4): p. 531–541.
OpenUrl

[31] 31.↵
Romero-Brufau, S., et al., Why the C-statistic is not informative to evaluate early warning scores and what metrics to use. Crit Care, 2015. 19: p. 285.
OpenUrl CrossRef

[32] 32.↵
Kim, H.J., et al., Medication Adherence and the Occurrence of Complications in Patients with Newly Diagnosed Hypertension. Korean Circ J, 2016. 46(3): p. 384–93.
OpenUrl

[33] 33.↵
Fisher, N.D.L., et al., Development of an entirely remote, non-physician led hypertension management program. Clin Cardiol, 2019. 42(2): p. 285–291.
OpenUrl

[34] 34.↵
Kim, Y.N., et al., Randomized clinical trial to assess the effectiveness of remote patient monitoring and physician care in reducing office blood pressure. Hypertens Res, 2015. 38(7): p. 491–7.
OpenUrl PubMed

Use of machine learning to predict hypertension-related complication outcomes of varying severity

ABSTRACT

BACKGROUND AND SIGNIFICANCE

MATERIALS AND METHODS

Study design and sample data

Outcome variable: Defining a hypertension complication

Feature candidates

Machine learning approaches and prediction performance evaluation

RESULTS

Descriptive statistics of the case population

Prediction performance of ML algorithms

DISCUSSION

Findings

Care management use cases

Limitations

CONCLUSION

Data Availability

COMPETING INTERESTS

DATA AVAILABILITY

SUPPLEMENTARY MATERIAL

SUPPLEMENTAL FIGURES

ACKNOWLEDGMENTS

Footnotes

REFERENCES

Citation Manager Formats

Subject Area