Predicting Stroke and Mortality in Mitral Regurgitation: A Gradient Boosting Approach

Introduction: We hypothesized that an interpretable gradient boosting machine (GBM) model considering comorbidities, P-wave and echocardiographic measurements, can better predict mortality and cerebrovascular events in mitral regurgitation (MR). Methods: Patients from a tertiary center were analyzed. The GBM model was used as an interpretable statistical approach to identify the leading indicators of high-risk patients with either outcome of CVAs and all-cause mortality. Results: A total of 706 patients were included. GBM analysis showed that age, systolic blood pressure, diastolic blood pressure, plasma albumin levels, mean P-wave duration (PWD), MR regurgitant volume, left ventricular ejection fraction (LVEF), left atrial dimension at end-systole (LADs), velocity-time integral (VTI) and effective regurgitant orifice were significant predictors of TIA/stroke. Age, sodium, urea and albumin levels, platelet count, mean PWD, LVEF, LADs, left ventricular dimension at end systole (LVDs) and VTI were significant predictors of all-cause mortality. The GBM demonstrates the best predictive performance in terms of precision, sensitivity c-statistic and F1-score compared to logistic regression, decision tree, random forest, support vector machine, and artificial neural networks. Conclusion: Gradient boosting model incorporating clinical data from different investigative modalities significantly improves risk prediction performance and identify key indicators for outcome prediction in MR.


Introduction
Mitral regurgitation (MR), defined as a retrograde flow through the mitral valve during ventricular systole, is one of the most prevalent valvular heart disease worldwide with an estimated incidence of 1.7% [1][2][3]. The combination of aortic stenosis and MR accounts for three in four cases of valvular disease with an age-dependent increase in incidence of greater than 6% amongst those above the age of 65 [3,4]. Due to the chronic volume overload under MR, the left ventricular function deteriorates under remodeling, which ultimately leads to pulmonary hypertension and heart failure [5,6]. The mechanisms underlying MR can organic, often degenerative, valvular defects, or functional problems secondary to left ventricular dysfunction [7]. Over the recent years, a variety of prognostic markers has been identified to improve the risk stratification in MR. Clinically, Besides from left ventricular ejection fraction, there is increasing evidence supporting the use of other electrocardiographic markers in outcome prediction, such as left ventricular end systolic diameter, peak mitral inflow velocity, and left atrial size [8][9][10].
Furthermore, P-wave indices, such as P-wave area and P-wave terminal force, were found to reflect left atrial remodeling and MR severity, hence may yield useful prognostic insights [11,12]. In terms of laboratory markers, raised serum brain natriuretic peptide was found to associated with higher risk of cardiac event [13,14]. The increase in inflammatory biomarkers, such as high-sensitivity C-reactive protein and raised neutrophil-to-lymphocyte ratio, were found to be associated with the adverse outcomes of MR, such as heart failure [15,16].
Currently, there is yet a multi-parametric approach in the risk stratification of MR.

Electrocardiographic measurements
We extracted P-wave measurements of patients in sinus rhythm at baseline determined by electrocardiography and calculated the mean P-wave duration (PWD) from the leads V1, II, III and aVF. In addition, lead V1 was used to determine the amplitude of the P-wave for each included patient. Leads V1 to V6 as well as II, III and aVF were used to determine the P-wave morphology. P-wave duration (PWD) ≥ 120 ms in the absence and presence of biphasic P-waves in the inferior leads were used to indicate partial inter-atrial block (IAB) status and advanced IAB status of the patients. P-wave dispersion was determined according to the calculated maximum difference in PWD between the leads V1, II, III and aVF. P-wave terminal force in V1 (PTFV1) was determined as the area subtended by the terminal negative component of a biphasic P-wave in lead V1, and the area was calculated by the multiplication of the duration and waveform depth [23]. Abnormal PTFV1 was defined if it was > 40 ms.mV.

Variable network analysis
One interesting exploration of the correlations between variables are the patterns of variable clustering, which then forms a variable network that can be visualized. In a variable network, each point represents a variable and each path represents a correlation between the two variables that it joins. The width and transparency of the path represent the strength of the correlation (wider and less transparent = stronger correlation). The positioning of variables can be handled by multidimensional scaling of the absolute values of the correlations. Variables that are more highly correlated appear closer together and are joined by stronger paths. Paths can also cardiovascular disease diagnosis, such as in-hospital mortality [27], congestive heart failure [28] etc. RF, first proposed by Breiman [29], is an ensemble approach for building predictive models where a forest is formed using a series of decision trees that act as "weak" learners. As individual trees, they are poor predictors, but can produce a robust prediction in aggregate form. Owing to its simple nature, lack of strong assumptions, and general high prediction performance, RF has been successfully used in many medical applications including prediction of severe asthma exacerbations [30], hospital readmissions in heart failure [31], non-invasive classification of pulmonary hypertension [32], etc. However, it should be noted that GBM is typically used with decision trees of a fixed size as base learners. RF combines results at the end of the process (by averaging or "majority rules") while GBM combines results along the process. RF builds each tree independently while GBM builds one tree at a time. GBM as an additive model works in a forward stage-wise manner, introducing a weak learner to improve the shortcomings of existing weak learners. In addition, we also include black-box-like machine learning approaches of support vector machine (SVM) and artificial neural network (ANN) as baseline models for risk stratification. Statistical analysis was conducted using Stata (Version MP 13.0) and RStudio (Version 1.1.456). Experiments are simulated on a 15-inch MacBook Pro with 2.2 GHz Intel Core i7 Processor and 16 GB RAM.

Baseline characteristics and network visualization of variables
A cohort of patients diagnosed with mitral regurgitation at a single tertiary centre (n=706; 57% male; median age: 66 [57-75] years old) was included in this study. Their clinical and laboratory parameters at baseline are shown in Table 1, stratified by cerebrovascular event . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) outcome (top) or mortality outcome (bottom). The principles of the GBM model are illustrated in Figure 1, where we build an ensemble of shallow and weak successive tree-based learners, then sequentially combine a set of weak learners to deliver improved prediction accuracy. The network of variables constructed by the correlations between variables are shown in Figure 2, yielding patterns of variable clustering and the correlation strength of variable pairs. For instance, a cluster is formed by the highly correlated variable pairs of LVEF and LVESD, LVEDD and LVESD, LVDD and LVESD, and LVEDD and LVDD. Variables that are more highly correlated appear closer together and are joined by stronger paths. Blue and red colours denotes positive and red correlation, respectively. Strong correlations are observed between sex, urea, creatinine, regurgitant volume, and MR severity, LVDs.

Performance comparisons for risk stratification in mitral regurgitation
We compare the performance of using LR, DT, RF, SVM, ANN, GBM to predict i) TIA/stroke and ii) all-cause mortality. All the models were trained with 80% of patients and tested with five-fold cross-validation approach using the remaining 20% patients. The computation results were evaluated using the following metrics of recall, precision, F1-score and area under ROC curve (AUC) ( Table 2). With cross-validation approach, the GBM model significantly produced better risk prediction performance compared with other baseline models in predicting both cerebrovascular events and all-cause mortality. As the important hyperparameters in the GBM model, the number of trees and the tree depth were tuned to be 500 and 5, respectively. The hyper-parameter tuning process was essential to improve the predictive performance of GBM model. For the SVM model, the radial kernel parameters gamma and cost of constraints violation were tuned to 0.01, and 10, respectively. For the ANN model, the number variables for the stroke/TIA prediction model is shown in Figure 4. The deciles of the distribution of the corresponding variable are shown by the log odds and the hash marks at the base of each plot. The partial dependence of each predictor accounts for the average joint effect of the other predictors in the model. Average PWD, LVEF, albumin, and age have a nonmonotonic partial dependence. They decrease over the middle range and increases nearly at the highest values. MR Volume increases sharply before reaching 150 ml, decreases before 350 ml and then increases again at the end. For SBP, the risk fluctuates before reaching 170 mmHg and then abruptly increases at the end, followed by a small decrease. DBP has a roughly monotonically decreasing partial dependence followed by a long plateau still the end. Note that these plots are not necessarily smooth, since no smoothness constraint was imposed on the fitting.
The partial dependence plots generated by GBM between the different variables and allcause mortality are shown in Figure 5. For both urea and LADs, there is a monotonic increase in the risk of mortality as their levels increase. The relationship between LVEF and mortality is complex, with mortality increase as LVEF decreases below 52%. Platelet and albumin show similar roughly monotonically decreasing partial dependence except an increase in the middle range levels. At very low VTI values, VTI decreases sharply with increasing mortality. Finally, for average PWD, there appears to be a U-shaped relationship with al-cause mortality.

Discussion
In this study, we found that an interpretable machine learning method with the consideration of baseline comorbidities, laboratory examinations reflecting inflammatory and nutritional states, electrocardiographic P-wave as well as echocardiographic measurements can . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted January 5, 2021. GBM generally showed better predictive performance in a series of model comparisons compared to other machine learning algorithms such as SVM and ANN [38]. Another study used a territory-wide database to predict stroke outcome and investigated the performance of SVM, ANN and also random survival forest [39], but did not compare it to a GBM approach. Another study using the United Kingdom General Practice database compared RF, LR, GBM and neural networks for first cardiovascular event in patients initially free from cardiovascular diseases [40].
GBM slowly but steadily achieves optimization by growing a series of weak decision trees in a stage-wise fashion which efficiently utilize the strengths of classification/regression trees and boosting. The superior predictive performance of the GBM model against the conventional models highlights the power of machine learning techniques in accounting for the intervariable nonlinear correlations, whilst maintaining interpretability, in outcome prediction. GBM is appropriate for risk stratification in the present study since it can improve the overall predictions by capturing the non-linearity in sparsely populated data. Besides from the advantages of quick . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted January 5, 2021. ; convergence and accuracy improvement, GBM avoid overfitting since it can stop learning as soon as overfitting has been detected, typically by using cross-validation. MR, a classical example of cardiovascular diseases where several factors interplay in its progression, is an ideal model for the application of machine learning. Although no other studies were noted to use machine learning in the risk stratification of MR, it has been applied in other cardiovascular diseases. For example, DT has been successfully used for both diagnosis shown to demonstrate stronger predictive power in identifying predictors for the heterogeneity in response against pharmacotherapy amongst a large cohort of heart failure patients [42]. GBM is typically used with decision trees of a fixed size as base learners. RF combines results at the end of the process, by averaging or using "majority rules", whereas GBM combines results along the process. RF builds each tree independently while GBM builds one tree at a time. GBM as an additive model works in a forward stage-wise manner, introducing a weak learner to improve the shortcomings of existing weak learners.
With increasing evidence supporting the superiority of machine learning in predictive accuracy, it is increasingly applied in multi-parametric risk stratification models. Jamthikar et al. developed a novel model based on RF that incorporates conventional risk factors with predictive features from carotid ultrasound image as an inexpensive and effective tool for cardiovascular/ stroke risk prediction [43]. Similarly, the WATCH-DM risk score was developed to predict the risk of incident heart failure during hospitalization amongst diabetic patients. The RF and DT . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.21249215 doi: medRxiv preprint based multiparametric score, which included clinical, laboratory and electrocardiographic variables, demonstrated better discrimination than the best-performing Cox-based model, which illustrate the potential in the incorporation of machine learning into clinical practice [44].
The prognostic use of markers found by the present study is well supported. Clinical predictors, such as age, diastolic and systolic blood pressure, were well established, and justified by their relation to the pathogenesis and disease progression of MR [45,46]. For laboratory markers, uric acid has been found to correlate with left ventricular remodeling, MR severity and the outcome of heart failure in MR [47]. Serum albumin was reported to be lower in those with persistent MR after acute rheumatic fever [48]. Mean platelet volume, which reflects the platelet production and function, is associated with MR severity and thromboembolism risk [49,50]. In terms of electrocardiographic markers, reduced LVEF, increased left atrial dimensions and left ventricular end diastolic diameter have been found to predict MR severity, which is associated with increased in-hospital cardiac death risk and overall mortality [47,51,52]. P wave indices reflective of left atrial remodeling, such as P wave area and P wave terminal force, was predictive of MR severity [11,12]. In this study, P-wave duration was shown to be one of the most important predictors of incident stroke and was incorporated for risk prediction in our machine learning models.

Study limitations
Several limitations should be noted. Firstly, it is limited by its retrospective nature and single ethnicity of the patients included. Secondly, data on some widely used clinical prognostic markers, such as results on exercise tolerance test, were not available for all patients. Finally, only the impact of medical or surgical treatment was not assessed in this study.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Conclusion
An interpretable machine learning risk stratification model considering multi-modality clinical data can better predict cerebrovascular events and mortality in MR. Experiments demonstrate the advantage of GBM to significantly improve the overall risk stratification performance over baseline models, including LR, DT, RF, SVM, and ANN, in addition to provide good model interpretability about the predictive strengths of predictors. Partial dependences are also observed, which benefit insightful understanding on the effects of these predictive variables upon the adverse outcomes. L a n c e t 3 9 1 , 9 6 0 -

References
L a n c e t 3 6 8 , ) .
[ . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted January 5, 2021. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted January 5, 2021. [ . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted January 5, 2021. ; . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted January 5, 2021. ; Figure 1. Diagram of sequential learning process of GBM prediction model.

Figures
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.21249215 doi: medRxiv preprint CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.21249215 doi: medRxiv preprint ity . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.21249215 doi: medRxiv preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.21249215 doi: medRxiv preprint Figure 5. Partial dependence of eight most important variables in predicting all-cause mortality.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.21249215 doi: medRxiv preprint Abbreviations are the same as those defined in the legend for Figure 1.

Variable
No TIA/stroke TIA/stroke P-value Alive Dead P-value Sex