Comparing the performance of risk stratification scores in Brugada syndrome: a multi-centre study

Introduction: The management of Brugada Syndrome (BrS) patients at intermediate risk of arrhythmic events remains controversial. The present study evaluated the predictive performance of different risk scores in an Asian BrS population and its intermediate risk subgroup. Methods: This is a retrospective territory-wide cohort study of consecutive patients diagnosed with BrS from January 1st, 1997 to June 20th, 2020 in Hong Kong. The primary outcome is sustained ventricular tachyarrhythmias. A novel predictive score was developed. Machine learning-based nearest neighbor and Gaussian Naive Bayes models were also developed. The area under the receiver operating characteristic (ROC) curve (AUC) was compared between the different scores. Results: The cohort consists of 548 consecutive BrS patients (7% female, age at diagnosis: 50±16 years old, follow-up duration: 84±55 months). For risk stratification in the whole BrS cohort, the score developed by Sieira et al. showed the best performance with an AUC of 0.805, followed by the Shanghai score (0.698), and the scores by Okamura et al. (0.667), Delise et al. (0.661), Letsas et al. (0.656) and Honarbakhsh et al. (0.592). A novel risk score was developed based on variables and weighting from the best performing score (the Sieira score), with the inclusion of additional variables significant on univariable Cox regression (arrhythmias other than ventricular tachyarrhythmias, early repolarization pattern in the peripheral leads, aVR sign, S-wave in lead I and QTc ≥ 436 ms). This score has the highest AUC of 0.855 (95% CI: 0.808-0.901). The Gaussian Naive Bayes model demonstrated the best performance (AUC: 0.97) compared to logistic regression and nearest neighbor models. Conclusion: The inclusion of investigation results and more complex models are needed to improve the predictive performance of risk scores in the intermediate risk BrS population.


Introduction
Brugada Syndrome (BrS) is an ion channelopathy with a characteristic electrocardiographic (ECG) pattern (BrP) of ST-elevation followed by either a coved (type 1) or saddle-shaped (type 2) slope. This disease predisposes affected patients to an increased risk of sudden cardiac death (SCD) due to sustained ventricular tachycardia/fibrillation (VT/VF) in the absence of overt structural abnormalities. Therefore, the stratification of VT/VF/SCD risk is critical to the management of BrS. Although BrS has a higher prevalence in Asia, a large proportion of existing research was based on registries that include mostly Caucasian subjects. (1-4) As a result, the VT/VF/SCD risk stratification tools derived were also largely based on the Western population. (5,6) Intermediate risk refers to the presence of risk factors suggestive of both high and low risk, such as an asymptomatic patient presenting with spontaneous type 1 BrP. (7) Whilst it is clear that high risk patients should be referred for implantable cardioverter-defibrillator implantation, and low risk patients should be monitored regularly, the management of intermediate risk patients remains controversial. (8) Recently, Probst et al. evaluated the predictive value of the Shanghai and Sieira scores in intermediate risk BrS patients in the largest cohort of BrS patients to date and concluded that risk scores could not stratify the arrhythmic risk in this subpopulation. (7) However, other existing risk scores were not evaluated, and the Shanghai score was not designed to be a prognostic tool. In addition, the Asian population was not assessed despite the greater prevalence of BrS in Asia.
Therefore, the present study aims to evaluate the predictive performance of different risk scores in the overall Asian BrS population and its intermediate risk subpopulation, and thus to examine the applicability of simple risk scores in a clinical setting.

Patient Cohort and Data Collection
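This preprint (which was not certified by peer review) was posted on medRxiv on November 10, 2021 (https://doi.org/10.1101/2021.11.09.21266130). The copyright holder is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.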
The primary outcome is sustained VT/VF occurring during follow-up. This was ascertained from the case notes recorded by physicians during inpatient or outpatient encounters, and/or from implantable cardioverter-defibrillator documentation where available. Continuous variables were reported as mean (standard deviation), whilst discrete variables were reported as total count (percentage). To identify predictors of the primary outcome, univariable Cox regression was performed. Findings from the drug challenge test and electrophysiological study (EPS) were not included in the model since they were not universally performed. Significant univariable predictors were used as inputs for multivariable Cox regression.
Significant predictors from the univariable and multivariable models were used to develop separate predictive scores. For continuous variables, cut-off values were identified using the Liu method. The hazard ratio (HR) and 95% confidence interval (CI) were reported. The weighting of each parameter was adopted from the HR calculated by Cox regression.
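The cut-off step can be illustrated with a minimal sketch, assuming the Liu method is applied in its usual form, that is, choosing the threshold that maximises the product of sensitivity and specificity. The sketch is written in Python rather than the R environment reported for the analysis, it ignores follow-up time, and the function name `liu_cutoff` and the QTc values are hypothetical and for illustration only.

```python
def liu_cutoff(values, events):
    """Return (cut-off, product) maximising sensitivity * specificity.

    A patient is classed as test-positive when value >= cut-off.
    events: 1 = outcome occurred, 0 = outcome did not occur.
    """
    n_pos = sum(events)
    n_neg = len(events) - n_pos
    best_cut, best_product = None, -1.0
    for cut in sorted(set(values)):
        # True positives: events at or above the candidate cut-off.
        tp = sum(1 for v, e in zip(values, events) if e == 1 and v >= cut)
        # True negatives: non-events below the candidate cut-off.
        tn = sum(1 for v, e in zip(values, events) if e == 0 and v < cut)
        product = (tp / n_pos) * (tn / n_neg)
        if product > best_product:  # ties keep the lowest cut-off
            best_cut, best_product = cut, product
    return best_cut, best_product


# Hypothetical QTc values (ms) and outcomes; not data from the cohort.
qtc = [400, 410, 420, 430, 440, 450, 460, 470]
vtvf = [0, 0, 0, 1, 0, 1, 1, 1]
cut, product = liu_cutoff(qtc, vtvf)
print(cut, product)
```

A variable dichotomised in this way can then enter the Cox model as a binary predictor, with its HR used as the weighting.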
To evaluate the predictive power of the devised scores against existing scores, the area under the receiver operating characteristic curve (AUC) and its 95% CI were generated. The existing scores evaluated are summarized in Supplementary Table 1. (6,10-14) In order to evaluate the predictive value of the scores in patients of intermediate risk, the calculation of AUC was repeated after the removal of patients scoring in the first and fourth quartiles for each score. Random survival forest (RSF) was applied to identify the importance ranking amongst the significant univariable predictors. The importance ranking is measured by the minimal depth and variable importance. A smaller minimal depth means the variable is more important since the variable splits the data further away from the terminal node, whilst a higher variable importance refers to a greater change in prediction error when the variable of interest is absent. Statistical significance was defined as P-value <0.05.
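The AUC comparison and the intermediate risk restriction can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code: the AUC is computed with the rank-based (Mann-Whitney) formulation, patients whose score lies exactly on a quartile boundary are kept, and the scores and outcomes are hypothetical.

```python
import statistics


def auc(scores, events):
    """Probability that a patient with the event scores higher than one without.

    Ties between an event and a non-event patient count as 0.5.
    """
    pos = [s for s, e in zip(scores, events) if e == 1]
    neg = [s for s, e in zip(scores, events) if e == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def middle_quartiles(scores, events):
    """Keep patients whose score lies between the first and third quartile.

    Assumption: boundary values are retained; the handling of ties at the
    quartile boundaries is not specified in the text.
    """
    q1, _, q3 = statistics.quantiles(scores, n=4)
    kept = [(s, e) for s, e in zip(scores, events) if q1 <= s <= q3]
    return [s for s, _ in kept], [e for _, e in kept]


# Hypothetical risk scores and outcomes (1 = sustained VT/VF).
scores = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
events = [0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1]

auc_all = auc(scores, events)
mid_scores, mid_events = middle_quartiles(scores, events)
auc_mid = auc(mid_scores, mid_events)
print(round(auc_all, 3), round(auc_mid, 3))
```

In this toy example the AUC is lower once the extreme quartiles are removed, because much of the discrimination of a score comes from separating clearly high-risk from clearly low-risk patients.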
In addition, machine learning models, namely a nearest neighbor model and a Gaussian Naïve Bayes model, were developed to predict sustained VT/VF during follow-up, using the significant variables from univariable logistic regression as inputs. (15,16) A multivariable logistic regression model was used as a benchmark for model comparison. Comparative analyses were conducted according to the area under the ROC curve (AUC), precision, recall, and F1 score. The ROC curve, precision-recall curve, and lift curve were presented. Lift is a measure of the effectiveness of a predictive model, calculated as the ratio between the results obtained with and without the predictive model. A lift curve is a way of visualizing the performance of a classification model. The greater the area between the lift curve and the baseline, the better the model. All analyses were performed using RStudio (Version: 1.3.1073).
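The comparison metrics named above can be made concrete with a short sketch. It assumes the standard definitions: precision and recall from a thresholded prediction, F1 as their harmonic mean, and lift as the event rate amongst the top-ranked fraction of patients divided by the overall event rate. The function names, labels and probabilities are hypothetical.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels (1 = event)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def lift_at(y_true, y_prob, fraction):
    """Event rate in the top `fraction` of predicted risk / overall event rate."""
    ranked = [t for _, t in sorted(zip(y_prob, y_true),
                                   key=lambda pair: pair[0], reverse=True)]
    k = max(1, round(fraction * len(ranked)))
    top_rate = sum(ranked[:k]) / k
    base_rate = sum(y_true) / len(y_true)
    return top_rate / base_rate


# Hypothetical outcomes and predicted probabilities for ten patients.
y_true = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
y_prob = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.1, 0.05, 0.35]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
lift_top20 = lift_at(y_true, y_prob, 0.2)
print(precision, recall, round(f1, 3), round(lift_top20, 3))
```

Evaluating `lift_at` over a range of fractions gives the points of a lift curve; a model with no discrimination has a lift close to 1 at every fraction.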

Baseline Characteristics and Predictors
The present cohort consists of 548 patients (7.3% females, age at diagnosis: 49.9±16.3 years old, follow-up duration: 84±55 months) (Table 1). In total, 66 patients experienced at least one episode of sustained VT/VF during follow-up. Only 9.7% of the cohort underwent genetic testing, and therefore these results were not included as predictors. Univariable Cox regression identified the following predictors of the primary outcome: 1) evolution of BrP
The importance ranking of the significant univariable predictors is displayed in Supplementary Table 2. Interestingly, the three significant multivariable predictors were not the three most important variables. Whilst significant S wave in lead I (minimal depth: 1.38) is ranked the most important, the occurrence of syncope (minimal depth: 2.25) and other arrhythmias (minimal depth: 2.23) were ranked lower on the list. The strength of the pairwise interactions amongst these variables is shown in Figure 1. QTc interval remains the most influential factor.
Table 3). A novel risk score was developed based on the following steps. Firstly, the existing score with the highest AUC (the Sieira score) was selected, and its original weighting was retained. Additional variables that were significant on univariable Cox regression were then included (arrhythmias other than ventricular tachyarrhythmias, ER pattern in the peripheral leads, aVR sign, S-wave in lead I, QTc ≥ 436 ms) (Table 4). This score has the highest AUC of 0.855 (95% CI: 0.808-0.901).
An intermediate risk subgroup was identified by ranking the patients into quartiles based on our score and including quartiles 2 and 3. All of the scores applied to this subgroup showed significantly lower AUCs. The newly developed score showed the best performance with an AUC of 0.704, followed by the scores by Sieira et al., Okamura et al. and Delise et al., the Shanghai score, and the scores by Letsas et al. and Honarbakhsh et al. (Table 3).

The Gaussian Naïve Bayes model and nearest neighbour model
Significant predictors identified on univariable logistic regression were used as input variables for the machine learning models, the nearest neighbor model and Gaussian Naïve Bayes model.
Their ability to predict sustained VT/VF on follow-up was determined using a five-fold cross-validation approach. The multivariable logistic regression model was used as a benchmark for comparative analyses (Figure 2). The ROC curve, precision-recall curve, and lift curve are presented accordingly. The lift curve measures the effectiveness of a predictive model, calculated as the ratio between the results obtained with and without the predictive model. A greater area between the lift curve of a model and that of the baseline model using multivariable logistic regression reflects a better model performance. The Gaussian Naïve Bayes model demonstrated the best performance, with an AUC of 0.97, an F1 score of 0.87 and the greatest area from the lift curve, compared to the logistic regression and nearest neighbor models (p for trend <0.001).
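The modelling approach can be illustrated with a minimal Gaussian Naïve Bayes classifier evaluated by five-fold cross-validation. This is a sketch under stated assumptions and not the authors' implementation: it is written in plain Python rather than R, it reports accuracy instead of AUC for brevity, it uses a single hypothetical continuous predictor, and the small constant added to each variance is an assumption made only to avoid division by zero.

```python
import math
import random


def fit_gnb(X, y):
    """Estimate the prior, feature means and feature variances for each class."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        # 1e-9 is an assumed smoothing term that keeps variances positive.
        variances = [sum((v - m) ** 2 for v in col) / len(col) + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (len(rows) / len(y), means, variances)
    return model


def predict_gnb(model, x):
    """Return the class with the highest log-posterior for one patient."""
    def log_posterior(label):
        prior, means, variances = model[label]
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances))
        return math.log(prior) + log_likelihood
    return max(model, key=log_posterior)


def cross_validated_accuracy(X, y, k=5, seed=0):
    """Shuffle once, split into k folds, train on k-1 folds, test on the rest."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        model = fit_gnb([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict_gnb(model, X[i]) == y[i] for i in fold)
    return correct / len(X)


# Hypothetical, well-separated QTc values (ms); 1 = sustained VT/VF.
X = [[400 + 2 * i] for i in range(10)] + [[450 + 2 * i] for i in range(10)]
y = [0] * 10 + [1] * 10
accuracy = cross_validated_accuracy(X, y)
print(accuracy)
```

The model treats predictors as conditionally independent given the outcome, which keeps fitting simple but is a strong assumption when predictors interact.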

Discussion
To the best of our knowledge, this is the first Asian territory-wide BrS cohort study that directly compared all of the published risk scores. The major findings of the present study include: 1) simple multiparametric scores based on the combination of clinical and baseline electrocardiographic parameters can be used for risk stratification in BrS; 2) interactions between predictors can influence the predictive performance of the score; 3) spontaneous type 1 BrP, family history of SCD, syncope and inducible EPS can be useful for the risk stratification of intermediate risk patients.
Over the past decade, there have been increasing efforts to develop simple-to-use predictive scores for risk stratification in BrS. However, many either include findings from investigations that are only indicated for certain patient groups, such as EPS, or include only clinical or crude ECG parameters. (6,10-14) As a result, the scores are either difficult to apply universally amongst all BrS patients, or have insufficient predictive power. Also, it should be noted that the Shanghai score was initially developed for a diagnostic, instead of a prognostic, purpose. (12) The evidence supporting its use in risk stratification was based on demonstrations of differences in the arrhythmic events between patients with ≤ 3, 3.5, 4-5 and ≥ 5.5 points. (17) By contrast, Probst et al. found that whilst the Shanghai score had an AUC of 0.73 and was able to distinguish between extreme risk groups, it was unable to further stratify patients at intermediate risk. (7)
The improved predictive performance of the novel risk score demonstrates that the inclusion of comprehensive clinical and baseline ECG indices is sufficient as an initial risk stratification tool that can be applied at the time of diagnosis. The prognostic value of EPS remains controversial. The current evidence on its predictive power is mixed, varying between different patient subgroups. (10,26,27) A recent meta-analysis suggests that its risk stratification value is operator- and protocol-dependent. (28) Therefore, EPS should be applied on an individual basis with particular consideration of patient factors, using standardized protocols with predefined locations for the placement of stimulation electrodes and predefined pacing protocols. It may be useful for particular subgroups of patients; for example, prior studies have reported that in the case of syncope of unknown etiology, the presence of inducible EPS may reflect a higher SCD risk. (29,30)

Limitations
Several limitations of the present study should be noted. Firstly, due to the limited availability of certain variables needed for particular risk scores, these scores could not be fully applied to the present cohort. For example, nocturnal agonal respiration and family history in second-degree relatives were not recorded in case notes, and thus only a limited version of the Shanghai score was calculated. Secondly, the etiology of syncope was not documented, thus syncope of non-arrhythmic origin may be included. Thirdly, given the low rates of EPS and genetic test performance, the predictive value of findings from these two tests was not assessed. Finally, our score does not incorporate latent interactions between the risk variables, which have previously been shown to be important for risk stratification. (31,32) Future studies integrating machine learning techniques into the predictive scores may improve the accuracy of risk stratification through the recognition of latent interactions between predictors.

Conclusion
In conclusion, simple risk scores consisting of clinical and baseline electrocardiographic indices are useful in the risk stratification of the overall BrS population. However, the inclusion of investigation results and more complex models are needed to improve the predictive performance of risk scores against the intermediate risk BrS population. The incorporation of machine learning and genomics may be a direction for future research to improve the stratification of SCD risk amongst BrS patients.
22. Rowe MK, Roberts JD. The evolution of gene-guided management of inherited arrhythmia syndromes: Peering beyond monogenic paradigms towards comprehensive genomic risk scores. J Cardiovasc Electrophysiol 2020;31:2998-3008.
23. Bezzina CR, Barc J, Mizusawa Y et al. Common variants at SCN5A-SCN10A and HEY2 are associated with Brugada syndrome, a rare disease with high risk of sudden cardiac death. Nat Genet 2013;

Figure 1. Pairwise interaction between significant univariable predictors.
Figure 2. Performance comparisons of machine learning models to predict sustained VT/VF during follow-up with significant univariable predictors as input variables.