ABSTRACT
Purpose Patients with rectal cancer without distant metastases are typically treated with radical surgery. After curative resection, several factors can affect tumor recurrence. This study aimed to analyze factors related to rectal cancer recurrence after curative resection using different machine learning techniques.
Methods Consecutive patients who underwent curative surgery for rectal cancer between 2004 and 2018 at Gil Medical Center were included. Patients with stage IV disease, colon cancer, anal cancer, other recurrent cancer, emergency surgery, or hereditary malignancies were excluded. The SMOTETomek technique was used to compensate for the data imbalance between the recurrence and no-recurrence groups. Four machine learning methods, logistic regression (LR), support vector machine (SVM), random forest (RF), and XGBoost (XGB), were used to identify significant factors. To prevent overfitting and improve model performance, features were selected according to their permutation importance.
Results Of the 3320 patients in the database, 961 remained after exclusion. The median follow-up period was 60.8 months (range: 1.2-192.4). The recurrence rate during follow-up was 13.2% (n=127). After the SMOTETomek method was applied, the recurrence and no-recurrence groups in the training dataset were balanced at 667 samples each. Of 16 candidate variables, the top eight (pT, sex, concurrent chemoradiotherapy, pN, age, postoperative chemotherapy, pTNM, and perineural invasion) were selected based on their permutation importance. The highest area under the curve (AUC) was obtained with SVM (0.831), with a sensitivity, specificity, and accuracy of 0.692, 0.814, and 0.798, respectively. The lowest AUC was obtained with XGBoost (0.804), with a sensitivity, specificity, and accuracy of 0.308, 0.928, and 0.845, respectively. pT had the highest importance in SVM, RF, and XGBoost (0.06, 0.12, and 0.13, respectively), whereas pTNM had the highest importance in LR (0.05).
Conclusions In the current study, SVM showed the best AUC, and pT was the most influential factor in all machine learning methods except LR. Clinicians should be more vigilant during postoperative follow-up of rectal cancer patients with a high pT stage.
1. Introduction
Colorectal cancer is a common malignant disease with the third highest incidence and second highest mortality rates worldwide [1]. Rectal cancer accounts for approximately one-third of all colorectal cancers and has a higher recurrence rate than colon cancer. This is because the lower rectum lacks a serosa, which acts as a barrier against tumor invasion through the muscle layer, and because obtaining a sufficient safety margin is technically more demanding [2]. The 5-year recurrence rate of locally advanced rectal cancer after curative surgery is reported to range from 6% to 27.5% [3]. This high rate is associated with both tumor- and treatment-related factors. Early detection and immediate treatment of recurrence may prevent patients from progressing to a stage with a dismal prognosis. Therefore, clinicians need to identify the factors that increase the risk of rectal cancer recurrence and remain vigilant during the postoperative follow-up period.
In recent years, artificial intelligence has attracted attention in many fields, and its applications in medicine are progressing rapidly. Machine learning algorithms, which form the basis of artificial intelligence, have been developed over the past decades to predict disease risk, prognosis, diagnosis, and even the course of treatment in healthcare settings [4]. Recent studies have also reported the feasibility and utility of artificial intelligence for predicting the recurrence of several malignant diseases, including colorectal, breast, and gastric cancer [5-10]. However, few machine learning studies in colorectal cancer focus exclusively on predicting the recurrence of rectal cancer without including colon cancer. Hence, we aimed to compare four machine learning algorithms in terms of performance and accuracy in identifying significant risk factors for the recurrence of rectal cancer after curative resection.
2. Materials and Methods
2.1. Patient selection and dataset
We used a colorectal cancer surgery database, which was retrospectively collected from the Clinical Research Data Warehouse (CRDW) at Gil Medical Center. The data were accessed for research purposes from Aug 27, 2021. All data were anonymized so that individual participants could not be identified. The database included 3320 consecutive patients who underwent surgery for colorectal cancer between Jan 2004 and Dec 2018. From this database, we identified patients who underwent curative surgery (R0 or R1 resection) for rectal cancer. Patients with stage IV disease, colon cancer, anal cancer, recurrent cancer, emergency surgery, or hereditary malignancies were excluded. After exclusion, 961 patients remained eligible for the study: 834 in the no-recurrence group and 127 in the recurrence group. For model training, the dataset was divided into training and test datasets. Twenty percent of the patients in each of the recurrence and no-recurrence groups were randomly selected as the test dataset (n=193), and the remaining patients formed the training dataset (n=768).
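Selecting 20% of each outcome group separately is a stratified split, which keeps the recurrence rate (about 13%) the same in both subsets. The sketch below shows one way to do this with scikit-learn's train_test_split; it is not the authors' code. Only the class counts (834 and 127) come from the text, the 16-column feature matrix is random filler, and random_state is an arbitrary choice.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: only the class counts come from the study;
# the 16 feature columns are random filler.
y = np.array([0] * 834 + [1] * 127)  # 0 = no recurrence, 1 = recurrence
X = np.random.default_rng(0).normal(size=(len(y), 16))

# stratify=y draws 20% from each outcome group rather than from the pooled data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
print(len(y_train), len(y_test))  # 768 193
print(np.bincount(y_train), np.bincount(y_test))  # per-class counts in each subset
```

In scikit-learn, a fractional test_size rounds the test set up, so 0.2 x 961 = 192.2 becomes 193 test and 768 training patients, matching the sizes reported above.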
2.2. Ethics and consent
This study obtained institutional review board approval from the Ethics Review Committee of Gil Medical Center (approval no. GAIRB2021-316). All procedures were performed in accordance with the ethical standards of Gil Medical Center at Gachon University and with the 1964 Declaration of Helsinki and its later amendments. Because of the retrospective nature of the study, the Ethics Review Committee waived the requirement to obtain informed consent from individual participants.
2.3. Compensating for data imbalance
In this study, we employed the SMOTETomek technique to address the data imbalance between the recurrence and no-recurrence groups. SMOTETomek combines oversampling and undersampling, using SMOTE for oversampling and Tomek links for undersampling. SMOTE uses the k-nearest neighbor (KNN) algorithm to find the nearest minority-class neighbors of each minority sample and creates a new synthetic sample at a random point on the line between a sample and one of its neighbors, using a random weight between 0 and 1. A Tomek link is a pair of samples from different classes that are each other's nearest neighbors; the majority-class sample of each such pair is removed [11]. After SMOTETomek was applied, the training dataset comprised 1334 samples, 667 in the recurrence group and 667 in the no-recurrence group, which addressed the data imbalance.
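The two resampling steps can be sketched in NumPy as follows. This is a minimal illustration of the technique as described above, not the implementation used in the study, which is not named (the imbalanced-learn package provides a SMOTETomek class). The function names are hypothetical, labels are assumed to be 0/1, distances are brute-force Euclidean, and the data are a toy example. Each synthetic sample is x_new = x_i + lam * (x_nn - x_i), where x_nn is one of the k nearest minority neighbors of x_i and lam is uniform on [0, 1).

```python
import numpy as np


def smote(X_min, n_new, k=5, rng=None):
    """Create n_new synthetic minority samples by SMOTE-style interpolation:
    x_new = x_i + lam * (x_nn - x_i), with lam drawn uniformly from [0, 1)."""
    rng = np.random.default_rng(rng)
    # Brute-force squared Euclidean distances among minority samples only.
    d = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    k = min(k, len(X_min) - 1)
    nn = np.argsort(d, axis=1)[:, :k]  # k nearest minority neighbours per sample
    base = rng.integers(0, len(X_min), size=n_new)  # random x_i
    neigh = nn[base, rng.integers(0, k, size=n_new)]  # random x_nn among its k NN
    lam = rng.random((n_new, 1))
    return X_min[base] + lam * (X_min[neigh] - X_min[base])


def tomek_majority_indices(X, y, majority=0):
    """Indices of majority-class samples in a Tomek link, i.e. a pair of
    samples from different classes that are each other's nearest neighbour."""
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = d.argmin(axis=1)
    mutual = nn[nn] == np.arange(len(X))
    linked = mutual & (y != y[nn])
    return np.where(linked & (y == majority))[0]


def smote_tomek(X, y, minority=1, k=5, rng=None):
    """Oversample the minority class up to the majority count with SMOTE, then
    drop the majority-class member of every Tomek link (0/1 labels assumed)."""
    X_min = X[y == minority]
    n_new = int((y != minority).sum() - len(X_min))
    X_syn = smote(X_min, n_new, k=k, rng=rng)
    X_res = np.vstack([X, X_syn])
    y_res = np.concatenate([y, np.full(n_new, minority)])
    drop = tomek_majority_indices(X_res, y_res, majority=1 - minority)
    keep = np.setdiff1d(np.arange(len(y_res)), drop)
    return X_res[keep], y_res[keep]


# Toy imbalanced data (200 vs 30 samples); not the study data.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (200, 3)), rng.normal(2.0, 1.0, (30, 3))])
y = np.array([0] * 200 + [1] * 30)
X_res, y_res = smote_tomek(X, y, rng=0)
print(np.bincount(y_res))  # minority raised to 200; majority is 200 minus any Tomek removals
```

Because synthetic points lie on segments between existing minority samples, SMOTE adds no values outside the observed minority range. Removing only the majority member of each Tomek link is the variant described in the text; other implementations may remove both members.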
2.4. Potential predictors
The database included 43 clinical features, from which surgeons initially selected 16 features considered clinically related to rectal cancer recurrence. The following features were analyzed by the machine learning techniques: patient baseline characteristics (age, sex, American Society of Anesthesiologists score: ASA, body mass index: BMI, and initial carcinoembryonic antigen: CEA), treatment-related factors (concurrent chemoradiotherapy: CCRT, and postoperative chemotherapy), and tumor-related factors (location of rectal cancer, histologic type, pT, pN, pTNM stage, lymphovascular invasion: LVI, perineural invasion: PNI, involvement of the distal resection margin, and number of harvested lymph nodes). Tumor stage was defined according to the American Joint Committee on Cancer (AJCC) 8th edition [12]. All continuous variables were converted to categorical variables according to their clinical significance: age was divided into < 65 and ≥ 65 years; BMI into < 25 and ≥ 25 kg/m2; initial CEA into < 5 and ≥ 5 ng/ml; and the number of harvested lymph nodes into < 12 and ≥ 12. None of the included variables had any missing values.
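The dichotomization rule above (below the cut-off versus at or above it) can be written as a small lookup. The cut-offs come from the text; the dictionary keys and the example patient are hypothetical, since the database's actual column names are not given.

```python
# Cut-offs as listed in the text; the variable keys are hypothetical names.
CUTOFFS = {
    "age_years": 65,  # < 65 vs >= 65 years
    "bmi_kg_m2": 25,  # < 25 vs >= 25 kg/m2
    "initial_cea_ng_ml": 5,  # < 5 vs >= 5 ng/ml
    "harvested_lymph_nodes": 12,  # < 12 vs >= 12
}


def dichotomize(record):
    """Map each continuous value to 0 (below the cut-off) or 1 (at or above it)."""
    return {name: int(record[name] >= cutoff) for name, cutoff in CUTOFFS.items()}


# Hypothetical patient record for illustration.
patient = {"age_years": 64.9, "bmi_kg_m2": 25.0, "initial_cea_ng_ml": 3.2, "harvested_lymph_nodes": 15}
print(dichotomize(patient))
# {'age_years': 0, 'bmi_kg_m2': 1, 'initial_cea_ng_ml': 0, 'harvested_lymph_nodes': 1}
```

Note that a value exactly at the cut-off (BMI 25.0 here) falls into the upper group, consistent with the "≥" boundaries in the text.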
2.5. Machine learning algorithms
Logistic regression (LR) applies the logistic function to a linear combination of the independent variables to estimate the probability of an outcome, with the coefficients usually estimated by maximum likelihood. It is classically and widely used to identify risk factors in medical research [13]. Support vector machine (SVM) maps the input data into a high-dimensional feature space and then determines the optimal decision boundary that maximizes the margin between classes [14]. Random forest (RF) is an ensemble model built on decision trees: it creates multiple decision trees and aggregates their outputs to make a final decision [15]. XGBoost (XGB) is an implementation of gradient boosting that addresses shortcomings of the original Gradient Boosting algorithm and is known for its speed and superior predictive performance compared with other models. Internal cross-validation was performed at each iteration to prevent overfitting [16].
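For readers who want to reproduce the setup, the four model families can be instantiated in scikit-learn as below. This sketch is not the authors' configuration. The hyperparameters are placeholders (the tuned values are not reported), the data are synthetic, and scikit-learn's GradientBoostingClassifier stands in for XGBoost so that the example runs without the separate xgboost package, whose XGBClassifier offers a scikit-learn-compatible interface.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC


def build_models(random_state=0):
    """One instance of each classifier family; hyperparameters are placeholders,
    not the tuned values from the study."""
    return {
        # Logistic function applied to a linear combination of the predictors.
        "LR": LogisticRegression(max_iter=1000),
        # The RBF kernel implicitly maps inputs to a high-dimensional feature space;
        # probability=True enables predict_proba, which ROC/AUC analysis needs.
        "SVM": SVC(kernel="rbf", probability=True, random_state=random_state),
        # Ensemble of decision trees; class probabilities are averaged across trees.
        "RF": RandomForestClassifier(n_estimators=200, random_state=random_state),
        # Trees fitted sequentially, each correcting the current ensemble's errors.
        # Stand-in for XGBoost so that no extra package is needed.
        "GB": GradientBoostingClassifier(random_state=random_state),
    }


# Synthetic, imbalanced toy data with 8 features (the study used 8 selected features).
X, y = make_classification(n_samples=300, n_features=8, weights=[0.85, 0.15], random_state=0)
models = build_models()
probas = {}
for name, model in models.items():
    model.fit(X, y)
    probas[name] = model.predict_proba(X)[:, 1]  # predicted probability of class 1
print({name: round(float(np.mean(p)), 3) for name, p in probas.items()})
```

All four estimators share the same fit/predict_proba interface, so the same evaluation code (ROC curve, AUC, confusion matrix) can be applied to each without change.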
2.6. Feature selection
In this study, we used permutation importance for feature selection. Permutation importance is a widely used method for assessing the contribution of model features and has the advantage of being applicable to any type of model. It quantifies the increase in prediction error when the values of a feature are randomly permuted, which breaks the relationship between that feature and the actual outcome. The larger the increase in model error when a feature is permuted, the more the model depends on that feature [17]. Using permutation importance, we selected 8 key features from the pool of 16: PNI, pTNM, postoperative chemotherapy, age, pN, concurrent chemoradiotherapy (CCRT), sex, and pT (Figure 1).
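The procedure just described fits in a few lines of NumPy. The sketch below is a generic illustration, not the study's code: it shuffles one column at a time, records the mean drop in a chosen score, and keeps the top-k features. The function names, the accuracy score, and the toy model are assumptions; scikit-learn also ships a ready-made version as sklearn.inspection.permutation_importance.

```python
import numpy as np


def permutation_importance_scores(predict, X, y, score, n_repeats=10, rng=None):
    """Mean drop in `score` when each column of X is shuffled independently.

    predict: callable mapping an (n, p) array to predictions.
    score:   callable (y_true, y_pred) -> float, where higher is better.
    """
    rng = np.random.default_rng(rng)
    baseline = score(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling one column breaks its link to the outcome while
            # leaving its marginal distribution unchanged.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - score(y, predict(X_perm)))
        importances[j] = np.mean(drops)
    return importances


def top_k_features(names, importances, k=8):
    """Names of the k features with the largest importance, highest first."""
    order = np.argsort(importances)[::-1][:k]
    return [names[i] for i in order]


# Toy check: a "model" that looks only at column 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)


def predict(A):
    return (A[:, 0] > 0).astype(int)


def accuracy(t, p):
    return float(np.mean(t == p))


imp = permutation_importance_scores(predict, X, y, accuracy, rng=1)
print(imp, top_k_features(["f0", "f1", "f2"], imp, k=2))
```

Shuffling a column the model ignores leaves predictions unchanged, so its importance is exactly zero; only the column the model uses shows a drop. Which dataset (training or test) is used for the permutation affects the result, and the text does not specify this.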
2.7. Optimal combination of hyperparameters
In this study, we used grid search to tune the hyperparameters of each machine learning model. Grid search is an exhaustive technique that determines the optimal combination of hyperparameter values by evaluating every combination in a predefined grid [18]. For each model, every combination of hyperparameter values was cross-validated on the training data, and the combination with the best AUC was selected.
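A grid search of this kind maps directly onto scikit-learn's GridSearchCV with AUC as the selection criterion. The sketch below uses SVM as an example. The parameter grid, the five-fold stratified cross-validation, and the synthetic data are assumptions for illustration, because the study does not report its search space or fold count.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Hypothetical grid: the study's actual search space is not reported.
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.1]}

# Synthetic stand-in for the training data.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

search = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid,
    scoring="roc_auc",  # keep the combination with the best cross-validated AUC
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),  # fold count assumed
)
search.fit(X, y)  # evaluates all 3 x 3 = 9 combinations, 5 folds each
print(search.best_params_, round(search.best_score_, 3))
```

By default GridSearchCV refits the best combination on the full training data, so search.best_estimator_ is ready for evaluation on the held-out test set.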
2.8. Model performance comparison
After feature selection based on permutation importance, the four machine learning algorithms were trained on the selected features of the resampled training dataset (n=1334) and evaluated on the test dataset (n=193). Model performance was compared using sensitivity, specificity, accuracy, and the area under the curve (AUC).
For machine learning, statistical analysis, and performance validation, we used Python (version 3.7.0; Python Software Foundation, Wilmington, DE, USA) and the scikit-learn library (version 0.23.2). IBM SPSS (version 20; IBM Corp., Armonk, NY, USA) was also used for statistical analysis. Statistical significance was set at p < 0.05. A schematic flowchart of the study is shown in Figure 2.
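The comparison indices are simple functions of the confusion matrix and the predicted scores. In the sketch below, the counts TP=18, FN=8, TN=136, and FP=31 are hypothetical. They reproduce the rounded SVM values reported in the Results if the test set contained 26 recurrence and 167 no-recurrence patients (Table 2 gives the actual confusion matrix). The AUC is computed with the Mann-Whitney formulation, one standard equivalent of the area under the ROC curve.

```python
import numpy as np


def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of recurrences that were detected
    specificity = tn / (tn + fp)  # share of non-recurrences correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


def auc_mann_whitney(scores_pos, scores_neg):
    """ROC AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = np.asarray(scores_pos)[:, None]
    neg = np.asarray(scores_neg)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))


# Hypothetical counts consistent with the rounded SVM metrics.
sens, spec, acc = classification_metrics(tp=18, fn=8, tn=136, fp=31)
print(round(sens, 3), round(spec, 3), round(acc, 3))  # 0.692 0.814 0.798
```

The AUC is threshold-free, whereas sensitivity and specificity depend on the probability cut-off chosen to label a case as recurrence. This explains how a model can have a lower AUC but higher specificity and accuracy, as XGB does here.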
3. Results
3.1. Baseline patient demographics
A total of 961 patients were included in the study. The median follow-up period was 60.8 months (range: 1.2-192.4). The recurrence rate during follow-up was 13.2% (n=127). In chi-square tests, age, initial CEA level, pT, LVI, PNI, pN, pTNM, and postoperative chemotherapy were significantly associated with recurrence (p < 0.05). The baseline patient demographics are shown in Table 1.
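As a reminder of what the chi-square test computes, the sketch below evaluates Pearson's statistic for a 2x2 table (a binary factor against recurrence) and its p-value with one degree of freedom. The table counts are hypothetical, no continuity correction is applied, and factors with more than two categories (such as pT) would need an r x c table with more degrees of freedom.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table
        [[a, b],
         [c, d]]
    and its p-value with 1 degree of freedom."""
    n = a + b + c + d
    observed = [a, b, c, d]
    rows = [a + b, a + b, c + d, c + d]  # row total for each cell
    cols = [a + c, b + d, a + c, b + d]  # column total for each cell
    expected = [r * k / n for r, k in zip(rows, cols)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical table: rows = factor present/absent, columns = recurrence yes/no.
chi2, p = pearson_chi2_2x2(10, 20, 20, 10)
print(round(chi2, 3), round(p, 4))
```

Each expected count is (row total x column total) / N, so the statistic measures how far the observed table departs from independence between the factor and recurrence.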
3.2. Model performance outcomes
The highest AUC was obtained for SVM (0.831; 95% confidence interval: 0.770-0.881). Its sensitivity, specificity, and accuracy were 0.692 (95% confidence interval: 0.482-0.857), 0.814 (95% confidence interval: 0.747-0.870), and 0.798 (95% confidence interval: 0.734-0.852), respectively. The lowest AUC was observed for XGB (0.804; 95% confidence interval: 0.741-0.857), and its sensitivity, specificity, and accuracy were 0.308 (95% confidence interval: 0.143-0.518), 0.928 (95% confidence interval: 0.878-0.962), and 0.845 (95% confidence interval: 0.786-0.918), respectively. In terms of AUC, SVM showed the best performance, whereas XGB had the highest specificity and accuracy. The confusion matrices for the model performance comparison and the receiver operating characteristic (ROC) curves are shown in Table 2 and Figure 3, respectively.
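The text does not state how the confidence intervals for sensitivity, specificity, and accuracy were obtained. One common choice for a proportion k/n is the exact Clopper-Pearson interval, which is built from beta-distribution quantiles and is sketched below with SciPy. The counts in the usage line are hypothetical (18 detected recurrences out of 26), not figures taken from the paper.

```python
from scipy.stats import beta


def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) two-sided confidence interval for k successes in n trials."""
    lower = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    upper = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return float(lower), float(upper)


# Hypothetical example: 18 of 26 recurrences detected.
print(clopper_pearson(18, 26))
```

With only about two dozen recurrence cases in the test set, intervals for sensitivity are wide, which is consistent with the caution about the limited test size in the Discussion.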
3.3. Feature importance depending on machine learning methods
Figure 4 shows the feature importance values for each machine learning model based on permutation importance. pT had the highest importance in SVM, RF, and XGBoost (0.06, 0.12, and 0.13, respectively), whereas pTNM had the highest importance in LR (0.05). In SVM, pT and sex shared the highest value (0.06).
4. Discussion
In this study, we used four machine learning algorithms to analyze factors associated with recurrence in a 15-year database of consecutive patients who underwent curative surgery for rectal cancer. Although SVM showed the best performance (AUC=0.831), the other machine learning methods also had comparable AUC values of more than 0.8. pT was the top-ranked feature in SVM, RF, and XGBoost, whereas pTNM showed the highest feature importance in LR. Both variables reflect the pathologic tumor stage, which suggests that pathologic tumor stage is the most influential predictor of rectal cancer recurrence after curative resection. Tumor stage is a well-known and established prognostic factor for most malignant diseases [19]. Especially in locally advanced rectal cancer, oncologists try to downstage the tumor with CCRT because a complete response or downstaging provides better oncologic outcomes [20]. In this regard, several studies have attempted to enhance the efficacy of CCRT with additional preoperative treatments [21-22]. Our findings reconfirm that tumor stage is an important factor in the recurrence of rectal cancer.
In all machine learning methods except LR, pT and sex ranked first and second in feature importance. According to the AJCC 8th edition, T3 is defined as ‘tumor invades through the muscularis propria into pericolorectal tissues,’ and T4a is defined as ‘tumor penetrates to the surface of the visceral peritoneum’ [12]. Because the lower rectum has no visceral peritoneum, T3 tumors can involve the mesorectal fascia. Therefore, T stage may be a more influential factor in rectal cancer than in colon cancer, which may be reflected in our results. Male sex was another high-ranked risk factor in this study. Previous studies have reported that male sex is a significant predictor of recurrence in colorectal cancer [23-25]. According to Demb et al., male sex was associated with significantly higher odds of colorectal cancer recurrence than female sex, and the odds ratio was higher for rectal cancer (OR=2.84) than for distal colon cancer (OR=1.84) [25]. This may be because clear surgical resection is more challenging in male patients with rectal cancer, whose pelvic cavity is narrower and deeper.
All machine learning models performed reliably, with no statistically significant differences in performance (p=0.274). SVM achieved the highest AUC, whereas RF may be a better choice when the balance between sensitivity and specificity is considered. RF achieved the second-best AUC (0.826), and the gap between its sensitivity and specificity was smaller than that of SVM. SVM showed a relatively large discrepancy between sensitivity (0.692) and specificity (0.814), suggesting possible bias in training compared with RF. However, owing to the limited size of the test dataset, we cannot definitively conclude that SVM is more biased.
This study had several limitations. First, this was a single-center retrospective study, and selection bias could not be excluded. Second, the analysis was performed using only a limited number of factors. Other clinically significant factors, such as smoking status, tumor regression grade after CCRT, mesorectal fascia involvement, and molecular biomarker status (e.g., RAS or microsatellite instability), were not included. We attempted to analyze as many factors as possible; however, many factors had more than 20% missing data. These factors were excluded to improve the quality of the database, so the final dataset had no missing values. Third, the recurrence and no-recurrence groups were imbalanced. We used the SMOTETomek technique to address this imbalance; however, it cannot fully resolve the underlying problem, and the amount of recurrence data available for testing was insufficient for adequate validation. Further research involving cross-validation is required to address these issues. Future studies should collect additional data from patients with recurrence, and the generalizability of the model should be assessed through the collection and validation of multicenter data. Finally, we did not distinguish between the p and yp stages (i.e., pathologic findings after preoperative systemic chemotherapy or radiation given as primary treatment before surgery) in the pathologic tumor stage. Because the tumor stage can decrease after CCRT, the pathologic stage may underestimate the initial tumor stage in patients treated with CCRT. Despite these limitations, our study has the strength of comparing risk factors for recurrence, specifically in rectal cancer, using various machine learning methods.
5. Conclusions
In this study, we analyzed and compared the importance of risk factors for rectal cancer recurrence using four different machine learning methods. All four methods predicted rectal cancer recurrence with AUC values above 0.8, and SVM showed the best AUC. pT was the most influential factor in all machine learning methods except LR. Clinicians should be more vigilant during postoperative follow-up of patients with a high pT stage. Furthermore, efforts to enhance tumor response are needed to reduce the risk of recurrence in rectal cancer.
Data Availability
All relevant data are within the manuscript and its Supporting Information files.
Funding
Not applicable
Conflicts of interest
The authors declare no conflicts of interest related to this article.
Author contributions
Conceptualization: Kwang-Gi Kim, Jeong-Heum Baek
Data curation: Youngbae Jeon, Kug-Hyun Nam, Tae-Sik Hwang
Formal analysis: Youngbae Jeon, Young-Jae Kim, Jisoo Jeon
Investigation: Youngbae Jeon, Young-Jae Kim, Jisoo Jeon
Supervision: Kwang-Gi Kim, Jeong-Heum Baek
Writing – original draft: Youngbae Jeon, Young-Jae Kim
Writing – review & editing: Youngbae Jeon, Young-Jae Kim
Acknowledgements
We would like to thank Editage (www.editage.co.kr) for the English language editing.
Footnotes
E-mail address and ORCID Youngbae Jeon: jybcolor{at}gilhospital.com, Young-Jae Kim: youngjae{at}gachon.ac.kr, Jisoo Jeon: jeon1923{at}naver.com, Kug-Hyun Nam: kbaboh{at}naver.com, Tae-Sik Hwang: ts329{at}naver.com