Predictive models for bariatric surgery risks with imbalanced medical datasets

  • Original Research
Annals of Operations Research

Abstract

Bariatric surgery (BAR) has become a popular treatment for type 2 diabetes mellitus, which is among the most critical obesity-related comorbidities. Patients who undergo bariatric surgery are exposed to postoperative complications. Furthermore, the mid- to long-term complications after bariatric surgery can be deadly, and they increase both healthcare costs and the complexity of managing the safety of these operations. Current studies on BAR complications have mainly used risk scoring to identify patients who are more likely to have complications after surgery. However, these studies do not take into account the imbalanced nature of the data, in which the class of interest (patients who have complications after surgery) is relatively small. We propose the use of imbalanced classification techniques to tackle the imbalanced bariatric surgery data: the synthetic minority oversampling technique (SMOTE), random undersampling, and ensemble learning classification methods including Random Forest, Bagging, and AdaBoost. Moreover, we improve classification performance by using Chi-squared, Information Gain, and Correlation-based feature selection techniques. We study the Premier Healthcare Database with a focus on the most frequent complications, including Diabetes, Angina, Heart Failure, and Stroke. Our results show that the ensemble learning-based classification techniques, using any of the feature selection methods mentioned above, are the best approach for handling the imbalanced nature of the bariatric surgical outcome data. In our evaluation, we find a slight preference for the SMOTE method over the random undersampling method. These results demonstrate the potential of machine-learning tools as clinical decision support in identifying risks/outcomes associated with bariatric surgery, and their effectiveness in reducing surgical complications and improving patient care.
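
To make the two resampling strategies named in the abstract concrete, the following is a minimal, dependency-free sketch of random undersampling and of SMOTE-style oversampling. It is an illustration only and not the authors' implementation: the function names `random_undersample` and `smote`, the neighbourhood size `k=3`, the choice to balance the classes exactly, and the toy data are all assumptions made here.

```python
import random

def random_undersample(X, y, minority_label=1, seed=0):
    """Drop majority rows at random until both classes have the same size."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in kept], [y[i] for i in kept]

def smote(X, y, minority_label=1, k=3, seed=0):
    """Add synthetic minority rows until both classes have the same size.

    Each synthetic row lies on the segment between a minority row and one
    of its k nearest minority neighbours (squared Euclidean distance).
    Assumes at least two minority rows.
    """
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_needed = (len(y) - len(minority)) - len(minority)
    synthetic = []
    for _ in range(max(n_needed, 0)):
        base = rng.choice(minority)
        # Nearest minority neighbours of `base`, excluding `base` itself.
        neighbours = sorted(
            (x for x in minority if x is not base),
            key=lambda x: sum((a - b) ** 2 for a, b in zip(x, base)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return X + synthetic, y + [minority_label] * len(synthetic)

# Hypothetical toy data: 8 patients without a complication (0), 3 with one (1).
X = [[float(i), float(i % 3)] for i in range(8)]
X += [[10.0, 1.0], [11.0, 2.0], [12.0, 1.5]]
y = [0] * 8 + [1] * 3

X_over, y_over = smote(X, y)
X_under, y_under = random_undersample(X, y)
print(y_over.count(0), y_over.count(1))    # 8 8
print(y_under.count(0), y_under.count(1))  # 3 3
```

Oversampling keeps every original record and grows the training set, whereas undersampling discards majority records. In practice either step would be applied to the training split only, before fitting an ensemble classifier such as Random Forest, Bagging, or AdaBoost.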

Acknowledgements

This work was funded in part by Clemson University—Greenville Healthcare System postdoctoral program.

Author information

Corresponding author

Correspondence to Talayeh Razzaghi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 62 KB)

About this article

Cite this article

Razzaghi, T., Safro, I., Ewing, J. et al. Predictive models for bariatric surgery risks with imbalanced medical datasets. Ann Oper Res 280, 1–18 (2019). https://doi.org/10.1007/s10479-019-03156-8

  • DOI: https://doi.org/10.1007/s10479-019-03156-8
