Machine Learning Prediction Models for Chronic Kidney Disease Using National Health Insurance Claim Data in Taiwan

Surya Krishnamurthy; Kapeleshh Ks; Erik Dovgan; Mitja Luštrek; Barbara Gradišek Piletič; Kathiravan Srinivasan; Yu-Chuan Jack Li; Anton Gradišek; Shabbir Syed-Abdul

doi:10.3390/healthcare9050546

Healthcare (Basel). 2021 May 7;9(5):546. doi: 10.3390/healthcare9050546.

Authors

Surya Krishnamurthy¹, Kapeleshh Ks², Erik Dovgan³, Mitja Luštrek³, Barbara Gradišek Piletič⁴, Kathiravan Srinivasan¹, Yu-Chuan Jack Li⁵, Anton Gradišek³, Shabbir Syed-Abdul⁵

Affiliations

¹ School of Information Technology and Engineering, Vellore Institute of Technology (VIT), Vellore 632014, India.
² Department of Biotechnology, Indian Institute of Technology Madras, Chennai 600036, India.
³ Department of Intelligent Systems, Jozef Stefan Institute, Jamova Cesta 39, 1000 Ljubljana, Slovenia.
⁴ Novo Mesto General Hospital, Šmihelska Cesta 1, 8000 Novo Mesto, Slovenia.
⁵ Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei 110, Taiwan.

Abstract

Chronic kidney disease (CKD) places a heavy burden on the healthcare system because of the increasing number of patients, the high risk of progression to end-stage renal disease, and the poor prognosis in terms of morbidity and mortality. The aim of this study is to develop a machine-learning model that uses comorbidity and medication data from Taiwan's National Health Insurance Research Database to predict the occurrence of CKD 6 or 12 months before its onset, and hence to forecast its prevalence in the population. A total of 18,000 people with CKD and 72,000 people without a CKD diagnosis were selected using propensity score matching. Demographic, medication, and comorbidity data from each person's two-year observation period were used to build the predictive models. Among the approaches investigated, the convolutional neural network (CNN) model performed best, with test-set AUROCs of 0.957 and 0.954 for the 6-month and 12-month predictions, respectively. The most prominent predictors in the tree-based models included diabetes mellitus, age, gout, and medications such as sulfonamides and angiotensins. The models proposed in this study could be a useful tool for policymakers in predicting CKD trends in the population. They could also enable close monitoring of people at risk, early detection of CKD, better allocation of resources, and patient-centric management.

Keywords: chronic kidney disease; deep learning; electronic health records; machine learning.
