Abstract
Clinical notes provide a comprehensive overall impression of a patient’s health. However, automatically extracting information from these notes is challenging because of their narrative style. In this context, our goal was to identify clusters of patients based on fourteen obesity-related comorbidities, automatically extracted with the cTAKES tool from the i2b2 Obesity Challenge data. The results were then compared with clusters obtained from expert-annotated data. The sparse K-means algorithm was used in both experiments at two levels: at the first level, three clusters were found, and at the second, new clusters were found by applying the same algorithm to each of the first-level clusters. The results show that three types of clusters could be identified based on the number of comorbidities and the percentage of patients suffering from them. Diabetes, hypercholesterolemia, atherosclerotic cardiovascular disease, congestive heart failure, obstructive sleep apnea, and depression were the diseases with the highest weights contributing to the cluster distribution.
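To make the two-level procedure concrete, the following is a minimal sketch of sparse K-means in the formulation of Witten and Tibshirani (2010), applied first to all patients and then again within each first-level cluster. It is an illustration under stated assumptions, not the authors' implementation: the toy binary matrix, the column meanings, the L1 bound `s = 1.8`, the choice of two sub-clusters per first-level cluster, and all function names are hypothetical, since the abstract does not specify them.

```python
import math
import random


def kmeans(X, k, weights, rng, iters=100):
    """Lloyd's k-means on features scaled by sqrt(weight)."""
    Z = [tuple(x * math.sqrt(w) for x, w in zip(row, weights)) for row in X]
    distinct = sorted(set(Z))  # start from distinct points to avoid duplicate centres
    centers = rng.sample(distinct, min(k, len(distinct)))
    labels = None
    for _ in range(iters):
        new = [min(range(len(centers)),
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(z, centers[c])))
               for z in Z]
        if new == labels:
            break
        labels = new
        for c in range(len(centers)):
            members = [z for z, lab in zip(Z, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def between_cluster_ss(X, labels):
    """Per-feature between-cluster sum of squares on the unweighted data."""
    n, p = len(X), len(X[0])
    overall = [sum(row[j] for row in X) / n for j in range(p)]
    bcss = [0.0] * p
    for c in set(labels):
        members = [row for row, lab in zip(X, labels) if lab == c]
        for j in range(p):
            mean_cj = sum(row[j] for row in members) / len(members)
            bcss[j] += len(members) * (mean_cj - overall[j]) ** 2
    return bcss


def l1_bounded_weights(a, s, steps=60):
    """Soft-threshold a, rescale to unit L2 norm, and search for the threshold
    that brings the L1 norm down to s. Returns None if a is all zeros."""
    def normalised(delta):
        t = [max(x - delta, 0.0) for x in a]
        norm = math.sqrt(sum(v * v for v in t))
        return [v / norm for v in t] if norm > 0 else None

    w = normalised(0.0)
    if w is None or sum(w) <= s:
        return w
    lo, hi = 0.0, max(a)
    for _ in range(steps):  # bisection on the threshold
        mid = (lo + hi) / 2
        w_mid = normalised(mid)
        if w_mid is not None and sum(w_mid) > s:
            lo = mid
        else:
            hi = mid
    return normalised(hi) or normalised(lo)


def sparse_kmeans(X, k, s, rng, outer_iters=10):
    """Alternate between weighted k-means and the feature-weight update."""
    p = len(X[0])
    weights = [1 / math.sqrt(p)] * p
    labels = kmeans(X, k, weights, rng)
    for _ in range(outer_iters):
        updated = l1_bounded_weights(between_cluster_ss(X, labels), s)
        if updated is None:  # no between-cluster variation: keep current weights
            break
        weights = updated
        labels = kmeans(X, k, weights, rng)
    return labels, weights


def two_level(X, s, rng, k_top=3, k_sub=2):
    """First level: k_top clusters. Second level: re-cluster each of them."""
    top_labels, top_weights = sparse_kmeans(X, k_top, s, rng)
    sub = {}
    for c in sorted(set(top_labels)):
        rows = [row for row, lab in zip(X, top_labels) if lab == c]
        sub[c] = sparse_kmeans(rows, k_sub, s, rng)
    return top_labels, top_weights, sub


# Toy data (hypothetical): rows are patients, columns are comorbidity flags.
# The last column is all zeros, so it cannot separate any clusters.
X = [
    [1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0, 0], [0, 1, 1, 1, 0, 0], [0, 0, 1, 1, 1, 0], [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 1, 0], [0, 0, 0, 1, 1, 0],
]

labels, weights, sub = two_level(X, s=1.8, rng=random.Random(0))
print("first-level labels:", labels)
print("feature weights:", [round(w, 3) for w in weights])
```

The feature weights returned at the first level play the role of the disease weights mentioned above: a comorbidity that does not help separate the clusters is driven toward a weight of zero, while the most discriminating ones receive the largest weights. The bound `s` must lie between 1 and the square root of the number of features; smaller values give sparser weight vectors.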
Ethics declarations
Conflicts of Interest
The authors declare they have no conflict of interest.
Ethical Approval
This article does not contain any studies with human participants performed by any of the authors.
This article is part of the Topical Collection on Transactional Processing Systems
Cite this article
Reátegui, R., Ratté, S., Bautista-Valarezo, E. et al. Cluster Analysis of Obesity Disease Based on Comorbidities Extracted from Clinical Notes. J Med Syst 43, 52 (2019). https://doi.org/10.1007/s10916-019-1172-1
DOI: https://doi.org/10.1007/s10916-019-1172-1