ABSTRACT
Objectives Several risk factors have been identified for severe clinical outcomes of COVID-19 caused by SARS-CoV-2. Some can be found in the structured data of patients' Electronic Health Records. Others are recorded only as unstructured free text and therefore cannot easily be detected automatically. We propose an automated real-time detection of risk factors using a combination of data mining and Natural Language Processing (NLP).
Material and methods Patients were categorized as negative or positive for SARS-CoV-2 and, if positive, according to disease severity (severe or non-severe COVID-19). Comorbidities were identified in the unstructured free text using NLP. Further risk factors were taken from the structured data.
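The abstract does not describe how the NLP step works. As an illustration only, the following is a minimal sketch of dictionary-based comorbidity detection in free-text notes, assuming a hand-made keyword list (`COMORBIDITY_TERMS`, hypothetical and not the study's term lists) and a simplified NegEx-style rule that discards a mention when a negation cue precedes it within the same sentence.

```python
import re

# Hypothetical keyword dictionary for illustration; NOT the study's actual term lists.
COMORBIDITY_TERMS = {
    "diabetes": [r"diabetes", r"\bT2DM\b"],
    "copd": [r"\bCOPD\b", r"chronic obstructive pulmonary disease"],
    "heart_failure": [r"heart failure", r"\bCHF\b"],
    "dementia": [r"dementia", r"alzheimer"],
    "cancer": [r"cancer", r"carcinoma", r"lymphoma"],
}

# Hypothetical negation cues for a simplified NegEx-style rule.
NEGATION_CUES = re.compile(r"\b(no|denies|without|negative for)\b", re.IGNORECASE)


def detect_comorbidities(note: str) -> set:
    """Return the comorbidity labels mentioned (and not negated) in a free-text note."""
    found = set()
    # Split into rough sentences so that a negation cue only affects its own sentence.
    for sentence in re.split(r"[.;\n]", note):
        for label, patterns in COMORBIDITY_TERMS.items():
            for pattern in patterns:
                match = re.search(pattern, sentence, re.IGNORECASE)
                if match is None:
                    continue
                # Skip the mention if a negation cue appears before it in the sentence.
                if NEGATION_CUES.search(sentence[: match.start()]):
                    continue
                found.add(label)
                break  # one positive pattern is enough for this label
    return found


print(detect_comorbidities("Known type 2 diabetes on metformin. No history of COPD. Mild dementia."))
```

Real clinical notes would need language-specific vocabularies, abbreviation handling and negations that follow the mention (e.g. "diabetes excluded"), which this rule does not cover.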
Results 6250 patients were analysed (5664 negative and 586 positive; of the positive patients, 461 non-severe and 125 severe). Using NLP, comorbidities, i.e. cardiovascular and pulmonary conditions, diabetes, dementia and cancer, were automatically detected (error rate ≤2%). Old age, male sex, higher BMI, arterial hypertension, chronic heart failure, coronary heart disease, COPD, diabetes, insulin-only treatment of diabetic patients, and reduced kidney and liver function were risk factors for severe COVID-19. Interestingly, the proportion of diabetic patients using metformin but not insulin was significantly higher in the non-severe COVID-19 cohort (p<0.05).
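The abstract reports a significant difference in the proportion of metformin users between cohorts but does not name the statistical test. As an assumption, the sketch below uses a Pearson chi-square test without continuity correction on a 2×2 table (cohort × treatment); for small cell counts, Fisher's exact test would usually be preferred. The function name `chi_square_2x2` is hypothetical, and no study counts are used.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple:
    """Pearson chi-square test (no continuity correction) for the 2x2 table

        [[a, b],   # e.g. non-severe cohort: metformin yes / no
         [c, d]]   # e.g. severe cohort:     metformin yes / no

    Returns (chi-square statistic, two-sided p-value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    # Closed-form Pearson statistic for a 2x2 contingency table.
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, the chi-square survival function equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


print(chi_square_2x2(30, 20, 20, 30))  # illustrative counts only, not study data
```

With these illustrative counts the statistic is 4.0 and the p-value is about 0.046.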
Discussion and conclusion Our findings were in line with previously reported risk factors for severe COVID-19. NLP in combination with other data mining approaches appears to be a suitable tool for the automated real-time detection of risk factors, which can provide time-saving support for risk assessment and triage, especially in patients with long medical histories and multiple comorbidities.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
None
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The study was approved by the Cantonal Ethics Committee of Bern (Project-ID 2020-00973). Participants had either agreed to a general research consent or, if no general research consent status was registered (neither agreement nor rejection), the ethics committee granted a waiver of consent.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.