medRxiv

Evaluating biomedical feature fusion on machine learning’s predictability and interpretability of COVID-19 severity types

Haleigh West-Page, Kevin McGoff, Harrison Latimer, Isaac Olufadewa, Shi Chen
doi: https://doi.org/10.1101/2024.04.04.24305295
Haleigh West-Page
¹Department of Mathematics and Statistics, University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC 28223, USA
B.S.
For correspondence: hwest10{at}charlotte.edu
Kevin McGoff
¹Department of Mathematics and Statistics, University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC 28223, USA
Ph.D.
Harrison Latimer
¹Department of Mathematics and Statistics, University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC 28223, USA
B.S.
Isaac Olufadewa
²Department of Epidemiology and Community Health, University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC 28223, USA
M.B.B.S.
Shi Chen
²Department of Epidemiology and Community Health, University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC 28223, USA
Ph.D.

Abstract

Background Accurately differentiating severe from non-severe COVID-19 clinical types is critical for healthcare systems to optimize workflow. Current techniques lack the ability to accurately predict COVID-19 patients’ clinical types, especially as SARS-CoV-2 continues to mutate. We explore the predictability and interpretability of multiple state-of-the-art machine learning (ML) techniques trained and tested on different biomedical data types and SARS-CoV-2 variants.

Methods Comprehensive patient-level data were collected from 362 patients (214 severe, 148 non-severe) infected with the original SARS-CoV-2 variant in 2020 and from 1000 patients (500 severe, 500 non-severe) infected with the Omicron variant in 2022-2023. The data included 26 biochemical features from blood testing and 26 clinical features describing patients’ clinical characteristics and medical history. Several ML techniques, including penalized logistic regression (LR), random forest (RF), k-nearest neighbors (kNN), and support vector machines (SVM), were applied to build predictive models from each data modality separately and from both modalities combined, for each variant. Fifty randomized train-test splits were performed per scenario, and performance results were recorded.
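
The evaluation protocol described above can be sketched as follows. This is a minimal, stdlib-only Python illustration, not the authors' implementation: it fuses the two modalities by simple concatenation (early fusion) and draws 50 randomized train-test splits of patient indices. Stratifying each split by severity class, the 30% test fraction, the fixed seed, and the helper names `fuse` and `stratified_splits` are all assumptions made for illustration; the abstract states only that fifty randomized splits were used per scenario.

```python
import random
from typing import Dict, List, Sequence, Tuple


def fuse(biochemical: Sequence[Sequence[float]],
         clinical: Sequence[Sequence[float]]) -> List[List[float]]:
    """Early (feature-level) fusion: concatenate each patient's two feature vectors.

    Both inputs must list the same patients in the same order.
    """
    if len(biochemical) != len(clinical):
        raise ValueError("modalities must cover the same patients")
    return [list(b) + list(c) for b, c in zip(biochemical, clinical)]


def stratified_splits(labels: Sequence[int], n_splits: int = 50,
                      test_frac: float = 0.3, seed: int = 0
                      ) -> List[Tuple[List[int], List[int]]]:
    """Draw n_splits random (train, test) index partitions.

    Each class (e.g. 1 = severe, 0 = non-severe) is shuffled and split separately,
    so every test set keeps roughly the cohort's class ratio (an assumed design choice).
    """
    rng = random.Random(seed)  # fixed seed -> reproducible splits
    by_class: Dict[int, List[int]] = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    splits = []
    for _ in range(n_splits):
        train: List[int] = []
        test: List[int] = []
        for indices in by_class.values():
            shuffled = list(indices)
            rng.shuffle(shuffled)
            n_test = round(len(shuffled) * test_frac)
            test.extend(shuffled[:n_test])
            train.extend(shuffled[n_test:])
        splits.append((sorted(train), sorted(test)))
    return splits
```

With 26 biochemical and 26 clinical features, each fused vector has 52 entries. Under the assumed 30% test fraction, every test set for the original-variant cohort (214 severe, 148 non-severe) would hold 64 severe and 44 non-severe patients. In this sketch, each classifier would be fit on the training rows of the chosen modality (biochemical, clinical, or fused) and scored on the test rows of the same split, so all models are compared on identical partitions.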

Findings The fused (hybrid) modality yielded the highest mean area under the curve (AUC), 0·915, while the biochemical modality alone and the clinical modality alone had AUCs of 0·862 and 0·818, respectively. All ML models performed similarly across testing scenarios and were robust when cross-tested between original and Omicron variant patient data. Our models ranked elevated d-dimer (biochemical), elevated high-sensitivity troponin I (biochemical), and age greater than 55 years (clinical) as the most predictive features of severe COVID-19.
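
The AUC values above are averages over the randomized splits. As a hedged illustration (not the authors' code), and assuming "AUC" refers to the area under the ROC curve, the AUC can be computed without drawing the curve: it equals the probability that a randomly chosen severe patient receives a higher model score than a randomly chosen non-severe patient, with ties counted as one half (the Mann-Whitney formulation). The per-split values are then averaged. The function names `auc` and `mean_auc` are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney statistic.

    labels: 1 = severe, 0 = non-severe; scores: higher means "more likely severe".
    Returns the fraction of (severe, non-severe) pairs ranked correctly; ties count 1/2.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one severe and one non-severe case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(per_split: Sequence[Tuple[Sequence[int], Sequence[float]]]) -> float:
    """Average the test-set AUC over repeated train-test splits."""
    return mean(auc(labels, scores) for labels, scores in per_split)
```

The pairwise loop is quadratic in test-set size, which is negligible for test sets of a few hundred patients; a rank-based formula scales better for larger cohorts.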

Interpretation ML is a powerful tool for predicting severe COVID-19 from comprehensive individual patient-level data. Further, ML models trained on the biochemical and clinical modalities together show enhanced predictive power. The improved performance of these ML models when trained and cross-tested with Omicron variant data supports the robustness of ML as a tool for clinical decision support.

Funding U.S. Centers for Disease Control and Prevention (CDC)

Evidence before this study We searched the PubMed database for publications investigating the use of machine learning (ML) to predict severe COVID-19 types from patient-level data. We identified studies published from the beginning of the COVID-19 pandemic in 2020 up to February 2023 using keywords such as “severe COVID-19”, “SARS-CoV-2”, “multimodal”, “machine learning”, “prediction”, and “data-driven.” The resulting studies were limited in scope overall, as they focused on single data modalities or uninterpretable models. Nearly all of the studies we found used only patient data obtained during the initial outbreak of COVID-19 and lacked data from later variants, such as Omicron. These limitations prevent identification of the data modalities and ML techniques most suitable for predicting severe types, as well as assessment of the generalizability of these models across multiple variants.

Added value of this study We built end-to-end machine learning pipelines spanning a variety of ML techniques, data modalities (biochemical, clinical, and fused), and SARS-CoV-2 variants (original and Omicron) to compare the predictive power of each model type. Our study shows that these models have strong predictive power for severe COVID-19 when trained on multiple modalities, as well as robustness across different variants of the virus, with two models achieving an AUC > 0·90. We compared feature rankings of models trained on data from the different variants and found overall agreement that the following features are highly predictive of severe COVID-19: elevated coagulation markers (d-dimer), markers of inflammation and cardiac injury (hsCRP, hsTNI), and age >55 years.
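
Agreement between the feature rankings obtained for the two variants can be quantified in several ways, and the criterion behind "overall agreement" is not stated here. A minimal sketch, assuming per-feature importance scores (for example, from random forest models) are available for each variant, computes Spearman's rank correlation, with tied scores given their average rank, and the fraction of features shared among the top k. The function names are hypothetical, and the sketch is not the authors' method.

```python
from typing import Dict, List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks with the largest value ranked 1; ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of tied values starting at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(imp_a: Dict[str, float], imp_b: Dict[str, float]) -> float:
    """Spearman's rho over features scored in both rankings (Pearson r of the ranks)."""
    features = sorted(imp_a.keys() & imp_b.keys())
    if len(features) < 2:
        raise ValueError("need at least two shared features")
    ra = ranks([imp_a[f] for f in features])
    rb = ranks([imp_b[f] for f in features])
    n = len(features)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        raise ValueError("a ranking with all scores tied has no defined correlation")
    return cov / (var_a * var_b) ** 0.5


def top_k_overlap(imp_a: Dict[str, float], imp_b: Dict[str, float], k: int) -> float:
    """Fraction of the k highest-scoring features that both rankings share."""
    def top(imp: Dict[str, float]) -> set:
        return set(sorted(imp, key=imp.get, reverse=True)[:k])
    return len(top(imp_a) & top(imp_b)) / k
```

A correlation near 1 and a high top-k overlap would indicate that models trained on original-variant and Omicron data prioritize the same features. Ties at the top-k boundary are broken by dictionary order in this sketch, so k should be chosen where the scores are clearly separated.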

Implications of all the available evidence These findings result from a thorough analysis of the effects of data type, ML technique, and SARS-CoV-2 variant on the ability to predict severe COVID-19. To our knowledge, no other work has analyzed the effect of these characteristics, particularly the SARS-CoV-2 variant, on the performance of ML models. This modeling approach offers a powerful framework for healthcare providers seeking clinical decision support tools, not only for COVID-19 but also for many other viral respiratory illnesses. Our work also highlights the need for further testing with larger datasets to confirm the benefits of biomedical feature fusion.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

The project described was supported by cooperative agreement (U01CK000677) from CDC. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of CDC.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The institutional review board (IRB) of Wuhan Union Hospital, Tongji College of Medicine, Huazhong University of Science and Technology gave ethical approval for this work (IRB approval #IEC-J-345).

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Footnotes

  • The Introduction and Methods sections were heavily revised to condense background information and design choices. Results were condensed to eliminate repetition. Table 1 was removed and converted to text in the Methods section. Figures 1 and 2 were replaced to clarify the model workflow. Tables 2 and 3 were replaced by a single table, again eliminating repetition. Figure 3 was updated to show results of Random Forest models instead of Logistic Regression models. More interpretation and critiques of the work were added to the Discussion. A few more references were added, and reference formatting was unified to Vancouver style.

Data Availability

All data produced are available online at https://github.com/hnwestpage/Fusion-ML-COVID-19

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.
Posted March 26, 2025.

Subject Area

  • Infectious Diseases (except HIV/AIDS)