Abstract
Objective To describe COVID-19 subphenotypes in terms of severity patterns, including prognostic, ICU and morbimortality outcomes, stratified by gender and age group, as revealed by inter-patient variability in clinical phenotypes and demographic features.
Materials and methods We used the COVID-19 open data from the Mexican Government including patient-level epidemiological and clinical data from 778 692 SARS-CoV-2 patients from January 13, 2020 to September 30, 2020.
Inter-patient variability was analyzed by combining dimensionality reduction and hierarchical clustering methods. We produced cluster analyses for all combinations of gender and age groups (<18, 18-49, 50-64, and >64). For each group, the optimum number of clusters was selected by combining a quantitative criterion, the Silhouette coefficient, with a qualitative one, expert inspection of the subgroups through visual analytics. Using the features of the resultant age-gender clusters, we performed a meta-clustering analysis to provide an overall description of the population.
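As an illustration of the quantitative step only, the sketch below picks the number of clusters that maximizes the mean Silhouette coefficient of a hierarchical clustering. It is not the authors' implementation (which is in the linked repository): the toy 2-D points stand in for dimensionality-reduced patient coordinates, average linkage is an assumption since the abstract does not name a linkage, and the function names `agglomerative` and `silhouette` are hypothetical. The expert visual inspection has no counterpart here.

```python
from itertools import combinations
from math import dist

def agglomerative(points, k):
    """Naive average-linkage agglomerative clustering; returns one label per point."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        # Merge the two clusters with the smallest mean pairwise distance.
        for a, b in combinations(range(len(clusters)), 2):
            d = sum(dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]
        del clusters[b]  # safe: combinations guarantees a < b
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

def silhouette(points, labels):
    """Mean Silhouette coefficient; points in singleton clusters score 0."""
    groups = {}
    for i, lab in enumerate(labels):
        groups.setdefault(lab, []).append(i)
    scores = []
    for i, lab in enumerate(labels):
        own = groups[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(sum(dist(points[i], points[j]) for j in members) / len(members)
                for other, members in groups.items() if other != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Toy data (assumption): three well-separated groups in a reduced 2-D space.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (5.0, 5.0), (5.2, 4.9), (4.9, 5.3),
          (10.0, 0.0), (10.1, 0.2), (9.8, 0.1)]

# Score each candidate number of clusters and keep the best one.
scores = {k: silhouette(points, agglomerative(points, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

In a setting like the one described, the same loop would run once per age-gender group, and the Silhouette ranking would serve as a shortlist for expert review rather than as the final decision.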
Results We observed a total of 56 age-gender clusters, grouped into 11 clinically distinguishable meta-clusters with different outcomes. Meta-clusters 1 to 3 showed the highest recovery rates (90.27-95.22%): (1) healthy patients of all ages, (2) children with comorbidities who had priority in medical resources, and (3) young patients with obesity and a smoking habit. Meta-clusters 4 and 5 showed moderate recovery rates (81.3-82.81%): (4) patients of all ages with hypertension or diabetes, and (5) typical obese patients with three highly correlated conditions, namely pneumonia, hypertension and diabetes. Meta-clusters 6 to 11 had very low recovery rates (53.96-66.94%): (6) immunosuppressed patients with the highest comorbidity rates across many diseases, (7) CKD patients with the worst survival length and recovery, (8) elderly smokers with mild COPD, (9) severely diabetic elderly patients with hypertension, and (10, 11) the oldest obese smokers with severe COPD and mild cardiovascular disease, with meta-cluster 11 showing a relatively higher age and smoking rate, severe COPD and shorter survival length, which reinforces a high correlation between smoking habit and COPD among the elderly. Additionally, the source Mexican state and the type of clinical institution proved to be important factors for heterogeneity in severity.
Discussion The proposed unsupervised learning approach successfully uncovered discriminative COVID-19 severity patterns for both genders and all age groups from clinical phenotypes and demographic features. A careful reading of the group outcomes showed results consistent with recent literature. For the Mexican population, our results suggest that habits and comorbidities may play a key role in predicting mortality in older patients. Centenarians repeatedly tended to fall into the groups with better outcomes. Additionally, immunosuppression was not found to be a relevant factor for severity on its own, but it was when present together with chronic kidney disease. Further useful correlations could be found by evaluating the duration of unhealthy habits, demographic features, comorbidities, the time since diagnosis, recovery progress, readmission records, and the effect of source variability.
Conclusion The resultant eleven meta-clusters provide a basis for understanding the classification of patients with COVID-19 according to comorbidities, habits, demographic characteristics, geographic data and type of clinical institution. They also reveal the correlations among these characteristics, thereby helping to anticipate the possible clinical outcomes for each specifically characterized patient. These subphenotypes can establish target groups for automated stratification or triage systems to provide personalized therapies or treatments.
Code available at: https://github.com/bdslab-upv/covid19-metaclustering
Dynamic results visualization at: http://covid19sdetool.upv.es/?tab=mexicoGov
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This work was supported by Universitat Politècnica de València contract no. UPV-SUB.2-1302 and FONDO SUPERA COVID-19 by CRUE-Santander Bank grant: Severity Subgroup Discovery and Classification on COVID-19 Real World Data through Machine Learning and Data Quality assessment (SUBCOVERWD-19).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Using Open Data from the Government of Mexico, terms available at: https://datos.gob.mx/libreusomx
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
The studied sample is available in our GitHub repository.
Abbreviations
- COPD: Chronic Obstructive Pulmonary Disease
- CKD: Chronic Kidney Disease
- INMUSUPR: Immunosuppression
- ICU: Intensive Care Unit
- EHR: Electronic Health Record
- RR: Recovery Rate
- MC: Meta-Cluster
- DIF: National System for Integral Family Development
- IMSS: Mexican Institute of Social Security
- ISSSTE: Institute for Social Security and Services for State Workers
- PEMEX: Mexican Petroleum Institution
- SEDENA: Secretariat of the National Defense
- SEMAR: Secretariat of the Navy
- SSA: Secretariat of Health