Abstract
Importance There have been over 759 million confirmed cases of COVID-19 worldwide. A significant portion of these infections will lead to long COVID and its attendant morbidities and costs.
Objective To empirically derive a long COVID case definition consisting of significantly increased signs, symptoms, and diagnoses to support clinical, public health, research, and policy initiatives related to the pandemic.
Design Case-Crossover Population-based study.
Setting Veterans Affairs (VA) medical centers across the United States between January 1, 2020 and August 18, 2022.
Participants 367,148 individuals with positive COVID-19 tests and preexisting ICD-10-CM codes recorded in the VA electronic health record were enrolled.
Trigger SARS-CoV-2 infection documented by positive laboratory test.
Case Window One to seven months following positive COVID testing.
Main Outcomes and Measures We defined signs, symptoms, and diagnoses as being associated with long COVID if they had a novel case frequency of >= 1:1000 and they were significantly increased in our entire cohort after a positive COVID test when compared to case frequencies before COVID testing. We present odds ratios with confidence intervals for long COVID signs, symptoms, and diagnoses, organized by ICD-10-CM functional groups and medical specialty. We used our definition to assess long COVID risk based upon a patient’s demographics, Elixhauser score, vaccination status, and COVID disease severity.
Results We developed a long COVID definition consisting of 323 ICD-10-CM diagnosis codes grouped into 143 ICD-10-CM functional groups that were significantly increased in our 367,148 patient post-COVID population. We define seventeen medical-specialty long COVID subtypes such as cardiology long COVID. COVID-19 positive patients developed signs, symptoms, or diagnoses included in our long COVID definition at a proportion of at least 59.7% (based on all COVID positive patients). Patients with more severe cases of COVID-19 and multiple comorbidities were more likely to develop long COVID.
Conclusions and Relevance An actionable, empirical definition for long COVID can help clinicians screen for and diagnose long COVID, allowing identified patients to be admitted into appropriate monitoring and treatment programs. An actionable long COVID definition can also support public health, research and policy initiatives. COVID patients with low oxygen saturation levels or multiple co-morbidities should be preferentially watched for the development of long COVID.
Long COVID – Current Knowledge Base and Need
Numerous symptoms are cited as long-term sequela of COVID-19. “The symptoms may affect a number of organ systems, occur in diverse patterns, and frequently get worse after physical or mental activity.”1 A 2020 study found that the most common long-term symptoms were fatigue, dyspnea, joint pain, and chest pain.2 Others reported gastrointestinal tract disorders correlated with gut microbiome shifts after COVID-19 infection.3,4 Cognitive dysfunction, often referred to as brain fog, is a commonly reported long-term symptom. 5 Cognitive dysfunction is particularly concerning given evidence that COVID-19 can alter brain structure.6 The most common self-reported symptoms documented via a smartphone app were fatigue, headache, dyspnea, and anosmia.7,8
Concerningly high long COVID frequencies have been reported. A cohort study from the Netherlands found that approximately one in eight COVID-19 patients developed long-term somatic symptoms.9 Another study showed that approximately thirty percent of their cohort reported persistent symptoms, with many experiencing worse health-related quality of life (HRQoL) compared with baseline and negative impacts on at least one activity of daily living.10
Long COVID’s impacts extend beyond individual morbidity to include healthcare system and economic consequences. Cutler et. al. noted long COVID resulting in reduced workforce participation (e.g., 44% out of the work force), direct earning losses and worker shortages in service jobs. 11 A recent analysis of New York State disability claims trends described in the New York Times found that “71 percent of claimants with long COVID needed continuing medical treatment or were unable to work for six months or more” and opined that long COVID has exacerbated the current US labor shortage.12,13
The widespread occurrence of lingering ailments and their impacts on individuals and society make clear the need for a long COVID definition. U.S. public health officials note that we must balance our need for an accurate long COVID definition that includes all afflicted individuals with against our need for interim long COVID definitions to expedite immediate action and mobilization.14 In particular, a working definition of long COVID based on routinely collected coded data could support the identification of at risk or undiagnosed patients for monitoring, referral, or therapeutic interventions. In the current study we empirically derive an actionable broad-based long COVID definition to support current clinical, public health, research and policy initiatives related to the pandemic.
Methods
Overview
We enrolled Veterans who had laboratory confirmed positive COVID 19 tests. We examined Veterans’ electronic health records for novel International Classification of Diseases, 10th Revision, Clinical Modification (ICD-10-CM) codes between one and seven months after a positive COVID-19 test. We grouped codes with a novel frequency of 1/1000 or greater by diagnosis type creating ICD-10-CM functional groups and performed Chi-square testing with Bonferroni correction to compare diagnosis frequencies before and after a positive COVID test. We defined ICD-10-CM functional groups that significantly increased in frequency as “upregulated” (see Figure 1). We then manually aggregated upregulated ICD-10-CM functional groups into medical specialties to organize our empiric definition of long COVID.
Population Definition and Data Extraction
We enrolled patients with laboratory confirmed positive COVID-19 studies and followed them for thirteen months (Six months before the COVID 19 test result and seven months after COVID) to create a long COVID definition. We utilized the electronic health records (EHR) of all patients that tested positive for COVID-19 at Veterans Affairs (VA) medical facilities nationally between January 1, 2020 and August 18, 2022. 2,377,720 patients were tested for COVID-19 during this time period.
We applied structured query language (SQL) queries to VA Informatics and Computing Infrastructure (VINCI), Corporate Data Warehouse (CDW) data tables15 to generate two diagnosis files for analysis. The first file (“before”) contains a row of retrospectively collected information for each patient and each ICD-10-CM diagnosis assigned to them in the six-month control window prior to their COVID-19 test. The row includes the ICD-10-CM code and its description, a unique patient identifier, the COVID-19 test date, and the calculated number of months between ICD-10-CM code entry and COVID-19 testing. This “before” file contained 14,980,288 observations across 426,970 patients.
We followed the patients for seven months. The second file (“after”) was created seven months after the last patient was enrolled. The after file contained ICD-10-CM codes assigned during the seven months following COVID-19 testing and similar related information as the “before” file. This “after” file contained 15,493,587 observations across 389,677 patients.
We limited analysis to the 367,148 patients that appeared in both the “before” and “after” files to ensure that we had a diagnostic history for each patient and eliminated acute findings by removing all ICD-10-CM codes documented less than a month after the positive COVID test (Figure 1). We used the date of the first positive COVID test for patients with multiple positive tests. Multiple repeating ICD-10-CM codes for a single patient were counted once. We wrote R and python programs to remove all data concerning patients who tested negative for COVID-19, ICD-10-CM codes that were documented less than a month after the positive COVID-19 test, and patients who were not present in both the “before” and “after” files. The methodology used to generate the patient cohort is depicted in Figure 2.
Data Analysis
We chose six month “before” and “after” COVID test control and case windows to allow patients to serve as their own controls. The six month “after” case window began one month after the positive COVID test. We defined “novel” ICD-10-CM codes as those that appeared in a patient’s “after” file but not in the “before” file. We calculated the frequency of each novel ICD-10-CM code as the percentage of the study cohort assigned the code. We excluded novel codes with a frequency of < 1:1000 from further analysis. We defined codes as upregulated when the frequency of that code in the after file was significantly increased as compared to the frequency in the before file, if it had a Chi-Square with a Bonferroni corrected p-value<0.00006. All resulting codes were grouped for additional analysis and organization (see ICD-10-CM Functional and Medical Specialty Groupings subsection of Methods).
We also defined ICD-10-CM functional groups as “upregulated” if they were statistically more frequent post COVID by Chi Square analysis with Bonferroni correction with a p<0.00006. We limited the long COVID ICD-10-CM code functional groups to those that were significantly increased in frequency. We used “before” and “after” frequencies to calculate odds ratios and confidence intervals for each novel ICD-10-CM functional group. We calculated odds ratios from frequency data.
We analyzed potential risk factors including vaccination status and COVID-19 severity for long COVID by creating 2×2 tables and applying Pearson Chi-Square testing. We applied a similar approach to the analysis of demographic factors. We used R version 4.1.2 and R-Studio to perform the statistical analysis.
ICD-10-CM Functional and Medical Specialty Groupings
We grouped ICD10 CM codes in three steps. We first combined ICD-10-CM codes that had the same initial three characters. We then grouped ICD-10-CM codes with different initial characters if the diagnoses were functionally similar to create our ICD-10-CM functional groups. For example, we grouped I47.1 (supraventricular tachycardia), I47.2 (ventricular tachycardia), and R00.0 (tachycardia, unspecified) together as tachycardia. Finally, we manually curated each of these ICD-10-CM functional groups into medical specialties for organizational purposes.
Long COVID Definition
We included in our long COVID definition each ICD-10-CM code with an incidence over six months (T0 + 1M – T0+7M)> 1:1000 and a significant overall frequency increase. Long COVID patients were defined as having any of the 323 upregulated ICD-10-CM codes between two and seven months after their positive COVID-19 diagnosis, but not in their pre-COVID diagnoses.
Risk Factors for Long COVID
The multivariate regression models were done for each risk factor one including Age, Gender, Race, Ethnicity, 2-year Elixhauser score, a second with Age, Gender, Race, Ethnicity, 2-year Elixhauser score, O2Sat <94%, and a third with Age, Gender, Race, Ethnicity, 2-year Elixhauser score, COVID Vaccination status. We present the univariate rates as well as the results of the regression analysis using R4.1.2 and R Studio.
Results
Long COVID Definition
We extracted ICD-10-CM diagnosis codes assigned to 367,148 patients who underwent a positive COVID test at VA. Patients with one or more novel COVID related diagnoses numbered 268,320. The remaining 98,828 patients had no novel long COVID ICD-10-CM diagnoses in their post-COVID period when compared to their pre-COVID period. Table 1 contains the demographic characteristics of the study cohort. Males were significantly older than the females on average, 60.29 (95% CI: 95% CI: 60.24 – 60.35) versus 47.85 (95% CI: 47.73 – 47.97), respectively.
We developed a definition of long COVID consisting of 323 ICD-10-CM diagnosis codes grouped into 143 ICD-10-CM functional groups that were significantly increased in our 367,148 patient post-COVID population. We define seventeen medical specialty long COVID subtypes including cardiology long COVID, neurology long COVID, and pulmonary long COVID. Table 2 shows the ICD-10-CM functional groups and medical specialties. Within each field, the ICD-10-CM code groups are sorted in descending order by their Odds Ratios.
Figures 3 and 4 show the signs, symptoms, and diagnoses with significantly increased relative risks in the post-COVID period with their respective confidence intervals sorted by medical specialty.
Case counts were greatest for the specialties of Cardiology (196,6320, Neurology (159,358), Ophthalmology (149,817) and Pulmonary (138,470). The lowest case counts were for Oncology (7,256), Rheumatology (10,543) and Dermatology (13,233).
COVID-19 test positive patients were assigned novel signs, symptoms, or diagnoses included in our definition of long COVID at a rate of between 59.7% (percentage based on COVID positive patients tested at the VA) and 76.6% (percentage based on all COVID positive patients with diagnostic history and follow up diagnoses one to seven months after test).
Most long COVID patients were documented with at least one ICD-10-CM code found in our long COVID definition within three months of their positive COVID-19 test. The percentage of patients documented with their first long COVID ICD-10-CM code decreases with each subsequent month (See Figure 6).
Risk Factors for Long COVID
We presented in Table 1 a comparison of demographic characteristics and Elixhauser comorbidity scores of long COVID patients and non-long COVID patients. The long COVID cohort was older with more comorbidities. The long COVID cohort also had higher percentages of White and Black individuals and non-Hispanic and non-Latino ethnicities. Patients with a 2-year Elixhauser score of greater than 21 had a much higher proportion developing long COVID (p < 0.001, Pearson Chi-Square) (See Table 3).
Our data did not indicate that vaccination was protective against the development of long COVID. However, vaccination resulted in significantly lower rates of novel ARDS in the post-covid period (13.2% CI: 10.4%-16.9%) as compared with the unvaccinated population (19.6% CI: 18.1% - 21.2), p<0.001.
Patients with minimum O2 saturations constituting severe COVID and severe COVID with severe desaturation were significantly more likely to develop long COVID (both had p-values<0.001, Pearson Chi-Square) (See Table 4).
The multivariate regression models all confirmed that COVID patients during the Omicron variant predominant period were at slightly higher risk of developing Long COVID at p-value < 0.001.
Discussion
Numerous reports document specialty-specific signs, symptoms and diagnoses correlated with long COVID. We present a novel analysis based on a large national data set and the full multispecialty breadth of ICD-10-CM diagnosis codes to create an overall holistic long COVID definition that confirms and extends previous reports.
We allowed patients to be their own controls and used the entire cohort before and after COVID-19 infection to determine the relative risk of signs, symptoms, and disorders. This ensured that the signal was both novel and upregulated. We found COVID-19 positive patients developed signs, symptoms, or diagnoses included in our long COVID definition at a proportion of between 59.7% (percentage based on COVID positive patients tested at the VA) and 76.6% (percentage based on all COVID positive patients with diagnostic history and follow up diagnoses one to seven months after test). More than three-fourths of long COVID patients met our long COVID definition within four months of their positive COVID-19 test.
We found long COVID frequency differences based on race and ethnicity. These differences may be related to socioeconomic status, which is directly correlated with the presence of comorbidities.18–20 The long COVID cohort was eight years older with more comorbidities (two-year Elixhauser score 7.97 in the long Covid patients vs 4.21 in the non-long Covid patients). In our cohort, the males were significantly older than the females on average, 60.29 (95% CI: 95% CI: 60.24 – 60.35) versus 47.85 (95% CI: 47.73 – 47.97), respectively. We found that long COVID frequency was increased in female patients, the more severely ill, and patients who had a more severe bout of COVID as judged by their minimum oxygen saturation.
We found 143 upregulated diagnostic groups, with odds ratios as high as 23. We also found seventeen upregulated medical specialty groupings containing between three and twenty-one signs, symptoms, or diagnoses. This provides strong evidence for a broad definition of long COVID.
Carfi et al. found that most common long-term symptoms were fatigue, dyspnea, joint pain, and chest pain.2 Each except joint pain is represented in our long COVID definition. However, joint pain may be related to findings in our definition such as difficulty walking and an overall decrease in mobility. COVID-19 is known to cause lung abnormalities, especially in cases with pneumonia.21 We found that the likelihood of developing pneumonia after COVID-19 infection is significantly upregulated, potentially interconnected with the numerous findings in our Pulmonary long COVID definition. Autopsy evaluation of COVID-19 victims’ lung tissue demonstrated diffuse alveolar damage with perivascular T-cell infiltration and severe endothelial injury. 22 Long COVID patients have been found to have abnormal 129Xe MRI gas exchange and CT vascular density measurements, which we postulate could be related to the pulmonary fibrosis (J84.10) or emphysema (J43.9) diagnoses identified in our definition. 23
Our definition shows that the long-term effects of COVID-19 are associated with damage to numerous body systems including the kidneys, heart, eyes, and nervous system. Our results are corroborated by other studies. Cognitive dysfunction (brain fog) is often associated with long COVID and can be difficult to diagnose and treat.5 COVID-19 infection is far more likely to cause cardiac complications than vaccination.24 The gastrointestinal codes we observed reflect previous literature25 and may relate to reported alterations to the gastrointestinal tract after COVID-19.3,4 Finally, previous studies have noted that COVID can alter ocular physiology, supporting our ophthalmology related findings.26
Patients with more severe cases of COVID 19, as manifest by low oxygen saturations, should be watched carefully for the development of long COVID as they were significantly more likely to develop long COVID. Sicker patients with higher 2-year Elixhauser scores were significantly more likely to develop long COVID. Patients with multiple comorbidities should be made aware of this risk and participate in active surveillance for the development of signs and symptoms of long COVID.
The American Medical Association (AMA) notes there are three categories of long COVID patients: Those who do not recover completely and have ongoing symptoms; those with symptoms related to chronic hospitalization; and those who develop new symptoms after recovery.27 In our study, we did not differentiate by these subtypes and instead leave that to future research. It is possible that some of these signs and symptoms may have occurred during the first month and may be the persistent subtype. It is possible that some of the upregulated codes may be found with other serious illnesses, though only 9.1% of our cohort had severe COVID based on oxygen saturation <94%. We are not able to distinguish conditions that represent acceleration of pre-existing disease from those that represent de novo COVID-related conditions. For example, is the increased incidence of Non-ST elevation (NSTEMI) myocardial infarction (I21.4) related to the general stress of acute illness impacting preexisting coronary artery disease or to an underlying de novo long COVID related condition? Better understanding will require additional research. In any event, whether causal or associative, de novo disease or exacerbation of chronic disease, new or persistent clinical problems require assessment, treatment, and monitoring.
Limitations include that the cohort study population is 84% male, reflective of the overall VA patient population which is between 87% and 95% male (depending upon data source and whether gender has been self-reported).28,29 Additionally, the male Veteran population who use the VA healthcare system is older than the population of female Veterans who use VA. Our study did not include home testing for Covid-19 that went unreported to the VA healthcare system. Patients who tested positive during the omicron dominant time period were slightly more likely to develop Long COVID when compared to the earlier strains (76% vs 72%, p<0.001). The reality of emerging viral variants emphasizes the need for a well-defined and well-maintained definition of long COVID over time and with variant-specific derivation. The study was not powered to show independence of the individual risk factors for long COVID.
In this case-crossover study, we hope that our empirically defined long COVID definition will lead to more consistent identification of long COVID and its medical specialty subtypes and support of a variety of COVID related initiatives. Our definition is actionable as individuals who have multiple co-morbidities and more severe bouts of COVID should be followed more closely for the development of long covid signs or symptoms. Our definition can also inform screening questions for high-risk patients. For example, helping clinicians identify patients with enhanced long COVID risk who may benefit from monitoring programs or patients with previously undiagnosed long COVID for whom it may be appropriate to create a referral to a long COVID clinic. We also anticipate that our long COVID definition may support through standardization of future subspecialty specific long COVID research.
Future research should look at health outcomes for each long COVID medical specialty subtypes to identify those at greatest risk of developing severe morbidity. Predictive analytics should be employed to help refer these individuals earlier to monitoring and treatment programs.
As of March 5th, 2023, there have been 759 million confirmed cases of COVID-19 worldwide.30 Case counts are ever increasing. As Dr. Levine notes, immediately useful long COVID definitions are needed as are ultimately more fully inclusive definitions. 14 We offer our long COVID definition as a public health contribution to our pandemic response.
Data Availability
The data can be made available to VA researchers with IRB approval.
Acknowledgements
This work has been supported in part by grants from NIH NLM T15LM012495, R25LM014213, NIAAA R21AA026954, R33AA0226954 and NCATS UL1TR001412. This study was funded in part by the Department of Veterans Affairs.
Footnotes
Peter L. Elkin, MD, MACP, FACMI, FNYAM, FAMIA, FIAHSI, SUNY Distinguished Service Professor, UB Distinguished Professor and Chair, Department of Biomedical Informatics, Professor of Internal Medicine, Professor of Surgery, Professor of Pathology and Anatomical Sciences, Professor of Psychiatry, Professor of Orthopedics President, Faculty Council, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, State University of New York, http://medicine.buffalo.edu/departments/biomedical-informatics.html, 77 Goodell Street, Buffalo, NY 14203, 507 358-1341