Abstract
Reports of “Long-COVID” are rising, but little is known about prevalence, risk factors, or whether it is possible to predict a protracted course early in the disease. We analysed data from 4182 incident cases of COVID-19 who logged their symptoms prospectively in the COVID Symptom Study app. 558 (13.3%) had symptoms lasting >28 days, 189 (4.5%) for >8 weeks and 95 (2.3%) for >12 weeks. Long-COVID was characterised by symptoms of fatigue, headache, dyspnoea and anosmia and was more likely with increasing age, BMI and female sex. Experiencing more than five symptoms during the first week of illness was associated with Long-COVID, OR=3.53 [2.76;4.50]. A simple model to distinguish between short and long-COVID at 7 days, which achieved a ROC-AUC of 76%, was replicated in an independent sample of 2472 antibody positive individuals. This model could be used to identify individuals for clinical trials to reduce long-term symptoms and target education and rehabilitation services.
COVID-19 can manifest a wide severity spectrum from asymptomatic to fatal forms 1. A further source of heterogeneity is the duration of symptoms after the acute stage which could have considerable impact due to the huge scale of the pandemic. Hospitalised patients are well recognised to have lasting dyspnoea and fatigue in particular 2, yet such patients constitute the ‘tip of the iceberg’ of symptomatic SARS CoV2 disease 3. Few studies capture symptoms prospectively in the general population to ascertain with accuracy the duration of illness and the prevalence of long-lasting symptoms.
Here we report a prospective observational cohort study of COVID-19 symptoms in a subset of 4182 users of the COVID Symptom Study app meeting inclusion criteria (see online methods) 4,5. Briefly, the subset comprised individuals who had tested positive for SARS-CoV2 by PCR swab testing who logged as “feeling physically normal” before the start of illness (up to 14 days before testing) so that we could determine onset. We compare cases of long (reporting symptoms lasting more than 28 days, LC28) and short duration (reporting symptoms lasting less than 10 days, short-COVID). Our previous findings that clusters of symptoms predicted the need for acute care 6 led us to hypothesize that persistent symptomatology in COVID-19 (Long-COVID) is associated with a particular symptom pattern early in the disease which could be used to predict who might be affected. In particular, dyspnoea has been shown to be a significant predictor of long-term symptoms in an unselected population7.
Figure 1 shows the duration of symptoms reported in COVID+ cases (orange) overlaid on age-, sex- and BMI-matched symptomatic controls who tested negative (blue), with lines marking the definitions of short-COVID, LC28 and LC56 (symptoms for more than 56 days) used in this study. The duration of COVID-19 symptoms followed an approximately log-normal distribution (sigma = 0.97, location = 0.78, scale = 10.07), with an overall median symptom duration of 11 days (IQR [6;19]).
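The fitted parameters can be checked against the reported median and IQR. The sketch below assumes the three values follow the shape/location/scale convention of a shifted log-normal (as in `scipy.stats.lognorm`); the text does not state which parameterisation was used, so this is an assumption. Only the standard library is used.

```python
import math
from statistics import NormalDist

# Values quoted in the text; their interpretation as shape/loc/scale is assumed.
SIGMA, LOC, SCALE = 0.97, 0.78, 10.07

def lognorm_quantile(q, sigma=SIGMA, loc=LOC, scale=SCALE):
    """Quantile of a shifted log-normal: loc + scale * exp(sigma * z_q)."""
    z = NormalDist().inv_cdf(q)
    return loc + scale * math.exp(sigma * z)

def lognorm_sf(x, sigma=SIGMA, loc=LOC, scale=SCALE):
    """Survival function P(duration > x) under the same parameterisation."""
    if x <= loc:
        return 1.0
    z = math.log((x - loc) / scale) / sigma
    return 1.0 - NormalDist().cdf(z)

median = lognorm_quantile(0.5)
q1, q3 = lognorm_quantile(0.25), lognorm_quantile(0.75)
print(f"median={median:.1f} days, IQR=[{q1:.1f}; {q3:.1f}]")
print(f"P(duration > 28 days) = {lognorm_sf(28):.3f}")
```

Under this assumed parameterisation the median is location + scale = 10.85 days, which is close to the reported median of 11 days.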
Of the 4182 COVID-19 swab positive users, 558 (13.3%) met the LC28 definition with a median duration of 41 days (IQR [33;63]), of whom 189 (4.5%) met LC56 and 95 (2.3%) LC94. In contrast, 1591 (38.0%) had short disease duration (median 6 days, IQR [4;8]). The proportion with LC28 was comparable across the three countries (GB 13.3%, USA 16.1%, Sweden 12.1%, p=0.35), as was the proportion with LC56 (GB 4.7%, USA 5.5%, Sweden 2.5%, p=0.07).
Table 1 summarises the descriptive characteristics of the study population overall and stratified by symptom/disease duration. Age was significantly associated with Long-COVID (LC28), rising from 9.9% in 18-49 year olds to 21.9% in those aged >=70 (p < 0.0005), with a clear escalation in OR by age decile (Figure 1b), although females aged 50-60 had the highest odds (ST2). Individuals with Long-COVID were more likely to have required hospital assessment in the acute period. LC28 disproportionately affected women (14.9%) compared to men (9.5%), although this sex effect was not significant in the older age-group. Long-COVID affected all socio-economic groups, assessed using the Index of Multiple Deprivation (Supplementary Figure 2). Asthma was the only pre-existing condition significantly associated with Long-COVID (OR = 2.14 [1.55;2.96]).
Fatigue (97.7%) and headache (91.2%) were the most reported symptoms in those with Long-COVID, followed by anosmia and lower respiratory symptoms. Notably, while fatigue was reported continuously, other symptoms such as headache were reported intermittently (Figure 2, Supplementary Table 1). To gain better insight into the reported symptoms, we additionally analysed free-text responses, which were more common in Long-COVID cases (81%) than in short-COVID cases (45%). Cardiac symptoms (palpitations, tachycardia) were over-represented in the LC28 group (6.1%) compared to the short-COVID group (0.5%) (p<0.0005), as were concentration or memory issues (4.1% vs 0.2%, p<0.0005), tinnitus and earache (3.6% vs 0.2%, p<0.0005) and peripheral neuropathy symptoms (pins and needles and numbness) (2% vs 0.5%, p=0.004). Most of these symptoms were reported for the first time 3-4 weeks after symptom onset.
We examined whether there were different types of symptomatology within Long-COVID. We found two main patterns: those reporting exclusively fatigue, headache and upper respiratory complaints (shortness of breath, sore throat, persistent cough and loss of smell) and those with multi-system complaints including ongoing fever and gastroenterological symptoms (Supplementary Figure 3). In the individuals with long duration (LC28), ongoing fever (OR 2.16 [1.50;3.13]) and skipped meals (OR 2.52 [1.74;3.65]) were strong predictors of a subsequent hospital visit. Details of the frequency of symptoms persisting beyond 28 and 56 days after disease onset are provided in Supplementary Table 3.
Individuals with long-COVID were more likely to report relapses (16.0%), compared to those not reporting long symptom duration (8.4%) (p<0.0005). In comparison, in the matched group of SARS-CoV2 negative tested individuals, a new bout of illness was reported in 11.5% of cases. Relapse in the context of long-COVID was longer than in the matched controls (median = 9 [5-18] vs 6 [4-10] days).
We explored how to predict risk of Long-COVID from data available early in the disease course. Individuals reporting more than 5 symptoms in the first week (the median number reported) were significantly more likely to go on to experience LC28 (OR=3.95 [3.10;5.04]). This, the strongest risk factor, was predictive in both sexes and all age groups (Supplementary Figure 4a-e).
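For readers less familiar with the notation, an odds ratio with its 95% confidence interval can be computed from a 2x2 table as sketched below. This is the unadjusted Wald form; the odds ratios reported here come from logistic regression, so the sketch only illustrates the quantity, and the counts are hypothetical rather than taken from the study.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed with outcome,   b: exposed without outcome
    c: unexposed with outcome, d: unexposed without outcome
    z: normal quantile (1.96 for a 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Hypothetical counts, NOT taken from the study
or_, lo, hi = odds_ratio_ci(a=120, b=380, c=60, d=440)
print(f"OR={or_:.2f} [{lo:.2f}; {hi:.2f}]")
```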
When analysed individually after adjusting for age and sex, every symptom in isolation was positively predictive of longer illness duration. The five symptoms experienced during the first week most predictive of Long-COVID were: fatigue OR=2.83 [2.09;3.83], headache OR=2.62 [2.04;3.37], dyspnoea OR=2.36 [1.91;2.91], hoarse voice OR=2.33 [1.88;2.90] and myalgia OR=2.22 [1.80;2.73] (Figure 3). Similar patterns were observed in men and women. In adults aged over 70, loss of smell (which is less common) was the most predictive of long-COVID, OR=7.35 [1.58;34.22], ahead of fever, OR=5.51 [1.75;17.36], and hoarse voice, OR=4.03 [1.21;13.42] (Supplementary Figure 4). Plotting frequency of co-occurrence of symptoms in short-COVID versus long-COVID further illustrates the importance of early multi-symptom involvement (Figure 3c).
We further created Random Forest prediction models using a combination of the first week’s symptom reporting, personal characteristics and comorbidities. Using all features, the average ROC AUC was 76.7% (SD=2.5) (Figure 3d) in the classification between short-COVID and LC28. The strongest predictor was age (29.2%) followed by the number of symptoms during the first week (16.3%) and BMI (10.8%), while gender (3.7%) was ranked 6th, shortly after hoarse voice (4.1%) and shortness of breath (3.8%). All individual symptoms, except abdominal pain and confusion, surpassed the comorbidity features. The ranking of feature importance was relatively similar across specific age group models. However, in the over 70s group it appeared that early features such as fever, loss of smell and comorbidities (especially heart and lung disease) were important, and thus could be considered ‘red flags’ in older adults (Supplementary Figure 6).
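As a reminder of what the ROC AUC measures, it equals the probability that a randomly chosen long-COVID case receives a higher model score than a randomly chosen short-COVID case. A minimal standard-library sketch of that pairwise definition follows; the labels and scores are toy values, not study data, and the quadratic loop is written for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """ROC AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example: 1 = long duration, 0 = short duration
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```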
We simplified the prediction model to include only the number of symptoms in the first week, age and sex in a logistic regression model, and obtained a ROC AUC of 76.7% (SD 2.5) (Figure 3d). When optimising the balance between false positives and false negatives, we obtained a specificity of 73.4% (SD 9.7) and a sensitivity of 68.7% (SD 9.9).
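The text does not say which criterion was used to balance false positives against false negatives. One common choice, shown here purely as an assumption, is to pick the score threshold that maximises Youden's J (sensitivity + specificity - 1). The data in the example are toy values.

```python
def sens_spec(labels, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def best_threshold(labels, scores):
    """Threshold maximising Youden's J (assumed criterion, not from the paper)."""
    best = None
    for t in sorted(set(scores)):
        sens, spec = sens_spec(labels, scores, t)
        j = sens + spec - 1
        # Strict comparison keeps the lowest threshold among ties
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best[1], best[2], best[3]

labels = [0, 0, 0, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8]
thr, sens, spec = best_threshold(labels, scores)
print(f"threshold={thr}, sensitivity={sens:.2f}, specificity={spec:.2f}")
```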
Key predictive findings of our analysis were validated in an independent dataset of 2472 individuals who reported testing antibody positive for SARS-CoV2 from 2 weeks after symptom onset where, again, the number of symptoms in the first week of illness was the strongest predictor of long-COVID, OR=5.12 [95% CI 3.65;7.19]. The simple prediction model for Long-COVID, trained on the PCR positive cohort and including the number of distinct symptoms experienced during the first week, age and sex, was similarly predictive of LC28, with a ROC AUC of 76.3% (SD=3.7%) (Figure 5b).
While this study provides important insights into the disease presentation, any generalisation should be considered carefully. Our study was limited by being confined to app users, who were disproportionately female and among whom those aged >70 years were under-represented, which could respectively increase or decrease our estimate of the extent of Long-COVID. Applying a weighting following the UK population (see Supplementary Methods), the estimated proportions of people experiencing symptomatic COVID-19 who went on to suffer Long-COVID were similar: 14.5%, 5.1% and 2.2% for 4-, 8- and 12-week durations respectively. These estimates may still be conservative: whilst estimates could be inflated because PCR testing in the first wave was restricted to those more severely unwell, or because regular logging may have encouraged more symptoms to be noticed, Long-COVID may be underestimated here if individuals with prolonged symptoms were more likely to stop logging symptoms on the app. We had insufficient numbers to explore risk factors for disease lasting over 2 months and were unable to analyse the impact of ethnicity due to incomplete data in this subset. In addition, while the list of symptoms on the app is necessarily non-exhaustive, the analysis of the free-text responses allowed us to highlight other symptoms present in long-COVID, such as cardiac and neurological manifestations generally starting a few weeks after symptom onset. With emerging evidence of ongoing myocardial inflammation and change in neurological markers 8,9 associated with COVID-19, this calls for specific studies of cardiac and neurological longer-term sequelae of COVID-19.
At the population level, it is critical to quantify the burden of Long-COVID to better assess its impact on the healthcare system and appropriately distribute resources. In our study, prospective logging of a wide range of symptoms allowed us to conclude that the proportion of people with symptomatic COVID-19 who experience prolonged symptoms is considerable, and relatively stable across three countries with different cultures. Whether looking at a four-week or an eight-week threshold for defining long duration, those experiencing Long-COVID were consistently older, more often female and more likely to require hospital assessment than those reporting symptoms for a short period of time. The multi-system nature of the initial disease in Long-COVID was illustrated by the importance of the number of symptoms, and by co-occurrence networks showing that those going on to experience long-COVID had a greater number of concurrent symptoms, supporting the need for holistic support 10. While asthma was not reported as a risk factor for hospitalisation 11, its association with Long-COVID warrants further investigation.
We found that early disease features were predictive of duration. With only three features (number of symptoms in the first week, age and sex), a simple score derived from a logistic regression was able to distinguish individuals with Long-COVID from those with short duration with a ROC AUC of about 77%. Importantly, the model generalised well to the population reporting antibody testing. This information could feature in much-needed targeted education material for both patients and healthcare providers. Moreover, the method could help determine at-risk groups and could be used to target early intervention trials of treatment (for example, of dexamethasone 12 and remdesivir 13) and clinical service developments to support rehabilitation in primary and specialist care 14 to alleviate Long-COVID and facilitate timely recovery.
Data Availability
Data used in this study are available to bona fide researchers through UK Health Data Research using the following link:
https://web.www.healthdatagateway.org/dataset/fddcb382-3051-4394-8436-b92295f14259
Ethics
In the UK, the app was approved by the KCL Ethics Committee (REMAS ID 18210, review reference LRS-19/20-18210) and all subscribers provided consent. In Sweden, ethics approval for the study was provided by the central ethics committee (DNR 2020-01803).
Funding
Zoe provided in kind support for all aspects of building, running and supporting the app and service to all users worldwide. Support for this study was provided by the NIHR-funded Biomedical Research Centre based at GSTT NHS Foundation Trust. This work was supported by the UK Research and Innovation London Medical Imaging & Artificial Intelligence Centre for Value Based Healthcare.
Investigators also received support from the Wellcome Trust, the MRC/BHF, Alzheimer’s Society, EU, NIHR, CDRF, and the NIHR-funded BioResource, Clinical Research Facility and BRC based at GSTT NHS Foundation Trust in partnership with KCL. ATC was supported in this work through a Stuart and Suzanne Steele MGH Research Scholar Award. CM is funded by the Chronic Disease Research Foundation and by the MRC AimHy project grant. LHN, DAD, ADJ, ADS, CG, WL are supported by the Massachusetts Consortium on Pathogen Readiness (MassCPR) and Mark and Lisa Schwartz. JM was partially supported by the European Commission Horizon 2020 program (H2020-MSCA-IF-2015-703787). The work performed on the Swedish study is supported by grants from the Swedish Research Council, Swedish Heart-Lung Foundation and the Swedish Foundation for Strategic Research (LUDC-IRC 15-0067).
Competing interests
Zoe Global Limited co-developed the app pro bono for non-commercial purposes. Investigators received support from the Wellcome Trust, the MRC/BHF, EU, NIHR, CDRF, and the NIHR-funded BioResource, Clinical Research Facility and BRC based at GSTT NHS Foundation Trust in partnership with KCL. RD, JW, JCP, AM and SG work for Zoe Global Limited and TDS and PWF are consultants to Zoe Global Limited. LHN, DAD, JM, PWF and ATC previously participated as investigators on a diet study unrelated to this work that was supported by Zoe Global Ltd.
Online methods
Methods
Dataset
Data used in this study were acquired through the COVID Symptom Study app, a mobile health application developed by Zoe Global Limited with input from physicians and scientists at King’s College London, Massachusetts General Hospital, Lund and Uppsala Universities 15. The app, which collects data on personal characteristics and on symptoms through prospective logging, was launched in the UK, the US and Sweden between 24th March (UK) and 30th April (Sweden), and rapidly reached over 4 million users. This study focuses on 4182 users who reported testing positive for SARS-CoV2 by PCR swab test and had a disease onset between 25th March 2020 and 30th June 2020, for whom the onset date matched the date of test and the duration of symptoms could be estimated (Supplementary Figure 1 presents a flowchart of study inclusion). We repeated analyses in an independent subgroup of 2472 app users who reported positive testing for antibodies against SARS-CoV2 more than 2 weeks after symptom onset, but without swab test results (Supplementary Figure 1).
To understand how the relapse rate compared to a comparable population not suffering from COVID-19, we selected an additional matched sample from all app users meeting study inclusion criteria but testing negative by PCR swab test, choosing for each COVID+ case the individual from the negative group with the smallest Euclidean distance based on sex, age and BMI 16.
Definitions
Onset of disease was defined as the first day of reporting at least one symptom lasting more than one day.
Disease end was defined as the last day of unhealthy reporting before reporting healthy for more than one week or the last day of reporting with less than 5 symptoms before ceasing using the app. For the participants included who ceased using the app with a cumulative number of symptoms of less than 5, disease end was considered as the last log.
Relapse was defined as 2 or more days of symptoms within a 7-day window after one week of healthy logging, given initial symptoms close to a positive swab test.
Long-COVID was defined as symptoms persisting for more than 4 weeks (28 days, LC28), more than 8 weeks (56 days, LC56) or more than 12 weeks (LC94) between symptom onset and end, while short duration (short-COVID) was defined as an interval between symptom onset and end of less than 10 days, without a subsequent relapse.
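These duration categories can be written as a small classification rule. The sketch below covers short-COVID, LC28 and LC56, for which the text gives thresholds in days; the 12-week category could be added in the same way. The function name and the `relapsed` flag are illustrative, not part of the study code.

```python
def classify_duration(duration_days, relapsed=False):
    """Return the duration labels that apply to one individual.

    duration_days: days between symptom onset and disease end
    relapsed: whether a subsequent relapse was recorded
    """
    labels = []
    # Short-COVID: less than 10 days and no subsequent relapse
    if duration_days < 10 and not relapsed:
        labels.append("short-COVID")
    # Long-COVID definitions use strict "more than" thresholds
    if duration_days > 28:
        labels.append("LC28")
    if duration_days > 56:
        labels.append("LC56")
    return labels

for d in (6, 15, 41, 60):
    print(d, classify_duration(d))
```

Durations from 10 to 28 days fall into neither group, which is consistent with the short-COVID versus LC28 comparison leaving out intermediate cases.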
Inclusion/Exclusion criteria
To be included in the subsequent analysis, users of the COVID Symptom Study app were selected based on the following criteria:
Inclusion criteria: Age >=18 yrs; reporting a positive SARS-CoV-2 swab test (PCR) confirming the diagnosis of COVID-19; disease onset between 14 days before and 7 days after the test date, and before the 30th June 2020 (to limit right censoring).
Exclusion criteria: individuals who started app reporting when already unwell; users reporting exclusively healthy throughout the study period; users with gaps of more than 7 days after an unhealthy report and not reporting any hospital visit (to account for gaps due to hospitalisation). In addition, individuals reporting for less than 28 days but reporting more than 5 symptoms at their last log were excluded, as duration could not be ascertained.
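The inclusion criteria can be expressed as a simple predicate, sketched below. The record layout (`User` and its field names) is hypothetical, and reading "before the 30th June 2020" as a strict inequality is an assumption. The exclusion criteria depend on the full logging history and are not covered here.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class User:
    """Hypothetical record layout, for illustration only."""
    age: int
    pcr_positive: bool
    test_date: date
    onset_date: date

ONSET_CUTOFF = date(2020, 6, 30)  # onset must fall before this date (assumed strict)

def meets_inclusion(u: User) -> bool:
    """Age >= 18, positive PCR swab, onset within [-14, +7] days of the test."""
    window_start = u.test_date - timedelta(days=14)
    window_end = u.test_date + timedelta(days=7)
    onset_in_window = window_start <= u.onset_date <= window_end
    return (
        u.age >= 18
        and u.pcr_positive
        and onset_in_window
        and u.onset_date < ONSET_CUTOFF
    )

example = User(age=45, pcr_positive=True,
               test_date=date(2020, 4, 10), onset_date=date(2020, 4, 3))
print(meets_inclusion(example))
```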
Statistical testing and modelling
Data collected prospectively until 02 September were included to allow sufficient time to ascertain duration. Univariate and multivariate logistic regression was used to assess symptoms associated with short- and long-COVID respectively, adjusting for sex and age, using statsmodels v0.11.1 (Python 3.7). Separate models were fitted to subgroups stratified by sex and age (18-49; 50-69; >70). For analysis of relapse, results for the long-COVID group were compared to the whole sample using the Mann-Whitney U test.
We used K-modes clustering to investigate whether there was evidence of different sub-types of long-COVID, using the kmodes package v0.10.2. The ideal number of symptom clusters was obtained via a silhouette analysis with the Dice distance metric. Differences between long-COVID and short-COVID were visualised using a co-occurrence network (networkx for visualisation), applying a 10% threshold to remove rare edges to aid visualisation.
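The edge filtering for the co-occurrence network can be sketched as follows. The text does not specify what the 10% refers to; this sketch assumes it is the fraction of individuals in a group who reported both symptoms. The function name and the example reports are illustrative.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(reports, threshold=0.10):
    """Symptom pairs reported together by at least `threshold` of individuals.

    reports: list of sets of symptom names, one set per individual.
    Returns {(symptom_a, symptom_b): fraction}, names sorted within each pair.
    """
    if not reports:
        return {}
    counts = Counter()
    for symptoms in reports:
        for pair in combinations(sorted(symptoms), 2):
            counts[pair] += 1
    n = len(reports)
    return {pair: c / n for pair, c in counts.items() if c / n >= threshold}

# Toy reports, not study data
reports = ([{"fatigue", "headache"}] * 3
           + [{"fatigue", "fever"}]
           + [{"cough"}] * 16)
print(cooccurrence_edges(reports))
```

The resulting dictionary could then be passed to a graph library as weighted edges for plotting.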
Finally, to create a predictive model for long-COVID, we used the scikit-learn (sklearn) v0.22.2.post1 package, training random forest classifiers with stratified repeated cross-validation (10 times, 5 folds) and a hyperparameter grid search. Features comprised information available during the first week of illness, reported comorbidities (asthma, lung disease, heart disease, kidney disease and diabetes) and personal characteristics (BMI, age, sex). In addition to the model for the whole sample, separate models stratified by age were trained using a similar cross-validation setting (hyperparameter search and stratified sampling). After running the cross-validation for each model structure (50 times), the feature importance was averaged across the repeated folds. A final simplified model using the key personal characteristics and the number of first-week symptoms was further tested.
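The split structure of the cross-validation (10 repeats of 5 stratified folds, giving 50 train/test splits) is sketched below in plain Python. The study used scikit-learn, which provides this through `sklearn.model_selection.RepeatedStratifiedKFold`; whether that exact class was used is an assumption, and the function here is only meant to show how the splits are formed.

```python
import random
from collections import defaultdict

def repeated_stratified_kfold(labels, n_splits=5, n_repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs with classes spread evenly over folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    for _ in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for idxs in by_class.values():
            shuffled = idxs[:]
            rng.shuffle(shuffled)
            # Deal each class round-robin so folds keep the class proportions
            for pos, i in enumerate(shuffled):
                folds[pos % n_splits].append(i)
        for k in range(n_splits):
            test = sorted(folds[k])
            train = sorted(i for j, f in enumerate(folds) if j != k for i in f)
            yield train, test

# Toy labels: 20 short-duration (0) and 10 long-duration (1) individuals
labels = [0] * 20 + [1] * 10
splits = list(repeated_stratified_kfold(labels))
print(len(splits), "splits; first test fold:", splits[0][1])
```

Each test fold here contains 4 of the 20 class-0 items and 2 of the 10 class-1 items, so the class ratio is preserved in every split.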
Using only 3 features, a logistic regression model was then assessed using the same stratification and cross-validation.
To assess performance on the test dataset (antibody positive), cross-validation was also performed to obtain an indication of the variability in performance using models that were trained on the whole PCR positive sample.
For the reduced logistic regression model, the score was given by the following formula:
Matching with negative sample
The negative cases selected for matching followed the same inclusion rules and were matched to the positive samples using the minimum Euclidean distance between the feature vectors formed by age, BMI and sex. The sex feature was multiplied by 100 to ensure balance between feature strengths.
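The matching step can be sketched as a nearest-neighbour search. The sketch assumes sex is coded 0/1 before being multiplied by 100 and that a control may be matched to more than one case (matching with replacement); neither detail is stated in the text. The function name and example values are illustrative.

```python
import math

def match_controls(cases, controls, sex_weight=100):
    """For each case, return the index of the nearest control.

    cases, controls: lists of (age, bmi, sex) tuples with sex coded 0/1.
    Distance is Euclidean on (age, bmi, sex * sex_weight).
    Matching is with replacement (an assumption).
    """
    if not controls:
        raise ValueError("no controls to match against")

    def vec(person):
        age, bmi, sex = person
        return (age, bmi, sex * sex_weight)

    control_vecs = [vec(c) for c in controls]
    matches = []
    for case in cases:
        cv = vec(case)
        nearest = min(range(len(control_vecs)),
                      key=lambda j: math.dist(cv, control_vecs[j]))
        matches.append(nearest)
    return matches

# Toy data, not study data
cases = [(40, 25.0, 1), (30, 22.0, 0)]
controls = [(40, 25.0, 0), (45, 27.0, 1), (70, 30.0, 1)]
print(match_controls(cases, controls))
```

With a weight of 100 on sex, a mismatch in sex outweighs any realistic difference in age or BMI, so matches are in effect restricted to the same sex whenever such a control exists.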
Rebalancing to UK population demographics
Lastly, the rebalancing with respect to the UK population was performed by reweighting the age/sex proportions of LC28 in the studied sample by those of the UK population, based on census data from 2018. The weighting per age group is described in the table below
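As an illustration of the reweighting arithmetic only, a population-weighted rate is the sum over age/sex strata of the stratum rate multiplied by that stratum's share of the population. The strata, rates and shares in the example are made up and are not the census weights used in the study.

```python
def reweighted_rate(sample_rates, population_shares):
    """Population-weighted rate: sum over strata of share x stratum rate.

    sample_rates: {stratum: proportion with the outcome in the sample}
    population_shares: {stratum: population count or share}; normalised here.
    """
    if set(sample_rates) != set(population_shares):
        raise ValueError("strata must match")
    total = sum(population_shares.values())
    return sum(sample_rates[s] * population_shares[s] / total
               for s in sample_rates)

# Hypothetical strata and values, NOT from the study or the census
rates = {("F", "18-49"): 0.12, ("M", "18-49"): 0.08,
         ("F", "50+"): 0.20, ("M", "50+"): 0.15}
shares = {("F", "18-49"): 30, ("M", "18-49"): 30,
          ("F", "50+"): 21, ("M", "50+"): 19}
print(f"{reweighted_rate(rates, shares):.3f}")
```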