Long-term patient-reported symptoms of COVID-19: an analysis of social media data ================================================================================= * Juan M. Banda * Gurdas Viguruji Singh, S.R. * Osaid H. Alser * Daniel Prieto-Alhambra ## Abstract As the COVID-19 virus continues to infect people across the globe, there is little understanding of the long term implications for recovered patients. There have been reports of persistent symptoms after confirmed infections on patients even after three months of initial recovery. While some of these patients have documented follow-ups on clinical records, or participate in longitudinal surveys, these datasets are usually not publicly available or standardized to perform longitudinal analyses on them. Therefore, there is a need to use additional data sources for continued follow-up and identification of latent symptoms that might be underreported in other places. In this work we present a preliminary characterization of post-COVID-19 symptoms using social media data from Twitter. We use a combination of natural language processing and clinician reviews to identify long term self-reported symptoms on a set of Twitter users. ## Introduction As countries, such as the USA or Spain go through the perceived second wave of acute cases of COVID-19, those who suffered the condition in the preceding months are reporting concerning long-term symptoms. Acute organ injury during primo-infection includes acute kidney injury in 1 in 5 patients1, myocardial injury in 20%-30%, and acute respiratory failure. Long-term sequelae are therefore expected to include chronic kidney disease, heart failure and chronic obstructive pulmonary disease. As healthcare systems remain overwhelmed with the management of acute cases, little attention has been given to long-term COVID-19 sequelae. ## Methods We mined and manually reviewed social media data from a selective set of Twitter feeds (#longcovid, #chroniccovid) to characterize patient reports of long-term COVID-19 symptoms. Using the largest publicly available COVID-19 Twitter chatter dataset2, we used precise hashtags (#longcovid and #chroniccovid) to select tweets relevant to discussions related to the post-COVID experiences of Twitter users. We looked at English language Twitter data from 2020-05-21 to 2020-07-10, allowing >60 days after the start of the pandemic. We discarded retweets that did not have user comments. Any tweets from accounts with unusually high tweeting activity (possible bots) or that only shared other tweets were removed. We then annotated tweets using the Social Media Mining Toolkit3, Spacy NER annotator and a dictionary created from the Observational Health Data Sciences and Informatics (OHDSI) vocabulary4, which allows the annotated terms to tie into clinical conditions and observations. We identified the final set of tweets with clinical concepts for review. Two clinicians (GS, OA) manually reviewed these tweets to identify patients with COVID-19 and their self-reported symptoms, and to attribute ICD-10 codes to them. A third clinician (DPA) reviewed all decisions and resolved disagreements. Number of symptoms per tweet and person, and frequency (%) of symptoms reported of the total were reported. ## Results A total of 7,781 unique tweets were identified from 4,607 users. After annotation, 2,603 tweets were manually reviewed, resulting in 150 eligible tweets from 107 users. A total of 192 reports including 34 distinct ICD-10 codes were identified (Figure 1). The 10 most commonly mentioned symptoms were: malaise and fatigue (62%), dyspnea (19%), tachycardia/palpitations (13%), chest pain (13%), insomnia/sleep disorders (10%), cough (9%), headache (7%), and joint pain, fever, and unspecified pain by 6% each (Table 1). View this table: [Table 1.](http://medrxiv.org/content/early/2020/08/01/2020.07.29.20164418/T1) Table 1. Detailed count all self-reported symptoms and the number of users presenting them. ![Figure 1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/08/01/2020.07.29.20164418/F1.medium.gif) [Figure 1.](http://medrxiv.org/content/early/2020/08/01/2020.07.29.20164418/F1) Figure 1. Number of users self-reporting each different symptom on their social media stream. Less common symptoms included ear-nose-throat (tinnitus, anosmia, chronic sinusitis, parageusia, aphonia), neuro-psychological (amnesia, neuralgia/neuropathy, disautonomia, visual disturbance, cognitive impairment, and disorientation), myalgia, and skin pruritus/rash. An average of 1.79 codes were reported per person, and 1.28 codes per tweet. ## Discussion Our analysis of patient-reported long-term COVID-19 symptoms matches clinician-collected data recently reported by Carfi *et al*5. Fatigue and dyspnea were the two most common symptoms in both datasets, and chest pain and cough were also reported in the top 5. Other, less commonly reported symptoms include headache, myalgia and dysgeusia/parageusia. Patients reported additional symptoms, including palpitations and insomnia/sleep disorders potentially attributable to neuro-psychological, cardiac or respiratory sequelae. Skin pruritus and rash are increasingly recognised as COVID-19 manifestations6. Additional, less recognized long-term symptoms reported in our data include persistent fever, tinnitus, anosmia, and neuro-psychological distress. We have shown that researchers can leverage social media data, specifically Twitter, to conduct long-term post-COVID studies of patient-relevant and self-reported symptoms. By leveraging the #longcovid hashtag movement, we found similar reports to those obtained from clinical adjudication5, demonstrating the validity of our methodological framework for future research. Manual validation identified additional, not previously reported symptoms that warrant further research, given their importance to patients. ## Data Availability Data will be shared upon request following the Twitter developers terms of usage. * Received July 29, 2020. * Revision received July 29, 2020. * Accepted July 31, 2020. * © 2020, Posted by Cold Spring Harbor Laboratory This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at [http://creativecommons.org/licenses/by-nc-nd/4.0/](http://creativecommons.org/licenses/by-nc-nd/4.0/) ## References 1. 1.Richardson S, Hirsch JS, Narasimhan M, et al. Presenting Characteristics, Comorbidities, and Outcomes Among 5700 Patients Hospitalized With COVID-19 in the New York City Area. JAMA. 2020;323(20):2052–2059.. doi:10.1001/jama.2020.6775 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1001/jama.2020.6775&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=32320003&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F08%2F01%2F2020.07.29.20164418.atom) 2. 2.Banda JM, Tekumalla R, Wang G, et al. A large-scale COVID-19 Twitter chatter dataset for open scientific research -- an international collaboration [submitted online April 7, 2020]. arXiv [csSI]. [http://arxiv.org/abs/2004.03688](http://arxiv.org/abs/2004.03688). 3. 3.Tekumalla R, Banda JM. Social Media Mining Toolkit (SMMT). Genomics Inform. 2020;18(2):e16. doi:10.5808/GI.2020.18.2.e16 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.5808/GI.2020.18.2.e16&link_type=DOI) 4. 4.Hripcsak G, Ryan PB, Duke JD, et al. Characterizing treatment pathways at scale using the OHDSI network. Proc Natl Acad Sci U S A. 2016;113(27):7329–7336. doi:10.1073/pnas.1510502113 [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NDoicG5hcyI7czo1OiJyZXNpZCI7czoxMToiMTEzLzI3LzczMjkiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMC8wOC8wMS8yMDIwLjA3LjI5LjIwMTY0NDE4LmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 5. 5.Carfì A, Bernabei R, Landi F, Gemelli Against COVID-19 Post-Acute Care Study Group. Persistent Symptoms in Patients After Acute COVID-19 [published online July 09, 2020]. JAMA. doi:10.1001/jama.2020.12603 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1001/jama.2020.12603&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=32644129&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F08%2F01%2F2020.07.29.20164418.atom) 6. 6.Matar S, Oulès B, Sohier P, et al. Cutaneous manifestations in SARS-CoV-2 infection (COVID-19): a French experience and a systematic review of the literature [published online June 26, 2020]. J Eur Acad Dermatol Venereol. doi:10.1111/jdv.16775 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/jdv.16775&link_type=DOI)