Characterization of the Second Wave of the COVID-19 Pandemic in India: A Google Trends Analysis

Background: The second wave of the COVID-19 pandemic has led to considerable morbidity and mortality in India, in part due to lack of healthcare access, low health literacy, and poor disease surveillance. In this retrospective, descriptive ecological study, we utilized Google Trends (GT) to characterize the second COVID-19 wave and its association with official case counts based on search terms related to symptoms, testing, disease complications, medications, preventive behaviors, and healthcare utilization. Methods: GT is a publicly available, online tracking system of Google searches. Searches are presented as relative search volumes (RSV) from 0 (least) to 100 (most number of searches). We performed pre-defined Web searches in India from 2/12/2021 to 5/09/2021. We characterized the peak RSV, RSV doubling rates, and Spearman rank correlation of selected search terms with official case counts. We also used date-adjusted linear regression to estimate the association between highly correlated search terms and official case counts. We then qualitatively classified public search queries into thematic groups to better understand public awareness and needs related to COVID-19. Results: We observed that searches for symptoms (most searched terms in order: fever, cough, headache, fatigue, chest pain), disease states (infection, pneumonia), COVID-19-related medications (remdesivir, ivermectin, azithromycin, Fabiflu, dexamethasone), testing modalities (PCR, CT Scan, D-dimer, C-reactive protein, oxygen saturation), healthcare utilization (oxygen cylinders, hospital, physician), and preventive behaviors (lockdown, mask, pulse oximetry, hand sanitizer, quarantine) all demonstrated increases, in line with increases in official case counts. Symptoms, PCR testing, outpatient medications, and preventive behaviors peaked around April 24th, approximately two weeks prior to the peak RSV in official case counts. Contrarily, healthcare utilization factors, including searches for hospital, physicians, beds, disease states, and inpatient medications did not peak until the first week of May. There were highly significant correlations between "Coronavirus Disease 2019" (r=0.959), "fever" (r=0.935), "pulse oximetry" (r=0.952), "oxygen saturation" (r=0.944), "C-reactive protein" (r=0.955), "D-Dimer" (r=0.945), & "Fabiflu" (r=0.943) and official case counts. Conclusion: GT search terms related to symptoms, testing, and medications are highly correlated with official case counts in India, suggesting need for further studies examining GT's potential use as a disease surveillance and public informant tool for public health officials.


Introduction
The COVID-19 pandemic has led to considerable morbidity, mortality, and near healthcare system collapse in India. This has been especially evident during the second wave of the COVID-19 pandemic, which has as of May 19th, 2021, resulted in a cumulative 25.5 million cases and over 283,000 reported deaths [1]. A novel B.1.617.2 Indian variant [2], mass gatherings, political rallies, slow governmental response, and considerable healthcare access issues have put India's COVID-19 situation to the forefront of international media [3][4][5][6][7]. In addition, many experts, including the World Health Organization and Lancet Commission [3,4], have scrutinized data reporting in India due to its lack of comparability with crematorium accounts, ambiguity surrounding suspected case handling, and paradoxically low case fatality rates compared to other countries [5][6][7].
Real-world, data-driven approaches to understand public response and conduct disease surveillance are of paramount importance. One such tool, Google Trends TM (GT), has been successfully used to describe disease epidemics including Ebola, Influenza, and early COVID-19 responses in India [8]. In addition to nearly universal access to Google and real-time data collection, many individuals search Google before accessing healthcare [9], suggesting that we may be able to better capture disease burden, disease processes, and societal response via the GT tool, especially in countries with inadequate public health infrastructure.
was delayed by approximately 2 weeks compared to 'fever' but also demonstrated a second doubling time of 11 days, identical to official case count doubling times (Table 3). Both 'headache' and 'fatigue' searches did not have a second doubling time due to a high baseline RSV.
Among disease complications, most individuals searched for pneumonia, infection, myocardial infarction, and stroke, in that order (Figure 1b). Searches for 'pneumonia' peaked on May 1st and had a first doubling date of April 1st and a second doubling date of April 18th. This is approximately 4 weeks after official case doubling times. However, the peak RSV for 'pneumonia' was five days prior to the peak in official cases (Table 3).
Among testing modalities, searches for 'polymerase chain reaction' (PCR) were of the highest RSV, followed by 'computed tomography' (CT Scan), 'C-reactive Protein', 'D-dimer', and 'Oxygen Saturation' (Figure 1c). The peak RSV was two weeks earlier for PCR searches than other testing modalities. Compared to official case counts, PCR searches peaked two weeks earlier and also had a first doubling date two weeks earlier (Table 3).

Socially-relevant Search Terms
General, healthcare utilization, preventive behavior, and festival/holiday searches also demonstrated similar peak RSVs and doubling times as clinically-relevant search terms (Table  3). General searches for the 'Coronavirus Disease 2019' topic peaked on May 1st, had a first doubling date of February 28th, approximately two weeks prior to official case count doubling, and a second doubling date of April 2nd, approximately two weeks after official case count doubling.
Among healthcare utilization searches, searches related to 'oxygen' were most prevalent, peaking on April 24th and having a second doubling time within one week of first doubling date. Searches for 'physician', 'hospital', and 'bed' peaked in the first week of May and did not show any doubling (Figure 2b, Table 3).
Among preventive behavior searches, we observed a general increase in searches for 'lockdown', as well as 'mask' and 'pulse oximetry' (Figure 1c). Nearly all preventive behavior searches peaked from April 18th to April 28th but had variable doubling dates (Table 3).
Among searches for festivals/mass gatherings, we see a tremendous increase in searches for Holi on March 29th and smaller, but significant, increases in other gatherings (Figure 2d). Peak searches of Eid-al-Fitr, Holi, Easter and Indian New Year occurred in the days leading up to the holiday. No searches under these general categories included queries regarding COVID-19. However, when searching for queries associated with Kumbh Mela, a Hindu festival that was held in Haridwar, Uttarakhand during April 2021 known to have been a "super-spreader" event, had increasing queries regarding the festival and "covid" or "covid cases" during this time frame.
When stratifying Indian New Year by regional festivities such as Ugadi, Gudi Padwa, Vaisakhi, and Vishu, there were no searches associated with "covid".

Correlation Between Searches and Official Case Counts
We determined the Spearman rank correlation between search terms and official case counts (Table 4). The searches yielding the highest correlation included 'Coronavirus Disease 2019' (r=0.959), 'fever' (r=0.935), 'pulse oximetry' (r=0.952), 'oxygen saturation' (r=0.944), 'C-reactive protein' (r=0.955), 'D-Dimer' (r=0.945), and 'fabiflu' (r=0.943). This can also be seen visually in Figure 3. Using date-adjusted linear regression to estimate the association between search terms and official case counts, we found that every 1 unit increase in the RSV for 'fever' resulted in a mean increase of 4,854 cases (95% CI: 4,538 -5,170; p<0.0001). Every 1 unit increase in 'pulse oximetry' yielded a mean increase of 3,141 (95% CI: 2,863 -3,420; p<0.0001) cases. Every 1 unit increase in 'fabiflu' resulted in a mean increase of 2,318 (95% CI: 1,821 -2,815; p<0.0001) cases. When looking at the correlation between official case counts and control search terms, we found that 'film' and 'election' had no correlation, but other control terms including 'cricket', 'bank', and 'animal' were significantly negatively correlated (Table 4). When we determined the correlation of search terms with official case counts stratified by time period (first 21 days, middle period, and last 21 days), we observed that correlations were of lesser significance in the first and last 21 days (data not shown).
We also determined the correlation between search terms and official death counts, which yielded similar but lower magnitude correlations (Table 5).

Qualitative Theme Analysis
In order to understand the underlying questions for specific search topics, we grouped related search queries into observed themes, namely administrative (testing, registration, location, costs), symptom expectations (disease progression, symptom management), and health literacy (meaning of terms, etiology, normal levels). Many individuals seemingly are unsure of testing locations, vaccine registration protocol, location of available hospitals, and costs of various tests, medications and healthcare utilization factors. Individuals are also unsure of whether presenting symptoms are related to COVID-19 and have questions about expected length of fever, definition of fever, normal values for common laboratory tests, and over-thecounter treatment options for common symptoms like fever and cough (Table 6).

Discussion
In this retrospective, descriptive, ecological study of GT search terms related to the second wave of the COVID-19 pandemic in India, we observed that searches for symptoms (e.g. fever, cough), disease states (e.g. infection, pneumonia), COVID-19-related medications (e.g. remdesivir, ivermectin), testing modalities (e.g. PCR, CT Scan), healthcare utilization (e.g. oxygen cylinders, hospital), and preventive behaviors (e.g. pulse oximetry, lockdown) all demonstrated an increase over our time period from 2/12/21 -5/9/2021, in line with increases in official case counts. While peak RSVs and doubling rates varied across search categories, we generally observed that symptoms, PCR testing, outpatient medications, and preventive behaviors peaked around April 24th, approximately two weeks prior to the peak RSV in official case counts, but exhibited a doubling date one-two weeks later than official case counts.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) Contrarily, healthcare utilization factors, including searches for hospital, physicians, beds, disease states, and inpatient medications did not peak until the first week of May. There was a highly significant correlation between 'Coronavirus Disease 2019', 'Fever', 'Pulse oximetry', 'Oxygen saturation', 'C-reactive protein', 'D-Dimer', & 'Fabiflu' and official case counts.
GT has been used to understand various aspects of the first wave of the COVID-19 pandemic worldwide, including its use as a digital surveillance tool [10][11][12][13][14], its use as a psychosocial behavior informant [15], and its use as a method to characterize novel symptoms such as ageusia and agnosia [16]. GT has many advantages, including real-time, real-world analysis, large sample size, and non-healthcare-related searches that may be missed in studies looking only at patients seeking healthcare.
We found that fever, cough, headache, fatigue, and chest pain were the most prevalent symptom-related searches during our time period, suggesting that these may be the most common presenting symptoms during the current COVID-19 wave. Searches for gastrointestinal symptoms, agnosia/ageusia, body/muscle aches, and other commonly reported symptoms from the first wave were not nearly as prevalent nor did they demonstrate a significant rise during our time period. These results are in line with current literature describing prevalence of symptoms [17]. Additionally, fever and cough demonstrated the strongest correlation with official case counts, whereas headache and fatigue showed significant, but lower, correlation and did not display a second doubling. This may suggest that either headache and fatigue are specific to certain patients, that there was already a baseline high level of RSV possibly due to 'long-haul' Covid [18], or that the pattern of symptom searching is biased away from headache and fatigue, which can be easily overlooked. The increase in searches for chest pain is likely the result of pleuritic chest pain from COVID-19 pneumonia rather than chest pain due to cardiovascular disease; however, Ciofani et al. showed in the U.S. population that hospital admissions for myocardial infarction decreased, while searches for chest pain increased in line with COVID-19 cases. This may suggest that patients may be self-triaging themselves or avoiding seeking care for true cardiovascular chest pain [19]. Interestingly, searches for fever and cough peaked nearly two weeks earlier than case counts, suggesting that symptomatic individuals may search symptoms prior to getting tested and being included in official case counts. If this is in fact true, GT may provide a useful tool to target populations with increases in searches for symptoms, by for example, providing testing facilities earlier in disease course.
These results demonstrate that GT may be useful in not only characterizing the symptoms of a novel population-wide disease but also in understanding the relative prevalence of certain symptoms among all individuals including those who otherwise have mild infection but may still search about fever or cough.
As severe COVID-19 infection is known to cause a pro-inflammatory state with elevated inflammatory markers and susceptibility to clot formation, we explored searches related to laboratory testing and disease complications including myocardial infarction, stroke, and pulmonary embolism. While searches for C-reactive protein and D-dimer were amongst the most highly correlated searches with official case counts -likely because PCR tests for disease are done in coordination with other laboratory tests and CT scans in suspected cases -we did not see a significant rise in myocardial infarction, stroke, or pulmonary embolism, suggesting that individuals with severe complications may not search before seeking care. Future studies should include terms for signs/symptoms of complications rather than the disease complication itself.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) Our search term topics included commonly-used related terms (e.g. heart attack), but may not have been comprehensive enough to capture all related searches.
In addition to clinically-relevant searches, we also explored preventive behaviors and healthcare utilization. Searches for 'oxygen saturation', 'pulse oximetry', and 'oxygen' (cylinders) were highly prevalent and also significantly associated with official case counts. This is in agreement with the well-publicized issue of oxygen shortages in India and difficulties of finding hospital beds [20]. Despite the increase in searches for 'hospitals', we did not see a concomitant increase in searches for 'ambulance', making it unclear whether ambulance shortages were lower than hospital shortages, whether families were more likely to transport patients, or whether searches for hospitals were the result of media publicization rather than the primary cause.
In addition to GT's use as a potential method to characterize disease, of equal importance is its role to better understand public questions concerning COVID-19 [21]. We found that related search queries could be grouped into three main themes, administrative, symptom expectations, and symptom management. This is in line with GT analyses by Hu et al. and Springer et al. that remarked that there is a need for increased disease awareness, with respect to treatment options and disease course [22,23]. This is especially evident and necessary in India, where health literacy remains low and many search queries were concerned with symptom management. There is also a need for India to address administrative questions including cost, location, and availability of testing, vaccine, and hospital facilities. Inability to access healthcare due to administrative issues is a primary source of disease burden and disproportionately affects those of lower socioeconomic status.
The results should be interpreted with caution due to many limitations. Firstly, our ecological study design does not allow us to make inferences at the individual patient level, nor does it provide information on the direction of correlation. Increases in searches may, at least partially, be due to increased presence of related topics in the media rather than individual situations. For example, Remdesivir exports were prohibited by Indian government officials on April 11th in anticipation of the predicted spike in demand for the drug. This may have contributed to the rise in interest for this term earlier than other medications.
Furthermore, searches for symptoms may not be solely due to COVID-19. However, given the high prevalence of COVID-19 In India and the significant correlation with confirmed COVID-19 cases, it is reasonable to infer that rises in related search terms are due to COVID-19. Additionally, search terms may not capture all non-English languages and regional colloquialisms. Although we tried to choose terms that would capture the largest percentage of related terms, this may have decreased specificity of searches. For example, the search topic "mask" rather than "face mask" or "covid mask" may encompass all of the intended searches with a high sensitivity; although the decision to choose the more general search term decreases specificity when including searches intended to find other masks, such as those for skincare or fashion purposes. Despite these limitations, GT's large sample size, real-world and real-time benefits, and ability to capture populations that may otherwise be excluded from traditional research studies provide a useful tool to understand COVID-19. In particular, our major strength is there has been no comprehensive study on various clinical and social aspects of the second wave of the COVID-19 pandemic in India.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted May 21, 2021.

Future Directions
Future studies should attempt to reconcile the predictive power of GT search terms, including highly correlated symptoms, testing, and healthcare utilization factors, in India, within specific regions in India, and other countries. Studies should also attempt to understand the impact of mass gatherings on GT searches to allow for prediction after upcoming events such as Eid. Future studies may also focus on understanding the association between search terms and official death counts in India, including prediction of underreporting of data based on GT models developed in other countries.
Furthermore, examination of other Internet activity besides Google Web Search (such as Youtube, Facebook, and Whatsapp) may provide a more accurate depiction of Internet-based health searches. In particular, Whatsapp is one of the largest modes of learning and spreading health-related information in South Asia [24]. Popular "chain-messages" or posts may be studied for keywords and interest based on time-course to check for correlation with COVID-19 case data and risk prediction capability. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)   is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted May 21, 2021. ; . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted May 21, 2021   is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted May 21, 2021.  P  a  r  a  c  e  t  a  m  o  l  0  .  9  1  1  <  0  .  0  0  0  1  4  1  5  8  (  3  2  5  9  -5  0  5  7  )  <  0  .  0  0  0  1  D  o  x  y  c  y  c  l  i  n  e  0  .  8  7  6  <  0  .  0  0  0  1  3  5  1  0  (  3  1  7  4  -3  8  4  5  )  <  0  .  0  0  0  1  D  e  x  a  m  e  t  h  a  s  o  n  e  0  .  7  6  5  <  0  .  0  0  0  1  3  2  6  8  (  2  7  3  4  -3  8  0  3  )  *  <  0  .  0  0  0  1  A  z  i  t  h  r  o  m  y  c  i  n  0  .  9  3  3  <  0  .  0  0  0  1  3  2  2  3  (  2  8  6  2  -3  5  8  . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.  C  o  r  r  e  l  a  t  i  o  n  B  e  t  w  e  e  n  S  e  a  r  c  h  T  e  r  m  s  a  n  d  O  f  f  i  c  i  a  l  C  O  V  I  D  -1  9  D  e  a  t  h  C  o  u  n  t  s   S  e  a  r  c  h  T  e  r  m  S  p  e  a  r  m

Meaning of Term
Ambulance supro ambulance, drdo bike ambulance, ambulance with ventilator, ambulance service, ambulance number.

Location, Contact information
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)

News, Guidelines
Quarantine quarantine hotels in mumbai, quarantine meaning in marathi, quarantine meaning in bengali, home quarantine schedule, bhoomi quarantine watch

Location, Meaning of term, Guidelines
Face Masks facemask, full face mask for men, face shield, best face mask, face mask price

Purchase options, Cost
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted May 21, 2021. ; https://doi.org/10.1101/2021.05.19.21257473 doi: medRxiv preprint