Abstract
Background The COVID-19 pandemic affected the incidence of common infectious diseases, but its impact on hand, foot, and mouth disease (HFMD) is unclear. Google Trends data are available in near real time and are easily accessed, so they may help explain infection trends from the perspective of information-seeking behavior. We aimed to explain HFMD cases before and during COVID-19 using Google Trends data.
Methods HFMD cases were obtained from the National Institute of Infectious Disease, and Google search data for Japan from 2009 to 2021 were downloaded using Google Trends. Pearson correlation coefficients were calculated between HFMD cases and the search topic “HFMD” from 2009 to 2021. Japanese tweets containing “HFMD” were retrieved to select search terms for further analysis. Search terms were retained if their counts exceeded 1000 and they fell into one of the following categories: infection sources, susceptible sites, susceptible populations, symptoms, treatment, preventive measures, and identified diseases. Cross-correlation analyses were conducted to detect lag changes between HFMD cases and HFMD search terms before and during COVID-19. Multiple linear regressions with backward elimination were used to identify the most significant terms for explaining HFMD cases.
Results HFMD cases and Google search volume peaked around July in most years, except in 2020 and 2021. The search topic “HFMD” correlated strongly with HFMD cases except in 2020, when the COVID-19 outbreak began. In addition, differences in lags for 73 (72.3%) search terms were negative, which might indicate increasing public awareness of HFMD infections during the COVID-19 pandemic. The multiple linear regression results showed that the significant search terms during COVID-19 carried the same meanings as before but covered more informative search content.
Conclusions The significant terms for explaining HFMD cases differed before and during COVID-19. Awareness of HFMD infection in Japan may have improved during the COVID-19 pandemic. Continuous monitoring is important to promote public health and prevent resurgence. Public interest reflected in information-seeking behavior can be helpful for public health surveillance.
Background
Hand, foot, and mouth disease (HFMD) is an infectious disease that results in a blistering rash on the mouth, hands, and feet. Most infected individuals recover from HFMD within a few days. Various complications, including myocarditis, neurogenic pulmonary edema, acute flaccid paralysis, and central nervous system complications, such as meningitis, cerebellar ataxia, and encephalitis, can also occur [1, 2]. HFMD has a worldwide distribution; in the United States, outbreaks often occur during summer and early fall. Large outbreaks in Cambodia, China, Japan, Korea, Malaysia, Singapore, Thailand, and Vietnam have been reported in the past 2 decades [3, 4]. HFMD is seasonal in temperate Asia, with a summer peak, and in subtropical Asia, with spring and fall peaks, but not in tropical Asia, and a climatic role has been identified for temperate Japan [5]. During the summer of 2011, Japan had the largest epidemic of HFMD on record, with 347,362 cases reported [6]. Coxsackievirus A6 (CV-A6) infection was responsible for most cases, with co-circulation of coxsackievirus A16 (CV-A16) and enterovirus A71 (EV-A71) [7]. EV-A71 has been sporadically detected from October 2014 onward. It became the predominant serotype in 2018, with approximately 70,000 reported cases, following an increased spread from the end of 2017 [8]. Since June 2019, a severe outbreak of HFMD has occurred in multiple regions of Japan, attracting public attention again [9]. As enteroviruses can spread rapidly by droplet and fomite transmission among children in daycare centers and kindergartens, understanding HFMD outbreaks is vital to public health, particularly during the COVID-19 pandemic [10].
Rapid recognition and reporting of HFMD infection are essential, and several studies have constructed models for explaining HFMD infection [11–15]. Rui et al. explored epidemiological characteristics and calculated the early warning signals of HFMD using a logistic differential equation (LDE) model in seven regions of China [11]. Yu et al. forecasted the number of HFMD cases with wavelet-based hybrid models in Zhengzhou, China [12]. Zhang et al. proposed a landscape dynamic network marker (L-DNM) to detect pre-outbreak signals of HFMD in Tokyo, Hokkaido, and Osaka, Japan [13]. Gao et al. used monthly HFMD infection cases and meteorological data to construct a weather-based early warning model with a generalized additive model across China [14]. Zhao et al. combined a meta-learning framework with Baidu search queries for real-time estimation of HFMD cases [15]. The above studies used a range of data, including monthly or weekly HFMD infectious cases [11, 12, 14], dynamic information from city networks, horizontal high-dimensional data, records of clinic visits [13], meteorological data [14], and Baidu search query data [15], which are relatively difficult to access or are updated with delays in Japan. Traditional surveillance and reporting systems lag an outbreak by one to two weeks because of the reporting and verification process. In Japan, the National Institute of Infectious Disease (NIID) has monitored outbreaks of various infectious diseases and issued weekly reports since 1999, but these reports are delayed by several weeks [16]. In addition, no previous study has used Internet search data to examine how HFMD changed under the COVID-19 pandemic. As of August 2022, Google search data was considered reliable because Google's market share in Japan has been over 70% since 2009 [17, 18]. Google Trends data show what the public is searching for in a more real-time and labor-saving manner, which may be valuable for infection surveillance.
“Infodemiology” is defined as the science of the distribution and determinants of information in an electronic medium, specifically the Internet, or in a population, with the aim of informing public health and policy [19]. Google Trends is frequently used in infodemiology research to gauge public interest [20]. Google Trends reflects public information-seeking behavior and allows users to analyze Google search data for specific search terms in any country or region over a selected period [21, 22]. Studies have shown that online query trends correlate with real-life epidemiologic phenomena such as the flu [23], sinusitis [24], lifestyle-related disease [25], asthma [26], and pruritus [27]. Researchers have also investigated public interest and information-seeking behavior in chronic obstructive pulmonary disease (COPD) [28], cancer [29, 30], bariatric surgery [31], kidney stone surgery [32], and suicide [33]. During the COVID-19 pandemic, similar studies using Google Trends search data were conducted to predict COVID-19 infectious cases [34, 35], explore public attitudes toward vaccination [36–38], identify symptoms caused by pandemics [39–41], and assess affected medical services [42–44]. The above studies indicate that Google Trends could assist in gaining a better understanding and analysis of health information-seeking behavior. Information from Google Trends could be used to supplement current infection reports, which are published with a lag.
This study aimed to explain HFMD infection using Google Trends data in Japan before and during the COVID-19 pandemic.
Methods
Data
We obtained actual HFMD cases from the weekly reports issued by NIID, which include new infectious cases and sentinel cases by prefecture and have been updated from 1999 to the present [16]. Additionally, we set the geographic location to Japan and the category to health to limit irrelevant results and downloaded the relative search volume (RSV) of the “HFMD” search topic using Google Trends from January 1, 2009, to December 31, 2021. The normalized RSV data represented the search interest relative to the highest point for a given region and time. The normalized RSV ranged from 0 to 100, where 0 meant there were insufficient data for a term and 100 was the peak popularity. We selected a search topic instead of the search term “HFMD” for comprehensive search information and limited the period from 2009 to 2021. In this study, we used the search topic “HFMD” and the search term “HFMD.” The weekly RSV of the search topic “HFMD” from 2009 to 2021 was gathered for further analysis.
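The 0 to 100 scale described above can be illustrated with a minimal sketch. It assumes a hypothetical list of raw search-interest values for one region and period; the function name `to_relative_search_volume` is ours, and Google's actual normalization pipeline is not reproduced here. The sketch only shows what "relative to the highest point" means.

```python
def to_relative_search_volume(raw_interest):
    """Rescale a series so that its highest point equals 100.

    Assumption: raw_interest holds non-negative search-interest values
    for one region and one period (hypothetical input, not Google's data).
    """
    peak = max(raw_interest)
    if peak <= 0:
        # No measurable interest anywhere in the period.
        return [0 for _ in raw_interest]
    return [round(100 * value / peak) for value in raw_interest]


weekly_interest = [5, 10, 20]  # illustrative values only
print(to_relative_search_volume(weekly_interest))
```

Because the scale is relative to the peak within the selected region and period, RSV values downloaded for different periods are not directly comparable in absolute terms.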
To identify significant factors of HFMD infection, we created multiple linear regression models using HFMD-related search terms selected from Japanese tweets. We retrieved Japanese tweets through the publicly available Twitter Stream application programming interface (API) by querying the keyword “HFMD” to select HFMD-related top words. Google applied improvements to the data collection system on January 1, 2016, and January 1, 2022. For consistency, 275,010 tweets posted between 2016 and 2021 were downloaded in this study. Tokenization, a fundamental step in many natural language processing (NLP) methods, especially for languages like Japanese that are written without spaces between words, was used to select top words in the Japanese tweets. We tokenized all tweets and analyzed the unigram tokens. The website links, special characters, numbers, and “amp” (ampersands) were removed from the tweets before tokenization. The Python packages SpaCy and GiNZA were used to remove the Japanese stop words and implement tokenization. The tokenized words were joined with white space characters into text in the original order. The Python package scikit-learn was used to convert the white space–joined texts into unigram and bigram tokens and calculate the token counts. We provide the token counts in Appendix 1. Search terms with counts larger than 1000 that belonged to the categories of infection sources, susceptible sites, susceptible populations, symptoms, treatment, preventive measures, and identified diseases were selected for further analysis. Selected terms with corresponding interpretations and categories are provided in Appendix 2 and Appendix 3. We downloaded the weekly RSV of selected search terms through Google Trends in two periods, before (2016-2019) and during the pandemic (2020-2021), because the first case of COVID-19 in Japan was confirmed on 16 January 2020.
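The cleaning and counting steps described above can be sketched as follows. This is a dependency-free illustration, not the study's code: it assumes tokenization and stop-word removal have already been done upstream (by SpaCy and GiNZA in the study), it uses the standard library in place of scikit-learn for counting, and the function names and the English placeholder tokens are hypothetical.

```python
import re
from collections import Counter


def clean_tweet(text):
    """Remove links, numbers, special characters, and 'amp' residue."""
    text = re.sub(r"https?://\S+", " ", text)  # website links
    text = re.sub(r"\d+", " ", text)            # numbers
    text = re.sub(r"[^\w\s]", " ", text)        # special characters
    # "&amp;" leaves a bare "amp" token once "&" and ";" are stripped.
    tokens = [t for t in text.split() if t != "amp"]
    return " ".join(tokens)


def ngram_counts(tokenized_texts, n_values=(1, 2)):
    """Count unigrams and bigrams in white space-joined token strings."""
    counts = Counter()
    for text in tokenized_texts:
        tokens = text.split()
        for n in n_values:
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return counts


def frequent_terms(counts, min_count=1000):
    """Keep terms whose count is larger than min_count."""
    return {term: c for term, c in counts.items() if c > min_count}


# Placeholder English tokens stand in for tokenized Japanese tweets.
texts = ["hfmd fever child", "hfmd fever"]
counts = ngram_counts(texts)
print(frequent_terms(counts, min_count=1))
```

The category filter (infection sources, symptoms, and so on) is a manual judgment in the study and is therefore not represented in the sketch.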
Statistical Analysis
First, we calculated the Pearson correlation coefficient between the actual HFMD cases and the RSV of the search topic “HFMD” for each year from 2009 to 2021, rather than over the whole period, because of the periodic nature of HFMD. Since our response variable (HFMD cases) and explanatory variables (search term RSV) are measured on a continuous scale, a parametric test is typically preferred over non-parametric analysis [45].
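A minimal sketch of the per-year correlation is given below. It assumes three aligned weekly lists (year label, case count, RSV); the function names and the toy values are illustrative and are not the study's data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def yearly_correlations(years, cases, rsv):
    """Correlate cases with RSV separately within each calendar year."""
    by_year = {}
    for year, c, r in zip(years, cases, rsv):
        case_list, rsv_list = by_year.setdefault(year, ([], []))
        case_list.append(c)
        rsv_list.append(r)
    return {year: pearson_r(cs, rs) for year, (cs, rs) in sorted(by_year.items())}


# Toy weekly data: searches track cases in 2019 but not in 2020.
years = [2019, 2019, 2019, 2020, 2020, 2020]
cases = [1, 2, 3, 1, 2, 3]
rsv = [10, 20, 30, 30, 20, 10]
print(yearly_correlations(years, cases, rsv))
```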
Second, we conducted cross-correlation analysis between actual HFMD cases and the RSV of selected search terms. Cross-correlation is a measure of the similarity between two series as a function of the displacement of one relative to the other and was used to objectively estimate the time lag between cases of HFMD infection and related search terms [46]. We set the maximum lag to ±20 weeks due to the periodic characteristics of HFMD infection. We obtained 40 cross-correlation coefficients for each HFMD-related search term before and during the COVID-19 pandemic. Next, we selected the coefficients with the greatest absolute values and reported their signed values. Finally, we compared the coefficients with the greatest absolute value in the periods before and during the COVID-19 pandemic. For these coefficients, we assumed that a negative, zero, or positive lag meant that the search term occurred earlier than, coincided with, or occurred later than the actual HFMD cases, respectively. Differences in lags between the periods during and before the COVID-19 pandemic were calculated to assess public awareness of HFMD.
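The lag search can be sketched as below. This is an illustration under stated assumptions rather than the estimator used in the study: each lag is scored with a plain Pearson correlation on the overlapping part of the two series, and a negative lag is taken to mean that searches precede cases. The function names and the toy series are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cross_correlation(cases, rsv, max_lag=20):
    """Return {lag: r}, pairing cases[t] with rsv[t + lag].

    A negative lag compares cases with earlier search volume, so a peak
    at a negative lag suggests that searches lead the reported cases.
    """
    n = len(cases)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            c, s = cases[:n - lag], rsv[lag:]
        else:
            c, s = cases[-lag:], rsv[:n + lag]
        result[lag] = pearson_r(c, s)
    return result


def best_lag(cases, rsv, max_lag=20):
    """Lag whose coefficient has the greatest absolute value, with its sign kept."""
    ccf = cross_correlation(cases, rsv, max_lag)
    lag = max(ccf, key=lambda k: abs(ccf[k]))
    return lag, ccf[lag]


# Toy series in which search volume runs two weeks ahead of cases.
cases = [0, 1, 4, 9, 3, 2, 8, 5, 1, 0, 7, 6, 2, 3, 9, 4]
rsv = cases[2:] + [0, 0]
print(best_lag(cases, rsv, max_lag=3))
```

Applying `best_lag` to each search term in each period gives one lag per term and period, and the during-minus-before difference of those lags is the quantity compared in the text.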
Third, we conducted multiple linear regression to identify the most important Google search terms for explaining HFMD infection before and during the COVID-19 pandemic. We included HFMD-related search terms as the explanatory variables, with actual HFMD cases as the response variable. Collinearity is the correlation between explanatory variables that expresses a linear relationship in a regression model. When the explanatory variables are correlated in the same regression model, they cannot explain the response variable independently. We normalized the RSV of each selected term to avoid collinearity in regression models. Several common methods are available for explanatory variable selection to identify the most significant search terms and limit the number of explanatory variables, including forward selection, backward elimination, and stepwise regression [47]. Backward elimination was used in this study to find the best subset of search terms due to its easy implementation and automated availability. We used a p-value threshold of 0.05 to remove unnecessary search terms. In each round, we removed the search term with the highest p-value and refitted the multiple linear regression model until all p-values were under the threshold.
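The elimination loop can be sketched as follows. To keep the example dependency-free, ordinary least squares is solved through the normal equations, and the p-values use a normal approximation to the t distribution. Both choices are our assumptions and would normally be replaced by a statistics package. All names and the toy data are hypothetical.

```python
import math


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: explanatory variables are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols_p_values(columns, y):
    """Fit y on an intercept plus the given columns; return one p-value per column."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    rss = sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2 for row, yi in zip(X, y))
    sigma2 = rss / (n - p)
    p_values = []
    for a in range(1, p):  # index 0 is the intercept, which is always kept
        se = math.sqrt(sigma2 * inv[a][a])
        if se == 0:
            p_values.append(0.0)
            continue
        z = abs(beta[a]) / se
        # Assumption: two-sided p-value from the normal approximation.
        p_values.append(math.erfc(z / math.sqrt(2)))
    return p_values


def backward_elimination(predictors, y, threshold=0.05):
    """Drop the term with the highest p-value until all are under the threshold."""
    kept = list(predictors)
    while kept:
        p_values = ols_p_values([predictors[name] for name in kept], y)
        worst = max(range(len(kept)), key=lambda i: p_values[i])
        if p_values[worst] < threshold:
            break
        kept.pop(worst)
    return kept


# Toy data: y depends on term_a only; term_b is unrelated by construction.
predictors = {
    "term_a": [1, 2, 3, 4, 5, 6, 7, 8],
    "term_b": [1, -1, 1, -1, -1, 1, -1, 1],
}
y = [2.1, 3.9, 5.9, 8.1, 10.1, 11.9, 13.9, 16.1]
print(backward_elimination(predictors, y))
```

The model is refitted after every removal because dropping one term changes the p-values of the remaining terms.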
To assess the performance of a linear regression model, the coefficient of determination (R2) or the adjusted R2, which indicates how much variation in the response is explained by the model, is often used [48]. We selected the adjusted R2 value for model evaluation to avoid overfitting the model.
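For reference, the standard relationship between R2 and adjusted R2 can be written as a short sketch. The formula is the usual textbook definition, and the function names and example numbers are illustrative rather than taken from the study.

```python
def r_squared(y, y_hat):
    """Share of the variance in y that is explained by the fitted values."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# With 101 observations, adding predictors lowers the adjusted value
# unless R2 rises enough to compensate.
print(adjusted_r_squared(0.9, n_obs=101, n_predictors=10))
print(adjusted_r_squared(0.9, n_obs=101, n_predictors=50))
```

The penalty grows with the number of explanatory variables, which is why the adjusted value is better suited to comparing models with different numbers of search terms.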
Results
Basic description of HFMD cases and “HFMD” RSV
Figure 1 presents the actual HFMD cases and RSV from 2009 to 2021. Visual inspection of the figure indicated that both the actual HFMD cases and RSV of Google Trends peaked around July in most years except for 2020 and 2021. The number of HFMD infections surged after 2011, peaking every two years before 2020. The RSV coincided with this trend. In 2020, no periodic peak of infection was observed, whereas, in 2021, the peak was delayed to November.
As shown in Figure 2, we calculated the correlation between the actual HFMD cases and RSV each year from 2009 to 2021. The mean (standard error) of these correlations was 0.820 (0.052). Most coefficient values were greater than 0.7, except for 0.338 in 2020. Pearson correlation coefficients are provided in Appendix 4.
Cross-correlation between HFMD cases and search term RSV before and during the pandemic
We performed cross-correlation analysis to determine the temporal relationship between HFMD cases and search term RSV. Cross-correlation results before and during the pandemic are presented in Table 1.
Compared with period 1, the temporal correlation between HFMD cases and search term RSV changed in period 2. In period 1, 61 (60.4%), 6 (5.9%), and 34 (33.7%) of the search term RSV series occurred earlier than, coincided with, and occurred later than HFMD cases, respectively. In period 2, the corresponding numbers were 73 (72.3%), 1 (1%), and 27 (26.7%). Differences in lags for 73 (72.3%) search terms were negative, which might indicate increasing public awareness of HFMD infections during the COVID-19 pandemic. In contrast, lags for 5 search terms did not change, and 23 search terms exhibited delays.
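The tallies reported above follow from simple bookkeeping on the per-term lags, sketched below. The sketch assumes two dictionaries that map each search term to its best lag (in weeks) in each period; the term names and lag values are made up for illustration.

```python
from collections import Counter


def classify_lag(lag):
    """Label a lag as earlier than, coincident with, or later than cases."""
    if lag < 0:
        return "earlier"
    if lag > 0:
        return "later"
    return "coincident"


def summarize_lags(lags_before, lags_during):
    """Tally timing per period and the sign of the during-minus-before shift."""
    terms = sorted(set(lags_before) & set(lags_during))
    timing_before = Counter(classify_lag(lags_before[t]) for t in terms)
    timing_during = Counter(classify_lag(lags_during[t]) for t in terms)
    shifts = {t: lags_during[t] - lags_before[t] for t in terms}
    shift_signs = Counter(
        "negative" if d < 0 else "positive" if d > 0 else "zero"
        for d in shifts.values()
    )
    return timing_before, timing_during, shift_signs


# Hypothetical lags for three terms.
before = {"fever": 1, "rash": 0, "virus": -2}
during = {"fever": -1, "rash": 0, "virus": -1}
print(summarize_lags(before, during))
```

A negative shift means that searches for the term moved earlier relative to reported cases during the pandemic, which is the pattern the text interprets as increased awareness.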
Essential search terms for explaining HFMD cases before and during the pandemic
We identified the most significant search terms for HFMD infection using multiple linear regression with backward elimination. As shown in Table 2, 18 search terms were significant in period 1 and explained 96.7% of the variation in HFMD infection (adjusted R2). In period 2, as shown in Table 3, 57 search terms were significant (adjusted R2 = 98.4%), so the model included more critical variables than in period 1. Model specification formulas are provided in Appendix 5.
Compared with period 1, significant search terms in period 2 contained the same meanings and expanded informative search content. “Herpangina,” “nasal mucus,” “exhaustion,” “diarrhea,” “summer cold,” “young child,” “pediatric,” and “swelling” occurred in the original form. “Pain,” “reduce fever,” “oral cavity,” “chickenpox,” “itch,” and “sole of the foot” occurred in variant or synonymous forms. “Virus” was replaced by more specific infection sources in period 2, such as “adenovirus,” “entero-,” “coxsackie,” “mycoplasma,” “legionella,” and “hemolytic streptococcus.” Correspondingly, “adult” was superseded by other terms for susceptible populations, such as “child” and “infant” (synonyms omitted). Search terms related to susceptible sites, preventive measures, and treatment were also identified in period 2, such as “daycare center,” “kindergarten,” “handwashing,” “disinfection,” “hospital,” and “specialty drugs.” Multiple linear regression results corroborated the cross-correlation results and indicated that public awareness of HFMD might have increased during COVID-19.
Discussion
This study presented trends and correlations in HFMD cases and RSV of “HFMD” from 2009 to 2021. Cross-correlation analyses were conducted between HFMD cases and search term RSV before and during the pandemic. Additionally, multiple linear regressions were used to identify the significant search terms for explaining HFMD cases in two periods. Our results indicated that HFMD cases and RSV peaked around July in most years, except in 2020 and 2021, and surged after 2011 with peaks every two years before 2020. The search topic “HFMD” exhibited strong correlations with HFMD cases except in 2020, when the COVID-19 outbreak began. Furthermore, cross-correlation and multiple regression results revealed that the public might have improved awareness of HFMD infection during the pandemic. To our knowledge, this study is the first to explain HFMD cases using Google search data and examine changes in information-seeking behavior towards HFMD affected by the COVID-19 pandemic. Google search data could supplement public health surveillance and help authorities respond to infectious diseases rapidly.
From 2009 to 2021, the RSV of “HFMD” coincided with the HFMD cases, showing similar trends and peaks, except in 2020. In Japan, HFMD peaks generally occur around July [5]. During the COVID-19 pandemic, unlike in previous years, the HFMD peak disappeared in 2020 and was delayed to November in 2021. In 2020, Google Trends search data did not match the HFMD cases: the RSV of the search topic “HFMD” showed a relatively small peak in July, although the volume was much lower than in previous years. Meanwhile, the Japanese government implemented several measures to control COVID-19 that might have influenced the spread of HFMD. Respiratory droplets and contact are the main routes of infection in both HFMD and COVID-19 [49, 50]. Therefore, the population susceptible to HFMD was also protected by the standard precautions taken during COVID-19, such as physical distancing, wearing a mask, regularly washing hands, and coughing into a bent elbow or tissue [51].
The evidence from our results suggests that the global pandemic might have enhanced public awareness of HFMD in addition to COVID-19. During COVID-19, 73 (72.3%) search terms cross-correlated earlier with HFMD cases, and the significant search terms detected in period 2 were more informative. Previous studies demonstrated that the prevalence of respiratory infectious diseases, such as influenza, varicella, herpes zoster, rubella, and measles, decreased during the COVID-19 pandemic [52–58]. This might have been due to adherence to non-pharmaceutical interventions and lower non-polio enterovirus activity during the COVID-19 pandemic compared with 2014-2019 [59]. Switzerland had an unprecedented complete absence of pediatric enteroviral meningitis in 2020 [60]. In Japan, community-acquired pneumonia [61] and influenza [62] admissions have been reduced during the COVID-19 pandemic. COVID-19 preventative actions and better personal hygiene are beneficial for preventing the spread of diseases. However, the prevalence of common diseases may rise as the public gradually complies less with infection control measures in the upcoming season [61]. Consistent with our results, a peak in HFMD infection and public interest re-occurred in November 2021. Continuous monitoring of HFMD is required, and public information-seeking behavior may be helpful in public health surveillance.
Our study applied Google search data to explain HFMD cases affected by COVID-19, whereas previous studies used Baidu search data for real-time estimation of HFMD cases in China [15] or other categories of data for HFMD prediction or explanation [11–14]. Unlike previous studies, we focused on the difference in information-seeking behavior regarding HFMD before and during COVID-19 and attempted to explain the HFMD cases using data from Google, which has the largest market share in Japan [17]. Many researchers have shown that Google search data represent the public interest in a specific topic. However, Google search data should be used cautiously as a surveillance system because large events can easily interfere with it. Combining fine-grained data such as mobility data could help develop surveillance systems that can effectively exclude biased or irrelevant information to respond rapidly [63].
Conclusion
This study described trends and correlations in HFMD cases with RSV of “HFMD” and identified significant search terms to explain HFMD infections before and during the COVID-19 pandemic. We found that the prevalence of HFMD was abnormal during COVID-19 and that public awareness of HFMD infection might have increased under the influence of the pandemic. It is critical to continuously monitor resurgent common infections as the public gradually reduces compliance with infection control measures. Public information-seeking behavior using Google search data may be useful for public health surveillance.
Limitations
This study had several limitations. First, our findings are limited to those who used Google to search for health-related information, which may not represent the entire community. The results may be biased toward younger people, who are more digitally connected than older individuals, although the Google search engine market share in Japan is nearly 80% [17]. Second, Google Trends improved geographical assignment and data collection systems in 2011, 2016, and 2022. Our results in the basic description of HFMD cases and “HFMD” RSV might be affected by these changes. Third, search data analysis is hypersensitive to large events, so it should complement rather than replace traditional research methods. Fourth, the specific HFMD-related terms we selected from Twitter might not represent all search terms in public use, especially given the hiragana, katakana, kanji, and alphabetic scripts used in Japan. Fifth, we used search data from 2016 to 2019 to represent the period before the COVID-19 pandemic due to restrictions of Google Trends, which may not represent the entire period. Finally, during backward elimination for regression model construction, significant terms might have been eliminated because they were jointly insignificant. Although potentially significant terms could be identified among the eliminated terms, it is impractical to conduct F-tests multiple times because multiple hypothesis testing lowers the confidence level. Hence, we retained the current results.
Data Availability
All data produced in the present study are available upon reasonable request to the authors.
Declarations
Ethics Approval
All methods were carried out in accordance with relevant guidelines and regulations (Declaration of Helsinki). Ethical approval and consent to participate were not necessary as the study was based on openly available aggregated data.
Consent for Publication
Not applicable
Availability of Data and Materials
We used publicly available data published by Google Trends and the National Institute of Infectious Disease, Japan. All data generated or analyzed during this study are included in this published article (Appendix 6).
Competing Interests
The authors declare that they have no competing interests.
Funding
This work was supported by the Japan Science and Technology for pioneering research initiated by the next generation (SPRING; grant number JPMJSP2110).
Authors’ Contributions
QN, JL, MNT, and TA contributed to the study conception and design. QN, ZZ, AB, KH, MO, and AK collected data. QN, JL, and ZZ participated in data analysis and interpretation and drafted the manuscript. All authors contributed to the manuscript revision and approved the final version of the manuscript.
Acknowledgments
I appreciate the support of the 2022-2023 Google PhD Fellowship.
Footnotes
After major revision
Abbreviations
- HFMD: hand, foot, and mouth disease
- EV-A71: enterovirus A71
- CV-A6: coxsackievirus A6
- CV-A16: coxsackievirus A16
- NIID: National Institute of Infectious Disease
- LDE: logistic differential equation
- L-DNM: landscape dynamic network marker
- COPD: chronic obstructive pulmonary disease
- RSV: relative search volume