ABSTRACT
Lack of knowledge is the main problem we face in the global COVID-19 pandemic. SARS-CoV-2 is a new virus on which no previous studies existed.
Using data from 50 very different countries and a regression analysis, we studied the degree to which a series of variables (health indicators, environmental parameters, economic and social indicators, and general characteristics of the country) were able to predict the number of people infected and killed by COVID-19. We also studied how the ability of these variables to predict the number of infected and dead changed over a three-month period (March, April and May).
The number of deaths from COVID-19 can always be predicted with great accuracy from the number of infected, regardless of the characteristics of the country (better or worse health care, greater or lesser wealth, population structure…). Epidemiological measures to prevent transmission, mainly travel and mobility restrictions, proved to be much more effective than having large hospital and medical resources.
Inbound tourism turned out to be the variable that best predicts the number of infected (and, consequently, the number of deaths) in the different countries. Electricity consumption and the degree of air pollution of a country (CO2, nitrous oxide and methane emissions) are also capable of predicting, with great precision, the number of infections and deaths from COVID-19 in that country. Characteristics such as the area and population of a country can also predict, although to a lesser extent, the number of infected and dead.
In contrast, a series of variables that in principle would seem to have a greater influence on the evolution of COVID-19 (hospital bed density, physicians per 1000 people, researchers in R&D, urban population…) turned out to have very little ability to predict either the number of infected or the number of deaths from COVID-19.
All this may explain why the countries that opted for social withdrawal policies since the start of the pandemic outbreak obtained better results.
INTRODUCTION
COVID-19 is a global pandemic caused by the SARS-CoV-2 virus [1,2]. The virus spread rapidly and the disease is occurring at the same time all around the world, among people living in countries with different economies, different climates and so on, and the effect of the disease also differs.
Infection and death counts vary widely not only from country to country but also over time. In some countries the virus is in the remission phase, while in others it is still uncontrolled.
The main feature of the COVID-19 pandemic is the lack of knowledge. Little is known yet, despite the efforts of scientists around the world, who have conducted and continue to conduct numerous studies. Many studies using regression analysis have been performed since the outbreak of the pandemic [3–7]. Most of them focus on the clinical aspects of the pandemic and its implications for health care settings [5,8–13], such as cardiac injury, kidney disease, respiratory conditions, diabetes and comorbidities [8,9,14–19], gender and underlying comorbidities [20], severe COVID-19 and its risk factors [21–24], the number of CD8+ T cells as a risk factor for the duration of SARS-CoV-2 viral positivity [25,26], clinical symptoms such as smell loss [27] and smoking habits [28]. Fewer studies address other aspects, for example temperature as a parameter affecting transmission rates [3,29–31]. Some have used GIS modelling to understand how SARS-CoV-2 is behaving in this pandemic [32]. Others try to unveil the relevance of temperature, humidity, climate, air quality, socioeconomic factors, demographics… [3–5,30,31,33–38].
A classic scientific strategy for approaching the unknown is regression analysis. Many have used regression analysis or other statistical tools to understand this pandemic.
The relationship of COVID-19 with socioeconomic variables has also been investigated. For example, Mollalo et al. (2020) performed a study in the U.S.A., compiling 35 environmental, socioeconomic, topographic and demographic variables and creating a geodatabase to explain the spatial variability of COVID-19 incidence [37]. GIS-based spatial modelling studies have also been done [32]. Guha et al. (2020) studied community and socioeconomic characteristics of people in the United States and their possible associations with COVID-19 cases and deaths [6], Whittle et al. (2020) studied socioeconomic predictors across neighbourhoods in New York [7], and Pirouz et al. (2020) combined climate and urban parameters for three regions in Italy [4].
In a world of Big Data and Artificial Intelligence, complex models have been developed to explain and understand the COVID-19 pandemic, from regressions that analyse millions of data points to models that combine regressions with geolocation. Nonetheless, classic statistical tools can still provide valuable information.
In this study we used regression analysis with the number of COVID-19 cases and the number of deaths as dependent variables, and a series of characteristics of the studied countries as independent variables. The independent variables included general indicators of the different countries (economic indicators, health and research indicators, environmental indicators…), so that we could analyse the predictive power of those variables and whether it changes with time.
Interestingly, some variables that were not expected to be relevant predict the evolution of infected and death counts very accurately, while others that a priori were considered accurate predictors of COVID-19 infected and death counts are weak predictors or not predictors at all.
MATERIALS AND METHODS
We carried out a regression analysis to identify which characteristics make a country more vulnerable to COVID-19.
The chosen dependent variables correspond to two of the most significant and worrisome characteristics of any outbreak: the total number of infected people and the total death toll. Data were collected from the World Health Organization (WHO) Coronavirus Disease (COVID-19) Dashboard [39] (retrieved from https://covid19.who.int on the 21st of March, 21st of April and 21st of May). This study focuses on whether or not the chosen independent variables can predict infected and death counts.
Most of the independent variables were selected on logical criteria, searching for relations that might explain the differences in the dependent variables between countries. For example, we decided to examine environmental indicators such as CO2, NO2 and methane emissions, which may influence the incidence of the disease. Health indicators that may influence mortality figures or incidence counts were also chosen for the study (e.g. Hospital Bed Density, in beds per 1000 people, or Physicians per 1000 people). We also took into consideration general demographic indicators to study any pattern that could explain such differences (e.g. population, surface area, urban population…). Economic indicators were included because outcomes may differ with a country's wealth. Finally, we included indicators of a country's research level: Research and development expenditure (% of GDP), included among the economic indicators, and Researchers in R&D, within the health indicators.
All data were obtained from publicly accessible internet repositories, such as the World Health Organization, the Central Intelligence Agency or the World Bank. The complete list of variables, together with the link to each data set, is presented in Table 1.
The territories for this study were the 50 countries with the highest inbound tourism in 2018 (the latest data available), according to the World Tourism Organization webpage [40] (https://www.unwto.org/country-profile-inboundtourism).
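Regression analysis makes it possible to establish a relation between two variables, a dependent variable $y_i$ and an independent variable $x_i$. The linear regression follows the model:
$$y_i = x_i \beta + \alpha + \varepsilon_i$$
where $\beta$ is the slope, $\alpha$ is the intercept and $\varepsilon_i$ is the error term. The analysis was performed using the statistical package StatPlus:mac (AnalystSoft Inc.).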
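As an illustration of the model above, the following is a minimal sketch of a simple linear regression with a significance test for the slope. It is not the StatPlus:mac procedure used in this study. The function names (`t_pdf`, `two_sided_p`, `simple_linear_regression`) and the sample data are hypothetical, and computing the p-value by Simpson's rule over the Student's t density (n − 2 degrees of freedom) is an implementation choice made only to keep the example dependency-free.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=20000):
    """Two-sided p-value for a t statistic (Simpson's rule; steps must be even)."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1.0 - 2.0 * central_area)


def simple_linear_regression(x, y):
    """Least-squares fit of y_i = x_i * beta + alpha + eps_i.

    Returns (beta, alpha, p_value), where p_value tests beta = 0.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    sse = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    se_beta = math.sqrt(sse / df / sxx)
    if se_beta == 0:  # perfect fit: the slope is exactly determined
        return beta, alpha, 0.0
    return beta, alpha, two_sided_p(beta / se_beta, df)


# Made-up numbers for illustration only (not data from this study).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
beta, alpha, p = simple_linear_regression(x, y)
print(f"beta={beta:.3f} alpha={alpha:.3f} p={p:.2g}")
```

In practice, each independent variable would be passed as `x` and one of the two dependent variables (infected or deaths at a given date) as `y`, giving one p-value per variable and date.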
RESULTS & DISCUSSION
The results were revealing. According to their predictive value, we grouped the independent variables into three groups:
Variables that are clearly predictive at some point of the COVID-19 pandemic (statistically significant linear regression, p < 0.05)
Variables that are not predictive at any time of the COVID-19 pandemic (non-significant linear regression, p > 0.10)
Variables that have some predictive capacity at some time in the COVID-19 pandemic period studied (marginally significant linear regression, 0.05 < p < 0.10)
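The grouping rule above can be sketched as a small function that takes the p-values of a variable at the three dates studied. This is an illustrative sketch rather than the procedure actually applied: the function name `classify_variable` and the handling of p-values exactly equal to a threshold are assumptions, and two of the example lists are hypothetical values.

```python
def classify_variable(p_values, strong=0.05, weak=0.10):
    """Assign a variable to one of the three groups from its p-values.

    p_values: p-values of the linear regression at each date studied
    (March, April, May). How a p-value exactly equal to a threshold
    is treated is an assumption; the text does not specify it.
    """
    if any(p < strong for p in p_values):
        return "clearly predictive at some point"
    if any(p < weak for p in p_values):
        return "some predictive capacity at some time"
    return "not predictive at any time"


# Area vs. number of infected: p-values reported in the Results.
area_infected = [0.0043, 0.009, 0.0002]
# Hypothetical p-values, for illustration only.
marginal_example = [0.30, 0.06, 0.20]
never_example = [0.45, 0.62, 0.38]

for name, ps in [("area vs. infected", area_infected),
                 ("marginal (hypothetical)", marginal_example),
                 ("never (hypothetical)", never_example)]:
    print(name, "->", classify_variable(ps))
```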
1. Variables that are clearly predictive at some point of the COVID-19 pandemic (statistically significant linear regression, p < 0.05) (Table 2)
Among all the studied parameters, the most significant regression (p < 0.00001) throughout the period studied was that between the number of deaths from SARS-CoV-2 (dependent variable) and the number of infected people (independent predictor variable). This may appear to be an obvious conclusion, but it gains relevance when one considers that other parameters studied (e.g. health and research indicators, country wealth…) had no significance. Whatever the country (rich or poor, with a good or not so good health system, etc.), the number of infected people determines the number of deaths. This suggests that what really matters in fighting the COVID-19 pandemic is preventing infections among citizens, rather than better health care systems, wealth or development. This may be why some poorer countries with less developed health care systems did so well in fighting this outbreak (e.g. Viet Nam).
Inbound tourism was the variable with the second greatest significance. Its regression with the number of infected people remained significant throughout the three-month period studied (March: p < 0.00001; April: p < 0.00001; May: p < 0.00006). Our results agree with those obtained by Aldibasi et al. (2020) [41]. Daon et al. (2020) suggest that an additional prominent reason for the especially wide and rapid spread of COVID-19 is likely to be the current high prevalence of international travel [42]. According to Chinazzi et al. (2020), most of the imported cases outside China are likely to have originated from air travel [43], and Linka et al. suggest that the spread of COVID-19 in Europe closely followed air travel patterns and that the severe travel restrictions implemented there resulted in substantial decreases in the spread of the disease [44]. Kubota et al. (2020) found that the relative frequency of foreign visitors per population was positively correlated [45], and Coelho et al. (2020) emphasized the role of the air transportation network in this epidemic [46]. Gross et al., studying the spatial dynamics of COVID-19 in China, found a strong correlation between the number of infected individuals in each province studied and the population migration from Hubei to those provinces [47], with a slight decay over time, which agrees with our results.
Inbound tourism also has a significant association with the other dependent variable, the number of deaths, and it stays significant in March (p < 0.0003), April (p < 0.00001) and May (p < 0.00001). This means that the number of people arriving in a country is a significant predictor of COVID-19 infected and death figures, which agrees with the results obtained by Anzai et al. [48]. Merler et al. studied the effect of travel restrictions on the spread of the virus and found that the travel quarantine of Wuhan delayed the overall epidemic progression by only 3 to 5 days in mainland China but had a more marked effect on the international scale, where case importations were reduced by nearly 80% until mid-February [49].
Another result that we considered relevant was the strong regression found between the numbers of infected and dead by COVID-19 and the independent variables related to atmospheric pollution.
Electricity consumption was a highly significant predictor of both dependent variables, number of infected people and number of deaths, during the period studied. Other environmental indicators (CO2 emissions, NO2 emissions and methane emissions) also showed high predictive power, with statistically significant regressions (p < 0.001). Countries with more emissions and more energy usage had higher counts of infected or dead people.
Although Travaglio et al. (2020) worked only with data from England, their findings are similar to ours, since they conclude that the levels of some air pollutants are linked to COVID-19 cases and morbidity [50].
Similarly, Liu et al. (2020) studied the association between atmospheric NO2 levels and the spread of COVID-19 in Chinese cities, suggesting that ambient NO2 may contribute to the spread of COVID-19 [5]. Yao et al. (2020), although studying other pollution indicators (PM2.5 and PM10), found that COVID-19 death rates were higher with increasing concentrations of PM2.5 and PM10 [51]. We cannot forget that pollution variables are related to transport (e.g. airplanes, trains and cars).
The Population and Area of the countries studied had different predictive power over the studied period. On the 21st of March, both Population and Area were significantly related to the number of infected people (p = 0.00007 and p = 0.0043, respectively), but in the following months Population had no statistical significance, while Area retained predictive power for the number of infected people (p = 0.009 in April and p = 0.0002 in May). When we considered the relation of these two variables with the number of deaths, we found similar results: on the 21st of March both had statistical significance (Population: p = 0.014; Area: p = 0.0053), but in the following months Population was no longer significant while Area still was (p = 0.04 in April, p = 0.007 in May). Aldibasi (2020) found a statistically significant positive association between Area (km²) and mortality rate and a negative association between Population size and incidence rate [41].
2. Variables that are not predictive at any time of the COVID-19 pandemic (non-significant linear regression, p > 0.10) (Table 3)
None of the variables included in this group had statistical significance during the period studied, from the 21st of March to the 21st of May. The regressions with Hospital Bed Density (beds per 1000 people) and Number of Physicians (per 1000 people) showed no significant relationship with the number of infected or the number of deaths at any moment in our study. When confronted with this new pandemic, health care systems showed little efficiency in limiting infections and death counts, probably owing to the lack of knowledge. Early prevention measures proved to be more effective.
Researchers in R&D (per million people) and Infant Mortality Rate (an indicator of a country's health and care system) showed similar behaviour. Along the same lines, Yao et al. (2020) did not find a significant association between hospital beds per capita and COVID-19 death rate [51]. In contrast, Sorci et al. (2020) found that the number of hospital beds per 1,000 inhabitants was negatively associated with Case Fatality Rate [52], and Qiu et al. (2020), in their study in China, found that COVID-19 transmission is negatively moderated by the number of doctors at the city level [53].
One would expect that good health and research parameters in a country should have a favourable effect on the number of infected or the number of deaths, but our results indicate no significance at all. It is necessary to highlight that the best predictor we found for the death count from COVID-19 is the number of infected by COVID-19 (p < 0.00001); there is a cause-effect relation. What this really shows is that prevention, and not treatment, is the best way to face the COVID-19 pandemic.
Unemployment rate, Inflation rate and Military expenditure, categorised as economic indicators, also fell in this group of non-predictive variables. We did not find statistically significant regressions between the economic indicators we selected and the dependent variables (number of infected and number of deaths). The wealth of a country does not predict its number of infected or dead by COVID-19, with the exception of indicators related to travel (i.e. Inbound tourism). Yao et al. also studied, among other variables, Gross Domestic Product (GDP) per capita as a measure of a country's wealth, and in their work GDP per capita was not significantly associated with COVID-19 death rate [51]. According to Qiu et al. (2020), COVID-19 transmission is positively moderated by GDP per capita [53].
In our study, Urban population (%) and Population Growth rate were also not statistically significant in relation to the number of infected people and the number of deaths by SARS-CoV-2.
3. Variables that have some predictive capacity at some time in the COVID-19 pandemic period studied (marginally significant linear regression, 0.05 < p < 0.10) (Table 4)
The variables in this group evolved with time: in general they had no predictive power, but at some moment in our study they acquired some degree of statistical significance. Government expenditure on education (as % of GDP) had some relation with the number of infected and the number of deaths by SARS-CoV-2 on the 21st of March, but not in the following two months.
In our study, Population ages 65 and above had no relation with the number of infected on the 21st of March, gained marginal significance on the 21st of April (p = 0.06) and was again not statistically significant on the 21st of May. When we studied Population ages 65 and above against the number of deaths, it showed no relation in March or April, but it did in May (p = 0.09).
Messner also studied the effect of education and countries' median age on the COVID-19 outbreak and found that the quality of a country's education system is positively associated with the outbreak and that countries with an older population are more affected [54].
Shagam (2020), using multivariate linear regression, found significant correlations of incidence and mortality rates with GDP per capita (p = $2.6 \times 10^{-15}$ and $7.0 \times 10^{-4}$, respectively), country-specific duration of the outbreak (p = $2.6 \times 10^{-4}$ and 0.0019), fraction of citizens over 65 years old (p = 0.0049 and $3.8 \times 10^{-4}$) and level of press freedom (p = 0.021 and 0.019).
In conclusion, regarding health and research indicators, it is necessary to highlight that the best predictor of the death count from COVID-19 was the number of infected by COVID-19 (p < 0.00001); there is a cause-effect relation. As stated previously, what this really shows is that prevention, and not treatment, is the best way to confront the COVID-19 outbreak. Those countries that conducted more tests to isolate infectious individuals promptly and imposed confinement policies on their citizens at early stages were, by far, the least affected. Piguillem et al., using Italy's early figures to calibrate their model, estimate that without lockdowns there could be as many as 800,000 fatalities. With a very early intervention that number reduces to about 120 230 fatalities, while if the intervention is started 60 days later, the number of fatalities is between 7,200 and 9,000 [55]. As of the 21st of May, Italy had 227,364 infected people and 32,330 dead according to the World Health Organization COVID-19 dashboard (https://covid19.who.int).
Early detection of cases through surveillance and aggressive contact tracing around known cases has helped to contain the spread of the outbreak in Singapore. Together with other healthcare, border and community measures, these allowed the COVID-19 outbreak to be managed without major disruption to daily living. Countries could consider these measures for a proportionate response to the risk of COVID-19 [56].
Data Availability
All data and links are available in the text.
Competing Interest Statement
The authors have declared no competing interest.
ACKNOWLEDGEMENTS
We thank Dr. C. García-Balboa, H. Díaz-Alejo and P. Martinez-Alesón for their help and advice on this paper.