## Abstract

The recent coronavirus pandemic follows, in its early stages, an almost exponential expansion, with the number of cases as a function of time reasonably well fit by *N*(*t*) ∝ *e*^{αt} in many countries. We analyze the rate *α* in different countries, choosing as a starting point in each country the first day with 30 cases and fitting the following 12 days, thus capturing the early exponential growth in a rather homogeneous way. We look for a link between the rate *α* and the average temperature *T* of each country in the month of the epidemic growth. We analyze a *base* set of 42 countries, which developed the epidemic at an earlier stage, an *intermediate* set of 88 countries and an *extended* set of 125 countries, which developed the epidemic more recently. Fitting with a linear behavior *α*(*T*), we find increasing evidence in the three datasets for a decreasing growth rate as a function of *T*, at 99.66% C.L., 99.86% C.L. and 99.99995% C.L. (*p*-value 5·10^{−7}, or 5*σ* detection) in the *base*, *intermediate* and *extended* datasets, respectively. The doubling time is expected to increase by 40% to 50%, going from 5°C to 25°C. In the *base* set, going beyond a linear model, a peak at about (7.7 ± 3.6)°C seems to be present in the data, but such evidence disappears for the larger datasets. Moreover, we have analyzed the possible existence of a bias: poor countries, typically located in warm regions, might have less intense testing. By excluding countries below a given GDP per capita from the dataset, we find that this affects our conclusions only slightly and only for the *extended* dataset. The significance always remains high, with a *p*-value of about 10^{−3} to 10^{−4} or less. Our findings give hope that, for northern hemisphere countries, the growth rate should significantly decrease as a result of both warmer weather and lockdown policies. In general, the propagation should hopefully be stopped by strong lockdown, testing and tracking policies before the arrival of the next cold season.

## I. INTRODUCTION

The recent coronavirus (COVID-19) pandemic is having a major effect in many countries, which needs to be faced with the highest degree of scrutiny. An important piece of information is whether the growth rate of the confirmed cases among the population could decrease with increasing temperature. Experimental research on related viruses found indeed a decrease at high temperature and humidity [1]. We try to address this question using available epidemiological data. A similar analysis for the data from January 20 to February 4, 2020, among 403 different Chinese cities, was performed in [2] and similar studies were recently performed in [3–7]. The paper is organized as follows. In section II we explain our methods, in section III we show the results of our analysis and in section IV we draw our conclusions.

## II. METHOD

We start our analysis from the empirical observation that the data for the coronavirus disease in many different countries follow a common pattern: once the number of confirmed cases reaches order 10 there is a very rapid subsequent growth, which is well fit by an exponential behavior. The latter is typically a good approximation for the following couple of weeks and, after this stage of *free* propagation, the exponential growth typically gradually slows down, probably due to other effects, such as: lockdown policies from governments, a higher degree of awareness in the population or the tracking and isolation of the positive cases.

Our aim is to see whether the temperature of the environment has an effect, and for this purpose we choose to analyze the first stage of *free* propagation in a selected sample of countries. We choose our sample using the following rules:

- we start analyzing data from the first day in which the number of cases in a given country reaches a reference number *N*_{i}, which we choose to be *N*_{i} = 30 [8];
- we include only countries with at least 12 days of data after this starting point.
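The two selection rules above can be sketched in a few lines of Python. This is only an illustration, not the code used for the paper: the function name `select_window` and the choice to keep the starting day plus the 12 days that follow (13 points in total) are assumptions, and the input series is made up.

```python
def select_window(cumulative_cases, threshold=30, n_days=12):
    """Return the fitting window, or None if the country does not qualify.

    Assumption: the window holds the first day whose cumulative count
    reaches `threshold`, plus the `n_days` days that follow it.
    """
    for start, count in enumerate(cumulative_cases):
        if count >= threshold:
            window = cumulative_cases[start:start + n_days + 1]
            # Require the full stretch of data after the starting point.
            return window if len(window) == n_days + 1 else None
    return None  # the threshold was never reached


# Made-up cumulative counts: the threshold is crossed on the third day.
series = [2, 9, 31] + [40 + 10 * i for i in range(12)]
window = select_window(series)
```

A country whose series is too short after the threshold day is simply dropped, which mirrors the second rule.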

The data were collected from [9]. We then fit the data for each country with a simple exponential curve *N*(*t*) = *N*_{0} *e*^{αt}, with 2 parameters, *N*_{0} and *α*; here *t* is in units of days. In the fit we used Poissonian errors, given by √*N*, on the daily counting of cases. We then associated to each country an average temperature *T* for the relevant weeks, which we took from [10]. More precisely: if for a given country the average *T* is tabulated only for its capital city, we used that value directly. If, instead, several cities are listed for a given country, we used an average of the temperatures of the main cities, weighted by their population [11]. For most countries we used the average temperature for the month of March, with a few exceptions [12].
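As a rough illustration of the per-country fit, the sketch below estimates *N*_{0} and *α* by weighted least squares on ln *N*. This is a linearized stand-in and not necessarily the procedure used in the paper: a Poissonian error √*N* on *N* corresponds to an error of about 1/√*N* on ln *N*, so each point is given weight *N*. The function name and the synthetic input are assumptions.

```python
import math


def fit_exponential(counts):
    """Weighted least-squares fit of ln N = ln N0 + alpha * t.

    Assumes Poisson errors sqrt(N) on each count, so the error on ln N
    is about 1/sqrt(N) and each point gets weight N.
    Returns (N0, alpha), with alpha in units of 1/day.
    """
    t = range(len(counts))
    y = [math.log(n) for n in counts]
    w = list(counts)
    sw = sum(w)
    t_mean = sum(wi * ti for wi, ti in zip(w, t)) / sw
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / sw
    s_ty = sum(wi * (ti - t_mean) * (yi - y_mean) for wi, ti, yi in zip(w, t, y))
    s_tt = sum(wi * (ti - t_mean) ** 2 for wi, ti in zip(w, t))
    alpha = s_ty / s_tt
    n0 = math.exp(y_mean - alpha * t_mean)
    return n0, alpha


# Synthetic (not real) data: 30 cases growing at alpha = 0.25 per day.
synthetic = [30 * math.exp(0.25 * day) for day in range(13)]
n0, alpha = fit_exponential(synthetic)
```

On noiseless synthetic input the fit returns the parameters used to generate it; on real counts the weights make the later, larger counts dominate.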

We analyzed three datasets. A first list of countries was selected on March 26th. This *base* dataset includes 42 countries: Argentina, Australia, Belgium, Brazil, Canada, Chile, China, Czech Republic, Denmark, Egypt, Finland, France, Germany, Greece, Iceland, India, Indonesia, Iran, Ireland, Israel, Italy, Lebanon, Japan, Malaysia, Netherlands, Norway, Philippines, Poland, Portugal, Romania, Saudi Arabia, Singapore, Slovenia, South Korea, Spain, Sweden, Switzerland, Taiwan, Thailand, United Arab Emirates, United Kingdom, U.S.A.

An additional set of countries was added to the first dataset on April 1st, reaching a total of 88 countries. The added countries, in this *intermediate* set, are: Albania, Andorra, Algeria, Armenia, Austria, Bahrain, Bosnia and Herzegovina, Brunei, Bulgaria, Burkina Faso, Cambodia, Colombia, Costa Rica, Croatia, Cyprus, Dominican Republic, Ecuador, Estonia, Hungary, Iraq, Jordan, Kazakhstan, Kuwait, Latvia, Lithuania, Luxembourg, Malta, Mexico, Moldova, Morocco, New Zealand, North Macedonia, Oman, Panama, Pakistan, Peru, Qatar, Russia, Senegal, Serbia, Slovakia, South Africa, Tunisia, Turkey, Ukraine, Uruguay, Vietnam.

Finally an *extended* set has been studied on April 14th [13], adding the following countries to the previous dataset: Belarus, Bolivia, Cameroon, Congo, Cote d’Ivoire, Cuba, Democratic Republic of Congo, Djibouti, El Salvador, Georgia, Ghana, Guatemala, Guinea, Honduras, Jamaica, Kenya, Kosovo, Kyrgyzstan, Madagascar, Mali, Mauritius, Montenegro, Niger, Nigeria, Paraguay, Puerto Rico, Rwanda, Sri Lanka, Togo, Trinidad and Tobago, Uganda, Uzbekistan, Venezuela, Zambia.

Using such datasets for *α* and *T* for each country, we fit with two functions *α*(*T*), as explained in the next section. Note that the statistical errors on the *α* parameters, considering Poissonian errors on the daily counting of cases, are typically much smaller than the spread of the values of *α* among the various countries. This spread is due to systematic effects, which are dominant, as we will discuss later on. For this reason we disregarded statistical errors on *α*. The analysis was done using the software *Mathematica*, from Wolfram Research, Inc.

## III. RESULTS

We first fit the *base* dataset, with a simple linear function *α*(*T*) = *α*_{0} + *β T*, to look for an overall decreasing behavior. Results for the best fit, together with our data points, are shown in fig. 1. The estimate, standard deviation, confidence intervals for the parameters, together with the significance and the explained variance, *R*^{2}, are shown in Table I. From such results a clear decreasing trend is visible, and indeed the slope *β* is negative, at 99.66% C.L. (*p*-value 0.0034).
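A minimal sketch of such a linear fit is given below, using unweighted ordinary least squares. The paper's analysis was done in *Mathematica*; this Python version, the function name `linear_fit` and the five data points are illustrative assumptions only, not values from the datasets.

```python
def linear_fit(temps, alphas):
    """Ordinary least-squares fit of alpha = alpha0 + beta * T.

    Returns (alpha0, beta, r_squared). Points are unweighted.
    """
    n = len(temps)
    t_mean = sum(temps) / n
    a_mean = sum(alphas) / n
    s_tt = sum((t - t_mean) ** 2 for t in temps)
    s_ta = sum((t - t_mean) * (a - a_mean) for t, a in zip(temps, alphas))
    s_aa = sum((a - a_mean) ** 2 for a in alphas)
    beta = s_ta / s_tt
    alpha0 = a_mean - beta * t_mean
    r_squared = s_ta ** 2 / (s_tt * s_aa)
    return alpha0, beta, r_squared


# Made-up illustrative points (temperature in deg C, growth rate per day).
temps = [2.0, 8.0, 14.0, 20.0, 27.0]
alphas = [0.25, 0.27, 0.21, 0.19, 0.16]
alpha0, beta, r2 = linear_fit(temps, alphas)
```

A negative `beta` corresponds to a growth rate that decreases with temperature.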

However, the linear fit is able to explain only a small part of the variance of the data, with *R*^{2} = 0.196, and its adjusted value , clearly due to the presence of many more factors.
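For a fit with a single explanatory variable, the significance of the slope and *R*^{2} are tied together: the *t*-statistic of the slope is √(*R*^{2}(*n* − 2)/(1 − *R*^{2})), with *n* − 2 degrees of freedom. The sketch below uses this relation as a consistency check on the quoted numbers. It is not the paper's computation, the function names are assumptions, and since the inputs are rounded only approximate agreement should be expected.

```python
import math


def t_pdf(x, dof):
    """Probability density of Student's t distribution."""
    log_norm = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
                - 0.5 * math.log(dof * math.pi))
    return math.exp(log_norm - (dof + 1) / 2 * math.log1p(x * x / dof))


def slope_p_value(r_squared, n, steps=2000):
    """Two-sided p-value for a zero slope in a one-variable linear fit.

    Uses t = sqrt(R^2 (n - 2) / (1 - R^2)) with n - 2 degrees of freedom;
    the central probability is integrated with Simpson's rule
    (`steps` must be even).
    """
    dof = n - 2
    t = math.sqrt(r_squared * dof / (1.0 - r_squared))
    h = t / steps
    total = t_pdf(0.0, dof) + t_pdf(t, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    central = 2.0 * total * h / 3.0  # P(|x| < t)
    return 1.0 - central


# Rounded inputs quoted in the text for the base dataset.
p_base = slope_p_value(0.196, 42)
```

With *R*^{2} = 0.196 and 42 countries this gives a *p*-value of a few times 10^{−3}, the same order as the quoted 0.0034.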

In addition, a decrease of *α* toward low temperatures is also visible in this dataset, below about 10°C. For this reason we also fit with a quadratic function *α*(*T*) = *α*_{0} − *β*(*T* − *T*_{M})^{2}. Results for the quadratic best fit are presented in fig. 2 and in Table II. From such results a peak is visible at around *T*_{M} ≈ 8°C. The quadratic model is able to explain a slightly larger part of the variance of the data, since *R*^{2} ≈ 0.27 [14]. Moreover, despite the presence of an extra parameter, one may quantify the improvement of the fit, using for instance the Akaike Information Criterion (AIC) for model comparison, ΔAIC = 2Δ*k* − 2Δ ln(ℒ), where Δ*k* is the increase in the number of parameters, compared to the simple linear model, and Δ ln(ℒ) is the change in the maximum log-likelihood between the two models. This gives ΔAIC = −2.1, slightly in favor of the quadratic model.
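The size of the AIC change can be cross-checked from the quoted *R*^{2} values alone, under the convention ΔAIC = 2Δ*k* − 2Δ ln(ℒ), in which negative values favor the model with more parameters. The sketch assumes least-squares fits with Gaussian errors of unknown variance, for which 2Δ ln(ℒ) = *n* ln(RSS_lin/RSS_quad) and each residual sum of squares is proportional to 1 − *R*^{2}. The function name is hypothetical and the inputs are the rounded values from the text.

```python
import math


def delta_aic(n, r2_simple, r2_extended, extra_params=1):
    """AIC change when going from the simpler to the extended model.

    For Gaussian least-squares fits on the same n points,
    2 * delta ln(L) = n * ln(RSS_simple / RSS_extended), and each RSS is
    proportional to 1 - R^2. Negative values favor the extended model.
    """
    return 2 * extra_params + n * math.log((1 - r2_extended) / (1 - r2_simple))


# Rounded values quoted in the text for the base dataset (42 countries).
print(round(delta_aic(42, 0.196, 0.27), 1))  # -2.1
```

The magnitude agrees with the value of 2.1 quoted for the *base* dataset, with the sign indicating a mild preference for the quadratic model.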

We then repeated the same analysis for the *intermediate* dataset of 88 countries and for the *extended* dataset of 125 countries. Results for the linear fit of the *intermediate* sample are shown in fig. 3 and in Table III. The slope *β* is smaller in absolute value, but the significance actually slightly increases, since a zero slope is excluded at 99.86% C.L. (*p*-value 0.0014). Now *R*^{2} = 0.11.

In this sample the quadratic trend is not visible anymore, and indeed the AIC does not prefer the quadratic fit: ΔAIC = +0.9 compared to the linear fit, in disfavor of the quadratic model. The *R*^{2} is also practically the same as in the linear fit.

For the *extended* sample results of the linear fit are shown in fig. 4 and in Table IV. The slope *β* becomes larger and, most importantly, the significance highly increases, since a zero slope is now excluded at 99.99995% C.L. (*p*-value 5·10^{−7}, or 5*σ* detection, translated in the language of a Gaussian distribution). Now *R*^{2} = 0.19 and .

In this dataset, which extends to April 14th, a few anomalies are however present: in the case of Bangladesh and Thailand it is possible to see that the exponential growth became much faster after the initial 12 days. We have checked what happens by using a different interval of time for these 2 cases, instead of the standard 12 days. Namely, we have used 44 days for Thailand and 21 days for Bangladesh, which give the maximal value of *α* in both cases. The results for the linear fits using such corrected values are shown in Table V. The significance is lower, but still very high: *p*-value 4.6·10^{−6}, or 4.6*σ* detection, translated in the language of a Gaussian distribution.
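The translation between a *p*-value and a number of *σ* can be reproduced with the Python standard library. The sketch assumes the two-sided Gaussian convention, which is the one that matches the pairs of numbers quoted in the text; the function name is an assumption.

```python
from statistics import NormalDist


def p_to_sigma(p_value):
    """Number of Gaussian standard deviations for a two-sided p-value."""
    return -NormalDist().inv_cdf(p_value / 2.0)


for p in (5e-7, 4.6e-6):
    print(f"p = {p:g} -> {p_to_sigma(p):.1f} sigma")
```

Under this convention 5·10^{−7} corresponds to about 5*σ* and 4.6·10^{−6} to about 4.6*σ*.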

Finally, we have tested the existence of a possible bias in the data: the fact that poor countries have less intense testing. This could in principle be a source of major bias, since many countries with low income are located in warm regions. In order to rule out such a bias we have analyzed the existence of a nonzero linear correlation *β* on subsamples of the *extended* dataset, by excluding countries with low income. More specifically, we have set a threshold on the GDP per capita [15], and checked whether the correlation is still there when countries below such a threshold are excluded from the analysis. We show our results in Fig. 5: we find a correlation to exist, rather independently of the threshold that we applied. The significance of a nonzero *β* (*p*-value) is plotted in Fig. 6 and always remains between 5·10^{−7} and 8·10^{−4}.
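The threshold test described above amounts to refitting the slope on progressively smaller subsamples. A minimal sketch follows, assuming each country is stored as a tuple (temperature, *α*, GDP per capita). The function names, the thresholds and the six rows are all made up for illustration and are not taken from the datasets.

```python
def slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    return s_xy / s_xx


def slope_vs_gdp_threshold(countries, thresholds):
    """Refit beta after dropping countries below each GDP-per-capita cut.

    `countries` is a list of (temperature, alpha, gdp_per_capita) tuples.
    Returns a list of (threshold, n_kept, beta).
    """
    results = []
    for cut in thresholds:
        kept = [(t, a) for t, a, gdp in countries if gdp >= cut]
        if len(kept) < 3:
            continue  # too few points for a meaningful fit
        temps, alphas = zip(*kept)
        results.append((cut, len(kept), slope(temps, alphas)))
    return results


# Made-up rows: (T in deg C, alpha per day, GDP per capita in k$).
toy = [(3, 0.26, 50), (9, 0.27, 42), (15, 0.22, 30),
       (22, 0.19, 12), (27, 0.15, 4), (29, 0.12, 1)]
scan = slope_vs_gdp_threshold(toy, [0, 3, 10])
for cut, n_kept, beta in scan:
    print(cut, n_kept, round(beta, 4))
```

A slope that keeps the same sign and a similar size across the cuts is what one would expect if the temperature trend is not driven by the low-income countries alone.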

In addition, we have also checked for a correlation between the growth rate *α* and the GDP per capita, *GDP* for short. We find *no* significant correlation in the *base* and *intermediate* datasets, while we find a negative correlation in the *extended* dataset, with *p*-value = 0.0012. This is not so surprising, since the *extended* dataset contains many low-income countries, where the disease has arrived later, and where most likely testing is not intense enough. For this dataset we thus performed a linear fit with two variables, *GDP* and *T*. Results are shown in Table VI. The dependence on *T* is still highly significant, with *p*-value ≃ 0.000048, and the best estimate is *β* ≃ −0.0031. As expected, *T* also has a non-negligible correlation with the GDP per capita.
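One way to see what the two-variable fit does to the temperature coefficient is the Frisch-Waugh-Lovell construction: remove the linear dependence on *GDP* from both *α* and *T*, then fit the leftover *α* against the leftover *T*. The resulting slope equals the coefficient of *T* in the joint fit. This sketch is an equivalent route, not necessarily the one used in the paper, and the function names and synthetic numbers are assumptions.

```python
def residuals(xs, ys):
    """Residuals of ys after a least-squares straight-line fit on xs."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    b = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
         / sum((x - x_mean) ** 2 for x in xs))
    return [(y - y_mean) - b * (x - x_mean) for x, y in zip(xs, ys)]


def partial_slope(temps, gdps, alphas):
    """Coefficient of T in the fit alpha = alpha0 + beta*T + gamma*GDP.

    Frisch-Waugh-Lovell: regress both alpha and T on GDP, then regress
    the alpha residuals on the T residuals.
    """
    t_res = residuals(gdps, temps)
    a_res = residuals(gdps, alphas)
    return sum(t * a for t, a in zip(t_res, a_res)) / sum(t * t for t in t_res)


# Synthetic check: alpha built exactly from beta = -0.003 and gamma = 0.001.
temps = [3.0, 9.0, 15.0, 22.0, 27.0, 29.0]
gdps = [50.0, 42.0, 30.0, 12.0, 20.0, 1.0]
alphas = [0.25 - 0.003 * t + 0.001 * g for t, g in zip(temps, gdps)]
beta = partial_slope(temps, gdps, alphas)
```

Because *T* and *GDP* are correlated, the one-variable slope and this partial slope generally differ, which is why the joint fit is the relevant check.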

## IV. DISCUSSION AND CONCLUSIONS

We have collected data for countries that had at least 12 days of data after a starting point, which we fixed to be at the threshold of 30 confirmed cases. We considered three datasets: a *base* dataset with 42 countries, collected on March 26th, an *intermediate* dataset with a total of 88 countries, collected on April 1st, and an *extended* dataset with a total of 125 countries, collected on April 14th. We have fit the data for each country with an exponential and extracted the exponent *α* for each country. Then we have analyzed such exponents as a function of the temperature *T*, using the average temperature for the month of March (or slightly earlier in some cases), for each of the selected countries.

For the *base* dataset we have shown that the growth rate of the transmission of COVID-19 has a decreasing trend, as a function of *T*, at 99.66% C.L. (*p*-value 0.0034). In this fit *R*^{2} = 0.196. In addition, using a quadratic fit, we have shown that a peak of maximal transmission seems to be present in this dataset at around (7.7 ± 3.6)°C. Such findings are in good agreement with a similar study, performed for Chinese cities [2], which also finds the existence of an analogous peak and an overall decreasing trend. Other similar recent studies [3-6] find results which also seem to be in qualitative agreement.

For the *intermediate* dataset we also found a decreasing slope *β*. This is smaller in absolute value, but the significance remains high, since a zero slope is excluded at 99.86% C.L. (*p*-value 0.0014). For this fit we found *R*^{2} = 0.11.

Finally, for the *extended* dataset we found a very high significance for a negative *β*, with *p*-value between 5·10^{−6} and 5·10^{−7} (depending on the treatment of some anomalous cases), which would translate into a 4.5*σ* to 5*σ* detection, in the language of Gaussian distributions. Here *R*^{2} = 0.16 to 0.2.

For all datasets we also tested the influence of a possible large bias: the fact that poorer countries have less intense testing, which might in principle be partially degenerate with effects of temperature. Our analysis indicates that this should not be a major issue: by excluding countries with low income from the analysis we find small variations in the best-fit value of *β*, and the significance of the correlation *β* remains very high, with *p*-value 8·10^{−4} or less. We have also checked for a correlation between the GDP per capita and *α*: we find a significant correlation only in the *extended* dataset. This should probably be interpreted as a sign that poorer countries do not have enough testing capabilities. However, after taking this variable into account, the dependence on *T* remains highly significant.

The decrease at high temperatures is expected, since the same happens also for other coronaviruses [1]. It is unclear instead how to interpret the decrease at low temperature (less than 8°*C*), present in the *base* dataset. This could be a statistical fluctuation, since it is not present in the *intermediate* and *extended* datasets. One possible reason for this decrease, if real, could be the lower degree of interaction among people in countries with very low temperatures, which could slow down the propagation of the virus.

A general observation is also that a large scatter in the residual data is present, clearly due to many other systematic factors, such as variations in the methods and resources used for collecting data and variations in the amount of social interactions, due to cultural reasons. Further study is required to assess the existence and the relevance of such factors.

As a final remark, our findings can be very useful for policy makers, since they support the expectation that with growing temperatures the coronavirus crisis should become milder in the coming few months, for countries in the Northern Hemisphere. As an example, the estimated doubling time with the quadratic fit is 2.6 days at the peak temperature of 7.7°C, while at 26°C it is expected to rise to about 4.6 days. The linear fit implies an increase in the doubling time by 50% (or 40%), going from 5°C to 25°C, using the estimate from the *extended* dataset (or the *extended* dataset taking into account the GDP per capita, at a reference value of 40 thousand dollars). For countries with seasonal variations in the Southern Hemisphere, instead, this should give motivation to implement strong lockdown policies before the arrival of the cold season.
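The doubling times quoted above follow from the exponential law *N*(*t*) = *N*_{0} *e*^{αt}, for which the doubling time is ln 2/*α*. The short sketch below only inverts this relation for the quoted values; the function names are assumptions and no fit parameters beyond those stated in the text are used.

```python
import math


def doubling_time(alpha):
    """Doubling time in days for growth N(t) = N0 * exp(alpha * t)."""
    return math.log(2) / alpha


def growth_rate(t_double):
    """Inverse relation: growth rate per day for a given doubling time."""
    return math.log(2) / t_double


# Doubling times quoted in the text for the quadratic fit.
alpha_peak = growth_rate(2.6)  # about 0.27 per day at 7.7 deg C
alpha_warm = growth_rate(4.6)  # about 0.15 per day at 26 deg C
```

Since the doubling time is inversely proportional to *α*, a 50% longer doubling time corresponds to a growth rate reduced by a factor 1.5.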

We stress that, in general, it is important to fully stop the propagation, using strong lockdown, testing and tracking policies, taking also advantage of the warmer season, and before the arrival of the next cold season.

## Data Availability

Data are publicly available.

## Acknowledgments

We would like to acknowledge Viviana Acquaviva, Alberto Belloni, Ángel J. Gómez Peláez, Jordi Miralda and Giorgio Torrieri for useful discussions and comments.