COVID-19 mortality: positive correlation with cloudiness and sunlight but no correlation with latitude in Europe

We systematically investigated an ongoing debate about the possible correlation between SARS-CoV-2 (COVID-19) epidemiological outcomes and solar exposure in European countries, in the period of March-August 2020. For each country, we correlated its mortality data with solar insolation (watt/square metre) and objective sky cloudiness (as cloud fraction) derived from satellite weather data. We found a positive correlation between the monthly mortality rate and the overall cloudiness in that month (Pearson's r(35)=.779, P<.001; linear model fitting the data, adjusted R2=0.59). In Europe, in colder months, approximately 34% to 58% of the variance in COVID-19 mortality/million appears to be predicted by the cloudiness fraction of the sky, except in August in which only ~15% of the variance was explained. The data show a weaker, negative correlation between the mortality rate and the overall insolation received by the country area in that entire month (Pearson's r(35)=-0.622, P<.001). Additionally, we did not find any statistically significant correlation between the mortality and the latitude of the countries when the "latitude of a country" was precisely defined as the average landmass location (country centroid). The unexpected correlation found between cloudiness and mortality could perhaps be explained by the following: 1) heavy cloudiness is linked with colder outdoor surfaces, which might aid virus survival; 2) reduced evaporation rate; 3) moderate pollution may be linked to both cloudiness and mortality; and 4) large-scale behavioural changes due to cloudiness (which perhaps drives people to spend more time indoors and thus facilitates indoor contamination).


Introduction
This study explores the influence of previously ignored climatological factors (objective cloudiness and solar irradiation) on SARS-CoV-2 (COVID-19) mortality in Europe.
The factors that might influence the spread and impact of COVID-19 have recently been extensively discussed in the media and in research articles. Coronaviruses spread in the population due to a combination of medical, biological and socio-economic factors. It has also been suggested that latitude [1,2] and climate (temperature, humidity [3,4]) significantly contribute to COVID-19 spread.
A large number of (mainly local) studies (some not yet peer-reviewed) on the climatic influence on COVID-19 have been released, with mixed or contradictory results. The World Meteorological Organization even hosted an international virtual symposium on the issue to deal with these uncertainties ("Climatological, Meteorological and Environmental factors on COVID-19 pandemic", 4-6 August 2020).
Initially (March-April 2020), the transmission of COVID-19 seemed to be associated with the 30- to 50-degree North latitude corridor and with weather patterns and low specific and absolute humidity [2]; however, this could just as plausibly reflect trade and human movement patterns in the Northern Hemisphere. Late in the spring of 2020, tropical and subtropical countries began to see an increase in pandemic transmission, and the latitude dependency thus appears uncertain at this point.
It has also been previously noted that COVID-19 had a higher impact in countries where epidemiological data had shown a degree of vitamin D deficiency in the population [1]. Vitamin D has a wide range of immunomodulatory, anti-inflammatory and antioxidant properties, and it thus seems (presumably) protective. In vivo vitamin D synthesis is photochemically dependent in humans, and its concentration levels drop without sufficient sunlight exposure. As the latitude increases, the amount of locally received sunlight generally decreases. These two facts were combined by other researchers in a range of compelling clinical hypotheses linking susceptibility to COVID-19 to vitamin D and indirectly to lack of sunlight exposure or increased latitude (see for example [5] or [6] for a more thorough perspective on these issues and [7] for a biochemical study).
In sharp contrast to the above hypotheses, other biochemical studies have not found a correlation between vitamin D and COVID-19 epidemiological data (see [8] or [9]). These studies concluded that other unknown factors might be at play.
We tried to address these contradictions from an analytical, biophysical point of view: the amount of UV radiation that reaches the ground (and is thus presumably protective) is a fraction of total sunlight (which also contains visible light and infrared radiation). The total sunlight (known also as "solar insolation" or "solar irradiance at ground level") is defined as the flux of solar radiation per unit of horizontal area for a given location. It depends on several factors: primarily on solar zenith angle (which depends on latitude), secondarily on atmospheric composition (via absorption and scattering), and thirdly on seasonal change (due to the Earth's axial tilt). These are well known and researched topics; see for example [10] for an extensive review. Solar irradiance is one of the main factors that determine the temperature at ground level (with an almost linear dependence [11]), along with humidity (by inducing water evaporation and soil dryness) and the climate in general [12]. Solar irradiance is expressed in watts/square metre. Solar irradiance at ground level is heavily influenced by atmospheric composition, primarily by clouds [13] and secondarily by other factors such as dust, pollutants, and humidity [14]. Clouds exert a complex influence: by reflecting sunlight back into space they reduce the total energy that reaches the Earth, but by scattering they can counterintuitively direct some of the energy back to land, especially in the UV portion of the spectrum, thus modulating the biologically effective radiation dose received by living things [15].
In this study, we present detailed research on the direct influence on COVID-19 mortality in European countries of the above-discussed factors: a) sky cloudiness, b) solar irradiance and c) latitude. Our results show that there is a sizeable influence due to cloudiness, a smaller influence due to absolute amount of sunlight, and basically no influence due to latitude.

Epidemiological data
Different epidemiological variables about COVID-19 epidemics are collected and reported around the world. We used the "Coronavirus update", a publicly available epidemiological database summarized by Worldometers [16]; monthly data snapshots of it were retrieved from the Internet Archive Organization [17]. From this database we extracted the mortality, defined as the number of deaths per 1 million inhabitants in a given country in a particular month. There are two issues with this approach that might impact our study. First, there are legal and practical differences among countries regarding death recording and reporting [18]. The COVID-19 database we used records the deaths as reported by local authorities; we could not quantitatively assess the differences between the countries, which could impact comparison of different countries. Second, the time lag between the actual death and the reported time could be different for each case; the monthly intervals analysed probably average most of the differences.

Atmospheric cloudiness data
Cloudiness (also known as cloud fraction, cloud cover, cloud amount or sky cover) refers to the fraction of the sky obscured by clouds (in a particular location). It can be reported in various units. In this study, we used the cloud fraction (as tenths of the entire sky); 0.0 thus indicates a clear sky and 1.0 (or 10/10) indicates a completely covered sky.
We used a publicly available dataset of global cloudiness as measured from space by NASA's Terra and Aqua satellites using the MODIS instrument (Moderate Resolution Imaging Spectroradiometer) [19]. This dataset is collected continuously and presented as values averaged daily, weekly and monthly for the entire globe; for this study we chose the monthly averaged values. In this dataset the entire Earth surface is divided into a rectangular grid. Each rectangle of the grid contains the average cloud fraction of the sky covering that area. The data are available at different resolutions (sizes of the rectangular grid that covers the Earth). We chose a fairly detailed resolution of the grid, 0.25° latitude x 0.25° longitude. We ensured that the other geographical datasets that we used in this study matched the same spatial resolution.

Solar data
The average solar insolation (also known as solar irradiance, solar exposure, incoming sunlight) in watts/square metre at the Earth's surface was used in this study. We used a publicly available dataset inferred from measurements taken by Clouds and Earth's Radiant Energy System (CERES) instrument flying aboard NASA's Terra and Aqua satellites [20]. We used the same temporal and spatial sampling as presented above, i.e., monthly averaged values over a 0.25° latitude x 0.25° longitude grid.

Geographical data
For a single point on the Earth, the geographical coordinates are straightforward (latitude and longitude). However, for a country (or large region) the coordinates can be represented in several distinct ways, each with pros and cons, such as the location of the capital city, the location of the most populated city, the average between the most extreme points of the country, and the country centroid, among others.
As the "latitude of a country" measure, we used the country centroid. A centroid (also known as the centre of gravity or centre of mass) is the arithmetic mean of the positions of all the points in a geometrical object; for irregular objects, it is closest to the centre of the biggest part of the object (it is less influenced by very thin or heavily scattered boundaries). We chose this measure because several countries have highly irregular geometrical boundaries or long thin peninsulas or numerous islands (e.g., Greece, Norway, etc.); the country centroid is located closer to the widest area of the mainland. We used a standard database of country centroids published by Google Maps developers [21].
As the "country boundary" measure we chose to use a simplified representation of the country border, the country bounding box, which is a rectangle on the surface of the Earth with North and South edges corresponding to the limiting (max and min) latitudes of the country and West and East edges corresponding to the limiting longitudes. We used a database of bounding boxes of all countries published by the Center of Humanitarian Data [22].
We are aware that the use of the country bounding box simplification can induce a degree of imprecision (smoothing) or overlapping errors. To minimize these errors, we carefully curated the bounding box coordinates to exclude non-mainland portions. We did this because we wanted to restrict the analysis to the land portion of Europe (for example, overseas remote islands such as Svalbard would include almost 600 sq km of Arctic Ocean surface in Norway's geographical definition). We checked this by comparing the centroid location of each country with the average geometric centre of its bounding box, making sure that the error was less than 1 degree of longitude or latitude.
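The bounding-box sampling and the centroid check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say how grid cells were aggregated into a country value, so an unweighted mean of the cells whose centres fall inside the box is assumed here, and the names `bbox_mean`, `bbox_centre`, `centroid_matches_bbox` and the toy grid are hypothetical.

```python
def bbox_mean(grid, south, north, west, east):
    """Unweighted mean of the grid cells whose centre lies inside the box.

    `grid` maps (lat_centre, lon_centre) -> value, e.g. a cloud fraction.
    Assumption: no area weighting, although 0.25-degree cells shrink
    towards the poles.
    """
    values = [
        value
        for (lat, lon), value in grid.items()
        if south <= lat <= north and west <= lon <= east
    ]
    if not values:
        raise ValueError("bounding box contains no grid cell centre")
    return sum(values) / len(values)


def bbox_centre(south, north, west, east):
    """Geometric centre (lat, lon) of a bounding box."""
    return (south + north) / 2.0, (west + east) / 2.0


def centroid_matches_bbox(centroid, bbox, tolerance_deg=1.0):
    """True if the centroid is within `tolerance_deg` of the box centre
    in both latitude and longitude (the 1-degree check in the text)."""
    centre_lat, centre_lon = bbox_centre(*bbox)
    return (
        abs(centroid[0] - centre_lat) < tolerance_deg
        and abs(centroid[1] - centre_lon) < tolerance_deg
    )


# Toy 4 x 4 grid of 0.25-degree cells (hypothetical values, not study data).
toy_grid = {}
for i in range(4):
    for j in range(4):
        cell_centre = (45.125 + 0.25 * i, 10.125 + 0.25 * j)
        toy_grid[cell_centre] = 0.5 if i < 2 else 0.75

toy_bbox = (45.0, 46.0, 10.0, 11.0)  # south, north, west, east
print(bbox_mean(toy_grid, *toy_bbox))
print(centroid_matches_bbox((45.4, 10.6), toy_bbox))
```

A latitude-weighted mean (weighting each cell by the cosine of its latitude) would be a natural refinement for large countries; it is left out here because the text does not mention it.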

Inclusion and exclusion criteria
The inclusion criteria were as follows: a) all countries on the European continent; b) availability of epidemiological and geographical (cloudiness, insolation) data.
The exclusion criteria were as follows: a) population<0.5 million and b) a geographical bounding box of the country smaller than 0.25° latitude x 0.25° longitude. We used these criteria because European micro-states (Monaco, Vatican, San Marino, etc.) were too small to be properly sampled from the available resolution of geospatial data (insolation, cloudiness).
Russia was also excluded from analysis because the COVID-19 epidemiological data from the country was available only in an aggregate form (i.e., no data were available detailing the epidemiology in European and Asian parts of Russia).
In this way we obtained a list of 37 European countries.

Calculation
Most of the European countries reported the first deaths in March 2020 (see Supplementary material, Figure S.1), and we used this month as the starting point for our analysis.
We employed multiple linear regression models with mortality as the response (log10-transformed in all the following analyses and figures). As predictor variables we used cloud fraction and solar insolation, both averaged over the March-August interval. For each reported model we checked its validity by analysing the residuals as follows: normality (normal Q-Q plot), presence of outliers, and homogeneity of the variance (we assumed homoscedasticity if the P-value of the studentized Breusch-Pagan test was greater than 0.05). We also attempted a brief time-series analysis of the factors: we checked the distribution of the data in time and the autocorrelation of the disturbances with a Durbin-Watson test.
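As an illustration of the procedure above, the following Python sketch fits a single-predictor linear model to log10-transformed mortality, computes Pearson's r, and evaluates the Durbin-Watson statistic on the residuals. It is a minimal sketch, not the analysis code used in the study: the helper names are hypothetical, the four data points are synthetic, only one predictor is used, and the Breusch-Pagan and Q-Q checks are omitted.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def durbin_watson(residuals):
    """Durbin-Watson statistic; values near 2 indicate little
    first-order autocorrelation in the residuals."""
    squared_steps = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    return squared_steps / sum(e ** 2 for e in residuals)


# Synthetic, purely illustrative values (NOT data from the study).
cloud_fraction = [0.2, 0.4, 0.6, 0.8]
deaths_per_million = [10.0, 1000.0, 100.0, 10000.0]

# The response is log10-transformed, as stated in the text.
log_mortality = [math.log10(d) for d in deaths_per_million]

intercept, slope = fit_line(cloud_fraction, log_mortality)
residuals = [
    observed - (intercept + slope * c)
    for c, observed in zip(cloud_fraction, log_mortality)
]
r = pearson_r(cloud_fraction, log_mortality)
dw = durbin_watson(residuals)
print(f"slope={slope:.2f} intercept={intercept:.2f} r={r:.2f} DW={dw:.2f}")
```

In practice a statistics package would supply these quantities together with the residual diagnostics; the hand-written versions are shown only to make each step explicit.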

For each country, in the interval March-August 2020, we averaged the monthly cloudiness and the monthly mortality rate, and we built a linear model (Figure 1). It shows a statistically significant positive correlation between the average mortality rate and the average sky cloudiness (Pearson's r(35)=.779, P<.001).

Overall Mortality vs. Cloudiness
The linear model that fits the data shown in Figure 1 is presented in Table 1. In continental Europe, from the beginning of the epidemic, approximately 59% of the variance in COVID-19 mortality/million appears to be predicted by the cloudiness fraction of the sky. The model passes the quality criteria listed in the Calculation section, and the residuals of the model appear to be homoscedastic (Breusch-Pagan test P-value=0.893).
The dataset shows a weaker, negative correlation between the mortality rate (evaluated at the end of a month) and the overall insolation received by the country area in that entire month (Pearson's r(35)=-0.622, P<.001). Note that some of the data points are missing (for the month of August 2020 the solar irradiance data were not yet available at the time of writing).

Overall Mortality vs. Insolation
The linear model that fits this dataset explains only approximately 37% of the variance of the data, and the regression model did not hold up under close inspection of the distribution of residuals. It passed the Breusch-Pagan test (P=0.11), but the analysis of the residuals vs. leverage revealed some issues with the model: Portugal is a high leverage point (with a high Cook's distance); its exclusion from the analysis would significantly change the model. Quantile-quantile (Q-Q) plot analysis indicates that both Portugal and Belgium have more extreme values than would be expected from the rest of the data points. We plotted the regression line in Figure 2 for the completeness of the graphic. This seems to strongly suggest that there are some highly divergent confounding factors in this regression model of mortality vs. insolation.
As we noted in the Introduction, it is well known that the solar irradiance at ground level is primarily influenced by clouds and secondarily by other factors. We verified that this holds in the analysed dataset: insolation vs. cloud fraction has Pearson's r(183)=-0.804, P<.001, and adjusted R2=0.654, for all the raw data points. Similar results were obtained when we analysed the averaged data for each country over the March-August interval: Pearson's r(35)=-0.838, P<.001, adjusted R2=0.695. The other confounding factors that influenced the variation are unknown, and we tried to address some of the possibilities in the Discussion section. A combined model of mortality vs. cloudiness and insolation (with or without interaction) did not significantly change the above results.
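For a single-predictor linear model, the adjusted R2 follows directly from Pearson's r and the sample size n, where n is the degrees of freedom in r(df) plus 2 (so r(35) corresponds to the 37 countries). The short Python sketch below applies the standard adjustment formula to the country-averaged r values quoted in this section; the function name `adjusted_r2` is hypothetical, and small differences from the reported values are to be expected because the published r values are rounded.

```python
def adjusted_r2(r, n, predictors=1):
    """Adjusted R2 of a linear model from Pearson's r.

    Uses the standard formula 1 - (1 - R2) * (n - 1) / (n - p - 1),
    where R2 = r**2 for a single predictor (p = 1).
    """
    r_squared = r * r
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - predictors - 1)


N_COUNTRIES = 37  # r(35) means 35 degrees of freedom, i.e. n = 35 + 2

# r values quoted in the text for the country-averaged data.
reported_r = {
    "mortality vs. cloudiness": 0.779,
    "mortality vs. insolation": -0.622,
    "insolation vs. cloud fraction": -0.838,
}

for label, r in reported_r.items():
    print(f"{label}: adjusted R2 = {adjusted_r2(r, N_COUNTRIES):.2f}")
```

The three values come out at roughly 0.60, 0.37 and 0.69, in line with the approximately 59%, 37% and 0.695 reported in the text.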

Time-series analysis
We also wanted to make sure that the above results were not the result of chronological influence over the chosen factors. As a preliminary check, we verified the independent variation in time of these variables; these are reported in Supplementary material, Figures S.4-S.6. The overall mortality rate (Figure S.4) increases continuously. The cloudiness seems to be randomly distributed with no discernible pattern in the 6-month interval analysed (Figure S.5); this is expected because the seasonal temporal variability in the cloud cover is usually ~30%, with bigger differences between extreme seasons [28]. The average solar insolation in European countries month-by-month (Figure S.6) follows the expected sinusoidal pattern due to seasonal changes [10].
After these checks we verified the correlations of these factors (cloud fraction, insolation) with mortality at monthly intervals (Figure 3). For the dataset analysed we found no consistent correlation between the insolation and mortality in the month-by-month analysis. The Pearson's r values were negative (with the exception of March) but the models did not pass the stringent criteria defined in the Calculation section.
The dataset (Figure 4) shows no correlation between the mortality rate and the latitude of the European countries for the entire interval studied (Pearson's r(35)=.06, P=0.72). We used the "country centroid" [21] latitude as the single number that defined the latitude of a country (as discussed in the Methods section).


Discussion
Contrary to some previously reported results (for example [2,5]) we did not find a correlation between latitude and mortality in the studied European countries. This result is possibly because we used the rather stringent centroid definition of the coordinates of countries, rather than a broad geographical area [2] or the coordinates of the capitals [1].
We also found a weaker (and negative) correlation between the measured amount of sunlight and mortality, which was not consistent month-by-month. This result tends to support the previously published idea that a greater amount of sunlight might have some beneficial effects [29]. However, the lack of month-by-month consistency and the low quality of the regression model suggest that the absolute amount of sunlight does not have a clear impact on mortality.
Together, these two results seem to suggest that the mortality-vitamin D hypothesis (lower sunlight at higher latitudes leads to lower vitamin D levels, which leads to higher mortality) is perhaps incomplete. As a note of interest, other studies found that serum vitamin D correlates more strongly with skin tone than with latitude (higher melanin content effectively prevents UV photons from initiating synthesis); see for example Åkeson et al. [30] for a study of vitamin D variation at a single latitude (Sweden), Martin et al. [31] for a general meta-analysis, and a review of vitamin D and COVID-19 [32].
Unexpectedly, we found that the cloudiness in a month (the monthly averaged cloud fraction) has the strongest positive correlation with COVID-19 mortality. This has been, to the best of our knowledge, unreported up to now.
We suggest the following possible explanations for this unexpected finding:

1) Heavy cloudiness is linked with colder outdoor surfaces, which might aid virus survival
A higher cloud cover quickly cools down the outdoor surfaces, especially at lower latitudes [33]. Colder surfaces facilitate the survival of SARS-CoV-2 [34]. As observed by Heneghan & Jefferson [35], the risk of incidence was higher on days with lower temperature (citing a study of a previous SARS epidemic by Lin et al., 2005, that observed an 18-fold increase in incidence on lower temperature days compared with higher temperature days [36]); the plausible increase in actual incidence might thus lead to an increased number of deaths. As an easy-to-spot example of this unexpected finding, in the spring of 2020 Spain was cloudier than Norway (see Supplementary Material, Figures S.2 and S.1 for cloudiness and mortality charts).

2) Reduced evaporation rate
We speculate that in addition to lowering temperature, a cloudy sky may drive down the evaporation rate (via two mechanisms: lower solar energy reaching the ground and perhaps by increasing the relative air humidity). This might favour the stabilization of the infectious droplets and enhance viral propagation (as suggested by Sajadi [2]). This seems to be supported by the evidence that coronaviruses survive better in colder environments but contradicts the finding that higher humidity impairs the transmission [3]. This is highly speculative because in this dataset we could not check for actual humidity in the atmosphere (and to further complicate the interaction, we note that rainy clouds do increase the humidity).


3) Moderate pollution may be linked to both cloudiness and mortality
Even moderate pollution in the atmosphere helps clouds form [37], and it was also previously found that airborne pollution was linked to worse outcomes for COVID-19 patients (see for example [38] for documented pollution effects in Italy; other similar studies are currently under review). Therefore, cloudiness data might be a proxy for pollution data (which are not available for all countries).

4) Behavioural changes due to cloudiness
An additional simple hypothesis is that overly cloudy weather (a higher cloud fraction) might change behavioural choices at large scale, nudging people to spend more time indoors rather than outdoors. It was suggested that spending more time in closed environments facilitates the secondary transmission of COVID-19, thus driving up the rate of infections (Nishiura et al., 2020, under review). This explanation seems to be consistent with the observation that living in overcrowded conditions is associated with poorer outcomes [8] and with the observation that some other environmental factors are at play in addition to the lack of sun-induced vitamin D synthesis [9].
The results show a difference in the relationships of COVID-19 deaths to cloud fraction versus solar radiation. This could perhaps be explained by a higher net amount of infrared radiation that reaches the ground surface when higher clouds are present compared with the presence of lower clouds [39,40] (we thank an anonymous reviewer for this insight). Our study could not distinguish between the types of clouds constituting the cloud fraction.
We are aware that this study has limitations: we did not investigate the precipitation rate, wind velocity, air pressure, air pollution and density (these are factors that are under scrutiny for their possible impact on COVID-19 epidemiology). We acknowledge that our observational retrospective study is limited: temporal autocorrelation cannot be excluded over longer periods of time. The possibility of spatial autocorrelation was not researched by this study. Cloudiness might be a confounding factor in the previous studies that related vitamin D synthesis to latitude and sunlight. The authors support the current guidelines that vitamin D deficiency in patients should be treated irrespective of any link with respiratory infections. We are also aware that different countries adopted different governmental responses to the COVID-19 crisis, which has led to different epidemiological outcomes. Last, we urge the reader not to extrapolate these results because we only investigated these variables in 37 European countries, not in the entire world. Additionally, this investigation was limited to 6 months; the story is developing, and we wait to see what the impact of an entire seasonal cycle is. We plan to update the analysis as new data become available (see Data Availability Statements below). We hope that these results will bring a warning about the possible impact of extended cloudiness on COVID-19 transmission.

Conclusions
The data from the European continent in the spring-summer of 2020 suggest that the atmospheric cloudiness over a longer period (i.e., a month) seems to explain about a third of the variance of coronavirus (COVID-19) mortality; a higher degree of sky cloudiness in a month seems to be correlated with an increased mortality rate in that month. This knowledge might help inform public health policies related to COVID-19 mitigation and control; we suggest increased vigilance and increased frequency of sanitation of outdoor high-risk surfaces during cloudy weather.
medRxiv preprint doi: https://doi.org/10.1101/2021.01.27.21250658; this version posted January 29, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.