Patterns of SARS-CoV-2 exposure and mortality suggest endemic infections, in addition to space and population factors, shape dynamics across countries

Some countries have been crippled by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic while others have emerged with few infections and fatalities; the factors underlying this macro-epidemiological variation remain one of the mysteries of this global catastrophe. Variation in immune responses influences SARS-CoV-2 transmission and mortality, and factors shaping this variation at the country level, in addition to other socio-ecological drivers, may be important. Here, we construct spatially explicit Bayesian models that combine data on prevalence of endemic diseases and other socio-ecological characteristics to quantify patterns of confirmed deaths and cases across the globe before mass vaccination. We find that the prevalence of parasitic worms, human immunodeficiency virus and malaria plays a surprisingly important role in predicting country-level SARS-CoV-2 patterns. When combined with factors such as population density, our models predict 63% (56-67) and 76% (69-81) of confirmed cases and deaths among countries, respectively. While our findings at this macro-scale are necessarily associative, they highlight a need for studies to consider the effects of factors, such as infection by other pathogens, on global SARS-CoV-2 dynamics. These relationships are vital for developing countries that already have the highest burden of endemic disease and are becoming the most affected by the SARS-CoV-2 pandemic.


Introduction
[...] non-endemic areas [16]. Moreover, it may not be exposure to one particular pathogen that increases or decreases the risk of SARS-CoV-2 infection or mortality, but the diversity of pathogens endemic in a country that is important in shaping immune responses and the overall SARS-CoV-2 pattern (i.e., exposure to a diversity of pathogens may damp down immune responses [19]) or prevent disease through immune memory. Given the potential importance of exposure history and coinfection, the exclusion of these variables from most macroecological models of SARS-CoV-2 is an important oversight.

The ability to distinguish meaningful relationships from spatial artefacts among macroecological predictors of SARS-CoV-2 dynamics is important. For example, national pathogen prevalence and diversity are tied to climate and economic variables. Countries closer in space may have similar numbers of cases due to the increased chances of exporting the virus across borders [20]. International air travel is also known to facilitate SARS-CoV-2 spread between countries [21], and countries with high connectivity with other countries may have increased numbers of introduction events. Further, while sharing health resources among neighbouring communities could reduce mortality rates, countries with health systems overwhelmed with COVID cases have few resources to share. High connectivity between neighbouring countries could also facilitate the spread of the virus and increase mortality. Age is also a well-known risk factor, so countries with younger populations are likely to have experienced lower mortality than countries with higher mean ages [e.g., 1]. Country average age may also impact cases as there is a well-known bias in testing older demographics [22,23]. Economic variables such as gross domestic product (GDP) and health spending can influence the number of detected cases and deaths [24] as well as many possible predicting factors like exposure history and population demographics.

Here we use global data to construct Bayesian generalised additive models (GAMs) to untangle the connections between factors shaping the number of SARS-CoV-2 cases and deaths across countries. Results from our macroecological models can inform mechanistic research (i.e., hypothesis generating) to help understand varying vulnerabilities to this pressing global disaster.

Data retrieval

We extracted confirmed cases and confirmed deaths (per million people, hereafter 'cases' and 'deaths') for each country from the World Health Organization (WHO) on the 26th of February 2021 before widespread vaccination (https://covid19.who.int/). In addition, we accessed the number of tests per country (per thousand people) for the same period [25]. We also used the WHO transmission classification scheme (i.e., countries with community transmission, clusters of cases only, sporadic cases and no cases) to account for differences in control success.

We downloaded the mean estimate of prevalence of 14 endemic diseases in each country from the Institute for Health Metrics and Evaluation (IHME) Global Burden of Disease (GBD) database (see Table S1). Using these prevalence data, we calculated endemic pathogen diversity using the inverse Simpson's metric [26]. To measure overall infectious [...]

The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license. This version posted July 16, 2021.

Bayesian modelling

We specified a separate model for each country-level response variable: cases per million people and deaths per million people. We used the same initial set of variables for both models, except that 'cases' was added to the deaths model. We modelled the (count) data for both response variables using a (log-link) negative binomial distribution, an overdispersed and robust generalisation of the [...] variables, we applied a small penalty to the linear term(s) of the spline basis, permitting the whole term to be shrunk to zero [38]. Next, we removed all variables whose smooth functions were consistent with zero slope, and compared the performance of the reduced model to the full model using LOO. Finally, to account for heteroskedasticity, we added region-level terms to the modelled shape (variance) parameter, using LOO and posterior predictive checks to assess the merits of the added terms. [...] posterior predictive distribution to identify countries where our model was not adequate (i.e., the observation was outside the 95% CI). Our complete workflow and data are available on GitHub (https://github.com/nfj1380/covid19_macroecology).
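
The adequacy check described above (flagging countries whose observation falls outside the 95% interval of the posterior predictive distribution) can be illustrated with a minimal Python sketch. This is not the authors' workflow, which is in the linked repository; the function name `flag_outside_interval`, the simulated draws, and all numeric values are assumptions made purely for illustration.

```python
import numpy as np

def flag_outside_interval(observed, draws, level=0.95):
    """Return True where an observation lies outside the central `level`
    interval of its posterior predictive draws.

    observed: array of shape (n_countries,)
    draws:    array of shape (n_draws, n_countries)
    """
    observed = np.asarray(observed, dtype=float)
    draws = np.asarray(draws, dtype=float)
    tail = (1.0 - level) / 2.0
    lower = np.quantile(draws, tail, axis=0)        # e.g. 2.5% quantile
    upper = np.quantile(draws, 1.0 - tail, axis=0)  # e.g. 97.5% quantile
    return (observed < lower) | (observed > upper)

# Hypothetical example: simulated negative binomial draws for three countries.
rng = np.random.default_rng(1)
mu = np.array([50.0, 500.0, 5000.0])  # made-up predictive means
shape = 2.0                           # made-up shape parameter
p = shape / (shape + mu)              # NumPy's (n, p) form; mean equals mu
draws = rng.negative_binomial(shape, p, size=(4000, 3))

# The third observation is deliberately extreme so that it gets flagged.
observed = np.array([40.0, 450.0, 90000.0])
flags = flag_outside_interval(observed, draws)
print(flags)
```

Using the central interval of the draws, rather than a normal approximation, keeps the check valid for skewed count distributions such as the negative binomial.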

Our models reveal the importance of endemic pathogen prevalence, space and demography in explaining the number of cases and deaths for countries across the globe. For each model, we used posterior predictive checks to verify that our models could adequately predict the observed data (Fig. S1), and Bayesian R² estimates as a goodness-of-fit measure (case model: [...] Table S2 for LOO model comparison results.

We excluded GDP as it was highly collinear (ρ > 0.7, see Fig. S2) with health spending. We excluded air connectivity as it was highly positively correlated with mean age and the spatial network (i.e., countries with high air connectivity were close to each other and had a similar mean age). We also excluded pathogen diversity as it was strongly negatively correlated with mean age (i.e., countries with high pathogen diversity had lower mean ages), and the proportion older than 65 years in the population as it was positively associated with mean age (Fig. S2). [...] 1d) with this proportion lowest in central Africa (Fig. 3). In contrast, mean age had a strongly fluctuating non-linear effect on SARS-CoV-2 deaths, with deaths peaking at mean ages of 22 and 50 years, and minimal when the mean age was around 35 (Fig. 2d). Population density had a minor negative relationship with deaths for each country (Fig. 2e). Central Africa had the lowest mean age and the northern latitudes of the continent had some of the lowest human population density values (Fig. 3j).
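
The collinearity screen mentioned in this section (dropping one of any pair of predictors with ρ > 0.7) can be sketched as follows. The text does not state which correlation coefficient was used; this sketch assumes a Spearman rank correlation, and the function names, predictor names, and simulated values are hypothetical rather than the study data.

```python
import numpy as np

def spearman_matrix(X):
    """Spearman rank correlation matrix for the columns of X.
    Ranks are taken by double argsort, which assumes there are no ties."""
    ranks = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)

def collinear_pairs(X, names, threshold=0.7):
    """List (name_a, name_b, rho) for each pair with |rho| above threshold."""
    rho = spearman_matrix(X)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(rho[i, j]) > threshold:
                pairs.append((names[i], names[j], round(float(rho[i, j]), 2)))
    return pairs

# Simulated predictors for 181 hypothetical countries (not the study data).
rng = np.random.default_rng(0)
n = 181
gdp = rng.lognormal(mean=9.0, sigma=1.0, size=n)
# Health spending is constructed to track GDP closely.
health_spending = gdp * rng.lognormal(mean=-2.5, sigma=0.2, size=n)
# Population density is drawn independently of the other two.
pop_density = rng.lognormal(mean=4.0, sigma=1.5, size=n)

X = np.column_stack([gdp, health_spending, pop_density])
names = ["gdp", "health_spending", "pop_density"]
flagged = collinear_pairs(X, names)
print(flagged)
```

In this simulated example only the GDP and health spending pair exceeds the threshold, so one of the two would be dropped before model fitting.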

There was a strong positive, approximately linear, relationship between SARS-CoV-2 cases and deaths in each country (Fig. 2f). The cases in the spatial neighbourhood also showed a positive, mostly linear fit in both models (Figs. 1e/2h). We did not find a conditional effect of estimated tests on SARS-CoV-2 cases. However, there was some variation in cases explained by region, with the Western Pacific and Africa having a lower baseline (region-level intercept) than the other regions after controlling for all other variables in the model (Fig. 1f).

For the SARS-CoV-2 deaths model, regional variation of the baseline was negligible (Fig. 2g). For both models, the inclusion of region-level terms in the modelled variance (Poisson overdispersion) significantly improved predictive performance (Table S2).

[...] Table S1 for variable details, Fig. 3 for the spatial distribution of model [...]

Overall, our models predicted the number of SARS-CoV-2 cases and deaths well, with observations from only eight of the 181 countries included falling outside our 95% credible interval (Fig. 4). The uncertainty of the modelled predictions varied by region and response (Fig. 4). European estimates of SARS-CoV-2 cases and deaths had the highest precision (i.e., smallest credible intervals) in our models, and estimates with the lowest precision included the Western Pacific (e.g., China, Australia, Cambodia) and South-East Asian WHO regions (Fig. 4, Fig. S4). The precision of our model estimates of cases for the African region was comparable to Europe, the Americas and Eastern Mediterranean (Fig. 4, Fig. S4). Tanzania [...]

[...] SARS-CoV-2 [11,13], it is plausible that our population-level results reflect individual-level [...] have the highest estimated prevalence of HIV in the world (Fig. 3). Further, there is evidence that malaria can also shape SARS-CoV-2 dynamics by reducing the severity of the disease.

For example, a study of healthcare workers in India found that patient recovery from SARS- [...]

[...] modulatory effects of helminth infections. Areas of the Amazon in Brazil, for example, have much higher COVID-19 death rates compared to the rest of the country, but also have high [...]

[...] cities that are present for other infectious diseases (sanitation, access to economic and medical resources) appear to be outweighed by the facilitated SARS-CoV-2.

Interestingly, we found a weak pattern of mean age on SARS-CoV-2 cases and deaths, unlike what has been found in other smaller scale analyses [53]. Other country-level models have used the proportion of population >70 as a predictor in similar models, but we found that mean age was strongly correlated with proportion over 65 (ρ = 0.9). Our analysis differs from others because we have attempted to account for and model spatial autocorrelation and regional differences (i.e., cases in one country are shaped by the number of cases in neighbouring countries). The spatial neighbourhood predictor was quite strongly correlated with mean age (ρ = 0.63), and this may have accounted for mean age variation in our models (i.e., [...]

[...] case data is of utility as COVID-19 deaths are generally better reported than cases [58]. The weak relationship between cases and tests is not surprising as it has been demonstrated that regional differences can vary by country [1]. Moreover, while we assembled a diverse set of predictors, this dataset is not comprehensive. We aimed to maximise the number of countries we included in the analysis without introducing large amounts of missing data. However, the predictive performance of our models was surprisingly high, and few model estimates of cases and deaths failed to include the observed values for each country (Fig. 4). Countries such [...]

[...] values across all other variables were removed from the analysis.

medRxiv preprint doi: https://doi.org/10.1101/2021.07.12.21260394

Table S2: Leave-one-out (LOO) model comparison results for both models. Model performance is quantified using expected log pointwise predictive density (ELPD), expressed here as the relative estimates (ΔELPD) with respect to the best performing model. The performance of an alternative model is deemed comparable to the best model if the mean estimate lies within approximately one standard error of ΔELPD = 0.
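
The comparison rule stated in the Table S2 caption (a model is comparable to the best one if its mean relative ELPD lies within roughly one standard error of zero) can be written as a small Python sketch. The function name `comparable_to_best`, the model labels, and all numbers below are hypothetical and are not taken from Table S2.

```python
def comparable_to_best(elpd_diff, se_diff, n_se=1.0):
    """Return True if the ELPD difference from the best model lies within
    n_se standard errors of zero."""
    return abs(elpd_diff) <= n_se * se_diff

# Hypothetical comparison table: (relative ELPD, standard error) per model.
# The best model has a difference of zero by definition.
models = {
    "full": (0.0, 0.0),
    "reduced": (-1.2, 2.5),
    "no_region_variance": (-14.8, 5.1),
}

verdicts = {name: comparable_to_best(diff, se) for name, (diff, se) in models.items()}
print(verdicts)
```

The `n_se` argument is exposed because the caption says "approximately" one standard error; a stricter or looser cut-off can be tried without changing the function.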
