Buying time: an ecological survival analysis of COVID-19 spread based on the gravity model

COVID-19 spread to most countries in the world within a matter of months. Various social and economic factors determine how long it takes a pandemic to reach a country. This time is essential because it allows countries to prepare their response. This study considered a gravity model that expressed time to first case as a function of multiple socio-economic factors. First, Kaplan-Meier analysis was performed for each variable in the model by dividing countries into two groups according to the median of the respective variable. To measure the effect of these variables, the parameters of the gravity model were then estimated using accelerated failure time (AFT) survival analysis. In the Kaplan-Meier analysis, the differences between the high- and low-value groups were significant for every variable except population. The AFT analysis found that greater personal freedom had the largest effect in shortening survival time, controlling for detection capacity. Higher GDP per capita and a larger population also shortened survival time, while a greater distance from the outbreak source lengthened it. Understanding the factors that affect the time to the index case can help us understand disease spread in the early stages of a pandemic.


Gravity model
A gravity model for the intensity of spread can be written as follows (Viboud et al., 2006):
$$I_{ij} = \theta \, \frac{P_i^{\tau_1} P_j^{\tau_2}}{d_{ij}^{\rho}}$$
where $I_{ij}$ is the intensity of spread between communities $i$ and $j$ of populations $P_i$ and $P_j$, and $d_{ij}$ is the distance between the communities. The parameters $\theta$, $\tau_1$, $\tau_2$ and $\rho$ are to be estimated. Greater intensity of spread leads to faster propagation of the disease to neighboring communities (Li et al., 2011). This means shorter periods of time until the neighboring communities experience their first case. In addition to population size and distance, economic and political factors can also influence spread. Thus, I propose the following model:
$$T_i = \beta_0 \, D_i^{\beta_1} F_i^{\beta_2} P_i^{\beta_3} G_i^{\beta_4} U_i^{\beta_5} A_i^{\beta_6} S_i^{\beta_7} \, e^{\varepsilon_i}$$
where $T_i$ is the number of days until the index case in country $i$, $D_i$ is the distance from country $i$ to the country where the disease originated (China), $F_i$ is a measure of human and economic freedom in country $i$, $P_i$ is the population, $G_i$ is the GDP per capita, $U_i$ is the degree of urbanization, $A_i$ is the volume of air travel, $S_i$ is the epidemiological detection capacity, and $\varepsilon_i$ is the error term. $\beta_0, \beta_1, \dots, \beta_7$ are the model parameters. The model can be rewritten by taking the logarithm of both sides:
$$\log(T_i) = \log(\beta_0) + \beta_1 \log(D_i) + \beta_2 \log(F_i) + \beta_3 \log(P_i) + \beta_4 \log(G_i) + \beta_5 \log(U_i) + \beta_6 \log(A_i) + \beta_7 \log(S_i) + \varepsilon_i$$
If we let $\log(\beta_0) = \alpha$, $\log(D_i) = x_{1i}$, $\log(F_i) = x_{2i}$, and so on for all variables, we obtain:
$$\log(T_i) = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_7 x_{7i} + \varepsilon_i$$
This equation is the log-linear representation of the accelerated failure time (AFT) model, which is a parametric survival model (Wei, 1992).
medRxiv preprint doi: https://doi.org/10.1101/2020.05.01.20087569; this version posted May 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
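The log-linearization above can be checked numerically. The following minimal Python sketch evaluates the gravity intensity and the multiplicative time-to-index-case model. It then confirms that taking logarithms gives the linear predictor $\alpha + \sum_k \beta_k x_{ki} + \varepsilon_i$. The function names and all parameter and covariate values are hypothetical and chosen only for illustration. They are not the estimates reported in this study.

```python
import math


def gravity_intensity(theta, tau1, tau2, rho, pop_i, pop_j, dist_ij):
    """Gravity-model intensity I_ij = theta * P_i**tau1 * P_j**tau2 / d_ij**rho."""
    return theta * pop_i ** tau1 * pop_j ** tau2 / dist_ij ** rho


def time_to_index_case(betas, covariates, eps=0.0):
    """Multiplicative model T_i = beta_0 * prod_k x_k**beta_k * exp(eps).

    `betas` is [beta_0, beta_1, ..., beta_7]; `covariates` holds positive
    country-level values in the order D, F, P, G, U, A, S.
    """
    t = betas[0] * math.exp(eps)
    for beta_k, x_k in zip(betas[1:], covariates):
        t *= x_k ** beta_k
    return t


def log_linear_predictor(betas, covariates, eps=0.0):
    """Log-linear (AFT) form: alpha + sum_k beta_k * log(x_k) + eps, alpha = log(beta_0)."""
    alpha = math.log(betas[0])
    return alpha + sum(b * math.log(x) for b, x in zip(betas[1:], covariates)) + eps


# Hypothetical values for illustration only (not estimates from this study).
betas = [50.0, 0.3, -0.8, -0.1, -0.2, -0.05, -0.05, 0.1]
covariates = [8000.0, 7.5, 2.0e7, 15000.0, 60.0, 1.0e6, 50.0]

t = time_to_index_case(betas, covariates, eps=0.1)
lp = log_linear_predictor(betas, covariates, eps=0.1)
assert math.isclose(math.log(t), lp)  # taking logs linearizes the model

# With rho > 0, intensity falls as distance grows.
near = gravity_intensity(1.0, 1.0, 1.0, 2.0, 1e6, 1e6, 500.0)
far = gravity_intensity(1.0, 1.0, 1.0, 2.0, 1e6, 1e6, 5000.0)
assert far < near
```

Writing the error as $e^{\varepsilon_i}$ in the multiplicative form is what makes it appear as an additive $\varepsilon_i$ on the log scale. That additive error is the form the AFT model expects.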

Data sources
Time from the beginning of the outbreak to the first case in each country was obtained from the ECDC's public COVID-19 data on the 11th of April (ECDC, 2020a).

Statistical analysis
A total of 156 countries were considered for the analysis, all of which had reported their first COVID-19 case by the 11th of April 2020. Five countries without air travel data were treated as missing at random and dropped from the analysis. China, as the starting point of the outbreak, was not included. The start of the observation period was set to the 30th of December 2019. Median, minimum and maximum survival times (times to first case) were determined. Each independent variable in the model was split at its median into two groups, and the survival probability of each group was estimated using the Kaplan-Meier method. The survival probabilities of the "low" (below median) and "high" (above median) groups were compared using the log-rank test. P-values below 0.05 were considered significant. The variables (continuous, in log form) were then included in the AFT model to evaluate their individual effects on survival time. The best-fitting distribution for the AFT model was chosen using Akaike's Information Criterion (AIC) (Bozdogan, 1987). P-values of model coefficients below 0.05 were considered significant. A second model was fitted with the HFI separated into its constituent parts, personal freedom and economic freedom. Cox proportional hazards regression was also performed, and the proportional hazards assumption was tested using Schoenfeld residuals (Grambsch & Therneau, 1994). Statistical analysis was performed in R.
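The median split and Kaplan-Meier estimation described above can be sketched in plain Python. This is a minimal illustration, not the R code used in the study. The function names and example data are hypothetical. Values exactly equal to the median are assigned to the "high" group; this is an assumption, because the tie rule is not stated. Every included country had experienced its index case, so all observations are treated as events by default. The estimator also accepts censoring flags.

```python
from statistics import median


def median_split(values):
    """Label each value 'low' (below the median) or 'high' (otherwise).

    Assumption: values equal to the median are placed in the 'high' group.
    """
    m = median(values)
    return ["high" if v >= m else "low" for v in values]


def kaplan_meier(times, events=None):
    """Kaplan-Meier estimate as a list of (event time, S(t)) steps.

    `events[i]` is 1 if observation i had its index case and 0 if censored;
    by default every observation is an event (no censoring).
    """
    if events is None:
        events = [1] * len(times)
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_leaving = 0
        # Group all observations that share the same time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            surv *= 1.0 - n_events / at_risk  # product-limit step
            curve.append((t, surv))
        at_risk -= n_leaving  # events and censored both leave the risk set
    return curve


# Hypothetical example: days to index case and one covariate per country.
days = [14, 25, 30, 55, 62, 68, 70, 75, 80, 102]
covariate = [9.1, 8.7, 8.9, 7.2, 6.8, 8.0, 6.5, 7.9, 6.1, 5.9]
groups = median_split(covariate)
for label in ("low", "high"):
    group_days = [d for d, g in zip(days, groups) if g == label]
    print(label, kaplan_meier(group_days))
```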

Results
The first countries affected were either geographically close to China or highly economically developed. Developing countries reported cases later, particularly those at a considerable distance from China. Figure 1 depicts survival times across the globe.
Median time to first case was 68 days. The first affected country (after China) was Thailand, at 14 days, and the last was Yemen, at 102 days. As Figure 2 shows, the distribution of survival times is bimodal, with a group of neighboring and developed countries reporting their first cases in the first wave.
The rest of the world was affected in a larger, second wave. Between 35 and 52 days only two countries reported their index case.
The survival curve for all countries is shown at the top left of Figure 3; the period in which few countries reported index cases corresponds to its flat portion. The independent variables of the model, each divided into two groups at its median, are depicted in Figure 3. The log-rank test showed that the difference in survival between the "low" and "high" groups was statistically significant (p < 0.0001) for all variables except population (p = 0.49). Longer distance from China was associated with longer time to event; for all other variables, lower values were associated with longer time to event.
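The log-rank comparison of the "low" and "high" groups can be sketched as a from-scratch implementation of the standard two-group log-rank statistic. For one group, the observed events minus the expected events are summed over all event times. That sum is squared and divided by its hypergeometric variance. The p-value is taken from a chi-square distribution with one degree of freedom. The function name and example data are hypothetical, and the study's own results were computed in R.

```python
import math


def logrank_test(times_a, times_b, events_a=None, events_b=None):
    """Two-group log-rank test. Returns (chi-square statistic, p-value), 1 df.

    Event flags default to 1 (no censoring).
    """
    if events_a is None:
        events_a = [1] * len(times_a)
    if events_b is None:
        events_b = [1] * len(times_b)
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for s, _, g in data if g == 0 and s >= t)  # at risk in group A
        n_b = sum(1 for s, _, g in data if g == 1 and s >= t)  # at risk in group B
        d_a = sum(e for s, e, g in data if g == 0 and s == t)  # events in A at t
        d = sum(e for s, e, _ in data if s == t)  # events in both groups at t
        n = n_a + n_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical days to index case for a "low" and a "high" group.
chi2, p = logrank_test([55, 62, 70, 80, 102], [14, 25, 30, 68, 75])
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```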

Variables were then included in the AFT model. Based on the AIC, the Gompertz distribution provided the best fit to the data. As shown in Table 1, HFI, population, and GDP per capita were significantly associated with survival time (p < 0.05). The model is in log-log form, so a variable's coefficient is interpreted as the percentage change in survival time given a 1% change in the variable, that is, the elasticity of survival time with respect to that variable.
Coefficients were positive for all variables except distance. HFI had the largest effect, with a coefficient of 2.46. The weakest effect was that of air transportation volume, with a coefficient of 0.058.
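In a log-log model, a change of $p$% in a covariate multiplies survival time by $(1 + p/100)^{\beta}$. This is close to a $\beta \cdot p$% change only when $p$ is small. The short Python sketch below applies this to the two coefficients reported above. It takes the elasticity interpretation at face value and reproduces only its arithmetic. The function name is hypothetical.

```python
def pct_change_in_time(beta, pct_change_x=1.0):
    """Exact % change in survival time implied by log(T) = ... + beta * log(x)
    when covariate x changes by `pct_change_x` percent."""
    ratio = (1.0 + pct_change_x / 100.0) ** beta
    return (ratio - 1.0) * 100.0


for name, beta in [("HFI", 2.46), ("air travel volume", 0.058)]:
    exact = pct_change_in_time(beta)  # exact effect of a 1% change
    approx = beta * 1.0  # first-order (elasticity) approximation
    print(f"{name}: exact {exact:.3f}% vs approx {approx:.3f}%")

# The approximation degrades for larger changes, e.g. a 10% change in HFI:
print(f"HFI, 10% change: {pct_change_in_time(2.46, 10.0):.1f}%")
```

For HFI, the exact figure for a 1% change is about 2.48%, versus 2.46% from the approximation. For a 10% change the gap widens to about 26.4% versus 24.6%.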

A second AFT analysis was performed with the HFI replaced by its constituents, personal freedom and economic freedom (Table 2). Personal freedom had the largest effect (coefficient = 1.8). Economic freedom had a smaller effect (coefficient = 0.273) and was not significant (p = 0.71). A model including personal freedom instead of the HFI had the lowest AIC of all the models tested. Thus, personal freedom is the variable with the most influence on survival time. Cox proportional hazards analysis was also performed and yielded similar coefficient values; however, the proportional hazards assumption was not met.
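AIC was used above both to choose the survival distribution and to compare the HFI and personal-freedom models. For a model with $k$ estimated parameters and maximized likelihood $\hat{L}$, it is $\mathrm{AIC} = 2k - 2\ln\hat{L}$, and the model with the lowest AIC is preferred. The Python sketch below illustrates the rule on intercept-only fits of two candidate distributions with closed-form maximum-likelihood estimates: exponential and log-normal. These candidates, the function names and the example times are assumptions for illustration only. The study's covariate-adjusted AFT fits, including the Gompertz model, require numerical optimization and were carried out in R.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def exponential_loglik(times):
    """Maximized log-likelihood of an exponential model (MLE rate = n / sum t)."""
    n, total = len(times), sum(times)
    rate = n / total
    return n * math.log(rate) - rate * total


def lognormal_loglik(times):
    """Maximized log-likelihood of a log-normal model (closed-form MLE).

    Requires at least two distinct times so that sigma^2 > 0.
    """
    logs = [math.log(t) for t in times]
    n = len(logs)
    mu = sum(logs) / n
    sigma2 = sum((x - mu) ** 2 for x in logs) / n  # MLE divides by n
    return -sum(logs) - 0.5 * n * math.log(2 * math.pi * sigma2) - n / 2


def select_by_aic(times):
    """Return the name of the lowest-AIC candidate and all AIC scores."""
    fits = {
        "exponential": (exponential_loglik(times), 1),  # (ln L, k)
        "log-normal": (lognormal_loglik(times), 2),
    }
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    return min(scores, key=scores.get), scores


# Hypothetical, tightly clustered times to index case (days).
best, scores = select_by_aic([60, 62, 65, 66, 68, 70, 71])
print(best, {name: round(s, 2) for name, s in scores.items()})
```

The same comparison applies unchanged to competing covariate sets. Each fitted model contributes its maximized log-likelihood and parameter count, and the lowest AIC is preferred.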

Discussion
This study suggests that some of the factors associated with disease spread in the theoretical framework of the proposed gravity model are supported by empirical data from the current COVID-19 pandemic. Previous attempts to relate the gravity model to spread in the context of a pandemic used generalized linear models rather than survival analysis to model time to index case (Li et al., 2011). Past work on the topic also did not include potential confounders such as personal freedom, air travel volume and urbanization. Opportunities to conduct an ecological study of this type are as rare as major pandemics. The spread of the disease to nearly every country allows for a survival analysis with more data points and no censoring, which leads to more precise estimates.
Selection bias is an important issue in ecological studies. Only countries that had experienced their index case by the 11th of April were included. Although this represents the vast majority of countries in the world, the generalizability of the study to the Pacific Island nations and other countries not included may be limited. Information bias is also possible: the data on which indices like the Global Health Security Index are constructed are self-reported by countries.
Nonetheless, alternatives of comparable comprehensiveness are not available.
A key finding of the study is that higher personal freedom is associated with a shorter time until a pandemic reaches a country. This association could be partly confounded by the tendency of countries with lower personal freedom to underreport and underdiagnose cases (Kavanagh, 2020). However, the Global Health Security Detection and Reporting score mitigates at least some of this confounding, as it accounts for the capacity and willingness of countries to report. The finding could have implications for policy in the early stages of a pandemic. Most countries have opted for social distancing measures that reduce personal freedom, such as limits on public gatherings and travel restrictions (Lewnard & Lo, 2020). Consensus on the effectiveness of these measures has not been reached.
Countries like Sweden are trying to limit social and economic disruption, even though some models predict high mortality associated with this strategy (Gardner et al., 2020). Implementing restrictions as soon as a potentially pandemic virus starts spreading could delay its arrival in a country. With more time to prepare their response, countries would be less likely to see their health systems strained, which could lead to fewer deaths. This type of analysis could be used to guide risk assessment and identify countries that are likely to be affected earlier in the course of a pandemic. These high-risk countries, which have less time to spare, would benefit from even greater attention to pandemic preparedness. The results merit further investigation into the application of the model at the district and regional level, to assess whether it can be used at a smaller scale.
In conclusion, the gravity model-based survival analysis quantified the influence of important socio-economic variables on the time from the beginning of a pandemic to the first case in a country. Ecological survival analysis at the country level can help identify patterns of spatial and temporal spread and may provide insight into the influence of social and economic factors on the global transmission of viral diseases.

Data availability
The data that support the findings of this study are available on figshare at https://doi.org/10.6084/m9.figshare.12205265.v1 . These data were derived from the public domain sources listed as references in the link and manuscript.