The correspondence between the structure of the terrestrial mobility network and the emergence of COVID-19 in Brazil

The inter-city mobility network is of great importance for understanding outbreaks, especially in Brazil, a country of continental dimensions. Following the complex networks approach, cities are represented here as nodes and the flows as weighted edges; these geographical graphs, or (geo)graphs, are handled in a Geographical Information System. We adopt the 2016 IBGE database, which contains the weekly flow of people between cities in terrestrial vehicles. The present work investigates the correspondence between network measures, such as strength, degree, and betweenness, and the emergence of cities with confirmed cases of COVID-19 in Brazil, with special attention to the state of São Paulo. We show that the results improve when certain thresholds are applied to the network flows to neglect the lowest-frequency travels. The correspondences are statistically significant for most measures up to a certain period. Until the end of April, the best matching is obtained with the strength measure (total flow related to a node/city) under a high flow threshold in the state of São Paulo, when the most connected cities are reached. After this stage, lower thresholds become more suitable, indicating a possible signature of the interiorization of the outbreak. Surprisingly, some countryside cities such as Campina Grande (state of Paraíba), Feira de Santana (state of Bahia), and Caruaru (state of Pernambuco) have higher strengths than some state capitals. Furthermore, some cities from the state of São Paulo, such as Presidente Prudente and Ribeirão Preto, are captured in the top-rank positions of all the analyzed network measures under different flow thresholds. Their importance in mobility is crucial, and they are potential super spreaders like the state capitals. Our analysis offers additional tools for understanding and decision support regarding inter-city mobility interventions against SARS-CoV-2 and other epidemics.


INTRODUCTION
The complex network approach (1) emerges as a natural way to handle mobility data computationally, taking areas as (fixed) nodes and movements between origins and destinations as connections (flows) (2,3,4). The inter-city mobility network is vital for understanding outbreaks, especially in Brazil, a country of continental dimensions (5,6,7).
As of May 1st, 2020, the COVID-19 pandemic, caused by SARS-CoV-2, has spread globally, with about 2,066,023 confirmed cases and 239,447 deaths. In Brazil, there are more than 92,665 confirmed cases and more than 6,439 deaths (8,9,10), with the first documented case located in the city of São Paulo on February 25th, 2020.
This paper investigates how topological properties of terrestrial mobility networks relate to the emergence of COVID-19 cases in Brazil, considering cities as nodes and flows as weighted edges. We compute three pointwise measures for each node, namely the strength, degree, and betweenness centrality, to find the structurally most important cities and contrast them with the documented cases of COVID-19 until May 1st, 2020.
The mobility data most commonly used in studies of this nature in Brazil are the commuting (pendular) travels from the 2010 national census (IBGE) (11). In this paper, we use the 2016 IBGE road data (12), which contains the flows between cities by terrestrial vehicles for which it is possible to buy a ticket (mainly buses and vans). The information collected in that survey seeks to quantify the interconnection between cities, the attraction that urban centers exert for the consumption of goods and services, and the long-distance connectivity of Brazilian cities. The North region is not included in this paper because neither the fluvial nor the air modes are covered, and their roles are key to understanding the spreading process there, especially in the Amazon region.
Our contributions are the analysis of i) the Brazilian inter-city mobility networks under different flow thresholds that neglect the lowest-frequency travels, especially at the beginning of the outbreak, when the interiorization of the disease is not yet in progress; and ii) the correspondence between the network statistics and the emergence of COVID-19 in Brazil. The present investigation offers additional tools for understanding and decision support in the containment of the ongoing epidemic spreading (13,14) and of future ones. From the mobility data, the authorities obtain a preliminary list of cities with a high likelihood of having patients, on which to apply preventive actions such as social distancing. This paper is organized as follows: the Method section presents the data and the techniques we employ, such as the complex network measures and the geographical visualization tools. Next, the results are presented with the discussion and final remarks.

METHOD
The above-cited IBGE data (12) contains the weekly travel frequency (flow) between pairs of Brazilian cities/districts. The frequencies are aggregated over the round trip, which means that the number of travels from city A to city B is the same as from B to A. We build two undirected networks, with different numbers of nodes $N$, to capture actions at two scales (country and state):
1. Brazil without the North region (BRWN): nodes are cities ($N = 4987$) and edges are the flows of direct travels between them.
2. São Paulo state (SP): a subset of the previous network, containing only the $N = 620$ cities within the São Paulo state.
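The construction above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' pipeline. The record layout `(city, state, city, state, weekly_flow)` and the toy flows are hypothetical and are not IBGE figures. North-region records are assumed to have been filtered out beforehand. Because the IBGE frequencies already aggregate round trips, each flow is stored symmetrically.

```python
from collections import defaultdict

# Toy records (hypothetical values, NOT IBGE data):
# (city_a, state_a, city_b, state_b, weekly round-trip flow)
records = [
    ("São Paulo", "SP", "Campinas", "SP", 900.0),
    ("São Paulo", "SP", "Rio de Janeiro", "RJ", 700.0),
    ("Campinas", "SP", "Ribeirão Preto", "SP", 120.0),
    ("Rio de Janeiro", "RJ", "Belo Horizonte", "MG", 300.0),
]


def build_network(records, keep_state=None):
    """Return an undirected weighted graph as {city: {neighbor: flow}}.

    Flows are stored symmetrically (F[a][b] == F[b][a]) because the data
    aggregate round trips. If keep_state is given, only edges with both
    endpoints in that state are kept, giving a state-level subnetwork.
    """
    graph = defaultdict(dict)
    for city_a, state_a, city_b, state_b, flow in records:
        if keep_state is not None and (state_a != keep_state or state_b != keep_state):
            continue
        if flow <= 0 or city_a == city_b:
            continue  # ignore empty flows and self-loops
        total = graph[city_a].get(city_b, 0.0) + flow  # merge duplicate pairs
        graph[city_a][city_b] = total
        graph[city_b][city_a] = total
    return dict(graph)


brwn = build_network(records)                 # country-scale network
sp = build_network(records, keep_state="SP")  # São Paulo subnetwork
print(len(brwn), len(sp))  # 5 3
```

In this sketch, a city of the state whose only links leave the state would be missing from the subnetwork. Keeping such cities as isolated nodes would require a separate node list.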
We focus on two versions of each network, defined by flow thresholds $\eta$. The first is $\eta_0$ ($\eta = 0$), the original network from the IBGE data. The second is $\eta_d$ ($\eta = d$), which neglects travels with lower frequencies. The threshold $\eta_d$ is the highest flow threshold that produces the network with the largest diameter. The motivation behind $\eta_d$ is to obtain a threshold high enough to discard the least frequent connections without disregarding the most frequent ones (4).
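The selection of $\eta_d$ can be sketched as a sweep over candidate thresholds. Two of the rules are our own assumptions and are labeled in the code. First, an edge survives when its flow is strictly above $\eta$, so $\eta = 0$ keeps every nonzero flow. Second, because thresholding may disconnect the network, the diameter is taken as the longest finite shortest path. The candidate list and the toy graph are hypothetical.

```python
from collections import deque


def threshold(graph, eta):
    """Keep only edges whose weekly flow is strictly above eta (assumed rule)."""
    return {u: {v: f for v, f in nbrs.items() if f > eta} for u, nbrs in graph.items()}


def eccentricity(graph, source):
    """Largest finite hop distance from source (BFS, flows ignored)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def diameter(graph):
    """Longest finite shortest path (assumption for disconnected networks)."""
    return max((eccentricity(graph, u) for u in graph), default=0)


def select_eta_d(graph, candidates):
    """Highest candidate threshold whose filtered network has the largest diameter."""
    diams = {eta: diameter(threshold(graph, eta)) for eta in candidates}
    largest = max(diams.values())
    eta_d = max(eta for eta, d in diams.items() if d == largest)
    return eta_d, diams


# Toy network: a chain A-B-C-D-E with two weak shortcuts (hypothetical flows).
toy = {}
for u, v, f in [("A", "B", 50), ("B", "C", 50), ("C", "D", 50),
                ("D", "E", 50), ("A", "E", 5), ("A", "C", 10)]:
    toy.setdefault(u, {})[v] = f
    toy.setdefault(v, {})[u] = f

eta_d, diams = select_eta_d(toy, [0, 5, 10, 20, 50])
print(eta_d, diams)  # 20 {0: 2, 5: 3, 10: 4, 20: 4, 50: 0}
```

Removing the weak shortcuts stretches the toy network until only the strong chain remains. Raising $\eta$ further removes every edge, and the diameter collapses.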

Complex network measures
The topological degree $k_i$ of a node $i$ is the number of links it has to other nodes. As the networks here are undirected, there is no distinction between incoming and outgoing edges.
In a connected graph, there is at least one shortest path between any pair of nodes $v$ and $w$. Let $\sigma_{vw}$ denote the number of such paths and $\sigma_{vw}(i)$ the number of them that pass through node $i$. The betweenness centrality $b_i$ (2) of a node $i$ is the rate of those shortest paths that pass through $i$:

$$b_i = \sum_{v \neq i \neq w} \frac{\sigma_{vw}(i)}{\sigma_{vw}}.$$

Although it is a pointwise measure, it takes into account non-local information related to all shortest paths in the network. It is worth highlighting that, in the present context, this centrality index is not a transportation (physical) measure but a mobility (process) one. Moreover, neither degree nor betweenness accounts for the network flows here; both are computed on the binary (unweighted) networks. The diameter of a network is the distance between its farthest nodes, that is, the length of the longest shortest path.
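Since betweenness is computed on the unweighted network, it can be obtained with Brandes' algorithm, which runs one breadth-first search per source node. The sketch below is our own illustration, not the authors' code. It returns the unnormalized sum over unordered pairs $\{v, w\}$, and any normalization used in the paper is not specified here.

```python
from collections import deque


def betweenness(graph):
    """Brandes betweenness for an unweighted, undirected graph.

    graph: {node: iterable of neighbors}. Returns, for each node i, the sum
    over unordered pairs {v, w} (v != i != w) of sigma_vw(i) / sigma_vw.
    """
    b = {v: 0.0 for v in graph}
    for s in graph:
        stack = []                          # nodes in order of non-decreasing distance
        preds = {v: [] for v in graph}      # predecessors on shortest paths from s
        sigma = {v: 0 for v in graph}       # number of shortest paths from s
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}     # dependency of s on each node
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                b[w] += delta[w]
    # Every unordered pair was counted once from each endpoint.
    return {v: x / 2.0 for v, x in b.items()}


# Toy "diamond": A and D are linked through both B and C.
diamond = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
print(betweenness(diamond))  # {'A': 0.5, 'B': 0.5, 'C': 0.5, 'D': 0.5}
```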
The strength $s_i$ of a node $i$, in contrast, is the accumulated flow from its incident edges:

$$s_i = \sum_{j} F_{ij},$$

in which $F_{ij}$ is the flow between nodes $i$ and $j$. In our context, the degree gives the number of cities to which a city is connected, that is, the number of possible destinations for SARS-CoV-2. The strength captures the total number of people that travel to (or come from) such places in a week. From a probabilistic perspective, the cities that receive more people are more likely to receive infected travelers and are therefore more vulnerable to SARS-CoV-2. The betweenness centrality, on the other hand, considers the entire network to depict the topological importance of a city in the routes that are more likely to be used.

medRxiv preprint doi: https://doi.org/10.1101/2020.05.17.20104612; this version posted May 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
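Both definitions reduce to simple operations on the weighted adjacency. The sketch below uses a hypothetical dictionary layout `{city: {neighbor: weekly flow}}` with toy values. It also shows how the two measures can rank cities differently: a city may have a single but heavy connection.

```python
def degree(graph):
    """k_i: number of neighbors of i (flows ignored)."""
    return {i: len(nbrs) for i, nbrs in graph.items()}


def strength(graph):
    """s_i = sum_j F_ij: total weekly flow on the edges incident to i."""
    return {i: sum(nbrs.values()) for i, nbrs in graph.items()}


def top_ranked(measure, n):
    """The n cities with the largest value (ties broken alphabetically, an arbitrary choice)."""
    ordered = sorted(measure.items(), key=lambda item: (-item[1], item[0]))
    return [city for city, _ in ordered[:n]]


# Toy symmetric flows F[i][j] (hypothetical values).
F = {
    "A": {"B": 10, "C": 1},
    "B": {"A": 10},
    "C": {"A": 1, "D": 2},
    "D": {"C": 2},
}
print(top_ranked(strength(F), 2))  # ['A', 'B']
print(top_ranked(degree(F), 2))    # ['A', 'C']
```

Here B has degree 1 but the second-largest strength, while C has two weak links and ranks second by degree only.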

Geographical visualization
A geographical approach to complex systems analysis is especially important for mobility phenomena (14). Santos et al. (2017) (15) proposed the (geo)graph, a graph whose nodes have known geographical locations and whose edges have spatial dependence. It provides a simple tool to manage, represent, and analyze geographical complex networks in different domains (4,16), and it is used in the present work. The geographical manipulation is performed with the PostgreSQL Database Management System and its spatial extension PostGIS. Lastly, the maps are produced using the Geographical Information System ArcGIS.
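In a (geo)graph, each edge can carry a geometry derived from the locations of its end nodes. The sketch below is a hypothetical illustration, not the authors' workflow. It turns a flow table into rows with straight-line WKT `LINESTRING`s that could be loaded into a PostGIS table, for instance with `ST_GeomFromText(wkt, 4326)`. The coordinates are approximate city locations used only for the example, and the `edges` table in the comment is hypothetical.

```python
# Approximate (longitude, latitude) in decimal degrees (illustrative only).
coords = {
    "São Paulo": (-46.63, -23.55),
    "Campinas": (-47.06, -22.91),
}
# Toy weekly flow between the two cities (hypothetical value).
flows = {("São Paulo", "Campinas"): 900.0}


def edge_rows(coords, flows):
    """Yield (origin, destination, flow, wkt) rows for a spatial edge table.

    Each undirected edge becomes a straight LINESTRING between its end
    nodes; this is a geometric simplification, not the actual road path.
    """
    for (a, b), flow in flows.items():
        (xa, ya), (xb, yb) = coords[a], coords[b]
        yield a, b, flow, f"LINESTRING({xa} {ya}, {xb} {yb})"


for row in edge_rows(coords, flows):
    print(row)
# Each row could then be inserted with, e.g. (hypothetical table):
# INSERT INTO edges (origin, destination, flow, geom)
# VALUES (%s, %s, %s, ST_GeomFromText(%s, 4326));
```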

RESULTS AND DISCUSSION
This section presents the results of the topological analysis for the previously mentioned networks (Table 1). The number of edges $|E|$ decreases for increasing $\eta$ because edges with lower flows are removed. The resulting networks are undirected. Throughout the paper, both the degree and the betweenness measures are computed on unweighted edges and do not account for the flows. Two nodes are connected when the flow between them is above the threshold $\eta$ (any nonzero flow for $\eta_0$), which means that the number of connections decreases for increasing $\eta$. We compute the diameter of the networks for varying $\eta$.

Following the (geo)graph approach, Figure 2 displays the nodes and edges of the Brazilian mobility network in geographical space for $\eta_d$. The edges for $\eta_0$ are not plotted because there are more than 59,000 of them and the visualization would not be clear. It is important to highlight some key cities, like Belo Horizonte, Rio de Janeiro, São Paulo, and Salvador, and the high number of connections among them. Figure 3 depicts the geographical graph of the state of São Paulo.

Figure 4 shows the map of the topological degree of each node/city considering all original flows ($\eta_0$), and Figure 5 shows the equivalent for $\eta_d = 207.55$. Key cities are labeled in the maps.

[...] $\eta_0$ and $\eta_d$, respectively. Some cities with high strength also appear in a report (17) on the cities most vulnerable to COVID-19 due to their intense traffic of people, namely São Paulo, Campinas, São José do Rio Preto, São José dos Campos, Ribeirão Preto, Santos, Sorocaba, Jaboticabal, Bragança Paulista, Presidente Prudente, Bauru, and many others. At the time of writing, they all have a significant number of confirmed cases.

We now assess which of the computed measures ($s$, $k$, and $b$) better approximates the emergence of COVID-19 in Brazil. We compare the $n$ top-ranked cities of each measure, with $n \in [1, X]$, with the cities that contain confirmed cases. The case data come from the notified cases in the daily state bulletins of the Brazilian Health Ministry (10). According to these data, until May 1st, 2020, the number of cities with at least one confirmed COVID-19 patient is $X = 1902$ in the BRWN network, which corresponds to 38% of the nodes, and $X = 323$ in SP (52% of the nodes). This provides a way of tracking the response of each measure in detecting vulnerable cities as the virus spreads.
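One reading of this comparison can be sketched in code. This reading is an assumption, since the exact definition is not spelled out. The matching rate $p(n)$ compares the $n$ top-ranked cities of a measure with the first $n$ cities to have documented cases, for $n \in [1, X]$. The function name and the toy order of cases below are hypothetical.

```python
def matching_curve(ranking, cases_in_order):
    """Return [p(1), ..., p(X)] with p(n) = |top_n ∩ first_n| / n.

    ranking: cities sorted by a network measure, best first (len >= X).
    cases_in_order: cities in the order their first case was documented (X items).
    """
    top, first = set(), set()
    curve = []
    for n, (ranked, infected) in enumerate(zip(ranking, cases_in_order), start=1):
        top.add(ranked)
        first.add(infected)
        curve.append(len(top & first) / n)
    return curve


# Toy ranking and toy documentation order (not the real sequence of cases).
ranking = ["São Paulo", "Rio de Janeiro", "Belo Horizonte", "Campinas"]
cases = ["São Paulo", "Belo Horizonte", "Rio de Janeiro"]
print(matching_curve(ranking, cases))  # [1.0, 0.5, 1.0]
```

Recomputing the set intersection at every step costs $O(X^2)$ overall, which is acceptable for $X$ in the low thousands.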
Some cities in the aforementioned data are not present in our network due to a simplification made by the IBGE: it groups small neighboring cities with almost no flow into single nodes. For simplicity, and considering that such cities did not have cases in the first days of the outbreak, they are not accounted for in our analysis.
To verify whether the rate of correspondence between the top-ranked cities and the cities with COVID-19 cases is statistically significant, we compare it with picking cities at random instead of following the measures. We perform $10^5$ simulations for each $n \in [1, X]$, drawing $n$ nodes at random and monitoring the rate of positive cases. Figure 8 presents the correspondence $p$ with the first $n$ cities with documented COVID-19 cases for both the simulated data and the top-ranked nodes under $s$, $k$, and $b$. The gray region represents 95% of the rates' occurrences in the simulations for each $n$, and the dashed line is the maximum observed value. In our analysis, on May 1st, about 95% of the simulations have matching rates within $0.38 \pm 0.01$ for the BRWN network and within $0.52 \pm 0.03$ for SP. For node selection during the first days, the results of all network indexes lie above the dashed line. All indexes are therefore better heuristics than picking nodes at random in the beginning. However, immediately after April 21st, $k$ with $\eta_0$ and $b$ for both thresholds start to cross the dashed line in SP, performing no better than the best random selections. After a transient, these three curves also yield the worst results for BRWN.
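The random baseline can be sketched as follows. By symmetry, the rate of a random draw depends only on $N$ and $n$, so the first $n$ documented cities can be relabeled $0, \dots, n-1$. The expected rate is $n/N$. At $n = X$ this gives $1902/4987 \approx 0.381$ for BRWN and $323/620 \approx 0.521$ for SP, in line with the centers reported above. The function names are hypothetical. The example uses far fewer than $10^5$ runs to keep it fast.

```python
import random


def random_rates(N, n, runs, seed=0):
    """Sorted matching rates of `runs` random draws of n nodes out of N.

    Labels 0..n-1 stand for the first n documented cities, so a draw's
    rate is the fraction of sampled labels below n.
    """
    rng = random.Random(seed)
    rates = []
    for _ in range(runs):
        hits = sum(1 for node in rng.sample(range(N), n) if node < n)
        rates.append(hits / n)
    return sorted(rates)


def envelope(sorted_rates):
    """Central 95% band (2.5th and 97.5th percentiles) and maximum."""
    m = len(sorted_rates)
    low = sorted_rates[int(0.025 * (m - 1))]
    high = sorted_rates[int(0.975 * (m - 1))]
    return low, high, sorted_rates[-1]


rates = random_rates(N=4987, n=1902, runs=300)
print(sum(rates) / len(rates))  # close to 1902 / 4987 ≈ 0.381
print(envelope(rates))
```

Drawing labels rather than city names avoids carrying the real node list through the simulation. A measure's curve is then compared against the band and the maximum for the same $n$.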
Oscillations are observed in Figure 8 a) for small $n$, but they stabilize afterward and follow a trend. The matching $p$ is at its maximum at the beginning because the first documented case was in the city of São Paulo, which ranks first in all measures. The curve then decreases until reaching a region where the oscillations take place.
The network quantifiers show good correspondence already at the beginning of the spreading process, as the dashed line is not touched until $n$ approaches $X$. The high-frequency oscillations of Figure 8 a) are pronounced up to March 24th ($n \approx 150$). This is probably the transient needed for the spreading process to reach a steadier behavior.
There is no mark on March 24th in Figure 8 b) because the number of new cities with confirmed cases is negligible in that period. Interestingly, on March 31st, a week later, the high-frequency oscillations start to diminish in SP. A few days later, after April 7th, the betweenness centrality with $\eta_d$ starts to be a poor predictor for BRWN and then for SP.

Next, we quantitatively evaluate the curves from Figure 8, and others with different thresholds $\eta$, to determine which one better captures the spreading process of COVID-19 in the mobility network. Figure 9 displays the integral of each of those curves as a function of $\eta$; in the figure, $\mu$ denotes the average flow of the network and $\sigma$ its standard deviation. The threshold $\eta_d$ is marked with the vertical line, showing it to be a good threshold in SP but a poor one in BRWN. For SP, the strength is always the best measure. For BRWN, the best measure oscillates: $s$ and $b$ are the best predictors for small thresholds, the best switches to $b$ at $\eta \approx 45$, and then to $k$ at $\eta \approx 110$. The best prediction is given by betweenness with $\eta \approx 60$, and similar results are obtained with both $s$ and $b$ at $\eta_0$. For the SP network, $\eta_d$ coincides with the exact point where $s$ has its best outcome.
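Comparing curves by their integrals can be sketched as below. We assume a trapezoidal rule with unit spacing in $n$, since the paper does not state the quadrature. The curves are toy values, not the results of Figure 9, and the function names are hypothetical.

```python
def curve_integral(curve):
    """Area under p(n) for n = 1..X via the trapezoidal rule (unit spacing)."""
    return float(sum((a + b) / 2.0 for a, b in zip(curve, curve[1:])))


def best_by_threshold(curves):
    """curves: {(measure, eta): [p(1), ..., p(X)]}. Returns {eta: (best measure, area)}."""
    best = {}
    for (measure, eta), curve in curves.items():
        area = curve_integral(curve)
        if eta not in best or area > best[eta][1]:
            best[eta] = (measure, area)
    return best


# Toy curves for two thresholds (hypothetical values).
curves = {
    ("s", 0): [1.0, 0.5, 1.0],
    ("k", 0): [1.0, 0.0, 0.5],
    ("b", 0): [0.0, 0.5, 1.0],
    ("s", 45): [1.0, 0.5, 0.5],
    ("b", 45): [1.0, 1.0, 0.5],
}
print(best_by_threshold(curves))  # {0: ('s', 1.5), 45: ('b', 1.75)}
```

Because every curve for a given network spans the same range of $n$, the raw areas are directly comparable; no normalization is needed for ranking measures.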

Table 2 lists the first twenty cities ordered according to the best-evaluated measures and compares them side-by-side with the first twenty cities with COVID-19 cases in the BRWN network. Table 3 compares the best measures for SP with each other in the same way. In both networks, the metrics present high-frequency oscillations in the beginning, as shown in Figure 8, but still show some correspondence with the first confirmed cases.

TABLE 2
Cities with at least one case of COVID-19 in Brazil (BRWN) in the order they were documented (10), side-by-side with the top-ranked cities regarding $s$, $k$, and $b$ for $\eta_0$ and $\eta_d$. The best combination is $s$ with $\eta_0$ (second column). Matching cities are colored alike.

Like Table 2, Table 3 also displays cities captured by the three rightmost columns that do not appear in the first, showing their high level of vulnerability: Ribeirão Preto, Jundiaí, Sorocaba, Piracicaba, and Presidente Prudente. All of them nonetheless had documented cases before May 1st. Our study also captured the most influential cities that had cases already at the beginning, such as São Paulo, Campinas, São José do Rio Preto, São José dos Campos, and Taubaté. Other cities appear in the second column (best metric) but not in the first: Praia Grande, São Vicente, São Carlos, Registro, and Sertãozinho.
Due to their importance in mobility, many cities of Table 3, especially in the second column, appear in the April 5th report (17) on the vulnerability of the microregions of São Paulo state to the SARS-CoV-2 pandemic. They appear either as potential spreaders or as places with a high probability of receiving new cases. All of them had notified cases by May 1st, and some have the highest numbers in São Paulo state. Both $s$ and $b$ with $\eta_0$ yield good results at the beginning of the pandemic for the BRWN network, but $s$ alone became the best predictor from the end of April. The most important cities are reached first, due to their high flow of travelers and their role in the most used routes. Cities with smaller flows follow, probably because of the interiorization of the virus, that is, the outbreak reaching the countryside cities. This behavior is even more pronounced in SP, where $s$ under $\eta_d$ is the best option at first, neglecting lower-flow venues, but $\eta_0$ becomes the best option from the end of April.
In the ongoing pandemic, as of May 1st, the $s$ index with $\eta_0$ is the best predictor and may help identify which countryside cities are about to receive new cases. Moreover, it may help in subsequent waves of the disease. In the case of another pandemic, one could first compute the strength of the networks from the most recent IBGE data and identify the top-ranked cities. In Brazil, as we have shown, it is enough to check the strength on the original data. It produces results similar to the betweenness centrality and is computationally cheaper to obtain. For the state of São Paulo, one should check the strength index with threshold $\eta_d$ in the first weeks and only then switch to $\eta_0$. As our results show, the correspondence is statistically significant. Combined with other information about the regions, such as the locations of the first notified cases, it could allow the pandemic to be closely traced.

FINAL REMARKS
We present a complex network-based analysis of the Brazilian inter-city mobility networks aimed at identifying cities that are vulnerable to the spreading of SARS-CoV-2. The networks are built with the 2016 IBGE terrestrial mobility data, which contain the weekly flow of people between cities. The cities are modeled as nodes and the flows as weighted edges, and the geographical graphs, (geo)graphs, are visualized within Geographical Information Systems.
Two scales are investigated: the Brazilian cities without the North region, and the state of São Paulo. The former does not account for the North due to the high number of fluvial routes and some intrinsic local characteristics that are not represented in the terrestrial data. The state of São Paulo is important in the ongoing pandemic since the first documented case was in the state capital, and it is currently one of the main foci of the virus spreading.
Three network measures are studied, namely the strength, degree, and betweenness centrality, under several flow thresholds to account for different mobility intensities. The thresholds range from the original flow data to networks with only the highest-weight edges. We verified that the strength has the best matching with the cities with confirmed COVID-19 cases. Moreover, the strength measure with the original flows proved to be the best option for Brazil. Conversely, a more restrictive threshold yields better correspondence at the beginning of the pandemic in SP. Probably due to the interiorization of the spreading process, a transition is observed after a certain point. From then on, the original flows give better results, since the connections to smaller cities are present only when all flows are accounted for.
Surprisingly, some countryside cities such as Campina Grande (state of Paraíba), Feira de Santana (state of Bahia), and Caruaru (state of Pernambuco) have higher strengths than some state capitals. Furthermore, some cities from the state of São Paulo, such as Presidente Prudente and Ribeirão Preto, are captured in the top-rank positions of all the analyzed network measures under different flow thresholds. Their importance in mobility is crucial, and they are potential super spreaders like the state capitals.
As future work, we intend to also analyze aerial and fluvial mobility data, as they include valuable information about the transport of people and goods. Fluvial data are fundamental to the discussion of the dynamics in the Brazilian North region, especially the Amazon, while aerial data capture long-range connections. Lastly, one could check for correspondences between the network measures and data from other epidemic outbreaks.
