Risk assessment via layered mobile contact tracing for epidemiological intervention

There is strong interest globally amidst the current COVID-19 pandemic in tracing contacts of infectious patients using mobile technologies, both as a warning system to individuals and as a targeted intervention strategy for governments. Several governments, including India, have introduced mobile apps for this purpose, which give a warning when the individual's phone establishes Bluetooth contact with the phone of an infected person. We present a methodology to probabilistically evaluate risk of infection given the network of contacts that individuals are likely to encounter in real life. Instead of binary "infected" or "uninfected" statuses, an infection risk probability is maintained which can be efficiently calculated based on probabilities of recent contacts, and updated when a recent contact is diagnosed with the disease. We demonstrate on realistic networks that this method sharply outperforms a naive immediate-contact method even in an ideal circumstance that all infected persons are known to the naive method. We demonstrate robustness to missing contact information (such as when phones fail to make Bluetooth contact or the app is not installed). We show, within our model, a strong flattening of the infectious peak when even a small fraction of cases are identified, tested and isolated. In the real world, where most known-infected persons are isolated or quarantined and where many individuals may not carry their mobiles in public, we believe the improvement offered by our method warrants consideration. Importantly, in view of widespread concerns about privacy and contact tracing, our method relies mainly on direct contact data that can be stored locally on users' phones, and uses limited communication via intermediary servers only upon testing, mitigating privacy concerns.


Introduction
The COVID-19 coronavirus pandemic, which has expanded from China in December 2019 to affect almost every country in the world by now (April 2020), has led to a strong interest in non-pharmacological interventions to curtail spread. Early efforts in China, Singapore, South Korea and other countries involved extensive testing as well as identification and isolation of contacts of infected individuals and mobile-based alerts [13]. Several governments have also experimented with mobile contact-tracing applications. At [...]

[...] we accomplish this via a recursive function, avoiding cycling back to a previous contact by passing an "ignore list" of contacts in each function call. Additionally, if a contact was met multiple times in the relevant timeframe, an update is performed the same number of times, since each meeting carried a risk.

6. Each individual A's contacts older than $m$ days drop off the contact list, but this does not change $p_A$. That is, if B, who last met A more than $m$ days ago, is diagnosed infected, this is unlikely to require updating $p_A$.

7. Individuals who are recovered are marked as immune. They play no further part in our simulation. In the real world, some instances of re-infection within a short time frame have been noted ([5], and additional cases reported in news media).

Simulation

We simulate an agent-based model on a network, in which agents interact stochastically over time and are categorized as "susceptible", "exposed", "infectious", and "recovered". These are the categorizations of the compartmental SEIR model in epidemiology, discussed and compared in the next subsection. We implement the risk update algorithm on the same agent-based framework and compare the risk profile predicted by our algorithm with the actual infections of the agent-based simulations.

We initialize a population of size $N$, whose individuals are nodes on a weighted network. The network represents all possible contacts in this population: a link indicates two people who may make a contact, and the weight of the link is the probability of their making a contact at a given time. We consider random networks with uniform degree distribution and uniform link weights, Barabási-Albert-structured networks, and networks with family structures and small-world features. Our results are consistent across all these structures. A key parameter of the network, used below, is the average number of contacts per node, $N_c$ (equation 1), where $w_{ij}$ is the weight of the link between $i$ and $j$.

Individuals are marked as susceptible (S), exposed (E: infected but not yet infectious), infectious (I) and recovered (R, assumed immune to future infection). The simulation is initialized with all individuals being uninfected (susceptible) except a small number (e.g., 10 out of 10,000) who are infectious. Each individual is associated with a probability, which is initialized to 1 for infectious individuals and 0 for others.

In each pass of the simulation, which we call an "epoch", every link on the graph is sampled once, and a contact is made with probability equal to the weight of the link. So links weighted 1 (such as family links) are always sampled, while other links may be rarely sampled. After each contact between a pair of individuals, if one is infectious and the other is susceptible, the other is marked "exposed" with a probability $p_t$. For contacts other than S-I, nothing is done.

At the end of each epoch, each individual is sampled for their status. Exposed individuals become infectious with a probability $p_e$ and infectious individuals recover with a probability $p_r$.
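The epoch described above (contact sampling followed by status transitions) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `run_epoch`, the dictionary-based layout of `links` and `status`, and the use of Python's `random.Random` are hypothetical choices, and the risk-score updates are deliberately left out.

```python
import random


def run_epoch(links, status, p_t, p_e, p_r, rng):
    """Advance the true infection states by one epoch, in place.

    links:  dict mapping a node pair (i, j) to its contact probability (link weight)
    status: dict mapping each node to "S", "E", "I" or "R"
    p_t, p_e, p_r: per-contact transmission, E->I and I->R probabilities
    rng:    a random.Random instance (assumed here for reproducibility)
    """
    # Contact phase: every link is sampled once per epoch.
    for (i, j), weight in links.items():
        if rng.random() >= weight:
            continue  # no contact on this link in this epoch
        # Only S-I contacts matter; check both directions.
        for src, dst in ((i, j), (j, i)):
            if status[src] == "I" and status[dst] == "S" and rng.random() < p_t:
                status[dst] = "E"

    # End-of-epoch phase: each individual is sampled for a status change.
    for node, state in list(status.items()):
        if state == "E" and rng.random() < p_e:
            status[node] = "I"
        elif state == "I" and rng.random() < p_r:
            status[node] = "R"


# Tiny usage example with extreme probabilities so the outcome is deterministic.
rng = random.Random(0)
status = {0: "I", 1: "S", 2: "S"}
links = {(0, 1): 1.0, (1, 2): 0.0}
run_epoch(links, status, p_t=1.0, p_e=0.0, p_r=0.0, rng=rng)
print(status)
```

Because `rng.random()` returns values in [0, 1), a link of weight 1 always produces a contact and a link of weight 0 never does, which matches the description of family links being always sampled.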

In parallel with this, at each contact, the probability scores of individuals making contact are updated as described above in "Risk factor evaluation". This is done for each sampled pair of contacts if at least one has a non-zero probability score.

An example of a possible simulation is shown in Figure 1.

We also keep track of a "naive probability" for each individual, which is simply updated every time a known susceptible individual meets a known infected individual. We call this the "naive oracle" approach, since this algorithm does not consider contact with people who have a risk factor, only with truly infectious people, but it knows the true infectious status of the contacted person. In the real world, this is known for only a fraction of infectious people.

Thus, the parameters of the simulation are $p_t$, $p_e$ and $p_r$. However, these are in turn derived from other parameters as follows: $R_0$ is the "basic reproduction number" (see next subsection); $M_d$ is the number of [...]

The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license. This version posted May 1, 2020.

[...] deaths in the population), is usually written as [...]

Here $S$ is the number of susceptible individuals in the population, $E$ is the number of individuals exposed [...] die of the disease may also be counted in $R$ since they are no longer infectious).

We seek to estimate parameters of our simulation from epidemiological measurements in the real world. A key epidemiological parameter is $R_0$, the "basic reproduction number" or the average number [...]

$$R_0 = \frac{k_i N}{k_r} \qquad (7)$$

where $N$ is the total population. This is valid for the SIR model as well as the SEIR model without births and deaths. This assumes a "well-mixed" system, but otherwise the same equation is commonly used with $N$ being an "effective population".

In terms of individual contacts and contact rates, we can alternatively write

$$R_0 = p_t C \qquad (8)$$

where $p_t$ is the transmission probability per contact as above, and $C$ is the total number of contacts while the patient is infectious. $C$ is equal to the rate of contact $R_c$ (per epoch, say) times the average recovery time (also in epochs). So if the contact rate is 100 contacts per epoch, and the recovery time is 10 epochs, then $C = 1000$, and if $R_0 = 2$, then $p_t = 0.002$.
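The arithmetic in this paragraph can be checked with a short sketch. It assumes the relation $R_0 = p_t C$ with $C$ equal to the contact rate times the recovery time, as stated in the text; the function and argument names are hypothetical.

```python
def transmission_probability(r0, contact_rate, recovery_time):
    """Per-contact transmission probability p_t = R0 / C.

    contact_rate and recovery_time must use the same time unit
    (for example contacts per epoch and epochs), so that
    C = contact_rate * recovery_time is a plain count of contacts.
    """
    total_contacts = contact_rate * recovery_time  # C in the text
    return r0 / total_contacts


# Worked example from the text: 100 contacts per epoch, 10 epochs, R0 = 2.
p_t = transmission_probability(r0=2, contact_rate=100, recovery_time=10)
print(p_t)  # 0.002
```

With $R_0 = 3$, 3.77 contacts per epoch and a recovery time of 150 epochs, the same function gives approximately 0.0053, which agrees with the value quoted in the figure caption for the 10,000-node network.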

The average recovery time is $1/k_r$, so comparing the two definitions of $R_0$ (equations 7 and 8), we can identify $k_i N$ = rate of infection = rate of contacts × probability of infection per contact = $R_c p_t$.

Therefore, $p_t = R_0 k_r / R_c$. If, for example, $R_0$ is empirically estimated as 2, the recovery time is taken to be 10 days, and the average rate of contact per day is 100, then we estimate $p_t$ as 0.002. More generally, we take the rate of contact per epoch to be exactly equal to the average number of contacts per node, $N_c$ (equation 1). Then we have

$$p_t = \frac{R_0 k_r}{N_c}$$

which we use in the network simulation. [...]

2. List all possible "events" that can occur which will change the state of the system: there will be $\sum_{\{i \mid s_i = S\}} \sum_{\{j \mid s_j = I\}} C_{ij}$ infection events, $N_E$ exposed-becoming-infectious events, and $N_I$ recovery events possible, for a total of $n$ events. For each of these, compute the rate or probability per unit time for that event to occur, denoted $a_j$ for event $j$, $j \in \{1, 2, \ldots, n\}$. [...]

This process is repeated for as many epochs as desired, usually until the number of exposed and infected [...]

[...] (b) Simulation on a 10,000-node network that includes family units (size 1-5; link weight 1), spreaders (1% of total, 1000 links each, link weight 0.1 each) and links added via the Barabási-Albert method (link each node to a random new node with probability proportional to the current coordination number of the new node; link weight 0.1). This network has 183,599 links. We used $M_d = 10$ epochs per day. The average contact of each node is 3.77/epoch, i.e. 37/day. Parameters were: $R_0 = 3$, $d_e = 5$ days (50 epochs), $d_r = 15$ days (150 epochs), $p_t = 0.0053$ (calculated from above). The simulation over 1500 epochs (150 days) agrees well with an independently implemented Gillespie simulation, but disagrees with the SEIR prediction. Shown is SEIR for effective population size $N_c = 4000$, which gives the best fit but is hard to justify. (c) Simulation on a 100,000-node network with 1,948,709 links, layered similarly to the network in (b), except that the BA links have weight 0.05, with the same parameters except $p_t = 0.0069$ (calculated). The SEIR plotted is for effective population 23,000.

While the object of this exercise is not to predict overall numbers in the population, but to identify in-[...]

Notably, at this time we have no "recovery" for probabilities, other than the testing-and-resetting mechanism discussed further below, not used in this figure. The methodology is expected to be most useful in early stages of an epidemic. [...]

$$\frac{TP}{TP + FP}$$

So, at all epochs shown, the probabilistic method has a TPR of about 67% at an FPR of 50%; the naive oracle performs at less than 50% TPR at 50% FPR in all cases, and its performance grows worse at later epochs (as the infection spreads). Since suspected individuals will be tested via accurate RT-PCR tests, we feel this FPR is acceptable, especially given the effectiveness of a testing+isolation strategy that tests even a small fraction of risky individuals (next section).

Effectiveness of testing and isolation of patients

The naive oracle above is assumed to know the status of every COVID-19 positive patient. Also, we update naive probabilities only on contact between infectious and uninfected patients.

For a more realistic comparison with the real world operation of these methods, we can simulate "testing" of patients, after which they are marked "tested positive" (known infectious) or negative (susceptible). We implement testing at each epoch by selecting a predetermined fraction of all individuals with a high probability to be tested; the test simply looks at their true infected status. If truly infected, [...]

We can also isolate tested-positive patients, by weakening their links to all their contacts. Figure 6 suggests that a test rate of even 10% has a very strong effect in flattening the curve. However, though this suggests the effectiveness of testing and isolation (which has been widely noted [1,14,3] and is being practised by most countries), we caution against drawing quantitative conclusions from our model.

Lossy data

With mobile tracking, it is likely that several individuals will not be carrying their mobile or will not have the app installed, so the probabilistic updates will not occur. Figure 6(b) shows the effect of such missing contacts, implemented by randomly ignoring updates with a given probability. This appears to have negligible effect for up to 60% loss in contacting (40% successfully recorded contacts). [...]

[...] phones, with the goal of isolating them. This is argued [9, 3] to be an effective way to control the outbreak and build "digital herd immunity". We demonstrate in an agent-based simulation on a network that our method is a better predictor, based on TPR/FPR or precision/recall, of truly infected patients compared to a naive first-contact-based prediction, even in an ideal case where the naive method is an "oracle" that always knows the true status of the contact. Our results are robust to loss in detection of contacts, which is expected to be significant in real life. Our simulations show that testing only the most probable individuals ($p > 0.8$) and isolating them (reducing link strength by a factor of 10) strongly flattens the curve of infection. Though we have tried to make our network structure realistic, the real world has several complications over a simulation; nevertheless we expect these results to hold qualitatively.

[...] and last meeting. Infection probability information is exchanged via Bluetooth at the time of contact, but is used only to update one's own probability and need not be stored. Only one step, the "update contacts" procedure that propagates the change in diagnosis of an individual to the individual's contacts, and the contacts' contacts, recursively, requires the means for one mobile phone to communicate to another post-contact. This likely requires the use of an intermediary server, but this use is limited and privacy concerns can be mitigated by using an encrypted protocol and deleting communication request data once the request is carried out.

Overall, our probabilistic contact tracing framework appears to outperform the naive method significantly, whether implemented as an "oracle" that knows all truly infected individuals, or implemented with a testing framework to recognize only positively-tested individuals. While it can be used to identify immediate contacts of a tested individual, it can go further to identify at-risk individuals in the wider population, while also substantially taking care of privacy concerns.

While we focus on the SEIR disease model, more complex models featuring asymptomatic individu-[...]

The network generation and simulation code is available at https://github.com/rsidd120/EpiTracSim.