Misdiagnosis prevents accurate monitoring of transmission and burden for sub-critical pathogens: a case study of Plasmodium knowlesi malaria

Maintaining surveillance of emerging infectious diseases presents challenges for monitoring their transmission and burden. Incomplete observation of infections and imperfect diagnosis reduce the observed sizes of transmission chains relative to their true sizes. Previous studies have examined the effect of incomplete observation on estimates of pathogen transmission and burden. However, each study assumed that, if observed, each infection was correctly diagnosed. Here, I leveraged principles from branching process theory to examine how misdiagnosis could contribute to bias in estimates of transmission and burden for emerging infectious diseases. Using the zoonotic Plasmodium knowlesi malaria as a case study, I found that, even when assuming complete observation of infections, the number of misdiagnosed cases within a transmission chain for every correctly diagnosed case could range from 0 (0 - 4) when R0 was 0.1 to 86 (0 - 837) when R0 was 0.9. Data on transmission chain sizes obtained using an imperfect diagnostic could consistently lead to underestimates of R0, the basic reproduction number, and simulations revealed that such data on up to 1,000 observed transmission chains was not powered to detect changes in transmission. My results demonstrate that misdiagnosis may hinder effective monitoring of emerging infectious diseases and that sensitivity of diagnostics should be considered in evaluations of surveillance systems.


INTRODUCTION
For pathogens with sub-critical transmission (i.e., $R_0 < 1$), a robust surveillance system that identifies and correctly diagnoses infections is necessary to monitor changes in pathogen transmission and burden (1). Such pathogen surveillance is important both for measuring progress towards elimination of diseases with immediate public health importance, such as measles (2-4) and malaria (5), and for assessing the future threat of emerging infectious diseases (6), such as avian influenza (7), human monkeypox (1,8), and Middle East respiratory syndrome coronavirus (2,9).

Considerable work has been devoted to advance a mathematical framework that leverages the data collected by surveillance systems to obtain estimates of transmission and burden for pathogens with sub-critical dynamics (1,2,4,10,11). These studies have improved our understanding of a wide range of emerging infectious diseases and have critically evaluated the sensitivity of these estimates to the quality of data from the surveillance system. Crucially, each study modeled variation in surveillance quality through variation in the ascertainment fraction (i.e., the proportion of infections that are detected) and assumed that, once detected, all infections were correctly diagnosed. In reality, however, non-specific clinical and biological features are likely to limit the sensitivity of clinical diagnosis, particularly for emerging infectious diseases (12,13). The extent to which misdiagnosis affects estimates of transmission and burden for pathogens with sub-critical dynamics remains largely unaddressed.

The zoonotic Plasmodium knowlesi malaria offers a natural case study to examine the impact of misdiagnosis on estimates of transmission and burden. Endemic to Southeast Asia [...]

Assuming that the number of secondary infections caused through one generation of pathogen transmission followed a negative binomial distribution with mean $R_0$ and dispersion parameter $k$, I used the branching framework to calculate summary statistics of the transmission chains. Specifically, I solved for the probability that a transmission chain was truly of size $j$ infections, $r_j$, and the mean size of transmission chains. [...]

The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

In eq. (1), $\Pr(j)$ is the probability that a transmission chain is of size $j$, $r_j$, computed using eq. (S1), and $\Pr(\hat{j})$ is the probability that a transmission chain is of observed size $\hat{j}$, computed using eq. (S2) for the model of independent observation and using the numerator of eq. (S7) for the model of size-dependent observation. For the model of independent observation, the probability of observing a transmission chain of size $\hat{j}$ given that the transmission chain is truly of size $j$ is [...]
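As a sketch of how these quantities fit together, assume that eq. (1) applies Bayes' rule and that, under independent observation, each of the $j$ infections in a chain is recorded with the same probability. That probability is written $p$ here, a symbol chosen only for illustration. Under those assumptions, the relationship could be written as:

```latex
\Pr(j \mid \hat{j}) = \frac{\Pr(\hat{j} \mid j)\,\Pr(j)}{\Pr(\hat{j})},
\qquad
\Pr(\hat{j} \mid j) = \binom{j}{\hat{j}}\, p^{\hat{j}} (1 - p)^{j - \hat{j}},
\quad 0 \le \hat{j} \le j .
```

Read this way, the number of infections missing from an observed chain of size $\hat{j}$ is $j - \hat{j}$, weighted by the posterior $\Pr(j \mid \hat{j})$.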

Effect of Misdiagnosis on Statistical Power to Detect Changes in Transmission
Bias in $R_0$ estimates on account of misdiagnosis could reduce the statistical power to detect changes in $R_0$ over time using data on the size of transmission chains. To measure statistical power as a function of the number of observed transmission chains, I followed an approach taken by Blumberg et al. (2). I assumed that $R_0$ was historically equal to 0.1 and then increased to $R_0 + \Delta R_0$, where $\Delta R_0$ was set to 0.1, 0.5, or 0.9. I then simulated observed transmission chains and estimated $\hat{R}_0$ while varying the number of observed transmission chains from 1 to 1,000. I then compared the model in which $R_0$ was estimated to have changed to the null hypothesis that there was no change in transmission (i.e., [...]

medRxiv preprint doi: https://doi.org/10.1101/2021.09.13.21263501; this version posted September 16, 2021.

[...] average observed size of transmission chains increased as more infections were observed (Fig. 2, left column). By contrast, $\hat{R}_0$ estimates decreased with increasing probability of observation for the model of size-dependent observation (Fig. 2, right column). This counterintuitive effect can be explained by the observation that, if the probability of observation is low, larger transmission chains have a greater probability that at least [...]

The underestimates of burden (Fig. 1) and transmission (Fig. 2) indicated that misdiagnosis of P. knowlesi may affect the statistical power to detect changes in transmission based on the size of observed transmission chains. To test this, I simulated changes in transmission and measured the statistical power to detect that change. I observed that, under scenarios in which $R_0$ increased by 0.9, data on 1,000 observed transmission chains provided only 10.3% power using an imperfect diagnostic method, compared to 100% if using a perfect diagnostic method (Fig. 3). At smaller increases in $R_0$, data on observed transmission chain sizes obtained using an imperfect diagnostic method had effectively no power to detect a change in transmission.

[...] (1,2,7,11). In [...] which we would incorrectly conclude that human-to-human transmission of the pathogen was unlikely to be occurring. For every scenario considered, the estimate of $R_0$ was less than the true value, indicating that bias due to misdiagnosis exceeds the competing positive bias from incomplete observation when assuming size-dependent observation (1). For pathogens such as P. knowlesi, these simulation results suggest that, in settings where misdiagnosis is common, the extent of human-to-human transmission could be greater than previously thought. To date, it has been believed that nearly all cases of P. knowlesi in humans are caused by spillover from long-tailed and pig-tailed macaques, the zoonotic reservoir (17). The lack of observed human-to-human transmission may be explained by multiple factors, including low parasite densities in humans (16) and restricted vector habitat preference (15), and is supported by a lack of genetic diversity across human P. knowlesi infections (22). Nevertheless, human-to-human transmission of P. knowlesi has been demonstrated experimentally (23), and these results suggest that, if or when human-to-human transmission occurs, misdiagnosis could cause us to underestimate its magnitude.
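The power analysis described in this section can be illustrated with a minimal simulation sketch. It is not the author's code and rests on several assumptions: the dispersion parameter `k` is treated as known, each infection is recorded independently with probability `p_det` (the model of independent observation), chains with no recorded case are dropped, and a change is declared when a likelihood-ratio test between "one shared $R_0$" and "separate $R_0$ per period" exceeds the chi-square critical value with one degree of freedom. All function and parameter names are hypothetical.

```python
import math
import random

CHI2_1DF_95 = 3.841  # 95th percentile of the chi-square distribution, 1 degree of freedom


def poisson(rng, lam):
    """Knuth's Poisson sampler; adequate for the small means that arise here."""
    threshold, count, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count


def true_chain_size(rng, r0, k):
    """Size of one chain seeded by a single infection (negative binomial offspring)."""
    size = active = 1
    while active:
        # Summed offspring of `active` cases: gamma-Poisson mixture with shape k * active.
        active = poisson(rng, rng.gammavariate(k * active, r0 / k))
        size += active
    return size


def observed_sizes(rng, n_chains, r0, k, p_det):
    """Recorded sizes of n_chains chains; each case is recorded with probability p_det."""
    sizes = []
    while len(sizes) < n_chains:
        seen = sum(rng.random() < p_det for _ in range(true_chain_size(rng, r0, k)))
        if seen:  # chains with no recorded case never reach the surveillance data
            sizes.append(seen)
    return sizes


def r0_mle(sizes):
    """Maximum-likelihood R0 when recorded sizes are taken at face value: 1 - 1/mean."""
    return 1.0 - len(sizes) / sum(sizes)


def log_lik(sizes, r0, k):
    """Log-likelihood of chain sizes under the negative binomial chain-size formula."""
    if r0 <= 0.0:
        return 0.0  # r0_mle is 0 only when every chain has size 1 (likelihood 1)
    a = r0 / k
    return sum(
        math.lgamma(k * j + j - 1) - math.lgamma(k * j) - math.lgamma(j + 1)
        + (j - 1) * math.log(a) - (k * j + j - 1) * math.log1p(a)
        for j in sizes
    )


def detects_change(hist, cur, k):
    """Likelihood-ratio test: separate R0 per period versus one pooled R0."""
    pooled = hist + cur
    stat = 2.0 * (log_lik(hist, r0_mle(hist), k) + log_lik(cur, r0_mle(cur), k)
                  - log_lik(pooled, r0_mle(pooled), k))
    return stat > CHI2_1DF_95


def power(n_chains, r0_hist, delta, k, p_det, n_rep=100, seed=1):
    """Fraction of simulated data sets in which the change in R0 is detected."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rep):
        hist = observed_sizes(rng, n_chains, r0_hist, k, p_det)
        cur = observed_sizes(rng, n_chains, r0_hist + delta, k, p_det)
        hits += detects_change(hist, cur, k)
    return hits / n_rep
```

Calling `power(100, 0.1, 0.5, 1.0, p_det=1.0)` and then repeating with a smaller `p_det` gives a rough sense of how power responds to diagnostic quality under these assumptions. The numbers it produces are illustrative and should not be compared directly with the values reported in Fig. 3.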
Finally, I demonstrated that data on the sizes of transmission chains diagnosed using a diagnostic with realistic sensitivity would be insufficient to monitor changes in transmission. [...] were based upon simulated data only. I used simulations representative of P. knowlesi to illustrate possible outcomes that may occur due to misdiagnosis (19,20), yet I lacked empirical data on the distribution of transmission chain sizes for P. knowlesi. As such, this analysis does not estimate the true extent of human-to-human transmission of P. knowlesi. Second, methods [...]

Complete Observation and Correct Diagnosis
For a pathogen with sub-critical transmission dynamics, I modeled the number of offspring caused by a single infection through one generation of transmission as a negative binomial distribution with mean $R_0$ and dispersion parameter $k$ (1,11). Therefore, it follows that transmission chains of size $j$ occur with probability,

$$ r_j = \frac{\Gamma(kj + j - 1)}{\Gamma(kj)\,\Gamma(j + 1)} \, \frac{(R_0/k)^{j-1}}{(1 + R_0/k)^{kj + j - 1}}. \tag{S1} $$

In eq. (S1), $\Gamma(\cdot)$ is the gamma function. Because $R_0 < 1$, the mean transmission chain size can be calculated as the sum of a geometric series with common ratio $R_0$ and is equal to $\frac{1}{1 - R_0}$.
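The chain-size distribution and its mean can be checked numerically. The sketch below assumes the standard chain-size formula for a branching process with negative binomial offspring. The function names and the truncation point `j_max` are illustrative choices, not part of the original analysis.

```python
import math


def chain_size_pmf(j, r0, k):
    """Probability that a chain has total size j (negative binomial offspring)."""
    log_p = (
        math.lgamma(k * j + j - 1) - math.lgamma(k * j) - math.lgamma(j + 1)
        + (j - 1) * math.log(r0 / k)
        - (k * j + j - 1) * math.log1p(r0 / k)
    )
    return math.exp(log_p)


def summarize(r0, k, j_max=2000):
    """Total probability mass and mean chain size, truncated at j_max."""
    probs = [chain_size_pmf(j, r0, k) for j in range(1, j_max + 1)]
    total = sum(probs)
    mean = sum(j * p for j, p in enumerate(probs, start=1))
    return total, mean


total, mean = summarize(r0=0.5, k=0.5)
print(total, mean, 1.0 / (1.0 - 0.5))
```

For sub-critical values of `r0` the truncated mass should be very close to 1 and the truncated mean very close to $1/(1 - R_0)$. Values of `r0` near 1 have heavier tails and would need a larger `j_max`.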

Incomplete Observation and Imperfect Diagnosis
In the case of P. knowlesi and many other pathogens, the size of transmission chains that are identified by a surveillance system will be affected by two factors. First, infections in a transmission chain may not present within the health system, due to a lack of symptoms or access to treatment. Second, infections in the transmission chain that do present within the health system may be misdiagnosed and thus inaccurately recorded within the surveillance system. [...]

The first model of incomplete observation and diagnosis assumes that each individual is subject to an independent probability equal to the product of the observation probability and the sensitivity of the diagnostic method. Therefore, the probability that we observe and correctly diagnose $\hat{j}$ cases from a transmission chain is equal to [...] where $r_j$ is the probability that a transmission chain is of true size $j$, calculated using eq. (S1). The probability that a transmission chain is of observed size $\hat{j}$ is equal to the normalized [...] The probability that a transmission chain is of observed size $\hat{j}$ is then equal to [...]
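The independent-observation model can be sketched numerically as follows. This is an illustration under stated assumptions rather than a reconstruction of the paper's equations: each infection is recorded with probability `p_obs * sensitivity` (binomial thinning of the true chain size), and the observed-size distribution is normalized over chains with at least one recorded case. The names `p_obs`, `sensitivity`, and `j_max` are hypothetical.

```python
import math


def chain_size_pmf(j, r0, k):
    """Probability that a chain has true size j (negative binomial offspring)."""
    return math.exp(
        math.lgamma(k * j + j - 1) - math.lgamma(k * j) - math.lgamma(j + 1)
        + (j - 1) * math.log(r0 / k)
        - (k * j + j - 1) * math.log1p(r0 / k)
    )


def observed_size_pmf(r0, k, p_obs, sensitivity, j_max=300):
    """Distribution of recorded chain sizes; list index equals the recorded size."""
    p_det = p_obs * sensitivity  # observed AND correctly diagnosed
    raw = [0.0] * (j_max + 1)
    for j in range(1, j_max + 1):
        r_j = chain_size_pmf(j, r0, k)
        for j_hat in range(j + 1):
            # Binomial thinning: j_hat of the j true infections are recorded.
            raw[j_hat] += (
                r_j * math.comb(j, j_hat) * p_det ** j_hat * (1.0 - p_det) ** (j - j_hat)
            )
    seen = sum(raw[1:])  # chains with zero recorded cases are never observed
    return [0.0] + [p / seen for p in raw[1:]]


pmf = observed_size_pmf(r0=0.5, k=0.5, p_obs=1.0, sensitivity=0.5)
mean_observed = sum(size * p for size, p in enumerate(pmf))
print(mean_observed, 1.0 / (1.0 - 0.5))
```

With complete observation but imperfect sensitivity, the mean recorded chain size falls below the true mean $1/(1 - R_0)$, which is the direction of bias the text describes.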