Ongoing outbreak of COVID-19 in Iran: challenges and signs of concern

Since the first outbreak in China, the Coronavirus Disease 2019 (COVID-19) has rapidly spread around the world. Iran was one of the first countries outside of China to report infections with COVID-19. With nearly 100 exported cases to various other countries, it has since been the epicentre of the outbreak in the Middle east. By examining the age-stratified COVID-19 case fatality rates across the country and 14 university hospitals in Tehran, we find that, in younger age groups, the reported cases on 13/03/2020 only capture less than 10% of symptomatic cases in the population. This indicates significant levels of under-reporting in Iran. Using the 18 full-genome sequences from cases with a travel history or link to Iran, as well as the one full genome sequence obtained from within the country, we estimate the time to the most recent common ancestor of sequences which suggests the likely start of the outbreak on 21/01/2020 (95% HPD: 05/12/2019 - 14/02/2020) with an approximate doubling time of 3.07 (95% HPD: 1.68 - 16.27). Also, based on known exported cases to Oman, Kuwait, Lebanon, and China, we estimate the outbreak size on 25 February and 6 March to be around 13,700 (95% CI: 7,600 - 33,300) and 60,500 (43,200 - 209,200), respectively. Knowing the size of the outbreak at two time points and the typical doubling times associated with the COVID-19 epidemics in countries across Europe and North America, we can independently verify that the likely start of epidemic in Iran is around 15/01/2020 (27/12/2019 - 24/01/2020). Our assessment of the fate of the epidemic based on current levels of non-pharmaceutical interventions implemented by the government suggests upward of 10 million cases (IQR: 6.7M - 18M) and 100,000 ICU beds required (IQR: 77K - 140K) during the peak of the epidemic with more than 100,000 cumulative deaths (IQR: 180K - 240K). We also predict a peak in demand for ICU beds on 21/04/2020 (IQR: 06/04/2020 - 23/05/2020). The large span of the peak of the ICU demand is a result of two separate peaks, with the first occurring at around 15/4/2020 and the second in approximately a months time. The latter is also expected to last longer and is based on the relatively relaxed social distancing measures in place. The exact magnitude and timing of the peaks strictly depends on levels of interventions and can change significantly upon new information or change of policy. We caution that a lack of, or relaxed, stringent intervention measures, during a period of highly under-reported spread, would likely lead to the healthcare system becoming overwhelmed in the next few months.

growing in the country. Another possible explanation for the second peak is that changes of population behaviour (e.g. due to control or awareness of increasing deaths) can lead to a reduction in transmission, which, naturally, leads to a drop in reported infections while 141 incidence in deaths follows this reduction with some delay in time.

143
Age-stratified analysis of fatality rates in Iran: evidence of significant 144 under-reporting 145 We examine the only nationwide data available for age-specific naïve Case Fatality Ratio between nCFR in Iran and China across all age groups (see Fig. 3a). In particular, Iran's 156 nCFR for < 50 years-old is the same as those reported from hospitalised patients in Tehran 157 and more than 10 times higher than China which suggests that Iran is only reporting severe 158 cases of the infection. There also seems to be an underestimation of deaths in the > 80 159 age-group in Iran compared to those in China. We also note that nCFR will likely be an 160 underestimate of the true CFR in Iran because the results from all cases are not known 161 yet and the epidemic is still ongoing. To adjust for such biases in CFR, e.g. using a CFR 162 correction method developed by Russell et al. [2020], we need to have temporal data, but 163 we were not able to obtain more than one data point as of 10/04/2020. 164 A recent analysis by researchers at Imperial College London [Triggle, 2020] showed 165 that the age-specific risk of dying from COVID-19 greatly overlaps with 'normal' annual 166 death rate in males and females. This suggests those dying from COVID-19 could have 167 died within that same year and that SARS-CoV-2 has accelerated this. However, we see 168 a completely different picture in Iran. Figure 3b shows a significant increase in death rate 169 due to COVID-19 compared to the annual rates in almost all age categories which is also 170 likely due to under-reporting of cases in Iran. 171 5 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 23, 2020. . https://doi.org/10.1101/2020

Phylogenetic analysis
We used 20 whole genome sequences obtained from patients that had travelled out of Iran during the current epidemic (and their infections were reported outside the country), combined with the first whole genome sequence obtained from Iran, in order to estimate pa-176 rameters of the epidemic using population genetic models implemented in BEAST [Suchard 177 et al., 2018]. We reconstructed a maximum liklelihood phylogenetic tree of all sequences 178 that were linked to Iran using PhyML [Guindon et al., 2010] (see Fig. 4). Two of the se-179 quences were from epidemiologically linked patients, and were therefore removed from the 180 dataset as they would bias estimators based on the coalescent, resulting in 19 sequences 181 overall including the genome from Iran. We have highlighted those in the metadata in the 182 supplementary files. We used the known sampling times, with a CTMC rate reference prior 183 on the rate, in order to estimate the evolutionary rate from this particular sample from 184 serial samples. We implemented the exponential growth model, with a lognormal prior on 185 population size (mean=1, stedev=2), and the growth rate implementation with a laplace 186 prior (scale=100), and an HKY+G model of substitution. Our inferred substitution rate 187 is consistent with other studies [Boni et al., 2020, Rambaut, 2020 Table 3). The method used by CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted April 23, 2020. . https://doi.org/10.1101/2020 al. [2020], but it still does not account for the variation in catchment population size of 210 international airports in Iran and the expected exposure time of those exported passengers.

211
Here, we estimate the outbreak size in late February and early March using a detailed list of flight information from all airlines and try to accurately estimate the flux of passengers to/from four international airports in Iran with the highest number of weekly international flights to countries reporting exported cases from Iran (see Table 3, and supplementary materials) before the Gulf Cooperation Council countries suspended their flights to Iran. To capture the variation that exposure time, t, and catchment population size, M , create on the probability of having an exported case on a flight, p = tD M , where D is the daily passenger flux, we use a Beta-binomial distribution where the compound Beta function is defined on the random variable ζ = t M and a support of ( t min Mmax , tmax M min ) where t min = 20, t max = 50, M min = 4 × 10 7 , and M max = 5.56 × 10 7 . Then, for each country, i, in Table 3 with n i cases and corresponding success probability p i of finding a case, we can calculate the expected outbreak sizeλ using the likelihood function given bŷ where g(p) is the Beta distribution for the country i with shape parameters α = β = 2 212 (empirically fitted). To account for asymptomatic cases going undetected at the airports, 213 we further assume that the 'true' number of exported cases, accounting for asymptotic or 214 mildly symptomatic cases, is twice those reported -we note that this is a conservative 215 approximation in that the true percentage of symptomatic cases is likely higher. , and official numbers on those dates (see Fig. 6b).

228
Burden of the epidemic on the healthcare system  Table 1 and Table 2. We assume that a few (1-2) infected individuals sparked 233 7 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 23, 2020. . https://doi.org/10. 1101/2020 tible individuals who become exposed to SARS-CoV-2 after an effective contact with an 235 infectious person. After an incubation period of 4 days, exposed individuals of age-group 236 a will develop either a clinical infection with probability 1 − m a and become hospitalised, 237 or experience a subclinical infection with probability m a where they recover or become 238 severely ill after 5 days. Those hospitalised may recover or be moved to the ICU stage 239 after 8 days where they may either die or stabilise after 10 days. We assume that removed 240 individuals are immune to reinfection over the period we simulated the epidemic (1 year) -241 we still do not know how long-lasting immunity will be to COVID-19, but we used similar suggests that the contribution from importation is likely going to be very small.

248
In our modelling analysis, we considered a range of scenarios of the epidemic from 249 moderate to high levels of intervention efficacy according to set dates in policy announce-250 ments by the government (Table 4). Our finding suggests that the current policy measures we project 221,000 cumulative deaths (IQR: 1.8 × 10 5 -2.4 × 10 5 ) (see Fig. 7b). We also 258 find there is likely going to be a second peak 40-80 days following the first peak in early is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 23, 2020. . https://doi.org/10. 1101/2020 at least, a month prior to first official reports in Iran. We further estimate that, based on travel data, by 25 February there were approximately 13,700 (95% CI: 7,600 -33,300) 273 and by 6 March 60,500 (43,200 -209,200) cases in Iran, which is about a hundred times 274 higher than the official numbers at that time. The large discrepancy is partly due to late 275 detection as well as limitations in testing capacity particularly during the early stages of indication of a major outbreak in those countries. We cannot rule out under-reporting in 307 those countries, but it has also been suggested that temperature and humidity could play 308 a role in sustaining an epidemic in a country, although the effect of these is debated and 309 may be small [Ficetola and Rubolini, 2020, Notari, 2020, Sajadi et al., 2020 Any effects 310 of climate are likely to be most relevant to particular outbreaks in specific locations and 311 are unlikely to be relevant to the overall pandemic outbreak given a susceptible population 312 9 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 23, 2020. . https://doi.org/10. 1101/2020 the pandemic due to their climate, whereas Iran has more ideal conditions for COVID-314 19 spread. Nevertheless, significant cases are expected in the coming months in warmer 315 countries and cases originating in Iran could further exacerbate the spread of the pandemic 316 within the broader region. Thus, Iran could become a regional reservoir and contribute to 317 a second wave of epidemics in the Middle East.

318
Supporting materials 319 We enclosed a supplementary file to this paper. is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 23, 2020. . https://doi.org/10. 1101/2020    11 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 23, 2020. . The government announced that the majority of workers must return to work while they must observe some level of social distancing. We have assumed that the re-opening results in up to 20% drop in effectiveness of mitigation measures M (t).
05/04/2020 [BBC Persian, 2020d] 12 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 23, 2020. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 23, 2020. . https://doi.org/10.1101/2020 (a)  14 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 23, 2020. . https://doi.org/10.1101/2020

15
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 23, 2020. . https://doi.org/10.1101/2020 Figure 4: Maximum likelihood phylogenetic tree of samples linked to Iran. The labels include the sampling times used in the BEAST analysis. Red labels indicate the two epidemiologically linked samples that were excluded from the subsequent analysis.

16
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 23, 2020.

17
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 23, 2020. show the summary of the analysis in total deaths, peak number of cases and ICU beds required, and the peak time since the seeding event for cases and ICU beds required times.

18
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 23, 2020. . https://doi.org/10.1101/2020 Robertson. Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for 22 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 23, 2020. . https://doi.org/10.1101/2020 Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 23, 2020. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 23, 2020. Benford's law (BL) gives the probability distribution of leading digits in a determined set 538 of numbers [Benford, 1938]. The probability that digit d = 1, 2, ..., 9 is a leading number 539 is given by P (d) = log10(1 + (1/d)) where numbers with leading digit 1 have the highest 540 probability of appearance and this probability steadily decreases as the starting digit be-541 comes larger. Benford's law is used in a variety of different areas to study irregularities in 542 data [Brown, 2005, Durtschi et al., 2004 and is also frequently used in medicine to assess 543 the quality of data [Crocetti andRandi, 2016, Idrovo et al., 2011]. 544 Figure 8a shows the distribution of leading digits in data acquired from Iran, the USA, 545 and the UK compared to the Benford distribution. The observed data in Iran is the cases 546 reported by each province from 18/02/2020 to 20/03/2020. We find numbers that are 547 larger than 9 in the data to take out any bias that may come into the distribution from 548 single digit numbers. We find a total of 397 numbers larger than 9 from which we then 549 sample 300 random numbers 8 times to generate an average distribution with error bars 550 representing the standard deviation. Similarly, we find 19494 numbers larger than 9 from 551 the data of US states from 22/01/2020 to 11/04/2020 and 619 numbers from the data 552 of English provinces from 09/03/2020 to 11/04/2020. We sample the numbers obtained 553 from the US and the UK in a similar way. We then normalise the frequency distribution 554 by the sample size of 300 to obtain the probability distribution shown in Fig. 8a. The 555 probability distribution does not show any conclusive evidence to suggest a manipulation 556 of data in any of the three countries. The low or inaccurate number of reported cases in Iran 557 are therefore likely due to limitations in testing and failure to detect infected individuals.

558
While this method can be used to test if data manipulation has occurred it does not give 559 any information about the deliberate absence of data, for example not reporting deaths 560 from specific hospitals [Goodman, 2016]. 561 Figure 8b shows the distribution of leading numbers for the total daily reported deaths 562 across the whole country for Iran, the US, and the UK until 11/04/2020. There are a total 563 of 42 reported numbers larger than 9 in Iran, 28 in the US, and 29 in the UK. We randomly 564 sample a section of numbers larger than 9 equal to the smallest available data which is 28 565 numbers larger than 9 from the US. The frequency distribution of the leading number is 566 then normalised by 28 to obtain a probability distribution. There are no error bars since 567 the available data is small and sampling was only done once. The daily reported number 568 of deaths in the entire country is perhaps not a reliable statistic to study for comparison 569 to Benford's law since the available data is too small. But since Iran does not report 570 deaths for individual provinces and cities, the total number of daily deaths in the country 571 is the only statistic available of deaths due to COVID-19 form Iran and the only means 572 of comparison to data from other countries. We still observe a significant distribution of 573 numbers leading with 1 which can be an explanation for Fig. 2c and why Iran has a fatality 574 26 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted April 23, 2020.

27
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted April 23, 2020. . https://doi.org/10. 1101/2020  (Right) Binomial likelihood analysis for each country assuming 100% detection (green) and 50% detection (blue). Red dots represent the maximum likelihood and 95% confidence interval.

28
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted April 23, 2020. . https://doi.org/10. 1101/2020