Abstract
While the first infection of an emerging disease is often unknown, information on early cases can be used to date it. In the context of the COVID-19 pandemic, previous studies have estimated dates of emergence (e.g., the first human SARS-CoV-2 infection, the emergence of the Alpha SARS-CoV-2 variant) mainly using genomic data. Another dating attempt used a stochastic population dynamics approach and the date of the first reported case. Here, we extend this approach to use a larger set of early reported cases to estimate the delay from the first infection to the Nth case. We first validate our model using data on Alpha variant infections in the UK, dating the first Alpha infection at (median) August 21, 2020 (95% interquantile range across retained simulations, IqR: July 23 – September 5, 2020). Next, we apply our model to data on COVID-19 cases with symptom onset before mid-January 2020. We date the first SARS-CoV-2 infection in Wuhan at (median) November 28, 2019 (95% IqR: November 2 – December 9, 2019). Our results fall within ranges previously estimated by studies relying on genomic data. Our population dynamics-based modelling framework is generic and flexible, and can thus be applied to estimate the starting time of outbreaks in contexts other than COVID-19.
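To make the back-dating idea concrete, here is a minimal Python sketch under simplifying assumptions; it is not the authors' implementation (their code is in the repository listed under Data Availability). It assumes a branching process seeded by one infection at t = 0, with Poisson offspring of mean `r0`, gamma-distributed generation intervals (`gen_shape`, `gen_scale`), and each infection independently reported with probability `p_report` and dated at its infection time. Here "retained" simply means the simulated chain reached N cases; the authors' rejection criteria, which compare the timing of several early cases, are not reproduced. All function names, parameters and example values are hypothetical. The median and the 2.5%/97.5% quantiles of the simulated delays are subtracted from the date of the Nth case.

```python
# Illustrative sketch only: model structure, names and parameters are hypothetical.
import heapq
import math
import random
import statistics
from datetime import date, timedelta


def poisson(lam: float, rng: random.Random) -> int:
    """Poisson draw via Knuth's multiplication method (adequate for small lam)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def delay_to_nth_case(n_cases, r0, gen_shape, gen_scale, p_report, rng,
                      max_infections=100_000):
    """Simulate one outbreak seeded by a single infection at t = 0.

    Each infection is reported with probability p_report and causes
    Poisson(r0) secondary infections after gamma-distributed generation
    intervals. Returns the infection time (days) of the n-th reported case,
    or None if the chain dies out (or hits max_infections) before that.
    """
    pending = [0.0]  # min-heap of infection times, so cases are counted in time order
    n_infected = n_reported = 0
    while pending:
        t = heapq.heappop(pending)
        n_infected += 1
        if rng.random() < p_report:
            n_reported += 1
            if n_reported == n_cases:
                return t
        if n_infected >= max_infections:
            return None
        for _ in range(poisson(r0, rng)):
            heapq.heappush(pending, t + rng.gammavariate(gen_shape, gen_scale))
    return None


def estimate_first_infection(nth_case_date, n_cases, n_sims=1000, seed=1, **params):
    """Back-date the first infection from the date of the n-th case."""
    rng = random.Random(seed)
    # Keep only simulations whose chain reached n_cases (extinct chains are dropped).
    delays = [d for d in (delay_to_nth_case(n_cases, rng=rng, **params)
                          for _ in range(n_sims)) if d is not None]
    if len(delays) < 2:
        raise ValueError("too few simulations reached n_cases")
    cuts = statistics.quantiles(delays, n=40)  # cut points at 2.5%, 5%, ..., 97.5%

    def back_date(delay):
        return nth_case_date - timedelta(days=round(delay))

    # A longer delay means an earlier first infection, so the bounds swap.
    interval = (back_date(cuts[-1]), back_date(cuts[0]))
    return back_date(statistics.median(delays)), interval, len(delays)


if __name__ == "__main__":
    # Hypothetical inputs for illustration only; not fitted to any real data.
    median, (earliest, latest), kept = estimate_first_infection(
        date(2020, 1, 15), n_cases=30,
        r0=2.5, gen_shape=2.0, gen_scale=3.0, p_report=0.15)
    print(f"{kept} simulations retained; first infection ~{median} "
          f"(95% IqR: {earliest} to {latest})")
```

The heap keeps infections in chronological order because every secondary infection occurs after its infector, so the Nth reported infection popped from the heap is also the Nth in time.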
Author summary
While the first infection of an emerging disease is often unknown, information on early cases can be used to date it. In the context of the COVID-19 pandemic, previous studies have estimated dates of emergence of epidemic outbreaks (e.g., first human SARS-CoV-2 infection, emergence of the Alpha SARS-CoV-2 variant) using mainly genomic data. Another dating attempt used a population-level stochastic approach and the date of the first reported case. Here, we extend this generic and flexible approach to use a larger set of early reported cases to estimate the time elapsed between the first infection and the Nth case. Our model dates the first Alpha infection at around August 21, 2020, and the first SARS-CoV-2 infection in Wuhan at around November 28, 2019. Our findings fall within ranges previously estimated by studies relying on genomic data.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
SJ's postdoctoral fellowship was funded by a grant from the MODCOV19 platform of the National Institute of Mathematical Sciences and their Interactions (Insmi, CNRS) to FD. FD was funded by ANR-19-CE45-0009 (TheoGeneDrive). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The study used ONLY openly available human data. Data on early Alpha cases were retrieved from the Global Initiative on Sharing Avian Influenza Data (GISAID), available at doi.org/10.55876/gis8.230104xg. For comparison to our results, V. Hill and J. Pekar shared their previously published results, available at doi.org/10.1093/ve/veac080 and doi.org/10.1126/science.abp8337, respectively.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
Following comments by anonymous reviewers, we made the following major changes. i) We generated synthetic data and tested our framework on it to validate the approach. ii) We compared our optimized approach to a more classical but computationally more intensive Approximate Bayesian Computation (ABC) approach, and found similar results. iii) We revised and simplified our rejection criteria. iv) We added sensitivity analyses, changing both the values of input parameters and the datasets used. Our findings were robust to these changes.
Data Availability
All data and code needed to reproduce our results and the corresponding figures are available in a public GitHub repository: https://github.com/sjijon/estimate-emergence-from-data.