Abstract
Since the beginning of the COVID-19 pandemic, daily counts of confirmed cases and deaths have been publicly reported in real time to help control the spread of the virus. However, substantial numbers of undocumented infections have obscured the true prevalence of the virus. A machine learning framework was developed to estimate the time courses of actual new COVID-19 cases and current infections in 50 countries and 50 U.S. states from reported test results and deaths, as well as published epidemiological parameters. Severe under-reporting of cases was found to be universal. Our framework projects that in countries such as Belgium, Brazil, and the U.S., ∼10% of the population has been infected at some point. In U.S. states such as Louisiana, Georgia, and Florida, more than 4% of the population is estimated to be currently infected as of September 3, 2020, while in New York the fraction is 0.12%. Estimating the actual fraction of currently infected people is crucial for defining public health policies, which up to this point may have been misguided by reliance on confirmed cases.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This work was supported by Lyda Hill Philanthropies.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Not Applicable
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
All code, daily updated estimates, and their visualizations are freely available in a GitHub repository (https://github.com/JungsikNoh/COVID19_Estimated-Size-of-Infectious-Population).