Assessing the Performance of COVID-19 Forecasting Models in the U.S.

Kyle J. Colonna; John S. Evans

doi:10.1101/2020.12.09.20246157

ABSTRACT

Dozens of coronavirus (COVID-19) forecasting models have been created; however, little information exists on their performance. Here we examine the performance of nine commonly used COVID-19 forecasting models, as well as equal- and performance-weighted ensembles, based on their knowledge (accuracy and precision) and their ‘self-knowledge’ (‘calibration’ and ‘information’). Calibration and information are measures commonly employed in structured expert judgment to assess an expert’s ability to meaningfully communicate the extent and limits of their knowledge.¹ Data on observed COVID-19 mortality in four states, selected to reflect differences in racial composition and COVID-19 case rates, over eight weeks in the summer of 2020 provided the basis for evaluating model predictions.
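To make the notion of calibration concrete, the following is a minimal sketch of a calibration score in the style of Cooke's Classical Method. It is not the EXCALIBUR implementation used in the study. It assumes each forecast is summarized by its 5th, 50th, and 95th percentiles (a common convention in structured expert judgment, not stated in this abstract), and the function names are hypothetical.

```python
import math

# Assumption: forecasts are summarized by the 5th, 50th and 95th percentiles,
# so a well-calibrated forecaster sees observations fall into the four
# inter-quantile bins with these probabilities.
EXPECTED = (0.05, 0.45, 0.45, 0.05)


def bin_index(q05, q50, q95, observed):
    """Return which inter-quantile bin the observation falls into (0..3)."""
    if observed <= q05:
        return 0
    if observed <= q50:
        return 1
    if observed <= q95:
        return 2
    return 3


def calibration_score(forecasts, observations):
    """Calibration score: p-value of the hypothesis that the observed bin
    frequencies were drawn from EXPECTED.

    forecasts: list of (q05, q50, q95) tuples; observations: list of numbers.
    """
    n = len(observations)
    counts = [0, 0, 0, 0]
    for (q05, q50, q95), obs in zip(forecasts, observations):
        counts[bin_index(q05, q50, q95, obs)] += 1
    freqs = [c / n for c in counts]
    # Relative information (Kullback-Leibler divergence) of the empirical
    # frequencies with respect to the expected ones; empty bins contribute 0.
    divergence = sum(s * math.log(s / p)
                     for s, p in zip(freqs, EXPECTED) if s > 0)
    statistic = max(2.0 * n * divergence, 0.0)
    # 2*n*divergence is asymptotically chi-square with 3 degrees of freedom
    # (4 bins - 1). For 3 degrees of freedom the upper tail has a closed form.
    return (math.erfc(math.sqrt(statistic / 2.0))
            + math.sqrt(2.0 * statistic / math.pi) * math.exp(-statistic / 2.0))
```

Under these assumptions, a model whose observations land in the bins in roughly the expected proportions scores near 1, while a model whose observations all exceed its 95th percentile (systematic under-prediction with overly narrow intervals) scores near 0.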

Only two models showed little bias (geometric mean of observed/predicted within 10% of unity) and good precision (geometric standard deviation of observed/predicted < 1.6). Three models demonstrated good calibration and information. However, only one model exhibited superior performance in both dimensions.
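As an illustration of the bias and precision measures named above, the geometric mean and geometric standard deviation of the observed/predicted ratio can be computed as in the sketch below. The function name is hypothetical, and the use of the sample (n - 1) standard deviation of the log ratios is an assumption; the abstract does not specify it.

```python
import math


def geometric_mean_and_gsd(observed, predicted):
    """Return (geometric mean, geometric standard deviation) of the
    observed/predicted ratios.

    A geometric mean above 1 indicates under-prediction; a geometric standard
    deviation close to 1 indicates high precision.
    Requires at least two forecast/observation pairs with positive values.
    """
    log_ratios = [math.log(o / p) for o, p in zip(observed, predicted)]
    n = len(log_ratios)
    mean_log = sum(log_ratios) / n
    # Assumption: sample variance (n - 1 denominator) of the log ratios.
    var_log = sum((x - mean_log) ** 2 for x in log_ratios) / (n - 1)
    return math.exp(mean_log), math.exp(math.sqrt(var_log))
```

For example, a model that predicts 100 deaths in each of two weeks when 120 are observed in both has a geometric mean of 1.2 (a 20% under-prediction) and a geometric standard deviation of 1 (no scatter).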

Nearly all models under-predicted COVID-19 mortality, some quite substantially. Further, model performance depended on racial composition and case rates, and short-term forecasts outperformed medium-term forecasts on all criteria. The performance-weighted ensembles also outperformed the equal-weighted ensemble on all criteria.
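The difference between equal and performance weighting can be sketched as follows. In Cooke's Classical Method, performance weights are proportional to the product of the calibration and information scores, with models below a calibration cutoff receiving zero weight. The function names, the cutoff value, and the combination of point forecasts (rather than full uncertainty distributions) are simplifying assumptions for illustration, not the procedure used in the study.

```python
def equal_weights(models):
    """Give every model the same weight."""
    return {m: 1.0 / len(models) for m in models}


def performance_weights(calibration, information, cutoff=0.0):
    """Weights proportional to calibration * information; models whose
    calibration score falls below the cutoff get zero weight."""
    raw = {
        m: (calibration[m] * information[m] if calibration[m] >= cutoff else 0.0)
        for m in calibration
    }
    total = sum(raw.values())
    if total == 0.0:
        raise ValueError("no model passes the calibration cutoff")
    return {m: r / total for m, r in raw.items()}


def combine(point_forecasts, weights):
    """Weighted average of point forecasts (a simplification: the classical
    method pools full distributions, not single numbers)."""
    return sum(weights[m] * point_forecasts[m] for m in weights)
```

With hypothetical scores in which one of three models is poorly calibrated, the performance-weighted combination drops that model entirely, whereas the equal-weighted combination lets it pull the ensemble forecast toward its own prediction.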

The ability of models to accurately and precisely predict mortality and the ability of the modelers to provide meaningful characterizations of the uncertainty in their estimates are potentially important to model developers and to those using model output to inform decisions.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

Kyle Colonna's involvement was funded by the Harvard Population Health Sciences PhD scholarship. Prof. John Evans' involvement was funded by the Department of Environmental Health and the Harvard Cyprus Initiative at the Harvard T.H. Chan School of Public Health.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

No IRB/oversight body approval or exemption was necessary as the data is publicly available.

All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Data Availability

Model forecasting data was gathered from the COVID-19 Forecast Hub's publicly available structured data storage repository on GitHub. Observed state COVID-19 mortality and case data was gathered from the Centers for Disease Control and Prevention (CDC). State population and racial composition data was collected from one-year estimates from the Census Bureau's 2018 American Community Survey (ACS). Table 4 in the supplemental material (S.2.1.) provides the racial composition statistics and case rate data. Table 5 in the supplemental material (S.2.2.) provides the model predictions, their uncertainty distributions, and the subsequent observations for COVID-19 mortality. Data was analyzed using Microsoft Excel and EXCALIBUR (a software package for using Cooke's Classical Method).

The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.