TY - JOUR
T1 - Assessing the Performance of COVID-19 Forecasting Models in the U.S.
JF - medRxiv
DO - 10.1101/2020.12.09.20246157
SP - 2020.12.09.20246157
AU - Kyle J. Colonna
AU - Roger M. Cooke
AU - John S. Evans
Y1 - 2021/01/01
UR - http://medrxiv.org/content/early/2021/09/04/2020.12.09.20246157.abstract
N2 - Dozens of coronavirus disease 2019 (COVID-19) forecasting models have been created; however, little information exists on their performance. Here we examined the performance of nine oft-cited COVID-19 forecasting models, as well as equal- and performance-weighted ensembles, based on their predictive accuracy and precision, and their probabilistic ‘statistical accuracy (aka calibration)’ and ‘information’ scores (measures commonly employed in the evaluation of expert judgment) (Cooke, 1991). Data on observed COVID-19 mortality in eight states, selected to reflect differences in racial demographics and COVID-19 case rates, over eight weeks in the summer of 2020 and eight weeks in the winter of 2021, provided the basis for evaluating model forecasts and exploring the stability/robustness of the results. Two models exhibited superior performance with both predictive and probabilistic measures during both pandemic phases. Models that performed poorly reflected ‘overconfidence’ with tight forecast distributions. Models also systematically under-predicted mortality when cases were rising and over-predicted when cases were falling. Performance-weighted ensembles consistently outperformed the equal-weighted ensemble, with the Classical Model-weighted ensemble outperforming the predictive-performance-weighted ensemble. Model performance depended on the timeframe of interest and racial composition, with better predictive forecasts in the near-term and for states with relatively high proportions of non-Hispanic Blacks.
Performance also depended on case rate, with better predictive forecasts for states with relatively low case rates but better probabilistic forecasts for states with relatively high case rates. Both predictive and probabilistic performance are important, and both deserve consideration by model developers and those interested in using these models to inform policy. Significance Statement: Coronavirus disease 2019 (COVID-19) forecasting models can provide critical information for decision-makers; however, there has been little published information on their performance. We examined the COVID-19 mortality forecasting performance of nine commonly used and oft-cited models, as well as distribution-averaged equal- and performance-weighted ensembles of these models, during two distinct phases of the pandemic. Only two of the models demonstrated superior performance in both their point predictions and forecast probability distributions. Most of the other models exhibited overconfidence, with overly narrow probability distributions. Constructed performance-weighted ensembles consistently outperformed the equal-weighted ensemble, with the ensemble utilizing the Classical Model method performing best. Performance was also found to depend on timeframe of interest, state racial composition, and recent state case rates. Competing Interest Statement: The authors have declared no competing interest. Funding Statement: Kyle J. Colonna's involvement was funded by the Harvard Population Health Sciences PhD scholarship. Roger M. Cooke's involvement was pro bono. John S. Evans' involvement was funded by the Department of Environmental Health and the Harvard Cyprus Initiative at the T.F.
Chan School of Public Health. Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: No IRB/oversight body approval or exemption was necessary as the data is publicly available. All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes. Observed state COVID-19 mortality and case data was gathered from the Centers for Disease Control and Prevention (CDC) (39). State population and racial composition data was collected from one-year estimates from the Census Bureau's 2018 and 2019 American Community Survey (ACS) (40). Tables S3 & S4 in the SI appendix provide the racial composition statistics and case rate data. Model forecasting data was gathered from the COVID-19 Forecast Hub's publicly available structured data storage repository on GitHub (8). Tables S5 & S6 in the SI appendix provide the model and ensemble predictions, their uncertainty distributions, and the subsequent observations of COVID-19 mortality.
ER -