medRxiv

The United States COVID-19 Forecast Hub dataset

Estee Y Cramer, Yuxin Huang, Yijin Wang, Evan L Ray, Matthew Cornell, Johannes Bracher, Andrea Brennen, Alvaro J Castero Rivadeneira, Aaron Gerding, Katie House, Dasuni Jayawardena, Abdul H Kanji, Ayush Khandelwal, Khoa Le, Jarad Niemi, Ariane Stark, Apurv Shah, Nutcha Wattanchit, Martha W Zorn, Nicholas G Reich on behalf of the US COVID-19 Forecast Hub Consortium
doi: https://doi.org/10.1101/2021.11.04.21265886
Author affiliations
1 University of Massachusetts Amherst: Estee Y Cramer, Yuxin Huang, Yijin Wang, Evan L Ray, Matthew Cornell, Alvaro J Castero Rivadeneira, Aaron Gerding, Katie House, Dasuni Jayawardena, Abdul H Kanji, Ayush Khandelwal, Khoa Le, Ariane Stark, Apurv Shah, Nutcha Wattanchit, Martha W Zorn, Nicholas G Reich
2 Chair of Econometrics and Statistics, Karlsruhe Institute of Technology: Johannes Bracher
3 Computational Statistics Group, Heidelberg Institute for Theoretical Studies: Johannes Bracher
4 IQT Labs: Andrea Brennen
5 Iowa State University: Jarad Niemi
For correspondence: Nicholas G Reich, nick@umass.edu

Abstract

Academic researchers, government agencies, industry groups, and individuals have produced forecasts at an unprecedented scale during the COVID-19 pandemic. To leverage these forecasts, the United States Centers for Disease Control and Prevention (CDC) partnered with an academic research lab at the University of Massachusetts Amherst to create the US COVID-19 Forecast Hub. Launched in April 2020, the Forecast Hub is a dataset with point and probabilistic forecasts of incident hospitalizations, incident cases, incident deaths, and cumulative deaths due to COVID-19 at national, state, and county levels in the United States. Included forecasts represent a variety of modeling approaches, data sources, and assumptions regarding the spread of COVID-19. The goal of this dataset is to establish a standardized and comparable set of short-term forecasts from modeling teams. These data can be used to develop ensemble models, communicate forecasts to the public, create visualizations, compare models, and inform policies regarding COVID-19 mitigation. These open-source data are available via download from GitHub, through an online API, and through R packages.

Background & Summary

To understand how the COVID-19 pandemic would progress in the United States, dozens of academic research groups, government agencies, industry groups, and individuals produced probabilistic forecasts for COVID-19 outcomes starting in March 2020.1 We have collected forecasts from over 82 modeling teams in a data repository, thus making forecasts easily accessible for COVID-19 response efforts and forecast evaluation. The data repository is called the United States (US) COVID-19 Forecast Hub (hereafter, Forecast Hub) and was created through a partnership between the US Centers for Disease Control and Prevention (CDC) and an academic research lab at the University of Massachusetts Amherst.

The Forecast Hub was launched in early April 2020 and contains real-time forecasts of reported COVID-19 cases, hospitalizations, and deaths. As of September 8, 2021, the Forecast Hub had collected nearly 65 million individual point or quantile predictions contained within over 4,600 submitted forecast files from over 100 unique models. The forecasts submitted each week reflected a variety of forecasting approaches, data sources, and underlying assumptions. There were no restrictions in place regarding the underlying information or code used to generate real-time forecasts. Each week, the latest forecasts were combined into an ensemble forecast (Figure 1) and all recent forecast data were updated on an official COVID-19 Forecasting page hosted by the US CDC.2 The ensemble models were also used in the weekly reports that are posted on the Forecast Hub website (https://covid19forecasthub.org/doc/reports/).

Figure 1:

Time series of weekly incident deaths at the national level and forecasts from the COVID-19 Forecast Hub ensemble model for selected weeks in 2020 and 2021. Ensemble forecasts (blue) with 50%, 80% and 95% prediction intervals shown in shaded regions, and the ground-truth data (black) for incident cases (A), incident hospitalizations (B), incident deaths (C) and cumulative deaths (D). The truth data come from JHU CSSE (panels A, C, D) and HealthData.gov (panel B).

Forecasts are quantitative predictions about future observations. Forecasts differ from scenario-based projections, which examine feasible outcomes conditional on a variety of future assumptions. Because forecasts are unconditional estimates of future observations, they can be evaluated. An important feature of the Forecast Hub is that submitted forecasts are time-stamped so that the exact time at which a forecast was made public can be verified. In this way, the Forecast Hub serves as a public, independent registration system for these forecast model outputs. Data from the Forecast Hub have served as the basis for research articles for forecast evaluation3 and forecast combination.4–6 These studies can be used to determine how well models have performed at various points during the pandemic, which can, in turn, guide best practices for utilizing forecasts in practice and inform future forecasting efforts.3

Any modeling team was eligible to submit forecast data to the Forecast Hub, provided they submitted data in the correct format. Teams submitted predictions in a structured format to facilitate data validation, storage, and analysis. Teams also submitted a metadata file and license for their model’s data. Forecast data, ground truth data from the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE),7 New York Times (NYTimes),8 and USA Facts,9 as well as model metadata were stored in the public Forecast Hub GitHub repository.10

The forecasts were automatically synchronized with an online database called Zoltar via calls to a REpresentational State Transfer (REST) application programming interface (API)11 every six hours (Figure 2). Forecast data may be downloaded directly from GitHub or accessed via the covidHubUtils R package,12 the zoltr R package,13 or the zoltpy Python library.14

Figure 2:

Schematic of the data storage and related infrastructure surrounding the COVID-19 Forecast Hub. (A) Forecasts are submitted to the COVID-19 Forecast Hub GitHub repository and undergo data format validation before being accepted into the system. (B) A continuous integration service ensures that the GitHub repository and PostgreSQL database stay in sync with essentially mirrored versions of the data. (C) Truth data for visualization, evaluation, and ensemble building are retrieved once per week using both the covidHubUtils and the covidData R packages. Truth data are stored in both repositories. (D) Once per week, an ensemble forecast submission is made using the covidEnsembles R package. It is submitted to the GitHub repository and undergoes the same validation as other submissions. (E) Using the covidHubUtils R package, forecast and truth data may be extracted from either the GitHub or PostgreSQL database in a standard format for tasks such as scoring or plotting.

This dataset of real-time forecasts created during the COVID-19 pandemic can provide insights into the shortcomings and successes of predictions and improve forecasting efforts in years to come. Though these data are restricted to forecasts for COVID-19 in the United States, the structure of this dataset has been used to create datasets of COVID-19 forecasts in the EU and the UK, and longer-term scenario projections in the US.15–18 The general structure of this data collection could be applied to additional diseases or forecasting outcomes in the future.11

This large collaborative effort has provided data on short-term forecasts for over a year of forecasting efforts. These data were collected in real-time and therefore are not subject to retrospective biases. The data are also openly available to the public, thus fostering a transparent, open science approach to support public health efforts.

Methods

Data Acquisition

In April 2020, the Reich Lab at the University of Massachusetts Amherst, in partnership with the CDC, began collecting probabilistic forecasts of key COVID-19 outcomes in the United States (Table 1). The effort began by collecting forecasts of deaths and hospitalizations at the weekly and daily scale for the 50 US states, Washington DC, and 4 territories (Puerto Rico, US Virgin Islands, Guam, and the Northern Mariana Islands) as well as the aggregated US national level. In July 2020, the effort expanded to include forecasts of weekly incident cases at the county, state, and national levels. Forecasts may include a point prediction and/or quantiles of a predictive distribution.

Table 1:

Forecast characteristics for all four outcomes. The table shows the temporal scale, spatial scale of locations, horizons stored, number of quantiles, and the dates of the earliest forecast, earliest standardized truth data, and the earliest ensemble build.

Any team was eligible to submit data to the Forecast Hub. Upon initial submission of forecast data, teams were required to upload a metadata file that briefly described the methods used to create the forecasts and specified a license under which their forecast data were released. No model code was stored by the Forecast Hub.

During the first month of operation, members of the Forecast Hub team downloaded forecasts made available by teams publicly online, transformed these into the correct format, and pushed them into the Forecast Hub repository. Starting in May 2020, all teams were required to format and submit their own forecasts.

Repository structure

The dataset is stored in two locations, and all data can be accessed through either source. The first is the COVID-19 Forecast Hub GitHub repository and the second is an online database, Zoltar, which can be accessed via a REST API.11 Details about data format and access are documented in the subsequent sections.

Zoltar: data backend

The data can be accessed through the Zoltar forecast repository REST API. Through the API, subsets of submitted forecasts can be queried directly from a PostgreSQL database. This eliminates the need to access individual CSV files and facilitates access to versions of forecasts in cases when they are updated.

Outcomes and locations

The Forecast Hub dataset stores forecasts for four different outcomes: incident hospitalizations, incident cases, incident deaths, and cumulative deaths (Table 1). Incident hospitalizations can be submitted for horizons of 1 to 130 days in the future, incident cases for 1 to 8 weeks in the future, and incident and cumulative deaths for 1 to 20 weeks in the future. For all outcomes, forecasts can be submitted at the national and state levels. Incident case forecasts were introduced several months after the Hub started and differ from the other outcomes in several key ways. They are the only outcome for which the Hub accepts county-level forecasts in addition to state and national forecasts. Because there are over 3,000 counties in the US, this required compromises elsewhere on the scale of data collected for these forecasts. Specifically, case forecasts are limited to seven quantiles, whereas the other outcomes can have up to twenty-three. This gives a coarser representation of the predictive distribution (see the section on Forecast format below).

Weekly targets follow the standard of epidemiological weeks (EW) used by the CDC, which defines a week to start on a Sunday and end on the following Saturday.19 Forecasts of cumulative deaths target the number of cumulative deaths reported by the Saturday ending a given week. Forecasts of weekly incident cases or deaths target the difference between reported cumulative cases or deaths on consecutive Saturdays. As an example of a forecast and the corresponding observation, forecasts submitted between Tuesday, October 6, 2020 (day 3 of EW41) and Monday, October 12, 2020 (day 2 of EW42) contained a “1 week ahead” forecast of incident deaths that corresponded to the change in cumulative reported deaths observed in EW42 (i.e., the difference between the cumulative reported deaths on Saturday, October 17, 2020, and Saturday, October 10, 2020) and a “2 week ahead” forecast that corresponded to the change in cumulative reported deaths in EW43. In this paper, we refer to the “forecast week” of a submitted forecast as the week corresponding to a “0-week ahead” horizon. In the example above, the forecast week would be EW41. Daily incident hospitalization targets refer to the number of hospitalizations reported a specified number of days after the forecast was generated.
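
The week-ahead convention above can be sketched in a few lines of Python. This is a minimal illustration rather than the Hub's own code: the function names are hypothetical, and the rule that Sunday and Monday submissions target the current epidemiological week is inferred from the October 2020 example in the text.

```python
from datetime import date, timedelta


def saturday_ending_week(d: date) -> date:
    """Return the Saturday that ends the epidemiological week containing d."""
    # date.weekday(): Monday=0, ..., Saturday=5, Sunday=6
    return d + timedelta(days=(5 - d.weekday()) % 7)


def week_ahead_target_end_date(forecast_date: date, horizon_weeks: int) -> date:
    """Saturday targeted by an 'N wk ahead' forecast (hypothetical helper)."""
    current_saturday = saturday_ending_week(forecast_date)
    if forecast_date.weekday() in (6, 0):
        # Sunday or Monday: "1 wk ahead" is the week that has just begun.
        one_week_ahead = current_saturday
    else:
        # Tuesday to Saturday: "1 wk ahead" is the following week.
        one_week_ahead = current_saturday + timedelta(weeks=1)
    return one_week_ahead + timedelta(weeks=horizon_weeks - 1)


def weekly_incident(cumulative_by_saturday: dict, saturday: date) -> int:
    """Incident count for the week ending on `saturday`, from cumulative counts."""
    previous = saturday - timedelta(weeks=1)
    return cumulative_by_saturday[saturday] - cumulative_by_saturday[previous]


# The example from the text: both dates map to Saturday, October 17, 2020.
tuesday_target = week_ahead_target_end_date(date(2020, 10, 6), 1)
monday_target = week_ahead_target_end_date(date(2020, 10, 12), 1)
```

Both `tuesday_target` and `monday_target` evaluate to `date(2020, 10, 17)`, matching the EW42 example.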

Forecast assumptions

Forecasters used a variety of assumptions to build models and generate predictions. Forecasting approaches include statistical or machine learning models, mechanistic models incorporating disease transmission dynamics, and combinations of multiple approaches.3 Teams have also included varying assumptions regarding future changes in policies and physical distancing measures, the transmissibility of COVID-19, vaccination rates, and the spread of new virus variants throughout the United States.

Weekly submissions

A forecast submission consists of a single comma-separated value (CSV) file submitted via pull request to the GitHub repository. Forecast submissions are validated for technical accuracy and formatting (see exclusion criteria below) before being merged. To be included in the weekly ensemble model, teams were required to submit their forecast on Sunday or prior to a deadline on Monday. The majority of teams contributing to the dataset submitted forecasts to the Hub repository on Sunday or Monday, although some teams submitted at other times depending on their model production schedule.

Model designation

Each model stored in the repository must have a classification of “primary”, “secondary”, or “other”. Each team may have only one “primary” model. Teams submitting multiple models with similar forecasting approaches can use the designations “secondary” or “other” for their additional models. Models with the designation “primary” are included in evaluations, the weekly ensemble, and the visualization. The “secondary” label is designed for models that differ substantively in methodology from a team’s “primary” model. Models with the designation “secondary” are included only in the weekly ensemble and the visualization. The “other” label is designed for models that are small variations on a team’s “primary” model. Models with the designation “other” are not included in evaluations, the ensemble build, or the visualization.
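
The designation rules in the paragraph above amount to a small lookup table. The sketch below transcribes them into Python; the constant, the helper function, and the model names in the example are hypothetical and are not part of the Hub's tooling.

```python
# Downstream uses permitted for each designation, as stated in the text.
DESIGNATION_USES = {
    "primary": {"evaluation", "ensemble", "visualization"},
    "secondary": {"ensemble", "visualization"},
    "other": set(),
}


def models_for(use, designations):
    """Return the models eligible for `use`.

    designations: dict mapping model name -> designation string.
    """
    return sorted(
        model
        for model, designation in designations.items()
        if use in DESIGNATION_USES[designation]
    )


# Hypothetical models, for illustration only.
example_designations = {
    "teamA-main": "primary",
    "teamA-variant": "other",
    "teamB-alt": "secondary",
}
ensemble_members = models_for("ensemble", example_designations)
```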

Ensemble and baseline forecasts

Several models have a special status, either as a baseline or as an ensemble that combines multiple models from the Hub to create a single forecast.

The COVIDhub-baseline model was created by the Hub in May 2020 as a benchmarking model. Its point forecast is the most recent observed value as of the forecast creation date with a probability distribution around that based on weekly differences in previous observations.3 The baseline model initially produced forecasts for case and death outcomes. Hospitalization baseline forecasts were added in September 2021.
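
To make the idea concrete, the following is a rough sketch of a baseline of this kind, not the COVIDhub-baseline code itself. It assumes that the predictive distribution is the empirical distribution of past week-to-week differences, mirrored around zero so that the median stays at the last observed value, and truncated at zero. These details, the interpolation rule, and the function name are assumptions made for illustration.

```python
def baseline_quantiles(observations, levels):
    """Quantiles of a naive 'last value plus past differences' forecast.

    observations: weekly counts, oldest first (at least two values).
    levels: probability levels in [0, 1].
    """
    diffs = [b - a for a, b in zip(observations, observations[1:])]
    # Mirror the differences so the distribution is symmetric around zero
    # (assumption: this keeps the median at the last observed value).
    symmetric = sorted(diffs + [-d for d in diffs])
    last = observations[-1]
    result = {}
    for p in levels:
        # Empirical quantile with linear interpolation between order statistics.
        position = p * (len(symmetric) - 1)
        lower = int(position)
        upper = min(lower + 1, len(symmetric) - 1)
        fraction = position - lower
        step = symmetric[lower] + fraction * (symmetric[upper] - symmetric[lower])
        # Counts cannot be negative, so truncate at zero.
        result[p] = max(0.0, last + step)
    return result


# Made-up weekly counts, for illustration only.
example = baseline_quantiles([100, 110, 105, 120], [0.25, 0.5, 0.75])
```

With these made-up counts the median of the sketch equals the last observation (120), and the 0.25 and 0.75 quantiles sit symmetrically around it.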

The COVIDhub-ensemble model combines forecasts submitted to the Hub. Other work details the methods used for determining the appropriate combination approach.4,5 Starting in February 2021, GitHub tags were created to document the exact version of the repository used each week to create the COVIDhub-ensemble forecast. These tags provide an auditable trail in the repository, so that the exact forecasts used can be recovered even when some forecasts were subsequently updated.
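
Because every submission reports values at the same probability levels, forecasts can be combined level by level. The sketch below shows one such rule, an equally weighted quantile-wise median. The text defers the actual choice of method to other work, so this rule is an assumption chosen for illustration, and the function name and numbers are hypothetical.

```python
from statistics import median


def quantile_median_ensemble(forecasts):
    """Combine forecasts by taking the median at each shared probability level.

    forecasts: list of dicts mapping probability level -> predicted value,
    all for the same target and location.
    """
    shared_levels = set(forecasts[0])
    for forecast in forecasts[1:]:
        shared_levels &= set(forecast)
    return {
        level: median(forecast[level] for forecast in forecasts)
        for level in sorted(shared_levels)
    }


# Three made-up component forecasts, for illustration only.
components = [
    {0.025: 80, 0.5: 100, 0.975: 130},
    {0.025: 90, 0.5: 120, 0.975: 160},
    {0.025: 70, 0.5: 110, 0.975: 150},
]
combined = quantile_median_ensemble(components)
```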

Several other models also are combinations of some or all models submitted to the Forecast Hub. As of August 1, 2021, these models are COVIDhub-trained_ensemble, FDANIHASU-Sweight, JHUAPL-SLPHospEns, and KITmetricslab-select_ensemble. These models are flagged in the metadata using the Boolean metadata field, “ensemble_of_hub_models”.

Exclusion criteria

No forecasts were excluded from the dataset due to the forecast values or the background experience of the forecasters. Forecast files were rejected only if they did not meet the formatting criteria enforced by automated GitHub checks.20 These included checks to ensure that, among other criteria:

  • A forecast file is submitted no more than 2 days after it has been created (to ensure forecasts submitted were truly prospective). The creation date is based on the date in the filename created by the submitting team.

  • The forecast dates in the content of the file are in the format YYYY-MM-DD and match the creation date.

  • Quantile forecasts do not contain any quantiles at probability levels other than those required (see Forecast Format section below).
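
The three checks listed above can be sketched as a single function. This is an illustrative approximation, not the validation code that runs in the repository; the function name, its arguments, and the message wording are hypothetical.

```python
import re
from datetime import date, timedelta

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def check_submission(creation_date, submitted_on, forecast_dates,
                     quantile_levels, allowed_levels):
    """Return a list of problems; an empty list means all checks passed.

    creation_date: date taken from the file name.
    submitted_on: date the file was submitted.
    forecast_dates: forecast_date strings found in the file.
    quantile_levels / allowed_levels: probability levels found / permitted.
    """
    problems = []
    if submitted_on - creation_date > timedelta(days=2):
        problems.append("file submitted more than 2 days after its creation date")
    for text in sorted(set(forecast_dates)):
        if not ISO_DATE.fullmatch(text):
            problems.append(f"forecast_date {text!r} is not in YYYY-MM-DD format")
        elif text != creation_date.isoformat():
            problems.append(f"forecast_date {text!r} does not match the creation date")
    extra = sorted(set(quantile_levels) - set(allowed_levels))
    if extra:
        problems.append(f"unexpected quantile levels: {extra}")
    return problems


# A clean submission and a deliberately flawed one (made-up inputs).
allowed = [0.025, 0.5, 0.975]
clean = check_submission(date(2020, 10, 5), date(2020, 10, 6),
                         ["2020-10-05"], [0.025, 0.5], allowed)
flawed = check_submission(date(2020, 10, 5), date(2020, 10, 9),
                          ["2020-10-05", "10/05/2020"], [0.6], allowed)
```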

Updates to files

To ensure that forecasting is done in real-time, all forecasts are required to be submitted to the Hub within 2 days of the forecast date, which is listed in a column within each forecast file. Occasional late submissions were accepted through January 2021, after which the policy was updated so that late forecasts were no longer accepted, whether the delay was due to missed deadlines, updated modeling methods, or other reasons.

Exceptions to this policy were made if there were programming or data errors that affected the forecasts in the original submission or if a new team joined. If there was an error, teams were required to submit a comment with their updated submission affirming that there was a bug and that the forecast was only produced using data that were available at the time of the original submission. In the case of updates to forecast data, both the old and updated versions of the forecasts can be accessed either through the GitHub commit history or through time-stamped queries of the forecasts in the Zoltar database. Note that an updated forecast can include “retracting” a particular set of predictions in the case when an initial forecast was not able to be updated. When new teams join the Hub, they can submit late forecasts if they can provide publicly available evidence that the forecasts were made in real-time (e.g. GitHub commit history).

Ground truth data

Data from the JHU CSSE dataset21 are used as the ground truth data for reported cases and deaths. Data from the HealthData.gov system for state-level hospitalizations are used for the hospitalization outcome. JHU CSSE obtained counts of cases and deaths by collecting and aggregating reports from state and local health departments. HealthData.gov contains reports of hospitalizations assembled by the U.S. Department of Health and Human Services. Teams were encouraged to use these sources to build models. Although hospitalization forecasts were collected starting in March 2020, the hospitalizations data from HealthData.gov were only available later, and we started encouraging teams to target these data in November 2020. Some teams used alternate data sources including the NYTimes, USAFacts, US Census data, and other signals.3 Versions of truth data from JHU CSSE, USAFacts, and the NYTimes are stored in the GitHub repository.

Previous reports of ground truth data for past time points were occasionally updated as new records became available, definitions of reportable cases, deaths, or hospitalizations changed, or errors in data collection were identified and corrected. These revisions to the data are sometimes quite substantial, and for some purposes such as retrospective ensemble construction, it is necessary to use the data that would have been available in real-time. The historically versioned data can be accessed either through GitHub commit records, data versions released on HealthData.gov, or third-party tools such as the covidcast API provided by the Delphi group at Carnegie Mellon University or the covidData R package.22

Data Records

Summary of forecast data collected

In the initial weeks of submission, fewer than 10 models provided forecasts. As the pandemic spread, the number of teams submitting forecasts increased to 82; as of July 2021, 82 primary models, 4 secondary models, and 15 models with the designation “other” had been submitted to the Forecast Hub. In the first six months of 2021, a median of 35.5 teams (range: 30 to 38) contributed incident case forecasts (Fig 3a), a median of 12 teams (range: 9 to 14) contributed incident hospitalization forecasts (Fig 3b), a median of 43 teams (range: 37 to 49) contributed incident death forecasts (Fig 3c), and a median of 44 teams (range: 34 to 46) contributed cumulative death forecasts (Fig 3d). As of September 8, 2021, the dataset contained 4,602 forecast files with 64,902,239 point or quantile predictions for unique combinations of targets and locations.

Figure 3:

Number of primary forecasts submitted for each outcome per week from April 27th 2020 through July 31st 2021. In the initial weeks of submission, fewer than 10 models provided forecasts. Over time, the number of teams submitting forecasts for each forecasted outcome increased into early 2021 and then saw a small decline through the summer of 2021.

GitHub repository data structure

Forecasts in the GitHub repository are available in subfolders organized by model. Folders are named with a team name and model name, and each folder includes a metadata file and forecast files. Forecast CSV files are named using the format “<YYYY-MM-DD>-<team abbreviation>-<model abbreviation>.csv”. In these files, each row contains data for a single outcome, location, horizon, and point or quantile prediction as described above.

The metadata file for each team, named using the format “metadata-<team abbreviation>-<model abbreviation>.txt”, contains relevant information about the team and the model that the team is using to generate forecasts.
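
The two naming conventions above can be handled with a short parser. The sketch below is illustrative only: the function names are hypothetical, and the regular expression assumes that team and model abbreviations contain no hyphens or whitespace. The file name in the example is constructed from the pattern and a model named in this paper; it is not a claim that this exact file exists.

```python
import re
from datetime import date

# <YYYY-MM-DD>-<team abbreviation>-<model abbreviation>.csv
FORECAST_FILE = re.compile(r"(\d{4}-\d{2}-\d{2})-([^-\s]+)-([^-\s]+)\.csv")


def parse_forecast_filename(name):
    """Split a forecast file name into (creation date, team, model)."""
    match = FORECAST_FILE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a forecast file name: {name!r}")
    created, team, model = match.groups()
    # date.fromisoformat raises ValueError for impossible dates such as 2020-13-01.
    return date.fromisoformat(created), team, model


def metadata_filename(team, model):
    """Name of the metadata file that accompanies a team's model."""
    return f"metadata-{team}-{model}.txt"


created, team, model = parse_forecast_filename("2020-10-05-COVIDhub-baseline.csv")
companion = metadata_filename(team, model)
```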

Forecast format

Forecasts were required to be submitted in the format of point predictions and/or quantile predictions. Point predictions represented single “best” predictions with no uncertainty, typically representing a mean or median prediction from the model. Quantile predictions are an efficient format for storing predictive distributions of a wide range of outcomes.

Quantile representations of predictive distributions lend themselves to natural computations of, for example, pinball loss or a weighted interval score, both strictly proper scoring rules that can be used to evaluate forecasts.23 However, they do not capture the structure of the tails of the predictive distribution beyond the reported quantiles. Also, the quantile format does not preserve any information on correlation structures between different outcomes.
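
As an illustration of how such scores follow directly from the quantile format, the sketch below computes the pinball loss and a weighted interval score (WIS) for a single forecast. The text does not spell out the formulas, so the definitions used here are the commonly used ones and should be read as an assumption; the function names and numbers are made up for illustration.

```python
def pinball_loss(level, quantile_value, observed):
    """Pinball (quantile) loss of one predicted quantile."""
    if observed >= quantile_value:
        return level * (observed - quantile_value)
    return (1 - level) * (quantile_value - observed)


def interval_score(alpha, lower, upper, observed):
    """Score of a central (1 - alpha) prediction interval: width plus penalties."""
    score = upper - lower
    if observed < lower:
        score += (2 / alpha) * (lower - observed)
    elif observed > upper:
        score += (2 / alpha) * (observed - upper)
    return score


def weighted_interval_score(quantiles, observed):
    """WIS from a dict of level -> value containing 0.5 and symmetric level pairs."""
    lower_levels = sorted(level for level in quantiles if level < 0.5)
    total = 0.5 * abs(observed - quantiles[0.5])
    for level in lower_levels:
        alpha = 2 * level
        upper_level = round(1 - level, 10)  # e.g. 0.025 pairs with 0.975
        total += (alpha / 2) * interval_score(
            alpha, quantiles[level], quantiles[upper_level], observed
        )
    return total / (len(lower_levels) + 0.5)


# Made-up forecast and observation, for illustration only.
forecast = {0.25: 8, 0.5: 10, 0.75: 14}
observed = 15
wis = weighted_interval_score(forecast, observed)
mean_pinball = sum(pinball_loss(l, q, observed) for l, q in forecast.items()) / len(forecast)
```

Under these definitions the WIS equals twice the mean pinball loss over the reported quantiles, which is why the two scores are often mentioned together.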

Variable descriptions

The forecast data in this dataset are stored in seven columns:

  1. forecast_date - the date the forecast was made in the format YYYY-MM-DD.

  2. target - a character string giving the number of days/weeks ahead that are being forecasted (horizon) and the outcome. Targets must be one of the following:

    1. “N wk ahead cum death” where N is a number between 1 and 20

    2. “N wk ahead inc death” where N is a number between 1 and 20

    3. “N wk ahead inc case” where N is a number between 1 and 8

    4. “N day ahead inc hosp” where N is a number between 0 and 130

  3. target_end_date - a character string representing the date for the forecast target in the format YYYY-MM-DD. For “k day-ahead” targets, target_end_date will be k days after forecast_date. For “k week ahead” targets, target_end_date will be the Saturday at the end of the specified epidemic week, as described above.

  4. location - character string of Federal Information Processing Standard Publication (FIPS) codes identifying U.S. states, counties, territories, and districts as well as “US” for national forecasts. The values for the FIPS codes are available in a CSV file in the repository and as a data object in the covidHubUtils R package for convenience.

  5. type - character value of “point” or “quantile” indicating whether the row corresponds to a point forecast or a quantile forecast.

  6. quantile - the probability level for a quantile forecast. For death and hospitalization forecasts, forecasters can submit quantiles at 23 probability levels: 0.01, 0.025, 0.05, 0.10, 0.15, …, 0.95, 0.975, 0.99. For cases, teams can submit up to 7 quantiles at levels 0.025, 0.100, 0.250, 0.500, 0.750, 0.900 and 0.975. If the forecast “type” is equal to “point”, the value in the quantile column is equal to “NA”.

  7. value - non-negative numbers indicating the “point” or “quantile” prediction for the row. For a “point” prediction, value is simply the value of that point prediction for the target and location associated with that row. For a “quantile” prediction, the model predicts that the eventual observation will be less than this value with the probability given by the quantile probability level.
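
The column rules above translate into a row-level check. The following sketch reads rows with Python's csv module and reports violations of the target, type, quantile, and value rules. It is a simplified illustration, not the Hub's validation code: the function name and messages are hypothetical, location codes are not checked, and the example rows contain made-up values.

```python
import csv
import io
import re

TARGET = re.compile(r"(\d+) (wk|day) ahead (cum death|inc death|inc case|inc hosp)")
# Allowed horizons per (unit, outcome), as listed in the target column description.
HORIZONS = {
    ("wk", "cum death"): range(1, 21),
    ("wk", "inc death"): range(1, 21),
    ("wk", "inc case"): range(1, 9),
    ("day", "inc hosp"): range(0, 131),
}
# Probability levels stored as integer thousandths to avoid float comparisons.
LEVELS_23 = {10, 25, 975, 990} | {50 * i for i in range(1, 20)}
LEVELS_CASE = {25, 100, 250, 500, 750, 900, 975}


def row_problems(row):
    """Return a list of rule violations for one forecast row (a dict of strings)."""
    match = TARGET.fullmatch(row["target"])
    if match is None:
        return [f"unrecognized target {row['target']!r}"]
    problems = []
    horizon, unit, outcome = int(match.group(1)), match.group(2), match.group(3)
    if horizon not in HORIZONS.get((unit, outcome), ()):
        problems.append("horizon out of range for this outcome")
    if row["type"] == "point":
        if row["quantile"] != "NA":
            problems.append("point rows must have quantile NA")
    elif row["type"] == "quantile":
        allowed = LEVELS_CASE if outcome == "inc case" else LEVELS_23
        try:
            scaled = float(row["quantile"]) * 1000
            level_ok = abs(scaled - round(scaled)) < 1e-6 and round(scaled) in allowed
        except (ValueError, OverflowError):
            level_ok = False
        if not level_ok:
            problems.append(f"quantile level {row['quantile']!r} not allowed")
    else:
        problems.append("type must be 'point' or 'quantile'")
    try:
        if float(row["value"]) < 0:
            problems.append("value must be non-negative")
    except ValueError:
        problems.append("value must be a number")
    return problems


# Two made-up rows: the second uses a level not permitted for case forecasts.
sample = io.StringIO(
    "forecast_date,target,target_end_date,location,type,quantile,value\n"
    "2020-10-05,1 wk ahead inc death,2020-10-10,US,point,NA,4500\n"
    "2020-10-05,1 wk ahead inc case,2020-10-10,US,quantile,0.05,300000\n"
)
report = [row_problems(row) for row in csv.DictReader(sample)]
```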

Metadata format

Each team documents their model information in a metadata file which is required along with the first forecast submission. Each team is asked to record their model’s design and assumptions, the model contributors, the team’s website, information regarding the team’s data sources, and a brief model description. Teams may update their metadata file periodically to keep track of minor changes to a model.

Variable descriptions

A standard metadata file should be a YAML file with the following required fields in a specific order:

  1. team_name - the name of the team (less than 50 characters).

  2. model_name - the name of the model (less than 50 characters).

  3. model_abbr - an abbreviated and uniquely identified name for the model that is less than 30 alphanumeric characters. The model abbreviation must be in the format of ‘[team_abbr]-[model_abbr]’ where each of the ‘[team_abbr]’ and ‘[model_abbr]’ are text strings that are each less than 15 alphanumeric characters that do not include a hyphen or whitespace.

  4. model_contributors - a list of all individuals involved in the forecasting effort, affiliations, and email addresses. At least one contributor needs to have a valid email address. The syntax of this field should be name1 (affiliation1) <user@address>, name2 (affiliation2) <user2@address2>

  5. website_url* - a URL to a website that has additional data about the model. We encourage teams to submit the most user-friendly version of the model, e.g. a dashboard, or similar, that displays the model forecasts. If there is an additional data repository where forecasts and other model code are stored, this can be included in the methods section. If only a more technical site exists, e.g. a GitHub repo, that link should be included here.

  6. license - one of the acceptable license types in the Forecast Hub. We encourage teams to release their forecasts under the “cc-by-4.0” license to allow the broadest possible use, including private vaccine production (which would be excluded by the “cc-by-nc-4.0” license). If the value is “LICENSE.txt”, then a LICENSE.txt file must exist within the model folder and provide a license.

  7. team_model_designation - upon initial submission this field should be one of “primary”, “secondary” or “other”.

  8. methods - a brief description of the forecasting methodology that is less than 200 characters.

  9. ensemble_of_hub_models - a Boolean value (‘true’ or ‘false’) that indicates whether a model combines multiple hub models into an ensemble.

*previously named model_output

Teams are also encouraged to add model information with optional fields described in Supplement 1.
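
The required-field rules can be checked once the YAML file has been parsed into a dictionary. The sketch below assumes that parsing has already happened (for example with a YAML library, which is not shown) and applies a subset of the rules listed above. It is illustrative only: the function name is hypothetical, the set of acceptable licenses is not checked because it is not listed here, and the per-part length limit on model_abbr is deliberately left out.

```python
REQUIRED_FIELDS = [
    "team_name", "model_name", "model_abbr", "model_contributors",
    "website_url", "license", "team_model_designation", "methods",
    "ensemble_of_hub_models",
]


def metadata_problems(metadata):
    """Return rule violations for a parsed metadata mapping (insertion-ordered dict)."""
    problems = [f"missing field {name}" for name in REQUIRED_FIELDS if name not in metadata]
    if problems:
        return problems
    # Optional fields may be interleaved; only the required ones are order-checked.
    present = [key for key in metadata if key in REQUIRED_FIELDS]
    if present != REQUIRED_FIELDS:
        problems.append("required fields are not in the expected order")
    for name in ("team_name", "model_name"):
        if len(metadata[name]) >= 50:
            problems.append(f"{name} must be less than 50 characters")
    abbr = metadata["model_abbr"]
    # Simplification: check overall length and the single-hyphen structure only.
    if (len(abbr) >= 30 or abbr.count("-") != 1 or "" in abbr.split("-")
            or any(ch.isspace() for ch in abbr)):
        problems.append("model_abbr must look like [team_abbr]-[model_abbr]")
    if metadata["team_model_designation"] not in ("primary", "secondary", "other"):
        problems.append("team_model_designation must be primary, secondary, or other")
    if len(metadata["methods"]) >= 200:
        problems.append("methods must be less than 200 characters")
    if not isinstance(metadata["ensemble_of_hub_models"], bool):
        problems.append("ensemble_of_hub_models must be true or false")
    return problems


# A hypothetical team's metadata, for illustration only.
example_metadata = {
    "team_name": "Example Team",
    "model_name": "Example Model",
    "model_abbr": "ExTeam-demo",
    "model_contributors": "Jane Doe (Example University) <jane@example.org>",
    "website_url": "https://example.org/forecasts",
    "license": "cc-by-4.0",
    "team_model_designation": "primary",
    "methods": "Illustrative description of a forecasting method.",
    "ensemble_of_hub_models": False,
}
issues = metadata_problems(example_metadata)
```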

Technical Validation

Two similar but distinct validation processes were used to validate data on the GitHub repository and on Zoltar.

GitHub repository

Validations were set up using GitHub Actions to manage continuous integration and automated data checking.20 Teams submitted their metadata files and forecasts through pull requests on GitHub. Each time a new pull request was submitted, a validation script ran on all new or updated files in the pull request to test for their validity. Separate checks ran on metadata file changes and forecast data file changes.

The metadata file for each team was required to be in valid YAML format, and a set of specific checks were required before a new metadata file could be merged into the repository. Checks included ensuring that the proposed team and model names do not conflict with existing names, that a valid license for data reuse is specified, and that a valid model designation was present. A list of specific validations for metadata may be found in Supplement 2.

New or changed forecast data files for each team were required to pass a series of checks for data formatting and validity. These checks also ensured that the forecast data files did not meet any of the exclusion criteria (see the Methods section for specific rules). A list of specific validations for forecast data files is provided in Supplement 2.

Zoltar

When a new forecast file is uploaded to Zoltar, unit tests are run on the file to make sure that its forecast elements have a valid structure. (For a detailed specification of the structure of forecast elements, see https://docs.zoltardata.com/validation/.) If a forecast file does not pass all unit tests, the upload fails and the forecast file is not added to the database; only when all tests pass is the new forecast added to Zoltar. The validations in place on GitHub ensure that only valid forecasts are uploaded to Zoltar.

Observed data

Raw observed data from multiple sources, including JHU, NYTimes, USAFacts, and Healthdata.gov, are downloaded and reformatted using the scripts in the R packages covidHubUtils (https://github.com/reichlab/covidHubUtils) and covidData (https://github.com/reichlab/covidData). This data-generating process is automated by GitHub Actions every week, and the results (called “truth data”) are uploaded directly to the Forecast Hub repository and Zoltar. Specifically, raw observed case and death data are aggregated to a weekly level, and all three outcomes (cases, deaths, and hospitalizations) are reformatted for use within the Hub.
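The weekly aggregation step can be sketched as follows. This is an illustrative Python example, not the code used in covidData or covidHubUtils (which are R packages). It assumes that weeks run from Sunday through Saturday, as in the MMWR epidemiological week convention, and that the input holds daily incident counts; the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_ending_saturday(d):
    """Return the Saturday that ends the Sunday-to-Saturday week containing d."""
    # date.weekday(): Monday = 0, ..., Saturday = 5, Sunday = 6
    return d + timedelta(days=(5 - d.weekday()) % 7)


def aggregate_weekly(daily_counts):
    """Sum daily incident counts into weekly totals keyed by week-ending date.

    daily_counts: dict mapping datetime.date -> incident count for that day.
    Weeks with missing days are summed over the days that are present.
    """
    weekly = defaultdict(int)
    for day, count in daily_counts.items():
        weekly[week_ending_saturday(day)] += count
    return dict(sorted(weekly.items()))
```

A production pipeline would likely also drop or flag incomplete weeks; this sketch does not.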

Usage Notes

We have developed the covidHubUtils R package (https://github.com/reichlab/covidHubUtils) to facilitate bulk retrieval of forecasts for analysis and evaluation. Examples of how to use the covidHubUtils package and its functions can be found at https://reichlab.io/covidHubUtils/. The package supports loading forecasts from a local clone of the GitHub repository or by querying data from Zoltar. The package supports common actions for working with the data, such as loading in specific subsets of forecasts, plotting forecasts, scoring forecasts, retrieving ground truth data, and many other utility functions to simplify working with the data.

Communicating results from the COVID-19 Forecast Hub

Communication of probabilistic forecasts to the public is challenging,24,25 and the best practices regarding the communication of outbreaks are still developing.26 Starting in April 2020, the CDC published weekly summaries of these forecasts on their public website,27 and these forecasts were occasionally used in public briefings by the CDC director.28 Additional examples of the communication of Forecast Hub data can be viewed through weekly reports generated by the Hub team for dissemination to the general public, including state and local departments of health (https://covid19forecasthub.org/doc/reports/).

Data Availability

All data produced are available online at https://github.com/reichlab/covid19-forecast-hub

Funding

For teams that reported receiving funding for their work, we report the sources and disclosures below.

AIpert-pwllnod: Natural Sciences and Engineering Research Council of Canada

Caltech-CS156: Gary Clinard Innovation Fund

CEID-Walk: University of Georgia

CMU-TimeSeries: CDC Center of Excellence, gifts from Google and Facebook

COVIDhub: This work has been supported by the US Centers for Disease Control and Prevention (1U01IP001122) and the National Institute of General Medical Sciences (R35GM119582). The content is solely the responsibility of the authors and does not necessarily represent the official views of CDC, NIGMS or the National Institutes of Health. Johannes Bracher was supported by the Helmholtz Foundation via the SIMCARD Information & Data Science Pilot Project. Tilmann Gneiting gratefully acknowledges support by the Klaus Tschira Foundation.

CU-select: NSF DMS-2027369 and a gift from the Morris-Singer Foundation.

DDS-NBDS: NSF III-1812699

epiforecasts-ensemble1: Wellcome Trust (210758/Z/18/Z)

FDANIHASU: supported by the Intramural Research Program of the NIH/NIDDK

GT_CHHS-COVID19: William W. George Endowment, Virginia C. and Joseph C. Mello Endowment, NSF DGE-1650044, NSF MRI 1828187, research cyberinfrastructure resources and services provided by the Partnership for an Advanced Computing Environment (PACE) at Georgia Tech, and the following benefactors at Georgia Tech: Andrea Laliberte, Joseph C. Mello, Richard “Rick” E. & Charlene Zalesky, and Claudia & Paul Raines, CDC MInD-Healthcare U01CK000531-Supplement.

IHME: This work was supported by the Bill & Melinda Gates Foundation, as well as funding from the state of Washington and the National Science Foundation (award no. FAIN: 2031096)

Imperial-ensemble1: SB acknowledges funding from the Wellcome Trust (219415).

Institute of Business Forecasting: IBF

IowaStateLW-STEM: NSF DMS-1916204, Iowa State University Plant Sciences Institute Scholars Program, NSF DMS-1934884, Laurence H. Baker Center for Bioinformatics and Biological Statistics.

IUPUI CIS: NSF

JHU_CSSE-DECOM: JHU CSSE: National Science Foundation (NSF) RAPID “Real-time Forecasting of COVID-19 risk in the USA”. 2021-2022. Award ID: 2108526. National Science Foundation (NSF) RAPID “Development of an interactive web-based dashboard to track COVID-19 in real-time”. 2020. Award ID: 2028604

JHU_IDD-CovidSP: State of California, US Dept of Health and Human Services, US Dept of Homeland Security, Johns Hopkins Health System, Office of the Dean at Johns Hopkins Bloomberg School of Public Health, Johns Hopkins University Modeling and Policy Hub, Centers for Disease Control and Prevention (5U01CK000538-03), University of Utah Immunology, Inflammation, & Infectious Disease Initiative (26798 Seed Grant).

JHU_UNC_GAS-StatMechPool: NIH NIGMS: R01GM140564

JHUAPL-Bucky: US Dept of Health and Human Services

KITmetricslab-select_ensemble: Daniel Wolffram gratefully acknowledges support by the Klaus Tschira Foundation.

LANL-GrowthRate: LANL LDRD 20200700ER

MIT-Cassandra: MIT Quest for Intelligence

MOBS-GLEAM_COVID: COVID Supplement CDC-HHS-6U01IP001137-01; CA NU38OT000297 from the Council of State and Territorial Epidemiologists (CSTE)

NotreDame-FRED: NSF RAPID DEB 2027718

NotreDame-mobility: NSF RAPID DEB 2027718

PSI-DRAFT: NSF RAPID Grant # 2031536

QJHong-Encounter: NSF DMR-2001411 and DMR-1835939

SDSC_ISG-TrendModel: The development of the dashboard was partly funded by the Fondation Privée des Hôpitaux Universitaires de Genève

UA-EpiCovDA: NSF RAPID Grant # 2028401

UChicagoCHATTOPADHYAY-UnIT: Defense Advanced Research Projects Agency (DARPA) #HR00111890043/P00004 (I. Chattopadhyay, University of Chicago).

UCSB-ACTS: NSF RAPID IIS 2029626

UCSD_NEU-DeepGLEAM: Google Faculty Award, W31P4Q-21-C-0014

UMass-MechBayes: NIGMS #R35GM119582, NSF #1749854

UMich-RidgeTfReg: This project is funded by the University of Michigan Physics Department and the University of Michigan Office of Research.

UVA-Ensemble: National Institutes of Health (NIH) Grant 1R01GM109718, NSF BIG DATA Grant IIS-1633028, NSF Grant No.: OAC-1916805, NSF Expeditions in Computing Grant CCF-1918656, CCF-1917819, NSF RAPID CNS-2028004, NSF RAPID OAC-2027541, US Centers for Disease Control and Prevention 75D30119C05935, a grant from Google, University of Virginia Strategic Investment Fund award number SIF160, Defense Threat Reduction Agency (DTRA) under Contract No. HDTRA1-19-D-0007, and Virginia Dept of Health Grant VDH-21-501-0141

Wadhwani_AI-BayesOpt: This study is made possible by the generous support of the American People through the United States Agency for International Development (USAID). The work described in this article was implemented under the TRACETB Project, managed by WIAI under the terms of Cooperative Agreement Number 72038620CA00006. The contents of this manuscript are the sole responsibility of the authors and do not necessarily reflect the views of USAID or the United States Government.

WalmartLabsML-LogForecasting: The team acknowledges Walmart for supporting this study.

Author Consortium

Estee Y Cramer1, Yuxin Huang1, Yijin Wang1, Evan L Ray1, Matthew Cornell1, Johannes Bracher2,3, Andrea Brennen4, Alvaro J Castro Rivadeneira1, Aaron Gerding1, Katie House1, Dasuni Jayawardena1, Abdul H Kanji1, Ayush Khandelwal1, Khoa Le1, Jarad Niemi5, Ariane Stark1, Apurv Shah1, Nutcha Wattanachit1, Martha W Zorn1, Tilmann Gneiting2, Anja Mühlemann6, Youyang Gu7, Yixian Chen8, Krishna Chintanippu8, Viresh Jivane8, Ankita Khurana8, Ajay Kumar8, Anshul Lakhani8, Prakhar Mehrotra8, Sujitha Pasumarty8, Monika Shrivastav8, Jialu You8, Nayana Bannur9, Ayush Deva9, Sansiddh Jain9, Mihir Kulkarni9, Srujana Merugu9, Alpan Raval9, Siddhant Shingi9, Avtansh Tiwari9, Jerome White9, Aniruddha Adiga10, Benjamin Hurt10, Bryan Lewis10, Madhav Marathe10, Akhil Sai Peddireddy10, Przemyslaw Porebski10, Srinivasan Venkatramanan10, Lijing Wang10, Maytal Dahan11, Spencer Fox12, Kelly Gaither11, Michael Lachmann13, Lauren Ancel Meyers12, James G Scott12, Mauricio Tec12, Spencer Woody12, Ajitesh Srivastava14, Tianjian Xu14, Jeffrey C Cegan15, Ian D Dettwiller15, William P England15, Matthew W Farthing15, Glover E George15, Robert H Hunter15, Brandon Lafferty15, Igor Linkov15, Michael L Mayo15, Matthew D Parno15, Michael A Rowland15, Benjamin D Trump15, Samuel Chen16, Stephen V Faraone16, Jonathan Hess16, Christopher P Morley16, Asif Salekin17, Dongliang Wang16, Yanli Zhang-James16, Thomas M Baer18, Sabrina M Corsetti19, Marisa C Eisenberg19, Karl Falb19, Yitao Huang19, Emily T Martin19, Ella McCauley19, Robert L Myers19, Tom Schwarz19, Graham Casey Gibson1, Daniel Sheldon1, Liyao Gao20, Yian Ma21, Dongxia Wu21, Rose Yu22,21, Xiaoyong Jin23, Yu-Xiang Wang23, Xifeng Yan23, YangQuan Chen24, Lihong Guo25, Yanting Zhao26, Jinghui Chen27, Quanquan Gu27, Lingxiao Wang27, Pan Xu27, Weitong Zhang27, Difan Zou27, Ishanu Chattopadhyay28, Yi Huang28, Guoqing Lu29, Ruth Pfeiffer30, Timothy Sumner31, Liqiang Wang31, Dongdong Wang31, Shunpu Zhang31, Zihang Zou31, Hannah Biegel32, Joceline Lega32, Fazle Hussain33, Zeina Khan33, Frank Van Bussel33, Steve McConnell34, Stephanie L Guertin35, Christopher Hulme-Lowe35, VP Nagraj35, Stephen D Turner35, Benjamín Bejar36, Christine Choirat36, Antoine Flahault37, Ekaterina Krymova36, Gavin Lee36, Elisa Manetti37, Kristen Namigai37, Guillaume Obozinski36, Tao Sun36, Dorina Thanou38, Xuegang Ban20, Yunfeng Shi39, Robert Walraven7, Qi-Jun Hong40,41, Axel van de Walle41, Michal Ben-Nun42, Steven Riley43, Pete Riley42, James A Turtle42, Duy Cao44, Joseph Galasso44, Jae H Cho7, Areum Jo7, David DesRoches45, Pedro Forli45, Bruce Hamory45, Ugur Koyluoglu45, Christina Kyriakides45, Helen Leis45, John Milliken45, Michael Moloney45, James Morgan45, Ninad Nirgudkar45, Gokce Ozcan45, Noah Piwonka45, Matt Ravi45, Chris Schrader45, Elizabeth Shakhnovich45, Daniel Siegel45, Ryan Spatz45, Chris Stiefeling45, Barrie Wilkinson45, Alexander Wong45, Sean Cavany46, Guido España46, Sean Moore46, Rachel Oidtman28,46, Alex Perkins46, Andrea Kraus47, David Kraus47, Jiang Bian48, Wei Cao48, Zhifeng Gao48, Juan Lavista Ferres48, Chaozhuo Li48, Tie-Yan Liu48, Xing Xie48, Shun Zhang48, Shun Zheng48, Matteo Chinazzi49, Alessandro Vespignani50,49, Xinyue Xiong49, Jessica T Davis49, Kunpeng Mu49, Ana Pastore y Piontti49, Jackie Baek51, Vivek Farias52, Andreea Georgescu51, Retsef Levi52, Deeksha Sinha51, Joshua Wilde51, Andrew Zheng51, Amine Bennouna51, David Nze Ndong52, Georgia Perakis53, Divya Singhvi54, Ioannis Spantidakis51, Leann Thayaparan51, Asterios Tsiourvas51, Shane Weisberg51, Ali Jadbabaie55, Arnab Sarker55, Devavrat Shah55, Leo A Celi56, Nicolas D Penna56, Saketh Sundar57, Russ Wolfinger58, Lauren Castro59, Geoffrey Fairchild59, Isaac Michaud59, Dave Osthus59, Daniel Wolffram2,3, Dean Karlen60,61, Mark J Panaggio62, Matt Kinsey62, Luke C. Mullany62, Kaitlin Rainwater-Lovett62, Lauren Shin62, Katharine Tallaksen62, Shelby Wilson62, Michael Brenner63,64, Marc Coram64, Jessie K Edwards65, Keya Joshi66, Ellen Klein64, Juan Dent Hulse67, Kyra H Grantz67, Alison L Hill68, Joshua Kaminsky67, Kathryn Kaminsky7, Lindsay T Keegan69, Stephen A Lauer67, Elizabeth C Lee67, Joseph C Lemaitre70, Justin Lessler71, Hannah R Meredith67, Javier Perez-Saez67, Sam Shah7, Claire P Smith67, Shaun A Truelove67, Josh Wills7, Lauren Gardner68, Maximilian Marshall68, Kristen Nixon68, John C. Burant7, Wen-Hao Chiang72, George Mohler72, Junyi Gao73, Lucas Glass74, Cheng Qian74, Justin Romberg75, Rakshith Sharma74, Jeffrey Spaeder76, Jimeng Sun73, Cao Xiao77, Lei Gao6, Zhiling Gu6, Myungjin Kim6, Xinyi Li78, Guannan Wang79, Lily Wang6, Yueying Wang6, Shan Yu10, Chaman Jain80, Sangeeta Bhatia81, Pierre Nouvellet81,82, Ryan Barber20, Emmanuela Gaikedu20, Simon Hay20, Steve Lim20, Chris Murray20, David Pigott20, Robert C Reiner20, Prasith Baccam83, Heidi L Gurung83, Steven A Stage83, Bradley T Suchoski83, Chung-Yan Fong84, Dit-Yan Yeung84, Bijaya Adhikari85, Jiaming Cui75, B. Aditya Prakash75, Alexander Rodríguez75, Anika Tabassum75,86, Jiajia Xie75, John Asplund87, Arden Baxter88, Pinar Keskinocak88, Buse Eylul Oruc88, Nicoleta Serban88, Sercan O Arik89, Mike Dusenberry89, Arkady Epshteyn89, Elli Kanal89, Long T Le89, Chun-Liang Li89, Tomas Pfister89, Rajarishi Sinha89, Thomas Tsai90, Nate Yoder89, Jinsung Yoon89, Leyou Zhang89, Daniel Wilson91, Artur A Belov92, Carson C Chow93, Richard C Gerkin40, Osman N Yogurtcu92, Mark Ibrahim94, Timothee Lacroix94, Matthew Le94, Jason Liao95, Maximilian Nickel94, Levent Sagun94, Sam Abbott96, Nikos I Bosse96, Sebastian Funk96, Joel Hellewell96, Sophie R Meakin96, Katharine Sherratt96, Rahi Kalantari97, Mingyuan Zhou97, Sen Pei98, Jeffrey Shaman98, Teresa K Yamana98, Omar Skali Lami51, Dimitris Bertsimas52, Michael L Li51, Saksham Soni51, Hamza Tazi Bouardi51, Madeline Adee99, Turgay Ayer100,88, Jagpreet Chhatwal101, Ozden O Dalgic102, Mary A Ladd99, Benjamin P Linas103, Peter Mueller99, Jade Xiao88, Qinxia Wang98, Yuanjia Wang98, Shanghong Xie98, Donglin Zeng104, Jacob Bien14, Logan Brooks105, Alden Green105, Addison J Hu105, Maria Jahja105, Daniel McDonald106, Balasubramanian Narasimhan107, Collin Politsch105, Samyak Rajanala107, Aaron Rumack105, Noah Simon20, Ryan J Tibshirani105, Rob Tibshirani107, Valerie Ventura105, Larry Wasserman105, John M Drake108, Eamon B O’Dea108, Yaser Abu-Mostafa109, Rahil Bathwal109, Nicholas A Chang109, Pavan Chitta109, Anne Erickson109, Sumit Goel109, Jethin Gowda109, Qixuan Jin109, HyeongChan Jo109, Juhyun Kim109, Pranav Kulkarni109, Samuel M Lushtak109, Ethan Mann109, Max Popken109, Connor Soohoo109, Kushal Tirumala109, Albert Tseng109, Vignesh Varadarajan109, Jagath Vytheeswaran109, Christopher Wang109, Akshay Yeluri109, Dominic Yurk109, Michael Zhang109, Alexander Zlokapa109, Robert Pagano110, Chandini Jain111, Vishal Tomar111, Lam Ho112, Huong Huynh113,114, Quoc Tran113,115, Velma K Lopez116, Jo W Walker116, Rachel B Slayton116, Michael A Johansson116, Matthew Biggerstaff116, Nicholas G Reich1

1University of Massachusetts Amherst

2Chair of Econometrics and Statistics, Karlsruhe Institute of Technology

3Computational Statistics Group, Heidelberg Institute for Theoretical Studies

4IQT Labs

5Iowa State University

6Institute of Mathematical Statistics and Actuarial Science, University of Bern

7Unaffiliated

8Walmart

9Wadhwani Institute of Artificial Intelligence

10University of Virginia

11Texas Advanced Computing Center

12University of Texas at Austin

13Santa Fe Institute

14University of Southern California

15US Army Engineer Research and Development Center

16State University of New York Upstate Medical University

17Syracuse University

18Trinity University, San Antonio

19University of Michigan - Ann Arbor

20University of Washington

21University of California, San Diego

22Northeastern University

23University of California at Santa Barbara

24University of California, Merced

25Jilin University

26University of Science and Technology of China

27University of California, Los Angeles

28University of Chicago

29University of Nebraska Omaha

30National Cancer Institute (NCI), NIH

31University of Central Florida

32University of Arizona

33Texas Tech University

34Construx

35Signature Science, LLC

36Swiss Data Science Center, EPFL & ETHZ

37Institute of Global Health, Faculty of Medicine, University of Geneva

38Center for Intelligent Systems, EPFL

39Rensselaer Polytechnic Institute

40Arizona State University

41Brown University

42Predictive Science, Inc

43Imperial College, London

44University of Dallas

45Oliver Wyman

46University of Notre Dame

47Masaryk University

48Microsoft

49Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University

50ISI Foundation

51Operations Research Center, Massachusetts Institute of Technology

52Sloan School of Management, Massachusetts Institute of Technology

53Sloan School of Management and Operations Research Center, Massachusetts Institute of Technology

54Leonard N Stern School of Business, New York University

55Institute for Data, Systems, and Society, Massachusetts Institute of Technology

56Laboratory for Computational Physiology, Massachusetts Institute of Technology

57River Hill High School

58SAS Institute Inc

59Los Alamos National Laboratory

60TRIUMF

61University of Victoria

62Johns Hopkins University Applied Physics Lab

63School of Engineering and Applied Sciences, Harvard University

64Google Research

65Department of Epidemiology, UNC Gillings School of Public Health, University of North Carolina at Chapel Hill

66Harvard TH Chan School of Public Health

67Johns Hopkins Bloomberg School of Public Health

68Johns Hopkins University

69University of Utah

70École Polytechnique Fédérale de Lausanne

71Department of Epidemiology, University of North Carolina Gillings School of Global Public Health

72Indiana University–Purdue University Indianapolis

73University of Illinois at Urbana-Champaign

74Analytics Center of Excellence, IQVIA

75Georgia Institute of Technology

76IQVIA

77Amplitude

78Clemson University

79College of William & Mary

80Institute of Business Forecasting

81Imperial College London

82School of Life Sciences, University of Sussex

83IEM, Inc.

84The Hong Kong University of Science and Technology

85University of Iowa

86Virginia Tech

87Metron, Inc.

88Georgia Institute of Technology

89Google Cloud

90Harvard University

91Federal Reserve Bank of San Francisco

92Food and Drug Administration, Center for Biologics Evaluation and Research

93NIH

94Facebook AI Research

95Facebook

96London School of Hygiene & Tropical Medicine

97The University of Texas at Austin

98Columbia University

99Massachusetts General Hospital

100Emory University Medical School

101Massachusetts General Hospital, Harvard Medical School

102Value Analytics Labs

103Boston University School of Medicine

104UNC Chapel Hill

105Carnegie Mellon University

106University of British Columbia

107Stanford University

108University of Georgia

109California Institute of Technology

110No affiliation

111Auquan

112Dalhousie University

113AIpert

114Virtual Power System

115Walmart Inc.

116Centers for Disease Control and Prevention

Supplemental Information

Supplement 1: Optional fields in each metadata file:

  1. institution_affil - University or company names, if relevant.

  2. team_funding - Like an acknowledgement in a manuscript, you can acknowledge funding here.

  3. repo_url - A GitHub repository URL or something similar.

  4. twitter_handles - one or more Twitter handles (without the @) separated by commas.

  5. data_inputs - A description of the data sources used to inform the model and the truth data targeted by model forecasts. Common data sources are NYTimes, JHU CSSE, COVIDTracking, Google mobility, HHS hospitalization, etc. An example description could be “cases forecasts use NYTimes data and target JHU CSSE truth data, hospitalization forecasts use and target HHS hospitalization data”.

  6. citation - a URL (DOI link preferred) to an extended description of your model, e.g. blog post, website, preprint, or peer-reviewed manuscript.

  7. methods_long - An extended description of the methods used in the model. If the model is modified, this field can be used to provide the date of the modification and a description of the change.

Supplement 2: Validations for pull requests and metadata files submitted to the forecast hub

Each time an update (pull request) from a forecast team is submitted to the Hub, a set of validation rules that enforce the metadata requirements as outlined in the Data Records section are applied to the contents of the update. Any file in an update that fails to conform to the rules will cause the entire update to fail.

Forecasts

Each forecast file is subject to the validation rules documented at: https://github.com/reichlab/covid19-forecast-hub/wiki/Forecast-Checks.

Miscellaneous

Additionally, each team must have their files under a folder named consistently with their model_abbr, and they must only have one primary model.
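The two miscellaneous rules above can be sketched as a small check. This is a hypothetical Python illustration rather than the Hub's validation code: the record keys (team, model_abbr, folder, designation) are assumed for the example, and "named consistently" is interpreted here as the folder name being equal to model_abbr.

```python
from collections import Counter


def check_team_models(records):
    """Check folder naming and the one-primary-model-per-team rule.

    records: list of dicts with the hypothetical keys 'team', 'model_abbr',
    'folder' and 'designation'. Returns a list of problem descriptions.
    """
    errors = []

    # Assumption: a consistently named folder is one equal to model_abbr.
    for rec in records:
        if rec["folder"] != rec["model_abbr"]:
            errors.append(
                f"folder '{rec['folder']}' does not match model_abbr '{rec['model_abbr']}'"
            )

    # Count primary models per team and flag any team with more than one.
    primaries = Counter(
        rec["team"] for rec in records if rec["designation"] == "primary"
    )
    for team, count in sorted(primaries.items()):
        if count > 1:
            errors.append(f"team '{team}' has {count} primary models")

    return errors
```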

Acknowledgements

This work has been supported in part by the US Centers for Disease Control and Prevention (1U01IP001122) and the National Institute of General Medical Sciences (R35GM119582). The content is solely the responsibility of the authors and does not necessarily represent the official views of CDC, FDA, NIGMS or the National Institutes of Health. Johannes Bracher was supported by the Helmholtz Foundation via the SIMCARD Information & Data Science Pilot Project. Tilmann Gneiting gratefully acknowledges support by the Klaus Tschira Foundation.

Footnotes

  • ** See group authorship list as appendix

  • Disclaimer: The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention.

References

  1. Haghani, M. & Bliemer, M. C. J. Covid-19 pandemic and the unprecedented mobilisation of scholarly efforts prompted by a health crisis: Scientometric comparisons across SARS, MERS and 2019-nCoV literature. Scientometrics 125, 2695–2726 (2020).
  2. CDC. COVID-19 Mathematical Modeling. https://www.cdc.gov/coronavirus/2019-ncov/covid-data/mathematical-modeling.html (2021).
  3. Cramer, E. Y. et al. Evaluation of individual and ensemble probabilistic forecasts of COVID-19 mortality in the US. medRxiv 2021.02.03.21250974 (2021) doi:10.1101/2021.02.03.21250974.
  4. Brooks, L. C. et al. Comparing ensemble approaches for short-term probabilistic COVID-19 forecasts in the U.S. International Institute of Forecasters (2020).
  5. Ray, E. L. et al. Challenges in training ensembles to forecast COVID-19 cases and deaths in the United States. International Institute of Forecasters (2021).
  6. Taylor, J. W. & Taylor, K. S. Combining Probabilistic Forecasts of COVID-19 Mortality in the United States. Eur. J. Oper. Res. (2021) doi:10.1016/j.ejor.2021.06.044.
  7. CSSEGISandData/COVID-19. GitHub https://github.com/CSSEGISandData/COVID-19.
  8. covid-19-data, https://github.com/CSSEGISandData/COVID-19. (Github).
  9. US COVID-19 cases and deaths by state. https://usafacts.org/visualizations/coronavirus-covid-19-spread-map/ (2021).
  10. Cramer, E. et al. reichlab/covid19-forecast-hub: release for Zenodo, 20210816. (2021). doi:10.5281/zenodo.5208210.
  11. Reich, N. G., Cornell, M., Ray, E. L., House, K. & Le, K. The Zoltar forecast archive, a tool to standardize and store interdisciplinary prediction research. Sci Data 8, 59 (2021).
  12. Wang, S. Y. et al. reichlab/covidHubUtils: repository release for Zenodo. (2021). doi:10.5281/zenodo.5207940.
  13. Cornell, M., Gruson, H., Wang, S. Y. & Ray, E. reichlab/zoltr: Release for Zenodo, 20210816. (2021). doi:10.5281/zenodo.5207856.
  14. Cornell, M. et al. reichlab/zoltpy: Release for Zenodo, 20210816. (2021). doi:10.5281/zenodo.5207932.
  15. covid19-scenario-modeling-hub, https://github.com/midas-network/covid19-scenario-modeling-hub. (Github).
  16. covid19-forecast-hub-europe: European Covid-19 Forecast Hub, https://github.com/epiforecasts/covid19-forecast-hub-europe. (Github).
  17. covid19-forecast-hub-de: German and Polish COVID-19 Forecast Hub, https://github.com/KITmetricslab/covid19-forecast-hub-de. (Github).
  18. Borchering, R. K. et al. Modeling of Future COVID-19 Cases, Hospitalizations, and Deaths, by Vaccination Rates and Nonpharmaceutical Intervention Scenarios - United States, April-September 2021. MMWR Morb. Mortal. Wkly. Rep. 70, 719–724 (2021).
  19. MMWR Weeks. CDC https://www.n.cdc.gov/nndss/document/MMWR_Week_overview.pdf.
  20. Hannan, A., Huang, Y. D. & Wang, S. Y. reichlab/covid19-forecast-hub-validations: Release for Zenodo, 20210816. (2021). doi:10.5281/zenodo.5207934.
  21. Dong, E., Du, H. & Gardner, L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect. Dis. 20, 533–534 (2020).
  22. Ray, E. et al. reichlab/covidData: repository release for Zenodo. (2021). doi:10.5281/zenodo.5208224.
  23. Bracher, J., Ray, E. L., Gneiting, T. & Reich, N. G. Evaluating epidemic forecasts in an interval format. PLoS Comput. Biol. 17, e1008618 (2021).
  24. Gigerenzer, G., Hertwig, R., van den Broek, E., Fasolo, B. & Katsikopoulos, K. V. ‘A 30% chance of rain tomorrow’: how does the public understand probabilistic weather forecasts? Risk Anal. 25, 623–629 (2005).
  25. Raftery, A. E. Use and Communication of Probabilistic Forecasts. Stat. Anal. Data Min. 9, 397–410 (2016).
  26. Risk-Communication-and-Behavior-Best-Practices-and-Research-Findings-July-2016.pdf.
  27. CDC. COVID-19 Forecasts: Deaths. https://www.cdc.gov/coronavirus/2019-ncov/covid-data/forecasting-us.html (2021).
  28. Waldrop, T., Andone, D. & Holcombe, M. CDC warns new Covid-19 variants could accelerate spread in US. CNN (2021).
Posted November 04, 2021.

Subject Area

  • Epidemiology