A semi-parametric, state-space compartmental model with time-dependent parameters for forecasting COVID-19 cases, hospitalizations, and deaths

Short-term forecasts of the dynamics of COVID-19 in the period up to its decline following mass vaccination was a task that received much attention but proved difficult to do with high accuracy. A major obstacle has been capturing variations in the underlying kinetics of transmission resulting from changes in public policy, individual behaviors, and evolution of the virus. However, the availability of standardized forecasts and versioned data sets from this period allows for continued work in this area. Here we introduce the Gaussian Infection State Space with Time-dependence (GISST) forecasting model. We evaluate its performance in 1-4 week ahead forecasts of COVID-19 cases, hospital admissions, and deaths in the state of California made with official reports of COVID-19, Google's mobility reports, and vaccination data available each week from June 29, 2020 to April, 26, 2021. Evaluation of these forecasts with a weighted interval score shows them to consistently outperform a naive baseline forecast and often score closer to or better than a high-performing ensemble forecaster. The GISST model also provides parameter estimates for a compartmental model of COVID-19 dynamics, includes a regression submodel for the transmission rate, and allows for parameters to vary over time according to a random walk. GISST provides a novel, balanced combination of computational efficiency, model interpretability, and applicability to large multivariate data sets that may prove useful in improving the accuracy of infectious disease forecasts.


Author summary
The COVID-19 pandemic has been unprecedented both in the volume of related data made available and the volume of infectious disease forecasts generated. Improvements in forecasting methods can allow us to more fully identify regular patterns within the available data, which may provide further insight into the epidemiology of COVID-19. Improved forecasting algorithms would also be of value during any future outbreaks of similar pathogens, and to develop models useful for scenario planning. In this paper, we introduce a new forecasting model developed for forecasting cases, deaths, and hospital admissions at the state level in the United States. We use versioned data sets to apply this to the same forecasting tasks that models performed during the pandemic. We find that this new model performs well relative to a leading ensemble forecasting method. We also show that this model can be useful for understanding how COVID-19 cases, hospital admissions, and deaths are related to each other and human mobility and vaccine administration, and how these relationships changed over time.

Introduction 1
Questions about the future trajectory of the COVID-19 pandemic were central to the 2 personal [1] and public policy [2] decisions of most of the world in 2020. In principle, 3 predictive models can be key tools for decision support during infectious disease 4 outbreaks, but their value rests largely on the trustworthiness of their ability to capture 5 the key relationships in complex systems subject to changing rules and conditions [3]. 6 One way to evaluate the trustworthiness of such a model is to measures its predictive 7 performance over short-term time horizons [4]. 8 One of the major initiatives to develop models with short-term forecasting value is 9 the COVID-19 Forecast Hub [5], which collected forecasts of COVID-19 cases, 10 hospitalizations, and deaths over the course of the pandemic in a standard format [6].

11
The COVID-19 Forecast Hub not only makes it easier to compare the forecasts of 12 different models made during the pandemic, but also creates a set of benchmark 13 forecasting tasks which models developed later can use to demonstrate their predictive 14 value. Of course, making good forecasts after the data are available is not as impressive 15 as doing so beforehand, but this exercise still has value for developing models for the 16 next pandemic, or resurgence of COVID-19. A model that can provide good prospective 17 forecasts should also be able to provide good retrospective forecasts. 18 The non-stationary nature of COVID-19 pandemic [7] presents a challenge for 19 forecasting by undermining the invariance assumptions that are the basis of most 20 forecasting methods [3]. Ref [7] draws a contrast between simple statistical models, 21 which may be readily adapted to the most recent observations in a time series, and 22 epidemiological models, which-although invaluable for providing insight into the 23 past-are not as readily adapted to the most recent observations and thus more prone 24 to systematic forecasting errors when the data are nonstationary. 25 Here, we present a hybrid model, Gaussian Infection State Space with 26 Time-dependence (GISST), which aims to combine the interpretability [8] of 27 epidemiological models with the robustness to non-stationarity of statistical models by 28 allowing key parameters of the epidemiological model to be fitted to the most recent 29 observations. This model was developed for forecasting COVID-19 cases, 30 hospitalizations, and deaths at the state level in the United States. The GISST model is 31 interpretable in the sense that it incorporates our understanding of the high-level 32 relationships among cases, hospitalizations, and deaths and produces outputs such as  Many times series indicators of the status of the COVID-19 pandemic are available. To 50 allow our model's forecasting skill to be compared with those of many other models, we 51 use the forecasting targets established by the COVID-19 Forecast Hub [5,6]. These  Science and Engineering (CSSE) at Johns Hopkins University [9]. 55 They also make use of data on hospital admissions in the "COVID-19 Reported 56 subject to revision. To ensure a fair comparison between our model and those models 65 that provided forecasts as the pandemic was progressing, for any forecast we used the 66 version of the data available on the day for forecast submission to the forecasting hub. 67 For evaluation of forecasts, we use a version of the data that is recent at the time of this 68 writing. 69 The forecasting hub had a flexible submission format that allowed teams to submit 70 for any subset of counties and states as well as the United States as a whole. We focus 71 on targets at the state level in this work, in the largest state of California. One reason is 72 that the county-level data are subject to more reporting anomalies. Another is that 73 many important policy decisions are made at the state level. A third is that 74 approximations used in our process model were better suited to modeling the dynamics 75 of larger, state-level populations than those of smaller county populations.

76
The time series of cases, deaths, and hospitalizations are available at a daily 77 resolution, but the targets are Sunday to Saturday weekly totals for cases and deaths.

78
In keeping with the conventions of the COVID-19 Forecast Hub, we restrict our 79 attention to 1-4 week ahead forecasts of cases and deaths and 1-28 day ahead forecasts 80 of hospital admissions. The conventional wisdom is that forecasts at longer time 81 horizons are generally not accurate enough to be useful [5]. California to illustrate the general trajectory of the time series used to fit our model as 84 well as the extent to which it was revised over time.

85
The hospital admissions time series is unique in that it begins later than the cases 86 and deaths time series, in the middle of summer, and in that it did not become available 87 until November of 2020. At the beginning of the hospital admissions time series, there is 88 also a misleading trend of growth that is simply due to the number of hospitals 89 providing data increasing. To remove this artifact, we consider as missing the 90 observations in which the field 91 previous day admission adult covid confirmed coverage was below 90% of its 92 last observed value. We apply this transformation both for the data sets used for fitting 93 and those used for evaluation.

95
The GISST model includes a regression which expresses the risk of infection to 96 susceptibles in terms of likely predictors of immunity and exposure. One such predictor 97 is an aggregated measurement of the amount of time individuals spend in residential 98 areas, as quantified in Google's Community Mobility Reports [11]. This metric is 99 expressed as a change from a baseline. A value of 0% indicates no change relative to the 100 median value of the indicator from January 3 through February 6, 2020. The baseline is 101 calculated separately for each day of the week, which allows it to account for weekly 102 cycles in mobility. One consequence of this form is that the percent change from 103 baseline is typically lower on weekends because people already spent a lot of time in 104 residential areas on weekends before the pandemic during the period when the baseline 105 was calculated. In our model of COVID-19 transmission, we do not attempt to replicate 106 weekly cycles of exposure, and rather model the average exposure of individuals over the 107 course of a week. Using the raw metric in the reports would result in our model 108 incorrectly predicting reduced exposure during the work week. Therefore, to obtain a to the beginning forecast date in the UTC time zone. This was the the data available to 116 all forecasting teams 22-23 hours before the submission deadline of 6PM ET. Although 117 these data were not subject to revision, there was a gap of a few days between the last 118 mobility data point and the forecast date which we are able to reproduce by using the 119 archived data sets. To remove weekly seasonality, we calculated a moving 7-day average 120 with a centered moving window. The last 3 observations of this 7-day average time 121 series are missing because data for the full window is not yet available, and we also 122 considered as missing output from any windows which included the observed date of the 123 August 7, 2021 5/32 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted September 6, 2021. ;  The gap at the end of the time series was filled in by repeating the last non-missing 127 value. Finally, values of the unsmoothed data on holidays were used to replace 128 corresponding values in the smoothed time series. In this way, holidays do not affect the 129 extrapolation of values in the gap between the end of the data and the forecast date, 130 but they are still able to affect exposure in our model. Note, however, that we are only 131 accounting for the reduction in exposure due to individuals being able to spend more 132 time at home rather than any increased exposure due to social activity.   . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted September 6, 2021. X → L β t XY /N Incubation L → Y ηL Removal of infectious individual whose cases will be reported Y → Z (1 − p h,t )ρ t γY Removal of infectious individual whose cases will not be reported Y → 0 (1 − p h,t )(1 − ρ t )γY Reporting of cases Z → Z r γ z,w Z Hospitalization of an infection on a weekday Y → H + A + Z p h,t γY Hospitalization of an infection on Saturday or Sunday conclude in deaths, which will on average be reported with a longer lag from the date of 195 hospital admission than will cases. Thus the potential to lead to observable events with 196 3 different lag times makes it necessary for the hospital admission event to propagate to 197 3 pathways in our model.

198
August 7, 2021 8/32 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted September 6, 2021. ; Probability of a removal on day t being reported as a case 0.5 * * The parameter p h,t is fixed to 0.03 when hospitalization data are not available to avoid colinearity with p d . Additionally, p h,weekend is assumed equal to 0.9 when hospitalization data are not available. The parameter ρ t is estimated for dates up to June 2020 as described in Estimation of ρ t , while γ d,w and γ z,w are fixed at this value for some days of the week and estimated for others as described in Day of week effects on reporting.
We could fit this Markov chain model using a simulation-based approach such as where the angular brackets indicate expected value and the overdots indicate time August 7, 2021 9/32 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted September 6, 2021. ; https://doi.org/10.1101/2021.09.02.21262995 doi: medRxiv preprint where J is the Jacobian matrix of the system in Eq 1 and Q is the matrix defined by which may be calculated from individual-level reactions in Table 1. The matrix S has 217 elements S ij , which are the stochiometric coefficients of species j in reaction i in the 218 reactions in Table 1. The vector f collects the propensities column of Table 1 Although we do not attempt to model variation in exposure due to the day of the week, 227 the effect of the day of the week on reporting of cases and deaths cannot be ignored. 228 We account for this by allowing the reporting rate of cases, γ z,w , and the reporting rate 229 of deaths, γ d,w , to depend on the day of the week. Rather than introduce a parameter 230 for each day of the week, we break the days of the week into 2 or 3 groups and keep  Table 2 on Monday through Friday and the weekend value is 237 estimated. The parameter γ d,w is fixed to the value given in Table 2 on Thursday   238 through Saturday and its values on the other two groups of days is estimated.

239
Hospital admissions were also affected by the day of the week, being noticeably lower 240 on weekends most of the time. We model this depression by introducing the parameter 241 p h,weekend with an allowed range of 0 to 1. As show in Table 1, the rate of hospital 242 admission on weekends is calculated by multiplying the weekday rate by this reduction 243 factor. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted September 6, 2021. ; https://doi.org/10.1101/2021.09.02.21262995 doi: medRxiv preprint for such effects in our model. Our covariates for quantifying these effects have been 248 described in the Data subsection of Materials and Methods. We use doses t to denote 249 the number of vaccine doses administered on day t and residential t to denote our 250 indicator of the time spent in residential areas on day t. Our regression model for the 251 transmission rate on day t, β t , follows 252 log β t = min(β 0,t + doses t · β doses + residential t · β res , log(1000)).
The upper limit of 1000 on the value of β t reflects our scientific understanding that the reproduction number is unlikely to exceed 10 even in the absence of control measures and it prevents numerical problems in accurately calculating a solution for our process model with very large transmission rates. Although for the sake of brevity we refer to β t as a transmission rate, it serves to summarize effects on both transmissibility and susceptibility. That is β t = (transmission rate on day t) × (suseptibility on day t). The intended interpretation of our regression model in terms of this factorization of β t is transmission rate on day t = exp(β 0,t + residential t · β res ) (5) susceptibility on day t = exp(doses t · β doses ).
The coefficient β 0,t allows for the effects of NPIs for which we lack covariates to affect 253 the transmission rate, and also allows for the reproduction number to decline with the 254 depletion of susceptibles at a rate consistent with heterogeneity in susceptibilty [14].

255
These effects are allowed to vary over time according to a random walk model, which we 256 describe in the subsection Time dependence of parameters. influenza at the state level [16]. Table 3 lists the parameters in our model which we 283 August 7, 2021 11/32 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted September 6, 2021. ; https://doi.org/10.1101/2021.09.02.21262995 doi: medRxiv preprint allow to take random walks, along with the frequency of steps in the walks, and the 284 standard deviation of the normally distributed step size. Some of these parameters are 285 transformed for the purpose of allowing normally distributed step sizes while staying 286 within their natural domains. The frequency of steps and the standard deviations were 287 determined through trial and error to achieve a good fit to the data. That is, they were 288 treated as hyperparameters which were considered fixed when optimizing the likelihood 289 with respect to our regular model parameters.

303
Fitting methods

304
The GISST model has many parameters which must be estimated from the data before 305 forecasts can be made. We do this using the method of maximum likelihood. The log is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted September 6, 2021. ;  Sometimes, during optimization, these initial values can become unreasonable and lead 325 to numerical problems in calculating the likelihood. To avoid such problems, we do not 326 allow any values to be greater than N , and we do not allow the intial value of X to be 327 less than N/10. All of these initial values for means are collected in a vector denoted 328 x 0|0 . This is the initial vector of means for our process model. The initial covariance 329 matrix for this model is denoted P 0|0 and is set to a diagonal matrix in which all 330 elements are 1 except for those corresponding to Z r , A , and D r . These values are 331 equal to zero at the beginning of the day by definition so their variances are set to zero. 332 Similar to the means, the covariance matrix estimate is simply a rough approximation 333 which the algorithm will sharpen. With the process model fully initialized, the next step 334 is to generate a predictive distribution for the observed data based on the process model. 335 The predictive distribution for the next observations are generated by numerical 336 integration of the system of equations in Eqs 1 and 2 over a time span of 1 day, where 337 we used the convention of 365.25 days per year. For this step we used the Tsit5 [20] 338 integrator in the DifferentialEquations.jl package [21]. When solved with starting values 339 from time t, the resulting vector of means is denotedx t+1|t and the resulting covariance 340 matrix P t+1|t . If any elements ofx t+1|t are negative, this must be due to numerical 341 errors, so we set them to zero. Similarly, if any negative values appear on the diagonal 342 of P t+1|t , the corresponding rows and columns are set to zero. The next step is to 343 iteratively update these predictions for the mean and covariance based on the observed 344 data.

345
The update of the prediction is done by using equations that minimize squared 346 August 7, 2021 13/32 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted September 6, 2021. ; https://doi.org/10.1101/2021.09.02.21262995 doi: medRxiv preprint prediction error when the process model is linear, and which remain relatively accurate 347 when the model is non-linear. An intermediate step in this update is to calculate the 348 Kalman gain matrix K t . This matrix determines the weight to give to two types of 349 predictions: (1) predictions of the state variables on day t calculated as described in the 350 previous paragraph by projecting past estimates of the state variables forward according 351 to the process model, and (2) predictions of values of the state variables on day t based 352 solely on observations on day t and the observation model. The equation for K t is where H is a matrix that maps the state variables to observed variables and Σ t is the 354 sum of the covariance matrix R t of the observation model and the projection of the 355 covariance matrix of the process model into the observed coordinates. The matrix H is 356 3 × 9 and contains zeros except for one column in each row, which corresponds to the 357 state variables Z r , A, and D r respectively in rows 1, 2, and 3. The matrix R t follows where Y t−1 is the estimate of Y in x t−t|t−1 . The ones on the diagonal of R prevent 359 numerical problems that could occur if the τ parameters are all nearly zero. The matrix 360 Σ t follows Next, a prediction error, denotedỹ t|t−1 , for the process model is calculated as where z t is a column vector containing the observed number of reported cases, hospital 363 admissions, and reported deaths for day t respectively in rows 1, 2, and 3. The The data-updated covariance estimate, P t|t , satisfies where I denotes the identity matrix. After setting the means and covariances of the 367 variables Z r , A, and D r in these updated estimates to zero, they can be used as initial 368 values for projecting the process model forward to obtainx t+1|t and P t+1|t . In this way, 369 process model predictions are obtained for all observations. Any missing observations, 370 such as hospital admissions in data sets from early in the epidemic, are handled by 371 setting the corresponding column in K t to zero. Then, the results of these calculations 372 may be used in an equation for the log likelihood.

373
The log likelihood of our model is calculated as the sum of the likelihood for the 374 one-step ahead predictions of our process model and the step sizes in our random walk 375 models for time-dependent parameters. The expression for the log likelihood of our 376 process model predictions-i.e., the marginal log likelihood of the Kalman filter-is where the subscript "nm" indicates that rows and columns which correspond to missing 378 observations in z t have been omitted and d nm is the number of non-missing observations 379 in z t . The log likelihoods for the random walk steps are simply mean-zero Gaussian is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted September 6, 2021. Our optimizer is a local optimizer which performs best when initialized in the 385 neighborhood of the global optimum, so the first step in optimizing the likelihood was 386 to choose good starting parameters for our optimizer. In most cases, we do this by using 387 the parameters estimated from a version of the observed data issued one week earlier for 388 a particular location, which we refer to as a warm start. Any new random walk step 389 parameters introduced by extension of the number of observations are assumed to be 390 zero. In the first data set containing hospital admissions, the parameter τ h is initialized 391 with the sample variance of all non-missing observations, and the parameter p h is 392 initialized as the quotient of the sum of all hospital admissions divided by the sum of all 393 reported cases on days when hospital admissions were nonmissing.

394
In the first data set fitted, which in this study is the data set available on June 29, 395 2020, all parameter initializations had to be estimated from the data. We found that 396 the following rough estimates lead to a satisfactory fit. All τ c,t were intialized as the hospitalizations. If this value was less than 0.01, the initialization was raised to 0.01.

406
All γ z,w and γ d,w were initialized at the values given in Table 2. 407 An estimate of the time series of the effective reproduction number 408 R e = (β t /γ)(X(t)/N ) was calculated to generate initial values of β res and β 0,t . Let 409 cases on day t be denoted c t . Our raw estimate of R e was (c t+7 /ρ t+7 )/(c t /ρ t ). These probabilities were fitted to ensure they were valid probabilities. To keep all parameters 427 on similar scales, the covariate doses t was divided by N and the covariate 428 residential t was divided by 100. We provided a gradient function to the optimizer by 429 using forward mode automatic differentiation via the ForwardDiff.jl [23] and 430 DiffEqSensitivity.jl [24] packages. We ran the optimizer for 1000 iterations when fitting 431 the first data set and when fitting the hospitalization data for the first time and 100 432 iterations when using a warm start. This number of iterations was typically sufficient to 433 reduce the norm of the gradient to be 10 times smaller than the norm of the parameter 434 vector. We stopped early if the norm of the gradient was 100 times smaller than the 435 August 7, 2021 15/32 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted September 6, 2021. ; norm of the parameter vector. The suitability of the returned parameters was evaluated 436 by calculating the Hessian matrix with automatic differentiation and verifying that it 437 was positive definite (i.e., the negative log likelihood was convex). The inverse of the 438 Hessian matrix was also used as a covariance matrix to generate Wald-type interval 439 estimates for parameters [25]. found that forecasts generated using predictions of β t from an AR-1 model fitted to the 458 estimates of β t from days beyond May 1, 2020 up to the forecast date performed better 459 than forecasts which assumed that β t would remain at its most recently estimated value 460 throughout the forecast period. The key benefit of the use of the AR-1 model seems to 461 be that it limits the extent to which forecasts overshoot the peaks of pandemic waves. 462 In calculating forecasts, all parameters modeled as random walks were fixed at their 463 final estimated value. To produce forecasts for the COVID-19 Forecast Hub, we . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted September 6, 2021. ; Fig 7. Effective reproduction number of fitted model for California. The model is fitted to data available on April 26, 2021. The full model corresponds to plugging β t from Eq 4 into R e = (β t /γ)(X(t)/N ). We also plot R e with a β calculated by setting the effects of one or both of the covariates from their estimated values to zero to show their effects. models. We analyze forecasts of data from dates ranging from June 29, 2020, to April wave is not straightforward as many surveillance systems were changing rapidly as the 491 severity of the pandemic grew. We do not evaluate forecasts beyond April 2021 because 492 the indicator counts were reduced and subject to more changes in the reporting schedule 493 as the pandemic began to wane.

495
All data used was publicly available. An archive of the code used to obtain the data and 496 generate all results is available on Zenodo at 497 https://doi.org/10.5281/zenodo.5112578.

499
Goodness of fit 500 We experimented with model structure until two goodness of fit criteria were satisfied. 501 The first was that errors in the 1-step ahead forecast had the expected normal 502 August 7, 2021 17/32 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted September 6, 2021. ; https://doi.org/10.1101/2021.09.02.21262995 doi: medRxiv preprint distribution. We used this diagnostic to evaluate whether that the model's normality 503 assumptions were reasonable. Fig S1 indicates that they were, with the exception of a 504 relatively small number of outliers.

505
The second criteria was that MASE (mean absoluate scaled error) [27] was below 1 506 for all model fits. Due to the strong weekly seasonality in some of the data, we 507 calculated MASE with both a non-seasonal naive model and a 7-day seasonal naive 508 model. Our criteria was that both versions be less than 1. This criterion provided a fast 509 method to evaluate the forecasting skill of our model and its potential to outperform 510 baseline models at the longer forecast horizons in our primary evaluation analysis. 511 Fig S2 shows that our model for California satisfied these criteria.

512
Parameter estimates 513 An advantage of the mechanistic aspects of our model is that they provide some insight 514 into why the a time series of indicators may be forecasted to go in a certain direction. 515 To illustrate how this may be possible with our model, we next present an example of 516 parameter estimates in our fitted models. The plausibility of these estimates in light of 517 other sources of information about COVID-19 also provides a means to evaluate the 518 assumptions of our model.

519
The effective reproduction number is perhaps the parameter of greatest interest in 520 our model because it provides an easy-to-understand statement of whether the 521 pandemic is growing or not [28]. The (instantaneous) effective reproduction number R e 522 in our model is a derived quantity equal to R e = (β t /γ)(X(t)/N ).  Table 4, along with estimates of other parameters that were not  is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted September 6, 2021. ; time. Fig S3 shows that the estimated effect of our indicator for the amount of time 536 spent in residential areas was greatly reduced once hospital admissions data became 537 available for fitting. We interpret this as a sign that our model was systematically 538 incorrect in its assumed number of hospital admissions before being provided data, such 539 that the availability of data triggered a step change in parameter estimates. Perhaps 540 more epidemiologically interesting, the effect of this indicator trends toward zero from 541 January onward. This behavior is consistent with NPIs becoming less important as difference in scores might be the peak of the second wave and end of the third wave of 558 the pandemic (Fig S7). However, we do not use the mean of the log of the weighted 559 interval score as our summary, but rather the mean of the untransformed weighted  throughout February. Performance at later dates was not as strong for GISST but these 583 had less influence on the overall score due to the decreasing trend in hospitalizations.

584
August 7, 2021 19/32 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted September 6, 2021. ; Fig 8. Overall performance of short-term forecasts of cases and deaths. Lower scores indicate better performance. The GISST forecast consistently outperforms the COVID-19 Forecast Hub Baseline forecast and performs relatively best for 4-week ahead death forecasts.

585
We have introduced the GISST forecasting model and compared its performance in 586 forecasting cases, hospital admissions and deaths with a simple baseline forecaster and a 587 respected ensemble forecaster. Although some of its early December forecasts greatly 588 overestimated future cases and hospital admissions, in summary evaluations of cases 589 and deaths forecasts the GISST model dominated the baseline model and typically had 590 a score closer to the ensemble model (Fig 8). In summary evaluations of forecasts of 591 hospital admissions, the GISST model scored better than the ensemble in 28 out of 28 592 horizons (Fig 10). In addition to these positive performance attributes, GISST has the 593 advantage of allowing trends in the forecasted indicators to be plausibly linked to each 594 other and leading indicators such as metrics of human mobility. Additionally, it is 595 computationally efficient. Even for the fits to April 2021 data sets, when our model had 596 the largest number of parameters and observations, fitting an additional week of data 597 required less than 4 hours of serial computation (Fig S9). In addition to permitting 598 forecasts to be generated online for a large number of locations, this efficiency facilitates 599 model development by shortening the time required to test changes in model structure. 600 There are COVID-19 forecasts from many other forecasters besides the COVID-19

601
Forecast Hub Baseline and Ensemble models, and we next put our model in context   is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted September 6, 2021. ; Lower scores indicate better performance. The GISST forecasts score better than the COVIDhub Ensemble forecasts at all horizons. However, the difference in scores is slight at the 7 and 8 day horizons.
costly. Further, it does not include hospital admissions. The method introduced in 608 Ref [29] is much more computationally efficient than GISST, but it does not forecast Bayesian method for estimating such changepoints of transmission rates in a mechanistic 615 population model. From the results of such estimation in time series of COVID cases in 616 Germany, it seems that the data were only highly informative of about 1 out of 3 617 changepoint times. This result suggests to us that when prior information about the 618 timing of changepoints is not readily available, a model that assumes a random walk in 619 parameter values may be a better choice to avoid biases due to model misspecification. 620 Recently, Ref [32] presented results from an SEIR-type model of COVID in US 621 states that included environmental covariates such as temperature and population 622 density in addition to human mobility covariates. This was a spatially hierarchical 623 Bayesian model fit with MCMC and likely more computationally demanding than 624 GISST. Further, it forecasted deaths only and these forecasts were considered poor by 625 the authors for horizons of more than 2 weeks. However, it is difficult to objectively 626 compare accuracy with other models because the forecasts are not available in the 627 standard Forecast Hub format.

628
The worst-scoring forecasts from GISST were made from dates in December (Figs S7 629 August 7, 2021 22/32 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted September 6, 2021. ; https://doi.org/10.1101/2021.09.02.21262995 doi: medRxiv preprint and S8) when the GISST model greatly over-predicted the number of cases and 630 hospitalizations (Figs 9A and 9B). Although increasing the accuracy of such predictions 631 may prove extremely difficult, future work should evaluate the benefits of incorporating 632 uncertainty in future values of β t , which seems to be one of the strengths of the 633 COVID-19 Forecast Hub Ensemble forecaster.

634
Another limitation of the GISST model, as we have presented it here, may be that it 635 is missing important covariates in the regression model for the transmission rate (Eq 4). 636 For example, we have not tried using other mobility metrics provided in the Google variables. Variables identified in this manner may sufficiently lead the indicators we wish 647 to forecast that they improve forecasts just by being available up to or close to the date 648 of forecast. Knowing which variables are important for predicting the transmission rate, 649 even if they are difficult to forecast, could also be valuable for the design of forecasts.

650
Supporting Information 651 Fig S1. Quantile-quantile plot of distribution of 1-step ahead forecast errors in fit to California data. The specific residuals in the panels are the elements ofỹ t|t−1 in Eq 10 divided by the square root of variances on the diagonal of Σ t in Eq 9. These are quantiles from a fit to the data available on April 26, 2021, and thus represent a fit to data spanning the majority of the available data.