SARS-CoV-2 epidemic after social and economic reopening in three US states reveals shifts in age structure and clinical characteristics

In the United States, state-level re-openings in spring 2020 presented an opportunity for the resurgence of SARS-CoV-2 transmission. One important question during this time was whether human contact and mixing patterns could increase gradually without increasing viral transmission, the rationale being that new mixing patterns would likely be associated with improved distancing, masking, and hygiene practices. A second key question to follow during this time was whether clinical characteristics of the epidemic would improve after the initial surge of cases. Here, we analyze age-structured case, hospitalization, and death time series from three states – Rhode Island, Massachusetts, and Pennsylvania – that had successful re-openings in May 2020 without summer waves of infection. Using a Bayesian inference framework on eleven daily data streams and flexible daily population contact parameters, we show that population-average mixing rates dropped by >50% during the lockdown period in March/April, and that the correlation between overall population mobility and transmission-capable mobility was broken in May as these states partially re-opened. We estimate the reporting rates (fraction of symptomatic cases reporting to health system) at 96.0% (RI), 72.1% (MA), and 75.5% (PA); in Rhode Island, when accounting for cases caught through general-population screening programs, the reporting rate estimate is 94.5%. We show that elderly individuals were less able to reduce contacts during the lockdown period when compared to younger individuals. Attack rate estimates through August 31 2020 are 6.4% (95% CI: 5.8% ‒ 7.3%) of the total population infected for Rhode Island, 5.7% (95% CI: 5.0% ‒ 6.8%) in Massachusetts, and 3.7% (95% CI: 3.1% ‒ 4.5%) in Pennsylvania, with some validation available through published seroprevalence studies. Infection fatality rates (IFR) estimates for the spring epidemic are higher in our analysis (>2%) than previously reported values, likely resulting from the epidemics in these three states affecting the most vulnerable sub-populations, especially the most vulnerable of the ≥80 age group.


Rhode Island
Data for Rhode Island were tracked daily from the Rhode Island Department of Health (RIDOH) website reporting COVID-19 response data, which links to a publicly available datasheet that tracks all eleven data streams, with periodic interruptions to the availability of age-based data for new cases, hospitalizations, and deaths. Case data were originally reported with delays of one to five days, with most results being same day or one day delayed by the April reporting period. RIDOH staff on the data and epidemiology team periodically updated past results to move cases from 'day of reporting' to 'day of test', and all final and cleaned case data in Rhode Island were reported by 'day of test'.

Massachusetts
Data from Massachusetts were obtained from the daily and weekly daily archive of the Massachusetts Department of Public Health's (MassDPH) case, testing, hospitalization, and death data. Confirmed daily case data were available by 'date of testing' (CasesByDate.csv), and age-structure of new cases was available (Age.csv, "Catchment 1") but only summed to a portion (typically 80% to 90%) of the total daily case numbers, a likely result of no age reporting in a number of hospitals. Confirmed daily death data (confirmed only, excluding probable) were available by 'date of death' (DateOfDeath.csv), and deaths by age were available (Age.csv) but again only summed to a portion (again, typically 80% to 90%) of the total daily case numbers (potentially the result of no age reporting from some locations).
Age-structured cumulative hospitalization were included in the data (Age.csv) but these had to be excluded as these hospitalization counts were obtained by following up with patients who had reported as symptomatic (thus, their denominator was symptomatic cases, not all infections). It was clear that this data stream was underreported as for a period in April current hospitalized exceeded cumulative hospitalized. Numbers of patients in ICU and on mechanical ventilation were available daily from April 4 (Hospitalization from hospitals file.csv).
Current hospitalization data for the April 7-14 date range were manually adjusted as a smaller set of hospitals was reporting for those days. After September 6, 2020, age-stratified cases, hospitalizations, and death are reported in MassDPH's weekly public health report (weekly-covid-19-dashboard-data-9-30-2020.xlsx) instead of the daily covid dashboard (Age.csv ) We assume that death data are not underreported. Current hospitalization data were obtained from 68 acute care hospitals (ACH) in Massachusetts, which represent all ACH in MA and would cover all or nearly all COVID-19 hospitalizations in MA.
Symptomatic case data were smoothed with a 7-day moving average to remove a weekly periodic signal (lower Sunday/Monday reporting).

Pennsylvania
Daily new confirmed case data were pulled from the Pennsylvania Department of Health Coronavirus Update Archive. Counts are reported beginning with the first reported case on March 6. For the period from April 17 to May 21, confirmed cases were estimated from the reported values, which are noted to be the sum of confirmed and probable cases. Confirmed cases were estimated to be 97% of the combined total, based off of dates for which confirmed and probable counts were available separately, and match neighboring count data well. After May 21, daily confirmed counts (separate from probable cases) were made available. Beginning June 9, these counts were determined from the sum of confirmed cases reported at the county-level, as the state total was no longer reported on the archive. One outlier (April 16) was removed from the data set, as the daily change reported was inconsistent with neighboring counts. No data was available on June 8. For all dates, the 'confirmed only' data were used in data fitting.
Age groups for new cases from March 27 to June 7 were also available on the Pennsylvania DOH Archive. Age data were reported as the percentage of 'positive cases by age range to date' in the following age ranges: 0-4, 5-12, 13-18, 19-24, 25-49, 50-64, and 65+. These were translated into 10-year age bands by rebinning the counts under the assumption that the age probability density of case counts was relatively smooth. A monotonic spline was fit to the empirical cumulative distribution and then differentiated to estimate the 1year age band (probability distribution function) of the original age categories. These were validated against known age breakdown data, and underreporting for the 80+ group was found, as would be expected for the 65+ age category. Beginning June 17, age-structured data was pulled data from Pennsylvania's online arcgis dashboard and scraped using the R package 'rvest'. Age data reported on the dashboard were binned in 10-year age groups, so no rebinning was necessary. For days when the age-structured case counts exactly matched the previous day's counts (apparently because they had not been updated), the data were omitted.
Daily death counts are reported on Pennsylvania's arcgis dashboard as well, beginning with the first reported death on March 18. These numbers are retrospectively updated to reflect the date of death.
According to the dashboard: "Death data is based on the date of death as reported to EDRS. Death counts for prior dates will change as additional death records are registered or amended." Beginning May 17, agestructured death counts (binned into 5-year age groups) were available from weekly reports through the Pennsylvania DOH Coronavirus Archive. These reports also included the breakdown of the location of deaths (i.e. hospital, hospice, long-term living facilities, and residence). More frequent age-structured death data were collected from daily pulls from the Pennsylvania arcgis dashboard beginning June 17.
Cumulative hospitalization counts were available on the Pennsylvania DOH Archive from March 27 to April 6. From April 17 to May 21, these counts were calculated as the sum of the approximated agestructured hospitalization counts. These age-structured values were estimated from the reported percentages of 'hospitalizations by age range to date', available for this time period on the Pennsylvania DOH Archive.
Hospitalization counts are not reported after May 21.
The currently hospitalized population in Pennsylvania has been reported on the online arcgis dashboard since April 17. As in MA, a problem with this data stream is the fact that in mid-April, the current hospi-

Mobility Data
Both Facebook and SafeGraph apply differential privacy to the provided data by adding Laplacian noise to stay-at-home counts and discarding information for locations with too few users or devices [1] [13].
Both sources are limited by their observed populations, which are not representative of states in question.
Except for re-weighting by the total population of reported counties/CBGs, no adjustments are made to estimate mobility for the general public, or to make the two metrics comparable. containing about 600-3000 people each [6]. A sampling bias is introduced in the SafeGraph mobility data when CBGs are not all sampled at equal rates relative to their true populations. For example, urban areas may be oversampled due to a greater usage rate of smart phones compared to rural areas. Fortunately, the true populations of CBGs are known from the official US Census Bureau. When the true population fraction of a CBG in a state is compared to the fraction of SafeGraph observations of that CBG in the state, we can identify whether the CBG is over or underrepresented in the samples. Stratified reweighting allows us to increase the weight of underrepresented CBGs and decrease the weight of overrepresented CBGs . A reweighting factor is therefore necessary and calculated by using a stratification method as reported in the equation below.
In the above equation, W CBG,S represents the weighting factor for a CBG in state S. N CBG represents the true population of the CBG as reported in the US Census Bureau, N S represents the true population of S as reported by the Census Bureau, n CBG represents the number of SafeGraph observations sampled from the CBG and n S represents the total number of SafeGraph observations sampled from S. From this equation, it is true that as the sample fraction of CBG in S approaches the true population fraction of CBG in S, W gets closer to one. Within a given state S, W CBG,S is greater than one for CBGs which are under-represented in the SafeGraph data, and vice-versa. The raw data for a CBG is multiplied by these weighting factors to correct for sampling bias.

Variable Names
Our eleven data streams are coded with the following variable names: x a,t cumulative case counts by age group a y a,t cumulative hospitalizations by age group a z a,t cumulative hospital deaths by age group a w t cumulative out-of-hospital deaths (not available by age group) u t cumulative hospital discharges (not available by age group) For some states and some time periods, age-structured data are not available, but age-aggregated data are available. This is common for cumulative case counts and cumulative hospitalizations and can occur for all variables; in some cases, age-stratified data are reported weekly or irregularly. We refer to these aggregated data as x t = cumulative new case counts (all ages) y t = cumulative new hospitalizations (all ages) Age-structured counts of cumulative cases, hospitalizations, and deaths can be available at the same time as the corresponding cumulative counts x t , y t , and z t , and it is common for the summed age-structured counts to record fewer individuals than the reported total counts. That is, is a common situation due to missing age information for some patients; in this case we assume that the age data are missing completely at random. There are also occurrences where the age-sums are larger than the reported totals, and this occurs occasionally because the cumulative age data have been put in the incorrect row in a data base (e.g. one week earlier than where they should be). These latter cases were handled individually, in order to not assume that one data stream is correct while the other is not, and verified with a state DOH when possible.

Asymptomatic fraction by age
To assess the percentage of SARS-CoV-2 infections that never progress to symptoms, literature review was conducted during the course of our analysis to identify studies that reported asymptomatic fractions by age group. A total of 12 studies/datasets were included: 8 peer-reviewed articles, 2 CDC MMRW reports, 1 pre-print, and 1 news report. Four of these studies report asymptomatic infection that was confirmed, with follow-up, to remain asymptomatic [16,19,26]; these are shown with yellow boxes/circles in Figure S1. Pham et al. described cases observed during Vietnam's quarantine and isolation effort during the first 100 days of the pandemic. In this paper, confirmed infections that never developed symptoms during the 14-day quarantine are defined as asymptomatic infection. Kimball [20], Payne et al. [22], So and Smith [25] did not contain sufficient follow-up to ensure that patients were truly asymptomatic, thus it is possible that some of these studies may include pre-symptomatic individuals; these are shown with gray boxes/circles in Figure S1 Combining the four studies that include asymptomatic cases only, the weighted-mean of the asymptomatic fraction across all ages is 43.6%. The asymptomatic fraction by age is: [10][11][12][13][14][15][16][17][18][19]  The five probabilities above ("simple average" in Table 2 of the main text) are compared to five other asymptomatic models in our analysis.
Figure S1: Asymptomatic Fraction by Age Boxes (means) and circles (medians) show the percentage of SARS-CoV-2 infections that are asymptomatic for a particular age group. Yellow markers correspond to studies that included confirmed asymptomatics (with sufficient follow-up) while gray markers correspond to studies that may have included pre-symptomatic individuals. One study (magenta) looked at pregnant women only. Gray bars indicate either the standard deviation in age or the interquartile range of age for the selected group. Larger markers correspond to larger sample sizes.

Mathematical Transmission Model
A classical ordinary differential equations (ODE) model was used to model infection and clinical progression of SARS-CoV-2, with compartmental diagram shown in Figure S2. The model is age-structured with nine age classes broken into 10-year age bands, with ≥ 80 as the last age class. Age-specific forces of infection (Λ a ) determine how quickly individuals move from S a (susceptibles in age class a) to E 1,a (stage 1 of the exposed class for age class a). All other transitions in the model are linear. Classes with a circled number in the upper right are broken up into that many stages. For example, the exposed state is six stages long (E 1,a to E 6,a ) with each stage being one day long. The mean duration of the exposed period is six days, and the coefficient of variation of this duration is 1/  The force of infection on age class a is where b runs over age classes and j runs over stages of a class.
The parameter c a,b is the contact rate between age groups a and b, which is symmetric in our model The parameter β t is a time-dependent population-mixing parameter, whose values are inferred and constrained by a spline structure (see Equation S20).
Admission to the ICU, if it occurs, typically occurs early in the hospitalization period. Patients can be admitted immediately after presentation to the emergency department or after a short medical-floor hospital stay [24]. Thus, individuals can enter C A from H A,1 but not H A,2 through H A,4 . Time on ventilator is broken up into six stages, as death occurs more quickly than recovery. Total time on mechanical ventilation is a parameter that is inferred with a starting value of and constrained to be close to 10.8 days [3,8,12,29]. to estimate a 'death-prob-home' parameter for the 60-69 age-group as well.
Dynamical equations for the susceptible classes arė where σ a is the relative susceptibility of age-class a (set to 0.6 for < 20 and 1.0 for everyone else ) and N is the population size. Other parameters not described above are listed in Table S1. All other differential equations are sums of linear terms with transition rates determined by 'lengths of stay' in each class (above and in Table S1), and probabilities determining whether someone progress to (for example) death, a milder clinical state, a more severe clinical state, or the same clinical state.  3 Likelihood and Inference

Daily New Count Data
Daily new case counts are assumed to follow a negative binomial distribution with dispersion parameter k 1 .
The expected mean number of new symptomatic cases in each age group, ρ t ·J a,t (θ), is obtained from the ODE model, with θ representing the parameters that are input into the ODEs (i.e., contact rate, length of hospital stay, etc). The parameter ρ t is the fraction of symptomatic cases that are expected to be observed by the health system; ρ t is the reporting rate at time t. The ∼ symbol is used to indicate first differencing across time steps, andJ is the daily incidence while the J t -variables are the cumulative incidence. Differencing may be done across multiple time steps if data are missing for certain time steps.
For daily new confirmed case data, the likelihood of these data is treated differently depending on whether age-data exist or not. Let T = {1, 2, . . . , t f } be the set of all times for which data exist. And, let T a and T m be a partition of T, where age data are present (T a ) and age data are missing (T m ). T = T a ∪ T m .
For convenience, we treat these two disjoint subsets as the ordered tuples: When age data exist, in each age group we calculate the likelihood of new cases since the last time point that age data were available. For example, on the second day that age data are available, we write the likelihood as where the left-hand side above is a shorthand for the joint likelihood of new total cases and new cases in each of the eight age groups. Again, using ∼ as a shorthand for first differencing across time steps, we can re-write the above equation compactly as L 1 (x t2 ,x 0,t2 , . . . ,x 8,t2 ) = P (x t2 ) · P (x 0,t2 ,x 1,t2 , . . . ,x 8,t2 |x t2 ) = NegBin(x t2 ; ρ t2Jt2 , k 1 ) · MultiNom(x 0,t2 , . . . , with the parameters of the probability mass functions above written after the semi-colon. The parameter k 1 is the dispersion parameter for the negative binomial distribution and ρ t2Jt2 is the mean. The negativebinomial parameterization used is from which the variance can be written as ρ tjJtj + (ρ tjJtj ) 2 /k 1 . Our model for the age data, conditioned on the total data, is multinomially distributed, with probabilities proportional to the ODE-derived expected age-structured incidence. This is the likelihood that would result if each age-structured data stream were independent of each other, and each were Poisson distributed.
If age data do not exist for a time point, we write and note that the first differencing above is done across exactly one time step, meaning that the time step t * j − 1 could have age data. The likelihood of all of the reported data on symptomatic confirmed cases is then

Daily New Hospitalization Data
New hospitalization data (i.e. daily incidence of hospitalization) are treated identically to the symptomatic confirmed case data. A hospital reporting parameter ρ H is used to indicate that in PA and MA, the new daily hospitalization numbers appear to be incomplete. The dispersion parameter for the daily reported numbers is k 2 . As before, time points are separated into those that have age data (a) and those are missing (m) age data: Using the same negative binomial and multinomial approach as for the case data, the likelihood for the hospitalization incidence data stream can be written as L 4 (ỹ t * j ) . (S12)

Daily New Death Data
Death data can be broken down by home and hospital deaths, and they can also be broken down by age.
Again, the age data may only appear at certain time steps, but the home and hospital death data streams typically have complete day to day reporting. The ODE model has state variables for home deaths (D HM ) and hospital deaths (D HP ), and we use D = D HM + D HP for the total death count. As before we break the time points into those that have age data (a) and those are missing (m) age data: When age data are present, the joint probability of the age, home death, and hospital death data can be separated like so P (home deaths, hospital deaths, age-stratified deaths) = P (age|hosp, home) · P (home) · P (hosp) since the at-home deaths and the hospital deaths are independent. We write the likelihood as × P ( z a,t2 − z a,t1 | total deaths ) . (S13) For any timestep j, this can be written more compactly as L 5 (w tj ,z tj ,z 0,tj . . .z 8,tj ) = P (z tj −w tj ) · P (w tj ) · P (z a,tj | total deaths ) = NegBin(z tj −w tj ;D HP,tj , k 3 ) · NegBin(w tj ;D HM,tj , k 4 ) · · · · MultiNom(z 0,tj · · ·z 8,tj ;z tj , {D a,tj /D tj } ) . (S14) When age data are not available this is

Data on Current Number of Patients in Hospital, ICU, and on Ventilators
The model tracks current hospitalizations (H t , summed across age classes), current ICU occupancy (C t ), and current number of patients using ventilators (V t ). Note that in the model these are disjoint groups, so H t represents hospitalized patients that are not in the ICU and C t represents ICU patients that are not on ventilators. The data on the other hand are typically nested so that h t includes all hospitalized patients including ICU patients c t and patients on ventilators v t . The likelihood equations for the 'current' data streams are written down as normal distributions around the first differences of the hospitalized, ICU, and ventilated populations. If first differencing cannot be done because of missing data, time points are simply excluded. We assume independent Gaussian likelihoods for the increments of observed quantities corresponding to H t , C t , and V t , and model the observations of sums of these using standard multivariate normal theory.
The ventilated patient counts are thus modeled as: where ϕ σv is a normal PDF with mean zero and standard deviation σ v . The likelihoods for the data streams of the ICU patient counts is calculated by subtracting out the ventilated patients and the model for the hospitalization counts are similarly constructed by subtracting the ICU patient counts

Hospital Discharges
Data on hospital discharges are not age-structured in any of the data that we have. Therefore, the comparison between cumulative discharges in the data u t and cumulative discharges predicted by the ODE model U t (θ) is modeled as with no "reporting rate" parameter or special accounting of missingness, other than time points being skipped in the calculation when two consecutive data points are not available.

Likelihood of 11 passive surveillance data streams
The likelihood of all 11 passive surveillance data streams is the product of L 1 through L 10 , and this likelihood L is expressed given the ODE parameters (θ), two reporting rates, five dispersion parameters for count data, and three error variances for occupancy data in the hospital, ICU, and on ventilation: L( 11 data streams | θ, ρ, ρ H , k 1 , . . . , k 5 , σ 1 , σ 3 , σ 3 ) .

Random Screening Data
In some instances, data are available that separate a department's asymptomatic/random screening data from their passive surveillance data. Passive surveillance is a health system confirming (by diagnostic test) self-reporting individuals who present with symptoms. Random screening is a health system planning testing campaigns in vulnerable populations, older populations, health-care workers, or certain high-contact groups.
These individuals are tested randomly whether they have symptoms or not. Inference on the reporting parameter ρ assumes that all data are collected via passive surveillance. If the random screening data are reported, the estimate of ρ should be adjusted accordingly. Below we outline this adjustment for ρ using Rhode Island's asymptomatic random screening data.
Let c t be the random screening coverage on day t, i.e. the fraction of the total population that receives a random test per day. For a symptomatic individual, we define p NCAS as the probability that this individual was not caught by random screening before symptoms occurred, at which point they would (with probability ρ t ) present to a hospital or clinic and be reported by the passive surveillance system. The key to clean integration of the passive surveillance and random screening testing components into a single likelihood is to avoid double-counting of individuals. Assuming a test sensitivity of σ E = 0.8 and an incubation period duration of ε = 6.0 days, then and the expected number of daily cases caught by passive surveillance is which then becomes the mean of the negative binomial observation function of new cases, and the likelihood equation should now reflect that a negative binomial distribution with mean as above should be equal to b p,t = total number of positive cases reported through the passive surveillance system (but not the total number of cases reported for that day).
When b p,t is not available, we simply model the sum total of cases that we would expect to find through passive surveillance and random screening combined (i.e. the total number of new cases reported in a day).
This expected value is where N ·c t is the number of individuals tested in a day (through random screening) and π t is the probability that a randomly screened individual will test positive, which is simply the prevalence in the population: Above, we exclude all hospitalized cases as these individuals would not have an opportunity to take part in a random screening program, and we make sure to not double count individuals who had reported through passive surveillance and were informed of their positive test results.
Using the data in Table S2,  The contact rate parameter β t controls the rate of contact between individuals, and is allowed to vary over time through a penalized spline expansion. We use a cubic B-spline expansion with one knot every seven days. If s (t) is the -th B-spline basis function evaluated at time t, then with α 1 , . . . , α L being the spline basis function loadings, which are parameters which need to be estimated.
To model temporal consistency in contact rate parameters, we specify a correlated multivariate normal prior distribution on α 1 , . . . , α L , with correlation defined to penalize second differences between the loadings α 1 , . . . , α L . The variance of the multivariate normal prior is controlled by a variance parameter, σ 2 β , which is given an inverse gamma prior. Together, this hierarchical prior provides a flexible model for changing contact rates over time, with smoothness of the contact rates estimated from the data.
For the Rhode Island and Pennsylvania analyses, the reporting rate parameter ρ t is modeled as a penalized I-spline expansion, with knot locations chosen to allow for flexible, monotonic increases in the reporting rate from March to mid-June. The spline basis function loadings are estimated, with the same prior as described for the contact rate parameter β t . However, in the Massachusetts analysis, the reporting rate ρ t is kept constant, with a uniform prior with limits between 0 and 1, as an I-spline results in a worse fit. Parameters controlling the variance of data models around the mean defined by the ODE model come in two forms. Observed hospitalization data are given Gaussian likelihoods (L 3 , L 4 , L 5 ) with variance parameters to be estimated by the data. These variance parameters are assigned diffuse inverse-gamma priors.
Observed count data are given negative binomial likelihoods (L 1 , L 2 ) and have dispersion parameters which are assigned diffuse exponential priors.

Inference
Inference on parameters controlling the ODE model, such as contact rate parameters and hospitalization stay parameters, as well as parameters that control variation in the observed data around ODE model means,

Parameters and Priors for Final Runs
The most challenging part of the inference was identification of the time from symptoms appearance to hospital admission (time-symp-to-hosp). This parameter was estimated in the early part of the epidemic in China as 4.64 days [4], 7 days [27], and 9.1 days [18]. Later estimates from the US and Europe put these estimates at between 3.5 to 7.0 days [2,10,11,21]. Clinicians working with COVID-19 hospital admissions in Rhode Island indicated that the lower part of this range was more believable.

Rhode Island
All 11 data streams from section 1.1 were included. Reporting rate ρ was fit with an I-spline as RIDOH informed us that March reporting was very low due to lack of testing. Fit was substantially improved when comparing I-spline to constant reporting. Time from symptoms to hospitalization was given a prior of [2.0, 3.5] days as there was nothing in the data to inform this parameter and it appeared to have poor identifiability in many runs. Due to the completeness of the data in Rhode Island, there was strong support for age patterns changing (piece-wise constant assumption) from the spring phase to the summer phase, and there was strong support for a lower rate of ICU admission in the summer phase, although the effect size in RI for this was small. Posterior distributions for clinical and contact parameters are shown in Figure S5.
The prior distribution of the clinical parameters was constrained, with positive support limited to parameter combinations in which the expected cumulative reported symptomatic count total was within 10% of the observed cumulative total in the data. Thus, the MCMC sampler rejected parameter combinations in which the fitted cumulative case count deviated too far from the observed count.

Massachusetts
Only 8 of 11 data streams from section 1.1 were included. Hospital deaths and hospital discharges were not available in Massachusetts, and the 'cumulative hospitalized' data stream could not be used because it was obtained by following up with symptomatic patients to ask about subsequent hospitalization (i.e. this was a fraction of the fraction ρ, and not a fraction of the total population N ). Therefore, in order to obtain reliable fits and convergence in MCMC, probability of hospitalization by age was restricted to tight bounds around the median values obtained for the same parameters in the final RI run. In addition, the prior distribution on the length of the hospital stay (LOS) was set to be between 11.8 and 12.8 days, as this provided the best fit (lowest DIC) among several narrow prior ranges that were investigated. The LOS here refers to a typical medical-floor hospital stay for an individual that does not progress to critical care (these estimates are sometimes difficult to obtain as many studies mix ICU and non-ICU patients when reporting these averages). The prior distribution of the clinical parameters was constrained, with positive support limited to parameter combinations in which the expected cumulative reported symptomatic count total was within 10% of the observed cumulative total in the data. Thus, the MCMC sampler rejected parameter combinations in which the fitted cumulative case count deviated too far from the observed count.
Reporting rate ρ was kept constant as an I-spline provided a worse fit. Time from symptoms to hospitalization was given a prior of [3.5, 5.0] days as, again, there was nothing in the data to inform this parameter.
A mode of about 3.8 days for this parameter showed some identifiability in MA. The 'current number of patients on ventilator' data stream was consistently under-estimated in MA, and the parameters describing probability of progress from ICU to ventilation and duration on a ventilator kept being pulled to lower values by the inference, suggesting that there is something wrong in one of (1) model of clinical progression from hospital to ICU to ICU-on-ventilation, (2) likelihood function linking these data to the model, (3) the completeness of the 'currently on ventilator' data stream, or (4) the assumption that the pattern of clinical progression was constant throughout the epidemic.
In MA, the lockdown end-day had to be fixed at May 11 (plausible based on re-opening schedule of MA).
Otherwise, the inference identified a second (lower likelihood) lockdown end-day in late May. Posterior distributions for clinical and contact parameters are shown in Figure S6.

Pennsylvania
Only 9 of 11 data streams from section 1.1 were included. Hospital discharges were not available, and current number of patients in ICU was not available. Cumulative hospitalization data stopped being reported on the PA DOH website on May 21, and the inference for the reporting rate ρ is largely influenced by these first 2.5 months of data. Reporting rate ρ was fit with an I-spline (as in RI) as this provided a better fit than a constant reporting rate. Time from symptoms to hospitalization was given a narrow prior of [2.5, 3.0] days as there was nothing in the data to inform this parameter and it appeared to have poor identifiability in many runs. There was strong support for age patterns changing (piece-wise constant assumption) from the spring phase to the summer phase, and there was strong support for a much lower rate of ICU admission in the summer phase.
Hospital reporting appears to be incomplete in PA, i.e. the cumulative hospitalizations data stream reported through May 21 appeared to undercount new hospitalizations. Assuming that deaths are not undercounted, and that hospital occupancy and ventilator occupancy are not undercounted, a hospital reporting rate was fit and found to be identifiable. We estimate that 77.0% (95% CI: 62.8% -87.1%) of new hospitalizations were being reported at this time. Posterior distributions for clinical and contact parameters are shown in Figure S7. Figure S3: Model fit to Massachusetts daily data, using the best fit model which accounts for different agebased contact rates after the lockdown and a different rate of ICU admissions starting in early June (Table  3) Figure S4: Model fit to Pennsylvania daily data, using the best fit model which accounts for different agebased contact rates after the lockdown and a different rate of ICU admissions starting in mid-June (Table 3).  len.hospstay scaling factor should be multiplied by 10.7 to get the length of a medical-floor hospital stay in days. The two dev.icu.frac parameters are scaling factors that are meant to be multiplied by the ICU admission probabilities in Lewnard et al [17]. The parameter dev.ventdeath.mid is the scaling factor v in Table S1. Contact rates for age classes are presented as relative to the 0-9 age group. Hospitalization fraction (of symptomatics) is assumed to be the same for the 0-9 and 10-19 age groups.