Rates of SARS-CoV-2 transmission between and into California state prisons

Correctional institutions are a crucial hotspot amplifying SARS-CoV-2 spread and disease disparity in the U.S. In the California state prison system, multiple massive outbreaks have been caused by transmission between prisons. Correctional staff are a likely vector for transmission into the prison system from surrounding communities. We used publicly available data to estimate the magnitude of flows to and between California state prisons, estimating rates of transmission from communities to prison staff and residents, among and between residents and staff within facilities, and between staff and residents of distinct facilities in the state’s 34 prisons through March 22, 2021. We use a mechanistic model, the Hawkes process, reflecting the dynamics of SARS-CoV-2 transmission, for joint estimation of transmission rates. Using nested models for hypothesis testing, we compared the results to simplified models (i) without transmission between prisons, and (ii) with no distinction between prison staff and residents. We estimated that transmission between different facilities’ staff is a significant cause of disease spread, and that staff are a vector of transmission between resident populations and outside communities. While increased screening and vaccination of correctional staff may help reduce introductions, large-scale decarceration remains crucially needed as more limited measures are not likely to prevent large-scale disease spread.

In addition to expanding and continuing overall disease spread, correctional institutions amplify racial inequities in disease burden.The rate of incarceration is six times as high for Black people as for white people.Similarly, Latinx and indigenous people are incarcerated at a rate three times that of white people [30].For this reason, outbreaks in prisons tend to amplify disparities in disease burden [31][32][33][34][35][36][37][38][39][40][41], with all its consequences including the mass disabling impacts of post-acute sequelae of COVID-19 (i.e.long COVID) [42][43][44][45], even without accounting for disparities among prison residents.Racial differences in incidence and/or mortality within correctional facilities may further exacerbate health disparity [46].Additionally, counties with more Latinx and Indigenous people and lower average incomes are associated with higher infection rates in correctional facilities located Department of Corrections and Rehabilitation (CDCR).These data report a daily cumulative number of resident and staff cases at each of the 34 CDCR facilities.Cumulative case counts for each county were obtained from the California Health and Human Services Open Data Portal.
A daily count of community cases in each California county was estimated by subtracting prison resident and staff cases in the county from the cumulative count of community cases.Isotonic regression was then used to estimate nondecreasing series of numbers of county, resident, and staff cases, and daily new cases were then calculated as the first difference of each of those series.

Statistical analysis
We modeled transmission between recorded cases using a Hawkes process model (see Appendix A for details).We used maximum likelihood estimation of the Hawkes process's parameters to estimate the rates of transmission between and within the resident and staff populations within the facilities and between distinct facilities, and from community to residents and staff of a prison in the prison's county and across the state.
The model included ten parameters describing intensity of transmission among prison resident, staff, and community populations, listed and described in Table 1.The ten unknown model parameters provide the constants of proportionality determining transmission rates between populations (see Appendix A for details).
The timing of transmission events was parametrized using a generation interval between infection of a case and transmission from that case, combined with a reporting interval from a case's date of infection to the day that case is listed in case count data.The generation time distribution used in these estimates was that estimated in a meta-analysis of COVID-19 generation times [65], a Weibull distribution with mean 5.5 days and standard deviation 1.8 days (parameters α = 3.37, β = 6.12).The reporting interval was parametrized as a sum of a log-normal incubation period (mean 5.51 days, s.d.2.4 days) and log-normal detection delay (mean 5 days, s.d.2.8 days) as estimated by Xu et al. [66][67][68].The contribution of a source case to creation of secondary cases on each day was assumed proportional to the probability density of the generation interval, plus the secondary case's reporting interval, minus the first case's reporting interval.The generation interval and second case's reporting interval conditional on infection date (forward reporting interval ) have the above distributions, while the first case's reporting interval conditional on reporting date (backward reporting interval ) is estimated as follows, where p r (∆t) is the probability mass function of the forward reporting interval and Î(t) is an estimate of the daily number of true infections [69]: .
We estimated incidence from the reported case counts as Î(t) = X(t + r) where X(t) is reported case count on day t and r is the nearest whole number to the mean forward reporting delay.
The model parameters were estimated by fitting the data using maximum likelihood estimation, using the log likelihood function defined in Appendix A. Confidence intervals for each parameter value were estimated using profile likelihood estimation [70].
We estimated the number and proportion of resident and staff cases attributed to each of the six different sources of infection modeled-from residents and staff in the same institution, from residents and staff in the CDCR system as a whole, and from community cases in the county containing the institution and across the state.Proportion of cases attributed to a source was estimated by the total intensity of transmission from the source relative to total intensity of transmission from all sources, where total intensity of transmission from a source is the per capita transmission rate estimated by the model fit scaled by the total number of cases recorded in the source population.

Transmission hypotheses
To evaluate the implications of the model fit, we constructed two nested models as special cases of the full model by constraining its parameters, detailed in Appendix A.5.
An "independent prisons" model, representing a hypothesis that transmission between prisons is not involved in outbreak dynamics, that is, that outbreaks in prisons originate in introductions from the surrounding communities, into either the resident or staff populations.This model is implemented by defining the between-prison transmission parameters of the full model to equal zero.The remaining parameters were fit using maximum likelihood estimation on the full model's log likelihood function.
Second, a "well-mixed prisons" model represents a hypothesis that staff and residents do not have distinct roles in transmission, and can be treated as interchangeable in modeling.This is implemented by assuming the contact rates with all populations are equal for staff and for residents.
The remaining free parameters were fit using maximum likelihood estimation on the full model's log likelihood function.
We applied a likelihood ratio test to evaluate whether there were significant differences in the fit between the full model and either nested model, with three and six degrees of freedom, respectively.We estimated parameter confidence intervals using profile likelihood, and the number and proportion of resident and staff cases attributed to each source, for the two nested models as for the full model as described above.

Sensitivity analysis
Since true dates of infection are unknown and can only be approximated by dates of case detection, there is an inherent amount of uncertainty in the transmission dynamics of the cases recorded that can not be avoided.To examine the dependence of model results on assumptions about precision in detection of cases' timing, we fit the full model's parameters using a range of assumed reporting delay distributions, and the best fit results were plotted.The mean and standard deviation of the reporting delay distribution were varied independently by scaling and shifting its probability density function, and the effects of variation in the mean and standard deviation on the model estimate were reported separately.

Code and data availability
All data used in this study came from public sources.Source code and data files are available from the corresponding author by request.

Results
The data analyzed included 48,389 resident, 17,778 staff, and 2,492,711 community cases after removing prison cases from the California counties' counts.Outbreaks in California state prisons often coincided with surges of COVID-19 transmission in their surrounding communities (Figure 1), and outbreaks among residents often coincided with surges of cases among staff.The correlations among seven-day average case counts in a facility and its county were 0.42 between residents and staff, 0.24 between community and staff, and 0.14 between community and residents.The full Hawkes process model estimated that transmission to residents was almost entirely from other residents within a facility, and transmission to staff was largely from staff at the same facility (Table 1, Figure 2, Figure 3, Figures B. 1-B.6).Infective residents within the same facility are estimated to have contributed about 99.5% of the transmission to resident cases, and similarly staff at the same facility contributed about 92.9% of the transmission to staff.A smaller amount of transmission is estimated between staff across facilities, contributing about 3.1% of transmission to staff.Transmission between residents and staff within a facility is similarly comparatively rare, contributing about 0.4% of resident and 2.9% of staff cases.Contact with the local community contributes about 0.8% to staff cases.The result is similar under the independent prisons model, which assumes no transmission between facilities, and correspondingly attributes more staff cases to local community contacts, while under the well-mixed prisons model, which makes no distinction between staff and residents, transmission to each is primarily from local staff and residents, secondarily from other facilities, and thirdly from local communities.
Likelihood ratio testing found that the full model differed significantly from the independent prisons model (p < 0.01), indicating that the rate of transmission between distinct CDCR facilities is significantly different from zero.The rate between staff at distinct facilities in particular has 95% confidence interval distinct from zero in the full model.
The full model also differed significantly from the well-mixed prisons model (p < 0.001), indicating a difference between the sources for transmission to staff and to residents.This suggests that transmission from community to staff, and subsequently from staff to residents, is likely a contributor to prison COVID-19 cases.

Independent Prisons
Well

Sensitivity analysis
We examined how the maximum-likelihood estimate of the full model's parameters was affected by assumptions about timing of cases' detection, by varying the mean and standard deviation of the reporting delay distribution.The model results are found to be insensitive to variation in the mean delay.We found that the standard deviation of the reporting delay, indicating the amount of uncertainty in timing of infections, has a quantitative impact on the model results, but not a qualitative one in the range of values examined (Figure 4).

Discussion
Our results indicate that prison staff unsurprisingly have more outside contact events than prison residents, both with the surrounding community and with other facilities.The latter may involve staff who travel between facilities [52,53]: CDCR data shows staff members work at an average of two facilities [3].This supports the idea that staff may be an important source of transmission into and out of prisons.Testing of staff may be inadequate to prevent spread into prisons [59], and the problem may be exacerbated by low staff vaccination levels [60].
Both testing and vaccination are important occupational health measures for correctional staff as well as for outbreak prevention [71,72].Transmission due to transfer of infected residents is not clearly distinguished from zero by this model, perhaps due to relative rarity compared to everyday transmission.Nonetheless, it is known to have caused major outbreaks in the CDCR system, and must be taken very seriously as a source of risk.
Prevention of prison outbreaks is crucial, not only for protection of residents and staff, but to reduce overall transmission and exacerbation of unjust disparities in disease burden [8,11,13,15,24,33,[73][74][75][76][77].Decarceration continues to be urgently needed [6,8,11    ences between the 34 CDCR facilities may be obscured by this model.Dynamics such as rare but influential transfers of infected individuals may not be identifiable from the data set used.Variable case detection could introduce bias; for example, if testing is more frequent during outbreaks in facilities, the proportion of cases originating in the facilities could be overestimated.The assumption of symmetric transmission rates between staff and residents may be a limitation, as for example one or the other population may be more likely to be isolated while infective.
We find the Hawkes process formulation to be a powerful and flexible technique for estimation of transmission rates between segments of a population.Mechanistic knowledge of the process such as a generation time distribution can be used directly to infer mixing rates from data without extensive intermediary steps such as simulation.It can be used to estimate mixing rates in a broad variety of structured-population disease transmission problems, and it may prove useful in a wide range of disease modeling applications in the future.

A.2 Discrete-Time Hawkes Processes
Implicit in the definition of Hawkes processes is the assumption that one can always resolve the timing of each event, so that no two events are simultaneous.Here, the data consists a series of daily case counts X i (t) of each type, and multiple cases are commonly observed each day, so we must define a discrete-time version of the Hawkes process to proceed.
Informally, the Hawkes process is such that ∆N i (t) = N i (t + ∆t) − N i (t) is Poisson distributed with rate h i (t) ∆t, where h i (t) depends (only) on events up to time t, whereas the probability of multiple events in the interval [t, t + ∆t) is of order (∆t) 2 , which becomes negligible as ∆t → 0.
We measure time so that ∆t is one day, and thus cannot neglect the possibility of multiple events on a single day, and thus define our process N i (t), t = 0, 1, . . .assuming N i (0) is given, where we can understand the latter terms as the discrete equivalents of the integrals t 0 ν ji (t − s) N j (ds).By definition, N i (t) = N i (0) + t−1 s=0 ∆N i (s).The probability that ∆N i (t) = n it is then n it !and the log likelihood for a vector of parameters Θ given N = (N 1 , . . ., N m ) is thus

A.3 Detection Delays
Inference of transmission dynamics may be affected by the timing of cases' inclusion in case counts relative to the date of their infection, if the interval from infection to case reporting date is variable.
Let each case, identified by an index x, be infected on day t x and included in case counts for day t x = t x + d x , where d x has probability mass function p d (s), s = 0, 1, . ... Let X i (t) denote the reported case count on day t while ∆N i (t) = N i (t + 1) − N i (t) is the number of cases infected on day t.The case counts on day t are then We approximate this process by using a serial interval distribution relating case detections to previous detections.Case detections on day t are generated at rate Given a case detected on day s, the probability [69,104] that a given one of its secondary cases is detected on day t is Two of those probabilities are given by distributions already defined: The other one is a backward detection interval probability [69]: .
The true daily incidences ∆N being unobserved, we approximate by assuming the true incidence rises and falls as the reporting curve does with a lag, ∆N i (t) ∝ X i (t + d), with d the nearest integer to the mean reporting delay: .
Using that, we can model the cases recorded on day t as being generated on day t at rate with .
We use this to define an adjusted likelihood function:

A.4 Application to Case Counts
When the process is represented by a series of daily case counts X i (t) (t = 0, . . ., T ) of each type, we have a log likelihood function For our purposes, we will assume mutually exciting groups are the number of infected staff at each correctional facility and the number of infected residents at each correctional facility.We will include the number of infected members of each county excluding prison residents and staff as an external source of hazard.
We write X R i (t) and X S i (t) for the daily number of cases reported in residents and staff, respectively, at the i th facility, i = 1, . . ., m f (where m f is the number of facilities).For the functions σ ij , we use a common generation interval distribution ν(∆t) and detection delay distribution p d (∆t) to determine the contribution of each past infection to the present force of infection.Assume that infectious contacts between residents of facility i and facility j occur at rate β ij (with units of events per unit time), so that the contribution of the former to infection of the latter on day t Similarly, assume that infective contacts between staff at facility i and residents at facility j occur at rate γ ij , such that its contribution to the infection of residents of facility j is γ ij T s=0 σ ij (t − s; s) X S i (s).Assume that infective contacts between staff of facility i and facility j occur at rate ζ ij , so that the contribution to infection of staff of facility j due to residents and staff of facility i is . Finally, we model transmission between the community and the prisons.In this study we consider only transmission into the prisons, not to or between community members, and we assume that all transmission from outside the prison system is from the California county populations.
Let ψ ki denote the contact rate between residents of prison i and county k, and let ξ ki denote the contact rate between staff of prison i and county k.Denote the daily number of (non-prison) cases in county k by X C k (t), k = 1, . . ., m c , where m c is the number of counties.We then write the total intensity for residents in prison j, including transmission from residents, staff, and the surrounding community: For the intensity for staff in prison j, we have For simplicity, we will assume that β ij = β w when i = j (within facility) and β ij = β b when i = j (between facilities); we use analogous notation for γ w , γ b , ζ w , ζ b .Similarly we model the parameters ψ and ξ as ψ w and ξ w between a facility and the county containing that facility, and ψ b and ξ b between a facility and all other counties.Let the parameters be written as a vector for all i, k, t as X.The log likelihood for the parameters is then This log likelihood function is used to fit the parameter vector Θ using maximum likelihood estimation.
Given a parameter estimate Θ, the contribution of each source population to transmission to residents or staff is estimated by its contribution to their total Hawkes process intensity over time, e.g.T t=0 m f j=1 T s=0 β w σ jj (t − s; s) X R j (s) for within-facility transmission from residents to residents.The proportion of transmission from each source is estimated by the absolute contribution divided by the maximum likelihood estimate of the total Hawkes intensity to residents or staff.

Figure 1 :
Figure 1: The 7-day average of daily community, resident, and staff cases in each prison and its county, between April 1, 2020 and March 22, 2021.The 7-day average of community cases does not include resident and staff cases in prisons and has been scaled by 1/100.See Table C.1 for abbreviations of facility names.

Figure 2 :
Figure 2: Estimated absolute and percent contributions of sources to transmission to residents and staff cases.

Figure 3 :
Figure 3: Estimated proportion of resident and staff cases attributed to each source of infection, under the full Hawkes process model and the two nested Hawkes process models.

Figure 4 :
Figure 4: Dependence of full model result on mean (left) and standard deviation (right) of assumed case reporting delay.Tinted bars denote fraction of infection intensity affecting residents and staff due to each source of infection.

A. 5
Nested models The independent prisons model was implemented by defining the between-prison parameters β b , γ b , and ζ b to equal zero.The remaining parameters were used in fitting.The well-mixed prisons model was implemented by constraining β w = γ w = ζ w , β b = γ b = ζ b , ψ w = ξ w , ψ b = ξ b , and using the remaining free parameters β w , β b , ψ w , ψ b in fitting.B Detailed transmission intensitiesB.1-B.6 detail the estimated intensities of transmission to prison residents and staff attributes to various sources, under the full and nested models described above.

Figure B. 1 :
Figure B.1: Incidence rates (gray) and intensities (colors, stacked) for resident cases by facility, in the maximum likelihood estimate for the full Hawkes process model.

Figure B. 3 :Figure B. 4 :Figure B. 5 :Figure B. 6 :
Figure B.3: Incidence rates (gray) and intensities (colors, stacked) for resident cases by facility, in the maximum likelihood estimate for the independent prisons nested model, with local transmission only.

Table 1 :
Description and estimated values of model parameters, with 95% confidence intervals.

−Mixed Prisons Full Model
, 12, 14, 15, 18, 19, 21, 23, 24, 26,   30, 33, 61, 74, 76, 78-97].This study has a number of limitations.The use of a stationary Hawkes process formulation does not account for temporal variation in transmission rates, whether due to depletion of susceptible individuals or other causes.While we use it here to look at the relative contributions to the transmission rate, which may be relatively insensitive to that approximation, it may introduce inaccuracy.The time period studied may not be predictive of dynamics after March 2021.Differ-

Table C .
1 lists the names and abbreviations used for CDCR's facilities.