Abstract
To quickly detect hotspots, the New York City Health Department launched a SARS-CoV-2 percent positivity cluster detection system using census tract resolution and the SaTScan prospective Poisson-based space-time scan statistic. Soon after implementation, this system prompted an investigation identifying a gathering with inadequate social distancing where viral transmission likely occurred.
Spatiotemporal analysis of high-resolution COVID-19 data can help local health officials monitor disease spread and target interventions (1,2). Publicly available data have been used to detect COVID-19 space-time clusters at county and daily resolution across the US (3,4) and purely spatial clusters at ZIP code resolution in New York City (NYC) (5).
For routine public health surveillance, the NYC Department of Health and Mental Hygiene (DOHMH) uses the case-only space-time permutation scan statistic (6) in SaTScan* to detect new outbreaks of reportable diseases (7) (e.g., Legionnaires’ disease (8) and salmonellosis (9)). Given wide variability in testing across space and time, case-only analyses would be poorly suited for COVID-19 monitoring, as true differences in disease rates would be indistinguishable from differences in testing rates. In addition, we sought to detect newly emerging or re-emerging hotspots during an ongoing epidemic, which is more challenging than detecting a newly emerging outbreak in the context of minimal or stable disease incidence. A new approach was needed to detect areas where COVID-19 diagnoses were increasing or not decreasing as quickly relative to other parts of the city.
We developed a system to detect community-based clusters of increased percent test positivity for SARS-CoV-2 in near-real time at census tract resolution in NYC, accounting for testing variability. DOHMH launched the system on June 11, 2020, and the first COVID-19 cluster with a verified common exposure was detected on June 22.
The Study
Clinical and commercial laboratories are required to report all results (including positive, negative, and indeterminate results) for SARS-CoV-2 tests for New York State residents to the New York State Electronic Clinical Laboratory Reporting System (ECLRS) (10). For NYC residents, ECLRS transmits reports to DOHMH. Laboratory reports include specimen collection date and patient demographics, including residential address. Patient symptoms and illness onset date, if any, are not available from electronic laboratory reports and are obtained through patient interviews, although not all patients are interviewed.
To detect emerging clusters, the space-time scan statistic uses a cylinder where the circular base covers a geographical area and the height corresponds to time (11). This cylinder is moved, or “scanned,” over both space and time to cover different areas and time periods. At each position, the number of cases inside the cylinder is compared with the expected count under the null hypothesis of no clusters using a likelihood function, and the position with the maximum likelihood is the primary candidate for a cluster. The statistical significance of this cluster is then evaluated, adjusting for the multiple testing inherent in the many cylinder positions evaluated.
To quickly detect emerging hotspots, prospective analyses are conducted daily (12). To adjust for the multiple testing stemming from daily analyses, recurrence intervals are used instead of p-values (13). A recurrence interval of D days means that under the null hypothesis, if we conduct the analysis repeatedly over D days, then the expected number of clusters of the same or larger magnitude is one.
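As a minimal sketch of how a recurrence interval relates to a p-value, the Python snippet below assumes the common convention that, for analyses run once per day, the recurrence interval in days is the reciprocal of the p-value. The function names and the `analyses_per_day` parameter are hypothetical illustrations and are not part of SaTScan or the DOHMH SAS code; the 100-day threshold matches the signal definition used by this system.

```python
def recurrence_interval_days(p_value, analyses_per_day=1.0):
    """Convert a p-value into a recurrence interval in days.

    Assumption: the recurrence interval is the reciprocal of the p-value
    divided by the analysis frequency (one analysis per day by default).
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in (0, 1]")
    if analyses_per_day <= 0:
        raise ValueError("analyses_per_day must be positive")
    return 1.0 / (p_value * analyses_per_day)


def is_signal(recurrence_interval, threshold_days=100.0):
    """Flag a cluster as a signal when its recurrence interval meets the threshold."""
    return recurrence_interval >= threshold_days


if __name__ == "__main__":
    # Hypothetical p-values for three daily clusters.
    for p in (0.2, 0.005, 0.001):
        ri = recurrence_interval_days(p)
        print(f"p={p}: recurrence interval of about {ri:.0f} days, signal={is_signal(ri)}")
```

Under this convention, a longer recurrence interval means that a cluster of that magnitude would be expected less often by chance alone, which is why a minimum recurrence interval can serve as a signaling threshold.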
The space-time scan statistic can be used with different probability models. We used the Poisson model (11), in which the number of cases follows a Poisson distribution with an expected count proportional to the number of persons tested. Analyses were adjusted nonparametrically for purely geographical variations that were consistent over time, as the goal was to detect newly emerging hotspots. By fitting a log-linear function, we also adjusted for citywide temporal trends in percent positivity, as the goal was to detect local hotspots rather than general citywide trends.
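The core of the Poisson model can be sketched in a few lines of Python. This is an illustrative sketch only: it uses the standard Poisson log likelihood ratio for a single candidate cylinder, with the expected count taken as proportional to persons tested, and it omits the spatial and temporal adjustments described above. The function names and the example counts are hypothetical and are not drawn from SaTScan or from the study data.

```python
import math


def expected_cases(total_cases, tested_in_cylinder, tested_total):
    """Expected cases in a cylinder, proportional to persons tested there."""
    return total_cases * tested_in_cylinder / tested_total


def poisson_llr(cases_in, expected_in, total_cases):
    """Poisson log likelihood ratio for one cylinder, scanning for high rates.

    Returns 0.0 when the cylinder does not contain more cases than expected.
    """
    if expected_in <= 0 or cases_in <= expected_in:
        return 0.0
    cases_out = total_cases - cases_in
    expected_out = total_cases - expected_in
    llr = cases_in * math.log(cases_in / expected_in)
    if cases_out > 0:
        llr += cases_out * math.log(cases_out / expected_out)
    return llr


if __name__ == "__main__":
    # Hypothetical counts: 100 positive results among 1,000 persons tested,
    # with 10 positives among the 50 persons tested inside the cylinder.
    e = expected_cases(total_cases=100, tested_in_cylinder=50, tested_total=1000)
    print("expected:", e, "LLR:", round(poisson_llr(10, e, 100), 3))
```

In a full scan, this ratio would be evaluated for every candidate cylinder, and the cylinder with the largest value would become the primary cluster candidate, with significance assessed separately.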
We developed SAS code (SAS Institute, Inc., Cary, NC, USA) that generated input and parameter files (Table 1, Technical Appendix Table 1), invoked SaTScan in batch mode, read analysis results back into SAS for further processing, and output files to secured folders. For any signals (defined as clusters with recurrence interval ≥100 days), the code also generated a patient line list, visualizations, and an investigator notification email. Similar SAS code referencing markedly different input parameters is freely available.†
During June 11–30, 28 unique primary clusters were detected (Table 2). Despite a permissive maximum spatial cluster size setting of half of persons tested, clusters during this period were geographically small (median radius: 0.69 km). Citywide during this period, SARS-CoV-2 percent positivity was 1.3%, while median percent positivity within these clusters was 4.7% (range: 1.2%–30.6%). In 10 clusters, at least half of patients were 18–34 years old (Table 2).
On June 22, in the context of waning case counts citywide, the system detected a cluster of 6 patients (median age: 40 years) residing in a 0.64-kilometer radius, all with specimens collected on June 17 (Figure). DOHMH staff interviewed patients for common exposures, such as attending the same event or visiting the same location. On June 23, a DOHMH surveillance investigator (D.B.) determined that two patients in the cluster had attended the same gathering, where recommended social distancing practices had not been observed. In response, DOHMH launched an effort to limit further transmission, including testing, contact tracing, community engagement, and health education emphasizing the importance of isolation and quarantine.
Conclusions
Automated spatiotemporal cluster detection analyses detected emerging, highly focused areas to target COVID-19 containment efforts in NYC. One-third of clusters consisted predominantly of young adults, suggesting poor adherence to social distancing guidelines in this age group (14).
Cluster investigations required substantial effort, and while only one cluster included patients with a verified common exposure, detecting localized transmission is important to prioritize focused interventions such as promoting increased testing and public messaging. During June, we made several adjustments to improve signal prioritization, including increasing the minimum temporal cluster size from 2 to 3 days and increasing the minimum number of cases in clusters from 2 to 5 cases.
Our system is subject to several limitations. First, analyses were based on specimen collection date, but given delays in testing availability and care seeking, these dates did not necessarily represent recent infections. Timeliness was further limited by delays from specimen collection to laboratory testing and reporting. Clusters dominated by asymptomatic patients or patients with illness onset >14 days prior to diagnosis may not require intervention, as a positive PCR result indicates the presence of viral RNA but not necessarily viable virus (15). Second, geocoding is required for precision, and of unique NYC residents with a specimen collected during June 2020 for a PCR test for SARS-CoV-2 RNA, 4.9% had a non-geocodable residential address and were excluded from analyses. Finally, automation coding was complex (Technical Appendix). Planned SaTScan software enhancements that will facilitate wider adoption by other health departments include: adding a software interface for prospective surveillance, enabling temporal and spatial adjustments for the Bernoulli probability model, and enabling the log-linear temporal trend adjustment with automatically calculated trend at a sub-annual scale.
Our COVID-19 early detection system has highlighted areas in NYC warranting a rapid response. This work has guided prioritization of case investigations, contact tracing efforts, health education, and community engagement activities. Such locally targeted, place-based approaches are necessary to minimize further transmission and to better protect people at high risk for severe illness, including older adults and people with underlying health conditions.
Data Availability
Data at high spatiotemporal resolution are not publicly available in accordance with patient confidentiality and privacy laws. Publicly available data are linked below.
First author biographical sketch
Dr. Greene is the director of the Data Analysis Unit at the Bureau of Communicable Disease of the New York City Department of Health and Mental Hygiene, Long Island City, New York. Her research interests include infectious disease epidemiology and applied surveillance methods for outbreak detection.
Technical Appendix
Geocoding
Patient addresses were geocoded daily using version 20A of the NYC Department of City Planning’s Geosupport geocoding software, implemented in R through C++ using the Rcpp package.3 Addresses that failed to geocode were then cleaned using a string searching algorithm performed against the Department of City Planning’s Street Name Dictionary and Property Address Directory. Addresses that failed to geocode after cleaning were then verified using the IBM Infosphere USPS service.
Study period and time precision
SaTScan v.9.6 can estimate a temporal trend (see below), but only at an annual time scale, as this feature was originally developed to accommodate long-term secular trends across multiple years, as for cancer incidence. As a workaround to accommodate a rapidly changing trend, as for SARS-CoV-2 test positivity, reassign each day as if it were one year in the SaTScan case and population input files and conduct analyses at annual resolution. For example, for a 21-day study period ending June 19, 2020, reassign May 30, 2020 as the year “2000” and June 19, 2020 as the year “2020,” and indicate a time precision and a time aggregation of “year” (i.e., PrecisionCaseTimes=1 and TimeAggregationUnits=1 in the SaTScan parameter file). The minimum and maximum temporal cluster sizes would then be input as years instead of days.
Because input data were expressed in years, nonparametric adjustment for space by day-of-week interaction was not possible.
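The day-to-year reassignment described in this section can be sketched in Python as follows. This is a hedged illustration rather than the DOHMH SAS implementation: the function name `pseudo_year` and the choice to anchor the last study day at the year 2020 are assumptions made to reproduce the worked example above.

```python
from datetime import date


def pseudo_year(specimen_date, period_end, end_year=2020):
    """Map a calendar day to a pseudo-year so that one day counts as one year.

    The last day of the study period maps to `end_year`, and each earlier
    day maps to one year earlier.
    """
    days_before_end = (period_end - specimen_date).days
    if days_before_end < 0:
        raise ValueError("specimen_date falls after the end of the study period")
    return end_year - days_before_end


if __name__ == "__main__":
    period_end = date(2020, 6, 19)  # last day of a 21-day study period
    for d in (date(2020, 5, 30), date(2020, 6, 17), period_end):
        print(d.isoformat(), "->", pseudo_year(d, period_end))
```

With this mapping, a 21-day study period spans pseudo-years 2000 through 2020, so a minimum temporal cluster size of 3 days would be entered as 3 years.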
Temporal trend adjustment
As a workaround for a bug in SaTScan v.9.6 in calculating a temporal trend adjustment in the prospective setting, first use the case and population files to run a retrospective purely temporal Poisson analysis, with the temporal adjustment “Log linear with automatically calculated trend” (TimeTrendAdjustmentType=3 in the SaTScan parameter file). Read in this automatically calculated temporal trend from the SaTScan text output. Retain the magnitude of trend (“X”) and sign of X determined by “increase” or “decrease.” Example SaTScan text output excerpt:
SaTScan v9.6
Program run on: Mon Jun 22 05:17:48 2020
Retrospective Purely Temporal analysis
scanning for clusters with high rates
using the Discrete Poisson model.
Adjusted for time trend with an annual decrease of 6.42984%.
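Reading the trend from text output like the excerpt above could be sketched in Python as shown here. This is an assumption-laden illustration, not the DOHMH SAS code: the regular expression is based only on the wording of the excerpt, the “increase” wording is inferred from the description in this section, and the function names are hypothetical. The parameter names in the generated lines are those given in this appendix.

```python
import re

# Pattern based on the excerpt's wording; other SaTScan versions may differ.
TREND_PATTERN = re.compile(
    r"Adjusted for time trend with an annual (increase|decrease) of ([0-9]*\.?[0-9]+)%"
)


def parse_time_trend(output_text):
    """Return the signed trend percentage (negative for a decrease)."""
    match = TREND_PATTERN.search(output_text)
    if match is None:
        raise ValueError("no time trend line found in the output text")
    direction, magnitude = match.group(1), float(match.group(2))
    return magnitude if direction == "increase" else -magnitude


def trend_parameter_lines(trend_percentage):
    """Build parameter-file lines for a user-specified log-linear trend."""
    return [
        "TimeTrendAdjustmentType=2",
        f"TimeTrendPercentage={trend_percentage}",
    ]


if __name__ == "__main__":
    excerpt = "Adjusted for time trend with an annual decrease of 6.42984%."
    print("\n".join(trend_parameter_lines(parse_time_trend(excerpt))))
```

Raising an error when the trend line is missing, instead of defaulting to zero, keeps a failed retrospective run from silently producing an unadjusted prospective analysis.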
The time trend is the same for retrospective and prospective analyses. Then, run the prospective spatio-temporal Poisson analysis, inserting the calculated time trend in the parameter file as user-specified (TimeTrendAdjustmentType=2, TimeTrendPercentage=-6.42984 in the SaTScan parameter file).
Acknowledgments
We thank all staff members of the DOHMH Incident Command System Surveillance and Epidemiology Section for processing, cleaning, and managing input data; for conducting patient interviews and cluster investigations; and for logistical support. We also thank the NYC Test and Trace Corps for their assistance in managing the cases and contacts included in and identified by cluster investigations.
S.K.G. and E.R.P. were supported by the Public Health Emergency Preparedness Cooperative Agreement (grant NU90TP922035-01), funded by the Centers for Disease Control and Prevention. This article’s contents are solely the responsibility of the authors and do not necessarily represent the official views of the Centers for Disease Control and Prevention or the Department of Health and Human Services.
Footnotes
↵* Kulldorff M, Information Management Services, Inc. SaTScan v9.6: software for the spatial and space-time scan statistics (www.satscan.org). 2018.
↵† https://github.com/CityOfNewYork/communicable-disease-surveillance-nycdohmh
↵3 Eddelbuettel D, Francois R. Rcpp: Seamless R and C++ integration. Journal of Statistical Software. 2011;40(8):1–18.