The COVID-19 Pandemic Vulnerability Index (PVI) Dashboard: Monitoring county-level vulnerability using visualization, statistical modeling, and machine learning

Background While the COVID-19 pandemic presents a global challenge, the U.S. response places substantial responsibility for both decision-making and communication on local health authorities, necessitating tools to support decision-making at the community level. Objectives We created a Pandemic Vulnerability Index (PVI) to support counties and municipalities by integrating baseline data on relevant community vulnerabilities with dynamic data on local infection rates and interventions. The PVI visually synthesizes county-level vulnerability indicators, enabling their comparison in regional, state, and nationwide contexts. Methods We describe the data streams used and how these are combined to calculate the PVI, detail the supporting epidemiological modeling and machine-learning forecasts, and outline the deployment of an interactive web Dashboard. Finally, we describe the practical application of the PVI for real-world decision-making. Results Considering an outlook horizon from 1 to 28 days, the overall PVI scores are significantly associated with key vulnerability-related outcome metrics of cumulative deaths, population-adjusted cumulative deaths, and the proportion of deaths from cases. The modeling results indicate that the most significant predictors of case counts are population size, proportion of black residents, and mean PM2.5. The machine-learning forecasts were strongly predictive of observed cases and deaths up to 14 days ahead. The modeling reinforces an integrated concept of vulnerability that accounts for both dynamic and static data streams and highlights the drivers of inequities in COVID-19 cases and deaths. These results also indicate that local areas with a highly ranked PVI should take near-term action to mitigate vulnerability. 
Discussion The COVID-19 PVI Dashboard monitors multiple data streams to communicate county-level trends and vulnerabilities and facilitates decision-making and communication among government officials, scientists, community leaders, and the public to enable effective and coordinated action to combat the pandemic.


Introduction
Defeating the COVID-19 pandemic requires well-informed, data-driven decisions at all levels of government, from federal and state agencies to county health departments. Numerous datasets are being collected in response to the pandemic, enabling the development of predictive models and interactive monitoring applications (Wynants et al. 2020; ESRI 2020). However, this multitude of data streams (from disease incidence to personal mobility to comorbidities) is overwhelming to navigate, difficult to integrate, and challenging to communicate. Synthesizing these disparate data is crucial for decision-makers, particularly at the state and local levels, to prioritize resources efficiently, identify and address key vulnerabilities, and evaluate and implement effective interventions. To address this situation, we developed a COVID-19 Pandemic Vulnerability Index (PVI) Dashboard (https://covid19pvi.niehs.nih.gov/) for interactive monitoring that features a county-level Scorecard to visualize key vulnerability drivers, historical trend data, and quantitative predictions to support decision-making at the local
We assembled U.S. county- and state-level datasets into 12 key indicators across four major domains: current infection rates (infection prevalence, rate of increase), baseline population concentration (daytime density/traffic, residential density), current interventions (social distancing, testing rates), and health and environmental vulnerabilities (susceptible populations, air pollution, age distribution, comorbidities, health disparities, and hospital beds). 
These 12 indicators (some of which combine multiple datasets) are integrated at the county level into an overall PVI score, employing methods previously used for geospatial prioritization and profiling (Bhandari et
In developing the PVI, we performed rigorous statistical modeling of the underlying data to enable quantitative analysis and monitoring and to provide short-term predictions of cases and deaths. Our modeling efforts directly address the discussion raised by Chowkwanyun and Reed (2020) about racial disparities in COVID-19 case and death rates. By contextualizing factors such as these racial disparities, correcting for socioeconomic factors, health resource allocation, and comorbidities, and highlighting place-based risks and resource deficits, the PVI can help explain differences in the spatial distribution of cases. Specifically, we performed three types of modeling, all of which are regularly updated. First, epidemiological modeling of cumulative case- and death-related outcomes provides insights into the epidemiology of the pandemic. Second, dynamic time-dependent modeling provides outcome estimates similar to those of national-level models but at county-level resolution. Finally, a Bayesian machine-learning approach provides data-driven, short-term forecasts. Herein, we describe the development of the PVI, including the epidemiological modeling and machine-learning forecasts, and its use in an interactive web Dashboard.

Methods
Data Streams Included in the Pandemic Vulnerability Index
To the best of our knowledge, we have assembled the most extensive set of community-level data streams related to COVID-19. These data streams span four major domains, namely infection rate, population concentration, intervention measures, and healthcare vulnerability. The specific components (i.e., datasets) comprising the current PVI model are provided in a dedicated Details page linked from the Dashboard. Supplementary Table 1
the creation of the PVI score and the weights applied to them were informed by our epidemiological modeling (described in a subsequent section) as well as general knowledge of contributors to health morbidities.
The PVI profiles translate numerical results into visual representations as component slices of a radar chart, with each slice representing one piece (or related pieces) of information. For each profile, the radial length of a slice represents its rank relative to all other entities (i.e., counties), with a longer radius indicating higher concern or risk. The relative width of a slice (i.e., its fraction of a full, 360° circle) indicates the contribution of its score to the overall model. These visual profiles provide a risk assessment of the strength, relative contribution, and robustness of the multiple data sources used in the model. Figure 2 illustrates the PVI workflow and the results for two example counties. This type of data integration framework has been proven effective for communicating risk prioritization and profiling information among scientists, regulators, stakeholders, and the general public and has been featured in publications
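The radar-chart description above implies a simple scoring rule: rank-scale each slice across all counties (radial length), then combine slices in proportion to their widths. The sketch below assumes exactly that, using fractional ranks with averaged ties and a weighted sum whose weights are normalized to 1. The slice names, weights, and county values are hypothetical, and the Dashboard's actual scaling and weighting may differ.

```python
from typing import Dict, List, Sequence


def fractional_ranks(values: Sequence[float]) -> List[float]:
    """Map values to (0, 1] by rank; tied values share their average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the block of values tied with position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank / n
        i = j + 1
    return ranks


def pvi_scores(slices: Dict[str, Sequence[float]],
               weights: Dict[str, float]) -> List[float]:
    """Weighted sum of rank-scaled slice values, one score per county.

    Each slice must already be oriented so that larger = more vulnerable
    (e.g., convert testing rates into a testing shortfall first).
    """
    total_weight = sum(weights.values())
    n_counties = len(next(iter(slices.values())))
    scores = [0.0] * n_counties
    for name, values in slices.items():
        share = weights[name] / total_weight  # slice width as a fraction of the circle
        for county, rank in enumerate(fractional_ranks(values)):
            scores[county] += share * rank
    return scores


# Hypothetical data for three counties (values and weights are made up).
slices = {
    "infection_rate": [10.0, 50.0, 30.0],
    "pm25": [7.0, 9.5, 8.0],
    "testing_shortfall": [0.2, 0.1, 0.4],
}
weights = {"infection_rate": 2.0, "pm25": 1.0, "testing_shortfall": 1.0}
print([round(s, 3) for s in pvi_scores(slices, weights)])  # [0.417, 0.833, 0.75]
```

Because every slice is rank-based, a single extreme county cannot stretch the scale for the others; the trade-off is that absolute differences between counties are discarded.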
This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The diverse array of data assembled for the epidemiological modeling that informs the COVID-19 PVI Dashboard represents an advance over the ever-increasing number of models related to COVID-19. To provide context and to ensure that the data streams yield conclusions and priority rankings broadly consistent with other epidemiological models, we performed a cross-sectional analysis of cumulative (i) cases, (ii) deaths, (iii) deaths as a proportion of the population, and (iv) deaths as a proportion of reported cases, using data current as of 8/24/2020. We emphasize that the PVI is not intended to be an epidemiological modeling tool per se, as it does not explicitly distinguish between factors of vulnerability for cases vs. deaths. The modeling described here is intended to anchor the components of the PVI and provide context within the larger field of COVID-19-related epidemiological modeling. Additionally, this modeling is not intended to provide forecasts, which are the primary focus of projection models, as discussed in the subsequent section (see Forecasting).
As the initial analyses displayed evidence of count overdispersion, we performed generalized linear modeling in R version 3.5 with the gam() procedure, using a negative binomial model with observed cumulative counts as the response (see Supplementary Tables 2-5) (R Core Team 2018). For analyses (i), (ii), and (iv), we used log(population size) as a predictor with an estimated coefficient. For analysis (iii), we instead entered log(population size) through the "offset" command to model the death rate.
Similarly, for analysis (iv), we used log(cumulative cases) as an offset to model the death rate among cases, which may produce biased results due to regional variation in reporting rates. Note, however, that a constant underreporting bias across counties would be absorbed into the intercept, leaving the coefficient estimates for the other predictors valid. Analysis (iv) may provide important clues about the death risk, as including cases in the denominator removes a large portion of the stochastic variation. Moreover, for all analyses, we used the proportion of the state population that has been tested as a predictor to account for additional sources of bias.
To anchor our efforts to previous work, we included as additional fixed predictors those from Wu et al. (2020), who focused primarily on the effects of a PM2.5 air pollution index using an analysis analogous to our model (iii). Before analysis, we removed predictors with a pairwise correlation greater than 0.85 with any other predictor, as well as predictors that would be collinear with a series of other predictors, such as the overall proportion of minority residents. For pairs exceeding the correlation threshold, we favored the predictor with the lower missingness rate (if any) or the one reported in other work. Dynamic predictors (i.e., those that changed substantially over the modeled period) were incorporated using simple county averages over the March-August period covered by the PVI. With over 3,100 counties (according to FIPS codes), most with >0 cases and deaths, the analysis can easily support the 27 to 28 final predictors used. To facilitate comparison with previous sources, we used predictors as they are given in their source. Accordingly, in some instances, predictors are represented as proportions and, in other instances, as percentages.
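The count models and the underreporting argument above can be restated compactly. The block below is a hedged summary, assuming a log link and the variance form used by mgcv's negative binomial family; the symbols Y_c (county count), O_c (offset quantity), r (constant reporting fraction), and C_c^* (true cases) are introduced here for illustration and are not the authors' notation.

```latex
\begin{aligned}
Y_c &\sim \operatorname{NB}(\mu_c,\theta), \qquad \operatorname{Var}(Y_c) = \mu_c + \mu_c^{2}/\theta,\\
\log \mu_c &= \beta_0 + \mathbf{x}_c^{\top}\boldsymbol{\beta} + \log O_c,
  \qquad O_c = \text{population}_c \ \text{(iii)}, \quad O_c = \text{cumulative cases}_c \ \text{(iv)},\\[4pt]
\text{if } C_c = r\,C_c^{*}: \quad
\log \mu_c &= (\beta_0 + \log r) + \mathbf{x}_c^{\top}\boldsymbol{\beta} + \log C_c^{*},\\[4pt]
\frac{\mu_c(x_{cj}+1)}{\mu_c(x_{cj})} &= e^{\beta_j}
  \;\Rightarrow\; \text{percent change} = 100\,(e^{\beta_j}-1),
  \qquad \ln 1.145 \approx 0.135, \quad \ln 1.029 \approx 0.029.
\end{aligned}
```

The third line shows why a constant reporting fraction only shifts the intercept, and the last line shows how a coefficient maps to the percent changes quoted in the Results (about 14.5% per unit of PM2.5 and about 2.9% per percentage point of black residents).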
This version posted September 13, 2020. doi: https://doi.org/10.1101/2020.08.10.20169649 (medRxiv preprint).
To provide additional context, we also performed negative binomial modeling (R version 3.5 bam() with "REML" fitting) (R Core Team 2018) of daily cases up to 6/11/2020 (Supplementary
For the accurate prediction of future COVID-19 cases and deaths, it is necessary to account for the fluid nature of the data streams comprising the PVI. Accordingly, we developed a Bayesian spatiotemporal random-effects model that jointly describes log-observed case counts and log-death counts to build local forecasts. Log-observed cases for a given day are predicted using known covariates (e.g., population density, social distancing metrics), a spatiotemporal random-effect smoothing component, and a time-weighted average of recent case counts. This smoothed time-weighted average is related to an Euler approximation of a differential equation; it provides modeling flexibility while approximating potential mechanistic models of disease spread. The smoothed case estimates are used in a similar spatiotemporal model that predicts future log-death counts from a geometric-mean estimate of the number of observed cases over the previous seven days, together with the other data streams. The Dashboard shows the resulting county-level predictions and corresponding confidence intervals (Fig. 1
Computing, which provides high-availability HTTPS load balanced with NGINX and a secure environment for web applications. Automated data updates are pushed to the public servers daily, and the daily update process is run in parallel on a private server to permit independent data integrity assessment.
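The forecasting model described above relies on two derived case features: a time-weighted average of recent case counts and a seven-day geometric mean of estimated cases. The sketch below shows one plausible way to compute such features. The exponential decay weights, the log(1 + x) shift used to tolerate zero counts, and the simple exponential-growth equation behind the Euler step are all assumptions for illustration; the published Bayesian model additionally includes covariates and spatiotemporal random effects that are not shown here.

```python
import math
from typing import Sequence


def time_weighted_log_average(daily_cases: Sequence[float], decay: float = 0.8) -> float:
    """Exponentially time-weighted mean of log(1 + cases).

    The most recent day gets weight 1, the day before `decay`, then decay**2, ...
    `decay` is a hypothetical tuning value, not a published parameter.
    """
    weight, numerator, denominator = 1.0, 0.0, 0.0
    for count in reversed(daily_cases):
        numerator += weight * math.log1p(count)
        denominator += weight
        weight *= decay
    return numerator / denominator


def shifted_geometric_mean(case_estimates: Sequence[float], window: int = 7) -> float:
    """Geometric mean of the last `window` estimates, computed on log(1 + x) so zeros are allowed."""
    recent = list(case_estimates)[-window:]
    return math.expm1(sum(math.log1p(x) for x in recent) / len(recent))


def euler_step(cases: float, growth_rate: float, dt: float = 1.0) -> float:
    """One forward-Euler step of dI/dt = growth_rate * I (an assumed exponential-growth model)."""
    return cases + dt * growth_rate * cases


# Hypothetical daily counts for one county.
history = [0, 2, 5, 9, 14, 20, 31, 40]
print(time_weighted_log_average(history))
print(shifted_geometric_mean(history))
```

Working on the log scale keeps both features consistent with a model for log-observed counts, and the geometric mean is less sensitive than an arithmetic mean to a single day's reporting spike.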
outcomes, we assessed the rank correlation between the overall PVI and the key vulnerability-related outcome metrics of cumulative deaths (Figure S1A), population-adjusted cumulative deaths (Figure S1B), and the proportion of deaths from cases (Figure S1C)
attribute to the dynamic model's ability to account for additional sources of variation due to the use of lagged case counts, a smooth time-dependent term to account for the national trend, and the inclusion of daily dynamic predictors. Again, the most significant predictors are population size (p<1E-300), the proportion of black residents (p<1E-300), the two-week-lagged cumulative number of cases as a predictor of current cases (p<1E-300), and mean PM2.5 (p<1E-300). We also ran the analogous model for deaths/population size (Supplementary Table 7), and the same predictors were found to be highly significant. In summary, the dynamic versions of the generalized linear model reinforce and amplify the conclusions from the previous cumulative models. However, these models are not designed to perform forecasting, which can be viewed as essentially a machine learning exercise. For forecasting, careful cross-validation approaches can be used to assess the accuracy of the results.
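The rank correlation between the overall PVI and the outcome metrics mentioned above can be computed as Spearman's rho, i.e., the Pearson correlation of average ranks. A minimal pure-Python sketch follows; the county values in the usage lines are hypothetical, and a statistics package (for example, SciPy's `scipy.stats.spearmanr`) would also report a p-value for the association.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("rank correlation is undefined for a constant input")
    return cov / (sd_x * sd_y)


# Hypothetical PVI scores and cumulative deaths per 100,000 for four counties.
pvi = [0.41, 0.83, 0.75, 0.52]
deaths_per_100k = [12.0, 95.0, 60.0, 20.0]
print(spearman(pvi, deaths_per_100k))
```

A rank-based measure suits this comparison because the PVI is itself built from ranks and because death counts are heavily skewed across counties.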
The most consistently significant predictors of COVID-19-related case rates and mortality are the proportion of black residents and mean PM2.5, reinforcing conclusions from previous reports (Dong, Du, & Gardner 2020). A one-percentage-point increase in the proportion of black residents is associated with a 2.9% increase in the COVID-19 death rate. A 1 μg/m³ increase in PM2.5 is associated with an approximately 14.5% increase in the COVID-19 death rate, which is at the high end of the confidence interval reported by Wu et al. (2020) in late April 2020, when cumulative deaths had reached 38% of the total recorded as of June 2020. We find that these effects persist when including numerous additional predictors and correcting for factors such as socioeconomic status, housing density, and comorbidities. Moreover, the effects persist for a range of response values, including cumulative (i) cases, (ii) deaths, (iii) deaths as a proportion of the population, and (iv) deaths as a proportion of reported cases (Supplementary Tables 2-5
are an interactive map layer with numerous display options and filters that allow sorting by overall score and combinations of slice scores, clustering by profile similarity (i.e., vulnerability "shape"), and searching for counties by name or state. Selecting a county overlays its summary Scorecard and populates the surrounding panels with county-specific information (Figure 1). Scrollable panels on the left include plots of vulnerability drivers relative to their nationwide distribution across all U.S. counties, with the location of the selected county delineated.
The panels across the bottom of the Dashboard report cumulative county numbers of cases and deaths; timelines of cumulative cases, deaths, PVI scores, and PVI ranks; daily changes in cases and deaths for the most recent 14-day period (a measure commonly used in reopening guidelines); and predicted cases and deaths for a seven-day forecast horizon.
Taken
GA to illustrate the effects of dramatic differences in public action/interventions. Figure 3 shows detailed results for the two counties, which have similar baseline vulnerabilities but implemented divergent interventions at the outset of the pandemic. Specifically, pronounced differences in
intervention measures (social distancing and testing) are associated with varying dynamics of the infection rates in these counties, as visualized through the considerable differences in magnitude of the blue (intervention-related) and red (infection-related) slices over time. Note that all data streams are scaled so that a larger slice indicates increased vulnerability (e.g., the larger blue slices represent less adherence to social distancing and lower testing rates). As visualized in Figure 3, the PVI rank for Orleans County improves over time (i.e., follows a downward rank/percentile path), reflecting early interventions that effectively blunted the accelerating increase in the number of cases. There is no similar improvement for Clayton County, reflecting the different interventions in the two areas. In this way, the PVI Dashboard enables customized empirical comparisons and evaluations across peer counties.
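The rank/percentile path discussed above for Orleans County can be computed by re-ranking all counties on each day. The sketch below assumes the percentile is the share of counties whose PVI score is at or below the selected county's score, so 100 means most vulnerable and a falling path means improvement. The county names and daily scores are hypothetical, not Dashboard data.

```python
from typing import Dict, List


def percentile_path(daily_scores: List[Dict[str, float]], county: str) -> List[float]:
    """For each day, the percentage of counties whose PVI score is <= the given county's score."""
    path = []
    for scores in daily_scores:
        own = scores[county]
        at_or_below = sum(1 for value in scores.values() if value <= own)
        path.append(100.0 * at_or_below / len(scores))
    return path


# Hypothetical PVI scores for four counties on three consecutive days.
days = [
    {"county_a": 0.80, "county_b": 0.50, "county_c": 0.60, "county_d": 0.40},
    {"county_a": 0.55, "county_b": 0.50, "county_c": 0.60, "county_d": 0.40},
    {"county_a": 0.45, "county_b": 0.50, "county_c": 0.60, "county_d": 0.40},
]
print(percentile_path(days, "county_a"))  # [100.0, 75.0, 50.0]
```

Because the percentile is relative, a county's path can fall either because its own score drops or because peer counties worsen, which is why side-by-side comparisons of peer counties are informative.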

Discussion
Numerous expert groups have coalesced around a general roadmap for addressing the current COVID-19 pandemic that comprises (i) reducing the spread through social distancing, (ii) gradually easing restrictions while monitoring for resurgence and healthcare overcapacity, and (iii) eventually moving to pharmaceutical interventions. However, the responsibility for navigating the COVID-19 response falls largely on state and local officials, who require data at the community level to make equitable decisions on allocating resources, caring for vulnerable subpopulations, and enhancing or relaxing social distancing measures. The goal of the COVID-19 PVI Dashboard is to empower informed actions to combat the pandemic from the local to the national levels on multiple time scales. The Dashboard accomplishes this goal by combining underlying COVID-19-specific structural vulnerabilities with dynamic infection and intervention data at the county level to produce an integrated concept of vulnerability that can inform decision-making on actions at the local level.
Furthermore, the general public must embrace interventions for them to be effective, and interactive visualization is a proven approach to communication among diverse audiences. The PVI Dashboard provides interactive, visual profiles of vulnerability atop an underlying statistical framework that enables the comparison of counties by clustering and the evaluation of the PVI's sensitivity to component data. The Dashboard's county-level Scorecards illustrate both overall vulnerability and the components driving it. A key utility of a public-facing, interactive dashboard is that decision-makers can point to it for support, thus promoting transparency and public buy-in for actions taken in the interest of public health. 
Example use cases include the priority distribution of medical resources such as hospital beds, targeted community outreach activities, and the establishment of contact-tracing mechanisms. Eventually, the PVI could be used to support the priority distribution of vaccines to highly vulnerable communities.
The modeling efforts presented here support decision-making in multiple ways. The epidemiological modeling enables testing the impact of changes in dynamic interventions, such as changes in social distancing, and the forecasting efforts support short-term resource allocation decisions, such as hospital staffing and the distribution of supplies. These forecasts also help communicate the trends that are part of the CDC's reopening criteria (Centers for Disease Control and Prevention 2020), such as whether interventions and local government actions translate into flattened curves. The PVI score itself constitutes an integrated indicator of vulnerability that is strongly associated with mortality outcomes in the near-to-medium term.
supports key decision-making for managing the ongoing pandemic. We will continue to update the data streams combined to calculate the PVI and will add variables as evidence of new risk factors and potential drivers of vulnerability emerges and is supported by publicly available data. We will also continue to develop software tools so that people can actively build their own models, and we will update the modeling efforts as needed. Combating endemic diseases requires long-range thinking, informed action, and political will, and we offer the COVID-19 PVI Dashboard as an interactive monitoring tool to support these sustained efforts.

Acknowledgments
We would like to thank the IT and web services staff at NIEHS for their help and support, as

PVI Legend (figure): Air Pollution slice, average daily density of fine particulate matter (PM2.5) in micrograms per cubic meter; shown as a relatively low risk for "County X" and a relatively high risk for "County Y".