CovidCounties - an interactive, real-time tracker of the COVID-19 pandemic at the level of US counties

Management of the COVID-19 pandemic has proven to be a significant challenge to policy makers. This is in large part due to uneven reporting and the absence of open-access visualization tools to present local trends and infer healthcare needs. Here we report the development of CovidCounties.org, an interactive web application that depicts daily disease trends at the level of US counties using time series plots and maps. This application is accompanied by a manually curated dataset that catalogs all major public policy actions made at the state level, as well as technical validation of the primary data. Finally, the underlying code for the site is also provided as open source, enabling others to validate and learn from this work.


This version was posted to medRxiv on May 2, 2020 (doi: https://doi.org/10.1101/2020.04.28.20083279). The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Three months later it was declared a pandemic by the WHO, and since then it has infected over 2 million people across 210 countries worldwide and caused over 150,000 deaths [2]. Additionally, the pandemic has disrupted the daily lives of billions and has incurred significant socioeconomic costs at the global level.

In the US, the very assessment of the disease's impact has been challenged by limitations in accurate data capture and analysis. Variable testing, uneven reporting, barriers to data sharing, and a lack of easy-to-use analytic tools have all contributed to a lack of clarity in establishing and trending the state of the pandemic. As a consequence, policy makers at all levels have been forced to make decisions of great socioeconomic consequence in the face of significant uncertainty.

To improve the accessibility of basic COVID-19-related information in the US, especially for the general public and policymakers without a data science background, we report the creation of a new interactive visualization tool that depicts daily disease trends at the level of individual US counties. This web application features the novel reuse of several publicly available sources of data while also introducing a new, manually curated dataset accompanying this manuscript. This site features several unique views, including local doubling times and estimated ICU bed requirements by county. Additionally, we report the technical validation of the primary data (counts per county per day) against other official and commonly used sources of data.

Methods:

Data sources: Data on state-wide and county-level counts were obtained from The New York Times [3] via their GitHub repository (https://github.com/nytimes/covid-19-data). County-wise population data were obtained from the US Census [4] using the R package tidycensus [5].
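As a rough illustration of the county-level processing described above, the Python sketch below (not the site's R code) turns cumulative counts, laid out like the date, state, county and cases columns of the New York Times us-counties.csv file, into daily new cases, and normalizes a count by population. The function names, the clipping of negative day-to-day differences caused by downward corrections, and the per-100,000 scale are our assumptions; population is taken as given rather than fetched from the Census.

```python
from collections import defaultdict


def daily_new_counts(rows):
    """Convert cumulative county counts into daily new cases per (state, county).

    `rows` is an iterable of dicts with at least 'date' (ISO 'YYYY-MM-DD'),
    'state', 'county' and 'cases' (cumulative). Hypothetical helper.
    """
    by_county = defaultdict(list)
    for r in rows:
        by_county[(r["state"], r["county"])].append((r["date"], int(r["cases"])))

    new_cases = {}
    for key, series in by_county.items():
        series.sort()  # ISO dates sort chronologically as strings
        prev = 0
        out = []
        for date, cum in series:
            # Assumption: clip negative differences from downward corrections to 0.
            out.append((date, max(cum - prev, 0)))
            prev = cum
        new_cases[key] = out
    return new_cases


def per_100k(count, population):
    """Normalize a count by population, expressed per 100,000 residents."""
    return 100_000 * count / population
```

For example, cumulative counts of 10, 12 and 11 on three consecutive days become new counts of 10, 2 and 0.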
Data on ICU bed availability per county were obtained from Kaiser Health News [6].

[...] Ferguson et al. [7]. Although simpler than other models, it fit publicly available county-level ICU bed data in California well and was easier for users to understand than more complicated models that have been proposed [8][9][10][11]. This model assumed a 4.4% rate of hospitalization among all new cases, a 30% rate of intensive care unit (ICU) admission among hospitalized patients, and a 9-day average length of stay (time until discharge or death).

Web Application Development and Deployment: See Figure 1 for an overall schematic of the web application. The source code was written in R (4.1.0) [12] using the shiny [13], shinyjs [14], tidyverse [15] and plotly [16] packages. Versioning of the software environment was handled using Docker. The entire source code for the site is publicly available on GitHub (https://github.com/vivical/ButteLabCOVID) and Docker Hub (https://hub.docker.com/r/pupster90/covid_tracker). Web hosting was organized as a unified data share between all instances running the R Shiny code, with the instances controlled by a load balancer using an auto-scaling mechanism. The web environment is hosted by Amazon Web Services and is located at covidcounties.org.
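The ICU bed model described in Methods can be made concrete with a small Python sketch. It uses the three stated parameters (4.4% hospitalization among new cases, 30% ICU admission among hospitalized patients, 9-day average stay), but the occupancy rule is our assumption: each case is treated as entering the ICU on the day it is reported and occupying a bed for exactly 9 days, so beds in use on day t equal 0.044 * 0.30 * (new cases over the trailing 9 days). This is a simplification for illustration, not the CovidCounties code, and it ignores reporting and admission lags.

```python
def icu_beds_needed(daily_new_cases, hosp_rate=0.044, icu_rate=0.30, stay_days=9):
    """Estimate ICU beds occupied on each day from a list of daily new cases.

    Hypothetical helper. Assumes admission on the day a case is reported and a
    fixed stay of `stay_days` days, so occupancy on day t is the ICU admissions
    from the trailing `stay_days`-day window ending on day t.
    """
    icu_fraction = hosp_rate * icu_rate  # 0.044 * 0.30 = 0.0132 of new cases
    beds = []
    for t in range(len(daily_new_cases)):
        window = daily_new_cases[max(0, t - stay_days + 1): t + 1]
        beds.append(icu_fraction * sum(window))
    return beds
```

With a steady 100 new cases per day, estimated occupancy climbs by about 1.32 beds per day and levels off at 100 * 9 * 0.0132 = 11.88 beds from the ninth day onward.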
First, we demonstrate the high concordance of cumulative cases and deaths calculated and displayed in CovidCounties at the county level by directly comparing them to the numbers reported by the Departments of Public Health in California and Connecticut (Figure 2A, 2B). These two states were chosen because they both publicly report daily counts of cases requiring hospitalization or intensive care at the county level. R² values for the concordance between predicted and actual counts ranged from 0.86 to 1. To our knowledge, California is the only state in the US to report county-wide ICU bed utilization rates. We found a high degree of concordance (R² = 0.87) with minimal model bias (Figure 2A [...] Table 1. Twenty-three states reported cases with unknown counties of residence; however, in all states except Rhode Island these cases made up less than 4% of the state's total cases (Table 1). The inability to map these cases to specific counties may explain some of the discrepancies between the New York Times [...]
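The concordance statistics reported here can be computed in outline with the Python sketch below. The paper does not state which R² definition was used; this sketch assumes the coefficient of determination, 1 - SS_res / SS_tot, and measures model bias as the mean signed difference between predicted and observed counts. Both choices, and the function names, are our assumptions.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def mean_bias(observed, predicted):
    """Mean signed error; a positive value means the model over-predicts."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)
```

Shifting every prediction of [1, 2, 3, 4] up by one keeps the squared Pearson correlation at 1 but lowers this R² to 0.2 and gives a bias of 1.0, which is why reporting bias alongside R² is informative.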

The curation of COVID-19 case and death counts by The New York Times is an impressive effort by over 60 reporters to collect, curate and analyze a constantly growing and evolving dataset [3]. However, they acknowledge that the underlying data are extremely fragmented, come from thousands of different sources at both the state and county levels, and are thus inherently limited in accuracy, consistency, and timeliness. The New York Times notes that reported cases have been corrected mere hours after the initial report and that there have been numerous instances where data disappeared from databases without explanation. The New York Times has also chosen to count patients where they were treated rather than by their place of residence, and it documents a number of geographic exceptions in its dataset (https://github.com/nytimes/covid-19-data), including the treatment of cities like New York City and Kansas City and the allocation of cases from cruise ships. Further, there is a subset of cases for which the patient's county of residence cannot be, or has not yet been, identified; these are generally a small fraction of a state's total cases but can be a significant number in a small state like Rhode Island (Table 1).

[...] with state (Figure 2C) and county (Figure 2B) reported hospitalizations revealed a systematic bias in our model toward higher hospitalization estimates. We suspect that this bias is due to a number of factors, including time lags between the date of hospitalization and the results of testing, as well as miscalibration of the assumed 4.4% rate of hospitalization taken from the Ferguson model [7][8][11].
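The share of cases with an unknown county of residence (Table 1) can be computed per state from a single day's cumulative county rows. The Python sketch below assumes that such cases appear with the county name "Unknown", which we understand to be the convention described in the New York Times repository README; the function name and row layout are illustrative.

```python
from collections import defaultdict


def unknown_county_share(latest_rows):
    """Fraction of each state's cumulative cases assigned to county 'Unknown'.

    Hypothetical helper. `latest_rows` holds one row per (state, county) for a
    single date, with keys 'state', 'county' and 'cases'.
    """
    total = defaultdict(int)
    unknown = defaultdict(int)
    for r in latest_rows:
        cases = int(r["cases"])
        total[r["state"]] += cases
        if r["county"] == "Unknown":
            unknown[r["state"]] += cases
    # Skip states with no cases to avoid dividing by zero.
    return {s: unknown[s] / total[s] for s in total if total[s] > 0}
```

States at or above a 4% share can then be listed with `[s for s, f in shares.items() if f >= 0.04]`.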

We have developed an intuitive tool that facilitates temporal comparisons between all counties in the US. However, we are inherently limited by the availability of data. While CovidCounties' estimation of ICU needs at the county level allows for higher-resolution allocation of resources than the widely used state-level model from IHME (https://covid19.healthdata.org/united-states-of-america), zip code level data would further [...]

Competing Interests: The authors declare no relevant competing interests.

[...] and from a manual curation of state governmental websites and news outlets as described in Methods. Data were processed to reflect case and death counts at the level of states and counties. Functions were written to perform x- and y-axis rescaling, normalization by population, doubling time estimation, and estimation of ICU bed utilization. Results were depicted using interactive line plots and maps.
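The doubling time estimation mentioned above is not specified in detail in this text. One common approach, assumed here, treats growth over the last `window` days (7 by default, an arbitrary choice) as exponential, giving T_d = window * ln 2 / ln(C_t / C_{t-window}), where C_t is the cumulative case count on day t. The Python sketch below is illustrative and is not the site's implementation.

```python
import math


def doubling_time(cumulative, window=7):
    """Estimate doubling time in days from a list of cumulative counts.

    Hypothetical helper. Assumes exponential growth over the last `window`
    days: Td = window * ln(2) / ln(C_t / C_{t-window}).
    Returns None when there is too little data or no positive baseline,
    and math.inf when counts did not grow over the window.
    """
    if len(cumulative) <= window:
        return None
    start, end = cumulative[-window - 1], cumulative[-1]
    if start <= 0:
        return None
    if end <= start:
        return math.inf
    return window * math.log(2) / math.log(end / start)
```

A county whose cumulative count goes from 100 to 200 over seven days gets a doubling time of 7 days; going from 100 to 400 gives 3.5 days.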