SUMMARY
Background Several research efforts have evaluated the impact of various factors including a) socio-demographics, (b) health indicators, (c) mobility trends, and (d) health care infrastructure attributes on COVID-19 transmission and mortality rate. However, earlier research focused only on a subset of variable groups (predominantly one or two) that can contribute to the COVID-19 transmission/mortality rate. The current study effort is designed to remedy this by analyzing COVID-19 transmission/mortality rates considering a comprehensive set of factors in a unified framework.
Method We study two per capita dependent variables: (1) daily COVID-19 transmission rates and (2) total COVID-19 mortality rates. The first variable is modeled using a linear mixed model while the later dimension is analyzed using a linear regression approach. The model results are augmented with a sensitivity analysis to predict the impact of mobility restrictions at a county level.
Findings Several county level factors including proportion of African-Americans, income inequality, health indicators associated with Asthma, Cancer, HIV and heart disease, percentage of stay at home individuals, testing infrastructure and Intensive Care Unit capacity impact transmission and/or mortality rates. From the policy analysis, we find that enforcing a stay at home order that can ensure a 50% stay at home rate can result in a potential reduction of about 30% in daily cases.
Interpretation The model framework developed can be employed by government agencies to evaluate the influence of reduced mobility on transmission rates at a county level while accommodating for various county specific factors. Based on our policy analysis, the study findings support a county level stay at home order for regions currently experiencing a surge in transmission. The model framework can also be employed to identify vulnerable counties that need to be prioritized based on health indicators for current support and/or preferential vaccination plans (when available).
Funding None.
Evidence before this study We conducted an exhaustive review of studies examining the factors affecting COVID-19 transmission and mortality rates at an aggregate spatial location such as national, regional, state, county, city and zip code levels. The review considered articles published in peer-reviewed journals (via PubMed and Web of Science) and working articles uploaded in preprint platforms (such as medRxiv). A majority of these studies focused on a small number of counties (up to 100 counties) and considered COVID-19 data only up to the month of April. While these studies are informative, cases in the US grew substantially in recent months. Further, earlier studies have considered factors selectively from the four variable groups - socio-demographics, health indicators, mobility trends, and health care infrastructure attributes. The exclusion of variables from these groups is likely to yield incorrect/biased estimates for the factors considered.
Added value of this study The proposed study enhances the coverage of COVID-19 data in our analysis. Spatially, we consider 1258 counties encompassing 87% of the total population and 96% of the total confirmed COVID-19 cases. Temporally, we consider data from March 25th to July 3rd, 2020. The model system developed comprehensively examines factors affecting COVID-19 from all four categories of variables described above. The county level daily transmission data has multiple observations for each county. To accommodate for these repeated measures, we employ a linear mixed modeling framework for model estimation. The model estimation results are augmented with policy scenarios imposing hypothetical mobility restrictions.
Implications of all the available evidence The proposed framework and the results can allow policy makers to (a) evaluate the influence of population behavior factors such as mobility trends on virus transmission (while accounting for other county level factors), (b) identify priority locations for health infrastructure support as the pandemic evolves, and (c) prioritize vulnerable counties across the country for vaccination (when available).
INTRODUCTION
Coronavirus disease 2019 (COVID-19) pandemic, as of July 19th, has spread to 188 countries with a reported 14.4 million cases and 603 thousand fatalities1. The pandemic has affected the mental and physical health of people across the world significantly taxing the social, health and economic systems2. Among the various countries affected, United States has reported the highest number of confirmed cases (3.8 million) and deaths (140 thousand) in the world3. In this context, it is important that we clearly understand the factors affecting COVID-19 transmission and mortality rate to prescribe policy actions grounded in empirical evidence to slow the spread of the transmission and/or prepare action plans for potential vaccination programs in the near future. Towards contributing to these objectives, the current study develops a comprehensive framework for examining COVID-19 transmission and fatality rates in the United States using COVID-19 data at a county level encompassing about 87% of the US population. The study effort is designed with the objective of including a universal set of factors affecting COVID-19 in the analysis of transmission and mortality rates. We employ an exhaustive set of county level characteristics including (a) socio-demographics, (b) health indicators, (c) mobility trends, and (d) health care infrastructure attributes. We recognize that analysis of COVID-19 data with a subset of factors, as has been the case with earlier work, is likely to yield incorrect/biased estimates for the factors considered. The framework proposed for understanding and quantifying the influence of these factors can allow policy makers to (a) evaluate the influence of population behavior factors such as mobility trends on virus transmission (while accounting for other county level factors), (b) identify priority locations for health infrastructure support as the pandemic evolves, and (c) prioritize vulnerable counties across the country for vaccination (when available).
In recent months, a number of research efforts have examined COVID-19 data in several countries to identify the factors influencing COVID-19 transmission and mortality. Given the focus of our current study, we restrict our review to studies that explore COVID-19 transmission and mortality rate at an aggregated spatial scale. To elaborate, these studies explored COVID-19 transmission and mortality rates at the national4–6, regional7, state8, county9–14, city15 and zip code levels16. A majority of these studies considered transmission rate as the response variable (transmission rate per capita). The main approach employed to identify the factors affecting the response variables is the linear regression approach. In their analysis, researchers employed a host of independent variables from four variable categories: socio-demographics, health indicators, mobility trends and health care infrastructure attributes. For socio-demographics, studies found income, race and age distribution have a positive association with the COVID-19 transmission 11,16–18. Regarding health indicators, earlier research found that smokers, obese and individuals with existing health conditions are more likely to be severely affected by COVID-1911. In terms of mobility trends, studies showed that staying at home and effective mobility restriction measures significantly lower the COVID-19 transmission rate4,7,9,10,14 while increased mobility resulted in increased COVID-19 transmission12,19. Finally, among health care infrastructure attributes, testing rate is linked with reduced risk of COVID-19 transmission4,5. While earlier research efforts have considered the factors from all variable categories, it is important to recognize that each individual study focused only on a subset of variable groups (predominantly one or two) and have not controlled explicitly for other variable groups that can contribute to the COVID-19 transmission/mortality rate.
The current study builds on earlier literature examining the factors affecting COVID-19 transmission and mortality rate and contributes along the following directions. First, we extensively enhance the spatial and temporal coverage of COVID-19 data in our analysis. Spatially, earlier research on COVID-19 aggregate data has focused on a small number of counties (up to 100 counties). In our study, we consider all counties with total number of cases greater than 100 on July 3rd. The 1258 counties selected encompass 87% of the total population and 96% of the total confirmed COVID-19 cases. Temporally, earlier research has only considered data up to the month of April. While these studies are informative, cases in the US grew substantially in the recent months. Hence, in our study we have considered data from March 25th to July 3rd, 2020. The longer period of data (101 days) also enables us to study/test for the evolution of variable effects over time. Second, earlier research studies have considered factors from one or two of the categories of variables identified above. Further, studies that tested health indicators employed one or two measures selectively. In our analysis, we conduct a comprehensive examination of factors affecting COVID-19 from all four categories of variables including (a) socio-demographics: distribution by age, gender, race, income, location (urban or rural), education status, income inequality and employment, (b) health indicators: percentage of population suffering from cancer, cardiovascular disease, hepatitis, Chronic Obstructive Pulmonary Disease (COPD); diabetes, obesity, Human Immunodeficiency Virus (HIV), heart disease, kidney disease, asthma; drinking and smoking habits, (c) mobility trends: daily average exposure, social distancing matrices, percentage of people staying at home, and (d) health care infrastructure attributes: hospitals per capita, ICU beds per capita, COVID-19 testing measures. Finally, the research study employs a robust modeling framework in terms of model structure and dependent variable representation. A linear mixed model system that addresses the limitations of the traditional linear regression framework for handling repeated measures is employed. For dependent variable, alternative functional forms of COVID-19 transmission – natural logarithm of daily cases per 100 thousand people and natural logarithm of 7-day moving average of cases per 100 thousand people - are considered in model estimation. The overall approach allows us to robustly quantify the impact of factors affecting COVID-19 transmission.
METHODS
Data Collection
Independent variables
Table 1 summarizes sample characteristics of the explanatory variables with the definition considered for final model estimation, the data source, and sample characteristics (minimum, maximum and mean values). For the sake of brevity, in the ensuing discussion we only present details of two groups of independent variables: health indicators and mobility trends. Using health indicator data, we ranked the 1,258 counties in a descending order of health metric and provided it in Figure 1. Further, we compute the average values for different health indicators across the healthiest and unhealthiest 10 counties to highlight the change in health conditions across the two groups. The values clearly emphasize the vulnerability of the unhealthiest counties relative to the healthiest counties. For instance, number of HIV patients in the healthy counties are 75·83 while in the unhealthiest counties, it is almost 430% higher (407).
To incorporate mobility trends, we considered two variables: daily average exposure20 and social distancing metric (from SafeGraph1) to serve as surrogate measures for the mobility patterns (see Supplementary Material for detail). Figure 2 provides a summary of both these measures at a state level from January 22nd to July 3rd. From the figure, we can clearly see the reduction in average daily exposure in March as many states and local jurisdictions imposed lockdowns. By late April, exposure activity started to increase again across all the states while still being lower than the levels for February. In terms of the staying at home measure, as expected, we find an exactly opposite trend.
Dependent variables
We analyze two county level dependent variables: (1) COVID-19 daily transmission rate per 100K population and (2) COVID-19 mortality rates per 100K population. For the transmission rate analysis, we tested two alternative functional forms: daily cases per 100 thousand people and 7-day moving average of cases per 100 thousand people. The moving average data is likely to be less volatile and serves as a stability test for the daily cases model. The reader would note that we used a natural logarithmic transformation for all the dependent variables. The COVID-19 dataset from Center for Systems Science and Engineering (CSSE) Coronavirus Resource Center at Johns Hopkins University21 provides information on the daily confirmed COVID-19 cases, number of people recovered (when available) and the number of deaths from COVID-19 starting from January 22nd to the current date across 3,142 counties in the United States. In our research, we confined our analysis to the cases between March 25th to July 3rd resulting in 101 days of data. Further, we focus on counties that have at least 100 cases by July 3rd and have available information on the mobility trends. With this requirement, a total of 1,258 counties are included in the analysis providing a coverage of 87% of the total population in the United States. For mortality rate, we considered the fatalities within the same time frame across all the 1,258 counties as the transmission rate variable. The summary statistics of the dependent variable are presented in bottom row panel of Table 1.
Data Analysis
For the analysis of daily COVID-19 transmission rate, we have repeated measures of the variable (101 repetitions for each county). The traditional linear regression model is not appropriate to study data with multiple repeated observations22. Hence, we employ a linear mixed modeling approach that builds on the linear regression model while incorporating the influence of repeated observations from the same county. A brief description of the linear mixed model is provided in the Supplementary Material. For modeling the COVID 19 mortality rate, we rely on simple linear regression approach as the dependent variable here is the total number of COVID-19 deaths per 100K population at a county level. Both models are estimated in SPSS.
Role of the Funding Source
There was no funding source for this study.
RESULTS
COVID-19 Transmission Rate Model Results
The estimation results for the linear mixed model are presented in Table 22.
Socio-demographics
In terms of female population, we find that higher proportion of females in the population has a positive impact on transmission rated. The result is in contrast to earlier studies that show women are less likely to be affected by COVID-19 transmission relative to men16. Among age and racial distribution proportions, we found that increased percentage of younger individuals (<18 years) and African-Americans is associated with more transmission(see earlier work for similar findings11,18). It has been suggested that African-Americans in general reside in densely populated low income neighborhoods with lower access to amenities and are employed in industries that requires more public exposure17. Educational attainment in a county also plays an important role in influencing the COVID-19 transmission. The counties with higher share of individuals with less than high school education are likely to report increased incidence of COVID-In terms of income, we find that higher median income counties have a higher incidence of COVID-19. The effect of income might appear counter-intuitive at first glance. However, it is possible that higher income individuals are more likely to get tested (even in the absence of symptoms) due to higher health insurance affordability. With respect to employment rate, counties with higher employment rate reflect more exposure and have a positive association with transmission. The percentage of people living in rural area offers a negative association with the daily COVID-19 incidence. This is intuitive as rural areas are sparsely populated and hence have more opportunity for social distancing thus lowering transmission rates.
Health indicators
With respect to health indicators, we tried several variables in the transmission rate model. Of these, two variables - number of people suffering from HIV and hepatitis C in a county offered significant impacts. We observe that counties with higher percentage of HIV and hepatitis C patients have an increased incidence of COVID-19 transmission. Individuals with these diseases have weaker immune systems and hence are more susceptible to COVID-19 transmission.
Mobility Trends
In terms of mobility trends, we tested two measures: daily average exposure and percentage of people staying at home. In considering these variables in the model, we recognize that exposure will have a lagged effect on transmission i.e. exposure to virus today is likely to manifest as a case in the next 5 to 14 days. In our analysis, we tested several lag combinations and selected the 10 day lag exposure as it offered the best fit. The exposure variable offers interesting results. Until April 25th, exposure variable does not have any impact on transmission. This trend strongly coincides with the lower exposure trends (see Figure 2). After April 25th, increased exposure is associated with higher transmission rates 10 days into the future (see Hamada and colleagues19 for similar findings). For the second measure, staying at home with 14 days lag, we find that daily transmission rates are affected as expected4,10. The reader would note that the two measures considered were not found to be correlated and thus were simultaneously considered in the model.
Health Care Infrastructure Attributes
From Table 2, we find that counties with more hospitals per capita are more likely to report higher COVID-19 transmission rate. This result perhaps accounts for the higher availability of COVID-19 testing. The last set of variables within this category corresponds to COVID-19 testing effects. Again, we select a 5 day lag as testing results are likely to be reported in 3-5 days. The coefficient of this variable is positive as expected and highly significant4. However, after May10th, the effect has a lower magnitude, which suggests that compared to the previous time period (before May 10th), higher testing rate will increase the daily COVID-19 transmission at a marginally lower rate.
Temporal factors
With data available for 101 days, we can evaluate the effect of the transmission rate in previous days on the current day. As expected, we find a positive association between the daily COVID-19 transmission rate and the number of cases 7 and 14 days prior. The result suggests higher transmission rate in previous time periods (7 and 14 days earlier) is likely to result in increased transmission. However, the effect is higher for the 7 day lagged variable, as evidenced by the higher magnitude associated with the corresponding time period in Table 2. Further, the 7 day lagged transmission rate after June 21st implies a higher impact perhaps explaining the sudden surge in COVID-19 cases in recent weeks. Finally, the weekend variable highlights that the COVID-19 transmission rate is lower during weekends possibly because of reduced testing rate on weekends23.
Correlation
As indicated earlier, we developed the mixed linear model for estimating the daily COVID-19 transmission rate per 100,000 people while incorporating the dependencies across each county for various repetition levels (such as week and month). We found that the model accommodating weekly correlations provided the best result in terms of statistical data fit and variable interpretation. The final set of variables in Table 2 corresponds to the correlation parameter across every 7 days within a county.
COVID-19 Mortality Rate
The coefficients in Table 3 represent the effect of different independent variables on COVID-19 mortality rate at a county level.
Socio-demographics
A higher percentage of older people in a county leads to an increased COVID-19 mortality rate (see 14,18 for similar findings). Further, consistent with previous research17, the current analysis also found that the percentage of African-Americans is positively associated with COVID-19 mortality rate. The variable specific to education attainment indicates that the likelihood of COVID-19 mortality increases with increasing share of people with less than high school education in a county. We find that counties with higher income inequality are more likely to experience higher number of COVID-19 deaths per capita relative to the counties with lower income disparities24. Finally, higher employment rate has a positive association with COVID-19 mortality rate.
Health Indicators
Several variables significantly influence the COVID-19 mortality rate in a county. For instance, in comparison to other counties, counties with higher number of HIV, cancer, asthma and cardiovascular patients are more likely to have higher number of COVID-19 deaths. This is expected25–28 as people with such conditions usually have weaker immune system which makes them vulnerable to the disease.
Health Care Infrastructure Attributes
The number of ICU beds per capita at a county is found to have a negative impact on COVID-19 mortality rate suggesting a reduced death rate with higher number of ICU bed per person in a county. The result is intuitive as more ICU bed per capita indicates the county is well equipped to handle higher patient demand and treatment is accessible to more COVID-19 patients.
Policy Implications
To illustrate the applicability of the proposed COVD-19 transmission model, we conduct a scenario analysis exercise by imposing mobility restrictions. While earlier researchers explored the influence of mobility measures, these models did not account for county level factors such as socio-demographics, health indicators and hospital infrastructure attributes. In our framework, the sensitivity analysis is conducted while controlling for these factors. The hypothetical restrictions on mobility are considered through the following changes to two variables:
county level average daily exposure reduced by 10%, 25% and 50%
county level percentage of stay at home population increased to 40%, 50% and 60%. The changes to the independent variables were used to predict the dependent variable.
Subsequently, the variable was converted to the daily cases per 100 thousand people. The results from this exercise are presented in Table 4. We present the average change in cases for all counties (1,258), and for the 25 counties with the highest overall transmission rates. From Table 4, two important observations can be made. First, changes to average daily exposure and stay at home population influence COVID-19 transmission significantly. In fact, by increasing stay at home population share to 50% the model predicts a reduction of the number of cases by about 30%.
Second, the benefit from mobility restrictions is slightly higher for the 25 counties with higher overall cases. The two observations provide evidence that issuing lockdown orders in counties with a recent surge is a potential mitigation measure to curb future transmission.
The COVID-19 total mortality rate model can be employed to identify vulnerable counties that need to be prioritized for vaccination programs (when available). While prioritizing the counties based on mortality rate might be a potential approach, it might not always be feasible. To elaborate, vaccination programs have to be planned well in advance (say 2 months) of the vaccine availability. As total mortality rates for 2 months into the future are unavailable, we need a model to predict total mortality into the future. The estimated mortality rate model provides a framework for such analysis. To be sure, it would be prudent to update the proposed model with the latest data to develop a more accurate prediction system.
DISCUSSION
The current study develops a comprehensive framework for examining COVID-19 transmission and fatality rates in the United States at a county level including an exhaustive set of independent variables: socio-demographics, health indicators, mobility trends and health care infrastructure attributes. In our analysis, we consider all counties with total number of cases greater than 100 on July 3rd and analyze daily cases data from March 25th to July 3rd, 2020. The COVID-19 transmission rate is modeled at a daily basis using a linear mixed method while the total mortality rate is analyzed adopting a linear regression approach.
Several county level factors including proportion of African-Americans, income inequality, health indicators associated with Asthma, Cancer, HIV and heart disease, percentage of stay at home individuals, testing infrastructure and Intensive Care Unit capacity impact transmission and/or mortality rates. The results clearly support our hypothesis of considering a universal set of factors in analyzing the COVID-19 data. Further we conducted policy scenario analysis to evaluate the influence of social distancing on the COVID-19 transmission rate. The results highlight the effectiveness of social distancing in mitigating the virus transmission. In fact, we found that by increasing stay at home population share to 50% the model predicts a reduction of the number of cases by about 30%. The finding provides evidence that issuing lockdown orders in counties with a recent surge is a potential mitigation measure to curb future transmission.
To be sure, the study is not without limitations. The study is focused on county level analysis and is intended to reflect associations as opposed to causation. For such causation based analysis, data from individuals would be more suitable. While exposure data were reasonably addressed, data was not available for mask wearing behavior across all counties. Finally, the data on transmission and mortality are updated for few counties to correct for errors or omissions. These were carefully considered in our data preparation. However, it is possible that further updates might be made after we finished our analysis.
Data Availability
All the data regarding the COVID cases are collected from Center for Systems Science and Engineering (CSSE) Coronavirus Resource Center at Johns Hopkins University. For independent varaibles, The socio-demographic variables are collected from the American Community Survey; The health indicator variables are collected from the Centers for Disease Control and Prevention (CDC) systems; information on social distancing is collected from Safegraph data; Information about the hospitals and ICU beds are gathered from the County level health ranking data; COVID-19 testing measures are sourced from the COVID-19 tracking project
https://coronavirus.jhu.edu/map.html
Contributors
NE conceptualized the study. TB and NE finalized the study design. TB, SD and NC conducted the literature review. TB, SD and NC collected the data. TB, SD, NC, and NE analyzed and interpreted the model results. TB, NC and SD prepared the figures. TB, SD, NC and NE drafted the main manuscript. All authors reviewed the results and approved the final version of the manuscript.
Declaration of Interests
We declare no competing interests.
Acknowledgement
The authors would like to gratefully acknowledge SafeGraph COVID-19 Data Consortium, County Health Ranking and Road Maps, Centers for Disease Control System for providing access to the data at county level for United States.
Footnotes
Tel: 407-543-7521; Email: sudiptadeytirtha2018{at}knights.ucf.edu
Tel: 321-295-2134; Email: naveen.chandra{at}knights.ucf.edu
Tel: 407-823-4815; Fax: 407-823-3315; Email: naveen.eluru{at}ucf.edu
↵1 SafeGraph is a data company that aggregates anonymized location data from numerous applications in order to provide insights about physical places. To enhance privacy, SafeGraph excludes census block group information if fewer than five devices visited an establishment in a month from a given census block group
↵2 As discussed earlier, we also developed the same mixed linear model to estimate the 7-day moving average of COVID-19 cases per capita and find similar results as in the daily COVID-19 transmission model (results are available upon request from the authors). This further reinforces the stability of the transmission model.