## Abstract

We estimate the U.S. temperature response curve for COVID-19 and show transmission is highly sensitive to temperature variation despite summer outbreaks widely assumed to show otherwise. By largely replacing the daily death counts states initially reported with counts based on death certificate date, we build a week-ahead statistical forecasting model that explains most of the daily variation (R^{2} = 0.97) and isolates COVID-19’s temperature response profile (p < 0.001). These counts normalized at 31°C (U.S. mid-summer average) scale up to nearly 160% at 5°C. Positive cases are more temperature sensitive, scaling up to almost 400% between 31°C and 5°C. Dynamic feedback amplifies these effects, suggesting that cooling temperatures are likely to the substantially increase COVID-19 transmission.

**Article Summary Line** COVID-19’s temperature response profile is reliably estimated using re-assembled state-reported data and suggests the onset of cold weather will amplify its spread.

## Introduction

The question of whether COVID-19 exhibits a pronounced temperature response profile has garnered attention since the early days of the pandemic *(1,2,3)*. Kissler et al.’s examination of medium and long-term management of the pandemic assigns a prominent role to understanding how the temperature response profile (hereafter “TRP”) of COVID-19 might influence the pandemic’s progression in the United States *(3,4)*. This follows conventional wisdom regarding the strong seasonal pattern of influenza *(5,6)*, which helped mask COVID-19’s early U.S. ascent *(7)*. The recent rise in positive COVID-19 cases and related deaths across the United States has called into question whether COVID-19 transmission is adversely sensitive to summer weather. Nevertheless, some modeling groups such as the University of Washington’s Institute for Health Metrics (IHME) have assumed COVID-19 activity will increase as temperatures decline and are now using information from the U.S. influenza monitoring network *(8)*, which was previously shown predictive of the pandemic’s path last spring *(7)*, to help incorporate that effect.

Daily state reports often bear little resemblance to the number of COVID-19-related deaths actually occurring on that date. The resulting temporal data misalignment creates a substantial impediment to recovering any relationships with critical dependence on event timing. By reconstructing the death counts states report daily – largely by substituting in retroactive corrections based on death certificate dates – we are able to reliably estimate the TRP for COVID-19 deaths.

We assemble over 2,500 state-level daily observations from April 16–July 15, 2020, after COVID-19 became well-established across the United States. We follow the environmental economics literature on estimating pollution and temperature related impacts on a range of health and other outcomes (9). SARS-CoV-2 being a novel virus makes full implementation of the standard approach infeasible because multiple years of both spatially and temporally delineated data are not available. The specific issue we cannot currently resolve is whether the TRP over a given temperature range is the same in both directions, i.e. the cooler-to-warmer direction reflected in our data set and the warmer-to-cooler direction now occurring. It is also important to recognize that we estimate the joint, not separate, effect of any biological response by the virus and any behavioral response by the public.

### Prior Efforts

Early attempts to pin down COVID-19’s temperature response profile (TRP) proved it to be elusive. The now well-accepted approach for estimating influenza’s TRP is epitomized by Barreca and Shimshack *(6)*, which draws heavily on the modelling of climate impacts on human populations *(9,10,11)*. Under this approach a panel data set of political entities, such as countries or their political subregions (states, counties, etc.), is assembled and the outcome of interest observed across a long time horizon (e.g., a 20-year period). The ability to employ fixed-effect indicator variables to correct for time-invariant differences between political jurisdictions, coupled with the use of short-run weather variability, provides statistical identification of a variety of impact response functions.

Prior research has three important limitations. First, cross-sectional data cannot statistically identify the desired function without making the implausible assumption that all possible confounding variables are adequately controlled for. Routinely updated time-series models slowly incorporate environmental conditions into their forecasts without ever isolating it. Short panel datasets, where the stimulus of interest has limited range (temperature, humidity, UV light, air pollution in each location), often lack the statistical power to pin down such response functions. Consequently, estimated response functions are often fragile in the sense that statistical significance is lost when time trends or demographic variables are added to models *(12,13,14,15)*.

Second, early work (often using Chinese or cross-country data) focused on predicting speed at which the pandemic ramped up in different locations, using variants of derivative statistics like growth rates or R0*(16,17)*. These works point the 0-10°C temperature range as being most conducive to spreading COVID-19, with a possible humidity effect *(17,18,19)*. Current interest is now focused on situations where COVID-19 is spatially well-seeded and its effective R0 can move up or down with actions like state reopening plans.

Third, the quality and comparability of reported COVID-19 statistics is often suspect, particularly from the early phase of the pandemic. We move past this period to an observational window where reporting largely stabilized. A different problem now dominates temporal mismatches between when an event (e.g., a COVID-19-related death) was reported and the weather variables potentially influencing that event *(11)*. When reported event dates significantly differ from actual event dates, the resulting measurement error can overwhelm standard sources of biological variation such as individual differences in incubation periods.

### Correcting State-Level COVID-19 Statistics And Why It Matters

Our ability to isolate the TRP for COVID-19 related deaths stems largely from our reconstruction of state-level COVID-19 data. The most important of these is replacing the daily death counts initially reported by states with the revised daily counts based on actual death certificate date where possible. If death certificate data is not available, we use the retroactively corrected data series that several states have produced. When available, this type of data generally rectifies many initial reporting errors. We also correct other implausible data reports such as implicit negative daily death counts and zero counts on one day followed by a clear double-count the following day. The Appendix Data Preparation section provides details.

Figure 1 displays why repairing daily state-level COVID-19 death counts matters. Panel shows the originally reported (COVID Tracking Project [CTP]) CTPDailyDead_{it} daily death counts for (Florida) in blue with the “Revised” death counts by death certificate date overlaid in red. Revised curves follow the general shape predicted by epidemic models, whereas curves based on the originally reported counts have clear day-of-the-week patterns and large spikes, neither of which are predicted by biological models. Panel (B) shows the two implications. First, the confidence interval from fitting a simple quadratic trend model is dramatically larger for the CTPDailyDead_{it} (i.e., the Reported data) than the same model fit using the Revised. Second, the model based on the CTPDailyDead_{it} is slow to pick up the sharp rise in Florida deaths because the originally reported counts temporally lag the revised counts. Figure 1, Panels (C) and (D) display results for Georgia which show how use of CTPDailyDead_{it} can lead to both missing a downturn and an upturn, while supporting an incorrect story of slow progress. Appendix Figure A1 provides similar graphs for Arizona and Texas.

Figure 2(A) shows the week ahead forecast from the University of Washington’s Institute of Health Metrics and Evaluation (UW-IHME) plotted against our model’s DailyDead_{it} at the state level for the entire United States over the three-month period we examine. This model explains a reasonable amount of the variance, reflected in the R^{2} of .66 obtained by regressing the Revised on their forecasts. Substituting in the week-ahead forecasts from one of the other heavily used forecasting models results in a similar overall impression.

Figure 2(B) plots the predicted DailyDead_{it} using Eq. 1, which only uses information available a week earlier. This regression’s R^{2} of .97 is reflected in the very tight scatter plot. Since our model is relatively simple, the main source of the improvement in the percent of the variance explained is building the model off the temporally-aligned death counts provided by states post-hoc.

### Modelling Approach

Our objective is to isolate COVID-19’s TRPs with respect to state-level daily death counts (DailyDead_{it}) and new positive test counts (NewPositives_{it}). Any TRP specification should allow for the possibility of non-responsiveness (i.e., zero temperature dependence). TRPs should not incorporate fixed factors like state demographics, nor other factors associated with calendar date or a clear temporal profile. Rather, daily exogenous variation in temperature on any specific day should be the source of statistical information for identifying the TRP of interest. Intuitively, the TRP is being statistically identified by having days where the lagged count of the COVID-statistic of interest is approximately the same magnitude but a range of different temperatures is observed. The number of such comparable days is substantially increased by the introduction of controls for conditions that remain fixed across states and a flexible time trend. Statistical modeling issues revolve around functional forms and the specification of the relevant lag structure. The slow-moving, systematic changes in temperature over time imply that, over short time horizons, temperature will not be the driving force behind DailyDead_{it} and NewPositives_{it}. However, over longer time horizons, a virus’s TRP can be a major factor.

We index DailyDead_{it} and NewPositives_{it} by state (*i*) and day (*t*). For each we build a simple week-ahead forecasting model, controlling for state-level fixed effects and including a quadratic time trend. We then examine how lagged (t-k) maximum daily temperature (MaxTemp_{it-k}) scales the model’s predicted DailyDead_{it} or NewPositives_{it}. Attention is restricted to conditions where MaxTemp_{it} ≥ 5°C, since data below this level is sparse during our sample period and concentrated in a few sparsely populated states like Alaska and Montana. Empirical estimates of our TRPs are normalized to 100 at 31°C (∼88°F) – the U.S. population-weighted average for the last week of our sample – to aid interpretability.

The pandemic modelling community has largely concentrated on DailyDead_{it}, believing it to be a more reliable indicator of the COVID-19 infection pool than NewPositives_{it} due to the large differences in testing regimes across states and time. We proceed in a similar manner, but also produce the TRP for NewPositives_{it} (conditioning on available testing information) since it is positive cases that are potentially directly influenced by temperature.

### Model Components

Our base DailyDead_{it} statistical model (Eq. 1) is comprised of two multiplicative components. The first produces expected current-period deaths as a function of past observed deaths at a fixed temperature. The second allows expected DailyDead_{it} to (potentially) vary with past values of MaxTemp_{it-k}.

The terms inside the first component are the infection pool proxy, DailyDead_{it-7}, state-level fixed effect indicators, StateIndicator_{i} and a quadratic time trend in Days_{t} (t=1, …, 137; initialized to March 1 to aid interpretability). The StateIndicator_{i} captures the influence of a wide range of variables which remain constant over the period examined, such as demographic composition, geographic links between locales that influence infections, and public health infrastructure. The time trend variables pick up the decline in the case fatality rate and the average effect over time of initial lockdowns, social distancing, and state re-openings. The first component is exponentiated to incorporate the restriction that expected DailyDead_{it} should not be negative if COVID-19 transmission is active anywhere in the set of connected units examined. Commensurately, we use LogDailyDead_{it-k} and LogDays_{t} as regressors, so this component can be interpreted as a log-log regression model with state-level fixed effects.

The second component is a logistic function scaling predicted DailyDead_{it} up or down with MaxTemp_{it-k}. Deaths on specific days are the result of infections propagated over earlier days. We use LogMaxTemp_{it-7} and LogMaxTemp_{it-14} (one and two weeks in the past respectively) to roughly encompass the relevant period for temperature influencing current period deaths. Our panel data model specification uses an additive error term necessitates using nonlinear least squares to solve the model (*20,21*), but decouples conditional mean estimates from the estimated error component.

Formally, our base model for U.S. state *i* on day *t* is given by:
where estimated coefficients for the StateIndicator_{i}, and the Greek letter parameters minimize the sum of the squared error term of the estimated ε_{it}. The NewPositives_{it} model is similarly structured.

We investigate (Appendix Table A6) the sensitivity of our results to a range of alternative specifications (e.g., different infection pool indicators, different temperature scaling functions, adding absolute humidity, relative humidity, ultraviolet radiation, inclusion of shelter-in-place/reopening orders, lagged cumulative death counts, and use of the CTP death counts) and provide further discussion of modelling issues in the Appendix Modeling Approach section.

## Data

Our analysis uses three main types of data:

COVID-19 statistics for state-level death counts, positive cases, and tests, using The COVID Tracking Project (CTP; covidtracking.com) as our base information source.

Temperature data from the U.S. National Weather Service Integrated Surface Database.

State-level indicators and time variables.

We undertook extensive repair of the COVID-19 data reported daily by states, particularly those involving death counts. A dominant feature of these data are the substantial lags between when many of these events officially occurred and when they are reported. It is not uncommon to see states include deaths from several weeks prior in any given day’s count.

Over half of the U.S. states have made the number of COVID-related deaths publicly available by their death certificate dates in some manner (Appendix, Data Preparation). A simple OLS regression of death counts by death certificate date on originally reported death counts for many of these states yields an R^{2} of less than 0.5. Other states have made corrections to originally reported COVID-19 statistics. These updated counts tend to retroactively correct a myriad of reporting errors by states contained in the CTP daily data snapshot. These issues include failing to report any information on specific days, decisions on “probable” COVID-related deaths, and the resolution of duplicated death certificates. We use states’ self-corrected counts when available.

When neither of these two sources of information was available, we undertook a consistent set of data repair operations. These include: averaging across days with no reporting, prorating backwards in time large batches of deaths (and other statistics) known to have occurred over a much longer period (typically from nursing homes), and making the minimum set of changes necessary to correct logical violations such as negative counts of new deaths, positive cases, and tests. We make this dataset available for use by other researchers, along with a line-by-line account of the corrections made.

For each state, weather variables are taken from the airport with the highest volume of commercial traffic. We focus on maximum daily temperature. Mean temperature results are quite similar. State-level aggregation requires taking weather data from a single station. Measurement error induced by this compromise is likely to be less than one might think. Many states are small spatially or have a single, concentrated metropolitan area (e.g., Illinois). In spatially large states with large populations, most of the population often lives within reasonable proximity to the largest airport. Even in Texas, most people live along the corridor between Dallas (DFW is our representative airport) and Houston. As a result, over 60% of the American population lives within 300km of the representative airport for their state. Similarly, positive cases are also concentrated in major metropolitan areas near those airports. More generally, a single source of classical measurement error attenuates parameter estimates toward finding no effect.

### Daily Dead Model Results

Eq. 1 is estimated using non-linear least squares and has an R^{2} of 0.97. All parameter estimates (provided in Appendix Table A1) using robust standard errors clustered at the state-level are significant. Figure 2(B) displays the Revised DailyDead_{it} versus the model’s in-sample predicted values. States where the actual death certificate dates were available have considerably smaller prediction errors (Model 2 in Appendix Table A6).

The model’s quadratic time trend suggests DailyDead_{it} has fallen over time; at a declining rate from mid-April through the end of May, and then almost flat (slowly rising). LogDailyDead_{it-7} is the dominant predictor and its coefficient estimate of 0.8686 (t=35.32) has a standard elasticity interpretation.

Figure 3 shows the estimated TRP implied by the parameter estimates for the two MaxTemp_{it} lags in the logistic scaling function. The vertical axis represents expected DailyDead_{it} at each temperature value when both lags are set to the same temperature and the TRP is normalized to 100 at 31°C. (Appendix Figure A9 show the corresponding TRP using the original CTP death counts.) Appendix Figure A2 provides the contour plot which allows the two MaxTemp_{it-k} lags values to vary independently.

Differentiating Eq. 1 with respect to DailyDead_{it-7} produces a measure of how the expected DailyDead_{it} increases from a one death increase in DailyDead_{it-7}. Setting *t* and MaxTemp_{it-k} to chosen values yields [TRP*β*EXP(StateBase_{it})]/(DailyDead_{it-7})^{(1-β)}, where the TRP directly scales the elasticity parameter, β, on LogDailyDead_{it-7}, and StateBase_{it} is the temporally varying sum of the fixed StateIndicator_{i} and the quadratic time trend. As an example, for Georgia on July 15 (StateBase_{it} = 3.9184, MaxTemp_{it-7} = 29.4, MaxTemp_{it-14} = 27.8 and the corresponding non-normalized TRP = 0.0368), changing DailyDead_{it-7} from 27 to 28 is predicted to increase the expected DailyDead_{it} by 1.0457. Appendix Figure A3(B) displays a variant of this calculation for individual states as DailyDead_{it-7} increases from 0 to 1.

We simulate DailyDead_{it} predictions from both static and dynamic variants of our estimated base model (Eq 1) by taking DailyDead_{it-7} from our sample period’s last week and MaxTemp_{it} = 31°C as the initial values to propagate the simulation. We fix MaxTemp_{it-7} and MaxTemp_{it-14} at 31°C from day 1 to day 45 to mimic the rest of the summer period, then progressively decrease them by 0.2°C each day until 5°C is hit, which occurs on day 175, after which temperature is held constant.

The brown dashed line in Figure 4 provides a stylized static [left vertical axis] representation of information contained in our base model (Eq. 1) using Georgia as an example (since it starts out at just below our 31C° normalization point and in a cold year hits our 5C° endpoint). The two MaxTemp_{it-k} change in tandem, with all other variables fixed at their initial values. In this static response model expected DailyDead_{it} increases through only one channel: the direct impact of lowering temperature.

The blue dashed curve in Figure 4 shows the dynamic counterpart [right vertical axis] that allows MaxTemp_{it-k} to influence DailyDead_{it}, which is then used as the lagged model input for subsequent projections. Temperature affects expected DailyDead_{it} through two channels: the direct effect shown in the static model and the indirect, compounding effect of when the initial death count of each cycle of the simulation is set to a lag of the previous output. The indirect effect of maximum temperature is the dominant mechanism, accounting for more than 90% of the increase in the expected death count as temperature falls from 31°C to 5°C. The inset in Figure 4 plots the two curves under the same scale and conveys the magnitude of the differences between the pure static and dynamic responses.

Neither model is likely to represent reality, but together may span it. Staying on the static TRP curve requires continual reductions in effective contact rates that exactly offset the increase in transmission potential, such that the current death count is always held to be equal to last week’s death count. Observing the full dynamic effect would require the absence of offsetting actions by both the government and the public (such as increased social distancing and face mask adherence as COVID-19 activity quickly increased).

StateIndicator_{i} values for some geographically isolated states with small populations like Hawaii are small enough to suggest their COVID-19 death counts would be unsustainable in warm enough weather. This is not true of most states, though, and is inconsistent with Baker et al.’s finding from examining earlier emergent viruses that warming weather is not enough by itself to stop their spread *(22)*.

The Appendix section on Alternative Specifications for Base Death Count Model describes additional analyses that (a) look at alternative temperature scaling functions, (b) substitute DailyDead_{it-14} or NewPositives_{it-7} for DailyDead_{it-7}, and (c) add the dates of state actions such as shelter-in-place orders as control variables. This work shows that our finding of a strong TRP for DailyDead_{it} is quite robust. Some specifications suggest the TRP is flatter in the 10-20°C range but steeper in the 5-10°C range (Appendix Figure A8).

### New Positive Case Model Results

NewPositives_{it} are modeled similarly to Eq. 1, substituting LogNewPositives_{it-7} as the infection pool regressor. Testing information is required to interpret reported state-level positives cases. We use LogNewTests_{it}, to control for current testing intensity, LogNewTest_{it-7} (which allows for a past positivity rate interpretation), total tests administered per thousand lagged by one week (PerCapitaTests_{it-7}) to help control for prior testing intensity, and an indicator variable for systematically lower reporting on Monday. We chose LogMaxTemp_{it-7} for consistency with the death count model. For the other temperature lag the 2^{nd} lag fits best.

This model’s (Appendix Table A2) R^{2} is 0.95. All regressors are significant at p < 0.001, except for some of the test related variables: LogNewTest_{it-7} (p=0.005), PerCapitaTests_{it-7} (p=0.002) and Monday (p=0.015). Estimated parameters for the two lagged temperature variables are considerably larger than their DailyDead_{it} counterparts. The implied TRP for NewPositives_{it} is displayed in Figure 5. Appendix Figure A(4) shows contour plot allowing the two temperature lags to vary independently.

## Discussion

DailyDead_{it} predictably varies with changes in maximum daily temperature (Figure 3). This relationship is considerably more pronounced for new positive cases (Figure 5). Cooler temperatures with the progression of fall and winter will dramatically ramp up the number of new positive cases and the deaths that follow unless current infection pools are dramatically reduced. Dynamic feedback between rising infection pool indicators and cooling temperatures (Figure 4) suggests delay in responding to increased virus activity signs can result in rapid escalation. This is already being seen in the current spatial pattern of outbreaks and the ever-increasing medium-run death count forecasts (e.g., UW-IHME *(8)*). Investment in providing the pandemic modeling community with timely counts based on death certificate dates would allow them to deliver substantially more accurate and timely warnings of impending upturns.

Warming temperatures during the spring and summer actively aided efforts to reduce COVID-19’s spread and may have even contributed to a false sense of the efficacy of those efforts. Cooling temperatures will present a quite different challenge. Figure 6 shows the average date over the past 30-years when each U.S. county enters the particularly dangerous 10°C to 5°C. Effects of temperature amplification show up with a lag.

There has long been fear of facing COVID-19 during influenza’s October to March season *(23)*. Our results suggest, that in addition to this concern, COVID-19 transmission will become increasingly more efficient at transmission than it was during the summer. Further, if deaths and positive case TRPs behave like influenza *(6)*, both will continue to increase from 5°C until a few degrees below freezing.

## Data Availability

All data and code from the paper will be archived at github before publication.

## Funding

The University of California, San Diego provided partial financial support for the work reported on in this paper. No outside support was received.

## Competing interests

The authors declare no competing interests. Contributions to this paper by S.L. Carson and T.K. Dye were done outside of their usual employment.

## Author contributions

**Richard T. Carson:** Conceptualization, Methodology, Software, Validation, Formal Analysis, Investigation, Data Curation, Writing – Original Draft, Visualization, Supervision, Project Administration.

**Samuel L. Carson:** Investigation, Data Curation, Writing – Review and Editing. **Thayne Keegan Dye:** Software, Investigation, Data Curation, Writing – Review and Editing, Visualization.

**Samuel A. Mayfield:** Software, Investigation, Data Curation, Writing – Review and Editing, Visualization.

**Daniel C. Moyer:** Methodology, Investigation, Writing – Review and Editing. **Chu A. (Alex) Yu:** Methodology, Formal Analysis, Investigation, Writing – Review and Editing, Visualization.

**Corresponding author biographical sketch:** Dr. Richard Carson is a Distinguished Professor of Economics at UC San Diego. His research interests focus on assessing the role of climate change and other environmental variables on a range of impacts including health outcomes.

## Appendix

### Data Preparation

The data set used in this paper starts with the COVID-19 statistics reported by individual states as aggregated daily by The COVID Tracking Project (covidtracking.com). In preliminary work, we (and other modelling groups) found there was no plausible epidemiological model that would produce the wild up-and-down jumps in the state-level COVID-19 statistics reported daily. As a result, there is a major fork in the path of any modeling effort. Do you build a predictive model for the reported COVID-19 daily death count that policymakers see, where success is judged by minimizing the forecast error around those observable quantities? Or do you attempt to model the underlying process with an eye toward understanding key aspects of it? When the reported dependent variable systematically diverges from that being generated by the underlying process, these two approaches also fundamentally diverge. For the first path, success, to some degree, comes in uncovering the administrative procedures influencing the divergence. The second path led us to undertake a major effort to rectify and repair reported COVID-19 statistics, with a focus on recovering the temporal structure needed to provide TRP estimates.

The most important correction was replacing originally reported death counts with death counts by date of death as reflected on death certificates. Figures 1 and A1 illustrate the nature of the differences for between the original CTPDailyDead_{it} and DailyDead_{it} for four important states: Arizona, Florida, Georgia and Texas. The originally reported data results in substantially larger confidence intervals in predictive models and, because it lags the revised death counts, substantially reduces one’s ability to detect and respond to changes in COVID-19 activity. We were able to make this correction for 26 states where deaths by death certificate information could be located. Our cutoff date of July 15 allowed for two and half months for states to make these death counts available, which should subsume most of the COVID-19 deaths in these states over our sample period. The states where we have been unable to obtain deaths by death certificate dates tend to be smaller and to have had relatively fewer COVID deaths. Typically, they did not face large peaks, which appear to be associated with increased testing delays. However, we have been unable to obtain this data from three large states – California, Illinois, and New York – and of the 20 observations (out of 4,567) with residuals from Eq. 1 whose absolute value is 50 or more, over half are from these three states.

The methods for collecting data backed by death certificates varied based on how the individual states chose to present them. Some states publish their counts based on death certificates in a downloadable format on their official COVID-19 website. Others publish them inside of longer reports in such a way that they can be hard to find, or present tables or graphs that need to be extracted by hand into spreadsheets. Because there are no official rules for how to publish this information, even the charts themselves vary in structure and presentation. Commonality across states is largely dependent on the specific software vendor they are using for their public-facing COVID-19 dashboards. Data collection procedures sometimes required hunting through source code to find full datasets within, or by hovering cursors over each bar in a bar chart individually over the collection window to make them display the precise value represented. In each case we were able to verify via labeling or official statements that the data was compiled using verified death certificate dates. It’s worth noting here that, while the CDC collects deaths by death certificate date (https://www.cdc.gov/nchs/nvss/vsrr/COVID19/index.htm),they only do so at the weekly level making it unusable for the sort of modeling effort we have undertaken and, relative to the data we have obtained from individual states, suffers from even longer reporting lags.

For states where deaths by death certificate date were not available, we first sought to determine if an individual state, either on their COVID-19 reporting “dashboard” or in a downloadable file, had deaths by reported date. These datasets often contain substantial corrections to what a state originally reported on a specific day. These include reports that missed covidtracking.com’s daily reporting deadline, more accurate end-of-day tallies (e.g., late reporting counties/hospitals), the correction of testing dumps that resulted in the appearance of spikes on certain days, the removal of duplicate death certificates, and the resolution of probable cases. When these differed from the COVID-19 death counts originally reported (COVIDTracking.com has a set of “snapshots” of the originally reported information) we used the updated state data. In some smaller states we believe, but have been unable to fully verify, that these corrected datasets are “close” to deaths by death certificate date.

Two of these types of corrections turn out to be particularly important. First, when a state misses a reporting deadline, this is typically recorded as there having been no COVID-19 events (new deaths positives, or tests) on that date. This causes the next day to contain the events that happened during the previous day. Second, “probable” cases have often been treated differently across states and time. State-level resolution of this issue removes an extraneous source of variation. As with the data on death certificate dates, these corrections by states of their originally reported COVID-19 statistics were obtained through a variety of sources, ranging from reading counts off interactive bar charts to downloadable CSV files.

Next, we corrected two obvious problems with the data. First, occasional, exceptionally large spikes accompanied by auxiliary information (e.g., a news/twitter release by the state’s department of health) noting that this spike was due to an accumulation of deaths over an extended time period, typically from one or more congregate living facilities, e.g., nursing homes and prisons. For these, the initial correction involved proportionately increasing death counts over the relevant period, with days where this would result in negative death counts not included in the reallocation. Second, we corrected reports of zero deaths on days surrounded by death counts that were sufficiently large that a zero-death count was highly unlikely. Such days are generally also characterized by a failure to report some other COVID-19 statistics (like new positive cases) and by an abnormally high death count on the following day (or two if it is a weekend). Our approach for these was to assume no reporting followed by backlogged reporting so that the indicated correction is to average counts across the two and, in some instances, three days.

We also corrected data in situations where a state had “corrected” an earlier report, but where an entire rectified data series from the state was not available. A typical example is a state that initially reported all deaths except for those in the state’s largest county, with the corrected version containing the death count for the whole state contained in a press/twitter release. A similar situation was when the data failed a logical consistency check by having the difference between the reported cumulative death counts on two consecutive days generate a negative daily death count. This typically occurs in states with small populations and few COVID-19 deaths, who, without comment, reduce their cumulative death count by one or two. Here we “rollback” that correction to the closest date that no longer produces a negative daily death count.

Similar corrections have been made to daily positive case counts and new daily test counts. The major difference is that very few states have made available positive case counts by day of test administration (rather than the day the test result is reported). This means that few states publish datasets that can readily subsume the positive case count data from The COVID Tracking project.

This is likely due to these types of information having different reporting standards. Specifically, a state eventually knows the recorded date of COVID-related death, but this information is subject to reporting delays due to waiting for confirming test results or autopsies. Because the date of death is on the death certificate, obtaining official death counts for all states is eventually feasible. However, while the lab doing the test knows the date of test administration, this information is often not shared with the state. States could require more complete reporting, but it is unclear whether the past is capturable. Total test counts are even messier due to the common practice – particularly in the earlier part of our sample period – of reporting all newly returned positive test results daily but reporting negative results inconsistently or in batches. Some negative test results (often by the state’s lab) were reported along with the positives, while other negatives (often those by private labs) were reported once a week. We performed rollbacks of antibody tests that, for a while, were mixed with diagnostic tests, but these are messy because information on the period over which antibody tests were included has rarely been disclosed. We have endeavored to average and prorate testing data where the nature of this practice could be reasonably inferred.

The influence of our data correction efforts can quickly be gleaned from Table A5, which describes four simple autoregressive models with a constant term and 7^{th} death count lag. The first uses the original “Reported” dataset (covidtracking.com) as both the source of the dependent variable, CTPDailyDead_{it} and the lagged death count. The second uses our corrected version as the “Revised” for the dependent variable and uses the originally Reported for the lagged regressor. The third uses CTPDailyDead_{it} as the dependent variable and the Revised for the lagged regressor. The fourth uses Revised for both. The parameter estimates for lagged deaths are similar in versions using the same lagged variable and substantively larger in the two versions using lagged revised counts. The R^{2} starts at 0.69 for the Reported/Reported model, stays roughly the same (0.68) if Revised are predicted from CTPDailyDead_{it-7}, and increases somewhat further to 0.74 for the Reported/Revised combination. The Revised/Revised combination, though, has an R^{2} of 0.91, which clearly illustrates that the large gain in explanatory power in the base Eq. (1) model in Table A1 comes mainly from our data repair and rectification effort. Note that the models in Table A5 do set the massive NJ (June 25) reported death count outlier of 1877 (Revised death count is 16) to missing, since it is so large many modeling groups have either dropped it or prorate it over earlier time periods. The R^{2} of this Reported/Reported model using these observations falls to 0.42.

Temporal misalignment of NewPositives_{it} has received more attention than DailyDead_{it} because of the often-large gap in time between when a diagnostic test is administered and when the result is returned *(24)*. Indeed, this gap, coupled with different state testing regimes, has led most modeling groups to concentrate on predicting DailyDead_{it}. We have made substantial repairs to NewPositives_{it} and NewTests_{it} data by reference to corrected state reports, and prorated rollbacks of initially reported antibody tests. We have been much less successful in locating information on NewPositives_{it} by date of test administration. However, temporal misalignment of the NewPositives_{it-k} is somewhat less important than for deaths than it might first appear because there is a shorter window for positive-to-positive transmission than the positive to death transition and because test results for hospital patients, the persistent high positivity pool, are typically returned quickly. Further, even very noisy testing information can be helpful in serving as controls for variation in state-level testing behavior over time.

#### Construction of Temperature, Humidity and Ultraviolet Radiation Data

Weather data for our main analysis are drawn from the National Centers for Environmental Information (NCEI) Integrated Surface Database (ISD), which report hourly temperature and humidity data for most airports in the world. For each state, weather variables are taken from the airport with the highest volume of commercial traffic, where the volume information is found in the Federal Aviation Administration’s 2018 Commercial Service Enplanements report. Our key variable of interest is daily maximum temperature (MaxTemp). We also look at measures of humidity and ultraviolet radiation. Hourly relative humidity is calculated as a function of hourly observed temperature and dewpoint temperature. Under the assumption of ideal gas behavior, we calculate hourly absolute humidity as well (details can be found at www.hatchability.com/Vaisala.pdf). We then pick the highest readings within each 24-hour period as daily MaxTemp, MaxRelativeHumidity and MaxAbsoluteHumidity. Minimum daily temperature is obtained by picking the lowest reading and the mean by averaging the hourly readings. Our measure of ultraviolet radiation is UV index, which provides a forecast of the expected risk of overexposure to UV radiation from the sun. UV index data at our representative airports was obtained from OpenWeather Ltd., which publishes daily UV index forecast calculated by the National Weather Service.

To calculate the expected date by U.S. county for entering the 5-10°C range (Figure 6), we use reanalysis data provided by PRISM Climate group (https://prism.oregonstate.edu/) which provide reliable weather data at a high spatial resolution of 4km by 4km for the contiguous U.S. We extract daily maximum temperature from 1990 to 2019 and count the average number of days it takes since Oct 1 for the daily maximum temperature to fall below 10°C.

#### Modelling Approach

There are two competing modelling approaches to predicting future COVID-19 deaths and positive cases. The first builds on a standard SEIR model; the other, a production function approach. The first has a strong epidemiological conceptualization and is clearly better in the early phase of a pandemic when data is scarce. The second is largely agnostic as to the underlying structure of transmission but requires dramatically more data to offset this flexibility. We follow the second, with (Eq. 1) using a simple production function approach. Conceptually, there is an infection pool with the seeds planted and various inputs (ranging from a state’s health care system to temperature) that influence the output: the number of deaths observed today. We primarily use past death counts as the infection pool proxy, take into account state-level fixed effects (amalgamating effects due to constant factors such as demographic characteristics, fixed resources like health care and transportation networks, and average differences in factors including climate variables and mobility), and a simple quadratic time trend. We exponentiate the right-hand-side variables to effectively impose the restriction that under the conditions we observe, expected death counts must be non-negative.

The use of a multiplicative scaling function incorporates the logic that temperature by itself cannot generate changes in DailyDead_{it}. Because those dying at *t* became infected not on one day but rather over an extended period, some means of representing temperature in this setting is required. The two options, due to the strong correlation between closely adjacent MaxTemp_{it-k}, are a distributed lag structure that imposes structure on a set of MaxTemp_{it-k}, or parameterizing the scaling function as the product of individual scaling functions, each with a different MaxTemp_{it-k}. We find two lags – the 7^{th} and 14^{th} – are sufficient and reasonably consistent with what is known about the biology of the virus and its methods of attack.

Model results are reasonably robust to small shifts in temperature lag positions, with the following two caveats. First, the 7^{th} lag is a natural one to use; many people’s lives follow a typical weekly pattern of contacts and activities, and administrative reporting procedures often have a day-of-the week pattern. Second, lags too far back are insignificant. Notably we use the 18th lag of MaxTemp_{it} in the weekly model rather than its naturally shifted counterpart LogMaxTemp_{it-21}.

The most commonly used scaling function for our purposes is the logistic function 1/(1 + EXP(X)^{ϒ}), where X is the variable of interest, and ϒ is the single estimated parameter. This function converges to a constant as ϒ goes to zero. Use of MaxTemp_{it} rather than LogMaxTemp_{it-k} as the stimulus variable provides a function with a different curvature. Statistically, it provides a similar fit, largely because the corresponding shifts in the estimate of ϒ makes the two scaling functions reasonably similar after normalization.

We also consider another commonly used scaling function, X/(X + Ψ), where an estimate of Ψ > 0 results in smaller values of X being scaled up more than large values of X and an estimate of Ψ < 0 results in larger values of X being scaled up more than smaller values of X, and an estimate of Ψ = 0 results in a function exhibiting no temperature responsiveness. This function can be used with either LogMaxTemp_{it-k} or MaxTemp_{it-k}, and both will approximate a reasonable range of weakly monotonic scaling functions. It is well behaved as long as the estimate of -Ψ is bounded away from MIN(X), which appears not to be an issue in the situations we examine. Table A4 provides estimates for a set of competing specifications and their TRPs are displayed in Figure A8.

Our first major empirical decision was to use the three-month period of April 16-July 15. This time window allows for the seeding of the virus (to different degrees) across the i=1, …, 51 U.S. states (including DC) over a three-month period. We effectively start tracking the observable outcomes, deaths, positive cases, and tests on April 9^{th}, because we generally use a one-week lag of the COVID-19 statistics of interest. Our time variable, t=1, …, 137 denoted in days starts with 1 on March 1^{st}, the approximate date individual states first started reporting COVID-19 statistics.

Our second major decision involved how to implement the core epidemiological concept of a pool of infected individuals who can potentially infect other individuals. This pool is dynamic; existing positives transition to being no longer infectious, while newly infected individuals enter the pool. The totality of the currently infected individuals is unobservable without universal administration at each point in time of a 100% accurate diagnostic test. The question, thus, is whether to use a lagged variant of the test diagnosed positives, NewPositives_{it}, which logically are part of the infection pool but not necessarily representative of it due to differential testing, or a lagged variant of DailyDead_{it}.

The lagged DailyDead_{it} measure is one step removed but was previously assumed to always be observed *(25,26)*. This assumption is demonstrably false. In the early run of the pandemic many deaths failed to be classified as COVID-19-related, in part because some instances were thought to be influenza-related, while COVID-19’s role in inducing cardiac and kidney failure was not widely recognized. We avoid many of these problems by only using DailyDead_{it} data generated after the pandemic was well established.

Other problems with using lagged DailyDead_{it} (or NewPositives_{it}) as the infection pool indicator (such as a state having a proportionately larger elderly or disadvantaged population) are readily addressed using the standard statistical approach of including time-invariant state-level fixed effects.

One way the use of a lagged version DailyDead_{it} vs. NewPositives_{it} will vary is with respect to timing. Current positives are temporally closer to future positives or deaths than current deaths and can potentially pick up the virus spreading among healthy young adults. The shorter link between a lagged NewPositives_{it} is enhanced by the short period over which a current positive can influence future COVID outcomes, which makes TRP estimation less sensitive to the choice of temperature lags.

Subject to the same measurement error, using NewPositives_{it-7} should be preferred to DailyDead_{it-7} as the infection pool indicator. These results are provided in Table A6 (Model 16). The main lesson is that the model using LogNewPositives_{it-7} is a good predictor of DailyDead_{it} (R^{2} = 0.94), but not as good as LogDailyDead_{it-7}. This suggests that either the positives have more measurement error than our corrected version of death counts or that the DailyDead_{it-7} is a better reflection of the current risk-adjusted infection pool than NewPositives_{it-7}. The other result worth noting is that with the infection pool indicator temporally advanced, the coefficient on LogMaxTemp_{it-14}, as expected, is no longer significant.

The parameter estimates for the base model (Table A1) can be used to provide an estimate of the minimum infection pool each state faces by setting DailyDead_{it-7} = 0 (from a statistical perspective), obtaining expected DailyDead_{it}, and dividing it by the CDC’s point estimate of the infection fatality rate (0.0065) *(27)*. Setting the value of Days_{t} equal to the end of the sample period, across states, the sum of these minimum infection pools is 9853 cases. Figure A3(A) provides a visual display of this information for the continental U.S. states.

#### Interpretation of State-level Fixed Effects and Time-Related Effects

We calculate a variant of the state-level fixed effects from our main regression model (Table A1) by setting the time trend equal to the last day of our sample period, the lagged number of deaths equal to zero, and calculating the expected number of deaths for each state. Small, isolated states like Hawaii generate estimates close to zero while the most populous states tend to generate estimates above 2. These estimates suggest the minimum infection pool in each state varies by a factor of approximately 20 (Figure A3(A)). A regression (Table A3) of the minimum expected death count on a small set of state-level demographics taken from the U.S. Census Bureau (LogPopulation, %Black, %Hispanic, and %Age80+) explains 79% of the variation in these counts.

We do not want our TRP estimates to be confounded by other factors changing over time (ranging from state and locally mandated shelter-in-place orders, reopening plans, endogenous social distancing, propensity to wear face masks and changing fatality rates). Since we are agnostic as to the mechanism, the straightforward specification is a polynomial time trend, where we find a quadratic term justified. Higher order terms add little insight or predictive power. The quadratic trend for predicting DailyDead_{it} falls sharply from mid-April to the end of May, after which it very slowly starts to turn up.

A model which adds (a) the number of days since a state issued a mandatory shelter-in-place order and (b) the number of days since a state started to formally reopen its economy can be found in Table A6 (Model 8). The coefficients on these two variables are small and insignificant. The two LogMaxTemp_{it-k} parameters fell on average by less than 1%.

This might seem puzzling until it is recognized that the two sets of time-related variables are highly correlated. Dropping the quadratic trend (Model 9 in Table A6) provides a different picture because, while still statistically significant at conventional levels, it was substantially diminished. The earlier a state shelter in-place order was issued, the lower the predicted death count (p = 0.028), and the earlier a state started to reopen the higher the predicted death count (p < 0.001). We are reluctant to provide any substantive interpretation to this result. Papers that have tried to determine the role of state actions and the behavior they are intended to influence show the need to (a) extensively model both state and local orders, and (b) incorporate spatially disaggregated mobility data in order to identify the effects of these government mandates from endogenous social distancing by the public that often occurs before these actions *(28,29,30)*. The two temperature coefficients in this model fall on average by less than 20% and remain highly significant.

#### Alternative Specifications for Base Death Count Model

Table A4 compares our base model to alternative specifications. These alternative specifications were chosen to look at the sensitivity of the implied TRP because they all have reasonably similar fits relative to our base model. This allows us to observe how robust the TRP is to a substantial shift in various modelling decisions that we made. The first specification replaces the LogMaxTemp_{it-k} with their linear counterparts. On the surface, this seems like a decision about which of two different scales fits better, but because the models have estimated parameters in the scale function, it may be possible for the two different specifications to provide reasonably similar TRPs. The second uses a popular ratio scaling function LogMaxTemp_{it-k}/(LogMaxTemp_{it-k} + α), where α is an estimated parameter. The third replaces LogMaxTemp_{it-k} with MaxTemp_{it-k} in this ratio scaling function. A potential issue with Eq. 1, is the possibility that LogMaxTemp_{it-14} directly influences our main infection pool indicator LogDailyDead_{it-7}. Our next specification replaces our infection pool indicator with an alternative, LogDailyDead_{it-14}, so both temperature variables are now clearly exogenous from the temporal perspective of the infection pool indicator.

The last is a weekly variant (Eq. 2) of the base model, where the dependent variable is the sum of the daily death count of the next seven days with corresponding pushbacks of the lagged variables. This model averages out much of the daily variation and many types of administrative reporting practices. In forecasting the pandemic’s progression, it has become common to use data aggregated to a weekly level in an effort to average out many of the administratively-induced reporting issues that our extensive reconstruction and repair of the daily death count data sought to alleviate. It is possible to estimate a variant of Eq. (1) that uses death count data aggregated into seven-day periods, WeeklyDead_{it} = Σ_{t} DailyDead_{it}, where the summation is over t=1 to t=7. This makes WeeklyDead_{it-7} the sum of the 7th through 13th lags of DailyDead_{it}. Importantly, this aggregation does not reduce the number of observations because on each day, the weekly aggregation at t=1 adds a new observation DailyDead_{i1} and drops DailyDead_{i8}. Lags of the WeeklyDead_{it} variable can then be used in the standard way. One implication of this specification is that the temperature variables also need to shift backwards. For conceptual consistency, we use LogMaxTemp_{it-14} in place of LogMaxTemp_{it-7}. Empirically, the model fits best with the second temperature lag being LogMaxTemp_{t-18}. The model fit is reasonably similar using the 13th lag and 19th lags, but beyond the 19th lag, the temperature variable becomes insignificant, suggesting temperature information farther back in time than this is not useful. The model we report is thus:
The overall impression from Table A4 is the general stability of most of the common parameters for time and the infection pool. Some of the specifications offer insights into the latter variable. It is little influenced by whether the logistic or ratio scaling function was used, nor whether LogMaxTemp_{it} or MaxTemp_{it} was the stimulus variable. Using LogDailyDead_{it-14} in place of LogDailyDead_{it-7} results in an estimated TRP being similar to the base model (Figure A6), although it does rise more sharply between 10°C and 5°C suggesting, while there may be a endogeneity effect, it is not large and that our base TRP is conservative. In this model (Table A6, Model 7), the coefficient on the infection pool indicator shifts from the .86 of the base model to .80; in the weekly version model it jumps to .92, The R^{2} for the weekly model increases to 0.99 (from the base model’s 0.97) and falls to 0.94 in the model using LogDailyDead_{it-14}, which is less informative than LogDailyDead_{it-7} in term of the infection pool influencing current death counts .

#### Constructing Temperature Response Profiles (TRPs)

To compare the TRP’s implied by the models in Table A4, we plot the functions using two independent random uniform variables RTemp and RTemp2, defined over the range 5°C and 40°C. The logistic scaling function with two temperature variables, LogMaxTemp_{it-7} and LogMaxTemp_{it-14}, and corresponding Eq. 1 estimated parameters ϒ1 and ϒ2 (Table A4), results in the following scaling function in the base model:
There is a fundamental indeterminacy in such a scaling function, in that multiplying the production function part of Eq. 1 by a constant will result in an offsetting change in the scaling function which maintains the same expected value for the dependent variable. Note that each of the two multiplicative components of Eq. 3 converges to .5 irrespective of temperature values as ϒ1 and ϒ2 become increasingly negative. Often logistic functions are normalized to lie between 0 and 1 by changing the “1” in the numerator to “2”, but this is not needed with our normalization to 31°C, which solves the indeterminacy from the perspective of comparing curves. This is done by calculating the value of the estimated scale function at 31°C:
Dividing Eq. 3 by Eq. 4 produces a function which equals 1 at 31°C. Multiplying this quantity by 100 produces a function which has a natural percentage interpretation and equals 100 at 31°C. Note that for forecasting purposes, the original scaling function parameters need to be used.

The ratio scaling function with estimated parameters α_{1} and α_{2} using RTemp and RTemp2 is:
As α_{1} and α_{2} converge to zero, both multiplicative components of SI.1c converge to 1 irrespective of temperature values. The value of this function at 31°C can be calculated in a manner similar to that described for the logistic. Dividing Eq. 5 with the value of that function at 31°C and multiplying by 100 produces the desired TRP.

Figure A8 graphs the daily death TRPs from the model specifications in Table A4. The alternative specifications produce TRPs remarkably similar to that of our base model and tend to bracket it. The one systematic difference is that TRPs based on the ratio scaling functions tend to be flatter over 10°C to 20°C but rise more sharply in the 5°C to 10°C range.

#### Further DailyDead_{it} Specifications Involving Weather, Positives and Cumulative Dead

The other weather variables which have received considerable attention are absolute humidity, relative humidity, and ultraviolet (UV) radiation. Details of construction are provided below in Data Preparation Section. The limitation with all these variables is that they are correlated with MaxTemp_{it} (i.e., absolute humidity: 0.58, relative humidity: -0.21, UV: 0.79). Maximum absolute humidity has a reasonable size effect (p < 0.001) in the model where it replaces the corresponding MaxTemp_{it} variables. However, the LogAbsoluteHumidity_{it-k} parameter estimates are close to zero and are no longer significant when the two parallel logistic functions comprising the temperature scaling function are added (Model 10 in Table A6). Relative humidity has a marginally significant relationship with DailyDead_{it} when MaxTemp_{it} is not in the model and its effect is close to zero in a model with MaxTemp_{it} (Model 11 in Table A6).

The situation with UV_{it} for which (15) found support is more complex (Table A4 &Table A6, Models 12 and 13). The MaxTemp_{it} lags are marginally better predictors in a head-to-head comparison. These two variables are strongly correlated. Adding UV_{it} marginally decreases prediction errors, but the signs of high multicollinearity are obvious; the first MaxTemp_{it} lag is still highly significant while the second is insignificant and one of the UV_{it} lags is only marginally significant (Table A4). It is effectively impossible to disentangle the influence of LogMaxTemp_{it} and UV_{it}. Their relationship is displayed in Figure A5, which plots LogMaxTemp_{it} and LogUV_{it} for two states over our sample period: Georgia (Atlanta) and New York (New York City). We cast our results in terms of MaxTemp_{it} rather than UV_{it} because it is more widely reported and understood, without making any claim our work supports a joint versus singular causal mechanism.

The specification in Eq. (1) is cast in terms of DailyDead_{it}. If MaxTemp_{it-k} influences COVID transmission via the link between positive cases, then we should be able to replace DailyDead_{it-7} with NewPositives_{it-7}. Estimates for this model (Table A4) show LogNewPositives_{it-7} being significant at p < 0.001 and the R^{2} measure falling a bit. The coefficient on MaxTemp_{it-14} is insignificant, which would be expected if NewPositives_{it-7} incorporates that information, while the coefficient on LogMaxTemp_{it-7} is larger than in the base specification.

A different aspect of the COVID-19 death statistics that has not been incorporated into the model is lagged cumulative death count, TotalDead_{it-k.} If DailyDead_{it-k} can be seen as a proxy for the infection pool influencing DailyDead_{it}, then TotalDead_{it-k} (normalized by population) is proxy for the fraction of the population that no longer at risk in the sense of being either removed by death or recovered. Adding LogPCTotalDead_{it-7} effectively makes the StateBase_{it} dynamic in a potentially different way than the quadratic time trend by letting each state evolve according to its own pattern of deaths. Table A6 (Models 14 and 15) displays the results of this model with LogPCTotalDead_{it-7} enter as (a) a second order polynomial and (b) a fourth order polynomial.

Three results are worth noting. First, the increase in explanatory power is small with the most noticeable changes being as expected in the StateIndicator_{i} and a substantial reduction in importance of the overall quadratic time trend. Second, the LogMaxTemp_{it-k} parameter estimates are similar to that of Eq. (1) suggesting that our TRP is robust to a substantial dynamic reparameterization of the model. Third, in the quadratic specification, DailyDead_{it} is declining with as LogPCTotalDead_{it-7} at a declining rate. In the fourth order specification, all the LogPCTotalDead_{it-7} terms are insignificant (although jointly significant). Figure A6 displays, starting at .05, the two response functions for LogPCTotalDead_{it-7}. Like the quadratic time trend, they suggest a sharp drop in how DailyDead_{it} is influence by DailyDead_{it-7} as LogPCTotalDead_{it-7} increases from low levels with the fourth order polynomial being flatter at high levels of LogPCTotalDead_{it-7} than the quadratic. The influence of this factor is reasonably small by the time a state hits 10 deaths per 100,000, a condition that characterizes 80% of the states at the end of our sample period. Earlier, we noted one interpretation of the quadratic time trend was that medical care (and hence death rates) had improved sharply at first and then at a declining rate. This specification has a similar interpretation but suggests that some of that learning is state specific and related to its prior COVID-19 caseload. There is no indication that this rate of decline is accelerating even in the hard-hit Northeastern states where deaths per 100,000 can be as high as 175 (NJ), looking at the fourth order polynomial which should allow this feature to emerge if the data supports it. This suggests the magnitude of the fraction of the population previously infection is still too small to be a substantial factor in slowing transmission of the virus.

#### Univariate DailyDead_{it} and MaxTemp_{it-7} Relationship

If we have succeeded in isolating the TRP through use of the set of StateIndicator_{i} and a quadratic time trend, the simple regression of DailyDead_{it} on lagged MaxTemp_{it-7} without the state fixed effects and quadratic time trend should reveal a substantially different curve than our estimated TRP displayed in Figure 4. Figure A7 displays this relationship using a robust LOWESS smoother (bandwidth 0.2) on MaxTemp_{it-7}. This curve is dramatically more sensitive to temperature between 10°C and 30°C. Below 10°C it drops, which was expected since states with temperatures near 5°C tend to be more isolated and smaller population-wise. The curve bends up near 35°C and beyond where most of the observations come from a few states with (relatively) high June and July death counts.

Pinning down the TRP further will require: (a) obtaining more data over a longer time horizon with more temperature variation and, in particular, the important -5°C to 5°C range, where there has been little U.S. experience since the virus became widely dispersed, (b) obtaining death certificate data information from the few remaining large states (California, Illinois and New York) where it is not yet available, since states with death certificate date reporting have prediction errors that are substantially smaller (p < 0.001) than those that don’t (Table A6, Model 2), or (c) having high quality temporally aligned COVID-19 statistics at the county level, which would provide a better temperature match and dramatically increase sample size.

#### Death Count TRPs based on DailyDead vs. COVID Tracking Project Originally Reported Counts

What does a TRP based on The COVID Tracking Project’s (CTP) originally reported death counts look like compared to our base (Eq. 1) Model 1 which uses DailyDead_{it}? To examine this issue, we estimate Model 26 (Table A6), a direct analogue of Model 1, that substitutes CTPDailyDead_{it} and CTPDailyDead_{it-7} for their DailyDead_{it} counterparts. Consistent with Model 18 (Table A6) and other models employing CTPDailyDead_{it}, we exclude the two observations that contain the large New Jersey outlier as either the dependent variable or regressor and further excluded twenty observations where LogCTPDailyDead_{it-7} is undefined because CTPDailyDead_{it-7} is negative.

There are some clear differences between Model 1 and Model 26. First, in Model 26 there is a large drop in the R^{2} from 0.97 to 0.81 and the RMSE measure more than doubles. Second, the elasticity parameter on lagged LogCTPDailyDead_{it} is just over 60% as large as it is in the DailyDead_{it} version. This is the expected result from introducing substantial measurement error, but one that also has large implications for drawing inferences about the size of the infection pool or in undertaking any dynamic forecasting exercise. Third, the quadratic time trend, while still sizeable, is substantially diminished in both magnitude and statistical significance. Fourth, there are differences between the TRPs implied by their temperature response parameters in Eq. (1).

The TRPs for Model 1 and Model 26 are plotted in Figure A9. The TRPs for the CTPDailyDead_{it} variant effectively lies above its DailyDead_{it} counterpart. At 5°C, it predicts almost 80% more deaths than our base Model 1. This result may seem counterintuitive to the usual belief that measurement error induces the parameter estimate to be attenuated toward zero. That, intuition, generally correct, helps explain the fall in coefficient when CTPDailyDead_{it-7} rather than DailyDead_{it-7} is used as the infection pool indicator. However, it does not say anything about the implication of introducing measurement error into another covariate in the model. It is easy to show that as the scale of the measurement error in the lagged death count increases, the magnitude of the temperature responsiveness parameters adjusts to incorporate covariance with DailyDead_{it-k}. All else held constant, this would increase the magnitude of the estimated temperature effect under classical measurement error, since the parameters on infection pool indicator and temperature variables should have the same sign.

The situation here is more complicated because measurement error from using CTPDailyDead_{it} also substantially reduces the ability to pin down the quadratic time trend. Thus, neither the sign nor magnitude of the difference between the temperature response parameters across comparable specifications is therefore known a priori. Our parameter estimates (Table A6) suggest an upward bias is likely in temperature effect estimates obtained from models similar to ours that (a) use reported death counts and (b) have specifications where the infection pool and temperature response variables are expected to share the same sign. Note though that in a simple OLS regression model, the infection pool and temperature variables should have opposite signs, resulting in an estimate of the temperature parameter likely biased in the direction of finding no effect.

#### Structure of Table A6

The parameter estimates and standard summary statistics for models discussed is this paper are provided in Table A6. It contains three columns for each model. The first column contains the variables included in the model, the second the parameter estimates, and the third the standard errors. After the parameter estimates, the model’s R^{2}, root mean square error (RMSE), and the number of observations on which the model was fit are provided.

The order in which the models appear are:

The base model represented by Eq. 1 using LogDays

_{t}, LogDays_{t}^{2}, and LogDailyDead_{it-7}as the predictors along with a set of state-level indicator variables and using LogMaxTemp_{it-7}and LogMaxTemp_{it-14}in the temperature scaling function.A model which regresses the squared residuals from (1) on the log of state population and an indicator variable, GOOD_DATE for DailyDead

_{it}representing death counts by death certificate date.Base model using linear versions of the two temperature variables.

Base model using alternative ratio scaling function Temp/(Temp + α), where Temp is temperature variable and α is the estimated parameter.

Base model using the alternative ratio scaling function and linear versions of the MaxTemp

_{it-k}.Base model with LogDailyDead

_{it-14}substituted for LogDailyDead_{it-7}.A weekly version of the base model (see Eq. 2) substituting LogDailyDead

_{it-14}for LogDailyDead_{it-7}and LogDailyDead_{it-18}for LogDailyDead_{it-14}.A version of the base model that adds two state-level government policy variables, the log of the number of days since a shelter-in-place order was first issued (LogDaysShelterInPlace

_{it}) and the log of the number of days since a state began to formally reopen its economy (LogDaysReopen_{it}).A version of (8) that drops the quadratic time trend of the base model.

The base model adding parallel scaling functions using the LogMaxAbsoluteHumidity

_{it-7}and LogMaxAbsoluteHumidity_{it-14}.The base model adding parallel scaling functions LogMaxRelativeHumidity

_{it-7}and LogMaxRelativeHumidity_{it-7}.Base model adding additional parallel scaling functions for LogUV

_{it-7}and LogUV_{it-14}.Base model substituting LogUV

_{it-k}for LogMaxTemp_{it-k}.Base modeling adding a quadratic in terms of LogTotalDead

_{it-7}.Base modeling adding a 4th order polynomial LogTotalDead

_{it-7}.Base model substituting LogNewPositives

_{it-7}and testing variables for LogDailyDead_{it-7}.The model used in the paper to predict NewPositives

_{it}.The AR(7) regression of the originally reported death counts from The COVID Tracking Project (CPTDailyDead

_{it}) on the 7^{th}lag of itself (CPTDailyDead_{it-7}) for Table A5.The same as (18) but with DailyDead

_{it}regressed on the 7^{th}lag of the CTPDailyDead_{it}.The same as (18) but CTPDailyDead

_{it}regressed on the 7^{th}lag of DailyDead_{it}.The same as (18) but DailyDead

_{it}regressed on its 7^{th}lag.The StateBase

_{it}calculation for each state.The model reported in Table A3 which predicts the state-level fixed-effect estimates obtained in (1) as a function of a small set of demographic variables.

State-level estimates of the minimum infection pool.

State-level change in expected DailyDead

_{it}when DailyDead_{it-7}shifts from 0 to 1 on July 15, assuming 31°C.Base model (1) using the uncorrected death count data.

### Data and Code Availability

The data used in this study are archived at https://github.com/xxx in the form of a Stata “.dta” file. An Excel version of this file was created using StatTransfer. The Stata “do” file creating the data set contains a line-by-line set of the changes made to the original CovidTracking.com data set, and the providence of those changes. Three additional Stata “do” files are available in this archive. The first contains the code (Stata 16.1) for the regression models reported in this paper. The second provides an example of how to estimate the static and dynamic temperature response profiles for an individual state using Georgia as an example. The third contains Stata code for creating the basic versions (fine labeling was done using Stata’s graph editor) of the figures in this paper. We also provide a further Stata dataset and corresponding Excel files at the state level (using our representative airport) with maximum daily temperature readings from the last thirty years.