Yet another lockdown? A large-scale study on people's unwillingness to be confined during the first 5 months of the COVID-19 pandemic in Spain

Population confinements have been one of the most widely adopted non-pharmaceutical interventions (NPIs) implemented by governments across the globe to help contain the spread of the SARS-CoV-2 virus. While confinement measures have proven effective at reducing the number of infections, they entail significant economic and social costs. Thus, different policy makers and social groups have exhibited varying levels of acceptance of this type of measure. In this context, understanding the factors that determine the willingness of individuals to be confined during a pandemic is of paramount importance, particularly to policy and decision-makers. In this paper, we study the factors that influence the unwillingness to be confined during the COVID-19 pandemic by means of a large-scale, online population survey deployed in Spain. We apply both quantitative (logistic regression) and qualitative (automatic pattern discovery) methods and consider socio-demographic, economic and psychological factors, together with the 14-day cumulative incidence per 100,000 inhabitants. Our analysis of 109,515 answers to the survey covers a 5-month time period to shed light on the impact of the passage of time. We find evidence of pandemic fatigue, as the percentage of those who report an unwillingness to be in confinement increases over time; we identify significant gender differences, with women being generally less likely than men to be able to sustain long-term confinement of at least 6 months; and we uncover that the psychological impact was the most important factor in determining the willingness to be in confinement at the beginning of the pandemic, to be replaced by the economic impact as the most important variable towards the end of our period of study.
Our results highlight the need to design gender- and age-specific public policies, to implement psychological and economic support programs and to address the evident pandemic fatigue, as the success of potential future confinements will depend on the population's willingness to comply with them.

Given that the data was collected by means of a non-probabilistic sampling method, we weight the answers such that the distribution of answers per age, gender and geographic province matches the officially reported distributions in the Spanish census data. A similar methodology was implemented in ref. 33. Table 2 in the Supplementary Information shows the distributions of the age and gender variables before and after weighting the survey answers. The table also includes the official census data for comparison.
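To make the re-weighting step concrete, the following is a minimal sketch of cell-based post-stratification. It assumes that each answer is assigned the ratio between the census share and the sample share of its (age, gender, province) cell; the exact weighting scheme is not detailed in this section, and the function name and toy data below are hypothetical.

```python
from collections import Counter


def poststratification_weights(respondents, census_shares):
    """Return one weight per answer so that weighted cell shares match the census.

    respondents: list of (age_group, gender, province) tuples, one per answer.
    census_shares: dict mapping the same tuples to population proportions.
    """
    n = len(respondents)
    sample_counts = Counter(respondents)
    weights = []
    for cell in respondents:
        sample_share = sample_counts[cell] / n
        # Over-represented cells get weights below 1, under-represented above 1.
        weights.append(census_shares[cell] / sample_share)
    return weights


# Hypothetical toy data: one cell is over-represented in the sample.
respondents = [("30-49", "W", "Madrid")] * 3 + [("30-49", "M", "Madrid")]
census_shares = {
    ("30-49", "W", "Madrid"): 0.5,
    ("30-49", "M", "Madrid"): 0.5,
}
weights = poststratification_weights(respondents, census_shares)
```

In this toy case the weights sum to the number of answers, so weighted counts remain comparable to raw counts.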
Table 1 depicts the dependent and independent variables used in our analysis. We are particularly interested in understanding the differences in demographic factors, impact and perceptions between those who report an opposition to being confined and those who report a willingness to be confined for at least 6 months. Thus, we model our dependent variable (willingness to be confined) as a binary variable where 0 represents an unwillingness to be confined and 1 represents maximum acceptance of confinement (willing to be confined for 6 months or more). When we select the answers of those reporting an unwillingness to be confined and those reporting an acceptance of confinement for at least 6 months, we obtain a sample with 20,054 responses corresponding to the time period between April 3rd and September 11th, 2020.

Table 2 shows the resulting proportions of the variables analysed in the study. As seen in the table, the data set is unbalanced regarding the target variable: there are 7,887 (39.3%) answers from those reporting an unwillingness to be confined vs 12,167 (60.7%) answers from those who report a high willingness to be confined (≥ 6 months). In terms of gender, there are 8,237 (41%) and 11,817 (59%) answers by women and men, respectively. We group the age of our participants into four groups (18-29, 30-49, 50-59 and 60+).

To consider the impact of time, we include a discrete variable called Phase that refers to the different phases of confinement that were applied in Spain in the spring and summer of 2020, as shown in Figure 5 and described in Table 1. From Table 2, we observe a decrease over time in those reporting a willingness to be confined for a long time period: at the beginning

medRxiv preprint doi: https://doi.org/10.1101/2021.05.08.21256792; this version posted May 10, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

variables of interest between those unwilling to be confined and those with high levels of acceptance towards confinement. We found statistically significant differences in all the variables (p-value < 0.01) except for the variables marked with * in Table 2, which correspond to Walks allowed, Phase I and Phase III and to the age groups 30-49 and 50-59. Nonetheless, note that all the variables have at least two values with significant differences. Thus, we conclude that all variables are relevant to be included in our models.
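As an illustration of how the share of a category can be compared between the two groups, the sketch below implements a two-sided, pooled two-proportion z-test. This is an assumed variant: the exact proportions test behind Table 2 (for example, whether it uses a continuity correction or the survey weights) is not specified here. The group sizes are those reported above, while the category counts are hypothetical.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for the equality of two proportions (pooled variance)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Group sizes from the text; the counts 4000 and 5000 are hypothetical.
z, p_value = two_proportion_z_test(4000, 7887, 5000, 12167)
significant = p_value < 0.01
```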

The main research questions that we would like to answer through the analysis of this data set are:

(1) RQ1: Are there differentiating attributes between those who report high vs no willingness to be confined?

(2) RQ2: Is it possible to accurately predict an unwillingness to be confined from the independent variables?

(3) RQ3: How did the willingness to be confined evolve during the different phases of the confinement?

Logistic Regression Model

We obtain a logistic regression model with all the variables described in Table 1 and the interactions shown in Table 3. We refer the reader to the Materials and Methods section for a description of our approach. We compute McFadden's pseudo-R squared (0.131), the Cox-Snell pseudo-R squared (0.161) and Nagelkerke's R squared (0.218) to assess the goodness of fit of the selected model.
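The three goodness-of-fit measures are related through the log-likelihoods of the fitted and the intercept-only (null) models. The sketch below uses the standard definitions and, as a rough consistency check, derives the null log-likelihood from the unweighted class counts reported above and backs out the fitted log-likelihood from the McFadden value. Both steps are assumptions made for illustration; the original computation may have used the weighted sample.

```python
import math


def null_log_likelihood(n_pos, n_neg):
    """Log-likelihood of an intercept-only model for a binary outcome."""
    n = n_pos + n_neg
    p = n_pos / n
    return n_pos * math.log(p) + n_neg * math.log(1 - p)


def pseudo_r_squared(ll_model, ll_null, n):
    """Return (McFadden, Cox-Snell, Nagelkerke) pseudo-R squared values."""
    mcfadden = 1 - ll_model / ll_null
    cox_snell = 1 - math.exp(2 * (ll_null - ll_model) / n)
    # Nagelkerke rescales Cox-Snell by its maximum attainable value.
    nagelkerke = cox_snell / (1 - math.exp(2 * ll_null / n))
    return mcfadden, cox_snell, nagelkerke


n_unwilling, n_willing = 7887, 12167  # class counts reported in the text
n = n_unwilling + n_willing
ll_null = null_log_likelihood(n_unwilling, n_willing)
# Assumption: recover the fitted log-likelihood from McFadden = 0.131.
ll_model = (1 - 0.131) * ll_null
mcfadden, cox_snell, nagelkerke = pseudo_r_squared(ll_model, ll_null, n)
```

Under these assumptions, the Cox-Snell and Nagelkerke values come out at approximately 0.161 and 0.218, in line with the figures reported above.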

As reflected in Table 3, the variables with the most interactions are Gender and PsyI. The presence of the variable Phase in our model, interacting with PsyI, Gender and Household, empirically corroborates a temporal dependency of the target variable.

Interestingly, the 14-day Cumulative Incidence of COVID-19 cases (CumInc) has no significant interaction with any of the variables.

We performed an Odds Ratio (OR) analysis to shed light on the role of each of the variables in the logistic regression model. A variable with an OR > 1 is typically interpreted as a risk factor, i.e., it is a variable that significantly increases the probability of the target variable being 1. Conversely, a variable with an OR < 1 decreases the probability of the target variable being 1. We reverse the coding of the target variable so it represents a risk: we code as 1 the answers corresponding to an unwillingness to be confined and as 0 the answers corresponding to an acceptance of a confinement of at least 6 months.

In our case, the basal categories for the OR analysis are: Phase: Workplace closure; Gender: Men; Age: 18-29; Home: Apartment; Household: Young; EcoI: None and PsyI: None, which correspond to the first row of each category in Table 2. Note that to obtain the OR value in a logistic regression model with second order interactions among some of the variables (Phase, Gender, Age, Home, Household, EcoI and PsyI as per Table 3), the OR of the main variable needs to be multiplied by the OR of the interaction variables where the main variable is present.

Figure 1a) shows the OR for the EcoI variable taking as basal category None. Note how the OR is larger than one for all values of the EcoI variable, meaning that this variable is a risk factor. Moreover, the OR for EcoI = Severe (3.53) is larger than for EcoI = Mild (1.99) (in black color): those who report having had severe economic impact are more likely to report that they are not willing to be confined when compared to those with mild economic impact.
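The multiplication rule for odds ratios in the presence of interactions, stated above, follows from the fact that logistic regression coefficients add on the log-odds scale. The sketch below illustrates this with the main-effect OR reported for EcoI = Severe and a hypothetical interaction OR; the interaction value is an assumption chosen only for illustration and is not taken from the paper.

```python
import math


def combined_odds_ratio(or_main, or_interactions):
    """Multiply a main-effect OR by the ORs of the applicable interaction terms."""
    result = or_main
    for value in or_interactions:
        result *= value
    return result


or_ecoi_severe = 3.53          # main-effect OR reported in the text
or_gender_w_ecoi_severe = 0.8  # hypothetical interaction OR (illustration only)

# OR of severe economic impact for women, relative to women with no impact.
combined = combined_odds_ratio(or_ecoi_severe, [or_gender_w_ecoi_severe])

# Equivalent computation on the coefficient (log-odds) scale.
beta_sum = math.log(or_ecoi_severe) + math.log(or_gender_w_ecoi_severe)
combined_from_betas = math.exp(beta_sum)
```

An interaction OR below 1 therefore attenuates the main effect without necessarily bringing the combined OR below 1.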

In addition, the OR of the interaction Age:EcoI (cyan lines for Severe and green for Mild) is larger than 1 for those aged 30-59 years old. Thus, people in that age group are more likely to report that they are not willing to be confined than those aged 18-29 years old (basal category). In the case of respondents aged 60+ years old, the OR is greater than one only when the economic impact is mild.

In terms of the Gender:EcoI interaction, the OR are less than 1 for both values of EcoI (red line). Hence, women with any kind of economic impact are less likely to report not being willing to be in confinement any longer when compared to men (basal category).

or Mild) psychological impact have a higher risk of reporting an unwillingness to be confined than those without any psychological impact. Moreover, the larger the psychological impact, the higher the risk (black lines). In terms of the interaction Age:PsyI, all the age groups show a similar level of risk (cyan lines), except for those aged 60+ years old with mild psychological impact, who seem to be at significantly higher risk of reporting that they would not be able to be in confinement any longer (OR = 2.23). We observe a difference in the risk depending on the age of the members of the Household: households composed of elderly (60+) are significantly more at risk of reporting an unwillingness to be in confinement when compared to the rest (red and cyan lines). Concerning the interaction between Phase:PsyI, it seems that time has an attenuating effect on the risk as there is a

In Figure 1c) the OR for the Gender W variable are shown, taking Men as the basal category. As depicted in the figure, women tend to be at significantly higher risk (OR = 4.60) than men of reporting that they are not able/willing to be in confinement any longer (black line). The value of this OR is significantly larger than the OR found for the EcoI and PsyI variables. This risk is increased for women who live in households composed of elders (Household Elder, yellow lines). In terms of the interaction Phase:Gender W, all the OR are similar and lower than 1 (blue lines), meaning that as time goes by, the risk for women to report that they are not able to be in confinement is lower than during the reference phase (Workplace closure).

Regarding the type of Household, multi-generational households are at lower risk of reporting that they are not willing to be confined any longer than households composed of elders (grey lines). However, in both cases the OR < 1, so they are attenuating factors when compared to the basal category (Household = Young).

We do not observe a significant impact of Age as a main factor on the risk of reporting being unwilling to be confined any longer. Nonetheless, as previously reported, age can be a risk factor in its interaction with other factors, such as Economic and Psychological Impact.

Combining all the OR reported above, we conclude that men living in multi-generational homes, with neither psychological nor economic impact due to the pandemic, are the least likely group to report an unwillingness to be confined. Conversely, women aged 30+ years old with severe economic and psychological impact and living in households with young children are at the highest risk of reporting an unwillingness to be confined.

Given that we have a quantitative model of the unwillingness to be confined, we test the model as a binary classifier to automatically infer such unwillingness from our independent variables. We randomly split the data into training (80% of the data) and testing (20% of the data) sets. Note that we group the data on a weekly basis to have enough samples of survey answers across the entire time period of study.

In addition to the logistic regression modeling, we have developed an automatic pattern extraction algorithm described in the Materials and Methods Section. Using this model, we identify the most influential variables and discover the most recurrent patterns among those who report an unwillingness to be confined.

Note that for each Phase we build a different logistic model with the main variables but without interactions given that we do not have enough data per Phase to identify meaningful interactions.

Figure 3 a) shows the Odds Ratio of the 14-day cumulative incidence over time. The OR increases over time, which might be indicative of the so-called pandemic fatigue. In the early phases of the pandemic, the OR is less than one, meaning that the larger the 14-day cumulative incidence, the lower the probability for respondents to report an unwillingness to be in confinement. Conversely, from Phase II onward, respondents seem to be indifferent (OR close to 1) to changes in the cumulative incidence.

in increasing the probability of reporting not being able to be confined any longer decreases as the months go by. This finding is consistent with the previously reported variable importance analysis (see Figure 2).

In the case of the Economic Impact (EcoI) variable (Figure 3 c)), the OR are also larger than 1 in all the phases. However, the evolution over time is different from that of the PsyI variable. We observe a significant increase in the OR after Phase III, particularly for those reporting severe economic impact. This result is also consistent with the variable importance analysis previously described. Moreover, it makes intuitive sense: as the economic situation of people worsens due to the pandemic, their probability of reporting that they are not willing to be in confinement any more increases.

The changes over time in the OR for the Age variable are depicted in Figure 3 d). Those aged 60+ are at significantly higher risk in the first phases of the pandemic of saying that they "can't anymore" when compared to the rest of the age groups. Once the New normal phase started at the end of June of 2020, the risk of those aged 50-59 and those aged 60+ becomes similar. Interestingly, the 30-49 age group is the only age group with an OR < 1 during the entire period.

Those living in single family homes are at lower risk of reporting that they cannot be in confinement any longer when compared to those living in an apartment (Figure 3 e)). This finding makes intuitive sense as single family homes are typically more spacious than apartments in Spain.

Finally, women are at larger risk than men to report that they cannot endure the confinement any longer throughout the

The intermediate phases of confinement (from Walks allowed to Phase II) can be mainly modeled using the Psychological Impact variable. In Phase III (the last stage prior to the New normal), the unwillingness to be confined is affected by the two socio-demographic variables (gender and age), by psychological impact and especially by economic impact, which is triggered in this phase. In the New normal phase, the unwillingness to be confined is mainly dependent on economic impact.

Next, we summarize the most relevant patterns identified in each of the phases.

In this phase, those over 50 years of age, regardless of their gender and economic impact, as long as they report no psychological impact (26.4% of the sample), always opt (100% of the time) for a willingness to be confined for 6 months or longer.

While we did not obtain any significant interaction between gender and psychological impact in our logistic regression modeling, we did identify significant gender differences in the 30-49 years old age range: among those who do not report any economic impact but do report severe psychological impact (close to 10% of the sample), 82.6% of men vs 64.7% of women would be willing to be in confinement for 6 months or longer.

This finding illustrates the complementary nature of our modeling approaches. The pattern discovery method identifies patterns that involve three or more variables, which would be very difficult to achieve via our logistic regression methodology.

In this initial stage of confinement, when the psychological and/or economic impact might not yet be evident, the willingness to be confined is at its highest levels. However, 2.6% of the sample corresponds to women between 50 and 59 years old who report severe psychological impact. Of these, 60.0% report an unwillingness to be in confinement, even in this very early stage of the pandemic.

economic or psychological impacts: 94.1% on average of them report a willingness to be confined for 6 or more months, irrespective of their gender. This group represents 11.6% of the sample.

At this stage, those who report high levels of acceptance towards confinement still clearly outweigh those unwilling to be confined. However, once again, we identify clear gender differences in the patterns, with women being more likely to report that they cannot be in confinement any longer when compared to men. In particular, we identify two distinct patterns: first, 43.6% of women between 50 and 59 years old who report severe psychological impact and no economic impact (3.7% of the sample) report minimal acceptance towards confinement; second, 56.5% of women between 30 and 49 years old who, in addition to reporting severe psychological impact, also claim to have medium economic impact (2.7% of the sample) also respond that they would not be able to continue in confinement.

During this phase, the participants' opinions regarding their willingness to be confined start to balance: the percentage of those willing to be confined for 6 months or longer decreases whereas the percentage of those who are not willing to be confined increases.

The first identified pattern corresponds to people between 30 and 49 years old who report having no economic impact but severe psychological impact (13.0% of the sample). Among these (men and women alike), 54.8% on average report a willingness to be confined for 6 months or longer.

At this stage, there are several groups (both men and women) that begin to opt mainly for an unwillingness to be confined.

The pattern with the largest support reveals that 75.0% of women between 30 and 49 years old with medium economic impact and a severe psychological impact (3.7% of the sample) report an unwillingness to be confined.

In the first phase of the re-opening in Spain, more people report being able to continue in confinement for 6 months or longer than those reporting that they cannot stand it any longer, with the exception of two groups: women between 30 and 49 with severe psychological impact and mild economic impact (3.49% of the sample) and women over 60 with severe psychological impact but no economic impact (6.0% of the sample). In these two groups, only 46.9% and 45.2% respectively report a willingness to be confined for 6 months or longer.

While the willingness to be in confinement for 6 months or longer also decreases among men, it still remains the most popular option for them. This finding is aligned with our OR analysis shown in Figure 1 c).

Half of those aged 60+ (both men and women) who do not declare any economic impact but report severe psychological impact (9.0% of the sample) report not being able to remain in confinement any longer. The burden of the pandemic starts to become evident.

Both phases show similar patterns where the unwillingness to be confined is prevalent among individuals who report severe psychological impact with some economic impact. These levels of unwillingness are higher when, in addition to the severe psychological impact, participants also report some type of economic impact.

In Phase I, the opinion of some groups is polarized clearly towards an unwillingness to be confined: 67.2% of women between 30 and 59 years old who report medium economic impact and severe psychological impact (over 7.4% of the sample) and 86.9% of women aged 60+ who, having no economic impact, declare severe psychological impact (5.0% of the sample) respond that they cannot stand the confinement any longer. Again, the most significant patterns related to those who are unwilling to continue in confinement are found amongst women.

In the last phase of the re-opening and before the imminent start of the New normal phase, we observe an increase among those who report being able to remain in confinement for 6 months or longer. As seen in our feature importance analysis, the psychological impact becomes less relevant than economic impact and demographic factors. Hence, psychological impact does now appear in most of the patterns.

In general terms, the majority opinion is once again towards a willingness to be confined for 6 months or longer. The group that reports the highest levels of unwillingness to be in confinement (47.3% of the time) consists of people (men and women) between 30 and 49 years old who do not report psychological impact but report economic impact (7.6% of the sample).

The willingness to be confined shows a balanced distribution between those who affirm that they are not able to stand it any longer and those who would be willing to be confined for 6 months or longer: 65.2% of people between 30 and 50 years old (without distinction of gender) who report neither economic nor psychological impact (28.8% of the sample) would accept long-term confinement, compared to 34.8% who would not stand confinement any longer.

Table 1 summarizes the independent and dependent variables used in our study. Our target or dependent variable is the willingness to be confined, captured by Q14. Our independent variables are the socio-demographic, economic and psychological impact measures, captured by questions Q1-Q6, Q15 and Q25, the confinement Phase as shown in Figure 5, and the 14-day Cumulative Incidence of SARS-CoV-2 per 100,000 inhabitants averaged over the 17 Autonomous Regions in Spain.
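For reference, the 14-day cumulative incidence variable mentioned above can be computed from daily case counts as sketched below. The sketch assumes an unweighted mean across regions, which is one reading of "averaged over the 17 Autonomous Regions"; the region names, case counts and populations are hypothetical.

```python
def cumulative_incidence_14d(daily_cases, population, per=100_000):
    """Cases reported over the last 14 days per `per` inhabitants."""
    if len(daily_cases) < 14:
        raise ValueError("at least 14 days of case counts are required")
    return sum(daily_cases[-14:]) / population * per


# Hypothetical regions: (daily new cases, population).
regions = {
    "Region A": ([10] * 14, 1_000_000),
    "Region B": ([50] * 14, 500_000),
}

incidences = [
    cumulative_incidence_14d(cases, population)
    for cases, population in regions.values()
]
# Unweighted average across regions (assumption for this sketch).
mean_incidence = sum(incidences) / len(incidences)
```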

To minimize biases in the data, we weight the raw answers by gender, age and province, such that the resulting distribution of answers matches the distributions reported by the official Spanish census. We apply the same methodology as described in ref. 33. Table 2 in the Supplementary Information depicts the age and gender distributions of the raw and weighted data, together with the officially reported census distributions.

In addition, we simplify several of the answers as follows:

1. Q4. We only consider two types of homes: apartments and single family homes. These two answers represent 94% of the raw answers.

variable (i.e., the willingness to be confined). Note that not all the variables in our study might be relevant when modeling people's willingness to be confined. In this variable selection step, we used the Entropy-based Filters algorithm from the FSelectorRcpp library in R, which is based on the information gain, gain ratio and symmetrical uncertainty metrics (ref. 58).

2. Step 2: Automatic pattern generation, via a modified version of the RBS algorithm (ref. 59), a classifier based on the ID3 family (ref. 60). The RBS algorithm is an iterative method that, without building the full structure of a tree, automatically identifies a set of rules (patterns) from a data set of discrete variables. The subsequent ordering and filtering of the patterns depends on the significance of the rule, which is a metric based on the classic concepts of support and confidence of the classification rules, but defined by intervals. We adapted the RBS algorithm to be used with re-weighted input samples rather than absolute supports, as is usually done.
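To illustrate the entropy-based variable selection step, the following is a minimal sketch of information gain for a discrete variable with optional sample weights. It is not the FSelectorRcpp implementation; the function names are hypothetical, and the use of base-2 logarithms is a choice made for this sketch (the ranking of variables does not depend on the base).

```python
import math
from collections import defaultdict


def entropy(weight_by_class):
    """Shannon entropy (in bits) of a weighted class distribution."""
    total = sum(weight_by_class.values())
    return -sum(
        (w / total) * math.log2(w / total)
        for w in weight_by_class.values()
        if w > 0
    )


def information_gain(feature, target, weights=None):
    """Reduction in target entropy obtained by conditioning on the feature."""
    if weights is None:
        weights = [1.0] * len(target)
    overall = defaultdict(float)
    by_value = defaultdict(lambda: defaultdict(float))
    for x, y, w in zip(feature, target, weights):
        overall[y] += w
        by_value[x][y] += w
    total = sum(overall.values())
    conditional = sum(
        sum(classes.values()) / total * entropy(classes)
        for classes in by_value.values()
    )
    return entropy(overall) - conditional


# Hypothetical toy data: the first feature separates the target perfectly.
target = [0, 0, 1, 1]
gain_informative = information_gain(["a", "a", "b", "b"], target)
gain_uninformative = information_gain(["a", "b", "a", "b"], target)
```

Variables with a gain close to zero carry little information about the target and are candidates for removal before pattern generation.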

As a result of this two-step process, we obtained the set of rules or patterns that best model our target binary variable (willingness or acceptance to be confined). Each of the automatically identified patterns or classification rules has the structure shown in Figure 6. The Results Section describes both the most influential variables and the patterns identified by our qualitative analysis.
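The support and confidence of a single pattern, computed on re-weighted samples, can be sketched as follows. This is only an illustration of the two classic metrics: the interval-based significance measure of the RBS algorithm is not reproduced, and support is taken here as the weighted share of the sample matching the rule's conditions, mirroring how patterns are reported in the Results Section. The field names and data are hypothetical.

```python
def rule_support_confidence(rows, weights, antecedent, consequent):
    """Weighted support and confidence of a rule 'antecedent -> consequent'.

    rows: list of dicts (one per answer); weights: one weight per row.
    antecedent: dict of variable -> required value.
    consequent: (target_variable, value) pair.
    """
    target, value = consequent
    total = sum(weights)
    matched = 0.0
    matched_and_correct = 0.0
    for row, w in zip(rows, weights):
        if all(row[name] == required for name, required in antecedent.items()):
            matched += w
            if row[target] == value:
                matched_and_correct += w
    support = matched / total
    confidence = matched_and_correct / matched if matched else 0.0
    return support, confidence


# Hypothetical weighted answers.
rows = [
    {"gender": "W", "age": "30-49", "unwilling": 1},
    {"gender": "W", "age": "30-49", "unwilling": 1},
    {"gender": "W", "age": "30-49", "unwilling": 0},
    {"gender": "M", "age": "30-49", "unwilling": 0},
]
weights = [1.0, 1.0, 2.0, 4.0]
support, confidence = rule_support_confidence(
    rows, weights, {"gender": "W", "age": "30-49"}, ("unwilling", 1)
)
```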

Data availability

The survey answers analysed in this paper will be made available together with the manuscript.

The code used to analyse the data will be placed in a publicly available GitHub repository.

Table 2. Descriptive statistics of the subset of answers to the survey analyzed in this paper. All the variables pass a proportions test, with p-values < 0.001, except for the variables marked with *.

Table 3. Significant interactions identified by the logistic regression model.

Figure 6. Example of two of the patterns identified by our pattern extraction method.
