Air cleaners and respiratory infections in schools: A modeling study using epidemiological, environmental, and molecular data

Background: Using a multiple-measurement approach, we examined the real-world effectiveness of portable HEPA-air filtration devices (air cleaners) in a school setting. Methods: We collected environmental (CO2, particle concentrations), epidemiological (absences related to respiratory infections), audio (coughing), and molecular data (bioaerosol and saliva samples) over seven weeks during winter 2022/2023 in two Swiss secondary school classes. Using a cross-over study design, we compared particle concentrations, coughing, and the risk of infection with vs without air cleaners. Results: All 38 students (age 13–15 years) participated. With air cleaners, mean particle concentration decreased by 77% (95% credible interval 63%–86%). There were no differences in CO2 levels. Absences related to respiratory infections were 22 without vs 13 with air cleaners. Bayesian modeling suggested a reduced risk of infection, with a posterior probability of 91% and a relative risk of 0.73 (95% credible interval 0.44–1.18). Coughing also tended to be less frequent (posterior probability 93%). Molecular analysis detected mainly non-SARS-CoV-2 viruses in saliva (50/448 positive), but not in bioaerosols (2/105 positive) or HEPA-filters (4/160). The detection rate was similar with vs without air cleaners. Spatiotemporal analysis of positive saliva samples identified several likely transmissions. Conclusions: Air cleaners improved air quality, showed a potential benefit in reducing respiratory infections, and were associated with less coughing. Airborne detection of non-SARS-CoV-2 viruses was rare, suggesting that these viruses may be more difficult to detect in the air. Future studies should examine the importance of close contact and long-range transmission, and the cost-effectiveness of using air cleaners.


The clean air delivery rate (CADR) is typically used to quantify the cleaning efficiency of air cleaners. 1 It is expressed in m³/s and describes the volumetric flow rate of particle-free, clean air that an air cleaner delivers into the room. The CADR is equal to the product of the filtration efficiency and the volumetric flow rate passing through the device.
We followed the approach by Küpper et al. to determine the CADR. 2 The approach is based on the release of fine aerosols (particle diameter D < 1 µm) in a closed, unventilated room and on measuring the decay rate k of the particle concentration N(t) caused by the air cleaner, i.e.
$$N(t) = N(0)\, e^{-k t}.$$
Since the surfaces in the room (e.g. walls and furniture) are also sinks for aerosol particles, measurements were carried out with air cleaners ($k_\text{cleaner}$) and without air cleaners ($k_\text{background}$). Based on that, we can compute the CADR as
$$\mathrm{CADR} = V \, (k_\text{cleaner} - k_\text{background}),$$
where V = 233 m³ is the volume of the classroom.
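The CADR calculation described above can be sketched as follows. This is a minimal illustration, not the analysis code of the study: the decay rates are fitted by ordinary least squares on the log concentration, the function names `decay_rate` and `cadr` are hypothetical, and the concentration curves are synthetic (decay rates of 2.0 and 0.2 per hour are assumed purely for illustration).

```python
import math

def decay_rate(times_h, concentrations):
    """Least-squares slope of ln N(t) against t, returned as a positive decay rate k (1/h)."""
    logs = [math.log(c) for c in concentrations]
    n = len(times_h)
    t_mean = sum(times_h) / n
    y_mean = sum(logs) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_h, logs))
    sxx = sum((t - t_mean) ** 2 for t in times_h)
    return -sxy / sxx

def cadr(k_cleaner, k_background, volume_m3):
    """CADR = V * (k_cleaner - k_background), in m^3/h if k is given in 1/h."""
    return volume_m3 * (k_cleaner - k_background)

# Synthetic, illustrative decay curves (not measured data).
V = 233.0  # classroom volume in m^3, as stated in the text
times = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]  # hours
with_cleaner = [1000.0 * math.exp(-2.0 * t) for t in times]
background = [1000.0 * math.exp(-0.2 * t) for t in times]

k_c = decay_rate(times, with_cleaner)
k_b = decay_rate(times, background)
cadr_value = cadr(k_c, k_b, V)
```

Subtracting the background decay rate removes the contribution of particle losses to room surfaces, so that only the removal attributable to the device remains.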
We ensured that the particle concentrations in the experiments were not too high, thereby preventing bias due to coagulation (i.e. the formation of a few large particles from many small particles). At the beginning of each experiment, fine aerosol particles were released for a few minutes by ultrasonic nebulization of an aqueous NaCl solution of 5 g/l. The generated dehydrated salt particles have a broad size distribution in the range of 0.02–1.0 µm. Decay rates were measured using 11 fine dust sensors (SPS30, Sensirion AG), which were placed at head height of the students.
These sensors measured the number concentration of individual particles by counting the scattered light pulses they produce when passing through a laser beam. Because of their scattering properties, these sensors measure the number of particles larger than ∼0.3 µm.
Figure S1 shows the particle number concentration with and without the air cleaner.
The exponential decrease is clearly visible as a linear curve in the logarithmic plot. Based on these experiments, we were able to determine the net CADR value of the air cleaner as 420 m³/h, with an uncertainty of ±10% due to systematic differences in the setup and measurement errors of the instruments. Our estimated CADR is lower than the CADR of 600 m³/h reported by the manufacturer, most likely because the reported CADR was determined with comparably large particles (pollen in the supermicrometer size range). We therefore attribute the lower CADR determined here to its strong dependence on particle size, i.e. large particles are easier to filter than fine aerosols.

C Epidemiological line list data
Reports about absences were entered electronically into a REDCap database. 4,5 Table S2 shows the line list data for respiratory cases based on epidemiological data. For each class and case, it shows the date on which the student was absent, the reported date of symptom onset, and the date of return to school. We also collected data on laboratory tests, but none of the students reported a test result during the study.
A case of respiratory infection was defined as an absence where the student reported a sickness with at least one of the following symptoms: fever, coughing, tiredness, loss of taste or smell, sore throat, headache, aches and pains, diarrhea, difficulty breathing or shortness of breath, stomach.
Students were asked to report the first date they experienced symptoms, so that a student absent on Monday could report Saturday or Sunday as the day when symptoms began. We will refer to respiratory cases by their dates of symptom onset, which usually corresponded to the absence date, unless the student attended school while experiencing symptoms. Informed consent for one student could not be obtained, and the single absence of this student was considered unrelated to a respiratory infection.

D Molecular line list data
Table S3 shows the line list data for the laboratory test results from human saliva samples. For each class and test, it shows the date when the test was taken, the test result, and (if positive) the virus that was detected. Recall that all students in class were sampled twice per week, every Tuesday and Thursday.

E Modeling changes in particle concentrations
We only analyze particle concentration data from the time periods during which students were in the classroom. Particle concentrations Y on day t in class j are summarized daily as the mean across these time periods. The change in aerosol number and particulate matter mass concentrations is then estimated using Bayesian log-linear regression models. The effect of air cleaners is adjusted for class- and weekday-specific effects, the number of students in school, the air change rate, and the cumulative number of cases related to respiratory infections.
A log transform is applied to all continuous input variables.
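Because the outcome is modeled on the log scale, a coefficient for a binary air cleaner indicator translates into a relative change in concentration. The sketch below shows this conversion under the assumption that the air cleaner enters the model as a 0/1 indicator; the function name `percent_reduction` and the coefficient value are hypothetical and chosen only for illustration.

```python
import math

def percent_reduction(beta):
    """Percent reduction implied by a log-scale coefficient: 100 * (1 - exp(beta))."""
    return 100.0 * (1.0 - math.exp(beta))

# Hypothetical coefficient for illustration: exp(beta) = 0.24,
# i.e. concentrations with air cleaners are 24% of those without.
beta_air_cleaner = math.log(0.24)
reduction = percent_reduction(beta_air_cleaner)
```

Applying the same transformation to the bounds of a credible interval for the coefficient gives the corresponding interval for the percent reduction.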

F Modeling relative risk of infection

F.1 Overall approach
The overall aim is to estimate the effects of air cleaners on the daily number of respiratory infections.
The latter is unobserved (latent) and inferred from the daily number of respiratory cases (absences related to respiratory infections by date of symptom onset), considering the delay from infection to symptom onset (i.e. the incubation period). The effect of air cleaners is estimated with a Bayesian approach, which requires the specification of prior distributions for all model parameters.

F.3 Relating the number of respiratory cases to the number of infections
The number of respiratory cases C in class j at time t is modeled with a Negative Binomial distribution
$$C_{jt} \sim \text{Negative-Binomial}(\mu_{jt}, \phi),$$
where $\mu_{jt}$ is the expected number of new cases and $\phi$ is the parameter modeling over-dispersion.
The expected number of new cases is the weighted sum of the number of new infections $I_{js}$ in the previous days
$$\mu_{jt} = \sum_{s<t} I_{js}\, p_{IN}(t-s),$$
where $p_{IN}(t-s)$ denotes the probability distribution of the incubation period.
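The weighted sum over previous infections is a discrete convolution of the infection series with the incubation period distribution. A minimal sketch for a single class is given below; the function name `expected_cases`, the infection counts, and the three-day incubation distribution are hypothetical and serve only to illustrate the computation.

```python
def expected_cases(infections, incubation_pmf):
    """Expected cases mu[t] = sum over s < t of infections[s] * p_IN(t - s).

    incubation_pmf[d - 1] is the probability of a delay of d days (d >= 1)
    between infection and symptom onset.
    """
    mu = []
    for t in range(len(infections)):
        total = 0.0
        for s in range(t):
            delay = t - s
            if delay <= len(incubation_pmf):
                total += infections[s] * incubation_pmf[delay - 1]
        mu.append(total)
    return mu

# Illustrative inputs (not study data).
infections = [2.0, 0.0, 1.0, 0.0, 0.0]
pmf = [0.2, 0.5, 0.3]  # delays of 1, 2, and 3 days
mu = expected_cases(infections, pmf)
```

In this example, the two infections on day 0 contribute 0.4, 1.0, and 0.6 expected cases to days 1, 2, and 3, respectively.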

F.4 Relating the number of new infections to the presence of interventions
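The number of new infections is related to the presence of air cleaners using a log-link
$$\log I_{jt} = \log F_{jt} - \log N_{jt} + \beta_0 + \beta_1 \cdot \text{AirCleaner}_{jt} + \sum_{k \geq 2} \beta_k\, x_{k,jt},$$
where $F_{jt}$ and $N_{jt}$ are the two model offsets, $\beta_0$ is the model intercept, $\beta_1$ is the effect of air cleaners, and $x_{k,jt}$ are the adjustment variables.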

F.5 Specifying the distribution of the incubation period
The virus of each respiratory infection could not be identified from the epidemiological data because the students never obtained a laboratory test result. As a consequence, different incubation periods need to be considered in $p_{IN}$, reflecting a combination of the virus-specific incubation periods. The combination is determined based on the weekly proportion of positive saliva samples for each virus found in the molecular analysis. Formally, let $p^{v}_{IN}$ be the distribution for the incubation period of respiratory virus v and let $pp_{vw}$ be the proportion of positive saliva samples in study week w. Then, each week, the combined incubation period is computed as the weighted sum of the virus-specific incubation periods
$$p_{IN} = \sum_{v} pp_{vw}\, p^{v}_{IN}.$$
The prior distributions for the virus-specific incubation periods are based on estimates published in the literature. 6,7 The distributions are shown in Figure S2. Since we could not obtain prior estimates for the incubation period of metapneumovirus (MPV) from the literature, we instead formed a distribution by using the equally weighted average of the parameters from the other distributions.
We estimate the virus-specific distributions from our data as part of fitting the overall model.
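The weekly combination of virus-specific incubation periods described in this section can be sketched as a mixture of discrete distributions. The sketch assumes that the weekly proportions are normalized to sum to one across viruses (an assumption made here so that the result is a proper distribution); the function name `combined_incubation` and all numbers are hypothetical and are not estimates from the study or the literature.

```python
def combined_incubation(virus_pmfs, weekly_proportions):
    """Mixture of virus-specific incubation distributions for one study week.

    virus_pmfs maps a virus label to a list of daily probabilities;
    weekly_proportions maps the same labels to the proportion of positive samples.
    Proportions are rescaled to sum to one (assumption of this sketch).
    """
    total = sum(weekly_proportions.values())
    length = max(len(pmf) for pmf in virus_pmfs.values())
    combined = [0.0] * length
    for virus, weight in weekly_proportions.items():
        for day, prob in enumerate(virus_pmfs[virus]):
            combined[day] += (weight / total) * prob
    return combined

# Illustrative inputs (hypothetical distributions and proportions).
pmfs = {"HRV": [0.5, 0.5, 0.0], "AdV": [0.0, 0.2, 0.8]}
proportions = {"HRV": 0.75, "AdV": 0.25}
p_in_week = combined_incubation(pmfs, proportions)
```

A week dominated by a virus with a short incubation period therefore shifts the combined distribution toward shorter delays.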

F.6 Adjusting for under-reporting of cases on weekends
Despite recording cases by date of symptom onset, we recorded a higher proportion of cases on Mondays than on weekends, suggesting recall bias and under-reporting of cases on weekends. To consider weekday effects in the reporting of cases, we re-weight the expected number of cases each week as follows. Let k ∈ (1: Saturday, 2: Sunday, 3: Monday, ..., 7: Friday) denote the weekday, with the week starting on Saturday. The re-weighted expected number of cases μ (class and day indexes omitted) is computed from the expected number of cases using the weight $\nu_k$ for weekday k. These weights are modeled with a Dirichlet prior that depends on $c_k$, the total number of cases reported for weekday k, and on a binary indicator I.

F.7 Modeling school-free days
Infections may have occurred during the week of vacation that falls into the study period. The expected numbers of infections and cases are computed during vacation, but vacation days are not modeled (i.e. not incorporated into the model likelihood). In addition, we assume lower transmission of respiratory infections on days without school (weekends and vacations). We incorporate our prior belief into the model intercept
$$\beta_0 = \alpha + \omega \cdot \text{NoSchool}_t,$$
where $\text{NoSchool}_t$ is 1 on days without school and 0 otherwise, $\alpha$ is the rate of new infections on school days, and $\alpha + \omega$ is the rate on days without school.
We model ω with an informative prior for a 10% decrease in new infections on school-free days: ω ∼ Normal(log 1.1, 0.05).

F.8 Seeding infections before study start
Cases in the first week of the study could indicate infections before the study commenced. We will therefore seed our model 2 · m days before the study start, where m is the average incubation period of the virus with the longest incubation period (i.e. adenovirus). The number of infections before the study start will be modeled with an exponential prior, $I_{jt} \sim \text{Exponential}(\lambda)$. Note that we deviated from the statistical analysis plan by changing the parameter of the exponential distribution from $\lambda = 2 \cdot m$ to $\lambda = 1$.

F.9 Priors for modeling parameters
The continuous adjustment variables are standardized to have zero mean and a standard deviation of 0.5.

G Modeling changes in cough frequency
We analyze the daily number of coughs Y with a Negative Binomial regression model where T is the daily duration that students were in the classroom and ϕ is the parameter modeling over-dispersion in count data.
In addition, we analyze the association between the daily number of coughs Y and the number of positive saliva test results for respiratory viruses with a Negative Binomial hierarchical regression model, where $\theta_{[v]}$ is the partially pooled estimate for respiratory virus v ∈ (IFB, HRV, AdV, CoV, MPV, PIV), τ estimates variation between viruses, and $s_v$ is the average standard deviation across the count variables of the respiratory viruses.

H Modeling positivity rate of human saliva samples
Molecular analysis determined which saliva samples were positive for a respiratory virus. Let p = 0, 1, ..., P denote the virus, where 0 refers to negative samples and P is the number of different viruses detected over the study. The number of positive bioaerosol and saliva samples $y_p$ in samples of size n is analyzed with a Multinomial logistic regression model, where $\text{softmax}(\mu)_p = \exp(\mu_p) / \sum_{q} \exp(\mu_q)$ and $s_x$ is the empirical standard deviation of each input variable. The negative test is set as the reference category. Overall variation in positive samples by virus will be modeled with $\beta_{0p}$ and the effect of air cleaners will be modeled with $\beta_1$. The effect of air cleaners is adjusted for class-specific effects and the decreasing number of susceptibles over time (for each virus computed as the number of students minus the total number of positive samples).
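The softmax transformation with the negative test as reference category can be sketched as follows. The sketch fixes the linear predictor of the reference category at zero, which is the usual way to set a reference category in multinomial logistic regression and is an assumption about the implementation here; the function name `softmax_with_reference` and the predictor values are hypothetical.

```python
import math

def softmax_with_reference(linear_predictors):
    """Category probabilities with category 0 (negative test) as reference.

    The reference category has its linear predictor fixed at 0; the remaining
    entries are the linear predictors of the P virus categories.
    """
    mu = [0.0] + list(linear_predictors)
    shift = max(mu)  # subtract the maximum for numerical stability
    exps = [math.exp(value - shift) for value in mu]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative linear predictors for two virus categories.
probs = softmax_with_reference([math.log(0.5), math.log(0.25)])
```

With these values, the odds of each virus category relative to a negative test are 0.5 and 0.25, so the three probabilities are 4/7, 2/7, and 1/7.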
Note that we deviated from the statistical analysis plan by estimating $\beta_{0p}$ and $\beta_1$ without a hierarchical prior. First, there was considerable variation in the overall positivity rate of each virus, so that we decided to use unpooled intercepts $\beta_{0p}$ instead of partially pooled intercepts $\beta_{0[p]}$ for each virus. Second, there was insufficient variation in the data to inform partially pooled estimates for the effects of air cleaners. Therefore, we decided to only estimate the average effect across viruses, i.e. the completely pooled estimate $\beta_1$.

I Detailed results for changes in particle concentrations
There was a strong difference in particle concentrations between study conditions (Figure S3a).
When adjusting for air change rates and multiple other factors, the aerosol number concentration decreased by 76% (95%-CrI 63% to 86%) with air cleaners (Figure S3b and Table S4). The decrease in the concentration of larger particles (PM10) was greater than the decrease in the concentration of smaller particles (PM1 to PM4).
In the main paper, we compared particle concentrations between study conditions. Figure S4 shows the daily average values of each environmental variable by study condition. 13 There are no considerable differences in any of the variables between study conditions. Furthermore, numerical estimation results for the reduction in particle concentrations with air cleaners are shown in Table S4.

The estimate for $\beta_2$ is positive, suggesting a higher risk of infection in class B, which is in line with the higher number of cases in this class. The estimate for $\beta_3$ is positive, suggesting a higher risk of infection as more students were in class. The estimate for $\beta_4$ is negative, suggesting a lower risk of infection with higher air change rates, in line with the intuition that the risk of infection is lower when the classroom is better ventilated. The estimates for $\beta_5$ and $\beta_6$ are both positive, although only the former is distinguishable from zero, suggesting that higher transmission in the community was also associated with a higher risk of infection in the school.
The overdispersion parameter (ϕ) cannot be precisely estimated, but the credible interval suggests rather small overdispersion (ϕ → ∞).The school-free effect ω, the weekday weights ϑ, and the

Figure S1. Measured concentration curves with and without air cleaner. The red dot shows the average concentration measured by the 11 distributed sensors of particles with 0.3 µm < D < 1 µm. The standard deviation is shown in gray as an error bar. The increase in concentration at the beginning of these experiments is due to the release of the NaCl particles during approximately 15–20 min. The inhomogeneity of the concentration during the nebulization period is seen in larger error bars. Afterwards, the aerosol source is turned off, the concentration homogenizes quickly, and the concentration drop follows an exponential decay, which is used for the CADR calculation.
In the model for the number of new infections (Section F.4), $F_{jt} = \sum_{s=t-7}^{t-1} I_{js}$ is the number of infections in the previous seven days (first model offset; a proxy for the number of contagious students), $N_{jt} = \sum_{s<t} I_{js}$ is the cumulative number of infections (second model offset; the inverse is a proxy for the number of susceptible students), $\beta_0$ is the rate of new infections without air cleaners (model intercept), and $\beta_1$ is the effect of air cleaners. The effect of air cleaners is adjusted for class-specific effects, the number of students in school, the air change rate, the proportion of positive tests for SARS-CoV-2 in the community, and the number of consultations for influenza-like illnesses in the community. Note that we deviated from the statistical analysis plan by using $\log F_{jt} - \log N_{jt}$ as model offset instead of just $\log N_{jt}$. This change improved the fit of our model. The reason is that the ratio $\log(F_{jt}/N_{jt})$ incorporates both ways by which new infections can naturally decrease over time, i.e. a decrease in contagious students ($F_{jt}$) or a decrease in susceptible students ($1/N_{jt}$).
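The two model offsets can be computed from a series of infections as sketched below. The sketch reads "the previous seven days" as days t−7 to t−1 and assumes strictly positive latent infection numbers so that the logarithms are defined; both are assumptions of this illustration, and the function name `model_offsets` and the input series are hypothetical.

```python
import math

def model_offsets(infections, window=7):
    """Return (t, F_t, N_t, log F_t - log N_t) for each day t >= 1.

    F_t is the number of infections in the previous `window` days,
    N_t is the cumulative number of infections before day t.
    """
    rows = []
    for t in range(1, len(infections)):
        recent = sum(infections[max(0, t - window):t])
        cumulative = sum(infections[:t])
        rows.append((t, recent, cumulative, math.log(recent) - math.log(cumulative)))
    return rows

# Illustrative series: one infection per day for ten days.
offsets = model_offsets([1.0] * 10)
```

During the first seven days both sums coincide and the offset is zero; afterwards the cumulative count keeps growing while the seven-day count does not, so the offset becomes negative and lowers the expected number of new infections.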

The parameter of the exponential prior for the number of infections before the study start (Section F.8) was changed from $\lambda = 2 \cdot m$ to $\lambda = 1$. After inspecting the model fit, we realized that the model could not adequately fit the number of new cases at the start of the study, because the expected number of new infections in the seeding period, $\mu = 1/\lambda = 1/(2 \cdot m)$, was too low. After increasing the expected number of new infections in the seeding period to $\mu = 1$, the model could adequately capture the new respiratory cases observed at the study start.

Figure S3. Analysis of particle concentrations and comparison between study conditions. (a) Boxplot of the daily average values for aerosol number concentration (CN in 1/cm³) and particle mass concentration (PM for particles of sizes <1 to <10 µm, respectively, in µg m⁻³). (b) Estimated reduction in aerosol number and particle mass concentrations with air cleaners (posterior mean as dot and 50%-, 80%-, and 95%-CrI as lines, respectively).

Figure S4. Boxplot for the daily average values of each environmental variable by study condition.
Figure S5 compares the model-estimated expected number of cases with the observed number of cases on a weekly basis and across classes. Overall, the estimates are in relatively good agreement. The 95%-CrIs of the model estimates always include the observed cases.

Figure S5. Model-estimated number of new respiratory cases (posterior mean as red line and 50%-, 80%-, and 95%-CrIs as shaded areas) and the observed number of new respiratory cases (black line) across classes by study week.
Table S1 in SI Appendix provides an overview of the types of data that were collected in each classroom.

Table S1. Overview of collected data. Type of data collected, method/device, and frequency in the rooms of classes A and B.

Table S2. Line list of respiratory cases over the study period.

Table S4. Estimated reduction in aerosol number (CN) and particle mass (PM) concentrations with interventions (posterior mean and upper and lower estimate from the 95%-CrI).
Table S5 presents the posterior mean, credible intervals and model diagnostics for all model parameters.

Table S5. Estimation results from the infection risk model for the number of new respiratory cases. ESS is the effective sample size, i.e. the number of independent MCMC samples with estimation power equivalent to the total number of autocorrelated samples, 14 and $\hat{R}$ is the Gelman-Rubin convergence diagnostic. 15 Low ESS or $\hat{R} > 1.10$ indicates bad convergence of the model. 16
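To illustrate how the Gelman-Rubin diagnostic flags poor convergence, the sketch below computes the classic (non-split) form of the statistic from several MCMC chains of equal length. This is an illustration only: the software used for the study may compute a refined variant, and the function name `gelman_rubin` and the chains are hypothetical.

```python
import math
from statistics import mean, variance

def gelman_rubin(chains):
    """Classic Gelman-Rubin statistic for chains of equal length n.

    W is the mean within-chain variance, B/n the variance of the chain means;
    the statistic is sqrt(((n - 1) / n * W + B / n) / W).
    """
    n = len(chains[0])
    chain_means = [mean(chain) for chain in chains]
    within = mean([variance(chain) for chain in chains])
    between_over_n = variance(chain_means)
    pooled = (n - 1) / n * within + between_over_n
    return math.sqrt(pooled / within)

# Two chains exploring the same region versus two chains stuck in different regions.
r_mixed = gelman_rubin([[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]])
r_stuck = gelman_rubin([[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0]])
```

Chains that disagree about the location of the posterior inflate the between-chain variance relative to the within-chain variance, which pushes the statistic well above the 1.10 threshold.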