Abstract
For the long-term control of an infectious disease such as COVID-19, it is crucial to identify the individuals most likely to become infected and the role that differences in demographic characteristics play in the observed patterns of infection. As high-volume surveillance winds down, testing data from earlier periods are invaluable for studying risk factors for infection in detail. Changes observed over time during these periods may then inform how stable the pattern will be in the long term.
To this end, we analyse the distribution of cases of COVID-19 across Scotland in 2021, where the location (census areas of roughly 500–1,000 residents) and reporting date of cases are known. We consider over 450,000 individually recorded cases, in two infection waves triggered by different lineages: B.1.1.529 (“Omicron”) and B.1.617.2 (“Delta”). We use random forests, informed by measures of geography, demography, testing and vaccination. We show that the distributions are only adequately explained when considering multiple explanatory variables, implying that case heterogeneity arose from a combination of individual behaviour, immunity, and testing frequency.
Despite differences in virus lineage, time of year, and interventions in place, we find the risk factors remained broadly consistent between the two waves. Many of the observed smaller differences could be reasonably explained by changes in control measures.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This work has been funded by the ESRC grant "Real-time monitoring and predictive modelling of the impact of human behaviour and vaccine characteristics on COVID-19 vaccination in Scotland" (ES/W001489/1). This work has also been funded by the BBSRC Institute Strategic Programme grant to the Roslin Institute (BB/J004235/1).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
- Significant revision to the Introduction section to present more clearly the context of COVID-19 in Scotland in the period our work focuses on, our aims with the work and how it builds on the existing literature.
- Addition of several more references to the literature on risk factors for COVID-19 cases and severe outcomes.
- Significant revision to the Results section to remove text more suitable for the Discussion or Materials and methods sections.
- Removal of data relating to hospitalisation. In hindsight it was decided that this was confusing the main narrative of the manuscript (understanding the distribution of cases) without substantially adding to the analysis or the existing literature.
- Significant revision to the Discussion section to clarify the main findings of the paper, particularly in the context of accumulated local effects from the model, as well as model weaknesses.
- Additional work with respect to lateral flow testing frequency across demographics, and implications on the overall distribution of cases.
- Addition of R. Wightman as a co-author, who performed the additional analysis on lateral flow testing frequency.
- Significant expansion and restructuring of the Materials and Methods section to describe how the data were prepared, and methods used in different sections of the paper. New section describing the Moran's I statistic.
- Revision of data availability statement to detail our access to the data, and how other researchers may gain access to it.