Abstract
Despite much research on the topic, little work has been done comparing the use of methods to control for confounding in the estimation of COVID-19 vaccine effectiveness in routinely collected medical record data. We conducted a trial emulation study to replicate the ChAdOx1 (Oxford/AstraZeneca) and BNT162b2 (BioNTech/Pfizer) COVID-19 phase 3 efficacy studies. We conducted a cohort study including individuals aged 75+ from UK CPRD AURUM (N = 916,128) in early 2021. Three different methods were assessed: Overlap weighting, inverse probability treatment weighting, and propensity score matching. All three methods successfully replicated the findings from both phase 3 trials, and overlap weighting performed best in terms of confounding, systematic error, and precision. Despite lack of trial data beyond 3 weeks, we found that even 1 dose of BNT162b2 was effective against SARS-CoV-2 infection for up to 12 weeks before a second dose was administered. These results support the UK Joint Committee on Vaccination and Immunisation modelling and related UK vaccination strategies implemented in early 2021.
Key messages
Real world evidence generated using weighting (overlapping weights and inverse probability of treatment weights) and propensity score matching: all methods successfully replicate the findings of Phase 3 trials for COVID-19 vaccine effectiveness.
Overlap weighting provides the least biased estimates in our study and should be considered amongst the most suitable methods for future COVID-19 vaccine effectiveness research.
Despite a lack of trial data, our findings suggest that first-dose BNT162b2 provides effective protection against SARS-COV-2 infection for up to 12 weeks, in line with UK’s Joint Committee on Vaccination and Immunisation modelling and subsequent vaccination strategies.
Introduction
Following the start of the COVID-19 vaccination programs, routinely collected data are being widely used to evaluate the effectiveness and safety of COVID-19 vaccines(1-4) and boosters(5, 6). However, careful consideration of how-to best account for confounding is required when comparing vaccinated and unvaccinated people: Aside from people’s individual characteristics, particularly age and co-morbidities associated with higher risk of severe COVID-19, population-level confounder such as the location, i.e. level of community transmission, and vaccination period are among the most important confounders. However, the latter were not always adequately considered in many of the previously conducted vaccine effectiveness studies. Differences in methods for adjustment for confounders as well as choice of study design, inclusion criteria and calendar time can have substantial impact on the findings and their interpretation as highlighted in recent observational studies(7, 8) assessing the comparative effectiveness of ChAdOx1 (Oxford/AstraZeneca) and BNT162b2 (BioNTech/Pfizer) using routinely-collected data from UK: a 28% (95% CI, 12-42) decreased risk for infection after first dose vaccination with BNT162b2(8) vs. no differences in infection incidence between the two COVID-19 vaccines (7). Another challenge when studying vaccine effectiveness is the handling of the immediate time after the first vaccine dose. Randomised trials (9, 10) showed no protective vaccine effect in the first 2 weeks, with the vaccine-induced immunity still building up. However, while some observational studies could replicate these findings (1), others already observed protective effects in the early days (11, 12), indicating the presence of residual confounding. However, some studies and even trials omitted this vulnerable time from their main analyses (10) while others did not (9), which makes it difficult to compare results across studies.
While several methods were used in previous studies, a rigorous assessment of their ability to resolve confounding has not yet been completed(13). Our study provides an empirical evaluation of the comparative performance of different weighting and matching methods to minimise confounding in the study of COVID-19 vaccine effectiveness: overlap weighting(14), inverse probability of treatment weights(15), and propensity score with exact geographic and index date matching. To do this, we measured confounding based on imbalances between vaccinated vs unvaccinated subjects in terms of all recorded variables in the patient’s records. Additionally, we used negative control outcomes to identify systematic error due to unobserved confounders. Lastly, we conducted target trial emulations to compare our findings to those from the BNT162b2(9) and ChAdOx1(10) phase 3 randomised controlled trials. As no trials assessed the effect of BNT162b2 between 3 and 12 weeks following the first dose, we estimated vaccine effectiveness based on primary care data and compared our results to the models the Joint Committee on Vaccination and Immunisation (JCVI) used to inform the vaccination campaign in the UK in early 2021(16).
Methods
Study type, setting, and data source
A cohort study was conducted using UK NHS primary care data from the Clinical Practice Research Datalink (CPRD) AURUM, mapped to the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM)(17).
Study population
All people aged ≥ 75 years, who were not previously infected with or vaccinated against SARS-CoV-2 and were registered in CPRD AURUM for ≥180 days before study start (04/01/2021) were eligible for inclusion. Subsequently, individuals were assigned to the vaccinated (VC) or unvaccinated (UV) cohort, based on whether they were vaccinated against COVID-19 between 04/01/2021 and 28/01/2021 (both dates included). Two different vaccinations were recorded at that time in England(18): BNT162b2 (BioNTech/Pfizer) and ChAdOx1 (Oxford/AstraZeneca). We therefore constructed three different vaccinated cohorts: any type of vaccination cohort (VC), ChAdOx1 vaccinated cohort (AZ), and BNT162b2 vaccinated cohort (PF). Index date for individuals in the vaccinated cohort was their vaccination date. All individuals who had a record of both vaccines at the index date were excluded. Assignment of the index date for unvaccinated people depended on the method used to account for confounding, i.e. matching or weighting. For matching, the index date of the matched counterpart was used; whereas for PS weighting, index dates for unvaccinated people were randomly assigned following the distribution of index dates in the vaccinated cohort, as depicted in Figure 1. In supplementary material Figure S1 distribution of index dates for vaccine type effectiveness are shown. After assignment of the respective index dates, individuals with a recording of COVID-19 infection before or on index date and individuals vaccinated before index date were excluded.
Exposure/s of interest
Each of the three identified vaccinated cohorts (VC, PF, AZ) were compared with an unvaccinated cohort (UV) to study vaccine effectiveness after weighting or matching.
Follow-up
Individuals were censored when: they received another vaccination dose (vaccinated cohorts) or received the first vaccination dose (unvaccinated); left the CPRD AURUM database; or they died.
Primary Outcomes
Two clinical outcomes were ascertained, in line with primary outcomes in the target phase 3 trials for the vaccines: (i) COVID-19 PCR, defined by a positive COVID-19 PCR test; and (ii) COVID-19 PCR/diagnosis, defined by either a COVID-19 diagnosis or a positive COVID-19 test. Definitions of the clinical outcomes can be found in supplementary material Tables S1-3.
Control Outcomes
PCR testing
We used PCR testing (e.g. performed COVID-19 PCR test regardless of test result) as a control outcome and proxy for diagnostic effort. We would expect this to be non-differential with respect to vaccination.
Negative control outcomes (NCO)
Additionally, a total of 43 NCO were pre-specified based on previous methodological research on vaccine safety(19) and refined after review by two clinical epidemiologists (DPA and AMJ). NCO are outcomes not causally associated with the exposure of interest, here COVID-19 vaccine/s exposure. Detail on the 43 utilised NCO is provided in the supplementary material Table S4.
Statistical analyses
Propensity Score
Large-scale Propensity Scores (PS) were used to minimise confounding. PS represent the probability of exposure based on a participant’s baseline characteristics. Covariates to be included in the PS equation were extracted, including condition occurrences for three different time windows (1 to 30 days, 31 to 180 days, and 181 days to any time prior index date), and drug exposures for 2 time periods (1 to 30 days, 31 to 180 days prior index date). Subsequently, all covariates with a frequency >0.5% were included in a lasso regression, which was used to identify relevant confounders to be included in the large-scale PS. In addition, the following variables were forced into the PS model as they were known to be associated with vaccination in the UK: location (region identifier; 9 different regions; or General Practice (GP) surgery identifier; 1357 different GP); age (5-year bands and as a continuous variable using a polynomial to account for non-linearity); prior observation years; number of GP visits, and number of previous COVID-19 PCR tests during each of the three periods (1 to 30-day, 31 to 180-day and 181 days to any time before index date).
PS Weighting and Matching
Regional vaccination, testing, and COVID-19 incidence rates(20) on index date were forced into the PS. Index date was also forced into the PS to ensure that the balance is maintained. PS were computed using a logistic regression model, with 3 different representations of location: without location (PSbase), location defined as region (PSreg) or de-identified GP surgery (PSGP). We used two different weighting methods: Inverse Probability of Treatment Weighting (IPTW), with trimming over 0.95 and above 0.05; and Overlap Weighting (OW)(21).
As for Matching, we used 1:1 PS nearest neighbours matching with exact matching on age band, gender, and location (region or GP) with caliper width 0.2 SD on the PS as calculated on their index date (PS1). Matched unvaccinated individuals were assigned the same index date as their vaccinated pair. PS were computed again on the new index date after matching (PS2) to ensure that the matching was still balanced after the index date assignment.
The following metrics were used to assess the performance of the different methods:
Covariate imbalance as a proxy of measured confounding was assessed by calculating standardized mean differences (SMD) between compared cohorts after weighting or matching.
Minimum detectable relative risk (MDRR) was computed for a 95% confidence interval, power of 0.8, a 10-day cumulative incidence of 0.0067, and 10 days of following time.
Association between vaccination status and Negative Control Outcomes (NCO)(22) was estimated using Cox proportional hazard regression to detect unmeasured confounding.
Association between vaccination status and COVID-19 PCR/diagnosis in the first 10 days following vaccination was estimated using Cox proportional hazard regression to detect unmeasured confounding through comparison with results from vaccination trials(9).
Empirical calibration
Hazard ratios calculated for (1) PCR testing, (2) COVID-19 PCR and (3) COVID-19 PCR/diagnosis were calibrated with the result of the NCO analysis as described in Schuemie et al.(23).
Target trial emulations
We aimed to reproduce the effectiveness observed at 3 weeks after vaccination in the BNT162b2 and in the ChAdOx1 phase 3 trials(9, 10) and the estimated effectiveness at 12 weeks for BNT162b2. For the AstraZeneca trial emulation, individuals who tested positive or were censored before the 24th day were eliminated from the analysis in line with the trial’s statistical analysis plan and protocol(10). We deemed a target trial as successfully replicated when the confidence intervals from the trial and observational data overlapped. It is worth noting that the 12-week trial estimate of vaccine effectiveness for BNT162b2 was based on an extrapolation by the UK JCVI committee, and did not report confidence intervals(16). Estimates from observational data were obtained using the method yielding lowest confounding (lower SMD values and least number of statistically significant NCO) and better statistical power (narrower MDRR).
Patient and Public Involvement
A patient representative was involved in the design of the project.
Results
Study population and characteristics
Following application of inclusion criteria, 583,813 vaccinated individuals (348,275 PF and 235,538 AZ); and 332,315 unvaccinated participants were identified (Figure 2). OW weighting methods retained the largest sample size, with a total of 905,418 participants included for analysis. IPTW led to the inclusion of the second largest number of participants, with N=903,147 for IPTW-PSbase, N=902,958 for IPTW-PSreg, and N=856,636 for IPTW-PSGP. Due to the exclusion of unmatched people, matching methods led to a substantially reduced sample size: N=459,000 for matchingreg; N=369,310 for matchingGP. Population flowcharts stratified by vaccine type are available in supplementary material Figures S2-3.
Baseline characteristics before weighting/matching are shown in Table 1. Relevant differences (SMD>0.1) between vaccinated and unvaccinated cohorts included age, location (region and GP practice) and number of GP visits in the previous 180 days. Additionally, differences were noted for comorbidities heart disease, hypertensive disorder, malignant neoplastic disease, and renal impairment. Supplementary material Tables S5-7 provide detail on a total of 23 imbalanced confounders with SMD > 0.1 in the comparison of vaccinated vs unvaccinated, 25 in AZ vs unvaccinated, and 27 in PF vs unvaccinated cohorts.
Comparison of performance of PS methods
Minimum detectable relative risks (MDRR) were computed for each of the comparison and methods as a proxy for statistical power and are reported in Table 2. As expected, the study was well powered due to large sample sizes, and MDRR was closer to one (better power) for weighting (e.g. MDRR 0.93 to 1.08 for IPTW in the ‘any vaccine’ analysis), and slightly further away for the one (worse power) for matching (e.g. MDRR 0.86 to 1.17 for matchinggp in the ‘any vaccine’ analysis). This relates to the exclusion of unmatched participants in the latter (see Figure 2). Standardized mean differences (SMD) and the number (%) of NCO associated with exposure are also reported in Table 2, as a proxy for measured/observed and unmeasured confounding, respectively. Both were highest in the unweighted analysis, and lowest in matching and OW compared to IPTW. Accounting for GP practice minimised confounding further: e.g. maximum SMD went from 0.79 in IPTW PSbase to 0.15 in IPTW PSgp. Overall, OW PSgp and matchinggp were the best methods in terms of observed and unobserved confounding minimisation, measured by SMD and NCO respectively.
Figure 3 depicts SMD values for the weighted cohorts. Despite PS weighting, the GP practice covariate was only balanced using OW with PSGP. IPTW did not yield sufficient balance for GP practice in any of the estimated PS, as noted by SMD>0.1. Similar findings were seen for region, which was not balanced in any of the weighting methods unless location was included in the model. Matching at regional level did not yield sufficient balance for GP. In fact, only matchinggp and OW PSgp were able to minimize GP unbalance. OW performed better than IPTW and matching in terms of overall covariate balance, with lower minimum SMD for all covariates (Table 2). Supplementary material Figures S4 and S5 show the SMD balancing for the Pfizer and AstraZeneca vs unvaccinated comparisons.
Figure 4 depicts the results of NCO analyses. In general, OW showed lower systematic error than IPTW and matching in most scenarios. Unweighted analyses show, as expected, clear evidence of one-sided systematic error, with many negative control outcomes being positively associated (RR>1) with vaccine exposure (Table 2). Supplementary material Figures S6 and S7 show the NCO results for the Pfizer and AstraZeneca vs unvaccinated comparisons.
In addition to the pre-specified NCO, Figure 5 shows the observed HR for the control outcome of PCR testing) and for the clinical outcomes (PCR+ and PCR+ or diagnosis) at day 10 after the first dose of vaccine. Although unweighted analyses led to a biased estimate (HR>1) for PCR testing, all weighting and adjustment methods resolved this and led to a calibrated HR including the expected null effect (HR=1). Unexpectedly, all the tested methods showed a protective effect against both clinical outcomes in the first 10 days after first-dose vaccination. The numerical values of Figure 5 can be observed in supplementary material Table S8. In supplementary material Figure S8 and Table S9 we can observe the same analysis than in Figure 5 without the 10 days censoring.
The Kaplan-Meier plots using Overlapping Weights and PSGP for the three studies comparisons can be observed in supplementary material Figure S9 (COVID-19 PCR + diagnosis) and Figure S10 (COVID-19 PCR).
Target trial emulations
Figure 6 shows the results of the target trial emulations based on OWgp analyses, as these yielded lowest bias and best statistical power. For ChAdOx1, we successfully replicated the results of vaccine effectiveness against COVID-19 PCR/diagnosis, with the calibrated estimate being closer to the trial result. As for BNT162b2, the observational estimates replicated both the 3- and the 12-week trial results only when calibrated results were used. Uncalibrated HRs overestimated effectiveness at week 3, and seemed to underestimate it at week 12 after first-dose vaccination.
Discussion
In this large cohort study using data from UK primary care records, we assessed the effectiveness of the first dose of the widely used COVID-19 vaccines BNT162b2 and ChAdOx1 using overlap weighting. Our estimates of vaccine effectiveness are in line with trial estimates for the ChAdOx1 vaccine (57%) and BNT162b2 at 3 weeks (77%), as well as the JCVI estimates for BNT162b2 at 12 weeks (75%).
We chose overlap weighting for the target trial emulation, following an extensive investigation of the performance of different methods to account for confounding when studying COVID-19 vaccine effectiveness using observational data. Notably, merely including individual-level patient characteristics in the construction of PS did not guarantee the balance of population-level geographic variables, such as geographical region and GP, regardless of subsequent PS approaches of matching or weighting. Also, even if these geographic variables were incorporated into the propensity score, balance between groups was not sufficiently achieved using any of the studied methods, except for overlap weighting and matching. Additionally, negative control outcomes demonstrated a better resolution of uncontrolled confounding and systematic error with overlap weighting compared to other matching and weighting methods. Although no remarkable differences were seen in the final estimates of vaccine effectiveness based on this single database, the proposed theoretical advantages of overlap weighting(14, 21), including exact balance of every measured patient characteristics and good retainment of sample size, were supported by this empirical study. For the first time, we showed that overlap weighting overperforms other weighting and matching methods in terms of unmeasured confounding.
Importantly for COVID-19 vaccine effectiveness, overlap weighting was also the best performing method to minimise confounding at the population-level, including geographic location, which has important implications due to known differences in the spread of SARS-CoV-2 during the study period. Previous early vaccine effectiveness studies using observational data overlooked the adjustment of geographic information(24, 25). Our study, consistent with previous methodological literature, showed that geographic distributions between the vaccinated and unvaccinated individuals were markedly different(26), emphasizing the need for adjustment in vaccine effectiveness studies, given known differences in community transmission levels during the study period. In addition, the insufficient covariates’ balance after applying conventional methods highlighted the importance of the proposed diagnostics in observational studies of vaccine effectiveness. More importantly, our findings didn’t support the previous suspicion that less active health-seeking behaviour in unvaccinated people might confound the vaccine effectiveness observed in real-world data(12), as no association was found between vaccination and PCR testing in our study after the empirical calibration. The observed protective vaccine effects shortly after vaccination were in line with previous findings from observational data(11, 12), which might be attributable to differences in personal behaviour during the period immediately before and after vaccination, e.g. mask wearing, avoiding crowed indoor places/shielding, rescheduling vaccination if feeling unwell, which unfortunately was unlikely to be captured by electronic health records.
Conclusions
Our study found vaccination with a first vaccine dose against COVID-19 associated with a 69% reduced risk of COVID-19 for both the ChAdOx1 and BNT162b2 vaccines. These data therefore confirm the estimation by JCVI that one dose of BNT162b2 provide protection beyond the 3-week period studied in pivotal trials. Secondly, we demonstrate that the studied propensity score weighting and matching methods can replicate pivotal trials and therefore provide reliable estimates of vaccine effectiveness. Further, PS-based overlap weights performed better than IPTW or matching in controlling for measured and unmeasured covariates while retaining sample size, and could be proposed as the preferred method for future vaccine effectiveness studies. Finally, our findings illustrate the need to incorporate patient location (e.g. GP practice or region of residence) and related variables (e.g. testing and transmission rates) to minimize community-as well as patient-level confounding in the study of COVID-19 vaccine effectiveness.
Data Availability
Data were obtained from CPRD under the CPRD multi-study license held by the University of Oxford after Research Data Governance (RDG) approval. Direct data sharing is not allowed.
Contributors and sources
MC led data analyses and contributed to analytical choices and study design. DPA and AMJ led the conceptualization of the study with contributions from EB and TRM. EB and TRM led the statistical analysis plan jointly with MC, DPA and AMJ. AD mapped and curated the data. MC, JX, DPA and AMJ wrote the first draft of the manuscript. All authors read, made contributions, and approved the last version of the manuscript. DPA and AMJ obtained the funding for this research.
Ethics approval
The study protocol was approved through the CPRD’s Research Data Governance Process (Protocol No 21_000557).
Funding
The project was supported by the National Institute for Health and Care Research (NIHR) Grant number COV-LT2-0006.
Conflict of Interest
DPA receives funding from the UK National Institute for Health and Care Research (NIHR) in the form of a senior research fellowship and from the Oxford NIHR Biomedical Research Centre. His research group has received funding from the European Medicines Agency and Innovative Medicines Initiative. His research group has received research grant/s from Amgen, Chiesi-Taylor, GSK, Novartis, and UCB Biopharma. His department has also received advisory or consultancy fees from Amgen, Astellas, Astra Zeneca, Johnson and Johnson, and UCB Biopharma; and speaker fees from Amgen and UCB Biopharma. Janssen and Synapse Management Partners have supported training programmes organised by DPA’s department and open for external participants organized by his department outside the submitted work.
Data sharing⍰
Data were obtained from CPRD under the CPRD multi-study license held by the University of Oxford after Research Data Governance (RDG) approval. Direct data sharing is not allowed.
Supplementary material
1. Clinical definition and concept list
2. Vaccine specific analyses
2.1. Index dates
2.3. Standardized mean differences
3. Hazard ratio values
Acknowledgements
This study is based in part on data from the Clinical Practice Research Datalink obtained under license from the UK Medicines and Healthcare products Regulatory Agency. The data are provided by patients and collected by the NHS as part of their care and support. The interpretation and conclusions contained in this study are those of the authors alone.⍰