Abstract
Multiple clinical studies are ongoing to assess whether existing vaccines may afford protection against SARS-CoV-2 infection through trained immunity. In this exploratory study, we analyze immunization records from 137,037 individuals who received SARS-CoV-2 PCR tests. We find that polio, Hemophilus influenzae type-B (HIB), measles-mumps-rubella (MMR), varicella, pneumococcal conjugate (PCV13), geriatric flu, and hepatitis A / hepatitis B (HepA-HepB) vaccines administered in the past 1, 2, and 5 years are associated with decreased SARS-CoV-2 infection rates, even after adjusting for geographic SARS-CoV-2 incidence and testing rates, demographics, comorbidities, and number of other vaccinations. Furthermore, age, race/ethnicity, and blood group stratified analyses reveal significantly lower SARS-CoV-2 rate among black individuals who have taken the PCV13 vaccine, with relative risk of 0.45 at the 5 year time horizon (n: 653, 95% CI: (0.32, 0.64), p-value: 6.9e-05). These findings suggest that additional pre-clinical and clinical studies are warranted to assess the protective effects of existing non-COVID-19 vaccines and explore underlying immunologic mechanisms. We note that the findings in this study are preliminary and are subject to change as more data becomes available and as further analysis is conducted.
Introduction
Since the genome for SARS-CoV-2 was released on January 11, 2020, scientists around the world have been racing to develop a vaccine1. However, vaccine development is a long and expensive process, which takes on average over 10 years under ordinary circumstances2. Even for the previous epidemics of the past decade, including SARS, Zika, and Ebola, vaccines were not available before the virus spread was largely contained3.
Conventionally vaccinations are intended to train the adaptive immune system by generating an antigen-specific immune response. However, studies are also demonstrating that certain vaccines lead to protection against other infections through trained immunity4. For instance, vaccination against smallpox showed protection against measles and whooping cough5. Live vaccinia virus was successfully used against smallpox. Due to the urgent need to reduce the spread of COVID-19, scientists are turning to alternate methods to reduce the spread, such as repurposing existing vaccines. There are some hypotheses that the Bacillus Calmette–Guérin (BCG) and live poliovirus vaccines may provide some protective effect against SARS-CoV-2 infection6,7. There are several ongoing/recruiting clinical trials testing the protective effects of existing vaccines against SARS-CoV-2 infection, including: Polio8, Measles-Mumps-Rubella vaccine9, Influenza vaccine10, and BCG vaccine11,12,13,14.
In this work, we conduct a systematic analysis to determine whether or not a set of existing non-COVID-19 vaccines in the United States are associated with decreased rates of SARS-CoV-2 infection. In Figure 1, we provide an overview of the study design and statistical analyses. We consider data from 137,037 individuals from the Mayo Clinic electronic health record (EHR) database who received PCR tests for SARS-CoV-2 between February 15, 2020 and July 14, 2020 and have at least one ICD diagnostic code recorded in the past five years (see Methods). In Table 1, we show the clinical characteristics of the study population. In particular, 92,673 (67%) individuals have at least 1 vaccine in the past 5 years relative to the PCR testing date. In Figure 2, we present the SARS-CoV-2 infection rates for subsets of the study population with particular clinical covariates. We note that the rates of SARS-CoV-2 infection are higher in Black, Asian, and Hispanic racial and ethnic subgroups compared to the overall study population. This is likely due to higher rates of COVID-19 spread and/or decreased access to PCR testing. In addition, the rates of SARS-CoV-2 infection are lower in individuals with pre-existing conditions (e.g. hypertension, diabetes, obesity) possibly due to greater caution in avoiding exposure and/or higher PCR testing rates. Given this study population, we assess the rates of SARS-CoV-2 infection among individuals who did and did not receive one of 18 vaccines in the past 1, 2, and 5 years relative to the date of PCR testing. In Table 2, we present the full names, common formulations, and counts for the 18 vaccines that we consider.
First, we assess the overall association of vaccination status with the risk of SARS-CoV-2 infection (see Methods). We use propensity score matching to construct unvaccinated control groups for each of the vaccinated populations at the 1 year, 2 year, and 5-year time horizons. The unvaccinated control groups are balanced in covariates including demographics, county-level incidence and testing rates for SARS-CoV-2, comorbidities, and number of other vaccines taken in the past 5 years. Then, we compare the SARS-CoV-2 rates between each of the vaccinated cohorts and corresponding matched, unvaccinated control groups which have similar clinical characteristics. Second, we repeat the analysis on a set of age, race, and blood type stratified subgroups of the study population. In particular, for each subgroup, we run propensity score matching and compute the difference in SARS-CoV-2 infection rate between the vaccinated and unvaccinated (matched) cohorts. Finally, we run a series of sensitivity analyses to evaluate whether or not these results may be biased from unobserved confounders or other factors.
Results
Polio, HIB, MMR, Varicella, PCV13, Geriatric Flu, and HepA-HepB vaccines consistently show associations with lower rates of SARS-CoV-2 infection across 1, 2, and 5-year time horizons
The results of the propensity score matching for the 1 year, 2 year, and 5-year time horizons are presented in Table 3, Table 4, and Table 5, respectively. We observe that across all time horizons, Polio, Hemophilus Influenzae type B (HIB), Pneumococcal conjugate (PCV13), Geriatric Flu, Hepatitis A / Hepatitis B (HepA-HepB), and Measles-Mumps-Rubella (MMR) vaccinated cohorts show consistent lower rates of SARS-CoV-2 infection. In Tables S1-S7, we show the clinical characteristics for the vaccinated, unvaccinated, and matched cohorts for each of these vaccines at the 1-year time horizon. In Figure 3, we present the vaccination coverage rates for each of these vaccines in the study population for all time horizons.
Overall, we observe that the Polio and HIB vaccinated cohorts generally have the lowest relative risks for SARS-CoV-2 infection across all time horizons. The relative risk of SARS-CoV-2 infection is 0.57 (n: 2,402, 95% CI: (0.42, 0.77), p-value: 0.003) for individuals who have taken the Polio vaccine in the past 1 year, and 0.53 (n: 2,061, (95% CI: (0.37, 0.77), p-value: 3.2e-03) for individuals who have taken the HIB vaccine in the past year. We note that these vaccines are almost exclusively administered to individuals under 18 years of age, as shown in Figure 4. Other vaccines that are commonly administered to younger individuals with strong negative correlations with SARS-CoV-2 infection include MMR and Varicella vaccines.
The other vaccines which are consistently associated with lower SARS-CoV-2 rates include PCV13, Geriatric Flu, and HepA-HepB vaccines. At the 1 year time horizon, the relative risks of SARS-CoV-2 infection are 0.72 for PCV13 (n: 4,693, 95% CI: (0.56, 0.92), p-value: 0.03), 0.74 for Geriatric Flu (n: 12,085, 95% CI: (0.61, 0.89), p-value: 5.6e-03), and 0.80 for HepA-HepB (n: 5,858, 95% CI: (0.67, 0.97), p-value: 0.05). Although the relative risks are less significant compared to Polio and HIB, these associations may be particularly interesting to explore further because these vaccines are commonly administered across a broader age range of the population (see Figure 4).
Pairwise correlation analysis reveals strong associations between administration of HIB, Polio, Rotavirus, Varicella, and MMR vaccines
In order to identify vaccines which may be confounding factors for other vaccines that are linked to reduced rates of SARS-CoV-2 infection, we conduct a pairwise correlation analysis. For example, it is possible that the lower rates of SARS-CoV-2 infection that we observe for one vaccine are in fact caused by another vaccine which is highly correlated with the former. To measure the correlations we use Cohen’s kappa, which is a measure of correlation for categorical variables that ranges from −1 to +1. In particular, Cohen’s kappa = +1 indicates that the pair of vaccines are always administered together, Cohen’s kappa = 0 indicates that the pair of vaccines are independent of each other, and Cohen’s kappa = −1 indicates that the pair of vaccines are never administered together.
In Figure 5, we present a heatmap of the pairwise correlations for each of the 18 vaccines administered in the 5 years prior to the PCR test date. Sorted by Cohen’s kappa value, the top vaccine pairs with kappa ≥ 0.60 are: HIB and Rotavirus (0.83), HIB and Polio (0.80), MMR and Varicella (0.74), Polio and Varicella (0.72), Polio and Rotavirus (0.71), MMR and Polio (0.68). From this, we see that there is a cluster of vaccines which are commonly administered together, which includes: HIB, Polio, Rotavirus, Varicella, and MMR vaccines. The majority of individuals who receive this cluster of vaccines are children <18 years old (see Figure 4). We note that in this cluster, the vaccines HIB, Polio, Varicella, and MMR are all consistently associated with lower SARS-CoV-2 rates. This suggests that some of the lower rates of SARS-CoV-2 observed in these vaccinated cohorts may be confounded by the other vaccines in this group.
Stratification by race reveals that Polio, HIB, and PCV13 vaccines are associated with lower SARS-CoV-2 rates in particular racial subgroups across 1, 2, and 5-year time periods
In Tables 7-9, we present the results of propensity score matching at the 1, 2, and 5-year time horizon, respectively, on study cohorts stratified by race. We observe that PCV13 vaccination is linked with significantly decreased SARS-CoV-2 rates in the Black subpopulation. In particular, the relative risk of SARS-CoV-2 infection for black individuals who have been administered PCV13 is 0.24 at the 1 year time horizon (n: 197, 95% CI: (0.09, 0.71), p-value: 0.03); 0.33 at the 2 year time horizon (n: 239, 95% CI: (0.16, 0.74), p-value: 0.03); and 0.45 at the 5 year time horizon (n: 653, 95% CI (0.32, 0.64), p-value: 6.9e-5). Furthermore, at the 5-year time horizon, the relative risk for the PCV13 vaccinated cohort of black individuals is significantly lower than the relative risk for the PCV13 vaccinated cohort overall (p-value: 0.03).
In addition, we observe that Polio, HIB, and PCV13 vaccines are linked with decreased SARS-CoV-2 rates in the White subpopulation. However, since 119,979 (88%) of individuals in the study population are white, the relative risks for these vaccinated cohorts are close to the relative risks for the overall population (see Tables 3-5). Matching within subgroups was done by age group (0-18, 19-49, 50-64, 65+) and blood group (A, B, AB, O) as well, but no significant within-subgroup associations between any vaccine and SARS-CoV-2 rates were found. This suggests that associations between vaccines and SARS-CoV-2 infection rates may not be strongly specific to particular age ranges/blood groups.
Sensitivity Analysis
Tipping point analysis shows that associations between reduced SARS-CoV-2 rates and Polio vaccine (1, 2 year time horizons), PCV13 (5 year time horizon) are most robust to unobserved confounders
In this retrospective study, we evaluate the correlations between vaccination and SARS-CoV-2 infection, taking into account a number of possible confounding variables, such as demographic variables and geographic COVID-19 incidence rate (see Methods). However, it is possible that the results from this study have been influenced by unobserved confounders. For example, we do not explicitly control for travel history, which was a significant risk factor for SARS-CoV-2 infection early on in the pandemic.
In Figure 6, we present the results from the tipping point analysis on the statistically significant associations between vaccination and reduced rates of SARS-CoV-2 infection in the overall study population. For each time horizon, we show the relative prevalence and effect size that would be required for an unobserved confounder to overturn the conclusion for a given (vaccine, time horizon) pair. For reference, we show the effect size of the covariate (county-level COVID-19 incidence rate ≥ median value) as a potential confounder, which has a large relative risk of 2.78.
At the 1 year and 2 year time horizons, the associations of the Polio vaccine to lower rates of SARS-CoV-2 infection are most robust to the impact of a potential unobserved confounder. In particular, an unobserved confounder with a large effect size of 2.78 would need to have an absolute difference in prevalence between vaccinated and unvaccinated cohorts of 17.8% (30.9%) in order to overturn the results for the 1 year (2 year) time horizon. On the other hand, at the 5 year time horizon, the association of PCV13 and lower rates of SARS-CoV-2 infection is most robust to the influence by unobserved confounders. An unobserved confounder with a large effect size of 2.78 would need to have an absolute difference in prevalence between vaccinated and unvaccinated cohorts of 19.1% in order to render the findings insignificant.
Discussion
Ongoing clinical studies offer preliminary evidence that existing vaccines may reduce risk of SARS-CoV-2 infection. For example, interim results from the ACTIVATE trial12 indicate that the BCG vaccine reduces SARS-CoV-2 infection rates up to 53%. While specific vaccines such as BCG are being tested for cross-protective effects against SARS-CoV-2 infection based upon their prior potential for protection against other diseases14, to our knowledge, a systematic hypothesis-free analysis to identify potential vaccines that can have beneficial effects against SARS-CoV-2 infection is lacking. Our retrospective study has analyzed 18 different vaccines and identified key vaccines that are correlated with lower rates of SARS-CoV-2 infection after controlling for confounding factors (see Results). In particular, we find that individuals who have been recently vaccinated with one of Polio, HIB, MMR, Varicella, PCV13, Geriatric Flu, or HepA-HepB vaccines have lower rates of SARS-CoV-2 infection. These vaccines are promising candidates for follow-up pre-clinical animal studies and clinical trials. We note that this list of vaccines is preliminary and may change as more data becomes available and as further analysis is conducted.
For the rest of the 18 vaccines that we considered, the correlations with SARS-CoV-2 infection were either insignificant or varied across the time horizons of interest. In some cases, these vaccines may serve as negative controls in clinical trials testing the safety and efficacy of novel COVID-19 vaccines. For example, a clinical trial evaluating the COVID-19 vaccine candidate ChAdOx1 uses Meningococcal vaccine as a comparator arm15. Preliminary results from this trial indicate that as expected, Meningococcal vaccine does not induce antibody responses against SARS-CoV-2 spike protein. It may be interesting to evaluate the antibody responses for some of the vaccines that we have found to be significantly correlated with lower rates of SARS-CoV-2 infection, to explore if there is any underlying immunologic mechanism for the associations that we observe.
Because the BCG vaccine is rarely administered in the US, this vaccine did not meet the sample size threshold for inclusion in our analysis. From the limited data available, there were 51 individuals in the study population who had taken BCG vaccine in the past 5 years, and among these 0 individuals tested positive for SARS-CoV-2 infection (95% CI: (0.0%, 7.0%)). Among the 198 individuals who had taken BCG vaccine at least once in their lifetime, there were 6 (3.0%) individuals who tested positive for SARS-CoV-2 infection (95% CI: (1.4%, 6.5%)). As a result, more data from additional medical centers would be required for us to assess the associations between BCG vaccine and SARS-CoV-2 infection.
Due to the observational nature of this study, there are potential biases which may have impacted the findings, including confounding, selection bias, and measurement bias. The motivation for using propensity score matching was to account for confounding. Although we take into account some potential confounders through propensity score matching, there may still be residual confounding from unobserved factors (e.g. socioeconomic status, indications, contraindications, etc.) which may be different for each vaccine. For example, travel history is a risk factor for exposure to SARS-CoV-2 infection that we do not explicitly account for in this study. Our motivation for the tipping point sensitivity analysis is to estimate the effect size and prevalence of an unobserved confounder which would be required to overturn the statistically significant findings (see Figure 6). Even among the variables that we consider, there is potential for bias if the cohorts are poorly matched on those covariates. In Tables S1-S7, we present the propensity score matching results for a number of vaccines at the 1 year time horizon, in order to show the matching quality for each of these statistical comparisons. Furthermore, we present plots showing the distribution of the age covariate in particular in Figure S1. We note that for some vaccines, differences in age between the vaccinated and unvaccinated (matched) cohorts may have influenced the results.
In addition, it is possible that restricting the study population to SARS-CoV-2 PCR tested individuals may have introduced selection bias. For example, vaccinated individuals may engage in more health-seeking behaviors to reduce their potential COVID-19 risk, and also have a higher likelihood of seeking out a PCR test. This type of bias is known as the “healthy user effect”, which is suspected to have influenced the findings of recent COVID-19 observational studies16,17. We performed sensitivity analyses using breast cancer and colon cancer screening as negative controls which suggest that the propensity score matching analysis is in part effective in filtering out healthy user effect for the associations between vaccination status and SARS-CoV-2 risk. Finally, measurement bias is a concern as vaccination records may be incomplete for some individuals in our cohort since they may have received the vaccines outside of the Mayo Clinic system. We plan to perform additional sensitivity analyses to further explore these potential sources of bias.
As an initial exploratory analysis linking historical vaccination records to SARS-CoV-2 PCR testing results, more research is warranted in order to confirm the findings. We plan to update this analysis in coming months as more PCR testing data becomes available. Also, we note that this study is based on data from one academic medical center in the United States, which restricts the analysis to vaccines administered in this geographic region. Notably, we do not have sufficient immunization record data on the BCG vaccine, which has shown promise in early clinical trials. As a result, the findings from this study would be well complemented by similar studies from hospitals across the world.
Methods
Study design
This is an observational study in a cohort of individuals who underwent polymerase chain reaction (PCR) testing for suspected SARS-CoV-2 infection at the Mayo Clinic and hospitals affiliated to the Mayo health system. The full dataset includes 152,548 individuals who received PCR tests between February 15, 2020 and July 14, 2020. We restricted the study population to 137,037 individuals from this dataset who have at least one ICD code recorded in the past 5 years. This exclusion criteria is applied in order to restrict the analysis to individuals with medical history data. Within this PCR tested cohort, we define COVIDpos to be persons with at least one positive PCR test result for SARS-CoV-2 infection, which includes 5,679 individuals. Similarly, we define COVIDneg to be persons with all negative PCR test results, which includes 131,358 individuals.
For the study population of 137,037 individuals, we obtain a number of clinical covariates from the Mayo Clinic electronic health record (EHR) database, including: demographics (age, gender, race, ethnicity, county of residence), ICD diagnostic billing codes from the past 5 years, and immunization records from the past 5 years (68 unique vaccines; we focus on the 18 taken by at least 1,000 individuals over the past 5 years). We use the Elixhauser Comorbidity Index to map the ICD codes from each individual from the past 5 years to a set of 30 medically relevant comorbidities18. In addition to the Mayo Clinic EHR database, we use the Corona Data Scraper online database to obtain incidence rates of COVID-19 at the county-level in the United States18,19. By linking the county of residence data from the EHR with the incidence rates of COVID-19 from Corona Data Scraper, we are able to obtain county-level incidence rates of COVID-19 for 136,313 individuals in the study population. We also obtain county-level testing data for 100,433 individuals in the study population from (i) Minnesota state government records and (ii) public county-level testing data scraped from other state/county websites. In Table 1, we present the average values for each of the clinical covariates in the study population.
Given these clinical covariates, we conduct a series of statistical analyses to assess whether or not each of the 19 vaccines has an association with lower rates of SARS-CoV-2 infection at the 1 year, 2 years, and 5 year time horizons. For each vaccine and time horizon, the vaccinated cohort is defined as the set of individuals in the study population who received the vaccine within the past time horizon. For example, the “2-year polio vaccinated cohort” is the set of individuals who received the polio vaccine within the past two years. Similarly, for each vaccine and time horizon, the unvaccinated cohort is defined as the set of individuals in the study population who did not receive the vaccine within the past time horizon. For example, the “5-year influenza unvaccinated cohort” is the set of individuals who did not receive the influenza vaccine within the past five years.
In the following sections, we describe the statistical methods that we use to compare the rates of COVID-19 between the vaccinated and unvaccinated cohorts for each of the (vaccine, time horizon) pairs. First, we describe the propensity score matching analysis to construct unvaccinated control groups that have similar clinical characteristics to the vaccinated cohorts. Second, we describe the statistical tests that we use to determine which of the (vaccine, time horizon) pairs have the most significant association with lower rates of SARS-CoV-2 infection for the 1 year, 2 year, and 5 year time horizons, both overall and for particular demographic subgroups. Third, we describe the covariate-level stratification analysis to identify vaccines which have the largest association with lower rates of SARS-CoV-2 infection for particular demographic subgroups. Finally, we describe the sensitivity analyses that we use to evaluate the robustness of the statistical methods to potential biases from unobserved confounders or other factors that could impact the overall results from this observational study.
Propensity score matching to construct unvaccinated control groups
Before running the propensity score matching step, first we filtered to vaccinated cohorts with at least 1,000 persons. For the overall statistical analysis, there were 13, 15, and 18 vaccines which met this threshold for the 1 year, 2 year, and 5 year time horizons, respectively.
For each vaccinated cohort with sufficient numbers of individuals, we applied 1:1 propensity score matching to construct a corresponding unvaccinated control group with similar clinical characteristics20. We refer to this as the “unvaccinated (matched)” cohort, which is a subset of the unvaccinated cohort. We considered the following clinical covariates in the propensity score matching step:
Demographics (Age, Gender, Race, Ethnicity)
County-level COVID-19 incidence rate: (Number of positive SARS-CoV-2 PCR tests in county) / (Total population of county) within +/-1 week of PCR testing date.
County-level COVID-19 test positive rate: (Number of positive SARS-CoV-2 PCR tests in county) / (Number of PCR tests in county) within +/-1 week of PCR testing date.
Elixhauser comorbidities: Medical history derived from ICD diagnostic billing codes in the past 5 years relative to the PCR testing date. Includes indicators for the following conditions: (1) congestive heart failure, (2) cardiac arrhythmias, (3) valvular disease, (4) pulmonary circulation disorders, (5) peripheral vascular disorders, (6) hypertension, (7) paralysis, (8) neurodegenerative disorders, (9) chronic pulmonary disease, (10) diabetes, (11) diabetes with complications, (12) hypothyroidism, (13) renal failure, (14) liver disease, (15) peptic ulcer disease (excluding bleeding), (16) AIDS/HIV, (17) lymphoma, (18) metastatic cancer, (19) solid tumor without metastasis, (20) rheumatoid arthritis/collagen vascular diseases, (21) coagulopathy, (22) obesity, (23) weight loss, (24) fluid and electrolyte disorders, (25) blood loss anemia, (26) deficiency anemia, (27) alcohol abuse, (28) drug abuse, (29) psychoses, (30) depression.
Pregnancy: Whether or not the individual had a pregnancy-related ICD code recorded in the past 90 days relative to the PCR testing date.
Number of other vaccines: Count of the total number of unique vaccines (excluding the vaccine which is the treatment variable) taken by the individual in the past 5 years relative to the PCR testing date.
For each of the vaccinated cohorts, we fit a logistic regression model to predict whether or not the individual was vaccinated, using these covariates as predictors. We trained the logistic regression model using the scikit-learn package in Python21. Then, we used the model-predicted probability of an individual receiving the vaccine as the propensity score for the individual. Matching was done without replacement using greedy nearest-neighbor matching within calipers. Some subjects were dropped from the positive cohort in this procedure. The matching was performed with caliper width 0.2 * (pooled standard deviation of scores), as suggested in the literature22.
Statistical assessment of associations between vaccines and SARS-CoV-2 infection rates for the overall study population
After the propensity score matching step, we compare the COVIDpos rates for the vaccinated and unvaccinated (matched) cohorts. First, we compute the relative risk, which is equal to the COVIDpos rate for the vaccinated (matched) cohort divided by the COVIDpos rate for the unvaccinated (matched) cohort. We use a Fisher exact test to compute the p-value for this association. We then apply the Benjamini-Hochberg (BH) adjustment23 on the p-values over all vaccines for each time horizon to control the False Discovery Rate (at 0.05). We also compute and report 95% confidence intervals for the relative risks.
Statistical assessment of associations between vaccines and SARS-CoV-2 infection rates for age, race/ethnicity, and blood type stratified subgroups
We repeat the statistical analysis on subsets of the study population stratified by age, race/ethnicity, and blood type. For age, we consider the subgroups: 0 to 18 years, 19 to 49 years old, 50 to 64 years old, and ≥ 65 years old. For race/ethnicity, we consider the subgroups: White, Black, Asian, and Hispanic. For blood type, we consider the subgroups: O, A, B, and AB. We note that age and race/ethnicity were recorded in the dataset for all subjects, but blood type information was only available for 41,828 subjects.
For each vaccine, at the 1, 2, and 5 year time horizons, we use propensity score matching to construct unvaccinated control groups for each age bracket, race/ethnicity, and blood type subgroup. Matching was done on the same covariates as in the overall analysis (apart from the Race/Ethnicity covariates for the race/ethnicity subgroups). We then compared the COVIDpos rates between the vaccinated and unvaccinated (matched) cohorts, and reported the relative risk, 95% confidence interval, and BH-corrected p-values.
Sensitivity Analyses
We performed two sets of sensitivity analyses, as described below.
Cancer screens as negative controls for propensity score matching procedure
To assess the effectiveness of the propensity score matching procedure, we ran the statistical analysis using cancer screens as the exposure variable instead of vaccinations (i.e. negative control exposure). This set of experiments serves as a negative control because it is highly unlikely that cancer screenings are causally linked to risk of SARS-CoV-2 infection. In particular, we considered the following two cancer screens as negative controls:
Colon cancer screen: Whether or not the individual received a screening for colon cancer (within a specified time horizon relative to PCR testing date).
Mammogram: Whether or not the individual received a mammogram screening for breast cancer (within a specified time horizon relative to PCR testing date),
In Table 6, we present the results from the negative control experiments. In the unmatched cohorts, we observe that persons who have had a mammogram in the past 1, 2, or 5 years have significantly lower rates of SARS-CoV-2 infection compared to persons who have not had mammograms during the same time period. For example, the SARS-CoV-2 infection rate is 2.5% among persons with mammograms in the past 5 years and 4.5% among persons without mammograms in the past 5 years (p-value: 1.9e-47). This significant difference in SARS-CoV-2 infection rate can be explained by confounding variables, because the unmatched cohorts have different underlying clinical characteristics. However, after propensity score matching, the SARS-CoV-2 infection rate is 2.8% among persons with mammograms in the past 5 years and 2.8% among persons without mammograms in the past 5 years (p-value: 1).
We observe similar results for the colon cancer screening covariate. For example, the SARS-CoV-2 infection rate is 2.5% among persons with colon cancer screens in the past 5 years and 4.4% among persons without colon cancer screens in the past 5 years (p-value: 9.3e-44). After propensity score matching, the SARS-CoV-2 infection rate is 2.5% with and 2.4% without colon cancer screens in the past 5 years (p-value: 1). In total, 6 comparisons (2 controls, 3 time horizons each) were done. After applying Fisher’s method to combine p-values, we get a combined p-value of 0.22 (X2 = 15, df=12) against the combined hypothesis that none of the controls have an association with SARS-CoV-2 after propensity score matching.
We expect that the individuals who have recently taken cancer screens may have lower rates of SARS-CoV-2 infection due to the “healthy user effect”17. In particular, persons who have recently had mammograms or colonoscopies may engage in general health-seeking behaviors which decrease their risk of SARS-CoV-2 infection or generally decrease their risk of a positive PCR test result. The results from the negative control experiment demonstrates that the propensity score matching is able to correct for confounding variables which may contribute to spurious findings such as those caused by the healthy user effect.
Tipping point analysis
In order to evaluate how robust the associations between vaccinations and SARS-CoV-2 infection found in this study are to the effects of potential confounders, we conduct a “tipping point” analysis24. The purpose of this analysis is to find the point at which an unobserved confounder would “tip” the conclusion on each vaccine, making the results no longer statistically significant.
Here, there are two dimensions to consider: (1) the effect size (i.e. relative risk of SARS-CoV-2 infection) of the confounder, and (2) the relative prevalence of the confounder in the vaccinated vs. unvaccinated (matched) cohorts. For each vaccine, we compute the relative prevalence and effect size that would be required for an unobserved confounder to overturn the conclusion for a given (vaccine, time horizon) pair. We present the results from the tipping point analysis in Figure 6.
Institutional Review Board (IRB)
This research was conducted under IRB 20-003278, “Study of COVID-19 patient characteristics with augmented curation of Electronic Health Records (EHR) to inform strategic and operational decisions”. All analysis of EHRs was performed in a privacy-preserving environment. nference and the Mayo Clinic subscribe to the basic ethical principles underlying the conduct of research involving human subjects as set forth in the Belmont Report and strictly ensure compliance with the Common Rule in the Code of Federal Regulations (45 CFR 46) on the Protection of Human Subjects.
Data Availability
The data analyzed was accessed under the IRB information provided herewith (IRB 20-003278), Study of COVID-19 patient characteristics with augmented curation of Electronic Health Records (EHR) to inform strategic and operational decisions (at the Mayo Clinic).
Supplementary Material
Acknowledgements
The authors thank Murali Aravamudan, Patrick Lenehan, Walter Kremers, and Hilal Maradit-Kremers for their review and helpful feedback on this manuscript.
Footnotes
↵+ Joint first authors