WHAT IS THE UNCERTAINTY IN EFFICACY OF COVID-19 VACCINES? A BAYESIAN ANALYSIS

This short paper reports a Bayesian analysis of the publicly available COVID-19 trial results. The analysis casts some doubts on whether the half+full dose regime of the AstraZeneca COVID-19 vaccine is truly more eﬀective than the 2x full dose regime. The 95% posterior interval for the eﬃcacy of the half+full dose regime is 66.6-96.3%, while for the 2x full dose regime it is 39.0-74.8%. Hence, it is possible that in both dosage regimes the vaccine has similar eﬃcacy, around 70%. The estimated eﬃcacy for the Pﬁzer vaccine is 89.9-97.4% and for Moderna 86.3-97.5%. These results should be interpreted with care though, since this analysis does not account for diﬀerences in for instance trial population, COVID-19 testing, and storage requirements for the various vaccines.


Introduction
Over the past weeks, data for three COVID-19 vaccines have been released. The Pfizer and Moderna vaccines exhibit around 95% efficacy [5,4]. The AstraZeneca vaccine had a 62% efficacy for two regular doses and a 90% when the first shot was administered in half a dose.
Though promising, these data also raise many questions. How big are the uncertainty margins for these efficacy numbers, given that these initial results are based on a limited number of COVID-19 infections? And is it really likely that a first half dose for the As-traZeneca vaccine-which was apparently administrated in error [2]-is more effective than the full dose?
A Bayesian analysis is ideally suited to address such questions. In a Bayesian analysis, the probability distribution of unknown parameters is inferred from limited known data [3]. In this case, the probability distribution of the vaccine efficacy for each trial is inferred from the publicly available data of COVID-19 infections in vaccinated and control groups.

Method
Since the infection rates as a fraction of the total trial participants is low (in the order of one or a few percent), the number of infections of the control group n v can be assumed to follow a Poisson distribution with rate λ > 0. A vaccine with efficacy 0 ≥ α ≥ 1 reduces this rate to (1 − α)λ for the number of infections in the vaccinated group n v . This assumes that trial participants have equal probability of being in the vaccine or the control group. If the vaccine has no effect (α = 0) the infection rate in the vaccinated group is the same as in the control group, and if the vaccine works perfectly (α = 1), the rate is reduced to zero. Summarizing: Assume flat priors on α and log(λ). This means α ∼ Uniform(0, 1) and p(λ) ∝ 1/λ (an improper prior). This leads to the following posterior distribution: Because we are primarily interested in the distribution of the vaccine efficacy α, the 2 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted December 3, 2020. ; https://doi.org/10.1101/2020.11.30.20240671 doi: medRxiv preprint parameter λ can be integrated out: The third step in this derivation uses a parameter transformation in the integral λ → λ 2−α . Since the resulting integral is constant in α, it can be taken out of the posterior, which is commonly defined up to a normalization constant (hence the proportionality instead of equality signs). The resulting distribution is used for the posterior inference in this paper, after normalizing such that the total probability is equal to one.
The log posterior density is: In this equation c is the log of the normalization constant. The log posterior can be useful for analytical and numerical calculations. For instance, the mode of the posterior follows from ∂ /∂α = 0, leading to α mode = 1 − n v /n c . This is equal to the "naïve" implied efficacy as often reported (see Table 1). Note though that the posterior median and mean are in general different from the mode-and typically more insightful. Only for large samples do these different centrality measures converge, in which case they become equivalent to maximum likelihood estimation (MLE).

Data
The vaccine developers have released only limited data so far, through press releases. Pfizer and Moderna have released the data for the number of COVID-19 cases in treatment and control groups [5,4]. AstraZeneca states that there were 131 COVID-19 cases observed so far, while the efficacy was 90% in the half+full dose group, 62% in the 2x full dose group, and 70% when combining both groups [1]. This is just enough data to infer the number of COVID-19 infections in the vaccine in control groups. Table 1 shows the summary data for the different trials, including the average implied efficacy followingᾱ = 1 − n v /n c .
3 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Results
The posterior distribution of α resulting from Equation (1) can be inferred for each trial using the data in Table 1. Figure 1 shows the resulting posterior probability density functions. Table 2 summarizes these distributions in terms of the posterior median as well as the 95% posterior probability intervals of vaccine efficacy α.
First of all note that the posterior median (i.e., "most likely") efficacy in Table 2 is 4 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted December 3, 2020. ; https://doi.org/10.1101/2020.11.30.20240671 doi: medRxiv preprint somewhat lower than the average implied efficacy in Table 1. The reason for this can be most easily understood in the extreme case that there are no observed patients in the vaccine group, n v = 0. In such a case the implied efficacyᾱ would be 100%. However, there is still a chance that the vaccine does not work perfectly, so there will be probability mass for α < 100%, and hence the posterior median would be below 100%. A similar asymmetry is apparent in the posterior distributions in Figure 1, leading to lower posterior medians for α than the reported average vaccine efficacy. Furthermore, the AstraZeneca trials exhibit wide error margins for the vaccine efficacy. Most notably, the margins are sufficiently wide that it appears conceivable that the efficacy for the half+full dose regime actually does not differ from the 2x full dose regime. It is possible that the vaccine has an efficacy around 70% for both dosage regimes. If the efficacy of the different dosing regimes is in fact similar, the data from these two arms of the trial could be combined. This would yield an efficacy between 54.4% and 79.7% for the AstraZeneca vaccine (95% posterior interval).
The Pfizer and Moderna data exhibit significantly smaller error margins. The current data imply a vaccine efficacy between 89.9% and 97.4% for the Pfizer vaccine and 86.3% to 97.5% efficacy for the Moderna vaccine. These 95% intervals are strictly higher than for the AstraZeneca combined vaccine, though it still could be that the half+full dose regime of that vaccine is as effective as the Pfizer and Moderna vaccines. Clearly, more data on the AstraZeneca vaccine in the different dosage regimes would be needed to assess this.

Discussion
This study uses a simple Bayesian analysis on publicly available trial data. It comes with several limitations. Most notably, not taken into account is that vaccines and trials can differ in many ways, such as: -Trial participant demographics (country, age, medical history, etc.) -COVID-19 testing procedure (for instance, in the AZ-Oxford trials participants are checked 5 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted December 3, 2020. ; pro-actively for asymptomatic infections, while the Pfizer and Moderna trials rely on selfreporting with follow-up tests [2]) -The storage and distribution of the vaccines (for instance, the Pfizer vaccine needs to be stored in −70 • C, while the AstraZeneca one comes with relatively mild storage conditions, facilitating distribution, especially for less economically-developed areas) -Other end points (such as hospitalizations) and side effects The analysis could also be further extended. For instance, hierarchical priors could be used across different trials in order to get a sense of the distribution of vaccine efficacy. Also, a more formal Bayesian analysis could be performed to assess the posterior probability of the difference in efficacy of the two dosage regimes of the AstraZeneca vaccine. Such analysis could use earlier trial results with different dosage regimes (from COVID-19 or other vaccines) and/or expert assessments to estimate the prior probability that a half+full dose regime would be more effective than a 2x full dose regime. Finally, inference can be improved as more data comes available on these and other vaccines. Appendix A contains the R code used for the present analysis, which future researchers can use for replication of the results in this paper and for future extensions.

Conclusion
A Bayesian analysis of the publicly available trial data casts some doubt on whether the half+full dose regime of the AstraZeneca COVID-19 vaccine is truly more effective than the 2x full dose regime. It is possible that the vaccine has an efficacy around 70% for both dosage regimes. The data for the combined trials suggests that this vaccine might be less effective than the Pfizer and Moderna vaccines, which likely have an efficacy of at least 85% and probably above 90%, up to 97.5%.