## Abstract

We compare and contrast the expected duration and number of infections and deaths averted among several designs for clinical trials of COVID-19 vaccine candidates, including traditional randomized clinical trials and adaptive and human challenge trials. Using epidemiological models calibrated to the current pandemic, we simulate the time course of each clinical trial design for 504 unique combinations of parameters, allowing us to determine which trial design is most effective for a given scenario. A human challenge trial provides maximal net benefits—averting an additional 1.1M infections and 8,000 deaths in the U.S. compared to the next best clinical trial design—if its set-up time is short or the pandemic spreads slowly. In most of the other cases, an adaptive trial provides greater net benefits.

## 1 Introduction

The COVID-19 pandemic has caused the deaths of hundreds of thousands, upended the lives of billions, and caused trillions of dollars in economic loss, and life is unlikely to return to normal until a vaccine is found [1]. Despite the many candidates undergoing testing, an approved vaccine is not expected until 2021, even with substantially compressed development timelines [2], smooth proceeding of clinical trials, and not accounting for possible failures [3]. It is possible—though considered highly unlikely at the present time—that, like many non-influenza epidemics, the crisis may be over before a successful vaccine is developed [4].

Unlike typical therapeutics that are administered to sick patients, vaccines are intended for the healthy. Therefore, confirming the safety and effectiveness of a vaccine is of critical importance [5]. The two primary methods for demonstrating vaccine safety and efficacy are through either a vaccine efficacy randomized clinical trial (RCT) or a vaccine immunogenicity RCT. In the former, large numbers of recruited healthy volunteers are randomly selected to receive either the vaccine or a placebo/active control and then monitored for a period of time. At the end of the surveillance period, the difference in the proportion of infections between the treatment and control arms is computed to demonstrate the ability of the vaccine to prevent infection or disease. A phase 3 vaccine efficacy RCT typically takes five to ten years to complete [6].

In a vaccine immunogenicity RCT, the primary endpoint is an immunity measurement or surrogate marker which is known to correlate with protection against infection or a disease. This type of trial involves a smaller number of volunteers and requires a shorter follow-up period, and as a result, is quicker and less costly [7]. Given that SARS-CoV-2 is a novel pathogen for which we do not yet know how to determine whether a subject is protected, vaccine efficacy must be confirmed using the longer and more costly vaccine efficacy RCT.^{1} The U.S. Food and Drug Administration (FDA) has also issued a guidance stating that “the goal of development programs should be to pursue traditional approval via direct evidence of vaccine efficacy” [8].

A human challenge trial (HCT), in which volunteers are randomized into either the vaccine or placebo arm and then infected deliberately with live virus in a controlled setting, has been proposed as an alternative to accelerate the vaccine development process. Upon challenge, HCTs can quickly demonstrate safety and efficacy of candidate vaccines in preventing infection or disease. Depending on sample size, HCTs could also help to establish functional immune correlates of protection to inform the design of future vaccines. Since an HCT allows comparison of immune responses in vaccinated and unvaccinated individuals, precise measurements of post-vaccination viral loads, characterization of immune responses (innate, adaptive, cell-mediated) and antibody titers, and close monitoring and care of patients, it can help establish the correlates of protection and prove vaccine efficacy concurrently. Moreover, a properly designed HCT can determine transmission risk of the infected in a controlled setting with minimal exposure to investigators and the public. While concerns have been raised regarding the ethics and morality of HCTs, it is generally accepted that HCTs are ethically permissible when the benefits to society outweigh acknowledged risks [9], and they have been deemed acceptable for developing vaccines for multiple infectious diseases such as influenza [10], malaria [11], typhoid [12], cholera [13], and dengue fever [14]. To the best of our knowledge, there have been no published studies of a quantitative analysis of the potential societal value of a COVID-19 HCT.

In this paper, we compare the costs and benefits, as measured by the number of deaths and infections avoided, of confirming the safety and efficacy of a COVID-19 vaccine using four clinical trial designs: a traditional vaccine efficacy RCT, a vaccine efficacy RCT with an optimized surveillance period that maximizes the benefits of the trial (ORCT), an adaptive vaccine efficacy RCT (ARCT), and an HCT. Although our framework applies more broadly to any vaccine candidate for any infectious disease, we calibrate our simulations using a set of estimated epidemiological models for the SARS-CoV-2 virus (one for each of the 50 states and Washington, D.C.) to determine attack rates^{2} and cumulative numbers of infections and deaths in the U.S. under various scenarios.

A summary of our simulation framework is displayed in Fig. 1. We first estimate baseline models from data and make assumptions for the evolution of the epidemic in order to predict the attack rates over the course of the clinical trials. We then combine the attack rates with the assumptions for the vaccine trial design to simulate the outcomes for the clinical trials. Conditioned on the vaccine being approved, we make assumptions on the vaccination schedule and simulate the path of the epidemic in order to compute the net infections and deaths prevented.

Assuming that a clinical trial testing a vaccine with a true efficacy of 50%^{3} and using superiority tests starts on August 1, 2020, we estimate the date of licensure of the hypothetical vaccine candidate to be some time in November 2021 with a traditional RCT (476 days), between June and August 2021 with an ORCT (326 to 380 days), between April and June 2021 with an ARCT (246 to 306 days), and between March and June 2021 with an HCT (221 to 311 days).^{4} The ARCT provides the greatest expected net benefit among the three RCT designs in almost all scenarios. The utility of an HCT versus the RCTs, however, depends critically on the HCT set-up time and the course of the epidemic. The benefits of HCTs are greatest when trials are initiated as early in an epidemic as possible, and/or if the rate of infection is relatively low. Assuming a 30-day set-up time, a vaccine efficacy of 50%, a behavioral epidemiological model, and a population vaccination schedule of 10M doses per day, an HCT can reduce the time to licensure by one month, thus preventing approximately 1.1M incremental infections and 8,000 incremental deaths compared to the best performing alternative clinical trial design, the ARCT. We observe similar results when superiority-by-margin tests are used instead.

We review the designs and assumptions for the four vaccine trials considered in Section 2 and explain our cost/benefit calculations in Section 4.1. We present the epidemiological model used in our forecasts in Section 3 and report our simulation results in Section 4. We discuss our findings and some broader issues of COVID-19 clinical trials in Section 5 and conclude in Section 6.

## 2 Vaccine Trial Design

We begin by describing the assumptions and calibrations used in each of the four vaccine trial designs we considered in our simulations.

### 2.1 Traditional Vaccine Efficacy RCT

First, we consider a traditional double-blind vaccine efficacy trial design. We assume that a closed cohort of 30,000 infection-free but at-risk healthy U.S. adults aged between 18 and 50 years will be enrolled for the study, similar to the phase 3 studies planned or underway for the COVID-19 vaccines developed by Moderna [15], AstraZeneca [16], Pfizer/BioNTech [17], and others. The participants will be randomized equally between the treatment and control arms, receiving either the vaccine candidate or an active control vaccine^{5} (e.g., vaccine against meningococcal bacteria), respectively. Unlike clinical trials for cancer therapeutics, where patient accrual can be a challenge due to the small pool of afflicted patients and strict inclusion/exclusion criteria, subject enrollment for vaccine efficacy studies is often faster because there is a large pool of healthy adult volunteers to recruit from. Therefore, we assume an accrual rate of 250 subjects per day in our simulations.

Similar to the design of study protocols adopted for phase 3 clinical trials of current leading SARS-CoV-2 vaccine candidates, we assume a hypothetical COVID-19 vaccine candidate that will be administered to subjects in two doses, 28 days apart, i.e., the prime-boost regimen [18, 19]. Furthermore, we assume that it takes approximately 28 days after the booster dose for antibodies to develop (i.e., seroconversion) before surveillance can begin.

We consider efficacy in the prevention of infection by SARS-CoV-2 as the primary end-point in our study.^{6} To draw meaningful conclusions from the trial results, volunteers must be monitored long enough for a sufficient number of infections to occur. Here, we assume a fixed post-vaccination surveillance period of 180 days for all subjects in the cohort, after which a single safety and primary efficacy analysis will be performed to determine licensure (see Appendix A.1).

Finally, we assume an interval of 120 days after surveillance for the preparation of a biologics license application (BLA) submission to the FDA, including an analysis and publication of safety, immunogenicity, and efficacy results; collection of chemistry, manufacturing, and controls (CMC) data; the writing of a clinical study report; and subsequent review by the FDA. Under these assumptions, we estimate the time to licensure of our hypothetical candidate under a traditional RCT to be approximately 476 days. This is the baseline value against which we will compare the other three trial designs.
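The 476-day figure can be reproduced by adding up the stages described in this section. The sketch below is illustrative only: the function and constant names are ours, and it assumes that the stages run strictly one after another, with the clock for dosing starting once the last subject is enrolled.

```python
import math

# Timeline components for the traditional RCT, taken from Section 2.1.
COHORT_SIZE = 30_000      # enrolled subjects
ACCRUAL_RATE = 250        # subjects enrolled per day
DOSE_INTERVAL = 28        # days between prime and booster doses
SEROCONVERSION = 28       # days after the booster before surveillance starts
SURVEILLANCE = 180        # fixed post-vaccination surveillance period
BLA_AND_REVIEW = 120      # BLA preparation plus FDA review


def days_to_licensure(cohort_size, accrual_rate, dose_interval,
                      seroconversion, surveillance, bla_and_review):
    """Sum the stages, assuming each one starts when the previous one ends."""
    accrual_days = math.ceil(cohort_size / accrual_rate)
    return (accrual_days + dose_interval + seroconversion
            + surveillance + bla_and_review)


rct_days = days_to_licensure(COHORT_SIZE, ACCRUAL_RATE, DOSE_INTERVAL,
                             SEROCONVERSION, SURVEILLANCE, BLA_AND_REVIEW)
print(rct_days)  # 476
```

Under the same assumptions, changing only the surveillance period shows how a shorter monitoring window shortens the overall timeline.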

### 2.2 Optimized Vaccine Efficacy RCT

Depending on the transmission rate of COVID-19 during the trial and the assumed efficacy of the hypothetical candidate, a shorter surveillance period might be sufficient to observe significant results.^{7} Therefore, we consider an optimized version of the traditional vaccine efficacy RCT design (ORCT) in which the surveillance period is optimized between 30 and 180 days based on different epidemiological scenarios and vaccine efficacies to maximize the expected number of incremental infections and deaths prevented.^{8} Apart from the surveillance period, we assume that the ORCT is identical to the RCT in all other aspects.

### 2.3 Adaptive Vaccine Efficacy RCT

An adaptive version of the traditional vaccine efficacy RCT design (ARCT) is based on group sequential methods [20]. Instead of a fixed study duration with a single final analysis at the end, we allow for early stopping for efficacy via periodic interim analyses of accumulating trial data (see Appendix A.2). While this reduces the expected duration of the trial, we note that adaptive trials typically require more complex study protocols which can be operationally challenging to implement for test sites unfamiliar with this framework. In our simulations, we assume a maximum of six interim analyses spaced 30 days apart, with the first analysis performed when the first 10,000 subjects have been monitored for at least 30 days.^{9}
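The timing of the interim analyses follows from the accrual and dosing assumptions of Section 2.1. The sketch below is one possible reading, not the authors' implementation: it assumes that the 10,000th subject is enrolled on day 40 (10,000 subjects at 250 per day), that monitoring starts after the dose interval and seroconversion period, and that a 120-day BLA and review period follows whichever analysis stops the trial. All names are ours.

```python
ACCRUAL_RATE = 250            # subjects per day (Section 2.1)
DOSE_INTERVAL = 28            # days between prime and booster doses
SEROCONVERSION = 28           # days after the booster before monitoring starts
FIRST_LOOK_SUBJECTS = 10_000  # subjects needed for the first interim analysis
MIN_MONITORING = 30           # days of monitoring required before the first look
LOOK_SPACING = 30             # days between consecutive interim analyses
MAX_LOOKS = 6                 # maximum number of interim analyses
BLA_AND_REVIEW = 120          # BLA preparation plus FDA review (Section 2.1)


def interim_analysis_days():
    """Trial days on which each interim analysis would take place."""
    enrolled_by = FIRST_LOOK_SUBJECTS // ACCRUAL_RATE
    first_look = enrolled_by + DOSE_INTERVAL + SEROCONVERSION + MIN_MONITORING
    return [first_look + k * LOOK_SPACING for k in range(MAX_LOOKS)]


looks = interim_analysis_days()
licensure_if_stopped = [day + BLA_AND_REVIEW for day in looks]
print(looks)                 # [126, 156, 186, 216, 246, 276]
print(licensure_if_stopped)  # [246, 276, 306, 336, 366, 396]
```

Under this reading, stopping at the first or third analysis gives 246 or 306 days to licensure, which matches the range reported for the ARCT at 50% efficacy.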

### 2.4 HCT

Unlike traditional vaccine efficacy field trials which require large sample sizes to observe significant results, we assume that the HCT requires only 250 volunteers, randomized 4:1 between the treatment and control arms. Furthermore, to minimize the risk to participants, we assume that this study will recruit only young and healthy adults aged between 18 and 25 years without any underlying chronic conditions because this group of individuals has the lowest risk of mortality and complications after recovering from the infection [21, 22, 23].

Extensive preparations are required to set up an HCT: selecting, developing, and testing an appropriate challenge virus strain;^{10} manufacturing a batch of the selected challenge strain under good manufacturing practices (GMP); and identifying the dose level required to achieve a satisfactory attack risk of non-severe clinical illness [23]. From discussions with challenge trial experts, there seems to be a lack of consensus on the appropriate set-up time for HCTs. We reflect this uncertainty in our simulations by incorporating a lag time for HCTs (“set-up time”) that ranges from 30 to 120 days.

In the challenge study, volunteers are deliberately exposed to the SARS-CoV-2 virus, reducing post-vaccination monitoring times because investigators do not need to wait for infections to occur naturally as with non-challenge RCTs. Therefore, we assume a surveillance period of only 14 days (the incubation period for COVID-19 [24, 25, 26]) for the challenge study. Moreover, the attack rate in the control arm will be independent of the population epidemiological model since the study will be conducted in isolated facilities. In our simulations, we assume that 90% of the subjects in the control arm will be infected after the challenge.^{11}
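To make these assumptions concrete, the expected number of infections in each arm of the challenge study can be worked out directly. The sketch below assumes a true vaccine efficacy of 50% (the scenario highlighted in the Introduction) and defines efficacy as one minus the relative risk of infection, a common convention that this section does not spell out. Variable names are ours.

```python
N_VOLUNTEERS = 250           # total HCT volunteers (Section 2.4)
CONTROL_ATTACK_RATE = 0.90   # share of control subjects infected after challenge
VACCINE_EFFICACY = 0.50      # assumed true efficacy (scenario value)

# 4:1 randomization between the treatment and control arms.
n_treatment = N_VOLUNTEERS * 4 // 5
n_control = N_VOLUNTEERS - n_treatment

# Assumed definition: efficacy = 1 - (treatment risk / control risk).
treatment_attack_rate = CONTROL_ATTACK_RATE * (1 - VACCINE_EFFICACY)

expected_infected_treatment = n_treatment * treatment_attack_rate
expected_infected_control = n_control * CONTROL_ATTACK_RATE

# Efficacy recovered from the expected counts, as a consistency check.
estimated_efficacy = 1 - ((expected_infected_treatment / n_treatment)
                          / (expected_infected_control / n_control))

print(n_treatment, n_control)
print(round(expected_infected_treatment), round(expected_infected_control))
print(round(estimated_efficacy, 3))
```

With these inputs, 200 volunteers receive the vaccine and 50 receive the control, and roughly 90 and 45 infections are expected in the two arms, respectively.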

We note that the FDA is unlikely to approve an experimental vaccine tested in only 200 subjects (versus thousands in non-challenge RCTs), hence we assume that a large-scale safety study will be performed immediately after the conclusion of the challenge study—conditional on positive efficacy results—to evaluate the safety of the hypothetical vaccine candidate in a broader population. Assuming a single-arm study with 5,000 subjects followed for 30 days, we expect the process to be completed in 106 days. To accelerate licensure, we assume that the collection of safety data will be performed in parallel with BLA submission and FDA review. Since the latter is assumed to take 120 days, the additional safety study does not actually add to the time to licensure of the vaccine candidate. It does, however, add to the financial costs of the HCT (see Appendix A.4).

Apart from the sample size, randomization ratio, set-up time, surveillance period, and safety data requirement, we assume that the HCT is identical to the RCT in all other respects. See Appendix A.3 for a summary of our assumptions.
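The HCT timeline and the parallel safety study can be checked with the same stage-by-stage arithmetic used for the RCT. The sketch below is a minimal illustration with names of our choosing. It assumes the RCT accrual rate of 250 subjects per day, the 28-day dose interval, and the 28-day seroconversion period also apply to the HCT and to the safety study, which is one decomposition that reproduces the 106-day figure.

```python
import math

ACCRUAL_RATE = 250     # subjects per day, carried over from the RCT
DOSE_INTERVAL = 28     # days between prime and booster doses
SEROCONVERSION = 28    # days after the booster before surveillance starts
BLA_AND_REVIEW = 120   # BLA preparation plus FDA review


def hct_days_to_licensure(setup_days, volunteers=250, surveillance=14):
    """Days from trial start to licensure for the challenge design."""
    accrual = math.ceil(volunteers / ACCRUAL_RATE)
    return (setup_days + accrual + DOSE_INTERVAL + SEROCONVERSION
            + surveillance + BLA_AND_REVIEW)


def safety_study_days(subjects=5_000, follow_up=30):
    """Duration of the single-arm safety study run after the challenge study."""
    accrual = math.ceil(subjects / ACCRUAL_RATE)
    return accrual + DOSE_INTERVAL + SEROCONVERSION + follow_up


print(hct_days_to_licensure(30))    # 221
print(hct_days_to_licensure(120))   # 311
print(safety_study_days())          # 106
# The safety study fits inside the BLA and review window it runs alongside.
print(safety_study_days() <= BLA_AND_REVIEW)  # True
```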

We anticipate similar post-marketing commitments for both the HCT and the RCTs, in terms of the collection of additional safety and effectiveness data, and supplementary studies to support the effectiveness of the vaccine in populations not included in the initial efficacy study, e.g., infants. However, we do not model them here because they do not affect our cost/benefit computations.

## 3 Epidemiological Model

To estimate the attack rate encountered by subjects in a given clinical trial—a key component for our cost/benefit calculations—we require information about the spread of the COVID-19 epidemic in the U.S. We use the Susceptible-Infected-Resolving-Dead-ReCovered with social distancing (SIRDC-SD) model proposed by Fernandez-Villaverde and Jones [27], chosen because it is able to fit both the cumulative and daily number of deaths in all the states well despite being a simple model, to establish a baseline for the epidemic. The details of the model are described in Appendix A.5.

We estimate the model for each of the 50 states in the U.S. and Washington, D.C., using the time series of deaths in the U.S. obtained from the Johns Hopkins Center for Systems Science and Engineering (CSSE) COVID-19 repository [28]. Our data was downloaded on June 16, 2020. We do not scale the number of deaths but continue to perform a centered moving average smoothing on the daily number of deaths, as described in Fernandez-Villaverde and Jones [27]. Our estimation method is detailed in Appendix A.6 and the estimated parameters are reported in Table A.3.

The estimated models show how the epidemic has played out thus far, but we need to predict how it will evolve after the lockdowns are relaxed and/or vaccines are developed. To do so, we extend the SIRDC-SD model to take into account semi-effective vaccination. The new model, which we shall name Susceptible-Infected-Resolving-Dead-ReCovered-Vaccinated with social distancing (SIRDCV), is explained in Appendix A.8.

### 3.1 Evolution of Epidemic with Reopening

We consider three different scenarios for the evolution of the epidemic over time. In the first, we assume that the current situation will continue indefinitely until the end of the epidemic (“status quo”). That is, stay-home orders and bans on social gatherings will be extended until there are no new infections. In this scenario, we simply project the epidemic forward in time using the estimated parameters.

In the second, we assume that there will be a partial reopening with strict monitoring across all states starting from June 15, 2020 (“ramp”). To model this, we assume a ramp function for *β*(*t*) that increases to 0.22 over 90 days and remains at that level until the end of the epidemic. The parameters are chosen to imply a final *R*_{0} of 1.1, which reflects close monitoring and contact tracing, and if needed, temporary quarantines to arrest clusters of infections that may pop up. The contact rate parameter, *β*, in this scenario is described by Eq. A.39.
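As an illustration of the ramp scenario, the sketch below implements a linear ramp for the contact rate. The exact functional form is given by Eq. A.39, which is not reproduced in this section, so the linear shape, the starting value `beta_start`, and the recovery rate `GAMMA = 0.2` are all assumptions made here. The value of `GAMMA` is simply the one implied by a final contact rate of 0.22 and a final reproduction number of 1.1 if the reproduction number equals the contact rate divided by the recovery rate.

```python
from datetime import date, timedelta

RAMP_START = date(2020, 6, 15)  # start of the partial reopening
RAMP_DAYS = 90                  # length of the ramp
BETA_FINAL = 0.22               # contact rate at the end of the ramp
GAMMA = 0.2                     # assumed: implied by R0 = beta / gamma = 1.1


def beta_ramp(day, beta_start):
    """Linearly interpolate beta from beta_start to BETA_FINAL (assumed form)."""
    elapsed = (day - RAMP_START).days
    fraction = min(max(elapsed / RAMP_DAYS, 0.0), 1.0)
    return beta_start + fraction * (BETA_FINAL - beta_start)


# Hypothetical starting contact rate for a single state.
beta_start = 0.10
halfway = RAMP_START + timedelta(days=45)
end_of_ramp = RAMP_START + timedelta(days=RAMP_DAYS)

print(round(beta_ramp(halfway, beta_start), 4))
print(round(beta_ramp(end_of_ramp, beta_start), 4))
print(round(BETA_FINAL / GAMMA, 4))  # final reproduction number
```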

In the third, we consider the behavioral-based response proposed by John Cochrane (“behavioral”), whereby people voluntarily reduce social contact when they perceive danger (e.g., when they observe that there is an uptick in the daily number of deaths) and increase social contact when they observe that there is a decrease in risk (e.g., when they observe a reduction in the daily number of deaths) [29]. The functional form of *β* is given by Eq. A.42.

We give an example of how the basic reproduction number, or *R*_{0}, may look for each of the scenarios in Fig. A.3.

### 3.2 Population Vaccination Schedule

We assume that vaccines will be immediately available for distribution and inoculation upon licensure. This reflects how the leading vaccine companies have been scaling up their manufacturing capabilities and started producing millions of doses at industrial scale in parallel to the clinical trials [30, 31] and well before the demonstration of vaccine efficacy and safety. We model three ways that the susceptible population will be vaccinated upon vaccine licensure: 1M, 10M, and infinite doses administered per day. In the last case, the entire U.S. population is assumed to be vaccinated the day after licensure. While unrealistic, this gives an upper bound on the potential benefit of vaccine development. We assume that the vaccines are distributed proportionally to states according to their relative population at the start of the epidemic.
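The proportional allocation rule can be sketched as follows. The state names and populations are hypothetical placeholders, and the sketch assumes that each administered dose covers one person, which abstracts from the two-dose regimen. The national population of 330M used in the usage example is likewise a round number chosen for illustration, not a figure from our calibration.

```python
import math


def allocate_daily_doses(doses_per_day, state_population):
    """Split national daily doses across states in proportion to population."""
    total = sum(state_population.values())
    return {state: doses_per_day * pop / total
            for state, pop in state_population.items()}


def days_to_cover(population, doses_per_day):
    """Days needed to reach everyone, assuming one person covered per dose.

    An infinite daily supply is treated as full coverage after one day.
    """
    return max(1, math.ceil(population / doses_per_day))


# Hypothetical two-state example.
populations = {"State A": 30_000_000, "State B": 10_000_000}
print(allocate_daily_doses(1_000_000, populations))

# Illustrative national population (assumption, not a calibrated input).
us_population = 330_000_000
for rate in (1_000_000, 10_000_000, math.inf):
    print(rate, days_to_cover(us_population, rate))
```

Under these assumptions, the three schedules correspond to roughly 330 days, 33 days, and a single day to cover the population.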

### 3.3 Forecasting Infections and Deaths

We forecast the cumulative number of infections and deaths in each state between February 29, 2020, and December 31, 2022, using the SIRDCV described by Eq. A.32 to Eq. A.38 before summing over all states in order to produce estimates for the entire U.S. The attack rate at time *t* is the ratio of the number of new infections at time *t* to the number of susceptible persons at time *t* − 1.
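The attack rate definition translates directly into code. The sketch below uses short made-up series for the susceptible population and new infections. In our framework these would come from the SIRDCV forecasts, and the function name is ours.

```python
def attack_rates(new_infections, susceptible):
    """Attack rate at t = new infections at t / susceptible persons at t - 1.

    The first time step has no preceding day, so the result starts at t = 1.
    """
    if len(new_infections) != len(susceptible):
        raise ValueError("series must have the same length")
    return [new_infections[t] / susceptible[t - 1]
            for t in range(1, len(new_infections))]


# Hypothetical daily series for illustration only.
susceptible = [1000, 990, 970]
new_infections = [0, 10, 20]

rates = attack_rates(new_infections, susceptible)
print([round(r, 4) for r in rates])  # [0.01, 0.0202]
```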

## 4 Results

Given the parameters for each trial design and an epidemiological model, we simulate the outcome of hypothetical clinical trials for all four designs and measure their incremental differences. We describe our cost/benefit methodology in Section 4.1, report the numerical results in Section 4.2, and discuss them in Section 5.

### 4.1 Cost/Benefit Analysis

We apply cost/benefit analysis to quantify and compare the net value of each trial design. We focus on public health outcomes (that is, the risks of mortality and morbidity) and provide a qualitative discussion of the societal and financial impact in Section 5.

As shown by Montazerhodjat et al. [32], Isakov et al. [33], and Chaudhuri et al. [34], the value associated with a pathway is computed as the difference between the post-trial benefit and the in-trial cost (Eq. 1). The former estimates the net benefit of the trial to society at large while the latter measures the cost of conducting the study to volunteers in the trial.

We quantify the cost of a trial design in terms of the number of COVID-19 infections and deaths observed in the clinical study. For post-trial benefit, we first consider a baseline scenario in which a vaccine is never developed and the epidemic is allowed to run its course. Next, we simulate the case where a vaccine is approved at some point in time depending on the duration of the trial design. The post-trial benefit is then the difference in the cumulative number of infections and deaths in the population between the two scenarios, i.e., the incremental number of infections and deaths prevented with a vaccine licensure. In cases where the vaccine candidate is rejected,^{12} net value will be negative since post-trial benefit is zero but cost has been incurred for conducting the clinical trial. Lastly, we assume that the hypothetical vaccine candidate is generally well tolerated and any vaccine-related adverse reactions are mild and negligible with respect to in-trial costs and post-trial benefits [35].
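The net value calculation described above can be summarized in a few lines. The sketch below is a simplified illustration for a single outcome measure (for example, infections). The function name and the input figures are hypothetical and are not taken from our simulation results.

```python
def net_value(baseline_cumulative, with_vaccine_cumulative,
              in_trial_cost, approved):
    """Net value = post-trial benefit - in-trial cost.

    The post-trial benefit is the reduction in cumulative infections (or
    deaths) relative to the no-vaccine baseline, and is zero if the
    vaccine candidate is rejected.
    """
    if approved:
        post_trial_benefit = baseline_cumulative - with_vaccine_cumulative
    else:
        post_trial_benefit = 0
    return post_trial_benefit - in_trial_cost


# Hypothetical inputs for illustration only.
baseline = 5_000_000       # cumulative infections with no vaccine
with_vaccine = 3_500_000   # cumulative infections if the vaccine is licensed
trial_infections = 400     # infections observed among trial subjects

print(net_value(baseline, with_vaccine, trial_infections, approved=True))
print(net_value(baseline, with_vaccine, trial_infections, approved=False))
```

The second call shows the rejected-candidate case, in which the net value is negative because only the in-trial cost remains.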

### 4.2 Simulation Results

We compute the expected net value of different trial designs using Monte Carlo simulations and asymptotic distributions of the efficacy test statistics (see Appendix A.1). Fig. 1 illustrates the inputs, computations, and outputs of our simulation framework. We assume that all trials start on August 1, 2020, and simulate the epidemiological models until December 31, 2022. We perform sensitivity analysis over a wide range of trial design, epidemiological model, and population vaccination schedule assumptions (see Table 1), covering 504 different scenarios. We summarize our results in Table 2 and Appendix A.11. In addition to our results, we release an open-source version of our simulation software, and encourage readers to rerun our simulations with their own preferred set of assumptions and inputs.

Assuming superiority testing and a vaccine efficacy of 50%, we estimate the date of licensure of the hypothetical vaccine candidate to be some time in November 2021 under an RCT (476 days), between June and August 2021 under an ORCT (326 to 380 days), between April and June 2021 under an ARCT (246 to 306 days), and between March and June 2021 under an HCT (221 to 311 days). Apart from an RCT which has a fixed trial duration, the dates of licensure from the ORCT and ARCT depend largely on the status of the epidemic during the clinical trial. If the transmission rate of the disease is low (e.g., due to social distancing or other non-pharmaceutical interventions), an extended surveillance period is required to accrue enough natural infections in order to observe a statistically significant difference in infection risk between the treatment arm and the control arm. Conversely, when the transmission rate is high, a short surveillance period is sufficient to observe significant results. We note that an HCT, on the other hand, does not depend on the epidemic situation but is instead limited by the time required to set up the challenge model. In general, we find that the time to licensure under ORCT and ARCT decreases with increasing vaccine efficacy: the greater the efficacy, the easier it is to observe a significant treatment effect.
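The calendar dates quoted above follow from adding each duration to the trial start date. The short sketch below performs this conversion. It assumes that licensure occurs exactly the stated number of days after August 1, 2020.

```python
from datetime import date, timedelta

TRIAL_START = date(2020, 8, 1)

# (shortest, longest) time to licensure in days for each design,
# as reported for superiority testing and a vaccine efficacy of 50%.
DURATIONS = {
    "RCT": (476, 476),
    "ORCT": (326, 380),
    "ARCT": (246, 306),
    "HCT": (221, 311),
}

licensure_dates = {
    design: (TRIAL_START + timedelta(days=shortest),
             TRIAL_START + timedelta(days=longest))
    for design, (shortest, longest) in DURATIONS.items()
}

for design, (earliest, latest) in licensure_dates.items():
    print(f"{design}: {earliest.isoformat()} to {latest.isoformat()}")
```

For example, the RCT duration of 476 days maps to November 20, 2021, and the shortest HCT duration of 221 days maps to March 10, 2021.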

We find that the ARCT provides the greatest expected net benefit among the three RCT designs in almost all scenarios. The utility of an HCT versus the RCTs, however, depends critically on the set-up time and the dynamics of the epidemic. For example, assuming superiority testing, a vaccine efficacy of 50%, the behavioral epidemiological model, and a population vaccination schedule of 10M doses per day, we estimate that the ARCT can help accelerate licensure by almost 8 months versus the RCT, thus preventing approximately 2.9M incremental infections and 23,000 incremental deaths from COVID-19 in the U.S. versus the latter.

Under the same set of assumptions, an HCT that requires 30 days to set up can *further* reduce the time to licensure by a month, thus preventing approximately 1.1M more infections and 8,000 more deaths versus the ARCT. However, the advantage of the HCT vanishes when its set-up time is long: an HCT that requires 90 days to set up takes about one month longer to reach licensure than the ARCT, leading to around 1.0M more infections and 8,000 more deaths versus the latter (see Fig. 2a). Under such circumstances, the use of an HCT is worthwhile only when the prevalent transmission rate is low. If we consider the status quo scenario instead of the behavioral epidemiological model, the time to licensure is about one month shorter under the HCT than under the ARCT even with a 90-day set-up time (see Fig. 2b). In this case, the HCT prevents approximately 60,000 incremental infections and 500 incremental deaths versus the ARCT. We observe similar trends under superiority-by-margin testing at a threshold of 50%.

## 5 Discussion

Many papers have highlighted ethical considerations for conducting HCTs [36, 37], some specifically for COVID-19 [9, 38, 39, 40, 41, 42]. Some of the main ethical concerns are: (1) what is the explicit scientific rationale for, and societal value of, an HCT; (2) whether the risks of harm to the subjects and the public at large are understood by the scientists and have been minimized; (3) whether informed consent has been obtained from subjects after they are given full disclosure of the risks involved; and (4) whether the subjects have been selected fairly and given appropriate compensation for both the risk and actual harm brought on by HCTs. Most bioethicists accept that these concerns can be addressed within the existing ethical framework for human medical research.

Our paper addresses the first and second of these ethical concerns. We provide scientific justifications for COVID-19 HCTs by considering how conducting them can allow companies to learn about the protection curves and accelerate the development of vaccines against SARS-CoV-2.

However, our analysis does not address the latter two ethical considerations as they concern the execution of HCTs, which is beyond the scope of this paper. Nonetheless, companies and scientists seeking to perform HCTs, and especially regulators, will have to address those concerns to preserve public trust and avoid a public backlash that could jeopardize other important medical research critical to addressing the current epidemic.

Some scientists argue that “a single death or severe illness in an otherwise healthy volunteer would be unconscionable” [42]. However, it can be argued that allowing tens of thousands of individuals to die by denying the consent of an informed individual to take a calculated risk is equally unconscionable. In this paper, we adopt the Benthamite approach [43], where every individual’s utility is weighted equally in the aggregate utility function, as is the common convention in public economics analyses. Within this ethical perspective, our calculations show that an HCT can potentially provide substantial public health benefits in terms of accelerating vaccine development and reducing the burden of coronavirus-related mortality and morbidity in the U.S.—in some cases, by more than 1.1M infections and 8,000 deaths compared to the best performing RCT—when conducted early in the pandemic’s life cycle and in cases where the spread of COVID-19 in the population is muted due to non-pharmaceutical interventions.

We also expect the financial costs of an HCT, which include the cost of liability protection, to be lower than those of a traditional vaccine efficacy RCT, adding further support for a challenge design (see Appendix A.4 for further discussion). While we have focused on public health outcomes here, it is clear that accelerated vaccine development provides tremendous societal and economic benefits as well, e.g., savings in insured medical costs, direct medical expenditures, and hospitalization costs, and accelerated economic recovery from an earlier reopening.

We emphasize that the expected costs and benefits of a clinical trial depend critically on many assumptions about existing conditions. For example, recruiting subjects in sufficient numbers and diversity can sometimes present a challenge for clinical trials involving experimental vaccines (although, in the case of HCTs for COVID-19, the organization 1Day Sooner reports over 32,000 registered volunteers as of July 27, 2020). Also, we do not include set-up time for non-challenge RCTs because phase 3 vaccine efficacy trials are imminent at the time of writing. Moreover, we assume a relatively short set-up time for HCTs because a challenge study can be set up relatively quickly using a wild-type strain [23], and the National Institute of Allergy and Infectious Diseases (NIAID) appears to have already made some headway in manufacturing challenge doses [44]. If, instead, we assume comparable set-up times (e.g., two months) and start dates for both an HCT and non-challenge RCTs, we expect that an HCT can accelerate licensure by two months when compared to an adaptive RCT.^{13} Some have argued that at least one to two years is required to develop a robust model from scratch [42]. In this case, our results indicate that an ARCT will almost always be faster than an HCT. However, even if an HCT with a long set-up time does not lead to faster vaccine licensure than an ARCT given current conditions, the creation of a standing HCT agent and setting up an HCT now can provide a hedge against potential failures in the current crop of vaccine candidates. By having an approved, ready-to-go challenge virus and ready-to-go HCT sites that vaccine developers can access immediately, the approval process for as-yet-untested SARS-CoV-2 vaccine candidates can be accelerated when required. For a pandemic like COVID-19, such a hedge will almost always show substantial net benefits relative to its costs.

HCTs have several other benefits that will become more apparent as the pandemic progresses. They require far fewer eligible volunteers, whose numbers will dwindle as the pandemic progresses. They do not depend on attack rates at clinical trial sites, which are notoriously difficult to estimate and highly dependent on non-pharmaceutical interventions such as lockdowns and other social-distancing policies. They also avoid logistical problems such as identifying subjects, obtaining subjects’ consent, obtaining institutional review board approval, or tracking subjects, particularly when attempting large-scale clinical trials in places where contract research organizations (CROs) have little experience.

It is conceivable that multiple vaccines—instead of the single vaccine in our simulation study—are tested concurrently in a single trial design [45]. For example, five vaccines, such as those selected by Operation Warp Speed [46], could be tested concurrently in a six-arm trial (five vaccine arms and a control arm), requiring 40% fewer test subjects, thereby reducing in-trial expected morbidity and mortality costs by the same amount. The benefits can be increased if an adaptive platform clinical trial—designed to eliminate ineffective vaccines at the first signs of futility—is adopted. A clinical trial testing multiple vaccines can also reduce competition for volunteers, a problem that continues to plague vaccine developers [47].
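The 40% figure follows from counting trial arms. The sketch below assumes that every arm enrolls the same number of subjects and that testing each vaccine separately would require its own control arm. Function names are ours.

```python
from fractions import Fraction


def arms_separate_trials(n_vaccines):
    """Each vaccine gets its own two-arm trial (vaccine plus control)."""
    return 2 * n_vaccines


def arms_shared_control(n_vaccines):
    """All vaccines share a single control arm in one multi-arm trial."""
    return n_vaccines + 1


def subject_reduction(n_vaccines):
    """Fractional saving in subjects, assuming equally sized arms."""
    separate = arms_separate_trials(n_vaccines)
    shared = arms_shared_control(n_vaccines)
    return Fraction(separate - shared, separate)


reduction = subject_reduction(5)
print(arms_separate_trials(5), arms_shared_control(5))  # 10 6
print(reduction, float(reduction))                      # 2/5 0.4
```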

We choose to quantify the costs and benefits of the clinical trials by measuring the number of infections and deaths avoided, and refrain from performing a traditional health technology assessment, such as comparing the economic value of an HCT versus an RCT using quality-adjusted life-year measures or willingness-to-pay estimates such as the value of a statistical life. Performing such computations is straightforward given the output of our simulations, but we have refrained from doing so in deference to non-economist stakeholders who find it offensive to use any pecuniary measures when discussing the loss of human life.

Finally, our analysis focuses mainly on the U.S. for practical reasons involving access to data with which to calibrate our simulations and the broader goal of informing U.S. public health officials and policymakers as the country enters the final stages of vaccine development. However, a vaccine licensure may apply internationally. Given that the U.S. currently accounts for 25% of all confirmed COVID-19 cases (as of July 7, 2020) [28], if the assumptions made in our study also hold internationally, the net benefits for all the clinical trials will scale by a factor of 4, in which case HCTs can avert an additional 4.4M infections and 32,000 deaths compared to the best-performing RCT in certain situations.

We highlight that these figures depend heavily on the development of the epidemic in the U.S. moving forward. We have considered three simple scenarios, status quo, ramp, and behavioral, corresponding to low transmission, moderate transmission, and behavioral-based response, respectively. There are clearly many other sources of uncertainty that are not reflected here. For example, non-adherence to social distancing advisories and/or resistance to precaution recommendations such as wearing a mask in public will lead to an uncontrolled outbreak, which will help to accelerate non-challenge RCTs, making them attractive even when compared to an HCT with a short set-up time. We have found it difficult and impractical to incorporate these uncertainties in our assumptions due to the speed at which things are evolving and the unpredictability of public reaction. In addition, studies that have attempted to incorporate such uncertainties in their epidemic models report huge error bounds in their projections [48]. The wide confidence intervals prevent us from drawing any useful conclusions, which severely limits the usefulness of such models. Therefore, we recommend that readers not take our results as final or definitive, but instead re-run our simulations with their own preferred set of assumptions, calibrated using the most current epidemiological data.

## 6 Conclusion

Our paper presents a systematic framework for quantitatively assessing the in-trial and societal cost/benefit trade-offs of various clinical trial designs in terms of infections and deaths averted. We hope that this framework will allow stakeholders to make more informed practical and ethical decisions regarding accelerating COVID-19 vaccine development in the ongoing pandemic.

## Data Availability

All data used in the study are either publicly available or accessible via standard commercial licenses. All software developed by the authors will be made publicly available with an open-source license.

## Conflict of Interest Disclosure

D.B. and S.B. are employed by Berry Consultants LLC, which provides statistical support for clinical trials.

P.H., K.S., and C.W. report no conflicts.

L.I. is an employee of the biotech company Seqirus and receives salary and company stock as part of compensation.

A.L. reports personal investments in private biotech companies, biotech venture capital funds, and mutual funds. A.L. is a co-founder and partner of QLS Advisors, a healthcare analytics and consulting company; an advisor to BrightEdge Ventures; a director of BridgeBio Pharma, Roivant Sciences, and Annual Reviews; chairman emeritus and senior advisor to AlphaSimplex Group; and a member of the Board of Overseers at Beth Israel Deaconess Medical Center and the NIH’s National Center for Advancing Translational Sciences Advisory Council and Cures Acceleration Network Review Board. During the most recent six-year period, A.L. has received speaking/consulting fees, honoraria, or other forms of compensation from: AIG, AlphaSimplex Group, BIS, BridgeBio Pharma, Citigroup, Chicago Mercantile Exchange, Financial Times, FONDS Professionell, Harvard University, IMF, National Bank of Belgium, Q Group, Roivant Sciences, Scotia Bank, State Street Bank, University of Chicago, and Yale University.

## A Appendix

In this appendix, we include detailed results about clinical trial design (Sections A.1–A.4), epidemiological models (Sections A.5–A.9), and additional simulation results (Section A.11).

### A.1 Efficacy Analysis

The protective effect of a vaccine, that is, vaccine efficacy, is defined as [7]:

$$
\varepsilon = 1 - \frac{p_1}{p_0} = 1 - \frac{c_1 / n_1}{c_0 / n_0} \tag{A.1}
$$

where *ε* refers to the vaccine efficacy, *p*_{1} and *p*_{0} are the attack rates observed in the treatment arm and the control arm, respectively, *n*_{1} and *n*_{0} refer to the sample sizes of the treatment arm and the control arm, respectively, and *c*_{1} and *c*_{0} refer to the number of infections observed in the treatment arm and the control arm, respectively. The attack rate is defined as the fraction of a cohort at risk that becomes infected during the surveillance period. There are conflicting views on the possibility of human reinfections [49, 50]; for simplicity, we rule out recurrent infections in our simulations.
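As a minimal sketch of this definition, the Python function below computes the point estimate of efficacy as one minus the ratio of the observed attack rates. The function name and the example counts are hypothetical and are not taken from our simulations.

```python
def vaccine_efficacy(c1: int, n1: int, c0: int, n0: int) -> float:
    """Point estimate of vaccine efficacy, 1 - p1/p0, from infection counts.

    c1, n1: infections and sample size in the treatment arm.
    c0, n0: infections and sample size in the control arm.
    """
    if n1 <= 0 or n0 <= 0:
        raise ValueError("arm sizes must be positive")
    p1 = c1 / n1  # observed attack rate, treatment arm
    p0 = c0 / n0  # observed attack rate, control arm
    if p0 == 0:
        raise ValueError("efficacy is undefined when the control arm has no infections")
    return 1.0 - p1 / p0


# Hypothetical example: 50 infections among 15,000 vaccinated subjects
# versus 100 infections among 15,000 controls.
example_efficacy = vaccine_efficacy(50, 15_000, 100, 15_000)
```

With equal arm sizes the estimate reduces to one minus the ratio of infection counts, so halving the number of infections corresponds to an estimated efficacy of 50%.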

#### Superiority Testing

First, we consider superiority testing to determine the licensure of a vaccine candidate at the end of a clinical study, e.g., RCT, ORCT, or HCT. The aim is to demonstrate that the efficacy of the candidate in the prevention of infections is greater than zero. Such a criterion might be appropriate for emergency use authorization during a pandemic where no alternative treatments are available. For this, we consider the following null and alternative hypotheses:
The test statistic under the null hypothesis is given by:
where *z* is the test statistic. For large samples, *z* approximately follows the standard Normal distribution.
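A common large-sample statistic for this kind of comparison is the pooled two-proportion *z*-test. The sketch below assumes that pooled form, a one-sided alternative (lower attack rate in the treatment arm), and a 5% significance level; all three are illustrative assumptions, and the function name and example counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def superiority_z_test(c1, n1, c0, n0, alpha=0.05):
    """One-sided pooled two-proportion z-test (assumed form, for illustration).

    Returns (z, p_value, reject), where large positive z favors the vaccine.
    """
    p1, p0 = c1 / n1, c0 / n0                # observed attack rates
    pooled = (c1 + c0) / (n1 + n0)           # common attack rate under H0
    se = sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n0))
    if se == 0:
        raise ValueError("no variation in outcomes; the statistic is undefined")
    z = (p0 - p1) / se
    p_value = 1.0 - NormalDist().cdf(z)      # upper-tail probability
    return z, p_value, p_value < alpha


# Hypothetical counts: 50 vs. 100 infections in arms of 15,000 subjects each.
z_stat, p_val, reject = superiority_z_test(50, 15_000, 100, 15_000)
```

An unpooled variance estimate is an equally reasonable choice and gives similar values when the attack rates are small.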

The power of a vaccine efficacy study under superiority testing is given by [51, 52]:
where *α* is the level of significance, *β* refers to the type II error rate under the alternative hypothesis, *z*_{a} is the 100(1 − *a*)th percentile of the standard Normal distribution, *P*_{1} and *P*_{0} refer to the underlying (true) attack rates in the treatment arm and the control arm, respectively, and *E* refers to the true vaccine efficacy.

#### Superiority-by-Margin Testing

Next, we consider the case where superiority by margin (also known as super-superiority)— that is, a vaccine efficacy that is greater than some minimum threshold—must be demonstrated for full licensure:
where *ϑ* = *p*_{1}*/p*_{0}, and *θ* is a specified minimum threshold larger than 0 and smaller than 1. The test statistic under the null hypothesis is given by [51]:
where *χ*^{2} is the test statistic, which is evaluated at the large-sample approximations of the constrained maximum likelihood estimates of *P*_{1} and *P*_{0} under the null hypothesis (see below for closed-form solutions). For large samples, *χ*^{2} approximately follows a chi-square distribution with one degree of freedom.

The power of a vaccine efficacy study under superiority-by-margin testing is given by:

#### Asymptotics for Superiority-by-Margin Testing

The constraint is:
where the two estimates are the constrained maximum likelihood estimates of *P*_{1} and *P*_{0}, respectively, under the null hypothesis.

The closed-form solution is given by: The asymptotic approximation is:

### A.2 Adaptive Vaccine Efficacy RCT

We propose an adaptive vaccine efficacy RCT design (ARCT) based on group sequential methods. First, we consider an alternative definition of vaccine efficacy based on relative force of infection, as opposed to relative risk of infection in Eq. A.1:
where *λ*_{1} and *λ*_{0} refer to the force of infection in the treatment arm and the control arm, respectively, and *t*_{s} refers to the duration of the surveillance period. The force of infection of an infectious disease is defined as the expected number of new cases of the disease per unit person-time at risk. When the risk of infection is small, e.g., smaller than 0.10, the risk of infection is approximately equal to the cumulative force of infection [7].

Next, we note that the force of infection and the hazard function in survival analysis take the same functional form [7]. This suggests that infections can also be treated as time-to-event data, in addition to binary variables as in Eq. A.1. By performing Cox regression on the time-to-infection data of a clinical trial, we can estimate the efficacy of the vaccine candidate from the hazard ratio of the treatment arm versus the control arm:

$$
\lambda(t \mid z) = \lambda_{\text{baseline}}(t)\, e^{\beta z}, \qquad \varepsilon = 1 - e^{\beta}
$$

where *z* refers to the treatment variable, i.e., whether the subject is vaccinated or not, *λ*_{baseline} is the baseline hazard function, and *β* is the log hazard ratio. We note that the proportional hazards assumption is not unreasonable if we assume that the proportion of cases prevented by the vaccine is independent of the possibly non-homogeneous force of infection [7].

We consider the following null and alternative hypotheses based on the coefficient of the treatment variable in the Cox model:
where *β*_{0} is 1 for superiority testing and smaller than 1 for superiority-by-margin testing.

The test statistic under the null hypothesis is given by:
where $\hat{\beta}$ is the maximum partial likelihood estimate of *β*, se($\hat{\beta}$) is its standard error, and *z* is asymptotically Normal. This is also known as the Wald test. This statistic satisfies the criteria for group sequential testing [20], allowing us to perform periodic interim analyses of accumulating trial data, rather than just a single final analysis at the end of a traditional vaccine efficacy RCT (see Fig. A.1). Under the group sequential testing framework, we estimate a new Cox model at each interim calendar time point based on the infection data that have accrued up to that point over the course of the study surveillance period. At each interim analysis, we decide whether to stop the study early by rejecting the null hypothesis, i.e., approving the vaccine candidate, or to continue on to the next analysis by monitoring the subjects for a longer period of time [20].

We adopt Pocock’s test for sequential testing [53]. It involves repeated testing at successive interim analyses at some constant nominal significance level over the course of the study (see Algorithm 1). The critical value is chosen to satisfy the maximum type I error requirement, e.g., 5%.

In our simulations, we consider a maximum of six interim analyses spaced 30 days apart, with the first analysis performed when the first 10,000 subjects enrolled have been monitored for at least 30 days. To keep the type I error at 5%, we use a critical value of 2.453 for the test statistic at each interim analysis [53].

For each of the epidemiological-model and population-vaccination schedule assumptions, we compute the expected net value of ARCT over 100,000 Monte Carlo simulation paths. For each path, we track the infections data of 30,000 patients for up to 180 days of surveillance. In addition, we estimate up to six Cox proportional hazards models, one at each interim analysis. The simulation process is computationally intensive despite parallelization, requiring approximately 8 hours to complete on the MIT Sloan “Engaging” high-performance computing cluster using over 400 processors.

While we have considered a simple adaptive design in this paper, we note that our framework can be easily extended to other sequential boundaries such as the O’Brien & Fleming’s Test, to two-sided tests that allow for early stopping under the null hypothesis, i.e., early stopping for both futility and efficacy, and to flexible monitoring using the error spending approach, instead of using a constant nominal significance level for all interim analyses [20].

Algorithm 1: Pocock’s test. *k* refers to the *k*^{th} interim analysis, *K* refers to the maximum number of interim analyses planned, *z*_{k} refers to the test statistic at the *k*^{th} interim analysis, and *c*(*K, α*) refers to the critical value corresponding to the constant nominal significance level, which is a function of *K* and *α*, the maximum type I error allowed.
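The stopping rule of Pocock's test can be sketched in a few lines of Python. The sketch assumes that the test statistics are oriented so that large values favor the vaccine, and it includes a small Monte Carlo check of the constant 2.453 under two further assumptions: equally spaced information increments and a two-sided calibration at the 5% level. The function names, the seed, and the number of paths are hypothetical choices for illustration only.

```python
import math
import random


def pocock_test(z_stats, critical_value):
    """Apply a constant boundary at each interim analysis.

    Returns (decision, k), where k is the analysis at which the study stops.
    """
    for k, z_k in enumerate(z_stats, start=1):
        if z_k >= critical_value:
            return "reject H0", k          # stop early and approve
    return "fail to reject H0", len(z_stats)


def simulated_type_one_error(K=6, critical_value=2.453, n_paths=100_000, seed=0):
    """Estimate P(|z_k| >= critical_value for some k <= K) under the null.

    Assumes equal information increments, so z_k = S_k / sqrt(k), where
    S_k is a sum of k independent standard Normal increments.
    """
    rng = random.Random(seed)
    crossings = 0
    for _ in range(n_paths):
        partial_sum = 0.0
        for k in range(1, K + 1):
            partial_sum += rng.gauss(0.0, 1.0)
            if abs(partial_sum / math.sqrt(k)) >= critical_value:
                crossings += 1
                break
    return crossings / n_paths
```

Calling `simulated_type_one_error()` should return a value close to 0.05 under these assumptions. Interim analyses scheduled at calendar times generally do not carry equal information, so the realized error rate in a trial simulation can differ somewhat from this idealized check.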

### A.3 Trial Design Assumptions

### A.4 Financial Cost of Vaccine Efficacy Studies

There are many sources of costs involved in a clinical trial, e.g., patient recruitment and retention, medical and administrative staff, clinical procedures and central laboratory, site management, and data collection and analysis. For a back-of-the-envelope calculation, we assume that the cost per subject in a phase 3 vaccine efficacy trial is around US$5,000. This suggests a cost of US$150M for a study with 30,000 subjects, close to that estimated for rotavirus vaccines [54] in one of the very few studies that estimate the cost of vaccine development [55]. This figure is very high compared to the median expense of a phase 3 trial for novel therapeutic agents, estimated to be US$19M [56]. However, this is not surprising because vaccine efficacy studies are notorious for being costly due to their large sample sizes and lengthy follow-up durations. If we assume that challenge studies have a cost per subject that is ten times higher, i.e., US$50,000 per volunteer, the estimated cost of an HCT is approximately US$37.5M, where we have assumed a cost of US$5,000 per subject for the follow-up single-arm safety study comprising 5,000 subjects. This makes up just 25% of the cost of an RCT with 30,000 subjects.
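The back-of-the-envelope figures above can be reproduced with a few lines of arithmetic. The number of challenge volunteers is not restated in this paragraph, so the sketch backs it out from the quoted US$37.5M total; it should be read as implied by the arithmetic, not as an independent input. Variable names are hypothetical.

```python
COST_PER_RCT_SUBJECT = 5_000        # US$, phase 3 efficacy trial
RCT_SUBJECTS = 30_000
COST_PER_HCT_VOLUNTEER = 50_000     # US$, ten times the RCT per-subject cost
SAFETY_STUDY_SUBJECTS = 5_000       # follow-up single-arm safety study
HCT_TOTAL_COST = 37_500_000         # US$, total quoted in the text

# Cost of a 30,000-subject RCT.
rct_cost = COST_PER_RCT_SUBJECT * RCT_SUBJECTS

# Cost of the safety study that follows the challenge study.
safety_cost = COST_PER_RCT_SUBJECT * SAFETY_STUDY_SUBJECTS

# Challenge volunteers implied by the quoted total.
implied_hct_volunteers = (HCT_TOTAL_COST - safety_cost) / COST_PER_HCT_VOLUNTEER

# HCT cost as a fraction of the RCT cost.
hct_to_rct_ratio = HCT_TOTAL_COST / rct_cost
```

Under these inputs the safety study accounts for two-thirds of the HCT budget, so the comparison is more sensitive to the size of the safety study than to the per-volunteer cost of the challenge study itself.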

### A.5 SIRDC with Social Distancing (SIRDC-SD) Model

We assume that there is a constant population of *N* people. The number of people who are susceptible to infection, infected, resolving their infected status, dead, and recovered are denoted as *S*_{t}, *I*_{t}, *R*_{t}, *D*_{t}, and *C*_{t} respectively.
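The dynamics of the epidemic are governed by the following differential equations:

$$
\begin{aligned}
\frac{dS_t}{dt} &= -\beta(t)\,\frac{S_t I_t}{N} && \text{(A.21)} \\
\frac{dI_t}{dt} &= \beta(t)\,\frac{S_t I_t}{N} - \gamma I_t && \text{(A.22)} \\
\frac{dR_t}{dt} &= \gamma I_t - \theta R_t && \text{(A.23)} \\
\frac{dD_t}{dt} &= \delta\,\theta R_t && \text{(A.24)} \\
\frac{dC_t}{dt} &= (1 - \delta)\,\theta R_t && \text{(A.25)}
\end{aligned}
$$

Unlike most epidemiological models, the SIRDC-SD model assumes a contact rate parameter, *β*(*t*), that decreases exponentially over time at a rate of *λ* from an initial value of *β*_{0} to *β*^{*}, instead of a static one:

$$
\beta(t) = \beta_0 e^{-\lambda t} + \beta^{*}\left(1 - e^{-\lambda t}\right) \tag{A.26}
$$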
This dynamic *β*(*t*) incorporates the belief that social distancing over time will lead to a lower contact rate. This is particularly true in the U.S., where many cities have issued stay-at-home orders. Many people are also voluntarily wearing masks and are avoiding crowded places, which serve to reduce the contact rate.

The model also assumes that infections resolve at a Poisson rate *γ*, which implies that a person is infectious for a period of 1*/γ* on average. Thereafter, the person stops being infectious and transitions into the ‘resolving’ state. Resolving cases clear up at a Poisson rate of *θ*. There is an implicit assumption that people who recover from the virus gain immunity and cannot be reinfected.
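To make the compartment flows concrete, the following is a minimal forward-Euler sketch of an SIRDC model with an exponentially decaying contact rate. It assumes the standard SIRDC flow structure (susceptible to infected to resolving, then to dead with probability *δ* or recovered otherwise). The values of *γ*, *θ*, and *δ* are those used in our calibration, while the population size, initial infections, *β*_{0}, *β*^{*}, *λ*, and the step size are hypothetical values chosen only for illustration.

```python
import math


def simulate_sirdc_sd(N, I0, beta0, beta_star, lam,
                      gamma=0.2, theta=0.1, delta=0.008,
                      days=200, steps_per_day=10):
    """Forward-Euler simulation; returns daily (S, I, R, D, C) tuples."""
    dt = 1.0 / steps_per_day
    S, I, R, D, C = float(N - I0), float(I0), 0.0, 0.0, 0.0
    path = [(S, I, R, D, C)]
    for step in range(days * steps_per_day):
        t = step * dt
        # Contact rate decays exponentially from beta0 toward beta_star.
        beta_t = beta_star + (beta0 - beta_star) * math.exp(-lam * t)
        new_infections = beta_t * S * I / N * dt
        resolving = gamma * I * dt      # infectious -> resolving
        cleared = theta * R * dt        # resolving -> dead or recovered
        S -= new_infections
        I += new_infections - resolving
        R += resolving - cleared
        D += delta * cleared
        C += (1.0 - delta) * cleared
        if (step + 1) % steps_per_day == 0:
            path.append((S, I, R, D, C))
    return path


# Hypothetical inputs, not calibrated to any state.
path = simulate_sirdc_sd(N=1_000_000, I0=100, beta0=0.6, beta_star=0.15, lam=0.05)
```

Every flow leaves one compartment and enters another, so the total population is conserved at each step up to floating-point rounding. A production implementation would more likely use an adaptive ODE solver than a fixed-step Euler scheme.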

### A.6 Parameter Estimation/Calibration for SIRDC-SD Model

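Let *D*_{t} and *d*_{t} be the cumulative and daily numbers of deaths observed in the data at time *t*, respectively. Let variables with hats denote the model’s estimated values. We use the following optimization program to estimate the parameters of the model:

$$
\min \; \ln\!\left(\sum_t \left(D_t - \hat{D}_t\right)^2\right) + \ln\!\left(\sum_t \left(d_t - \hat{d}_t\right)^2\right) \tag{A.27}
$$

subject to:

$$
\begin{aligned}
I_0 &< N && \text{(A.28)} \\
R_0 &< I_0 && \text{(A.29)} \\
S_0 + I_0 + R_0 + D_0 + C_0 &= N && \text{(A.30)} \\
\beta_0 &> \beta^{*} && \text{(A.31)}
\end{aligned}
$$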
Our loss function is given by Eq. A.27, which says that we minimize the sum of 1) the natural logarithm of the sum of squared errors for the cumulative deaths, and 2) the natural logarithm of the sum of squared errors for the daily deaths. The minimization program is subject to four constraints. Eq. A.28 says that the initial number of infected must be less than the entire population. Eq. A.29 imposes that the number of initial resolving cases must be less than the number of initial infected cases. Eq. A.30 states that the conservation of population must hold at time *t* = 0, and Eq. A.31 constrains the initial contact rate to be greater than the final contact rate.

We set *γ, δ*, and *θ* to 0.2, 0.008, and 0.1, respectively, as suggested by [27].

The optimization program is solved using the constrained Trust-Region algorithm as implemented in the SciPy Optimize package for each of the 50 U.S. states and Washington, D.C. Our estimated parameters for each state are reported in Table A.3.
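The loss function and constraints described in this section can be written compactly in Python. The sketch below is a simplified illustration: it derives daily deaths by differencing the cumulative series, which is an assumption about how the two series relate, and the function names and example numbers are hypothetical. In practice these pieces would be passed to a constrained optimizer such as `scipy.optimize.minimize` with `method="trust-constr"`.

```python
import math


def calibration_loss(observed_cum, model_cum):
    """Log SSE of cumulative deaths plus log SSE of daily deaths.

    Note: the logarithm is undefined for a perfect fit (zero SSE).
    """
    if len(observed_cum) != len(model_cum) or len(observed_cum) < 2:
        raise ValueError("series must have equal length of at least 2")
    # Daily deaths obtained as first differences of the cumulative series.
    obs_daily = [b - a for a, b in zip(observed_cum, observed_cum[1:])]
    mod_daily = [b - a for a, b in zip(model_cum, model_cum[1:])]
    sse_cum = sum((o - m) ** 2 for o, m in zip(observed_cum, model_cum))
    sse_daily = sum((o - m) ** 2 for o, m in zip(obs_daily, mod_daily))
    return math.log(sse_cum) + math.log(sse_daily)


def satisfies_constraints(N, S0, I0, R0, D0, C0, beta0, beta_star):
    """Check the four calibration constraints on the initial conditions."""
    return (
        I0 < N                                       # infected below population
        and R0 < I0                                  # resolving below infected
        and math.isclose(S0 + I0 + R0 + D0 + C0, N)  # population conserved
        and beta0 > beta_star                        # contact rate declines
    )


# Hypothetical cumulative death series for illustration.
loss = calibration_loss([0, 10, 30, 60], [0, 12, 28, 63])
```

Taking logarithms of the two sums of squared errors puts the cumulative and daily series on comparable scales, so neither term dominates the fit simply because its raw errors are larger.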

### A.7 Infections and Deaths Across Scenarios

Fig. A.2 illustrates how the cumulative numbers of infections and deaths change over time given the different evolution paths of the epidemic and vaccination schedules. We assume that the epidemic evolves based on our scenarios after June 15, 2020, and that the vaccine is approved on March 13, 2021. The vaccine efficacy assumed is 50%.

### A.8 SIRDCV Model

We introduce two additional quantities: the number of persons vaccinated at every time step, and *ϵ*, the *effectiveness* of the vaccine. Effectiveness is defined as the performance of the vaccine under real-world conditions in a general population, whereas efficacy is defined as the ability to protect against a virus under ideal conditions in a homogeneous population. The former is usually less than the latter for several reasons, e.g., improper storage of vaccines leading to loss of potency and non-compliance with the vaccine dosing schedule. For simplicity, we assume that the effectiveness of the vaccine in the epidemiological model is identical to the efficacy of the vaccine in the clinical trials. Two additional compartments represent the stock of people who are inoculated and respond (*r*) and do not respond (*nr*) to the vaccine, respectively.
Eq. A.21 has been modified to remove vaccinated persons at every time step in Eq. A.32. We also modify Eq. A.22 to allow people who are vaccinated but do not respond to the inoculation to be infected in Eq. A.33. Eq. A.34 and Eq. A.35 keep track of the stock of people who are vaccinated. With this specification, the virus is allowed to spread even when the entire population is vaccinated because not everyone will respond to the mass inoculation.

### A.9 Evolution of the Epidemic

As mentioned in the main text, we model three different scenarios regarding the evolution of the epidemic after lockdown is relaxed. We explain them here. Below, *β*_{ss} is defined to be *max*(0.22, *β*(*T*_{υ})), where *β*(*T*_{υ}) is the value of *β* when the lockdown is released.

#### Status Quo

For the ‘status quo’ scenario, we will use the estimated dynamic *β*(*t*) to perform our forecast.

#### Ramp Response

For the ‘ramp’ scenario, we model *β*(*t*) with Eq. A.39. We have explained our rationale for this function in the main text (see Section 3.1).

#### Behavioral Response

The ‘behavioral’ scenario is modeled by making the percentage change in the contact rate parameter negatively proportional to the change in the observed death rate over an interval of *t*_{o}. That is,
Integrating Eq. A.40 will yield Eq. A.41.
The exponential of *c* is the long-term steady-state value of *β*. *k* can be interpreted as the percentage increase/decrease in *β* if there is a decrease/increase in the death rate. In our simulations, *t*_{0}, *c*, and *k* are set to 7, ln *β*_{ss}, and 50,000, respectively. The default scenario of *c* = ln 0.2 will correspond to an *R*_{0} of 1 when approximately 16,000 deaths per week are observed in the U.S. This behavior will start immediately on June 15, 2020, to be consistent with the second scenario.

The new contact rate parameter in this case is defined by Eq. A.42.

#### Illustration of the Evolution of Epidemic

We give an example of how *R*_{0} = *β/γ* may look for each of the scenarios in Fig. A.3. The actual evolution of *R*_{0} for a state may differ depending on the estimated parameters.

### A.10 Trade-off Between Time and Power

As mentioned in the main text, there is a trade-off between time and power. A shorter surveillance period will, *ceteris paribus*, reduce the power of the RCT. However, it will also reduce the time to licensure of the vaccine (if approved), which would prevent more infections and save more lives. Conversely, a longer surveillance period would increase the power of the RCT but also prolong the time it takes for the vaccine to be approved. We illustrate the interaction between power and infections avoided over time in Fig. A.4.

### A.11 Additional Simulation Results

## Footnotes

* We thank Arthur Caplan for helpful comments and discussion, Amanda Hu for research assistance, and Jayna Cummings for editorial support. The views and opinions expressed in this article are those of the authors only and do not necessarily represent the views and opinions of any other organizations, any of their affiliates or employees, or any of the individuals acknowledged above. Funding support from the MIT Laboratory for Financial Engineering is gratefully acknowledged, but no direct funding was received for this study and no funding bodies had any role in study design, data collection and analysis, decision to publish, or preparation of this manuscript. The authors were personally salaried by their institutions during the period of writing (though no specific salary was set aside or given for the writing of this manuscript). More detailed conflict of interest disclosures are provided after the Conclusion section of the main text.

^{1}While there exists the possibility of an expedited (conditional) licensure based on immunogenicity results with post-approval commitments, we find it unlikely to occur given the latest information.

^{2}The attack rate is the proportion of the susceptible population infected with a disease.

^{3}The true efficacy is distinct from the realized efficacy of the outcome of a given trial, which is unknown in advance (see footnote 12 for details).

^{4}For specificity, we report estimated times to licensure using calendar dates and provide the corresponding number of days in parentheses. However, our simulations do depend on calendar dates in one respect: the epidemiological model used to estimate the attack rates depends on current data. Therefore, the estimates reported in this paper are all based on extrapolated conditions as of August 1, 2020, and may need to be revised for other start dates.

^{5}The use of an active vaccine (e.g., vaccine against meningococcal bacteria) as control provides some benefit to the participants, making it more ethical. It also serves to ensure that the participants are unable to tell whether they received the COVID-19 vaccine based on side effects such as soreness at the injection site, reducing the possibility of behavioral changes that can bias the results of the study.

^{6}We note that secondary endpoints include the prevention of COVID-19.

^{7}In general, the higher the transmission rate, the shorter the surveillance period required to observe a statistically significant difference in infection risk between the treatment arm and the control arm (or the lack thereof) at the same level of significance and power, assuming a constant sample size and vaccine efficacy.

^{8}There is a trade-off between time and power: A shorter surveillance period will, *ceteris paribus*, reduce the power of the RCT. However, it will also reduce the time to licensure of the vaccine (if approved), which can potentially prevent more infections and save more lives. Conversely, a longer surveillance period will increase the power of the RCT but prolong the time it takes for the vaccine to be approved. See Fig. A.4 for an illustration.

^{9}While we have assumed interim analyses at periodic calendar time points here, we note that most vaccine efficacy trials are event based, e.g., performing interim analyses when pre-specified numbers of events occur.

^{10}There are multiple lineages of SARS-CoV-2 to choose from. In addition, a decision must be made between using a fully virulent or an attenuated strain of the SARS-CoV-2 virus.

^{11}We do not assume a 100% attack rate since the challenge strain used is likely weakened to reduce risk to volunteers, and some individuals might have innately stronger immune systems that can counteract the virus.

^{12}In our simulations, we consider a vaccine candidate with some efficacy *ε* and assume that infections in the clinical study follow a stochastic process (e.g., binomial distribution). Due to this randomness, false rejections of the efficacious vaccine might occur. This is also known as type II error. The false negative rate depends on the trial design (e.g., sample size, surveillance period, maximum type I error, superiority testing) and the epidemiological model (e.g., attack rate in the clinical study).

^{13}Assuming superiority testing, a vaccine efficacy of 50%, and the behavioral epidemiological model.

## References

- [1].
- [2].
- [3].
- [4].
- [5].
- [6].
- [7].
- [8].
- [9].
- [10].
- [11].
- [12].
- [13].
- [14].
- [15].
- [16].
- [17].
- [18].
- [19].
- [20].
- [21].
- [22].
- [23].
- [24].
- [25].
- [26].
- [27].
- [28].
- [29].
- [30].
- [31].
- [32].
- [33].
- [34].
- [35].
- [36].
- [37].
- [38].
- [39].
- [40].
- [41].
- [42].
- [43].
- [44].
- [45].
- [46].
- [47].
- [48].
- [49].
- [50].
- [51].
- [52].
- [53].
- [54].
- [55].
- [56].