## Abstract

Human Challenge Trials (HCTs) are a potential method to accelerate development of vaccines and therapeutics. However, HCTs for COVID-19 pose ethical and practical challenges, in part due to the unclear and developing risks. In this paper, we introduce an interactive model for exploring some risks of a SARS-CoV-2 dosing study, a prerequisite for any COVID-19 challenge trials. The risk estimates we use are based on a Bayesian evidence synthesis model which can incorporate new data on infection fatality rates (IFRs) to patients, and infer rates of hospitalization. We have also created a web tool to explore risk under different study design parameters and participant scenarios. Finally, we use our model to estimate individual risk, as well as the overall mortality and hospitalization risk in a dosing study.

Based on the Bayesian model we expect IFR for someone between 20 and 30 years of age to be 17.5 in 100,000, with 95% uncertainty interval from 12.8 to 23.6. Using this estimate, we find that a simple 50-person dosing trial using younger individuals has a 99.1% (95% CI: 98.8% to 99.4%) probability of no fatalities, and a 92.8% (95% CI: 90.3% to 94.6%) probability of no cases requiring hospitalization. However, this IFR will be reduced in an HCT via screening for comorbidities, as well as providing medical care and aggressive treatment for any cases which occur, so that with stronger assumptions, we project the risk to be as low as 3.1 per 100,000, with a 99.85% (95% CI: 99.7% to 99.9%) chance of no fatalities, and a 98.7% (95% CI: 97.4% to 99.3%) probability of no cases requiring hospitalization.

## 1 INTRODUCTION

As of November 15, 2020, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has led to over 54 million confirmed infections worldwide and over 1.3 million deaths. Currently, there are no clinically approved vaccines against COVID-19. McKinsey estimates that global vaccine production will reach one billion doses by the end of 2020, and eight billion by 2021, assuming the most advanced vaccine candidates pass clinical trials on time[2], which leaves a thin margin of error.

Timely development of new pharmacological therapeutics and vaccines will be necessary to manage disease burden and impact of COVID-19. Different paths to vaccines can exist, and more traditional clinical trials are slow and require large subject populations to discern therapeutic effects[18]. In order to accelerate testing, several authors and institutions have proposed intentionally exposing human subjects to SARS-CoV-2 to test novel interventions; to date, nearly forty thousand people have expressed interest in volunteering for this task[1]. Such human challenge trials (HCTs) have been useful in the past[19] to develop vaccines and treatments for other infectious diseases such as malaria, cholera, respiratory syncytial virus (RSV) [35, 13], influenza [41], and dengue fever [20]. However, two major problems currently stand in the way of conducting HCTs for SARS-CoV-2 infection. First, given lack of rescue therapies and our limited understanding of COVID-19’s risks, it is difficult to weigh the likely impacts of these studies on volunteers against the benefit to society, or to obtain informed consent from volunteers[33]. Second, we do not know what viral dose of SARS-CoV-2 should be given to volunteers.

To help address both concerns, we developed a model to help assess the risks participants will face in a hypothetical dosing study for COVID-19. This model uses data from a non-systematic review of data on COVID-19 risks (mortality and infection rates) and describes risks for individuals as well as the overall study risk. As both clarification on viral dose and viral risk are essential before starting HCTs, this work can help inform policymakers and potential volunteers about some risks concerning the process of using HCTs to accelerate vaccine and therapeutic development.

## 2 METHODOLOGY

We developed a three-component tool to understand and explain the relevant risks. The first component quantifies risk of COVID-19 mortality by using a Bayesian evidence synthesis model; the second uses that estimate, along with other data on gender-specific mortality and hospitalization risks to simulate the risk of a study with given characteristics; the third is a front end tool for allowing interactive exploration of risks from a study or to an individual.

### 2.1 Bayesian evidence synthesis model

We use a Bayesian meta-analysis approach to obtain an estimate of mortality risk. This form of modelling combines different sources of evidence with varying statistical power to obtain posterior distribution for mortality risk in different populations. In our case, we use age-specific, location-specific death and prevalence data to generate an estimate of the infection fatality rate (IFR), and estimates of relative risk in healthy individuals (vs the general population) to understand risk reduction in individuals who would participate in an HCT.

We use Bayesian methods because they allow us to best account for heterogeneity in IFRs across different populations (age groups, different countries and regions). Characterising this heterogeneity is important when assessing possible reductions in IFRs. For example, it can be argued that an HCT can use screening and provide medical care to achieve a rate of IFRs which is at least as low^{1} as the region or country with the lowest IFRs in our data.

Although existing statistical packages for meta-analysis (both Bayesian and frequentist) could easily be used to model event rates such as IFR[44, 8], these models may not work when no deaths are observed, as is often the case for COVID-19 in younger populations. To address this, we use death data and estimates of prevalence as inputs instead of IFRs. We then construct a new, reproducible model for IFRs.

Methodological details of the model are described in the Appendix. ^{2} We assume that the fixed effect of age and random effects of location on IFR are on logit scale, as is typical for meta-analysis models of binary data.

However, while our IFR estimates capture average risks within different age groups and even heterogeneity across regions or countries, they still refer to the general population (of a given age, in a given location). A prospective HCT participant would be screened for health issues and comorbidities, further reducing the risk in comparison to the members of general population. To account for this, we perform an additional analysis using OpenSAFELY[43], a large observational dataset on COVID-19 mortality factors that includes comorbidity and age information, as well as data on gender, comparing the risk in general population to lower-risk subpopulation^{3}. Similarly to adjustment for heterogeneity across general populations, the adjustment for screening can be turned on or off in our tool.

Input data for the Bayesian IFR model is based on a non-systematic review of the literature and earlier meta-analyses, particularly by Levin *et al* [21], but we have also opportunistically included other studies, and data from other official sources as detailed in the appendix. However, the list of studies is not fixed, since new and better characterized datasets are becoming available over time. For this reason, we have made sure that incorporating additional prevalence and death data from newer studies and/or updated data sets is straightforward. We will continually update the model to assure that any estimates provided to participants or used for decisions include all relevant data, rather than only using data that was available when the analysis was first performed.

We use the age- and gender-specific data from Salje et al. [36] for the rate of death of hospitalized patients to impute the relative risk of hospitalization based on our meta-analysis for mortality risk.

Our model is available publicly, together with input data and source code for the tool and under an open license, at https://github.com/1DaySooner/RiskModel.

### 2.2 Transforming individual risk into study risk

Once a suitable challenge virus is manufactured, itself a complex process [10], the risk of the individual from a challenge trial depends on the dose of virus given. The uncertainties about dose-response lead to a number of additional uncertainties about overall study risk. For other viruses, such as H1N1 and H3N2 influenza, a dose-response relationship has been found[26, 15]. The specific dose-response relationship, and its functional form must be determined experimentally, which is an outcome rather than an input of a dosing study like the one we are considering^{4}. This uncertainty is a key issue, so, as suggested by Morgan and Henrion[28], we advise that this structural uncertainty should not be treated as a probabilistic variable, and instead sensitivity analysis should be used to enable the consideration of a range of plausible outcomes.

Given the specifics of a study design, the relationship between individual risk and the risks in the overall study is straightforward, assuming independence of risk between individuals in the study^{5}. The risk to an individual of severe disease in the study given dose *d* is denoted *S*_{d}, and the risk of mortality is correspondingly *M*_{d}, so the probability that someone in a group of size *N* experiences the corresponding outcome is 1 − (1 − *S*_{d})^{*N*} and 1 − (1 − *M*_{d})^{*N*}, respectively. By simulating the probability of impact for each dosed group and, in the case of more complex studies, conditioning the trial of later groups on the results of earlier ones, we can find the overall risk in more complex studies.
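The group-level formula can be checked numerically. The following is a minimal Python sketch, assuming independent and identical per-participant risk; the function names are illustrative and are not taken from the published tool, and the inputs are the point estimates quoted in the abstract, so posterior uncertainty is ignored.

```python
def prob_no_events(individual_risk: float, n_participants: int) -> float:
    """Probability that none of n independent participants has the outcome."""
    if not 0.0 <= individual_risk <= 1.0:
        raise ValueError("individual_risk must be a probability in [0, 1]")
    return (1.0 - individual_risk) ** n_participants


def prob_any_event(individual_risk: float, n_participants: int) -> float:
    """Group-level risk 1 - (1 - risk)^N of at least one outcome."""
    return 1.0 - prob_no_events(individual_risk, n_participants)


if __name__ == "__main__":
    n = 50  # size of the simple dosing study discussed in the paper
    # Point estimates of per-infection mortality risk quoted in the abstract.
    for label, ifr in [("baseline", 17.5 / 100_000), ("adjusted", 3.1 / 100_000)]:
        print(f"{label}: P(no deaths in {n}) = {prob_no_events(ifr, n):.4f}")
```

With these plug-in values the sketch gives approximately 99.1% and 99.85% probability of no fatalities, in line with the figures reported in the abstract.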

In the current version we have restricted the tool to a simple *N*-person study. This means the tool provides an upper bound for risk of mortality and hospitalisation, higher than in a more complex dose-escalation study^{6}. However, the underlying model structure allows simulation of dose-dependent and/or conditional study designs, as necessary, to accommodate dose response information and allow for the simulation of more complex trials.
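To illustrate why a simple *N*-person study bounds a conditional design from above, the sketch below compares the expected number of adverse outcomes with and without a stopping rule. The stopping rule (halt the study after any group in which an outcome occurs), the constant per-person risk across groups, and the function names are all assumptions made for illustration; they do not describe the design implemented in the tool.

```python
from typing import Sequence


def expected_events_simple(risk: float, group_sizes: Sequence[int]) -> float:
    """Expected outcomes when every group is dosed regardless of results."""
    return risk * sum(group_sizes)


def expected_events_with_stopping(risk: float, group_sizes: Sequence[int]) -> float:
    """Expected outcomes when a group is dosed only if all earlier groups
    had no outcomes (hypothetical stopping rule)."""
    expected = 0.0
    p_reach = 1.0  # probability that every earlier group was event-free
    for size in group_sizes:
        expected += p_reach * size * risk
        p_reach *= (1.0 - risk) ** size
    return expected


if __name__ == "__main__":
    sizes = [10, 10, 10, 20]  # three dose groups plus a 20-person cohort
    risk = 17.5 / 100_000     # baseline mortality point estimate
    print(expected_events_simple(risk, sizes))
    print(expected_events_with_stopping(risk, sizes))
```

Under these assumptions the conditional design never has a higher expected number of outcomes than the simple one, and a lower per-person risk in the early, low-dose groups would widen the gap further.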

### 2.3 Interface

The web tool we built allows exploration of two related types of risk. The first allows individuals to explore their personal risk, for example, by gender and age, if they volunteer to participate in a study, while the second displays overall risk of a single round dosing study with a given number of participants. Assumptions on study design and the underlying risk can be adjusted to allow interactive exploration by policymakers and participants.

In considering personal risk, the tool uses pre-calculated outputs from the risk model to calculate and display predictions for hospitalization rate and death rate for individuals as a function of age and sex.

Deaths are further translated into micromorts (the expected number of deaths per million events), a standard method of showing mortality risk[16] often used for patient consent [3]. Quantities which cannot be transformed into micromorts are presented as probabilities^{7}.
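The conversion to micromorts is a simple rescaling. A minimal sketch follows; the function name is illustrative and the input is the baseline point estimate from the abstract.

```python
def to_micromorts(probability_of_death: float) -> float:
    """Convert a probability of death into micromorts (deaths per million)."""
    if not 0.0 <= probability_of_death <= 1.0:
        raise ValueError("probability_of_death must be in [0, 1]")
    return probability_of_death * 1_000_000


if __name__ == "__main__":
    baseline_ifr = 17.5 / 100_000  # point estimate for ages 20-29
    print(f"{to_micromorts(baseline_ifr):.0f} micromorts")
```

For example, an IFR of 17.5 per 100,000 corresponds to 175 micromorts.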

The tool allows both the public and policymakers alike to explore how overall risks change depending on differences in study design. This also helps maintain transparency into clinical trial design concerns, thereby better informing potential challenge trial volunteers. We also note that for trial designers and ethicists, the relationship between risk of different impacts and compensation is critical[14, 33, 5].

The importance of transparency and public engagement has been widely noted in the literature on challenge trials[17]. For this reason, the tool is available for public use, and is already being used to inform people who have volunteered to be contacted to potentially participate in a challenge trial[1].

## 3 RESULTS

The analysis dataset for age-specific IFRs contains 114 data points from 23 studies, with each containing between 3 and 11 different age groups; all data are presented in Table 2 in Appendix and are included in the code repository accompanying this paper. A glance at available data also confirms the necessity of using a more complicated modelling approach: out of 114 age-specific estimates, only 24 contain individuals aged between 20 and 30. However, out of these 24, only 2 have median age that falls between 20 and 30.

The basic evidence synthesis model, which includes all data, but does not adjust for health status, finds that average IFR in 20-29 age group for the studies included in this analysis is 17.5 per 100,000 cases (95% uncertainty interval^{8} 12.7 to 23.3). Extending the HCT population to also include 30-39 year olds would lead to mean IFR of 29.9 per 100,000 (95% interval 21.9 to 39.9)^{9}.

Based on OpenSAFELY data, we estimate that in healthy population (defined as lack of any co-morbidities listed above), the average mortality risk in 20-29 year olds^{10} is 1.9 times lower than in the general population, with 95% interval from 1.3 to 2.8.

We also note that there is large heterogeneity across the studies, due to treatment availability and other factors^{11}. Because the HCT volunteer population will receive the best available care, including the most up-to-date treatment options, we can consider our population akin to a best-case scenario. In Belgium, the estimated risk for 20-29 year olds is 5.74 × 10^{−5}, approximately the estimated risk across all populations. Expanding the study population to 20-39 year olds would result in mean IFR of 9.88 per 100,000 cases.

Additionally, in the meta-analysis we estimate a population-level IFR by age, which does not account for the health status of screened volunteers. Looking at the OpenSAFELY dataset for the UK, we find a mean risk reduction factor in healthy populations aged 20-29 equal to 1.9, although our estimate is uncertain, with 95% interval from 1.3 to 2.8. This adjustment can be used to modify the risk estimate for the general population across studies.
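As a rough illustration of how the screening adjustment modifies a general-population estimate, the sketch below divides a baseline IFR by the risk reduction factor. This is a plug-in calculation on point estimates and interval endpoints, which we assume here for simplicity; it is not the posterior computation, and the function name is hypothetical.

```python
def screened_risk(general_population_risk: float, risk_reduction_factor: float) -> float:
    """Scale a general-population risk down by a screening reduction factor."""
    if risk_reduction_factor <= 0.0:
        raise ValueError("risk_reduction_factor must be positive")
    return general_population_risk / risk_reduction_factor


if __name__ == "__main__":
    baseline_ifr = 17.5 / 100_000          # ages 20-29, general population
    for factor in (1.3, 1.9, 2.8):         # interval endpoints and mean
        adjusted = screened_risk(baseline_ifr, factor)
        print(f"factor {factor}: {adjusted * 100_000:.1f} per 100,000")
```

With the mean factor of 1.9, the baseline of 17.5 per 100,000 becomes roughly 9.2 per 100,000.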

It is unclear how much double-counting occurs when adjusting for both available medical care and for screening out comorbidities. However, using this estimate, we find that a simple 50-person challenge trial has a 99.85% probability of having no induced fatalities, and a 98.6% probability of having no cases serious enough to require hospitalization.

Due to lack of reliable data, the current model does not include estimates of longer term impacts, and instead includes a more qualitative discussion of these risks[38], but the model and interface will be updated with such estimates as they become available.

The model interface can be used to explore how these uncertain factors can interact, as shown below in Figure 2.

The overall study risk for a simple 50-person study is a 0.15% probability of any deaths, and a 1.4% probability of at least 1 hospitalization. This represents an upper bound of risk for 3 groups of 10 volunteers given different doses of SARS-CoV-2, and an additional 20-person cohort to ensure sufficient sample size.

While the estimate is specific to a dosing study, it can also be useful for understanding the risk of later vaccine trials, though in that case other factors, including the possibility of vaccine-enhanced infection, would need to be assessed.

## 4 DISCUSSION

We have demonstrated that we can model certain risks associated with human challenge trials. However, our model is incomplete, and considerable concerns remain about the risks of human challenge trials. While a full accounting of challenge trial ethics is beyond the scope of this paper, we consider several factors below that inform our work on human challenge trial design, especially unmodeled risks. For a more complete perspective on the ethics of COVID-19 HCTs, we direct the reader to the World Health Organization’s (WHO) key considerations [31] and the recent discussion in Lancet Infectious Diseases [17, 24] on how these issues are being addressed.

In our modal case, it is clear that the risk of our dosage-response study is not higher (and indeed likely far lower) than the risk from comparable clinical infections in the general population, and is far lower than other risks that are widely viewed as acceptable. Further, human challenge trials have historical precedent, showing promise for both less lethal human coronaviruses, and for yellow fever[37]. They also provided early indications regarding the possible efficacy of a leading malaria vaccine candidate[30]. The discussions about earlier trials show that the ethical challenges of HCTs should not be seen as unique, but rather as lying along a natural continuum of clinical studies [12]. That said, the initial human challenge trials should be held to high ethical standards, both for individual risks and to preserve public trust in scientific and medical progress. That includes emphasis on fully informed consent of the participants, and as noted by the WHO, higher than typical ethical standards[31].

### 4.1 Opportunity Cost

When evaluating clinical trial design, it is not sufficient to evaluate whether the proposed model is good *ex nihilo*. One must consider whether the alternatives are better. The key benefit of a dosing study is to allow further research with challenge trials, and alternative clinical trial models have major practical difficulties and far higher costs in a variety of ways. While this is not relevant for the initial set of vaccines that are close to approval, the challenge of large scale trials is magnified for later vaccines.

For example, “standard” phase 3 efficacy trials for an ongoing novel pandemic rely on high numbers of trial participants, and require this large sample for each vaccine or treatment that is trialled. Such trials are relatively expensive, and expose more trial participants to negative side effects of a given treatment, so that in many scenarios HCTs have been shown to be superior [4]. Standard trials are also difficult to pursue after an initial vaccine is available, and in the likely and hoped-for case that there will soon be at least one vaccine, finding willing participants for later and perhaps more effective vaccines, or ones with fewer side effects, is harder, and the burden of proof for benefit is far higher, or not achievable given plausible sample sizes, without an HCT.

Assuming a log-linear relationship between age and IFR, we estimated that IFR increases on average 2.93-fold per each additional decade of age. We also estimate IFR of 1.75 per 10,000 among individuals aged 20-30, with considerable heterogeneity in IFR across locations, ranging from 6.29 per 10,000 to 5.7 per 100,000. These results align with two recent meta-analyses of IFRs by O’Driscoll *et al* and by Brazeau *et al* [32] [6]. Both studies also show log-linear relationships between age and IFR, as well as considerable variation in IFR by context. O’Driscoll *et al* report median IFRs of 0.6 and 1.3 per 10,000 among those aged 20-24 and 25-29, respectively, while Brazeau *et al* report 3 and 4 per 10,000 respectively.

### 4.2 Limitations

It is important to note that our model has limitations. Hospitalizations and deaths among 20-30 year olds are rare, so our prior knowledge of fatalities in this group is more uncertain due to limited data. Information on long-term damage caused by COVID-19 is similarly incomplete, and though this is discussed further below, our model does not currently account for that risk. Also note that although our model uses hospitalization as a proxy for the upper bound of serious nonfatal COVID-19 cases, more data are required to see if this is an accurate assumption.

Finally, our model may not accurately capture changes in COVID-19 risks over time. It also does not estimate any indirect risks of the study. We stress that our model is not a comprehensive analysis of all available risks, but rather a tool quantifying certain known risks that can be used by trial participants and policymakers.

### 4.3 Non-modeled Risks

We also note that there are several impacts we do not model in the study, most notably the concern about so-called Long-COVID, which is a catch-all term referring to a combination of persistent symptoms, i.e. slow recovery, and new post-recovery symptoms[7]. It is understood that for some cases, especially severe ones, recovery from COVID-19 can take months. In others, there are longer-term symptoms differing from those experienced during the infection, perhaps similar to Post-SARS syndrome[34, 27]. At the same time, COVID-19 recovery has been found to be faster in younger, healthier patients[40], which may mean the risk is lower in this group. It seems clear that the risk is a subject of continuing scientific investigation, and as it becomes better understood and quantified, it will be incorporated into both the risk model, and the web tool. Until then, the model contains an embedded overview of what is currently understood and known or unknown about longer term risks[38].

We also note that vaccine-induced disease enhancement is a critical concern for vaccine challenge trials, but is not relevant to dosing studies. Still, this risk must be considered in analyses of risks for later trials, and the model would need to be adapted or supplemented to consider this.

## 5 CONCLUSION

Human challenge trials are not risk free, but the balance of risk and benefits seems to clearly favor allowing them, as a large group of experts has argued[39]. This conclusion is disputed by some, but all decisions are made despite uncertainties and debate, whether empirical or moral[22, 23]. The question is whether the balance of both empirical and moral factors leads to the indicated conclusion. The alternative is a failure to act due to misguided risk-aversion, or worse, using uncertainty and disputed moral claims as a positive stance to shut down further work, as has occurred in the debate about HCTs[25].

It seems likely that challenge trials are a viable way to rapidly test vaccine efficacy, which is particularly critical now for testing second-generation vaccines, which may prove superior to first-generation vaccines, or at least help fill the demand unmet by first-generation candidates. A dosing study is an urgent first step, and the risk estimates and tools developed for this paper can assist in planning such studies and informing volunteers.

Our model provides insight into the overall risk of a trial of a given size, and can better inform HCT participants about the dangers they face. Given that an HCT may help select the multiple vaccines necessary for global immunization while also assisting with therapeutic testing, the risk of an initial study into SARS-CoV-2 pathogenesis seems justified. If the dosing study is successful, future HCTs of COVID-19 may provide a rapid and systematic way of screening vaccine candidates for efficacy and safety, which is a significant benefit.

The results presented here are a useful static estimate of risk. The model is already being used to inform potential volunteers [1], and can be adapted and expanded in the future. Given the evolving understanding of the disease, the model should be continually updated with additional data on mortality and hospitalisation risks or other long-term risks. This will contribute to the discussion of whether or not to pursue challenge trials, which can help with response to COVID-19 in a variety of ways[29]. Challenge trials may be an important tool for fighting COVID-19, and our model is a step towards that goal.

## Data Availability

All data and code are available on GitHub, as linked in the paper.

## Acknowledgements

Thanks to Chris Choe, Troy Yamaguchi, and Linchuan Zhang for help preparing and editing the manuscript.

## Appendix: Bayesian model of COVID-19 mortality risk in HCT volunteers

### A-1 INTRODUCTION

This short document is a technical appendix to the paper discussing COVID-19 risks in human challenge trials. Here, we show how a Bayesian model can synthesise information on many infection fatality rates (IFRs) into a single estimate. This estimate is specific to certain age groups and can be further adjusted by e.g. co-morbidity status. The analysis presented here is a form of Bayesian meta-analysis, and our primary objective is to weigh sources of evidence in a way that captures both variability (here, heterogeneity in real IFRs across different settings) and uncertainty (here, the fact that we do not know the IFRs in each setting precisely).

The ultimate objective of this model is to characterise risk in a way that is useful for design of HCTs. Therefore, as a minimum, we want to incorporate variability across different populations into our prediction. Even better would be to understand how different factors can drive heterogeneity: *a priori* we hypothesise that the three main drivers of differences in IFRs are time-specific, population-specific and otherwise country-specific^{1}.

To characterise differences in observed IFRs we first develop a Bayesian model and apply it to publicly available summary data on IFRs from multiple countries and contexts, with particular focus on the impact of age. This is covered by Section A-2. We then use a simple model to hypothesise reduction in risk that may be achieved by screening individuals for comorbidities; this is Section A-2.2. We summarise all results in Section A-2.3.

### A-2 AGE-SPECIFIC RISK OF COVID-19 MORTALITY

#### A-2.1 Bayesian evidence synthesis model

What follows is an adaptation of typical methods of Bayesian evidence synthesis to analysis of IFRs. IFR is the ratio of deaths to infections in a given population. Early estimates of Covid-19 mortality risk, e.g. by Verity et al. [42], placed it at over 0.6%; however, it was also evident from data that IFR could be orders of magnitude higher in particular high risk groups, especially in the elderly, than in the general population^{2}.

By definition, our data on IFRs is a combination of data on deaths with data on infections. Typically, these are disjoint samples, in that numbers of infections are estimated (typically very imprecisely) on select subpopulations, while deaths are recorded in the general population (at a level of country, administrative region etc.). There are clear reasons to believe that IFRs will differ across studies (e.g. due to age, comorbidity status, time, genetic factors, quality of healthcare etc.). To address this, we will use a Bayesian hierarchical modelling framework to assume that the setting-specific estimates of *IFR*_{k} can differ from each other but are linked through some common parameters. (By *k*’s we denote different populations; note that sometimes we may have multiple *IFR*’s from different age groups in the same location.)

The most straightforward and “canonical” way to implement such a Bayesian model is by modelling log odds of the event^{3}. Deeks [11] presents a general treatment of such an approach in medical statistics. Note that for very rare events the odds of mortality are very similar to the probability of mortality, but we model events on the odds scale as a good “generic” approach to modelling binary data (in this case death following infections)^{4}.
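The closeness of odds and probability for rare events is easy to verify. The sketch below is a small illustration using the point estimate for 20-29 year olds; the helper names `logit` and `inv_logit` are our own.

```python
import math


def logit(p: float) -> float:
    """Log odds of a probability strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Inverse of the logit transform."""
    return 1.0 / (1.0 + math.exp(-x))


if __name__ == "__main__":
    p = 1.75e-4                      # IFR point estimate, ages 20-29
    odds = p / (1.0 - p)
    print(f"probability = {p:.6e}, odds = {odds:.6e}")
    print(f"relative difference = {(odds - p) / p:.2e}")
```

The relative difference between odds and probability equals the odds themselves, so at risks of this magnitude the two scales are practically interchangeable.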

Basic models for this type of analysis of binary data can be implemented using existing statistical analysis packages (see, for example, *metafor* package in R or *baggr* by Więcek and Meager [44]), by treating IFR as a logit-normal parameter to meta-analyse. However, note that when no deaths are observed, analysis of IFR (equal to observed deaths divided by modelled infections) is problematic. Therefore we propose a “custom” model built in Stan which treats deaths and *prevalences* (rather than the IFRs) as data.

Let *d*_{k} denote observed deaths for data point *k* and assume that the logit of the corresponding prevalence, *p*_{k}, is a parameter (typically informed by statistical modelling papers, government reports etc.). Total population in the *k*-th setting is *n*_{k}. Total number of estimates is *K*. Then the model likelihood is as follows:

$$d_k \sim \mathrm{Binomial}\left(n_k,\ \mathrm{IFR}_k \cdot \mathrm{logit}^{-1}(p_k)\right), \qquad p_k \sim \mathcal{N}\left(\mu_k^{(p)},\ \left(\sigma_k^{(p)}\right)^2\right), \qquad k = 1, \dots, K$$

where $\mu_k^{(p)}$ and $\sigma_k^{(p)}$ are parameters obtained from the literature (or converted from these parameters; see next section). The *K* data points collected can span many locations (studies); we denote them by loc_{k} and the total number of locations by *K*_{loc} (with *K*_{loc} < *K*).
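A minimal sketch of how deaths and prevalence can both enter a likelihood as data is given below. It assumes that deaths are binomial with success probability equal to IFR times prevalence, and that logit prevalence is normal around its literature value; both the binomial form and all function and argument names are assumptions for illustration, not a transcription of the Stan model.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def binomial_log_pmf(k: int, n: int, prob: float) -> float:
    """Log probability of k successes in n trials."""
    if not 0.0 < prob < 1.0 or not 0 <= k <= n:
        raise ValueError("need 0 < prob < 1 and 0 <= k <= n")
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(prob) + (n - k) * math.log1p(-prob)


def normal_log_pdf(x: float, mean: float, sd: float) -> float:
    z = (x - mean) / sd
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * z * z


def log_likelihood(deaths: int, population: int, logit_prev: float,
                   logit_ifr: float, prev_mean: float, prev_sd: float) -> float:
    """Assumed form: deaths ~ Binomial(population, IFR * prevalence),
    logit prevalence ~ Normal(prev_mean, prev_sd)."""
    death_prob = inv_logit(logit_ifr) * inv_logit(logit_prev)
    return (binomial_log_pmf(deaths, population, death_prob)
            + normal_log_pdf(logit_prev, prev_mean, prev_sd))


if __name__ == "__main__":
    # Hypothetical setting with zero recorded deaths and 5% prevalence.
    value = log_likelihood(0, 100_000, logit(0.05), logit(1.75e-4), logit(0.05), 0.2)
    print(f"log-likelihood with zero deaths: {value:.3f}")
```

Unlike a logit transform of an observed IFR, this quantity stays finite when no deaths are recorded, which is the reason for modelling deaths and prevalences directly.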

In this model we can also account for various covariates impacting the IFRs, such as age groups (which we identify with the median age of the population being studied, MedianAge_{k}). We code them in a design matrix *X* and denote by *N*_{p} the number of its columns. To center *X* at the value of interest in our model (risk in 20-30 year olds), we use the transformation MedianAge/10 − 2.5. We assume the impact on IFR is on logit scale, same as in the “canonical” logistic models of binary data that we mentioned above:

$$\mathrm{logit}(\mathrm{IFR}_k) = \theta_{\mathrm{loc}_k} + X_k \beta, \qquad \theta_l \sim \mathcal{N}(\tau, \sigma^2), \quad l = 1, \dots, K_{loc}$$

This means *θ* spans location-specific (random) effects on IFR while *β* is an *N*_{p}-dimensional vector of (fixed) covariate effects.
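The role of the location index and the covariate row can be sketched as follows, assuming the linear predictor on the logit scale is the location effect plus the covariate effects. The location effects used in the example are hypothetical values, and the function name is illustrative.

```python
import math
from typing import Sequence


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def setting_ifr(theta: Sequence[float], loc: int,
                x_row: Sequence[float], beta: Sequence[float]) -> float:
    """IFR for one data point: inverse logit of the location effect theta[loc]
    plus the fixed covariate effects x_row . beta."""
    if len(x_row) != len(beta):
        raise ValueError("x_row and beta must have the same length")
    covariate_effect = sum(x * b for x, b in zip(x_row, beta))
    return inv_logit(theta[loc] + covariate_effect)


if __name__ == "__main__":
    theta = [-8.65, -9.20]   # hypothetical location effects on the logit scale
    beta = [1.07]            # age coefficient (per decade), from the results
    x_row = [1.0]            # median age 35: 35 / 10 - 2.5 = 1.0
    for loc in range(len(theta)):
        print(f"location {loc}: IFR = {setting_ifr(theta, loc, x_row, beta):.2e}")
```

Two data points from the same location share `theta[loc]` and differ only through their covariate rows.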

We implement our model in Stan and assume weakly informative priors on all parameters, with prior for *τ* centered at 1 death per 10,000 cases.

#### A-2.2 Data

We used estimates originally collected by Levin, Cochran, and Walsh (2020) to construct the first version of analysis dataset, which we then supplemented with more values extracted from other studies. The input data into our model consists of deaths (treated as known) and prevalences (treated as logit-distributed parameter with known mean and SD) in all reported age groups in all studies^{5}.

All of the input data are given in Table 2. The analysis dataset contains 114 data points from 23 studies, each containing between 3 and 11 different age groups. We made only minimal modifications to source data, by 1) imputing the values in the Italian fatality data based on a seroprevalence survey, 2) imputing population size in Maranhao (as the ratio of the reported number of infections and the mean infection rate), which was not reported, and 3) assuming that uncertainty in prevalence in the 0-29 age group in Iceland is the same as in the 30-39 age group, since data were missing.

As mentioned, our model treats number of COVID-attributable deaths as measured without error (due to lack of data) but accounts for uncertainty in infection rates, which are always model-based estimates extracted from various available data sources. In preparing data, we assumed that logits of prevalence estimates from available studies are normally distributed, which seems to reproduce majority of data very well, see Figure 7 in the Supplement. There are some discrepancies with studies that allowed for prevalence estimates to be 0, something that our logit model does not allow.

For each study we construct a median age scalar defined by the average of the endpoints of each age range, rather than attempting to calculate a population-weighted mean.
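The construction of the age covariate can be sketched in two steps: take the midpoint of the age range, then apply the centering transformation MedianAge/10 − 2.5. The function names are illustrative, and the age ranges in the example are groupings mentioned in this appendix.

```python
def midpoint_age(lower: float, upper: float) -> float:
    """Median age proxy: average of the endpoints of an age range."""
    if upper < lower:
        raise ValueError("upper must not be smaller than lower")
    return (lower + upper) / 2.0


def centered_age_covariate(median_age: float) -> float:
    """Age in decades, centered so that 25 years maps to zero."""
    return median_age / 10.0 - 2.5


if __name__ == "__main__":
    for lower, upper in [(20, 29), (19, 49), (20, 39)]:
        age = midpoint_age(lower, upper)
        print(f"{lower}-{upper}: midpoint {age}, covariate {centered_age_covariate(age):.2f}")
```

A group such as 19-49 therefore enters the regression at a midpoint of 34 years, well above the 20-30 range of primary interest, which is why the model regresses on age rather than using young-adult data alone.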

Our approach of regressing on the median age and use of all available data (rather than the subset of data available in younger adults only) is necessitated by data limitations: out of 114 data points comprising age-specific estimates of prevalence (or IFR) and counts of deaths, 24 contain individuals aged 20-30 who are of primary interest to us. However, the populations are mixed with regards to age, with typical age groupings such as 19-49, 20-49, 20-39, 0-49 used instead. In fact, we find only one estimate out of 24 that is entirely specific to the 20-29 age group (Brazilian state of Maranhao), while one more has median age falling between 20 and 30 but is not specific to that age group.

#### A-2.3 Results

There were no issues with convergence of the Bayesian model. We set number of iterations to 5,000 and used 4 chains, with max_treedepth option set to 15. There were no divergent transitions and effective sample size was greater than 3190 for all of 368 modeled parameters (this number includes fitted prevalences, IFRs, *θ*’s and their transformations into/from logit scales). For the three main parameters in the model we obtained the following:

The posterior mean of *β*, 1.07, corresponds to a 2.93-fold increase in mortality risk following an infection for each extra decade of age (95% uncertainty interval from 2.87 to 2.99).

From these parameters we can predict average risks for subjects of any given age *x*, by using the posterior distribution of *τ* + (*x*/10 − 2.5)*β* on the logit scale (where 2.5 and 10 refer to the transformation that we applied to MedianAge inputs, i.e. expressing age in decades and centering at 25 years).
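The prediction step can be sketched as follows, assuming the MedianAge transformation is (age − 25)/10 (decades, centered at 25 years, as described in the model footnote). This sketch uses plug-in point values rather than the full posterior: `TAU` is taken as the logit of the reported mean IFR at age 25 and `BETA` as the reported posterior mean, so it only approximates what the posterior draws would give. All names are illustrative.

```python
import math


def inv_logit(x: float) -> float:
    """Inverse of the logit transform."""
    return 1.0 / (1.0 + math.exp(-x))


def scaled_age(age_years: float) -> float:
    """Assumed MedianAge transform: decades, centered at 25 years."""
    return (age_years - 25.0) / 10.0


# Plug-in values (assumptions for illustration, not posterior draws):
# logit of the reported mean IFR at age 25, and the reported mean of beta.
TAU = math.log(1.75e-4 / (1.0 - 1.75e-4))
BETA = 1.07


def predicted_ifr(age_years: float, tau: float = TAU, beta: float = BETA) -> float:
    """Average IFR at a given age from the linear predictor on the logit scale."""
    return inv_logit(tau + scaled_age(age_years) * beta)


ifr_25 = predicted_ifr(25.0)
ifr_35 = predicted_ifr(35.0)
fold_per_decade = ifr_35 / ifr_25  # close to exp(BETA) because the IFR is small
```

With the full model one would apply `predicted_ifr` to each posterior draw of *τ* and *β* and summarise the resulting distribution, rather than plugging in point values.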

#### A-2.4 Average infection fatality risk in young subjects

Since we centered our MedianAge at 25 years in constructing our matrix *X*, we can now obtain the model-estimated risk for a typical HCT population (aged 20 to 30, with median 25) by ignoring the *β* coefficient and examining *τ* and *σ* only. We find that the average IFR for this group (equal to logit^{−1}(*τ*)) is 1.75 × 10^{−4} (with 95% interval from 1.28 × 10^{−4} to 2.36 × 10^{−4}). That means, on average, slightly under 2 deaths per 10,000 infections in the studied datasets.

##### A-2.4.1 Heterogeneity in IFRs

However, there is considerable variability in IFRs across different locations/datasets that we should consider. To take into account the parameter *σ*, we can generate draws from the N(*τ, σ*^{2}) distribution, corresponding to a hypothetical IFR in a new source of data. The 95% interval for such draws runs from 4.39 × 10^{−5} to 6.94 × 10^{−4}. Since the model works on a logit scale, another way of interpreting the across-dataset variability is to report the fold-impact of *σ* on the mean IFR; here, we obtain on average a 3.96-fold increase (decrease) in IFR per 2*σ* increase (decrease).
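The calculation for a hypothetical new data source can be sketched with plug-in values. Here `TAU` is the logit of the reported mean IFR and `SIGMA` is backed out from the reported 3.96-fold impact of 2*σ*; both are assumptions for illustration. The sketch ignores posterior uncertainty in *τ* and *σ*, so its interval is somewhat narrower than the reported one.

```python
import math
import random


def inv_logit(x: float) -> float:
    """Inverse of the logit transform."""
    return 1.0 / (1.0 + math.exp(-x))


# Plug-in values (assumptions, not posterior draws).
TAU = math.log(1.75e-4 / (1.0 - 1.75e-4))  # logit of the reported mean IFR
SIGMA = math.log(3.96) / 2.0               # implied by exp(2 * sigma) = 3.96


def new_setting_ifr_interval(tau: float, sigma: float,
                             n_draws: int = 20000, seed: int = 1):
    """Empirical 95% interval for the IFR in a hypothetical new setting."""
    rng = random.Random(seed)
    draws = sorted(inv_logit(rng.gauss(tau, sigma)) for _ in range(n_draws))
    lower = draws[int(0.025 * n_draws)]
    upper = draws[int(0.975 * n_draws) - 1]
    return lower, upper


low, high = new_setting_ifr_interval(TAU, SIGMA)

# Fold-change associated with a 2-sigma shift. Strictly this is a change
# in odds, which is nearly identical to a change in risk when IFR is small.
fold_2sigma = math.exp(2.0 * SIGMA)
```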

The lower end of the 95% interval, 4.39 × 10^{−5}, is not extreme given the input data. The “crude” mean IFR (based on mean prevalence only) is below 7 per 10,000 for all data except South Florida. It is as low as 0 for some countries that did not record deaths in various age groups, including 20-29 year olds (Belgium, New Zealand, Korea, Iceland), and 1.4 per 10,000 in Utah in the population aged 19-44. (Please refer to Table 2 for the complete list of inputs.)

We can assess this heterogeneity by inspecting the distribution of random effects in the model transformed into IFRs, i.e. the inverse logit transformation of the *θ* parameters. The largest posterior mean IFR is 6.3 × 10^{−4}, in Castiglione d’Adda. The smallest posterior mean for 20-29 year olds is 5.73 × 10^{−5}, in Utah.

##### A-2.4.2 Predictive checks for the model

We constructed posterior predictive distributions for the number of deaths in each of the inputs by using the generated quantities functionality of Stan. Figure 3 compares the posterior means and 95% intervals with observed deaths. Out of 114 observations that were used to fit the model, 109 were within the 95% intervals of the posterior predictive distributions. The largest discrepancies occurred in the Spanish data. Overall, we conclude that the simple binomial model we used here is flexible enough to capture both age-specific risk increases and heterogeneity in IFRs across settings/countries.
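A coverage check of this kind can be sketched as below. The helper names and the toy draws are hypothetical; in practice the draws would be the replicated death counts exported from the fitted model for each data point.

```python
import math


def central_interval(draws, level: float = 0.95):
    """Central interval from a list of predictive draws (order statistics)."""
    ordered = sorted(draws)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    lower = ordered[int(math.floor(tail * (n - 1)))]
    upper = ordered[int(math.ceil((1.0 - tail) * (n - 1)))]
    return lower, upper


def coverage(observed, predictive_draws, level: float = 0.95) -> float:
    """Share of observations that fall inside their predictive interval."""
    hits = 0
    for obs, draws in zip(observed, predictive_draws):
        lower, upper = central_interval(draws, level)
        if lower <= obs <= upper:
            hits += 1
    return hits / len(observed)


# Toy example: three data points sharing the same hypothetical draws 0..100.
toy_draws = [list(range(101)) for _ in range(3)]
toy_observed = [50, 3, 100]  # the last value lies outside the interval
toy_coverage = coverage(toy_observed, toy_draws)

# For comparison, the share reported in the text is 109 out of 114.
reported_share = 109 / 114
```

A share close to the nominal 95% level, as reported here, suggests the predictive intervals are neither clearly too narrow nor too wide.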

#### A-2.5 Sensitivity analyses

Planned sensitivity analyses include:

- Exclusion of small studies and test & trace data

- Median age: exclusion of 80+ population (non-linear behaviour on logit IFR?)

- Could also check exclusion of children and teenagers, due to imprecise prevalence numbers

- Median age: different method of age imputation

- Model including time

### A-3 RISK REDUCTION IN HEALTHY INDIVIDUALS

We now turn our attention to the question of how much a human challenge trial designer could reduce the mortality risk by using simple screening methods, as discussed in the main paper.

Data for this section were provided by OpenSAFELY (https://opensafely.org/) and were used by Williamson et al. [43] to characterise COVID-19 mortality risk factors for 10,926 COVID-19 deaths in England. We group the total of 21,444,863 individuals into a total population and a lower-risk sub-population, defined as non-smoker, non-obese and without the comorbidities reported in the OpenSAFELY study^{6}, most notably respiratory and cardiovascular diseases and type I diabetes. For brevity we refer to the population without any of the pre-defined comorbidities as “healthy”. In contrast to the cited publication, we include records of individuals under 18 in our assessment. Counts grouped by age are presented in Table 1.

As shown in Table 1, for the 20-29 age group the crude risk ratio (of the general population vs the healthy subset only) is 1.53. However, due to the low number of events in both the healthy and general populations, the 95% interval is very wide, from 0.6 to 5.65^{7}. As data on relative risks in other age groups are clearly related to the relative risk in the 20-29 age group, we use another meta-analysis model to improve our estimate. Additionally, relative risks are higher in women than in men, something that we can account for in our model too.

In our modelling we make a strong assumption that infection rates in the population with comorbidities are the same as in the general population. In other words, we assume that the denominator for IFR is the same in both populations. We then specify a generic partial pooling model of fatality rates (FR, defined as the number of deaths in the entire population, without regard to infection status) such that
where, for the *k*-th observation, age_{k} is the age group, and comorb_{k} and male_{k} are indicator variables. This means that each age group is assigned a different “baseline” fatality rate.

Note that the assumption of FRs varying across age groups that we just mentioned differs from the model of age-specific IFRs in Section 2. This is because we hypothesised that infection rates (which are used as denominators in the IFR model of Section 2) will vary across age groups. This is borne out by Figure 6.

Summary of the main model parameters is as follows:

We find that the mean risk ratio between the population with comorbidities and the healthy sub-population in the 20-29 age group (exponent of *α*_{3} above) is 3.96, with a wide 95% uncertainty interval of 1.85 to 6.96^{8}.

Next, using the posterior samples we calculate the event rate in the total population (i.e. *θ*∗ = (*θ*_{1}*n*_{1} + *θ*_{2}*n*_{2})/(*n*_{1} + *n*_{2}), where subscripts 1 and 2 denote the healthy and comorbid sub-populations) and then divide it by the rate in the healthy population, *θ*_{1}, to obtain an estimate of the risk reduction possible by selecting healthy volunteers only. The posterior mean is 1.88, with 95% uncertainty interval from 1.25 to 2.76. Because the Bayesian hierarchical model pools information over many age groups, this uncertainty interval is much narrower than the one for the risk reduction factor calculated on the 20-29 age group only.
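The risk reduction factor can be computed per posterior draw as in the minimal sketch below. The function name, the three "draws" and the sub-population sizes are hypothetical placeholders; they are not the values from Table 1 or from the fitted model.

```python
def risk_reduction_factor(theta_healthy: float, theta_comorbid: float,
                          n_healthy: int, n_comorbid: int) -> float:
    """Ratio of the total-population event rate to the healthy-only rate."""
    theta_total = (
        theta_healthy * n_healthy + theta_comorbid * n_comorbid
    ) / (n_healthy + n_comorbid)
    return theta_total / theta_healthy


# Single illustrative evaluation with made-up inputs.
example_factor = risk_reduction_factor(1e-5, 4e-5, 700, 300)

# Applying the same function to each (hypothetical) posterior draw gives a
# distribution of factors that can be summarised by its mean and interval.
hypothetical_draws = [(1.0e-5, 4.0e-5), (1.2e-5, 3.5e-5), (0.9e-5, 4.4e-5)]
factors = [risk_reduction_factor(t1, t2, 700, 300) for t1, t2 in hypothetical_draws]
mean_factor = sum(factors) / len(factors)
```

The factor equals 1 when both sub-populations share the same rate and grows with both the risk ratio and the share of the comorbid sub-population.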

To validate the model, we conducted a simple posterior predictive check for numbers of deaths in different age groups and genders. The graphical check is presented in Figure 6. Overall we find that the simple model has no problem with reproducing observed data.

### A-4 CONCLUSION AND SUMMARY OF RESULTS

In conclusion, the implications of the model for the risk in healthy young subjects are as follows:

- We find that average IFR in 20-29 age group for the studies included in this analysis is 1.75 × 10^{−4} with 95% interval from 1.28 × 10^{−4} to 2.36 × 10^{−4}.

  - It is feasible that the mean IFR can be decreased as much as 3.96-fold (2*σ* impact on the IFR according to hyper-SD parameter in the meta-analysis model).

  - It is easy to argue that a HCT designer would be able to achieve IFR at least as low as within any of the large-scale studies included in our sample of populations. The smallest posterior mean for 20-29 year olds is 5.73 × 10^{−5}, fitted to data from Utah.

  - Extending the HCT population to also include 30-39 year olds would lead to mean IFR of 2.99 × 10^{−4} with 95% interval from 2.19 × 10^{−4} to 4.02 × 10^{−4}. Lowest mean IFR would then be 1.02 × 10^{−4} (also in Utah).

- In healthy population (defined as lack of co-morbidities listed above), the average mortality risk in 20-29 year olds is 1.88 times lower than in the general population, with 95% uncertainty interval from 1.25 to 2.76.

  - Our 1.88 estimate is a bit higher than the mean “crude” risk ratio of 1.53 because we use a Bayesian hierarchical model that synthesises evidence across all age groups.

  - Expanding to 20-39 year olds, the risk in healthy sub-population would be 2.04 times lower than in the general population (95% interval from 1.25 to 2.76).

- Combining the smallest posterior IFR for 20-29 year olds with our estimated fold-reduction due to excluding individuals with co-morbidities from the population would lead to a mean infection fatality risk of 3.31 × 10^{−5} with 95% Bayesian interval from 1.53 × 10^{−5} to 6.13 × 10^{−5}.

  - In 20-39 year old subjects the risk would be 5.11 × 10^{−5} (2.54 × 10^{−5} to 9.04 × 10^{−5}).
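To relate these per-infection risks to a study-level figure, the sketch below computes the probability of observing no deaths among a fixed number of participants. It assumes that every participant is infected, that outcomes are independent given age and health status, and that the study enrols 50 participants as in the dosing study considered in the main paper. The function name is illustrative, and only point estimates are used, so posterior uncertainty is ignored.

```python
def prob_no_events(risk_per_person: float, n_participants: int) -> float:
    """Probability of zero events among independent, identically exposed people."""
    return (1.0 - risk_per_person) ** n_participants


N_PARTICIPANTS = 50  # study size assumed from the main paper

# Average IFR in 20-29 year olds across the included studies.
p_none_average = prob_no_events(1.75e-4, N_PARTICIPANTS)

# Lowest-IFR setting combined with exclusion of co-morbidities.
p_none_screened = prob_no_events(3.31e-5, N_PARTICIPANTS)
```

Under these assumptions, the first value is about 99.1% and the second is above 99.8%.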

## Footnotes

**Abbreviations:** HCT, Human Challenge Trial; IFR, infection fatality rate

^{1} This is an assumption, and we allow users of the tool to choose to use the overall estimate across studies, or the expected lower risk.

^{2} Briefly, let *k* index populations and loc_{k} their locations (countries, regions). Let *d*_{k} be observed deaths for data point the reported mean prevalence and its standard error, on logit scale; *n*_{k} the total population; *X* is a vector of median ages, expressed in decades and centered at 25 years. The hierarchical model is as follows: The parameters of the model are *p*_{k}, the true prevalence; *θ*, location-specific (random) effects on IFR; *β*, (fixed) effect of age; *τ* and *σ*, the hyper-mean and hyper-scale parameters for IFR. We implement our model in Stan[9], with weakly informative priors on all parameters. The prior for *τ* is centered at 1 death per 10,000 cases.

^{3} The lower-risk group is defined as non-obese, non-smoking, and without the following risk factors (same as used by [43]): asthma, other chronic respiratory disease, chronic heart disease, diabetes mellitus, chronic liver disease, chronic neurological diseases, common autoimmune diseases (Rheumatoid Arthritis (RA), Systemic Lupus Erythematosus (SLE) or psoriasis), solid organ transplant, asplenia, other immuno-suppressive conditions, cancer, evidence of reduced kidney function, and raised blood pressure or a diagnosis of hypertension.

^{4} There is an assumption implicit in the model that the trial uses dose escalation or other data to ensure that the dose given does not greatly exceed the typical natural dose, or that a larger-than-natural dose does not increase disease severity.

^{5} The assumed independence is conditional on the age and health status of participants, and for dose-response studies, also infection severity by dose.

^{6} This is equivalent to assuming that the risk of infection at each dose is above the threshold for infection to replace a dose-response curve.

^{7} See discussion below about incorporating other risks in the model as more data becomes available.

^{8} All intervals reported here are Bayesian posterior intervals. For brevity we just refer to them as “x% interval”.

^{9} See estimates in the appendix for how relative risk was estimated between age groups.

^{10} Expanding to 20-39 year olds only slightly increases the risk reduction factor, from 1.9 to 2.

^{11} Denoting by *σ* the hyper-scale parameter in the hierarchical model, 2*σ* impact corresponds to 3.96-fold mean decrease in IFR. That means we expect 2.5% of studies to have IFR more than 4 times lower than the average IFR.

^{1} The role of time may be due to new treatments, improvements over time in our ability to treat COVID-19 or selection pressures which may lead to more benign versions of the virus. Country-specific or location-specific factors in IFR data may be driven by under-reporting, health care factors (including access to health care services) or underlying distributions of known risk factors. Additionally, some unknown risk factors (e.g. genetic) may also be operating, in which case controlling for age and co-morbidities will not be sufficient to account for cross-location differences.

^{2} Various estimates published since suggested that the relationship of mortality risk to age is consistent across different countries.

^{3} It is also possible to work with *IFR*_{k} parameters and treat them as derived from a Beta distribution with some “hyperparameters” *α* and *β*, as done by e.g. Carpenter[8]. That approach, however, does not offer an easy way of modelling the impact of covariates (e.g. age and co-morbidities) on the rates.

^{4} Another advantage of such a model is that it can use either individual-level or summary data and work with covariates (such as gender, age, time of the study, co-morbidities), captured as odds ratios or risk ratios. If only summary data are available, covariates can be defined as study level distributions (e.g. % male).

^{5} This basic approach exaggerates uncertainty, as we treat different 95% intervals reported in the study as uncorrelated.

^{6} “asthma, other chronic respiratory disease, chronic heart disease, diabetes mellitus, chronic liver disease, chronic neurological diseases, common autoimmune diseases (Rheumatoid Arthritis (RA), Systemic Lupus Erythematosus (SLE) or psoriasis), solid organ transplant, asplenia, other immunosuppressive conditions, cancer, evidence of reduced kidney function, and raised blood pressure or a diagnosis of hypertension”

^{7} We obtain the interval using a simulation approach. Using a normal approximation of the log(RR) statistic we obtain a narrower interval of 0.57 to 4.1, perhaps due to the poor quality of the approximation for rare events.

^{8} Using simple models that assumed identical risk ratios in all age groups would lead to a mean RR of similar magnitude but a much less uncertain estimate, due to more rigid model assumptions; similarly, assuming some linear age structure on risks, such as in the main meta-analysis model above, may lead to a different RR, but we do not think such an assumption is justified here. We do not include outputs of these models in this short write-up.