## Abstract

Human Challenge Trials (HCTs) are a potential method to accelerate development of vaccines and therapeutics. However, HCTs for COVID-19 pose ethical and practical challenges, in part due to the unclear and developing risks. In this paper, we introduce an interactive model for exploring some risks of a SARS-COV-2 dosing study, a prerequisite for any COVID-19 challenge trials. The risk estimates we use are based on a Bayesian evidence synthesis model which can incorporate new data on infection fatality risks (IFRs) to patients, and infer rates of hospitalization. The model estimates individual risk, which we then extrapolate to overall mortality and hospitalization risk in a dosing study. We provide a web tool to explore risk under different study designs.

Based on the Bayesian model, IFR for someone between 20 and 30 years of age is 15.1 in 100,000, with a 95% uncertainty interval from 11.8 to 19.2, while risk of hospitalization is 130 per 100,000 (100 to 160). However, risk will be reduced in an HCT via screening for comorbidities, selecting lower-risk population, and providing treatment. Accounting for this with stronger assumptions, we project the fatality risk to be as low as 2.5 per 100,000 (1.6 to 3.9) and the hospitalization risk to be 22.0 per 100,000 (14.0 to 33.7). We therefore find a 50-person dosing trial has a 99.74% (99.8% to 99.9%) chance of no fatalities, and a 98.9% (98.3% to 99.3%) probability of no cases requiring hospitalization.

## 1 INTRODUCTION

As of February 1, 2021 severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has led to over one hundred million confirmed infections worldwide, and over 2 million deaths. Despite the recent approval of vaccines, McKinsey estimates that global vaccine production will reach eight billion doses in 2021 – and because the vaccines require two doses, this is only enough to vaccinate a bit over half of the global population (Agrawal, Conway, Heller, Sabow, & Tolub, 2020). However, now that vaccines are approved, testing of additional vaccines with large-scale field trials is difficult and morally fraught. Not only that, but a variety of critical questions are emerging about the extant vaccines, including the effectiveness of single doses, and the effectiveness against novel strains of COVID.

Rapid answers have been, and will continue to be, critical in the management of disease burden and mitigating impacts of COVID-19. Different paths to producing relevant clinical data exist, but field trials are slow and require large subject populations to discern therapeutic effects (Jamrozik & Selgelid, 2020b), while observational studies of effectiveness are similarly slow, and not randomized. In order to accelerate testing of vaccines, several authors and institutions have proposed intentionally exposing human subjects to SARS-CoV-2 to test novel interventions; to this date, nearly forty thousands of people have expressed interest in volunteering for this task (1Day Sooner, 2020). Such human challenge trials (HCTs) have been useful in the past (Jamrozik & Selgelid, 2021) to develop vaccines and treatments for other infectious diseases such as malaria, cholera, respiratory syncytial virus (RSV), (Roestenberg, Hoogerwerf, Ferreira, Mordmüller, & Yazdanbakhsh, 2018; Gómez-Pérez et al., 2015) influenza, (Treanor et al., 1999) and dengue fever (Larsen, Whitehead, & Durbin, 2015). While the changing situation has led some to question the need for HCTs, others have defended their usefulness (Turk, 2021). The usefulness is because HCTs can be used to answer a variety of questions (Nguyen et al., 2020), including many not addressed by the availability of current vaccines (Steuwer, Jamrozik, & Eyal, 2021; Ducarmon, Kuijper, & Olle, 2021). However, two ethical problems have been raised which stand in the way of conducting HCTs for SARS-CoV-2 infection. First, given lack of rescue therapies and our limited understanding of COVID-19’s risks, it is difficult to weigh the likely impacts of these studies on volunteers against the benefit to society, or to obtain informed consent from volunteers (Palmer & Schuck, 2020). Second, we do not know what viral dose of SARS-CoV-2 should be given to volunteers.

To help address both concerns, we developed a model to help assess the risks participants will face in a hypothetical dosing study for COVID-19, one similar to hVIVO’s soon-to-be-launched trial (Turk, 2021). This model uses data from a non-systematic review of data on COVID-19 risks (mortality and infection rates) and describes risks for individuals as well as the overall study risk. As both clarification on viral dose and infection risks are essential before starting HCTs, this work can help inform policymakers and potential volunteers about some risks concerning the process of using HCTs to accelerate vaccine and therapeutic development.

## 2 METHODOLOGY

We developed a three-component tool to understand and explain the relevant risks. The first component quantifies risk of COVID-19 mortality by using a Bayesian evidence synthesis model; the second uses that estimate, along with other data on gender-specific mortality and hospitalization risks to simulate the risk of a study with given characteristics; the third is a front end tool for allowing interactive exploration of risks from a study or to an individual.

### 2.1 Bayesian evidence synthesis model

We use a Bayesian meta-analysis approach to obtain an estimate of mortality risk. This form of modelling combines different sources of evidence to characterise both mean and dispersion in a given statistic of interest. In our case, we use age-specific, location-specific death and prevalence data to generate an estimate of the infection fatality risk (IFR), i.e. the probability that a person infected with COVID-19 will die, accounting for age; we also provide an estimate of hospitalization risk. These two estimates allow us to understand risk reduction in individuals who would participate in an HCT.

We use Bayesian methods because they allow us to best account for heterogeneity in IFRs across age groups, different countries and regions. Characterising this heterogeneity is crucial when our goal is not only characterising historical data, but also assessing the possible reductions in IFRs. For example, it can be argued that an HCT can use screening and provide medical care to achieve a rate of IFRs which is at least as low as the region or country with the lowest IFRs in our data^{1}.

Although existing statistical packages for meta-analysis (both Bayesian and frequentist) could easily be used to model event rates such as IFR, (Więcek & Meager, 2020; Carpenter, 2016) these models may encounter problems or provide biased results when in a large proportion of studies no deaths are observed, as is sometimes the case for COVID-19 in younger populations. To address this, we use death data and estimates of prevalence as inputs instead of IFRs, so that the model is generative, following best practice for such models (Betancourt, 2016; Gelman et al., 2020). We then construct a new, reproducible model for IFRs. Methodological details of the model are described in the Appendix.^{2} Crucially, we assume that the fixed effect of age and random effects of location on IFR are on logistic scale, as is typical for meta-analysis models of binary data.

While our IFR estimates capture average risks within different age groups and even heterogeneity across regions or countries, they still refer to the general population (of a given age, in a given location). A prospective HCT participant would be screened for health issues and comorbidities, further reducing the risk in comparison to the members of general population. To account for this, we perform an additional analysis using OpenSAFELY, (Williamson et al., 2020) a large observational dataset on COVID-19 mortality factors that includes comorbidity and age information, as well as data on gender, comparing the risk in general population to lower-risk subpopulation^{3}. Similarly to adjustment for heterogeneity across general populations, the adjustment for screening can be turned on or off in our tool.

Input data for the Bayesian IFR model is based on a non-systematic review of the literature and earlier meta-analyses, particularly by Levin *et al*, (Levin, Cochran, & Walsh, 2020) but we have also opportunistically included other studies, and data from other official sources as detailed in the appendix. However, the list of studies is not fixed, since new and better characterized datasets are becoming available over time. For this reason, we have made sure that incorporating additional prevalence and death data from newer studies and/or updated data sets is straightforward. We will continually update the model to assure that any estimates provided to participants or used for decisions include all relevant data, rather than only using data that was available when the analysis was first performed.

We use the age- and gender-specific data from Salje et al. (Salje et al., 2020) for the rate of death of hospitalized patients to impute the relative risk of hospitalization based on our meta-analysis for mortality risk.

Our model is available publicly, together with input data and source code for the tool and under an open license, at https://github.com/1DaySooner/RiskModel (Manheim, Więcek, Choi, & Wick, 2021).

### 2.2 Transforming individual risk into study risk

Once a suitable challenge virus is manufactured, itself a complex process (Catchpole et al., 2018), the risk of the individual from a challenge trial depends on the dose of virus given. The uncertainties about dose-response lead to a number of additional uncertainties about overall study risk. For other viruses, such as H1N1 and H3N2 influenza, a dose-response relationship has been found (Memoli et al., 2015; Han et al., 2019). The specific dose-response relationship, and its functional form must be determined experimentally, which is an outcome rather than an input of a dosing study like the one we are considering.^{4}. This uncertainty is a key issue, so, as suggested by Morgan and Henrion, we advise that this structural uncertainty should not be treated as a probabilistic variable, and instead sensitivity analysis should be used to enable the consideration of a range of plausible outcomes (Morgan, Henrion, & Small, 1990).

Given the specifics of a study design, the relationship between individual risk and the risks in the overall study is straightforward, assuming independence of risk between individuals in the study.^{5} The risk to an individual of severe disease in the study given dose *d* is *S*_{d}, and the risk of mortality is *M*_{d}. The probability that someone in a group of size *N* experiences the corresponding outcome is 1 − (1 − *S*_{d}) ^{N} and 1 − (1 − *M*_{d}) ^{N}. By simulating the probability of impact for each dosed group and, in the case of more complex studies, conditioning the trial of later groups on the results of earlier ones, we can find the overall risk in more complex studies.

In the current version we have restricted the tool to consider to a simple *N* -person study. This means the tool provides an upper bound for risk of mortality and hospitalisation, higher than in a more complex dose-escalation study^{6}. However, the underlying model structure allows simulation of dose-dependent and/or conditional study designs, as necessary, to accommodate dose response information and allow for the simulation of more complex trials.

### 2.3 Interface

The web tool we built allows exploration of two related types of risk. The first allows individuals to explore their personal risk, for example, by gender and age, if they volunteer to participate in a study, while the second displays overall risk of a single round dosing study with a given number of participants. Assumptions on study design and the underlying risk can be adjusted to allow interactive exploration by policymakers and participants. In considering personal risk, the tool uses pre-calculated outputs from the risk model to calculate and display predictions for hospitalization rate and death rate for individuals as a function of age and sex.

Deaths are further translated into micromorts–the expected number of deaths per million events, a standard method of showing mortality risk often used for patient consent (Howard, 1980; Ahmad, Peterson, & Torella, 2015). Quantities which cannot be transformed into micromorts, are presented as probabilities.^{7}

The tool allows both the public and policymakers to explore how overall risks change depending on differences in study design. This also helps maintain transparency into clinical trial design concerns, thereby better informing potential challenge trial volunteers. We also note that for trial designers and ethicists, the relationship between risk of different impacts and compensation is critical (Grimwade et al., 2020; Palmer & Schuck, 2020; Blumenthal-Barby & Ubel, 2020).

The importance of transparency and public engagement has been widely noted in the literature on challenge trials (Jamrozik & Selgelid, 2020a). For this reason, the tool is available for public use, and is already being used to inform people who have volunteered to be contacted to potentially participate in a challenge trial (1Day Sooner, 2020).

## 3 RESULTS

The analysis dataset for age-specific IFRs contains 167 data points from 34 studies, with each containing between 2 and 11 different age groups; all data are presented in Tables 3 and 4 of the Appendix and are included in the code repository accompanying this paper. A glance at available data also confirms the necessity of using a more complicated modelling approach: only 37 inputs contain individuals aged between 20 and 30; however, out of these 37, only three cover precisely the main age group of interest, that is, people between 20 and 30 years of age.

The basic evidence synthesis model, which includes all data, but does not adjust for health status, finds that average IFR in 20-29 age group for the studies included in this analysis is 15.1 per 100,000 cases (95% uncertainty interval^{8} 11.8 to 19.2). Extending the HCT population to include 20-39 year olds gives mean IFR of 26.5 per 100,000 (95% interval 20.6 to 33.5).

The model assumes a log-linear relationship between age and IFR. As seen in Figure 1, the assumption is met across the entire age range we considered, from children to people older than 80. The model estimates that IFR increases on average 3.06-fold per each additional decade of age.

These results align with two recent meta-analyses of IFRs (O’Driscoll et al., 2020; Brazeau et al., 2020). Both studies also show log-linear relationships between age and IFR, as well as considerable variation in IFR by context. O’Driscoll *et al* report median IFRs of 0.6 and 1.3 per 10,000 among those aged 20-24 and 25-29, respectively, while Brazeau *et al* report 3 and 4 per 10,000 respectively.

Based on OpenSAFELY data, we estimate that in healthy population (defined as lack of any co-morbidities listed above), the average mortality risk in 20-29 year olds is 1.9 times lower than in the general population, with 95% interval from 1.3 to 2.8. Expanding to 20-39 year olds only slightly increases the risk reduction factor, from 1.9 to 2. We also note that there is large heterogeneity across the studies, due to treatment availability and other factors^{9}. The HCT volunteer population will receive the best available care, including the most up-to-date treatment options. They will also be screened based on socioeconomic and other population risk factors, trading off diversity for safety. For these reasons, we consider the population for an HCT to be akin to a best-case scenario. In the current analysis, this is France, where the estimated risk for 20-29 year olds is 4.67 × 10^{−5}, approximately the mean risk across all populations.

It is unclear how much double-counting occurs when adjusting for both available medical care and population screening, and for screening out comorbidities. However, using this estimate, we see that a simple 50-person challenge trial has a 99.87% probability of having no induced fatalities, and a 98.9% probability of having no cases serious enough to require hospitalization. In other words, the probability of at least 1 death during a 50-person trial is 0.13% and the probability of at least 1 hospitalization is 1.1%. This represents an upper bound of risk for the notional 50-person trial, since the dosing study will escalate doses until a sufficient number of participants are infected, and therefore not everyone dosed is likely to develop an infection.

While the estimate is specific to a dosing study, it can also be useful for understanding the risk of later vaccine trials, though in that case other factors, including the possibility of vaccine-enhanced infection, would need to be assessed.

Due to lack of reliable data, the current model does not include estimates of longer term impacts, and instead includes a more qualitative discussion of these risks, (Smela, Kleinwaks, Sexton, & Schmit, 2020) but the model and interface will be updated with such estimates as they become available. The model interface can be used to explore how these uncertain factors can interact, as shown below in Figure 2.

## 4 DISCUSSION

We have demonstrated that we can model certain risks associated with human challenge trials. However, our model is incomplete, and considerable concerns remain about the risks of human challenge trials. While a full accounting of challenge trial ethics is beyond the scope of this paper, we consider several factors below that inform our work on human challenge trial design, especially unmodeled risks. For a more complete perspective on the ethics of COVID-19 HCTs, we direct the reader to the World Health Organizations’ (WHO) key considerations (Jamrozik et al., 2020) and the recent discussion in Lancet Infectious Disease (Jamrozik & Selgelid, 2020a; Manheim, 2020) on how these issues are being addressed.

Our model suggests that the risk of a dosage-response study is far lower than other risks that are typically widely viewed as acceptable. The risk for participants selected to be low risk is clearly far lower than the risk in the general population from comparable clinical infections. While modelling the difference between the two is not possible, we are confident that the risk in HCT would indeed be lower than in observational, general population data that we used, due to more intense monitoring and availability of immediate high-priority medical care and pharmaceutical treatments, which we did not model here. We also note that HCTs have had historical precedent, showing promise for both less lethal human coronaviruses, and for yellow fever (Shah et al., 2017). They also provided early indications regarding the possible efficacy of a leading malaria vaccine candidate (Nielsen et al., 2018). The discussions about earlier trials shows that the ethical challenges of HCTs should not be seen as unique, but rather as laying along a natural continuum of clinical studies (Franklin & Grady, 2001). That said, the initial human challenge trials should be held to high ethical standards, both for individual risks and to preserve public trust in scientific and medical progress. That includes emphasis on fully informed consent of the participants, and, as noted by the WHO, higher than typical ethical standards (Jamrozik et al., 2020).

### 4.1 Limitations

Our model has a number of limitations. First, hospitalizations and deaths for 20-30 year olds are rare and may be subject to reporting bias. Second, we do not yet have enough information on long-term damage caused by COVID-19 and our model does not currently account for that risk, something we will discuss below. Third, although our model uses hospitalization as a proxy for the upper bound of serious non-fatal COVID-19 cases, more data are required to see if this is an accurate assumption.

Our model also may not capture changes in COVID-19 risks over time. It also does not estimate any indirect risks of the study. The model focuses on the risk due to the strain which will be used in an initial challenge trial. The model may not be appropriate for characteristing risk newer variants, including B.117. We stress that our model is not a comprehensive analysis of all available risks, but rather a tool quantifying certain known risks that can be used by trial participants and policymakers. We also limit our investigation to “risk” understood as probability of an event, as it is used in the medical literature(Kelly & Cowling, 2013), rather than a broader and more sociological understanding of risk.

Finally, this model captures only absolute risk. Relative risk assessment, comparing the risk of HCTs to other methods of finding the same information, is not intended. Any relative risk assessment would need to be combined with an assessment of relative benefits of each design, including speed and accuracy, as well as considering the ethical benefits of HCTs like the voluntary nature and better understanding and control of risks due to disease compared to other forms of clinical trials.

### 4.2 Opportunity Cost

When evaluating clinical trial designs, it is not sufficient to evaluate whether the proposed model is good, or even whether benefits outweigh costs, but also whether the alternatives are better. The key benefit of a dosing study is to allow further research with challenge trials, and alternative clinical trial models have major practical difficulties and far higher costs in a variety of ways. While this is not relevant for the initial set of vaccines that have now been approved, the challenge of conducting large scale trials is magnified for later vaccines.

For example, typical clinical phase 3 efficacy trials for an ongoing novel pandemic would be field trials, which rely on high numbers of trial participants, and require that a large number of people are treated with a new vaccine or drug. Such trials are relatively expensive and expose more trial participants to negative side effects of a given treatment, so that in many scenarios HCTs have been shown to be superior (Berry et al., 2020). These trials are also difficult to pursue now that an initial vaccine is available, due to ethical and logistical constraints. At this point, it may not be possible to find enough willing participants for field trials to study additional, potentially more effective or safer, vaccines.

### 4.3 Non-modeled Risks

We also note that there are several impacts we do not model in the study, most notably the concern about so-called “long COVID”, which is a catch-all term referring to a combination of persistent symptoms, slow recovery, and new post-recovery symptoms (Carfì, Bernabei, Landi, & Against COVID-19 Post-Acute Care Study Group, 2020). It is understood that for some cases, especially severe ones, recovery from COVID-19 can take months. In other cases there are longer-term symptoms differing from those experienced during the infection, perhaps similar to Post-SARS syndrome (Perrin et al., 2020; Moldofsky & Patcai, 2011). At the same time, COVID-19 recovery has been found to be faster in younger, healthier patients (Tenforde et al., 2020), which may mean the risk is lower in this group. It seems clear that the risk is a subject of continuing scientific investigation, and as it becomes better understood and quantified, it will be incorporated into both the risk model, and the web tool. Until then, the model contains an embedded overview of what is currently understood and known or unknown about longer term risks (Smela et al., 2020).

We also note that vaccine-induced disease enhancement is a critical concern for vaccine challenge trials, but is not relevant to dosing studies. Still, this risk must be considered in analyses of risks for later trials, and the current model would need to be adapted or supplemented to consider this if it were used to inform decisions regarding those later trials.

## 5 CONCLUSION

Human challenge trials are not risk free, but the balance of risk and benefits seems to clearly favor allowing them, as a large group of experts has argued (Sooner et al., 2020). This conclusion is disputed by some, but all decisions are made despite uncertainties and debate – whether empirical or moral (Lockhart, 2000; MacAskill, Bykvist, & Ord, 2020). The question is whether the both empirical and moral balance of factors lead to the indicated conclusion. The alternative is a failure to act due to misguided risk-aversion, or worse, using uncertainty and disputed moral claims as a positive stance to shut down further work, as has occurred in the debate about HCTs (Martinez et al., 2020).

It seems likely that HCTs are a viable way to rapidly test vaccine efficacy, which is particularly critical now for testing second-generation vaccines, which are important in quickly expanding vaccine availability (Castillo et al., 2021) and may prove superior to first generation vaccines. A dosing study is an urgent first step, and the risk estimates and tools developed for this paper can assist in planning such studies and informing volunteers.

Our model provides insight into the overall risk of a trial of a given size, and can better inform HCT participants about the dangers they face. Given that an HCT may help select the multiple vaccines necessary for global immunization while also assisting with therapeutic testing, the risk of an initial study into SARS-CoV-2 pathogenesis seems justified. If the dosing study is successful, future HCTs of COVID-19 may provide a rapid and systematic way of screening vaccine candidates for efficacy and safety, which is a significant benefit.

The results presented here are a useful static estimate of risk. The model is already being used to inform potential volunteers (1Day Sooner, 2020) and can be adapted and expanded in the future. Given the evolving understanding of the disease, the model should be continually updated with additional data on mortality and hospitalisations risks or other long-term risks. This will contribute to the discussion of whether or not to pursue challenge trials, which can help with response to COVID-19 in a variety of ways (Nguyen et al., 2020). Challenge trials may be an important tool for fighting COVID-19 and our model is a step towards that goal.

## Data Availability

All data and code is available on github, as linked in the paper.

## Acknowledgements and thanks

Thank you to Rouslan Karimov and Jupiter Adams-Phipps for validating, updating, and collecting references for the dataset used in the analysis. Thanks to Chris Choe, Sophie Rose, Kate Sheahan, Troy Yamaguchi and Linchuan Zhang, who helped prepare and edit an early version of the manuscript. We would like to thank Ed Choi and William Kahn, as well as two anonymous referees, for providing helpful comments, and Ben Goldacre and the OpenSAFELY team (https://opensafely.org/team/) for sharing data with us.

## Appendix: Bayesian model of COVID-19 mortality risk in HCT volunteers

### A-1 INTRODUCTION

This short document is a technical appendix to the paper discussing COVID-19 risks in human challenge trial. Here, we show how a Bayesian model can synthesise information on many infection fatality risks (IFRs) into a single estimate. This estimate is specific to certain age groups and can be further adjusted by e.g. co-morbidity status. The analysis presented here is a form of Bayesian meta-analysis, in that our primary objective is to weigh sources of evidence in a way that captures both variability (here, heterogeneity in real IFRs across different settings) and uncertainty (here, the fact that we do not know the IFRs in each setting precisely).

The ultimate objective of this model is to characterise risk in a way that is useful for design of HCTs. Therefore, as a minimum, we want to incorporate variability across different populations into our prediction. Even better would be to understand how different factors can drive heterogeneity: *a priori* we hypothesise that the three main drivers of differences in IFRs are time-specific, population-specific and otherwise country-specific.^{A1}

To characterise differences in observed IFRs we first develop a Bayesian model and apply it to publicly available summary data on IFRs from multiple countries and contexts, with particular focus on the impact of age. This is covered by Section 2. We then use a simple model to hypothesise reduction in risk that may be achieved by screening individuals for comorbidities; this is Section 3. We summarise all results in Section 4.

### A-2 AGE-SPECIFIC RISK OF COVID-19 MORTALITY

#### A-2.1 Bayesian evidence synthesis model

What follows is an adaptation of typical methods of Bayesian evidence synthesis to analysis of IFRs. IFR is the ratio of deaths to infections in a given population. Early estimates of COVID-19 mortality risk, e.g. by Verity, Okell, et al. (2020a), placed it at over 0.6%; however, it was also evident from data that IFR could be orders of magnitude higher in particular high risk groups, especially in the elderly, than in the general population.^{A2}

By definition, our data on IFRs is a combination of data on deaths with data on infections. Typically, these are disjoint samples, in that numbers of infections are estimated (typically very imprecisely) on select subpopulation, while deaths are recorded in the general population (at a level of country, administrative region etc.). There are clear reasons to believe that IFRs will differ across studies (e.g. due to age, comorbidity status, time, genetic factors, quality of healthcare etc.). To address this, we will use a Bayesian hierarchical modelling framework to assume that the setting-specific estimates of *IF R*_{k} can differ from each other but are linked through some common parameters. (By *k* ‘s we denote different populations; note that sometimes we may have multiple *IF R* ‘s from different age groups in the same location.)

The most straight-forward and “canonical” way to implement such a Bayesian model is by modelling log odds of the event.^{A3} Deeks (2002) present a general treatment of such approach in medical statistics. Note, that for very rare events the odds of mortality are very similar to probability of mortality, but we model events on odds scale as a good “generic” approach to modelling binary data (in this case death following infections).^{A4}

Basic models for this type of analysis of binary data can be implemented using existing statistical analysis packages; see, for example, *metafor* package in R or *baggr* by Więcek and Meager (2020). Such analysis would treat IFR as a logit-normal parameter to meta-analyse. However, note that when no deaths are observed, analysis of IFR (equal to observed deaths divided by modelled infections) is problematic. Therefore we propose a “custom” model that built in Stan which treats deaths and *prevalences* (rather than the IFRs) as data.

Let *d*_{k} denote observed deaths for data point *k* and assume that logit of corresponding prevalence estimate is *p*_{k} is a parameter (typically obtained from a statistical modelling papers, government reports etc.). Total population in *k* -th setting is *n*_{k}. Total number of estimates is *K*. Then the model likelihood is as follows:
where and are parameters obtained from the literature (or converted from these parameters – see next section). The *k* data points collected can span many locations (studies); we denote them by loc_{k} and the total number of locations by *K*_{l oc} (with *K*_{l oc} < *K*).

In this model we can also account for various covariates impacting the IFRs (let’s denote their total number by *N*_{p}), such as age groups (which we identify with median age of the population being studied, MedianAge_{k}). We code them in a design matrix *X*. To center our *X* at the value of interest in our model (risk in 20-30 year olds), we use a transformation `MedianAge/10 - 2.5` to construct our matrix *X*. We denote all of the covariates using a design matrix *X* and denote by *N*_{p} the number of columns in *X*. We assume the impact on IFR is on logit scale, same as in the “canonical” logistic models of binary data that we mentioned above:

This means *θ* spans location-specific (random) effects on IFR while *β* is *N*_{p} dimensional vector of (fixed) covariate effects.

We implement our model in Stan and assume very weakly informative priors on all parameters, with prior for *τ* centered at 1 death per 10,000 cases.

#### A-2.2 Data

We used estimates originally collected by Levin, Cochran, and Walsh (2020) to construct the first version of analysis dataset, which we then supplemented with more values extracted from other studies.

We included all relevant data points from the Levin et al. study This meant sometimes including values that the original study omitted. For example, Levin et al. exclude data points with seroprevalence indistinguishable from zero; our model retains them. We systematically went through the included studies listed in Appendix I and Appendices H.4 and H.5 of Levin et al. to ensure complete inclusion of all their data. In addition, in the course of this process we because aware of either updates to the studies Levin et al. used or entirely new relevant studies. We have reviewed and included them in our model as well.

The input data into our model consists of deaths (treated as known) and prevalences (treated as logit-distributed parameter with known mean and SD) in all reported age groups in all studies^{A5}.

All of input data are given in Table 4. The analysis dataset contains 167 data points from 34 studies, each containing between 2 and 11 different age groups. We made only minimal modifications to source data, by 1) imputing the values from the Italian fatality data based on a seroprevalence survey, 2) imputing population size in Maranhao (as ratio of the reported number of infections and the mean infection risk) which were not reported and 3) assuming that uncertainty in prevalence 0-29 age group in Iceland is same as in the 30-39 age group since data were missing.

As mentioned, our model treats number of COVID-attributable deaths as measured without error (due to lack of data) but accounts for uncertainty in infection risks, which are always model-based estimates extracted from various available data sources. In preparing data, we assumed that logits of prevalence estimates from available studies are normally distributed, which seems to reproduce majority of data very well, see Figure 7 in the Supplement. There are some discrepancies with studies that allowed for prevalence estimates to be 0, something that our logit model does not allow.

For each study we construct a median age scalar defined by the average of the endpoints of each age range, rather than attempting to calculate a population-weighted mean.

Our approach of regressing on the median age and use of all available data (rather than the subset of data available in younger adults only) is necessitated by data limitations: out of 167 data points comprising age-specific estimates of prevalence (or IFR) and counts of deaths, 37 contain individuals aged 20-30 who are of primary interest to us. However, the populations are mixed with regards to age, with typical age groupings such as 19-49, 20-49, 20-39, 0-49 used instead. In fact, we find only one estimate out of 37 that is entirely specific to the 20-29 age group (Brazilian state of Maranhao), while one more has median age falling between 20 and 30 but is not specific to that age group.

#### A-2.3 Results

There were no issues with convergence of the Bayesian model. We set number of iterations to 5,000 and used 4 chains, with `max_treedepth` option set to 15. There were no divergent transitions and effective sample size was greater than 2310 for all of 538 modeled parameters (this number includes fitted prevalences, IFRs, *θ*’s and their transformations into/from logit scales). For the three main parameters in the model we obtained the following:

The mean coefficient of beta 1.12 corresponds to 3.06-fold increase in mortality risk following an infection per each extra decade of age (95% uncertainty interval is 3.01-3.11).

From these parameters we can predict average risks for subjects of any given age *x*, by using the posterior distribution of *τ* + (10*x* + 2.5)*β* (where 2.5 and 10 refer to the transformation that we applied to `MedianAge` inputs).

#### A-2.4 Average infection fatality risk in young subjects

Since we centered our `MedianAge` at 25 years in constructing our matrix *X*, we can now obtain model-estimated risk for a typical HCT population (aged 20 to 30, with median 25) by ignoring the *β* coefficient and examining *τ* and *σ* only. We find that the average IFR for this group (equal to ) is 1.51 × 10^{−4} (with 95% interval from 1.18 × 10^{−4} to 1.92 × 10^{−4}).

##### A-2.4.1 Heterogeneity in IFRs

However, there is a considerable variability in IFRs across different locations/dataset that we should consider. To take into account parameter *σ*, we can generate draws from the N(*τ, σ*^{2}) distribution, corresponding to a hypothetical IFR in a new source of data. 95% interval for such model runs from 3.94 × 10^{−5} to 5.79 × 10^{−4}. Since the model works a logistic scale, another way of interpreting the across-dataset variability is reporting the fold-impact of *σ* on the mean IFR; here, we obtain on average a 3.82-fold increase (decrease) in IFR per 2*σ* increase (decrease).

The lower end of the 95% interval, 3.94 × 10^{−5}, is not extreme given input data, where the “crude” mean IFR (based on mean prevalence only) is below 7 per 10,000 for all data except for South Florida, and as low as 0 for some countries that did not record deaths (Belgium, New Zealand, Korea, Iceland) in various age groups including 20-29 year olds or 1.4 per 10,000 in Utah, in the population aged 19-44. (Please refer to Table 4 for complete list of inputs.)

We can assess this heterogeneity by inspecting the distribution of random effects in the model transformed into IFRs, i.e. the inverse logit transformation *θ* parameters. The largest (posterior mean) IFR value of *θ* is 5.09 × 10^{−4} in Castiglione d’Adda. The smallest posterior mean for 20-29 year olds is 4.67 × 10^{−5} in France.

##### A-2.4.2 Predictive checks for the model

We constructed posterior predictive distributions for number of deaths in each of the inputs by using the `generated quantities` functionality of Stan. Figure 3 compares the posterior means and 95% intervals with observed deaths. Out of 167 observations that were used to fit the model, 157 were within 95% intervals of the posterior predictive distributions. We observed the largest discrepancies occurred in Spanish data. Overall, we conclude that the simple binomial model we used here is flexible enough to capture both age-specific risk increases and heterogeneity in IFRs across settings/countries.

#### A-2.5 Sensitivity analyses

As a sensitivity analysis, we also considered the impact on the main model parameters of dropping some data from our analysis. The result is summarised in a table containing parameters *σ, β, τ* and the mean IFR for the 20-29 year olds age group.

We considered excluding data for the youngest and the oldest individuals, as well as excluding both at once. Our hypothesis was that at extreme ends of age the assumption of log-linearity of IFRs may not hold and potentially lead to a biased estimate of the IFR in the 20-29 and 20-39 age groups. However, as shown in the accompanying table, we find no substantial effect of excluding data on the main model paramters.

### A-3 RISK REDUCTION IN HEALTHY INDIVIDUALS

We now turn our attention to the question of how much a human challenge trial designer could reduce the mortality risk by using simple screening methods, as discussed in the main paper.

Data for this section has been provided by OpenSAFELY (https://opensafely.org/) and was used by Williamson et al. (2020) to characterise COVID-19 mortality risk factors for 10,926 COVID-19 deaths in England. We group the total of 21,444,863 individuals into a total population and a lower-risk sub-population, defined as non-smoker, non-obese and without the comoribidities reported in the OpenSAFELY study^{A6}, most notably respiratory and cardiovascular diseases and type I diabetes. For brevity we refer to the population without one of the pre-defined comorbidities as “healthy”. In contrast to the cited publication, we include records of individuals under 18 in our assessment. Counts grouped by age are presented in Table 2. Complete data (broken down by gender) are in Table 2 at the end of the document.

As shown in Table 2, for the age group of 20-29 the crude risk ratio (of general population vs the healthy subset only) is 1.53, but, due to low number of events in both healthy and general population, with a very wide 95% interval from 0.6 to 5.65.^{A7} As data on relative risks in other age groups is clearly related to the relative risk in 20-29 age group, we use another meta-analysis model to improve our estimate. Additionally, relative risks are higher in women than in men – something that we can account for in our model too.

In our modelling we make a strong assumption that infection risks in population with comorbidities are the same as in the general population. In other words, we assume that denominator for IFR is same in both populations. We then specify a generic partial pooling model of fatality risks (FR, defined as number of deaths in the entire population, without regards to infection status) such that
where, for *k* -th observation, age_{k} is the age group, and comorb_{k} and male_{k} are indicator variables. This means that each age group is assigned different “baseline” fatality risks.

Note that the assumption of FRs varying across age groups that we just mentioned differs from the model of age-specific IFRs in Section 2. This is because we hypothesised that infection risks (which are used as denominators in the IFR model of Section 2) will vary across age groups. This is borne out by Figure 5.

Summary of the main model parameters is as follows:

## mean se_mean sd 2.5% 25% 50% 75% 98% n_eff Rhat ## mu 1.43 0.00278 0.190 1.05 1.33 1.43 1.53 1.81 4674 1 ## sigma 0.36 0.00462 0.212 0.12 0.22 0.31 0.44 0.90 2115 1 ## alpha[3] 1.33 0.00440 0.329 0.60 1.14 1.34 1.53 1.95 5577 1 ## beta 0.40 0.00019 0.019 0.36 0.39 0.40 0.42 0.44 10314 1 ## gamma[3] −13.05 0.00415 0.342 −13.76 −13.27 −13.04 −12.82 −12.43 6795 1We find that the mean risk ratio between population with comorbidities and healthy sub-population in 20-29 age group (exponent of *α*_{3} above) is 3.97, with wide 95% uncertainty interval of 1.81 to 7.05.^{A8}

Next, using the posterior samples we calculate the event rate in total population (i.e. *θ** = (*θ*_{1}*n*_{1} + *θ*_{2}*n*_{2})/(*n*_{1} + *n*_{2}), where subscripts 1 and 2 are healthy and comorbid sub-populations) and then divide it by ratio in healthy population to obtain an estimate of risk reduction possible by selecting healthy volunteers only. The mean posterior value is 1.88, with 95% uncertainty interval from 1.24 to 2.79. Due to use of Bayesian hierarchical model over many age groups, the uncertainty interval is much narrower than on the risk reduction factor calculated on 20-29 age group only.

To validate the model, we conducted a simple posterior predictive check for numbers of deaths in different age groups and genders. The graphical check is presented in Figure 6. Overall we find that the simple model has no problem with reproducing observed data.

### A-4 CONCLUSION AND SUMMARY OF RESULTS

In conclusion, the implications of the model for the risk in healthy young subjects are as follows:

We find that average IFR in 20-29 age group for the studies included in this analysis is 1.51 × 10

^{−4}with 95% interval from 1.18 × 10^{−4}to 1.92 × 10^{−4}.– It is feasible that the mean IFR can be decreased as much as 3.82-fold (2

*σ*impact on the IFR according to hyper-SD parameter in the meta-analysis model).– It is easy to argue that a HCT designer would be able to achieve IFR at least as low as within any of the large-scale studies included in our sample of populations. The smallest posterior mean for 20-29 year olds is 4.67 × 10

^{−5}, fitted to data from France.– Extending the HCT population to also include 30-39 year olds would lead to mean IFR of 2.65 × 10

^{−4}with 95% interval from 2.06 × 10^{−4}to 3.35 × 10^{−4}. Lowest mean IFR would then be 8.2 × 10^{−5}(also in France).

In the general population the risk rises by the factor or 3.06 per each decade of age, with 95% interval from 3.01 to 3.11.

In healthy population (defined as lack of co-morbidities listed above), the average mortality risk in 20-29 year olds is 1.88 times lower than in the general population, with 95% uncertainty interval from 1.24 to 2.79.

– Our 1.88 estimate is a bit higher than the mean “crude” risk ratio of 1.53 because we use a Bayesian hierarchical model that synthesises evidence across all age groups.

– Expanding to 20-39 year olds, the risk in healthy sub-population would be 2.04 times lower than in the general population (95% interval from 1.24 to 2.79).

Combining the smallest posterior IFR for 20-29 year olds with our estimated fold-reduction due to excluding individuals with co-morbidities from the population would lead to a mean infection fatality risk of 2.6 × 10

^{−5}with 95% Bayesian interval from 1.62 × 10^{−5}to 3.89 × 10^{−5}.– In 20-39 year old subjects the risk would be 4.09 × 10

^{−5}(2.88 × 10^{−5}to 5.55 × 10^{−5}).

### A-5 SUMMARY OF INCLUDED STUDIES AND COMPLETE INPUT DATA

We present two tables, one listing all study-level information and another breaking down information by age group. References to all included studies are given in a separate bibliography at the end of this appendix. Below the table, we provide a list that can be used to cross-reference study locations with their bibliographic references.

Atlanta: Biggs et al. (2020); Australia: A. D. of Health (2020); Belgium: Herzog et al. (2020); Belgium: Molenberghs et al. (2020); Brazil Maranhao and Sao Luis: Silva et al. (2020); Brazil Regional: Hallal et al. (2020); Cache County, UT: Project (2020); Castiglione d’Adda: Pagani et al. (2020); Connecticut: Havers et al. (2020); Connecticut: Mahajan et al. (2020); Diamond Princess: Mizumoto et al. (2020); England: Ward et al. (2020); England (ONS): England (2020); France: Carrat et al. (2020); France: F. P. Health (2020); Gangelt: Streeck et al. (2020); Geneva: Perez-Saez et al. (2020); Iceland: I. D. of Health (2020); Indiana: Menachemi et al. (2020); Ireland: HPSC (2020); Italy: Istat (2020); Italy Deaths: Group (2020); Italy Report: Istat (2020); Korea: Control and Agency (2020); Lithuania: Registry (2020); Lombardy: Paradisi and Rinaldi (2020); Louisia.: Havers et al. (2020); Minneapolis: Havers et al. (2020); Missouri: Havers et al. (2020); Netherlands: RIVM (2020); New York: Rosenberg et al. (2020); New York State Comorbidity: N. Y. S. D. of Health (2020); New Zealand: N. Z. M. of Health (2020); NYC JAMA: Richardson S (2020); Ontario: Health Protection and Health Ontario) (2020); Philadelphia: Havers et al. (2020); Portugal: Saúde (2020); San Francisco Bay: Havers et al. (2020); South Florida: Havers et al. (2020); Spain: Pastor-Barriuso et al. (2020); Sweden: Authority (2020); Utah: Havers et al. (2020); Utah: Project (2020); Verity et al.: Verity, Okell, et al. (2020b); Washington County, UT: Project (2020); Weber County, UT: Project (2020); West Salt Lake, UT: Project (2020); Western Washington: Havers et al. (2020)

## Footnotes

Updated analysis including additional studies, and incorporate feedback from peer review.

↵

^{1}Since this is only an assumption, rather than a fact which we can derive directly from data, we will present both the overall estimate across studies, or the expected lower risk. Similarly, users of our tool can choose between different assumptions.↵

^{2}Briefly, let*k*index populations and loc_{k}their locations (countries, regions). Let*d*_{k}be observed deaths for data point the reported mean prevalence and its standard error, on logit scale;*n*_{k}the total population;*X*is a vector of median ages, expressed in decades and centered at 25 years. The hierarchical model is as follows:The parameters of the model are

*p*_{k}, the true prevalence;*θ*, location-specific (random) effects on IFR;*β*, (fixed) effect of age;*τ*and*σ*, the hyper-mean and hyper-scale parameters for IFR. We implement our model in Stan, (Carpenter et al., 2017) with weakly informative priors on all parameters. The prior for*τ*is centered at 1 death per 10,000 cases.↵

^{3}The lower-risk group is defined as non-obese, non-smoking, and without the following risk factors (following those used by (Williamson et al., 2020)): asthma, other chronic respiratory disease, chronic heart disease, diabetes mellitus, chronic liver disease, chronic neurological diseases, common autoimmune diseases (rheumatoid arthritis, systemic lupus erythematosus or psoriasis), solid organ transplant, asplenia, other immunosuppressive conditions, cancer, evidence of reduced kidney function, and raised blood pressure or a diagnosis of hypertension.↵

^{4}There is an assumption implicit in the model that the trial uses dose escalation or other data to ensure that the dose given does not greatly exceed the typical natural dose, or that a larger-than natural dose does not increase disease severity.↵

^{5}The assumed independence is conditional on the age and health status of participants, and for dose-response studies, also infection severity by dose.↵

^{6}This is equivalent to assuming that the risk of infection at each dose is above the threshold for infection to replace a dose-response curve.↵

^{7}See discussion below about incorporating other risks in the model as more data becomes available.↵

^{8}All intervals reported here are Bayesian posterior intervals. For brevity we just refer to them as “x% interval” or “uncertainty interval,” (Gelman, 2010).↵

^{9}Denoting by*σ*the hyper-scale parameter in the hierarchical model, 2*σ*impact corresponds to 3.96-fold mean decrease in IFR. That means we expect 2.5% of studies to have IFR more than 4 times lower than the average IFR.↵

^{A1}The role of time may be due to new treatments, improvements over time in our ability to treat COVID-19 or selection pressures which may lead to more benign versions of the virus. Country-specific or location-specific factors in IFR data may be driven by under-reporting, health care factors (including access to health care services) or underlying distributions of known risk factors. Additionally, some unknown risk factors (e.g. genetic) may also be operating, in which case controlling for age and co-morbidities will be not sufficient to account for cross-location differences.↵

^{A2}Various estimates published since suggested that the relationship of mortality risk to age is consistent across different countries.↵

^{A2}Various estimates published since suggested that the relationship of mortality risk to age is consistent across different countries.↵

^{A3}It is also possible to work with*I F R*_{k}parameters and treat them as derived from Beta distribution with some “hyperparameters”*α*and*β*Of Beta distribution, as done by e.g. Carpenter (2016). That approach, however, does not offer an easy way of modelling impact of covariates (e.g. age and co-morbidities) on the rates.↵

^{A4}Another advantage of such a model is that it can use either individual-level or summary data and work with covariates (such as gender, age, time of the study, co-morbidities), captured as odds ratios or risk ratios. If only summary data are available, covariates can be defined as study level distributions (e.g. % male)↵

^{A5}This basic approach potentially exaggerates uncertainty, as we treat different 95% intervals reported in the study as uncorrelated.↵

^{A6}“asthma, other chronic respiratory disease, chronic heart disease, diabetes mellitus, chronic liver disease, chronic neurological diseases, common autoimmune diseases (Rheumatoid Arthritis (RA), Systemic Lupus Erythematosus (SLE) or psoriasis), solid organ transplant, asplenia, other immunosuppressive conditions, cancer, evidence of reduced kidney function, and raised blood pressure or a diagnosis of hypertension”↵

^{A7}We obtain the interval using a simulation approach. Using normal approximation of log(RR) statistic we obtain a narrower 0.57 to 4.1, perhaps due to poor quality of approximation for rare events.^{A8}Using simple models that assumed identical risk ratios in all age groups would lead to a mean RR of similar magnitude but a much less uncertain estimate, due to more rigid model assumptions; similarly, assuming some linear age structure on risks, such as in the main meta-analysis model above, would may lead to a different RR, but we do not think such an assumption is justified here. We do not include outputs of these models in this short write-up.

## Abbreviations

- HCT
- Human Challenge Trial
- IFR
- infection fatality risk
- SARS-CoV-2
- severe acute respiratory syndrome coronavirus-2