## Abstract

Late-stage cancer immunotherapy trials often lead to unusual survival curve shapes, like delayed curve separation or a plateauing curve in the treatment arm. It is critical for trial success to anticipate such effects in advance and adjust the design accordingly. Here, we use *in silico* cancer immunotherapy trials – simulated trials based on three different mathematical models – to assemble virtual patient cohorts undergoing late-stage immunotherapy, chemotherapy, or combination therapies. We find that all three simulation models predict the distinctive survival curve shapes commonly associated with immunotherapies. Considering four aspects of clinical trial design – sample size, endpoint, randomization rate, and interim analyses – we demonstrate how, by simulating various possible scenarios, the robustness of trial design choices can be scrutinized, and possible pitfalls can be identified in advance. We provide readily usable, web-based implementations of our three trial simulation models to facilitate their use by biomedical researchers, doctors, and trialists.

## Introduction

Immunotherapy is revolutionizing the treatment landscape for patients with advanced cancers. While the number of immuno-oncology drugs under investigation is rising rapidly – around 4700 agents are currently in the development pipeline – the need to further improve patient outcomes remains high [1]. Well-designed immunotherapy trials are crucial to establish advances in clinical outcomes robustly. Unfortunately, the odds for cancer treatments to successfully pass the development pipeline are unfavorable, and only a minority of the treatments (5-10%) ultimately obtain market approval [2–4]. Even for cancer therapies that do reach late-stage development, approval rates remain modest at around 27% [5]. The primary reason in most of these trials (i.e., 63.7%) is failure to demonstrate efficacy [5], which can be partly attributed to suboptimal trial design choices based on overly optimistic assumptions of the treatment effect. Such assumptions may be used to erroneously justify low numbers of patients or inappropriate endpoints and lower the power of these trials [5, 6].

Immunotherapy trials raise complex design questions, and conventional design methods are not always a good match to the unique characteristics of immunotherapies [7]. There is a very broad spectrum of therapies based on various molecular mechanisms – ranging from immunomodulators to cell therapies, cancer vaccines, oncolytic viruses, and CD3-targeted bispecific antibodies – that can lead to unusual toxicity profiles, response patterns, and survival kinetics [8–10]. These observations render a “one-design-fits-all” approach futile and stress the need for designs that are tailored to immunotherapy or even combination therapies.

Immunotherapies are known to induce a delayed clinical effect and long-term overall survival (OS) in only a subset of patients [11]. The survival curve reflects these phenomena by a delayed curve separation and a plateau of the treatment arm at later stages of the trial [12]. These characteristics violate a fundamental premise that underlies the design of many trials: the proportional hazard assumption (PHA) – essentially stating that the treatment effect should remain constant over time [13]. As a result, immunotherapy trials based on this principle can have an overestimated power [12, 13] and require a longer follow-up to demonstrate efficacy than initially planned [12], increasing the likelihood of a negative trial.

These issues led to the development of innovative methods such as novel radiological criteria to quantify tumor responses [9, 14, 15], (surrogate) endpoints to capture unique survival kinetics [10, 16–19], biomarkers to enrich for patients more likely to respond to treatment [20–23], and statistical methods to retain a trial’s power in the presence of unusual survival kinetics [24–26]. Despite the multitude of available methods, it is difficult to predict trial outcomes in advance and select the methodology accordingly. The stakes are high: a trial design built on accurate predictions of the response kinetics is more likely to be positive, whereas misjudgment could result in a negative trial, potentially compromising patient benefit, vast amounts of work, and (public) research funds.

In this study, we use late-stage *in silico* cancer immunotherapy trials to investigate how design decisions affect the trial outcome in the context of cancer immunotherapy, possibly combined with chemotherapy. The mechanism-based nature of these trials allows researchers to translate cellular processes in the tumor microenvironment and immunotherapeutic interventions thereon into predicted response patterns, survival kinetics, and trial outcomes. An *in silico* immunotherapy trial is based on explicit biological assumptions and provides an intuitive means to predict risk profiles and treatment efficacy. Moreover, it equips researchers with a tool to scrutinize trial designs and analysis strategies of upcoming trials in advance to identify potential risks and pitfalls.

We use three different simulation models to perform our *in silico* trials, based on work by ourselves [27] and other authors [28, 29]. Despite considerable differences, all models replicate late-stage immunotherapy or combination trials reasonably well and capture their typical survival kinetics. Then, we demonstrate various applications of such trial simulations, including the ability to scrutinize a clinical trial’s design and sample size calculations based on a range of predicted possible outcomes. Finally, we illustrate the consequences of (not) considering immunotherapy-specific response patterns in settings selected for educational purposes, such as selecting survival endpoints and randomization ratios of upcoming trials and planning interim analyses.

## Results

### Generating trial populations based on tumor-immune dynamics

We used *in silico* cancer immunotherapy trials based on mechanistic simulations of cancer-immune dynamics to investigate the consequences of immunotherapy-specific response patterns on trial design principles [26]. The virtual patients in these trials are simulated with ODE models, which describe disease courses based on assumptions about interactions between tumor cells and the immune system [26]. In this paper, we will focus on simulating two years of follow-up after treatment – while it is straightforward to consider longer follow up times with *in silico* trials, a two-year time frame is common for contemporary immunotherapy trials [30–32].

To investigate the extent to which our simulation results depend on specific modeling choices, we use three different ODE models. Model 1 (M1) describes the following tumor-immune dynamics in the tumor microenvironment: immunogenic tumor growth leading to priming and clonal expansion of naïve T cells, migration of effector T cells to the tumor microenvironment, and formation of tumor-immune complexes to enable tumor cell killing (see Methods; **Figure 1A**). We simulate treating these patients with immune checkpoint inhibitors (ICI), chemotherapy, or both. ICI increase the T cell killing rate and directly affect the tumor-immune dynamics. Chemotherapy has a cytotoxic effect on the tumor, slowing its growth. A detailed description of the model, including the rationale for parameter selection, has been published previously [26]. In contrast, model 2 (M2) does not represent T cells migration between lymph nodes and tumor microenvironment; however, it does contain an explicit representation of antigen-presenting cells (APCs) [33]. Finally, Model 3 (M3) does not contain either T cell migration or APCs, but it does take T cell exhaustion into account. Another important difference between the models lies in how tumor growth is represented: M1 uses a size-dependent growth rate, M3 a resource-constrained growth rate (logistic growth), and M2 uses unlimited exponential growth.

Regardless of model specifics, *in silico* clinical trials describe cancer outcomes on three levels: (1) a cellular level, (2) a patient level, and (3) a trial population level. Cellular interactions in the tumor microenvironment are translated into clinical trial outcomes as follows: firstly, the ODE model is implemented, and model parameters that vary between patients are selected by fitting to existing survival data (**Figure 1B**; see Methods). Next, individualized disease trajectories – either treated or untreated – of cancer patients are generated (**Figure 1C**). Eventually, patients are randomized into two cohorts to resemble conventional phase III trials: a control group (either placebo or chemotherapy) and a treatment group (immunotherapy, chemoimmunotherapy, or induction chemotherapy followed by immunotherapy; **Figure 1D**). Since the cellular dynamics (e.g., tumor burden over time or the efficacy of T cell killing) and survival outcomes of these patients are known and can be modified, *in silico* clinical trials are suited to answer questions like:”Assuming that a novel treatment increases T cell killing 5-fold, how does this translate to a survival benefit in patients? Moreover, how many patients are needed to establish this benefit in a clinical trial? When should one analyze the results?” (**Figure 1E**).

Despite their differing mechanisms, the models generate qualitatively similar predictions (**Figure 2)**: tumors grow at realistic speeds and are usually not cleared by the immune system without therapeutic intervention. In the models, therapeutic interventions can slow or even reverse tumor growth, in principle leading to two major contrasting outcomes: death or long-term survival. However, there is a unique effect in M3 where even after treatment and growth reverse, the tumor burden keeps oscillating over time, leading to regular self-resolving recurrences. While this may not be entirely realistic, it is not an issue for our purpose as we shall focus on the initial growth trajectory of the tumor preceding and up to 2 years after treatment, and recurrences happen after that.

*In silico* late-stage immunotherapy trials yield realistic survival outcomes

To investigate whether our *in silico* models can generate realistic survival curves as observed in late-stage immunotherapy trials, we fitted the models to three different datasets: (1) the NCCTG lung cancer survival dataset [34]; (2) the CA184-024 trial (ipilimumab + dacarbazine vs. dacarbazine in previously untreated metastatic melanoma [35]); and (3) the CheckMate 066 trial (nivolumab + placebo vs. dacarbazine + placebo in treatment-naive metastatic melanoma patients without BRAF mutation [36]). The choice for these trials is based on the size of the trials and the maturity of the data. The follow-up of the CA184-024 trial and the CheckMate 066 trial were five and three years, respectively. As the last two datasets were not publicly available, we extracted the data using image digitization (see Methods). As a reference for the *in silico* trials, we visualized the Kaplan-Meier estimators of these datasets (**Figure 3A**). Both trials were digitized correctly, as reflected by the nearly identical risk tables compared to the original manuscripts [35, 36]. Next, we fitted the tumor growth rate distributions and treatment effect parameters for chemotherapy and immunotherapy s (NCCTG: 3 parameters; immunotherapy trials: 4 parameters) to these datasets (M1: **Figure 3B**; M2, M3: **Supplementary Figure S1)**. For the CA184-024 and CheckMate 066 trials, the simulated patients were treated with ICI upon diagnosis, increasing their T cell killing rates. For simplicity, we did not simulate dropout or censoring in the trials shown in this paper, although it could be added to the simulation. Model M1 achieved satisfactory fits to all datasets. However, M2 and M3 had difficulties fitting the CheckMate 066 data, with M2 predicting more rapid death in the control arm and M3 predicting a cross-over of survival curves. M3 also had difficulties fitting the other two datasets, as its survival curves plateaued from 12 months after treatment onwards. While the fit of all models can be improved by allowing more parameters to vary, we chose to keep the number of fitted parameters small to investigate the consequences of such issues on our downstream analyses.

Hence, our *in silico* trials couple the disease mechanism and mechanistic treatment effect to a predicted clinical trial outcome. By allowing model parameters to vary between patients, such models can be fitted to existing clinical trial data. Whether a good fit can be achieved depends on the model assumptions and the number of parameters that are allowed to vary. In our case, models M1 and M2 were able to fit the three datasets reasonably well, with M3 showing a substantially worse fit.

Interestingly, although not incorporated explicitly, the models reproduced hallmark survival curve features arising as a consequence of the interaction between tumor and immune cells typically seen in immunotherapy trials: a delayed curve separation and a plateau of the survival curve of the treatment arm at later stages of the trial (last two columns in **Figure 3B** and **Supplementary Figure S1)**.

*In silico* immunotherapy trials predict immunotherapy-specific response patterns

The design and the success rate of any clinical trial depends, among others, on a realistic predictions of the shape of the survival curves and the distribution of clinical outcomes. For late-stage immunotherapy trials, commonly observed immunotherapy-induced response patterns are a delayed curve separation and a plateauing tail of the survival curve of the treatment arm (**Figure 3)**. These characteristic survival curve shapes violate a vital premise of many clinical trials: the proportional hazard assumption (PHA). The PHA states that the “instantaneous death rate” of a patient (i.e., the hazard rate) in both arms of the trial should be proportional, resulting in a constant hazard ratio. Many traditional design methods, ranging from sample size calculations to outcome analyses, are based on this convenient assumption. For late-stage immunotherapy trials, this induces two problems: (1) while a violation of the PHA needs to be addressed during trial planning, the hazard rates – and an eventual violation of the PHA – becomes available only after the trial; and (2) if a trial does not adhere to a PHA, what will be the shape of the survival curve? Especially in an era where treatment and control arm regimens are becoming increasingly complex, adjusting the design and analysis methods to various survival curve shapes is challenging.

*In silico* clinical trials can provide principled predictions about possible shapes of the survival curve, including the underlying hazard rates and hazard ratios, before trial execution. We generated such survival predictions using the models – fitted to the CA184-024 data (**Table 3**) – and changed the treatment effect parameters according to the simulated scenario. A traditional scenario would be a trial in which patients are randomized 1:1 to mono-chemotherapy or placebo. Given the direct chemotherapy effect, the PHA is generally assumed to hold for these trials. An *in silico* trial in which chemotherapy reduces the tumor growth rate for the entire trial duration indeed replicates these assumptions (**Supplementary Figure S2)**: the survival curves separate from the start of the trial, and the hazard ratio remains roughly constant over time. However, what happens if the chemotherapy effect does not last for the entire trial but for – maybe more realistically – 6 months? For M1 and M2, the initial proportional separation of the survival curves is followed by a parallel decay and eventual convergence of both curves, leading to an early but transient survival benefit for the chemotherapy arm (**Figure 4A, B**). For M3, the chemotherapy effect estimated from the CA184-024 data is more profound and instead induces a permanent response (**Figure 4C**). Hence, substantial deviations from the PHA are observed in all cases, even for seemingly simple chemotherapy trials. Also, a violated PHA becomes immediately apparent when considering a more contemporary scenario of immunotherapy combined with chemotherapy compared to chemotherapy alone: through approximately the first six months, the hazard rates remain constant over time, but after that, they start to decline in the immunotherapy group (cyan line), yielding a non-constant hazard ratio over time (**Figure 4D**).

The flexibility of *in silico* trials lies in their ability to incorporate complex treatment regimens. For example, let us assume one would be interested in estimating the survival curves and underlying hazard ratio over time of an immunotherapy + placebo-chemotherapy vs. chemotherapy + placebo-immunotherapy trial (**Figure 4E**) or a trial with induction chemotherapy followed by immunotherapy vs. immunotherapy (**Figure 4F**). Mechanism-based immunotherapy trials translate biological assumptions regarding the disease and treatment effects into survival curves (including its hazard rate/ratio estimates). The resulting survival curve shapes, such as crossing survival curves (**Figure 4E**) or a temporary curve separation (**Figure 4F**), may be hard to predict otherwise and can be detrimental to the trial outcome if addressed appropriately.

We emphasize that different models can generate different predictions depending on model assumptions and parameters, as seen in our chemotherapy vs. placebo examples (**Figure 4A, B, C**). Conversely, however, even substantially different models can agree on the essential aspects of the predicted survival curves. For example, despite their differences, our three models all predict the characteristic delayed curve separation of immunotherapy trials (**Figure 4D, Supplementary Figure S3, Supplementary Figure S4)**.

### Using *in silico* trials to select an appropriate outcome metric for measuring a treatment’s clinical effect

A key design decision in a clinical trial is which effect size metric to use to define treatment success. Two common choices are the overall hazard ratio, which is affected by the entire survival data of the trial, and a survival endpoint such as 2-year overall survival (OS), which only depends on the specifically defined time-point. When there is no solid clinical rationale to prefer one effect size measure over the other, statistical considerations such as power become important. To investigate the consequences of choosing hazard ratio or 2-year OS as the study effect size in different immunotherapy scenarios, we determined the power of *in silico* trials by conducting simulations at varying study population sizes.

A potential advantage of using the hazard ratio is its use of the entire survival curve, which can increase power when the PHA is met and detect transient effects even if the PHA is not met. Indeed, when investigating the power of the transient chemotherapy effect generated by model M1 (**Figure 4A**), we found the power to be much greater when using the hazard ratio compared to the power to detect the minimal difference in survival still found after 2 years. The opposite was true when investigating the chemoimmunotherapy vs. immunotherapy scenario using M2 (**Figure 4B**): the power of trials that used the hazard ratio lagged far behind the power to detect a 2-year survival endpoint, as a consequence of the considerable violation of the PHA in this scenario. Indeed, when considering the persistent chemotherapy effect generated by model M3 (**Figure 4C**), a scenario with a substantially lower variation of the hazard ratio, we found the power to be more comparable, although the hazard ratio still had a meaningful advantage. When using M3 to investigate the chemoimmunotherapy vs. immunotherapy scenario, the choice of endpoint made hardly any difference (**Supplementary Figure S5)**.

These results illustrate the critical importance of choosing an appropriate effect size to measure the clinical outcome, which in turn strongly depends on the shape of the survival curves. For established treatments, investigators can rely on their experience or published results to make an appropriate choice; however, the expected survival curve shape might be very uncertain for novel immunotherapies or combinations of existing immunotherapies. In such cases, running various *in silico* trials would help investigators prepare for different plausible scenarios and choose a robust trial design. In our examples, the models agreed that hazard ratio would be a suitable effect size for a chemotherapy vs. placebo trial even if the PHA does not entirely hold, whereas 2-year OS would be appropriate for the chemoimmunotherapy vs. immunotherapy case (**Supplementary Figure S5)**.

*In silico* trials can help to choose endpoints and randomization ratios before trial execution

Clearly, the success rate of novel immunotherapy trials depends on more than its sample size alone. To establish an OS benefit of the treatment arm, it is crucial to analyze the trial once the data have reached a certain maturity – i.e., the treatment needs to be granted sufficient time to induce a survival benefit. We assumed that a delayed curve separation in immunotherapy trials would prolong the follow-up needed to establish an OS benefit of immunotherapy and thereby defer reaching maturity of the trial data. If the therapy is effective, data maturity can be regarded as the time point when a treatment effect can be observed. Hence, an optimal trial endpoint would be the earliest time at which this treatment effect can be detected with sufficient power. Therefore, we analyzed the power of differently sized trials with respect to their OS endpoint. Herein, we distinguished trials that were subject or were not subject to a delayed curve separation (immunotherapy and chemotherapy, respectively). In a classic chemotherapy trial, the treatment effect translates directly to a survival benefit in the treatment arm – the survival curves separate from the start. Therefore, the highest power is obtained after the total duration of the treatment effect (**Figure 6A**, panel 1). In this case, the treatment effect lasts for six months, leading to the 6-months OS as the endpoint with the highest power. The delayed curve separation in immunotherapy trials renders it futile to analyze OS data early on in the trial (**Figure 6B**, panel 1). A practical ramification is that in the presence of a delayed curve separation, the trial requires a sufficiently long follow-up and an adequate size to gain power and detect immunotherapy-specific treatment effects. Mechanism- and simulation-based power calculations with *in silico* trials can consider these specific survival curve features when determining the sample size for upcoming trials.

Given the observation that both the size of an immunotherapy trial and its endpoint heavily influence the probability of finding the survival benefit of interest, we presumed that increasing the size of the treatment arm – i.e., an unequal randomization scheme – would similarly affect the power. Instead of varying the study size, we now varied the randomization ratio (second panel of **Figure 6A/B**). Interestingly, while the power logically depended on the OS endpoint, the randomization ratio did not greatly affect the power (**Figure 6B**). Considering that an unequal treatment allocation may provide ethical benefits, we confirm that the randomization ratio in immunotherapy trials is of secondary importance compared to its size or primary OS endpoint.

In summary, our *in silico* immunotherapy trials replicate existing insights from trial design as to how violation of the PHA affects power and analysis choices. Our ability to directly translate biological assumptions on treatment mechanisms into survival curve shapes allows the trialist to reason deliberately about whether such violations of the PHA would or would not be expected in their specific trial design and how the problem could be addressed if it arises.

### Simulating interim analyses to evaluate the trade-off between patient benefit and trial resources

We have observed a clear trade-off between the power of an immunotherapy trial on the one hand, and the primary OS endpoint, and correspondingly the data maturity, on the other. Luckily, the two are not entirely mutually exclusive: interim analyses have been developed for ethical purposes to establish positive or harmful treatment effects early. However, there is a catch: the necessity to control for multiple testing at each interim analysis lowers the significance threshold on the final analysis to maintain the same overall type I error rate. This raises the question: “How many interim analyses should you plan, and when should you plan them?” Again, well-founded answers to such questions can be obtained with the help of *in silico* immunotherapy trials. To illustrate this, we used M1 to simulate 1000 immunotherapy trials with 1200 patients per trial, randomized 1:1 over immunotherapy with a strong treatment effect or a placebo (**Figure 7A**). In the absence of interim analyses, the vast majority of the trials are predicted to end up positive. Adding interim analyses (O’Brien- Fleming approach) to the equation induces a trade-off. On the one hand, increasing the number of equally-spaced interim analyses increases the probability of early detecting a positive treatment effect (e.g., approximately 60% of the trials are positive after 18 months in the case of three interim analyses; **Figure 7A**). On the other hand, the overall probability of ending up with a negative trial due to more stringent analyses (i.e., less power) also increases, especially in the case of immunotherapies with a weaker treatment effect (±57% without an interim analysis vs. ±63% with three interim analyses; **Figure 7B**). In an actual trial, the latter needs to be corrected by including additional patients to maintain the pre-planned power. Furthermore, we observe that the timing of the interim analysis is crucial. Whereas an interim analysis at 18 months provides additional value to the trial, interim analyses before 16 months are predicted to be wasteful due to non-proportional hazards and less mature data. As a control, we simulated trials without any treatment effect. By design, approximately 95% of the trials should end up negative irrespective of the number of interim analyses, which indeed seemed to be the case (**Figure 7C**). Logically, the weaker the treatment effect, the higher the probability of erroneously finding a harmful treatment effect – a characteristic that the simulation also exhibits (**Figure 7B/C**).

## Discussion

Over the past decade, tumor-immune dynamics have been investigated extensively with *in silico* models. In the early days of cancer immunotherapy, these modeling efforts focused – next to chemotherapy [37] – on cellular immunotherapy [38, 39]. More recently, the field has addressed ICI therapy (e.g., [33, 40, 41]). Models of tumor-immune dynamics have been applied to study pharmacokinetic and therapy dynamics (PK/PD), treatment effects (including mechanism(s) of action, optimizing dosing regimens), treatment combinations, toxicity, biomarker prediction, drug resistance, and drug discovery (see reviews on these topics [38, 42–47]). These extensive modeling efforts by the Mathematical Oncology community have created a rich and valuable methodological resource. The goal of the work described in this paper is to tap into this resource for the purpose of clinical trial design. Once parameterized, a mathematical model can predict likely outcomes of treatments for individual patients; using such a model for trial design requires considering heterogeneity between patients and translating these into likely survival curve shapes for each arm of the trial.

In this study, we leveraged mathematical models to perform cancer immunotherapy trials *in silico*, predicting survival and response profiles of various treatment regimens. Complementary to conventional design methods, *in silico* trials provide the ability to investigate the implications of a researcher’s biological (as opposed to statistical) hypotheses about a drug’s mechanism of action for the design, conduct, analysis, and outcome of clinical trials. When comparing the simulated outcomes to actual immunotherapy trial outcomes, we showed that *in silico* trials are suited to translate complex biological mechanisms (such as those observed during the treatment of patients with ICI) into realistic trial outcomes. Crucially, regardless of the model, the survival curves from these mechanism-based simulations reflected two pivotal components often found in immunotherapy trials: a delayed curve separation and a plateauing tail of the survival curve at later stages of the trial. In line with genuine immunotherapy trials, we find that these immunotherapy-specific response patterns differ considerably from chemotherapy. Our findings confirm that diversity in survival curves profoundly impacts the outcomes of immunotherapy trials [48]. Consequently, these features need to be considered when deciding on the sample size, endpoint, randomization ratio, and the number and timing of interim analyses of a novel cancer immunotherapy trial.

*In silico* clinical trials are gaining popularity in medicine. Such trials enable investigating, among others, how novel drugs, treatment schedules, dosing regimens, and inter-patient heterogeneity affect the outcome of a clinical trial [49]. *In silico* clinical studies have a wide range of applicability from pediatric infectious [50] and orphan diseases [51] to diabetes [52], inflammatory autoimmune diseases [53], traumatic injury [54], psychiatric illness [55], and cancer. In oncology, several *in silico* clinical trials involving chemotherapy and tyrosine kinase inhibitors have been performed [56], [57]. Moreover, with the onset of checkpoint inhibitors, *in silico* immunotherapy trials have gained interest, leading to trials with anti-CTLA- 4-antibodies and anti-PD-(L)1 antibodies [58], [59], [60]. The common denominator in these trials is that they primarily center on dosing regimens and treatment schedules. Herein lies the main difference with our simulation approach: although the ‘key ingredients’ of these approaches are similar – they are based on a mathematical abstraction of a disease mechanism – our trials do not aim to optimize treatment schedules. Instead, we complement traditional design methodology by adding the means to predict key aspects of response and survival *a priori* to steer design decisions of novel immunotherapy trials. These trials differ from traditional trial design research in that these, often statistically-grounded, approaches simulate clinical trials based on population-level assumptions (e.g., with particular distributions of survival times, study durations, or with a specific censoring mechanism). Examples of such high-level simulation approaches include, but are certainly not limited to, studies aiming to calculate the sample size and power of clinical trials [61], [62], [63]. Since these methods lack a direct link to the underlying biological disease mechanism, interpreting their parameters for individual trial participants is difficult or even impossible. In contrast, *in silico* trials are founded on biological assumptions but then translate these assumptions into statistical concepts such as hazard ratios. In this manner, simulated trials encourage an interdisciplinary discussion about the design of an upcoming trial.

*In silico* clinical trials are applicable in several settings. First, they provide the means to verify clinical trial and treatment assumptions before investing extensive amounts of work and funds into the development and execution of a clinical trial and can, thereby, function as a proof of principle of the soundness of the hypotheses for an upcoming trial. Scrutinizing each aspect of the trial design might lead to better design decisions and reduce unanticipated outcomes. Moreover, this mechanism-based approach does not necessitate a deep understanding of complex mathematical theorems; instead, it requires a biological understanding of a disease. This mechanistic basis is intuitive, which benefits the communication between clinical doctors and biomedical researchers on the one hand and statisticians and clinical trialists on the other. Additionally, *in silico* trials might serve as excellent educational tools. The ability to simulate a wide range – from basic to highly advanced – research questions can be exploited in teaching activities for entry-level clinicians to experienced trialists. A final implication, which holds for any trial simulation, is that they may provide some degree of insight when conventional clinical trials are unfeasible due to practical or ethical constraints (e.g., clinical trials in rare diseases, pediatrics, or critical care medicine).

Nonetheless, *in silico* clinical trials have to be considered in light of some limitations. The most critical limitation is universal to any scientific model, whether *in vitro, in vivo*, or computational: the immunotherapy trial outcomes depend heavily (if not entirely) on the biological assumptions of the model, meaning that incorrect interactions or erroneous parametrization of the model can lead to inaccurate predictions. The parameterization, in particular, might pose a problem: given the often novel treatment mechanisms, data to fine-tune the parameters of the model accurately might be scarce. In these cases, the simulation itself can be used as a sensitivity analysis to assess to what extent a certain parameter range, or the structure of the model itself, influences the robustness of the predictions. Our use of three different models in this paper can be seen as such a type of sensitivity analysis; indeed, despite the major differences, it was reassuring to observe that the models often agreed when it came to the critical qualitative aspects of the predicted survival curves.

In addition, while ODE models can be rather simple and intuitive to understand, translating biological principles into an ODE model and implementing it into a simulation requires thorough knowledge of computational methods, potentially limiting its widespread applicability. To address these limitations, we have made our model implementations available as (1) an interactive website that can be used without installing any software and without any programming knowledge (https://computational-immunology.org/models/immunotherapy-trials/); (2) an R package allowing to run simulations without requiring knowledge of ODEs and their solutions.

In summary, *in silico* cancer immunotherapy trials offer a versatile approach to simulate immunotherapy trials based on biological assumptions. As a simulation tool, they facilitate the scrutiny of trial design decisions to optimize the probability of a successful immunotherapy trial and contribute to high-quality research for cancer patients.

## Methods

### Mechanism-based models of the tumor microenvironment

We implemented three ordinary differential equation (ODE) models of tumor-immune interactions: one from our previous work [27] and two by other authors [28, 29]. We first describe the common aspects of the models, then explain the differences and show the model equations. All models describe cancer onset and progression, and we initialize each model by seeding a single growing tumor cell. This tumor cell divides, leading to a proliferating mass of tumor cells. The parameter *ρ* controls the grows rate. Within the tumor microenvironment, an anti-tumor immune response induces cytotoxic T cells to kill tumor cells at rate *ξ*. Intratumoral T cells die at a rate *δ*. In these models, the rate at which T cells are activated and/or proliferate depends initially on the tumor size: an early stage microscopic tumor presents fewer antigens than a larger – but still small – tumor. However, antigen presentation saturates as the tumor grows further (scaling factor *T* /(*g* + *T*) in Equations 3,4,3,9). Thus, four model parameters are shared between the models. Depending on the parameter values, it is possible that the immune response eliminates the tumor or that the tumor escapes and grows in an uncontrolled fashion.

We now discuss the model equations and parameters. In all models, we denote the number of tumor cells by *T* (Equations 1,5,8) and the number of intratumoral T cells by *I* (Equations 2,6,9). Compared to their original versions, variables and parameters in the equations below have been renamed, and the units of some parameters scaled to make the models easily comparable.

**Model M1** is based on our previous work [27] and has the following equations:

We implemented a tumor growth rate that is slower than exponential growth. This is a common modeling choice based on data and the biological premise that a growing tumor needs to sustain itself with nutrients. A common method to implement a sub-exponential growth, which we adopt here, is to raise the number of tumor cells to the 3/4th power to obtain the number of actively dividing cells [64]. We had previously modeled slightly faster-growing tumors using the less common power 4/5th [27]. However, given that the other two models already implement faster-growing tumors, we here use the more common, slower one. The killing of tumor cells is implemented using a double saturation model [65] parameterized as proposed by Gadhamsetty et al. [66] (Michaelis constants *h*_{T} and *h*_{I}). The double saturation model reflects that T cell killing of tumor cells takes hours [67]. The immune cells within the tumor microenvironment originate from tumor-draining lymph nodes, where naive cytotoxic T cells (*N*, Equation 4) turn into activated T cells (*S*, activation rate *α*_{N} ; Equation 4). Activated T cells proliferate at rate *p*_{S} and migrate to the tumor microenvironment to become infiltrating T cells (*I*). The migration step leads to a slight delay between T cell activation and tumor cell killing on the order of days (*m*_{S} = 1day^{−1}). If desired, the distinction between lymph node and tumor microenvironment sites could be removed for simplicity, given that the migration takes place on a faster timescale than the immune response.

**Model M2**, proposed by Tsur *et al*. [28], conceptually differs from M1 in five aspects. First, its tumor growth is unrestricted exponential. Second, the anti-tumor response saturates with increasing numbers of tumor cells but not with increasing numbers of T cells. Third, it explicitly represents antigen-presenting cells, called *A* (Equation 7), which are recruited at rate *α*_{A} in response to the tumor growth. Fourth, its T cells do not proliferate but are produced at a capped rate. Fifth, it does not distinguish between T cells in the lymph node and intratumoral T cells. As mentioned above, this is likely not critical. The model equations are as follows:

**Model M3** was recently proposed by Bekker *et al*. [29]. It has two equations representing tumor cells and T cells. It resembles M2 in that tumor growth is initially exponential, but there is a maximum capacity for tumor cells (logistic growth). Killing dynamics follow a “mass-action law” (i.e., there is no saturation of the killing rate like in M1 and M2). Further, it includes a term for tumor size-dependent T cell exhaustion. This modeling choice leads to oscillating numbers of T cells and tumor cells in many parameter regimes. The model equations are as follows:

We emphasize that M3 has been presented by Bekker *et al*. [29] as an abstraction of the general mechanisms underlying immunotherapy similar to M1; neither model claims to fit specific time-resolved data. Nevertheless, we included it as we were interested in the impact of the different modeling choices.

### Model parameters

**Table 1** shows an overview of the parameters in the three models. Four parameters appear in every model, but note that this does not necessarily mean that the parameters can be interpreted in the same way. For example, in a model where killing saturates in a scenario where there are many more tumor cells than T cells (M1 and M2), the same value of the killing rate will lead to less effective killing than in a model where there is no such saturation (M3). Other parameters are model-specific. To improve the inter-model comparability and reduce the potential for over-fitting, we left the parameters in all models fixed except the tumor growth rate *ρ*, which we varied to obtain heterogeneous patient populations.

Parameter values for M1 and M2 were taken from earlier publications [27, 28], where the biological reasoning underlying these values is explained, and references are provided. Differences in model structure, and in experimental data being referred to, yield extensive variation in parameter values (**Table 2**). The variation in *ρ* is just a consequence of the different tumor growth models, which give a different meaning to the parameter in each model. Despite the differences, the values actually lead to comparable growth kinetics. The biggest quantitative differences are in the killing kinetics. M2’s killing rate *ξ* is three orders of magnitude smaller than M1’s, but M2 compensates for this by saturating the killing at a number of tumor cells that is five orders of magnitude higher than M1’s. Overall, the amount of cells being killed when the immune system is active and the tumor exceeds the diagnosis threshold is comparable across the models.

M3 was not explicitly parameterized by the authors [29]. Therefore, we set its parameters to the same values as in M1 or M2 as much as possible. For instance, because both M2 and M3 contain essentially unrestricted exponential growth of the tumor cells until M3 approaches the carrying capacity, we used the value for the tumor growth rate in M2 for M3. For the killing rate, we used a value that gave similar killing speed as M1 for tumors containing 10^{9} − 10^{10} T cells. Note that due to the saturation term in M1, the killing is faster in M3 for larger tumors and slower for smaller tumors. Two parameters, the T cell exhaustion rate and the carrying capacity, were unique to M3. We set both to values that lead to a small influence of the corresponding terms on the simulation result and obtained comparable kinetics to the other two models at those parameter settings.

### Simulating untreated disease, chemotherapy, and immunotherapy in individual patients

Using ODE models, we can implement different cancer immunotherapies in two general ways: (1) by changing model parameters; (2) by adding or removing cells at a certain time.

Using this ODE model, we simulated cancer development and disease trajectories in patients. We extensively varied the tumor properties (i.e., the tumor growth rate, the growth rate decline, and the decline decay rate) between patients to generate interpatient variation in disease courses.

Each patient is simulated from cancer onset (i.e., malignant transformation of the first cell) for up to ten years. As argued previously [27], we start from a diagnosis threshold of a tumor mass of 65 * 10^{8} cells, corresponding to the size at which common malignancies are diagnosed [68–70]. The lethal tumor burden is set to 10^{12} tumor cells (a tumor volume of approximately 10.6 dm^{3}). Since we expect both thresholds to vary considerably between patients, depending, for example, on the timing of doctor visits or a tumor’s location, we implement them as random variables that change with every simulation. Specifically, every threshold is drawn from a log-normal distribution with a 4*s* range of one order of magnitude. The upper 2*s* point (95.45% quantile) is set to 65 * 10^{8} for diagnosis and 10^{12} for death.

Disease trajectories of patients with cancer can be steered with therapy. In our model, treatment is implemented by changing the model parameters once the tumor exceeds the diagnosis threshold, as we assume this is when treatment starts. Given their prominent roles in many oncological treatment plans, we included immune checkpoint inhibitors (ICI) and chemotherapy in the models. Both treatments function through their primary modes of action. ICI are implemented by increasing the killing rate of cytotoxic T cells (i.e., the parameter *ξ*) in M1 and M3. In M2, it is implemented by increasing the T cell activation rate *α*_{A} and decreasing the death rate *δ*; for simplicity, we restrict this such that the fold increase of *α*_{A} equals the fold decrease of *δ*. These changes are implemented directly after diagnosis and remain active for the rest of the simulation unless stated otherwise. The duration and potency of the ICI treatment (as measured by the magnitude of the change of the affected parameters) eventually determine patient outcome.

In patients treated with chemotherapy, the immune system is still present; however, it is not boosted (as is the case during ICI treatment). Hence, the T cells are not potent enough to curb tumor growth. We implement the cytotoxic capacity of chemotherapy in the models uniformly by reducing the tumor growth rate (parameter *ρ*) to a smaller number. Again, the duration and potency (as measured by the reduction in tumor growth rate) determine patient outcome. By default, the treatment duration for ICI and chemotherapy are two years and six months, respectively.

### Simulations of patient cohorts and parameter fitting

To generate heterogeneous patient populations, we draw each patient’s growth rate parameter *ρ* from a log-normal distribution. Depending on the parameter, the simulated patient’s tumor may clear spontaneously; such results are discarded (rejection sampling). When the tumor reaches the diagnosis threshold, we apply ICI, chemotherapy, a combination of chemotherapy and ICI, or we leave the patient untreated (i.e., a placebo treatment). Therefore, each patient cohort (**Figure 3**) is characterized by two to four parameters: mean and standard deviation of the log tumor growth rate, immunotherapy treatment effect size, and chemotherapy treatment effect size. These two to four parameters can be fitted to a given dataset.

Due to the stochastic nature of our model, we used an approximate Bayesian computation / sequential Monte Carlo (ABC- SMC) algorithm [71] to fit the parameters. As the test statistic for ABC-SMC, we used the root mean squared difference (RMSD) between model-predicted and data-estimated survival curves (i.e., Kaplan-Meier curves) evaluated for each month in a 2-year time window upon diagnosis. We set the sample size for generating the model-predicted survival curve to the same number of patients that is contained in the data being fitted. **Figure 3** and **Supplementary Figure S1** show, for each model, the simulation that achieved the lowest RMSD to the target data during each ABC run.

We applied the ABC-SMC algorithm to all three patient cohorts shown in **Figure 4**. When examining the posterior distributions of the parameters, we found that a wide range of chemotherapy effect values achieved comparable RMSD values for each model – which is not surprising, given that a higher baseline growth rate combined with a higher chemotherapy effect leads to similar predicted tumor growth as a lower baseline growth rate combined with a lower chemotherapy effect. We therefore performed a further set of fits to the CA184-024 data where we kept the chemotherapy effect values fixed at 0.6 for M1 and M2 and at 0.75 for M3, respectively – values that were chosen to obtain comparable and realistic impacts of chemotherapy on the 2-year OS curves (**Figure 5A**), and were plausible given the posterior distributions. We then again fitted the remaining three parameters to the CA184-024 data using ABC-SMC. By estimating the mode of the joint posterior distribution using kernel density smoothing, we obtained the parameter values shown in Table 3.

### Simulating late-stage immunotherapy trials

Late-stage (i.e., phase III) clinical trials traditionally contain two arms: a control arm and a treatment arm. The control arm can be a placebo (i.e., untreated) or a standard of care therapy. To construct phase III *in silico* immunotherapy trials, we extended the simulations with treatment cohorts (mono-chemotherapy, mono-immunotherapy, chemoimmunotherapy, or induction chemotherapy followed by immunotherapy). These cohorts facilitate the comparison between various treatment regimens. The treatment cohort uses the same baseline distribution of tumor growth parameters as the control cohort. Upon reaching the diagnosis threshold, up to two different treatments are applied in each arm; patients can be treated with chemotherapy, ICI, combination therapy, or left untreated (as described above). Unless otherwise specified, the baseline distribution of tumor growth parameters was derived from the most mature, digitized data from the CA184-024 trial, as shown below [35].

The primary endpoint of the trials is the 2-year OS. Given the absence of accrual times in *in silico* trials, the trial duration equals two years, providing each virtual patient with 24 months of follow-up at the time of analysis. If the OS endpoint is not reached for a patient (i.e., the patient’s tumor burden does not reach the lethal volume within the time frame of the simulated trial), the patient is considered censored for the endpoint and regarded as such in subsequent analyses.

### Power and interim analysis simulations

To illustrate how the analysis method can affect the outcome of immunotherapy trials, we use several simulation approaches to calculate the power of trials. Power simulations were performed as follows: a varying number of clinical trials were simulated per data point. The survival data from each trial was analyzed with a log-rank test (dependent on the proportional hazard assumption) or proportions test (Pearson’s chi-squared test; independent of the proportional hazard assumption), and we counted the number of positive trials (defined as *p <* 0.05). The percentage of positive trials indicates the power of the trial. A harmful trial is defined as a positive trial with an effect size that favours untreated patients.

### Data digitization & reconstruction

For some survival curves, the raw data was not available. Therefore, we extracted data points from the Kaplan-Meier curves with WebPlotDigitizer 4.6 (https://apps.automeris.io/wpd/), and individual patient data was reconstructed with the IPDfromKM package in R.

### Analyses

Analyses and visualizations were performed in R. The complete list of R packages used throughout this manuscript is provided in **Supplementary Table S1**. The R code used to perform all analyses is available at https://github.com/jtextor/insilico-trials.

## Data Availability

The code is available at GitHub: https://github.com/jeroencreemers/in-silico-clinical-trials

## Data availability

All simulated data used to generate the figures is available at this paper’s GitHub repository at https://github.com/jtextor/insilico-trials. The digitized survival data from the CA184-024 and CheckMate 066 trials is also available at the same repository.

## Code availability

C++ code that implements models M1, M2 and M3, and an R package that wraps the C++ code using Rcpp is available at this paper’s GitHub repository at https://github.com/jtextor/insilico-trials/models/TumorImmuneModels/. The R code used to perform all analyses and generate all plots is also available at the same repository. An interactive, web-based implementation of our models, written in Javascript and HTML, is available at https://computational-immunology.org/models/immunotherapy-trials/.

## Declarations

### Ethics approval and consent to participate

Not applicable.

### Consent for publication

Not applicable.

### Competing interests

Not applicable.

### Funding

JC was funded by the Radboudumc. CF received an ERC Adv Grant ARTimmune (834618) and an NWO Spinoza grant. IV received an NWO-Vici grant (918.14.655). JT and AA were supported by a Young Investigator Grant (10620) from the Dutch Cancer Society. JT and GS were also supported by NWO grant VI.Vidi.192.084.

### Author contributions

JHAC and JT conceived this study. JHAC, AA and JT performed the experiments. JHAC and JT wrote the manuscript. All authors provided feedback on the manuscript and reviewed the manuscript prior to submission.

## Supplementary Figures

## Acknowledgements

We thank Shabaz Sultan for his help with implementing the web-based frontend for performing simulations.

## Footnotes

We have added two additional mathematical models of cancer immunotherapy by other authors to the manuscript and we show that despite considerable differences between the models, the results are indeed broadly similar. We simplified our initial mathematical model (now model M1) and re-did all analyses. We account for a likely treatment effect when fitting our models to the lung cancer cohort. To make the methodology proposed in this paper more readily useable, we provide both a simple web-based implementation (usable by anyone with a web browser) and a fast R package to perform in silico immunotherapy trials using the three mathematical models we implemented.

## List of abbreviations

- ICI
- immune checkpoint inhibitor
- NCCTG
- North Central Cancer Treatment Group
- ODE
- ordinary differential equation
- OS
- overall survival
- PHA
- proportional hazard assumption

## References

- [1].↵
- [2].↵
- [3].
- [4].↵
- [5].↵
- [6].↵
- [7].↵
- [8].↵
- [9].↵
- [10].↵
- [11].↵
- [12].↵
- [13].↵
- [14].↵
- [15].↵
- [16].↵
- [17].
- [18].
- [19].↵
- [20].↵
- [21].
- [22].
- [23].↵
- [24].↵
- [25].
- [26].↵
- [27].↵
- [28].↵
- [29].↵
- [30].↵
- [31].
- [32].↵
- [33].↵
- [34].↵
- [35].↵
- [36].↵
- [37].↵
- [38].↵
- [39].↵
- [40].↵
- [41].↵
- [42].↵
- [43].
- [44].
- [45].
- [46].
- [47].↵
- [48].↵
- [49].↵
- [50].↵
- [51].↵
- [52].↵
- [53].↵
- [54].↵
- [55].↵
- [56].↵
- [57].↵
- [58].↵
- [59].↵
- [60].↵
- [61].↵
- [62].↵
- [63].↵
- [64].↵
- [65].↵
- [66].↵
- [67].↵
- [68].↵
- [69].
- [70].↵
- [71].↵
- [72].
- [73].
- [74].
- [75].
- [76].
- [77].
- [78].
- [79].
- [80].
- [81].
- [82].