Estimating the Financial Impact of Gene Therapy

We assess the potential financial impact of future gene therapies by identifying the 109 late-stage gene therapy clinical trials currently underway, estimating the prevalence and incidence of their corresponding diseases, developing novel mathematical models of the increase in quality-adjusted life years for each approved gene therapy, and simulating the launch prices and the expected spending of these therapies over a 15-year time horizon. The results of our simulation suggest that an expected total of 1.1 million patients will be treated by gene therapy from January 2020 to December 2034. The expected peak annual spending on these therapies is $25.3 billion, and the total spending from January 2020 to December 2034 is $306 billion. We decompose their annual estimated spending by treated age group as a proxy for U.S. insurance type, and consider the tradeoffs of various methods of payment for these therapies to ensure patient access to their expected benefits.


Introduction
Gene therapy is a new class of medical treatment that alters part of a patient's genome through the replacement, deletion, or insertion of genetic material to treat a disease. While still in its infancy, gene therapy has demonstrated immense potential to treat and even cure previously intractable diseases. The introduction of voretigene neparvovec (marketed as Luxturna ® ) for inherited retinal disease and onasemnogene abeparvovec-xioi (marketed as Zolgensma ® ) for spinal muscular atrophy (SMA) in the U.S. have improved the lives of patients [115,124]. Yet the price per treatment of $425,000 per eye for Luxturna, and $2.1 million per patient for Zolgensma, have raised concerns regarding affordability among budget-constrained payers and patients alike.
Stakeholders have expressed concerns that gene therapy will be too expensive for individual patients to afford on their own, especially if they continue to be priced at more than 30 times the median household income of $61,937 [104], (equivalently, at several multiples of the average U.S. home mortgage of $354,400 [126]). Insurance coverage for gene therapy also varies by policy type. Many insurance policies do not cover access to gene therapy, or they impose very restrictive policies to limit the number of patients who might be treated [78,156]. Widespread underinsurance in the U.S. [117]-requiring substantial out-of-pocket costs in the form of deductibles and coinsurance payments-may place gene therapy out of reach for American patients who might benefit from treatment. Many health plans, especially those facing fixed annual budgets, including state Medicaid policies, some employer-offered plans, and insurance offered on the federal and state-sponsored exchanges, have warned they may not be able or willing to absorb the additional spending should a greater number of people become eligible for expensive gene therapy once it reaches the market [159]. While some spending on novel gene therapies would likely be paid for by Medicare, the taxpayer-supported health insurance for Americans over the age of 65, other spending on these treatments might have an impact on the wages that private corporations pay to their employees [70].
In this paper, we estimate the potential fiscal impact of gene therapy on the U.S. market. To do so, we create a new model to estimate the future number of gene therapy approvals, the size of their potential patient populations, and the prices of these future treatments. We begin by surveying the clinical trial databases for late-stage gene therapy trials, defined here as phase 2/3 or phase 3, and compile the prevalence and incidence of the diseases targeted in these trials from a meta-analysis of published sources. We develop a novel method to estimate the price of each gene therapy under consideration by calculating the expected qualityadjusted life years gained for each therapy in the relevant patient population. Combining 1 these results and previously published probabilities of technical success by therapeutic area, we simulate whether a disease will have an approved gene therapy over the next fifteen years, the expected number of treated patients and the expected spending from January 2020 to December 2034. Throughout this paper, the expected number of treated patients refers to the pool of eligible patients that remains after accounting for market penetration and the treatment schedule, explained in more detail in Section 2. 4. Our results suggest that an estimated total of 1.09 million patients will be treated with gene therapy by the end of December 2034. The number of patients receiving gene therapies annually will peak at 94,696 patients in 2025 before declining to 65,612 at the end of our time period. The annual spending across all expected products and patients is expected to reach $25.3 billion in 2026. The cumulative spending on treating these patients in the 15-year period is estimated to be $306 billion, or $241 billion when discounted by 3% per annum to 2020 dollars. We decompose the expected spending by patient age group as a proxy for insurance type. For the expected spending on these therapies that is not covered by Medicare, we consider several alternate ways of financing.

Simulation Design
A critical element in our simulation analysis is the number of gene therapies that will receive regulatory approval over the next few years. Therefore, we begin with a brief review of the approval process in the U.S.
Since the passing of the Food, Drug, and Cosmetic Act in 1938, pharmaceuticals developed by companies have to be reviewed by the Food and Drug Administration (FDA) for safety and efficacy before they can be marketed in the U.S. The application for marketing approval differs slightly by the type of therapy: New Drug Applications (NDAs) are for small molecules, and Biologics License Applications (BLAs) are for biologics. Gene therapy is considered a biologic product, hence the BLA designation applies.
Clinical investigations in human subjects typically take place in three phases-phases 1, 2 and 3-before marketing approvals are sought. Phase 1 trials are designed to investigate the dosage and safety of the treatment, while phase 2 trials attempt to detect early signs of efficacy and possible side effects in a relatively small sample of patients. Phase 3 trials are intended to demonstrate a statistically significant treatment effect when compared to the best standard of care in a broader population of patients. Some clinical trials combine multiple phases into a single design, with the phase numbers separated by a slash. For example, a phase 2/3 trial combines elements of phase 2 and phase 3 investigations into a single trial design in order to reduce the overall development time and cost, and maximize the participation of subjects with orphan disease willing to participate in trials. The clinical development of therapeutics is a tedious and costly process that may span decades and cost billions of dollars, with the bulk of the cost and time spent conducting phase 3 clinical trials [63,84]. The process is also very risky, with only 13.8% of therapeutic development programs entering phase 1 reaching approval [166].
To estimate the financial impact of gene therapies on the U.S. healthcare system, we first identify all existing late-stage clinical trials of gene therapies, simulate their successes or failures from phase 2/3 or 3 to approval, then estimate the spending on the successful ones by summing the product of their expected prices and number of patients, as outlined in Figure 1. By using simulation analysis rather than purely deterministic methods, we are able to capture the inherent uncertainty in costs, revenues, and other parameters of this new therapeutic class.
We organize the simulation into the following five distinct modules, and describe each of these in some detail: (1) identifying the number of gene therapies currently in clinical trials; (2) estimating the probabilities of success of these trials; (3) estimating the time to approval; (4) simulating the expected number of patients treated by these therapies if approved; and (5) estimating the expected market prices of the approved therapies. We describe the first four modules in Sections 2.1-2.4. Given the importance and potential controversies surrounding the pricing of gene therapies, we devote a standalone section to this issue in Section 3.

Clinical Trial Data
We use clinical trial metadata from the Citeline TrialTrove database and the U.S. National Library of Medicine's ClinicalTrials.gov database to determine the number of gene therapies currently under development and their potential number of patients.
We download data from the Citeline database, isolating trials tagged with 'gene therapy' under the 'therapeutic class' field. We supplement this information by searching for trials on the clinicaltrials.gov main page using the key words 'gene therapy', then reading the trial description to determine if the trial is in fact related to a gene therapy. All database queries were made before May 31, 2019. Clinical trials from both sources are merged before filtering for clinical trials that are in either phase 2/3 or phase 3 of the development process and are not known to be compassionate uses of the treatment. 1 The outcomes of compassionate use are rarely used as data points in the clinical development process. Even though adverse events from compassionate use are reported to the FDA and may, in rare cases, be used to Figure 1: A flowchart showing the performance of the simulation. After extracting the information on each disease from the clinical trial databases, we simulate whether the disease will obtain an approval. If it fails to do so, the simulation will end for this disease in this iteration. Otherwise, we will estimate the expected number of patients to be treated, compute the corresponding cost of treatment, and store the results. At each step of the computation, we source data from literature and impute missing information.
characterize the risk and benefits of a therapy, the FDA is cognizant that these uses often occur outside of clinical trial settings, and has almost never given an unfavorable decision to a product labeling because of an adverse outcome of compassionate use [88,116]. We include clinical trials without U.S. trial site in our dataset because it is currently possible for the FDA, as empowered by Federal administrative law 21 CFR Part 312.120, to grant marketing approval using evidence from foreign clinical trials [68].
Our filtering criteria are intended to remove trial entries unrelated to the clinical development process, and to isolate gene therapies that are most likely to seek regulatory approval in the U.S. in the near future.
We remove repeated entries, and identify the diseases and therapeutic areas targeted by each gene therapy. Each clinical trial entry in our dataset contains a brief title of the trial, its clinical phase, the disease being targeted, the start and end dates of the clinical trial, the therapy name, and the companies involved in the clinical trial.
This process yields 109 trials investigating 57 distinct diseases, listed in Table A1 in Supplementary Materials. We classify the diseases into three categories: cancer (oncology), 2 rare disease, and general disease. The distribution of disease and the clinical trials by category and therapeutic area are shown in Table 1 and Table 2. The majority of trials and diseases are in the area of oncology, followed by rare diseases. These therapeutic areas are notoriously risky. Only 3.1% and 6.2% of the drug development programs in oncology and rare diseases go from Phase 1 to approval, respectively, compared to the baseline of 13.8% across all drugs and indications [166].

Probability of Success Estimates
We define a gene therapy development program as the set of clinical trials by a sponsor testing a therapeutic for a disease. We consider whether a gene therapy will be developed for a disease by simulating correlated coin flips' for each gene therapy program, and observing if there is at least one approval. Our computational method assumes that clinical trials are always perfectly correlated within the same development program. This is logical, since the FDA requires findings from at least two pivotal trials in the BLA review process [171] 3 . Our assumption can also be justified for clinical trials run by the same sponsor that target the same disease, but for different patient segments. We reason that the sponsors are risking multiple expensive latestage trials for the same disease, thus have confidence that the treatment will work on all patient sub-populations, and therefore any marketing licensing approval (or denial) will be similar for all the patient segments. It can be argued that different gene therapy treatments for a disease are highly correlated, since they operate on similar platforms (e.g. CAR-T or in-vivo gene delivery using adenoassociated virus vectors), even though different gene sequences may be targeted. To reflect this, we consider a correlation of 90% between development programs in our simulations. Our sensitivity analysis, however, demonstrates that these computations are insensitive to this parameter (see Section 4.3).
The phase 3 to approval probability of success (PoS 3A ) for each disease is informed by prior studies on the probabilities of success of drug development programs by therapeutic area from the MIT Laboratory of Financial Engineering's Project ALPHA website [97]. These estimates for the probabilities of success are derived from over 55,000 drug development programs between January 2000 and January 2020, and computed using the path-by-path methods as introduced in Wong et al. [166]. The PoS 3A values used in our simulations are given in Table 3 and the mapping of diseases to therapeutic areas is shown in Table A2.

Time to Approval
We also require an estimate of the time to approval for gene therapy treatments in order to assess the patient impact and cost over time. Typically, companies submit their Biologics License Application (BLA) to the U.S. Food and Drug Administration (FDA) some time after the end of the clinical trial period. We assume that the time between the end of the last clinical trial for the disease and the submission of the BLA is a variable drawn from 6 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted October 31, 2020. ; a triangular distribution between 0 and 365 days, with a median of 182.5 days. This is informed by the practical knowledge that it takes an average of 6 months to prepare the documents for the BLA submission [166].
In addition, there will be a lag time between the submission of the BLA and the decision of the FDA. The FDA has 60 days to decide if it will follow up on a BLA filing [91], and it can take another 10 months to deliver its verdict [90]. This implies the maximum possible time between BLA submission and FDA approval will be 12 months. We thus assume that the time between the BLA submission and the FDA decision is drawn from a triangular distribution between 0 and 365 days, with a median of 182.5 days. Our assumptions are also valid for therapies that use the priority review pathways.
We also assume that the BLA will be filed only after the last clinical trial for a disease has ended. Trials with missing declared end dates will have their end dates imputed by adding random durations to the trial start date, drawn from a gamma distribution fitted to clinical trials with complete date information in our data (see Figure 2). Diseases with a prior approved therapy will automatically be considered to be approved as of January 1st 2020. For some diseases, the last clinical trial will have ended more than three years ago (i.e., before January 2017). For diseases that match this criterion, we treat them as though they have failed.

Number of Patients
The second module simulates the number of new and existing patients that will be treated over time, conditioned on the disease receiving an approved gene therapy.
We consider only the superset of the patient segments listed in the clinical trials for each disease. For example, if there are two clinical trials, one targeting 'patients above the age of 40' and the other targeting 'patients above the age of 18', we only consider the latter when estimating the patient population for the disease. If insufficient information about the sub-population is given, we assume that all the patients with that disease are eligible. The proportion of patients who are eligible for treatment and are willing to do so will be taken into account later, as we will explain in Section 2.4.

Incidence and Prevalence
For the number of currently affected patients and the number of new patients per year for each indication, we source medical journals and online data repositories, such as the Surveillance, Epidemiology, and End Results (SEER) website and cancer.net. If we are able to find an estimated patient population, we cite it directly. Otherwise, we multiply the prevalence and incidence rates by the population of the U.S., which we take to be 327.7 million [74]. When necessary, we also make the assumption that the female to male ratio is 1:1.
In cases in which we are able to find estimates for the disease incidence but not the prevalence, we combine the incidence of the disease (i.e., i new patients a year) and the disease survival rate (i.e., p% of the people with a disease will be alive after k years) to obtain the steady-state estimate of the prevalence (j) using Equation Equation 1. The incidence can also be estimate from the prevalence by rearranging Equation Equation 1 to  yield Equation 2.
The equations can be derived by assuming that the number of patients will be constant through the years at a level j. Since ki new patients are added over k years and j(1 − p) patients that will die over the same period, ki = j(1 − p) for the number of patients to be constant over time. Rearranging this equation will yield Equation 1 and Equation 2. The number of patients for each disease are presented in Table A3 in the appendix. We adjust 8 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted October 31, 2020. ; these estimates to avoid double-counting in cases of overlapping patient populations, e.g., the number of patients for 'Spinal Muscular Atrophy' is the difference between 'Spinal Muscular Atrophy' and 'Spinal Muscular Atrophy I' (a sub-category of the former).

Treatment of Patients over Time
In our simulation, we assume that newly diagnosed patients are treated immediately upon diagnosis. We further assume that the proportion of existing patients who seek treatment do so in such a way that the existing stock of patient declines exponentially, with a half-life of λ. Mathematically, the proportion of existing patients that seek treatment between time t and t + δ after approval is given by E(t, δ, λ), where: In the face of limited information, we assume that 25% of the existing stock of patients will seek treatment in the first year of our simulation. This requires that the half-life be set to 28.91 months, which in turn implies that 95% of all patients who are diagnosed prior to the approval of the gene therapy want treatments within 10.5 years. Admittedly, this is just an assumption, and we perform a sensitivity analysis to determine its impact on our results in Section 4.3. Not everyone who seeks treatment will be given one; the effective number of patients treated is determined by the patient penetration rate, as we shall describe next.

Patient Penetration
It is unlikely that all the patients under consideration will receive gene therapy treatments. This may be due to ineligibility, lack of awareness of the treatment, or simply lack of interest in gene therapy, among many other reasons. We term the percentage of the patients that receive gene therapy treatments the 'patient penetration rate', and model it using a ramp function, ρ(t, Θ max , T max ). The ramp function is frequently used by the industry to model the rate of adoption of a product or technology [142], and is given by: An illustration of the ramp function is given in Figure 3. Θ max and T max are assumed to follow Gaussian distributions N (µ θ , σ 2 θ ) and N (µ T , σ 2 T ), respectively. The parameter settings are listed in Tables 4 and 5. When setting µ θ and µ T , we need to take into the account the nature of the diseases. At one extreme, we have rare Time (t) Proportion of patients treated diseases, which are often life-threatening, and affect a relatively small number of people. Faced with these prospects of survival, more patients are willing to enroll in new treatments quickly after they are approved. In addition, since the number of patients is relatively small, insurers are more willing to cover these therapies and manufacturers are more able to cope with a larger proportion of patients. Given this, we assign a high value of 40% for µ θ , and a low value of 6 months for µ T .
On the other hand, general diseases are seldom deadly, and affect a large number of patients, possibly even in the millions. Since an acceptable standard of care is available for these conditions, patients may be less inclined to use new treatments due to fear of the unfamiliar. In addition, it is often financially difficult for insurance companies to cover so many patients. We thus assume that the maximum penetration rate will be 1%, and the ramp-up period, 5 years.
As an intermediate case, cancers have characteristics that fall between these two extremes, but in general, they are more similar to rare diseases. We therefore assign values of 10% and 12 months to the maximum penetration rate and ramp-up period, respectively. All variances are set to 10% of the means to model moderate uncertainty in our numbers. They do not affect our mean estimates of the number of impacted patients or spending on gene therapy.
The net number of patients to be treated for the disease at time t after the approval of a gene therapy is given by: We do not consider the effect of market competition among different therapies for the same disease and patient groups on the number of treated patients. In our model, there  Table 4: Parameter settings for Θ max ∼ N (µ θ , σ 2 θ ).
Classification µ T σ 2 T General 60 6 Rare Diseases 6 0.6 Cancer 12 0.12 Table 5: Parameter settings for T max ∼ N (µ T , σ 2 T ). We consider the severity of the disease and the number of patients when making the assumptions.
is only one approval per disease, and a fraction of the eligible patients will receive that treatment.

Pricing
The cost to the healthcare system of providing the gene therapy for a disease for all patients being treated at time t after approval is given by C(t), where The price of each treatment is crucial to computing the expected total spending, and a source of considerable controversy because of the high price of gene therapies relative to many conventional therapeutics. The Institute for Clinical and Economic Review (ICER)an independent nonprofit organization that aims to evaluate the clinical and economic value of healthcare innovation-has advocated pricing drugs and gene therapies by the relative risk and benefit to the patient. This is typically done by comparing the quality-adjusted life years (QALY) with and without the treatment, then multiplying the change in QALY (∆QALY) by a constant, typically set between $50,000 and $150,000 per ∆QALY [135].

Price of gene therapy
ICER has published reports containing its estimates of QALY gained by patients with vision loss associated with biallelic RPE65-mediated retinal disease following treatment with 11 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted October 31, 2020. ; Luxturna ® [124], and with SMA Type I following treatment with Zolgensma ® [82]. These reports compute ∆QALY using the results of clinical trials to make informed estimates about the potential improvements in the quality of life and life expectancy of the patients.
While ICER's methods are considered by some stakeholders to be the gold standard for this type of calculation, replicating its methods for all the clinical trials under consideration is not feasible in this paper, given the fact that all the clinical trials in our analysis are still pending. As an alternative, we develop a mathematical model to estimate the expected increase in QALY for each disease in our sample.

Estimating ∆QALY
We consider a representative patient who is expected to live to the age of x with a probability of l(x). The function l(x) is also known as the survival curve of the population. The patient enjoys a quality of life, f (s, x), that is dependent on his age, x, and his state of health, s. The expected QALY of a typical person in the baseline state of s 0 (the 'healthy' state) can be computed by integrating l(x)f (s 0 , x) over x.
Suppose that the patient is afflicted with a disease at time a, which changes his survival curve after time a from l(x) to l(x). Likewise, his quality of life after diagnosis changes from f (s 0 , x) to f (s d , x). This patient will then have an expected QALY of: The change in the expected QALY due to the disease can then be expressed as: It is customary in the literature to incorporate time preferences into the model. This is done by multiplying the integrand by the discount factor, r(x − a). There is a normalization 12 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted October 31, 2020. ; term l(a) to reflect conditional survival to age x.
If the distribution of age when the patient population contracts the disease is given by A(a), then the expected decrease in QALY over the patient population is given by: Equation 15 is a general formula that accounts for the expected value of the changes in QALY between two states of health using only three variables: the time of disease onset, and the utility of the two health states. By making the relevant substitutions, we can also apply this formula to compute the expected changes in QALY given a gene therapy (gt) and an alternative treatment (alt).
While death and patient statistics can be collected to determine l(x) and A(a) empirically, determining f (s, x) and l(x) is challenging. Therefore, we use simple functions to modify these variables. In particular, we assume that being afflicted with a severe disease will modify the survival curve by a multiplicative factor, D(t). That is, the survival curve of a patient after he is diagnosed at age a is given by: This functional form assumes that the disease is age-agnostic, and affects the survival curve only through the time elapsed since the patient has been diagnosed. For example, if the disease does not affect mortality (e.g. blindness), then D(x − a) = 1 for all x − a > 0. On the other hand, if the condition causes death immediately, then D(x − a) = 0 for all x − a > 0.
For the utility function, f (s, x), we assume that it can be decomposed into two multiplicative factors, one dependent only on age, f a (x), and the other dependent only on the state of health, f h (s): 13 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted October 31, 2020. ; Assuming that Equations 17 and 18 hold, Equation 16 can be simplified to: where K(x, a) is the change in the quality-adjusted life years:

Calibration of ∆QALY
For each of these variables, we attempt to obtain empirical values from the literature as much as possible. When necessary, we interpolate values, briefly explaining our assumptions and the data collection methods for the inputs to the model. For the age-dependent QoL, f a (x), we extract the general population utility values from Institute for Clinical and Economic Review [115] and fit a linear model across the data. The QoL values and the fitted model are shown in . The values extracted from ICER's SMA final report [115] are replicated in the table on the left and are presented as crosses in the figure on the right. A linear line, with its intercept set to 1, is fitted with the data points.
Since we are unable to know the patient outcomes for these potential gene therapies ahead of their approval, we assume that the gene therapy treatments will restore a person's survivability to that of a normal individual. This implies that D gt (x − a) = 1. To estimate the impact of a disease on patient survivability, we model its survival curve, D alt (x − a), using the exponential survival curve shown in Equation 21. In the equation, λ is the force of mortality, and µ is the normalization factor. We estimate λ and µ by matching the function to T -year survival rates, which are the proportions of the patients (k) who will be alive after 14 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted October 31, 2020. ; T years, from data. The parameter values and their sources are listed in Table A4 in the Supplementary Material.
The health-related quality of life variables, f h (s gt ) & f h (s alt ), are treated separately, depending on the disease classification. For cancerous indications, we assume that the quality of life of the patients is not affected by the disease. For non-cancerous indications, we source the medical literature for the available quality of life (QoL) estimates. We use the QoL for the typical disease condition to approximate the 'before treatment' QoL, f h (s alt ), and use the best possible outcome for each condition as the 'post-treatment' QoL, f h (s gt ). We interpolate the missing values using linear regressions of the sourced QoLs against disease severity. To do this, we first give scores, ζ, ranging from one to five for each disease, based on our perception of disease severity. We then fit a line of f h (s alt ) against ζ in order to estimate the missing 'before treatment' QoL values, f h (s alt ) (see Figure 5). We define ∆QoL = f h (s gt ) − f h (s alt ). Separately, we regress ∆QoL against ζ to interpolate the change in QoL (see Figure 6). Given ∆QoL and f h (s alt ), we can then estimate the missing values of f h (s gt ). Our estimated values are reported in Table A5 in the Supplementary Material. We are able to extract the distribution of the age of cancer onset from the National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) Program website. However, empirical age distributions are practically nonexistent for non-cancerous diseases. To overcome this lack, we search the literature for the average age of diagnosis of each disease, and fit a triangle distribution for each disease using the optimization program shown 15 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 31, 2020. ;  Figure 7.
subject to Figure 7: An optimization program to obtain the triangle distribution given the average age of diagnosis, µ age . x min and x max are coordinates of the base of the triangle. c and z are the mode and height of triangle.

This program maximizes the domain's interval (Equation 22
) while imposing the requirement that the distribution's mode, c, has to be in the domain (Equation 23). In addition, the area under the curve has to be equal to 1 (Equation 24), and the mean of the distribution has to be equal to the average age (Equation 25).
We experimented with an impulse function and an uniform distribution to model the age distribution, but these functions created unrealistic scenarios. Modeling the age distribution with the impulse function, while simple, will force Equation 15 to collapse into a single point, and lose any nuance in the QALY gained by patients of different ages. On the other hand, estimating a uniform distribution from the average age creates distributions with narrow 16 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 31, 2020. ; support. The distributions from our optimization program have a wider base of support and avoid sharp changes in density. We illustrate this with some examples in Figure A1 in the Supplementary Materials.
We assume a 3% per annum discount rate, as suggested by ICER for high-impact single or short-term therapy (SST) [92].

Price per ∆QALY
To estimate as realistic a market price of gene therapy as possible, we calibrate our assumed price per ∆QALY with the 4 data points currently available: Zolgensma, priced at $2.1 million per patient [132], Luxturna, priced at $0.425 million per eye treated [157], Kymriah, priced at $0.475 million for a one-time dose [67], and Yescarta, priced at $0.373 million for a one-time dose [67]. Separately, Zynteglo, sold at a cost of 1.6 million Euros (approximately $1.8 million), has been approved in the European Union. The data points are listed in Table 6. We calibrate the price per ∆QALY by minimizing the mean-squared error (MSE) between the estimated price given the expected change in QALY and the actual price. We report the mean absolute percentage error (MAPE) between the estimated price and the actual price in addition to the MSE. We note that Zolgensma, Zynteglo, and Luxturna are gene replacement therapies for rare diseases, while Kymriah and Yescarta are chimeric antigen receptor T-cell (CAR-T) therapies indicated for cancers. As such, we perform two separate calibrations, one for rare diseases and the other for cancerous indications. We assume that the price per ∆QALY for general diseases is identical to that for cancerous indications.
Considering only the therapies approved in the United States, we estimate a price per E(∆QALY) of $101,663 (MSE: 2.18×10 9 , MAPE: 11.2%) for rare diseases and $40,797 (MSE: 1.77×10 1 0, MAPE: 44.2%) for other diseases. Using all the data points, the price per E(∆QALY) for rare diseases increases to $114,781 (MSE: 1.70×10 12 , MAPE: 108%). In this paper, we use the former for our calculations since it has a smaller mean-squared error 17 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 31, 2020. ; and better reflects prices in the U.S., our focus. This value will give us estimates of $2.09M per patient for Zolgensma and $0.470M per eye for Luxturna.
Our calibrated price per E(∆QALY) for cancerous indications is just slightly below ICER's $50,000 to $100,000 range for 'intermediate care value'. The higher price per E(∆QALY) for rare diseases also reaffirms the general belief that developers of treatments for rare diseases should be compensated more for their elevated R&D risk and the low financial prospects of serving a small population of patients. It is assumed that the clinical cost of delivering the gene therapy is a negligible fraction of the overall cost of development (though they are considerably higher than the delivery cost of conventional therapeutics). It is also highly likely that the outside option cost will be similar.
The expected increases in QALY computed by our model are close to those provided by the ICER reports for the treatments [115,124]. For example, we estimate that treatments for Spinal Muscular Atrophy Type 1 and Leber Congenital Amaurosis due to RPE65 Mutations provide 20.56 and 4.63 incremental QALYs, whereas ICER estimates Zolgensma and Luxturna to provide 12.23 to 26.58 and 1.3 to 2.7 incremental QALYs 4 , respectively. We have deliberately applied the same methods and assumptions used for the all other diseases to estimate the expected changes in QALY for Spinal Muscular Atrophy Type 1 and Leber Congenital Amaurosis due to RPE65 Mutations even though we could have obtained these numbers directly from ICER reports. By doing so, our price per ∆QALY calibration will correct for potential biases in our data, and our price estimates will be more realistic.
Our estimated change in QALY, the price per unit change in QALY, and the estimated price of therapy for each disease are shown in Table 7. Continued on next page 4 ICER provides a range of ∆QALY estimates corresponding to different age groups. We have considered the distribution of ages to produce a weighted average estimate.

18
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 31, 2020. ; is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 31, 2020. ;

Results
All the equations are discretized from their continuous form in our computations. When solving integrals using the trapezoidal rule to obtain the ∆QALY, we use strip widths of 1 year for an age range from 0 to 110 years old, the resolution offered by the life tables. When simulating the number of patients and the cost over time, we use time intervals of 1 month. Our code is implemented on Python 3.6 backed by Numpy, and executed on a single 2.2GHz CPU core. Pseudo-code and further details of computation can be found in Section A8 in the Supplementary Materials. We perform 1,000,000 iterations of the simulation to compute the mean number of patients and the total spending. With this number of iterations, one can expect the computed mean to be within 1.89% of the true mean 95% of the time (see Section A7 in the Supplementary Materials). We also report the 5th and 95th percentiles of the computed values as our upper and lower bounds respectively.
In the following section, we define a 'minor' to be a patient below the age of 18 and an 'elderly' patient to be one who is older than 62 years old. The remainder of the patients are labeled as 'adults'.

20
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Expected Number of Approvals and Patients
Our simulations indicate that the expected number of gene therapies approved between January 2020 and January 2034 is 18.3, with a 90% confidence interval of [14.0, 23.0] (see Figure 8). Table 8 shows the annual number of patients over time by age groups. Our simulations expect the number of patients treated to grow from 16,244 in 2020 to 94,696 in 2025 before declining to 65,612 in 2034. The decline can be attributed to the declining stock of existing patients as they are treated, and the fact that we do not consider new development programs launched in the future. The proportions of patients who are minors, adults and elderly are 17.9%, 35.4%, and 46.7% respectively.
We show the number of patients treated by month in Figure 9a. We can see that our simulations expect the number of patients treated to peak at around 7911 (CI: [3978, 12477]) per month in Jul 2025 before declining to 5424 (CI: [2778,8350]) by December 2034. The monthly number of existing patients treated exceeds the monthly number of newly-diagnosed patients treated until Sep 2024, when this trend is expected to reverse (see Figure 9b). Only 7% of all patients treated in December 2034 are preexisting patients. Cancer patients are expected to form the biggest group of patients receiving gene therapy treatments, simply due to the number of cancer indications being targeted. We expect the relative proportions of cancer, general disease, and rare disease patients to be 48.0%, 30.0%, and 22.0%, respectively, in December 2034. The cumulative number of patients to be treated is expected to be 1.09 million (CI: [0.595M, 1.66M]) by the end of December 2034 (see Figure 9c).

21
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 31, 2020. ;

Expected Spending
We expect an increase in spending up to $2.11 billion per month (CI: [1.01B, 3.88B]) in Apr 2026, before decreasing slowly to a steady-state rate of $1.62 billion (CI: [0.624B, 2.9B]) per month (see Figure 10a). We emphasize that the total spending eventually declines because our simulations analyze a fixed stock of innovations, and do not account for new development programs that may be launched in the future. Treating existing cancer patients initially consumes over 45.6% of the total monthly expenditure, but declines to only 0.99% by December 2034. In contrast, the proportion of spending on new patients in the 'general disease' and 'rare disease' groups will increase from 0.0% and 4.26%, respectively, in Feb 2020 to 21.2% and 46.2% by December 2034. The monthly spending on treating existing patients shall exceed the monthly spending on treating newly diagnosed patients in Nov 2023. The cumulative discounted spending on treating patients with approved gene therapy products is expected to reach $241 billion (CI: [123B, 402B]) by December 2034, 15 years after the start of our simulation.
In terms of annual spending on approved gene therapies, we expect that $5.15 billion will be spent in 2020, increasing to $25.3B in 2026 before declining to $21.0B in 2034 (see Table 9). Minors, adults and the elderly will consume 43.2%, 26.0%, and 30.9%, respectively, of the total spending. In the U.S., all elderly people are covered by Medicare. It is also estimated that two in five children and one in seven adults in the U.S. are covered by Medicaid [99]. The remainder of the spending is expected to come from private sources such as employerprovided or private insurance. Using these proportions, we estimate that the expected annual spending by Medicare, Medicaid 5 and private sources respectively may reach $8.1, $5.44, and $12.2 billion (see Table 10). We discuss the implications of these estimates in Section 4.4.
The total expected increase in QALY over these 15 years is 5.59 million (see Figure 11), which translates to an average increase of 5.12 in QALY per patient. This comes at an average 2020 present value cost of $43,110 per unit change in QALY. 5 The spending estimates for Medicaid do not take into account the 23.1% drug rebate that it is expected to receive [98].

22
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 31, 2020. ; 23 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 31, 2020. ; Table 9: Expected annual spending on gene therapy between 2020 and 2035, conditioned on the age group and patient type. 'Minor', 'adult' and 'elderly' are defined to be 'below the age of 18', 'between the ages of 18 and 62', and 'greater than 62 years old' respectively. Numbers in billions.

24
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 31, 2020. ; is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) Figure 9: Number of patients treated between January 2020 and December 2034, obtained from 1,000,000 simulation runs. 26 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.  J a n 2 0 2 0 J a n 2 0 2 1 J a n 2 0 2 2 J a n 2 0 2 3 J a n 2 0 2 4 J a n 2 0 2 5 J a n 2 0 2 6 J a n 2 0 2 7 J a n 2 0 2 8 J a n 2 0 2 9 J a n 2 0 3 0 J a n 2 0 3 1 J a n 2 0 3 2 J a n 2 0 3 3 J a n 2 0 3 4 Cumulative QALY gained (c) Cumulative QALY gained by treating patients with gene therapy. The line represents the mean and the shaded region represents the 5th and 95th percentiles of our simulation. Figure 11: Expected ∆QALY made possible by gene therapy treatments between January 2020 and December 2034, obtained from 1,000,000 simulation runs. 28 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Sensitivity Analysis
To test the sensitivity of our results to initial conditions, we simulate ±20% changes in the following variables, analyzing their impact on our results.
1. The maximum penetration rate in the ramp function, Θ max 2. The time to maximum penetration rate in the ramp function, T max 3. The amount of QALY gained in each disease 4. The price per ∆QALY 5. The phase-3-to-approval probability of success (PoS 3A ) 6. The number of new patients of each disease 7. The number of existing patients of each disease 8. The time from phase 3 to BLA 9. The time from BLA to approval For each of these factors, we consider its impact on the the peak monthly spending and the cumulative spending from January 2020 to December 2034 of patient treatment. We also look at how the variables change the timing of the peak monthly spending.
As can be seen from Figure 12, the percentage change in the discounted cumulative spending and the maximum monthly spending on treating all patients with gene therapy scale linearly with the percentage change in several variables: the maximum penetration rate (Θ max ) and the QALY gained (∆QALY), the price per ∆QALY. Increasing or decreasing the transition probability from phase 3 to approval, or the number of new or existing patients only leads to sublinear increases or decreases in the discounted cumulative spending and the maximum monthly spending. However, changing the time variables, such as the number of days from phase 3 to BLA, from BLA to approval, or the ramp-up period (T max ), induce a small change in the opposite direction.
Introducing perturbations of 20% in the probability of success, the number of new patients, the number of days from Phase 3 to BLA or from BLA to approval, or the time to maximum penetration rate in the ramp function (T max ) will change the date of the peak monthly spending in the same direction as the perturbation, by up to 10 months. Increasing or decreasing the number of existing patients, on the other hand, will cause a shift of up to 4 months in the date of peak spending in the opposite direction. Perturbing the maximum penetration rate (Θ max ), the QALY gained (∆QALY), and the price per ∆QALY will not change the date of peak spending.
We also study the effect of changing the correlation between development programs. Changing the correlation from our assumed value of 0.9 to 0 (i.e., perfectly uncorrelated)

29
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 31, 2020.   increases the mean discounted cumulative spending by 3.4%, from $241 billion to $245 billion. Increasing the correlation to 1.0 instead will decrease the mean discounted cumulative spending by 0.4% to $236 billion.
We vary the proportion of existing patients seeking treatment in the first year -which determines the λ parameter in Equation 3 -and observe that mean discounted cumulative spending changes by between -32% and +0.08% (see Figure 13). We can expect the results to differ by less than 5% from the baseline if the proportion of existing patients seeking treatments in the first year is between 8% and 45%.
Our study is independent of the results by Quinn et al. [142], who estimated that 341,775 30 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. patients will have been treated with gene therapy by December 2030, and increases by approximately 50,000 per year in the steady state. The authors of this report did not attempt to quantify the cost of providing gene therapy to these patients. In contrast, our simulation expects that about 820,425 patients will be treated by the end of December 2030, with a steady-state increase of around 61,170 per year in the long run.
Some of the differences between our estimates and this other report are due to differences in sample inclusion criteria and the use of different data for patient prevalence and incidence of disease. For example, Quinn et al. [142] considers "durable" gene therapies under all phases of clinical investigation whereas we consider any therapy with late-stage clinical trial(s). Furthermore, they assume that the 'potentially treatable pool in oncology is entirely incident-there is no prevalence'. Another difference arises from our decision to start with the broadest range of patients and then deflate these numbers through the penetration rate, rather than attempting to estimate the prevalence and incidence for each patient segment. If we removed existing oncology patients from our simulation, the cumulative number of patients treated by December 2030 becomes 666,895, approximately 1.95 times the estimate in Quinn et al. [142]. We can obtain similar patient estimates to Quinn et al. [142] simply by reducing our penetration rates by 48.8%, which will lower our estimated cumulative spending on gene therapy between January 2020 and December 2034 to $149 billion.

Discussion
We estimate that 1.09 million patients are to be treated with gene therapy by the end of December 2034, spending up to $25.3B annually. These estimates are likely to be lower bounds since our simulation employs conservative assumptions about the speed and volume 31 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 31, 2020. ; of gene therapy development. Specifically, we consider only late-stage gene therapy development programs, defined as those already in phase 2/3 or phase 3, and do not account for the possibility that a program in phase 1 or phase 2 may be fast-tracked or granted accelerated approval. We also do not account for any new gene therapies programs being added and approved between 2020 and 2034.
A potential criticism of our approach is that estimating the cost of gene therapies solely based on the change in QALY will overestimate the aggregate spending in the U.S. For example, we do not take into account the potential cost savings to gene therapies due to avoiding multiple costly therapeutic sessions over time based on the current standard of care, or to the recovery of the opportunity cost of caregivers. We have omitted the clinical costs of delivering gene therapy in our analysis, which are often higher than conventional therapeutics due to the need for inpatient hospital care. While there are cases where gene therapy is predicted to provide net cost-savings in treatment after accounting for the direct medical cost (e.g., valoctocogene roxaparvovec for the management of hemophilia A [80]), there is not yet evidence showing that gene therapy will result in net long-term cost savings. In addition, research has shown that new medical technologies generally raise health costs, and that cost-increasing changes in treatments outweigh cost-saving changes the majority of the time [102]. We also do not consider any markup that happens under the prevalent 'buy-and-bill' process in the U.S. These considerations suggest that our approach is indeed conservative, and that our estimates are likely to be lower bounds for realized costs over the next 15 years. Nonetheless, we have taken care to calibrate our price per ∆QALY using actual prices for approved therapeutics and estimating QALYs for those diseases, thereby allowing us to produce price estimates that closely track past data.
Another potential criticism is that we fail to consider the possibility that having multiple gene therapies for the same disease may lower the prices of the therapies. However, there is no analogous evidence that the presence of multiple brand-name drugs in the same class lowers the list prices of the drugs [149].
Based on our assumptions, the annual spending on gene therapy will average $20.4 billion and may reach $25.3 billion in 2026. This is well above the $819 million threshold recommended by ICER for its annual potential budget impact [93]. The cumulative spending on all future gene therapies from from January 2020 to January 2034 will be approximately $306 billion, or $241 billion when discounted at a cost of capital of 3% per annum over the next 15 years. We estimate the cost of gene therapy to average $43,110 per unit QALY, several times the average annual expenditure of $16,346 for American cancer patients between 2010 and 2014 [138].
However, when viewed from the broader perspective of aggregate U.S. spending, these 32 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 31, 2020. ; figures seem less daunting. In 2018, the U.S. tax revenue was $3.33 trillion, of which individual income tax and payroll tax revenues were $1.68 and $1.17 trillion, respectively [136]. If the average spending of $20.4 billion were to be fully funded through income and payroll taxes, an increase of 0.612% would cover the expense. From a purely budgetary perspective, universal access to gene therapy should be feasible if taxpayers are willing to pay for it. This is probably the simplest and most direct way to provide access to gene therapy to all Americans, though it may not be politically feasible.
Since all elderly patients are covered by Medicare, we estimate that the program would need to increase its annual budget by up to $7.89 billion, or 1.1% of its 2018 spending of $750.2 billion [19]. Funding this increase would require either an increase in payroll taxes or a reduction in other expenditures.
We estimate that annual gene therapy spending by Medicaid may reach $5.44 billion. This is approximately 0.9% of its 2018 spending of $597.4 billion [19]. Since Medicaid must be provided to all eligible Americans without any preset cap, managing this increase will require either raising funds from state and federal governments to pay for these additional costs, or cutting benefits.
Annual spending by minors and adults who are ineligible for Medicare or Medicaid-and therefore must rely on private insurers-is predicted to reach $12.2 billion. This spending poses a significant challenge for insurers and companies, who face annual budgets and competing priorities. In order to manage spending, many insurance policies might choose not to cover spending on gene therapy, or impose very restrictive policies to limit the number of potential patients who might be treated [156]. Many private insurers are already warning they may not be able or willing to absorb the additional spending should a greater number of people become eligible for expensive gene therapy treatments once new ones reach the market [159].
Many novel methods to finance gene therapy treatments through the existing healthcare infrastructure have been proposed, such as outcome-based payments, whereby the manufacturers would be paid only if the patients achieve predefined outcomes after treatment [72]. We note that both Zolgensma and Luxturna have offered outcome-based payment methods to payers. There have also been proposals to allow mortgage-like payments, and even performance-based annuity payments, as ways to finance gene therapy treatments [133]. In September 2019, Cigna, one of the largest U.S. health insurance companies, announced a program called Embarc Benefit Protection in which employers, health plans, and unions pay a monthly per-member premium that provides members with access to the two FDAapproved gene therapies, Luxturna and Zolgensma, at no out-of-pocket costs to them if their physicians authorize treatment. At the time of writing, Luxturna is not provided under 33 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 31, 2020. ; Embarc Benefit Protection, and it is unclear if the program is in effect. Cigna hopes to keep the monthly cost of the program to below $1 per member [164], but if our simulations are accurate, this will be financially infeasible.
A more ambitious proposal involves creating a national and possibly international gene therapy reinsurance company that performs a similar function to Embarc, but which serves a large number of primary health insurance providers. By allowing multiple primary insurers to cede the specific risk of gene therapy patients to the reinsurer, these risks can be diversified over a much larger pool members, lowering the cost of capital. The capital required for such a reinsurer can be raised through securitization techniques as described in [133], who simulated such a structure, and concluded that the returns to investors would be quite attractive under a broad range of assumptions. However, their simulations were not specifically calibrated for gene therapy, hence our framework may provide a useful complement to their analysis.
Also, it may be more cost-effective for the reinsurer to assume the responsibility of delivering the gene therapies it reinsures through nationally distributed Centers of Excellence (CoEs). This may seem too far afield for a reinsurance company, but the ability to have direct control over the quality of delivery, and to be able to collect data on the performance of these therapies over time, are two compelling reasons for the reinsurer to take this on. The data collected from these centers will be critical, not only for assessing the actuarial risk of reinsurance, but also for implementing performance-related contractual agreements, e.g., if a gene therapy ceases to be effective, then any remaining payments for the therapy will be cancelled.
An additional benefit of a single reinsurer to manage the risk and responsibility of delivering gene therapy is the ability of that reinsurer to avoid the adverse selection problem that often plagues individual insurers [71]. This problem arises when some insurers are willing to pay for gene therapy treatments while others are not, leading patients who require gene therapies to enroll en masse with those insurers providing coverage. Since these policies will likely have higher premiums to cover the high cost of gene therapy, patients have an incentive to leave the policy after receiving the treatment, leaving the insurers to pay the remaining cost without being able to recover the expenses. If a single reinsurer can aggregate this risk across a large pool of gene therapy patients and coordinate payouts across all the insurers, this adverse selection problem can be greatly mitigated, or altogether avoided. The viability of such a reinsurance vehicle would depend critically on the various parameters of the modules in our simulation, as well as the ability to engage with the largest health insurer of all, the U.S. government.

34
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 31, 2020. ;

Conclusion
In this paper, we estimate the number of patients who will be treated by gene therapy between January 2020 and December 2034 using various data sources. We also develop a mathematical model to estimate the cost of these gene therapies, and calibrated the model to yield realistic cost estimates. It is our hope that this study, and our estimates of the potential financial impact of gene therapy in the U.S., will clarify some of the unknowns surrounding the impact of this new class of treatment, and allow policymakers, healthcare providers, insurance companies and patients alike to make more informed financial decisions about the future of this important therapeutic class.

35
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 31, 2020. ;

36
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 31, 2020.

38
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 31, 2020. ; [64] Sarah Acaster, Binny Pinder, Clara Mukuria, and Amanda Copans. Mapping the EQ-5D index from the cystic fibrosis questionnaire-revised using multiple modelling approaches. Health and Quality of Life Outcomes, 13(1):33, 2015.
40 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 31, 2020. ; [103] Yezaz Ahmed Ghouri, Idrees Mian, and Julie H Rowe. Review of hepatocellular carcinoma: Epidemiology, etiology, and carcinogenesis. Journal of carcinogenesis, 16, 2017.

43
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
A comparative study of survival rate in high grade glioma tumors being treated by radiotherapy alone versus chemoradiation with nitrosourea. Global journal of health science, 7(6):33, 2015.

44
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 31, 2020. ;

46
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 31, 2020. ;

A1 Current Gene Therapy Clinical Trials
As mentioned in the main paper, we list the clinical trials that are used in this study in the following table. is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 31, 2020.  is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 31, 2020.  is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 31, 2020.  is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 31, 2020.  is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 31, 2020.   is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 31, 2020.  is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 31, 2020.  is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 31, 2020.  is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 31, 2020.  is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 31, 2020.  is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 31, 2020.  is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 31, 2020.  is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 31, 2020.  is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 31, 2020.      is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 31, 2020. ; is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 31, 2020. ;

A4 Calibration of Survival Functions D alt (x − a)
We source either the survival or mortality rate from literature and use them to compute λ, the time parameter in the exponential survival function. We show our result in the table below. Table A4: List of survival rate or mortality rate and λ, for each disease. An asterisk ( * ) under λ denotes that the disease does not affect mortality directly.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 31, 2020.  [46] 9.92 0.462 Squamous Cell Cancer of Head and Neck or Esophagus [22] 64 0.089 Synovial Sarcoma [33] 55 0.120

CC-BY 4.0 International license
It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 31, 2020.

A5 Calibration of Age Distribution A(x)
As mentioned in the main paper, our optimization program produces triangular age distributions that conforms to data, have wider support compared to fitting uniform distributions and, avoids sharp changes in the probability density. We illustrate some examples that compare triangle distributions with the uniform distributions with the same average age.

75
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted October 31, 2020. ;

76
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted October 31, 2020. ; https://doi.org/10.1101/2020.10.27.20220871 doi: medRxiv preprint

A7 Simulation Convergence Criteria
Let X k be the results of the k-th simulation. X k has a true mean of µ and variance σ 2 . Let the mean of the Monte Carlo simulations over n runs beμ n = 1 n n k X k . Then, by Lindeberg-Lévy's Central Limit Theorem,μ n converges in distribution to a normal distribution with mean µ and variance of nσ 2 . The 95 percent confidence interval for µ is given by: where s n is the sample variance of {X 1 , · · · , X n }.
Since we are using 1-by-T vectors, we investigated the error in our simulation by dividing the half-range of the confidence interval in each time-step byμ n before taking the maximum across the time series. As can be seen from Figure A2, we should expect the simulated mean to be within 1.89% of the true mean 95% of the time with 1,000,000 iterations.

77
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted October 31, 2020. ;