Abstract
Multiple randomized controlled trials, each comparing a subset of competing interventions, can be synthesized by means of a network meta-analysis to estimate relative treatment effects between all interventions in the evidence base. Often there is an interest in estimating the relative treatment effects regarding time-to-event outcomes. Cancer treatment effectiveness is frequently quantified by analyzing overall survival (OS) and progression-free survival (PFS). In this paper we introduce a method for the joint network meta-analysis of PFS and OS that is based on a time-inhomogeneous tri-state (stable, progression, and death) Markov model where time-varying transition rates and relative treatment effects are modeled with known parametric survival functions or fractional polynomials. The data needed to run these analyses can be extracted directly from published survival curves. We demonstrate use by applying the methodology to a network of trials for the treatment of non-small-cell lung cancer. The proposed approach allows the joint synthesis of OS and PFS, relaxes the proportional hazards assumption, extends to a network of more than two treatments, and simplifies the parameterization of decision and cost-effectiveness analyses.
1 Introduction
Randomized controlled trials (RCTs) are considered the most appropriate study design to obtain evidence regarding relative treatment effects. However, an individual RCT rarely includes all alternative interventions of interest, and as such does not provide all the information needed to select the best alternative. Typically, the evidence base consists of multiple RCTs, each of which compares a subset of the interventions of interest. If each of these trials has at least one intervention in common with another trial such that the evidence base is represented by a single connected network, a network meta-analysis (NMA) can estimate relative treatment effects between all the competing interventions in the evidence base ^{1}.
Often there is an interest in estimating the relative treatment effects of alternative interventions regarding time-to-event outcomes. For example in oncology, treatment efficacy is often quantified by analyzing time from treatment initiation to the occurrence of a particular event. Very commonly studies report data on overall survival (OS), where the event is death from any cause, and on progression-free survival (PFS), where the event is death from any cause or disease progression, whichever occurred first ^{2}.
NMA of time-to-event outcomes with a single effect measure per study are based on the proportion of patients alive at a specific time point, median survival, or reported hazard ratio (HR) ^{3}. The limitation of a NMA of survival at a specific time point is that we only focus on the cumulative effect of treatment at that time point and ignore the variation in effects over time up to, as well as beyond, that time point. NMAs of median survival times have similar limitations. The HR summarizes the treatment effect for the complete follow-up period of the trials, but only represents the treatment effect for each time point if the proportional hazards (PH) assumption holds. If the PH assumption is violated, trial specific HRs represent an average effect over the follow-up period, which can cause biased estimates in a NMA if trials have different lengths of follow-up.
As an alternative to a NMA with a univariate treatment effect measure, we can also use a multivariate treatment effect measure that describes how the relative treatment effects change over time ^{3}. Ouwens et al., Jansen, and Cope et al presented methods for NMA of time-to-event outcomes where the hazard functions of the interventions in a trial are modeled using known parametric survival functions or fractional polynomials and the difference in the parameters are considered the multi-dimensional treatment effects, which are synthesized across studies ^{4}–7. By incorporating time-related parameters, these NMA models can be fitted more closely to the available data.
Both PFS and OS of an intervention determine its value and can inform decision-making. In combination with a baseline survival function for a reference treatment, the multivariate NMA models embedded in parametric survival functions can form the basis for partitioned-survival cost-effectiveness models ^{8}. Frequently, the pooled PFS and OS curves need to be extrapolated over time in order to obtain estimates of the expected quality adjusted life-years before and after disease progression. Since the separate meta-analyses of PFS and of OS data ignore the correlation between the outcomes, any required extrapolation may result in possible crossing of PFS and OS curves. A state-transition model with three health states – stable (pre-progression), progression, and death – with parametric hazard functions for the three corresponding transitions avoids this issue. If we have individual patient data (IPD) regarding time to progression, time to death, and censoring for all trials included in the NMA, we can estimate these hazard functions using a statistical model with the same tri-state structure and avoid any inconsistency between the clinical evidence synthesis and the economic evaluation ^{9}. Reality though is that for most, if not all, trials there is no access to IPD and the synthesis has to be based on reported summary findings. Although reported Kaplan-Meier curves for PFS and OS can be digitized and a dataset of “virtual” IPD event-times can be created with the algorithm by Guyot et al., it does not provide the information needed to determine which time-to-progression data point corresponds to which time-to-death data point^{10}. Markov-state-transition NMA models have been presented for disease progression^{11},^{12} and competing risks ^{13} based on aggregate level data, but these models assumed constant hazards for transitions between states.
In this paper we introduce a method for the joint NMA of PFS and OS that is based on a tri-state (stable, progression, and death) transition model where time-varying hazard rates and relative treatment effects are modeled with known parametric survival functions or fractional polynomials. We illustrate parameter estimation based on aggregate level data.
2 Multi-state network meta-analysis framework
2.1 Model
At any time u, patients in study i randomized to treatment arm k can be in one of three health states: alive with stable disease (i.e. not progressed), alive and progressed, and dead expressed with S_{ik}(u), P_{ik}(u) and D_{ik}(u), respectively, as shown in Figure 1., and are the hazard rates for disease progression, dying post-progression, and dying pre-progression.
A multi-state NMA that explicitly estimates each possible transition in a tri-state model and modeling time-varying hazard rates and relative treatment effects with known parametric survival functions or fractional polynomials can be expressed as follows: with d_{1,11} = d_{2,11} =, …, = d_{B,11} = 0foridentification where p_{1}, …, p_{B} are fractional powers and the round bracket notation denotes the Box-Tidwell transformation: u^{(p)} = u^{p} if p ≠ 0 and u^{(p)} = ln(u) if p = 0. Equation 1 also includes the situation of repeated powers, where p_{x} = p_{y} for at least 1 pair of indices (x, y), 1 ≤ x < y ≤ a−1, a ≤ x < y ≤ b − 1, or b ≤ x < y ≤ B − 3. In this situation, is used instead of itself. A complete set of flexible fractional polynomials can be created with p_{1}, …, p_{B−3} ∈ {−2, −1, −0.5, 0, 0.5, 1, 2}. α_{1,ik}, α_{2,ik}, …, α_{a,ik} are regression coefficients that represent the scale and shape parameters of the log-hazard function describing the stable-to-progression transition in study i for study arm k. α_{a+1,ik}, α_{a+2,ik}, …, α_{b,ik} are the regression coefficients that represent the log-hazard function for the stable-to-death transition. α_{b+1,ik}, α_{b+2,ik}, …, α_{B,ik} are the regression coefficients that represent the scale and shape parameters of the log-hazard function describing the progression-to-death in study i for study arm k.
The µ_{·,i} reflect the study effects regarding the scale and shape parameters in each study i. The δ_{·,ik} are the study specific true underlying relative treatment effects for the treatment in study arm k relative to the treatment in arm 1 of that trial (with δ_{·,i1} = 0 to ensure identification) regarding the scale and shape of the log-hazard function for the different transitions, which are drawn from a normal distribution with the mean effect for treatment t expressed in terms of the overall reference treatment 1,, and with a between-study-heterogeneity covariance matrix Σ. represents the relative treatment effect with treatment t in study arm k in study i relative to reference treatment 1 regarding the scale and shape parameters of the log-hazard functions.
A fixed effect model is obtained by replacing with .
The random effects model presented with Equation 1 does not account for correlation between trial-specific δ_{·,ik}s in multiple-arm trials (>2 treatments). A simplified random effects model with only one heterogeneity parameter, e.g. for , can be easily extended to fit trials with three or more treatment arms by decomposition of a multivariate normal distribution as a series of conditional univariate distributions^{14}: The vector of random effects follows a multivariate normal distribution, a_{i} represents the number of arms in trial i (a_{i} = 2, 3,) and . The conditional univariate distribution for the random effect of arm k > 2, given all arms from 2 to k − 1, is:
2.2 Data and likelihood
For this paper we assume there is no access to IPD for the trials included in the NMA. The parameters will be estimated based on the conditional survival probabilities regarding PFS and OS that can be infered from the published KM curves. (See Appendix A for the algorithm outlining construction of the dataset.) The total follow-up time can be partitioned into M successive non-overlapping intervals indexed by m = 1, …, M. We refer to interval m as U_{m} and write u ∈ U_{m} to denote Δu_{m} ≤ u < u_{m+1}. The length of U_{m} is Δu_{m} = u_{m+1} − u_{m}. For each interval m, we propose a binomial likelihood for the conditional survival probabilities regarding PFS and OS at time point u relative to the time point at the beginning of the interval U_{m} according to: where are the observed number of patients who have not yet experienced progression or death at time u in the m^{th} interval in study i for treatment arm k and are the observed number of patients who have not died at time u in that interval. is the underlying conditional survival probability regarding PFS and is the underlying conditional survival probability regarding OS, and are the corresponding sample sizes at the beginning of the interval.
For the m^{th} interval, the conditional probabilities and are related to the proportion of patients who are progression free (stable disease) S_{ik}(u) and the proportion of patients with progressed disease P_{ik}(u) according to: Arbitrary hazard functions can be approximated with a set of discontinuous constant hazard rates over relative short successive time intervals. For each interval m, S_{ik}(u), P_{ik}(u), and death D_{ik}(u) are related to the hazards according to the following set of differential equations (See Appendix B): In order to estimate the three parameters , and for each interval m, we need to define Equation 4, Equation 5 and Equation 6 for at least two time points per interval. In order to improve identifiability of hazard rates in the presence of a small number of events, we use three time points per interval: 1) a time point at 1/3 of the length of the interval , which we define as ; 2) a time point at 2/3 of the length of the interval , which we define as ; and 3) the time point at the end of the interval u_{m+1}. The obtained estimates of the hazards for interval m are assigned to the time point for Equation 1.
3 Illustrative example
3.1 Evidence base
An example of the multi-state models is presented for a NMA of first line treatment of adult patients with metastatic EGFR+ non-squamous non-small-cell lung cancer (NSCLC) with gefitinib, erlotinib, afatinib, dacomitinib, or platinum-based doublet chemotherapy regimens. Thirteen RCTs were obtained with a systematic literature review (ARCHER1050^{15},^{16}; LUX-LUNG 7^{17},^{18}; LUX-LUNG 3^{19},^{20}; LUX-LUNG 6^{20},^{21}; EURTAC ^{22–24}; ENSURE^{25}; OPTIMAL^{26},^{27}; First-SIGNAL ^{28}; WJTOG3405^{29}; IPASS ^{30},^{31}; NEJ002^{32},^{33}; Han2017^{34}; Yang2014^{35},^{36}). The evidence network is presented in Figure 2 and the trial-specific PFS and OS curves are provided as supplementary information in Section C.1.
3.2 Network meta-analysis to estimate relative treatment effects
The following model was used for the NMAs, which is a simplification of Equation 1 to faciliate parameter estimation: where u^{0} = ln(u) and d_{1,11} = 0, d_{2,11} = 0, d_{3,11} = 0, and d_{4,11} = 0.
When p_{1} = 0 and α_{3,ik} = 0, the log-hazard functions follow a Weibull distribution. When p_{1} = 1 and α_{3,ik} = 0 these log-hazard functions follow a Gompertz distribution. When {(p_{1}, p_{2}) = (0, 0), (0, 1)} and α_{3,ik} ≠ 0, the log-hazard functions follow a second order polynomial that are extensions of the Weibull and Gompertz model to allow for arc- and bathtub shaped log-hazard functions. When α_{6,ik} = 0 this transition follows an exponential distribution. When α_{6,ik} ≠ 0 the transition rates over time are modeled with a first order fractional polynomial of which Weibull and Gompertz are special cases.
With this model we assume there is one between-study heterogeneity parameter, which is related to the relative treatment effects that act on the scale of the log-hazard function for the stable-to-progression transition: The δ_{1,ik} are drawn from a normal distribution with the mean effect for treatment t expressed in terms of the overall reference treatment 1,, and between study heterogeneity . The treatment specific relative effects regarding the first and second shape parameters of the log-hazard function for the stable-to-progression transition were assumed to be fixed. If it is assumed that treatment has only a direct effect on the transitions from stable to progression, which can reasonably be defended when a particular treatment upon disease progression is discontinued, the model can be further simplified by setting .
The following prior distributions for the parameters of the model were used:
3.3 Meta-analysis of absolute effects with overall reference treatment
A NMA provides estimates of relative treatment effects between the competing interventions (i.e. hazard ratios). In order to obtain estimates for the transition rates over time between health states for each treatment, we first need to estimate the time-varying transition rates for an overall reference treatment, defined as treatment 1, and subsequently apply the hazard ratios of each treatment relative to treatment 1 obtained with the NMA to these baseline transition rates. As a final step, these time-varying transition rates for each treatment can be transformed into the distribution S, P, and D over time, and PFS and OS curves. In this example gefitinib is defined as treatment 1, and a meta-analysis of gefitinib arms of the trials was performed to estimate baseline rates with the following fixed effects model: We used the same data structure, likelihood, and link functions as used for the NMA. (See Equation 4,Equation 5, and Equation 6). The prior distribution for this model was:
3.4 Parameter estimation
The parameters of the different models were estimated using a Markov Chain Monte Carlo (MCMC) method implemented in the JAGS software package^{37}. All JAGS analyses were run using R statistical software ^{38}. See Section C.2 for the JAGS code for one of the models used to estimate relative treatment effects.
If the sample size or number of PFS or OS related events in interval U_{ikm} is relatively small then it may be challenging with the MCMC algorithm to distinguish between the “correct answer” for , and , and alternative estimates where either or . As such, we set a constraint to avoid that and is estimated to be zero. (See Section C.2)
The residual deviance and the deviance information criterion (DIC) were used to compare the goodness-of-fit of the competing models. ^{39} The DIC provides a measure of model fit that penalizes model complexity. In general, a more complex model results in a better fit to the data, demonstrating a smaller residual deviance. The model with the better trade-off between fit and parsimony has a lower DIC. A difference in the DIC of about 10 points can be considered meaningful.
3.5 Results
For transparancy purposes we first present the results of the meta-analysis of treatment 1 (gefitinib), followed by the results of the NMA, and finally the PFS and OS curves obtained by applying the relative treatment effects obtained with the NMA to the pooled results for treatment 1.
For this example we used the following competing models for the meta-analysis of treatment 1:
- SP 2nd order FP(01); SD Weibull; PD Weibull. This is a fixed effects second order fractional polynomial with time transformations according to p_{1} = 0 and p_{2} = 1 for the stable-to-progression (SP) transition; a Weibull distribution for the stable-to-death (SD) transition; and a Weibull distribution for the progression-to-death (PD) transition. (See Equation 9)
- SP 2nd order FP(01); SD Weibull; PD exponential
- SP 2nd order FP(01); SD exponential; PD Weibull
- SP 2nd order FP(01); SD exponential; PD exponential
- SP 2nd order FP(00); SD exponential; PD Weibull (See Equation 9 with p_{1} = p_{2} = 0)
- SP Weibull; SD Weibull; PD exponential
- SP Weibull; SD exponential; PD Weibull
- SP Weibull; SD exponential; PD exponential
- SP Gompertz; SD exponential; PD Weibull
- SP Gompertz; SD exponential; PD exponential
The model fit statistics are presented in Table 1. The models with a second order fractional polynomial for the the stable-to-progression transition resulted in the smallest deviance and DIC. The models assuming a Gompertz distribution for this transition resulted in the largest deviance and DIC. The parameter estimates of a selection of four competing models that show different patterns of time-varying hazards for the three transitions between the health states are presented in Table 2. The actual time-varying hazard rates and corresponding PFS and OS curves are plotted in Figure 3.
For the NMA the following models were evaluated:
- SP 2nd order FP(01) FE3; SD Weibull; PD Weibull FE1(scale). This is a second order fractional polynomial with time transformations according to p_{1} = 0 and p_{2} = 1 for the stable-to-progression transition with fixed effects relative treatment effects on all 3 parameters (α_{1}, α_{2}, α_{3}); a Weibull distribution for the stable-to-death transition without relative treatment effects; and a Weibull distribution for the progression-to-death transition with fixed effects relative treatment effects only on the scale parameter (α_{5}).
- SP 2nd order FP(01) FE3; SD Weibull; PD exponential FE
- SP 2nd order FP(01) FE3; SD Weibull; PD exponential. This model only has relative treatment effects applied to the stable-to-progression transition.
- SP 2nd order FP(01) FE3; SD exponential; PD Weibull FE1(scale)
- SP 2nd order FP(01) FE3; SD exponential; PD Weibull
- SP 2nd order FP(01) FE3; SD exponential; PD exponential
- SP Weibull FE2; SD Weibull; PD exponential FE
- SP Weibull FE2; SD exponential; PD Weibull FE1(scale)
- SP Weibull FE2; SD exponential; PD Weibull
- SP Weibull FE2; SD exponential; PD exponential
- SP 2nd order FP(01) RE3; SD Weibull; PD Weibull FE1(scale). This is the random effects model according to Equation 7 with p_{1} = 0 and p_{2} = 1.
- SP 2nd order FP(01) RE3; SD Weibull; PD exponential FE
- SP 2nd order FP(01) RE3; SD exponential; PD Weibull FE1(scale)
- SP Weibull RE2; SD exponential; PD Weibull FE1(scale)
The NMA models that assumed a second order fractional polynomial for the stable-to-progression transition had a lower deviance and DIC than the corresponding simpler models assuming a Weibull distribution for this transition. The models with a Weibull distribution for the stable-to-death transition had a lower DIC than the corresponding models assuming an exponential distribution for this transition. Models with a Weibull distribution for the progression-to-death transition were not a meaningful improvement over the corresponding models that assumed an exponential distribution for this transition. Comparing models that assumed a relative treatment effect for the progression-to-death transition with the corresponding models without this relative treatment effect indicates that a relative effect may be a relevant component to include for this transition in some models. The second order fractional polynomial and Weibull random effects models performed better than their fixed effects equivalents, indicating that incorporating between-study heterogeneity is important.
Parameter estimates for the random effects second order fractional polynomial model (SP 2nd order FP(01) RE3; SD Weibull; PD Weibull FE1(scale)), the random effects Weibull model (SP Weibull RE2; SD exponential; PD Weibull FE1(scale), and two fixed effects models (SP Weibull FE2; SD Weibull; PD exponential FE and SP Weibull FE2; SD exponential; PD exponential) are presented in Table 3. The corresponding time-varying HRs with treatment relative to treatment 1 for the stable to progression transitions are presented in Figure 4, and the constant HRs for the progression-to-death transition in Figure 5. (Please note that we did not assume a relative treatment effect for the stable-to-death transition). Applying the relative treatment effect parameter estimates describing the HRs over time obtained with these NMA models to the parameter estimates of the models used for the analysis of treatment 1, we obtain the PFS and OS curves by treatment, as presented in Figure 6. In order to illustrate the width of the 95 credible intervals of these survival curves due to the uncertainty in the time-varying HRs, we ignored the uncertainty for the reference treatment 1. (If the uncertainty of the meta-analysis would have been incorporated as well, the 95 credible intervals would have been a bit wider.)
4 Discussion
With this paper we present a method for the joint NMA of PFS and OS that is based on a tri-state transition model. This method extends existing parametric NMA methods for time-to-event data ^{4–7} by defining the structural relationship between PFS and OS according to the stable, progression, and death states that define the course of disease over time. Instead of modeling the time-varying hazard rates for PFS and OS separately, we model the time-varying transition rates between the three health states simultaneously. The primary advantage of this evidence synthesis framework is that estimates for PFS and OS remain consistent over time.
The primary reason to propose the method described in this paper is to facilitate parameterization of multi-state cost-effectiveness models based on summary level data. In order to do so, we describe the conditional survival probability for PFS and OS with two separate binomial likelihoods for a given interval m (See Equation 4), and capture their relationship with Equation 5 and Equation 6. The implication of this approach is that we assume that is independent from or , or both and . The correlation between conditional PFS and conditional OS is explained by shared parameters and . For the purposes of decision making, the uncertainty in estimates are as important, if not more important, than the point estimates themselves. If the structural assumption regarding the relationship between the likelihoods for conditional PFS and OS does not capture their correlation appropriately, the precision in some or all of the transition rates, and therefore OS, will be overestimated. We perfored a simple simulation to assess the performance of estimating the hazard rates for the three transitions in a given time interval using the three conditional PFS and OS data points for that interval. Overall, the coverage probabilities of the 95% credible intervals for the hazard rates can be considered acceptable. (See Section C.3 for more detail.) However, more elaborate simulation studies are recommended to investigate this in more detail.
To facilitate parameter estimation, we opted to use three conditional PFS and three conditional OS data points for each time interval. In principle, two conditional PFS and OS data points (i.e. four data points in total) would be sufficient to estimate the three constant hazard rates corresponding to the possible transitions in a given interval. When the number of events are small, however, the MCMC algortithm may not be able to distinguish between the “correct solutions” for the three hazard rates and alternative solutions corresponding to a situation where either or equates zero. By using three conditional PFS and OS data points for each time interval, this is less likely to be the case. However, a longer time interval for which the hazards are assummed to be constant is needed (e.g. a 3-month interval) than if two conditional PFS and OS data points would have been used (e.g. a 2-month interval). Further research is needed to define the optimal length of intervals and number of data points to use per interval given the oberved number of events in an interval, the population size at risk at the beginning of the interval, and rate of change of the hazards over time. In our example, we also used a likelihood constraint to avoid or equate zero.
For the illustrative example we used a subset of the theoretically possible competing models available within the proposed framework by making simplifying structural assumptions to facilitate parameter estimation. First, we assumed that a relative treatment effect for the stable-to-death transition was not needed reflecting the belief that differences in survival between treatments are only due to differences in delayed or avoided tumor progression, and not due to other treatment related (adverse) events. Second, we assumed that a function more complex than a first order fractional polynomial with a constant relative treatment effect was not required for the progression-to-death transition because this transition is conditional upon experiencing progression and modeled in relation to follow-up time. One could even argue that a relative treatment effect for this transition is not needed when treatment is discontinued upon progression. However, the DIC indicated that adding this parameter to the models resulted in a meaningful improvement for the example analyses, which is primarily related to the PBDC trials. Reasons to include a treatment effect parameter for the progression-to-death transition is treatment cross-over upon progression in a subset of trials, if post-progression treatment between trial arms differ, or if the post-progression mortality patterns is expected to differ between treatments due to other reasons. Third, out of the possible fractional polynomials, we only evaluated exponential, Weibull and Gompertz models and their extensions where the additional parameter related the log-hazard to time or log-time. We did not consider any of the negative power transformations of time, primarily because these functions do not link to known survival distributions and the 2nd order models we did use have already the flexibility to capture arc-shaped hazard functions.
This brings us to the point of model selection. When evaluating competing models, both model fit criteria as well as clinical or biologic plausibility need to considered. In this paper we used DIC to inform model selection among the presented alternatives, but estimating the total residual deviance and comparing this to the number of datapoints would be a useful addition as it tells us how well the models are actually fitting the data. Evaluating all possible competing models available within the proposed framework regarding fit of to the data is not feasible from a practical perspective. Even if we are estimating DIC using only a small number of exploratory MCMC samples, the computational burden is still substantial. A priori, we need to define the relevant subset of models to compare. Factors to consider include: (i) the required flexibility to capture time-related patterns of the hazard functions for the different transitions; (ii) which transitions do we expect to vary by treatment; (iii) does treatment only impact scale or also shape parameters; (iv) how do we incorporate between-study heterogeneity; and (v) availability of data in relation to the number of parameters to estimate. In general, for the stable-to-progression transition, the degree of flexibility of the considered function is arguably most important as these hazards may vary substantially over time and between treatments. For the progression-to-death transition, the stability of estimates is arguably most important given the potential need for overall survival extrapolation and the notion that any between-treatment contrasts are most likely primarily observed for the stable-to-progression transitions. For the stable-to-death transition, the importance of the appropriate function depends on the expected hazard rates in comparison to the rates for the other transitions. If the stable-to-death transition rates are relatively low, model mis-specification may have a limited impact. In addition, these transitions may reflect background-mortality and, as such, we do not need treatment effect-parameters. Future research is needed to inform a model selection strategy or algorithm that results in a set of models that is likely to cover the distribution of transition rates between the health states, results in realistic extrapolations over time, and, given the computational burden of the more complex models for large datasets, can be evaluated in a reasonable amount of time.
The proposed evidence synthesis framework relates directly to clock forward time-inhomogeneous Markov models where treatment specific transition rates between health states are only a function of time in the model. A frequently used approach for cost-effectiveness analysis of cancer treatments are partitioned survival models. However, the main limitation is that extrapolated parametric PFS and OS curves for a given treatment may cross. This will not be case with Markov models and, as such, are preferred as long as time-varying transition rates between health states can be estimated that reflect the actual PFS and OS of the treatments compared. As far as we know, the method presented in this paper is the first to facilitate this based on reported aggregate level data. In order to obtain the parameter estimates for a cost-effectiveness analysis we need to define a baseline model and a NMA model. The baseline model provides estimates for the absolute effect with the reference treatment, which in this case are the time-varying log-hazard rates between each of the three health states. The NMA model provides estimates of the relative treatment effects of each intervention in the network relative to the reference treatment, which in this case are the time-varying log hazards ratios. The absolute effect with each treatment is obtained by adding the relative treatment effects from the NMA to the absolute effect with the reference treatment from the baseline model, and subsequently transforming these to the natural scale by inverting the log-link function ^{40}. In the current example we used the RCT evidence base to estimate the baseline meta-analysis model as well as the NMA model. However, for an actual cost-effectiveness analysis it may be more appropriate to use a different evidence base that better reflects expected outcomes with the reference treatment for the baseline model, such as a long-term routine practice observational studies.
The estimates obtained with the proposed evidence synthesis models can also be used in semi-Markov individual simulation models (i.e. models where some transitions are affected by time in an intermediate state). For example, imagine a cost-effectiveness model of first-line cancer treatment consisting of the three health states stable, progression, and death. The stable-to-progression and stable-to-death transitions can be estimated based on first-line trials using the proposed multi-state (network) meta-analysis method. The progression-to-death transitions in these trials no longer represent current standard of care and we need to estimate these transition rates based on overall survival data from second line trials of current treatment using a separate (network) meta-analysis model. Similarly, we can use the proposed evidence synthesis models in the context of semi-Markov simulation models describing sequential cancer treatment. Imagine a model consisting of four health states: 1) stable disease with first line treatment, 2) progression with first line treatment/stable disease with second line treatment, 3) progression with second line treatment, and 4) death. First-line treatment transitions from stable-to-progression and stable-to-death are estimated with one multi-state (network) meta-analysis model based on first line trials, and the second-line treatment transitions from stable-to-progression, stable-to-death, and progression to death (reflecting third line treatment and beyond) are estimated with another multi-state (network) meta-analysis model based on second line trials. In general, for the transitions in a simulation model for which the “clock is reset” a separate multi-state (network) meta-analysis needs to be performed.
Time-varying transition rates from an intermediate health state, e.g. progressed disease, are typically modeled as a function of time since entering that state using (time-reset) semi Markov models. For clock-forward Markov models, where transition rates are only a function of time in the model, typically constant transition rates are used for the transitions from intermediate health states. However, we want to highlight that this is not a requirement and time-varying transition rates in a clock-forward Markov model can be defended in certain situations. For example, a monotonically decreasing hazard function (corresponding to Weibull distribution) for the progression-to-death transition means that an individual progressing after (say) 6 months has a greater probability of dying in the subsequent month than (say) an individual who progressed after 24 months. This reflects the possible scenario that more severe individuals or individuals without any treatment response are more likely to die faster once progressed than less severe patients who did show an initial response and progressed slower. In fact, this is a potential benefit of this multi-state evidence synthesis method. Separate estimation of transition rates between health states cannot capture this aspect. To capture differential patterns between treatments, a relative treatment effect for the progression-to-death transition can be incorporated in the evidence synthesis model.
All studies provided PFS and OS Kaplan-Meier data in the example analyses. In principle, the NMA model can be extended to create a shared-parameter model to incorporate studies that only provide information for PFS or only for OS. Studies with only PFS data provide evidence regarding the stable-to-progression and stable-to-death transitions and contribute to estimating the corresponding treatment specific hazard ratios if these are assumed fixed or exchangeable across all studies providing direct or indirect evidence for that particular intervention. (When a meta-analysis of absolute effects with the overall reference treatment is performed, the fixed effects or exchangeability assumption applies to the transition rates.) Incorporating studies that only provide OS data for a particular intervention in the NMA will require the additional assumption of fixed or exchangeable rates for one of the transitions across all studies for that intervention, if treatment is assumed to impact more than just the stable-to-progression transition in order to facilitate parameter estimation. A related topic for future research is whether and how this framework can be used to validate PFS as a surrogate for OS and to predict OS for novel interventions for which only mature PFS is available. This will be of great benefit for cost-effectiveness analyses.
5 Conclusion
We introduced a method for the joint meta-analysis of PFS and OS that is based on a non-homogenous Markovian tri-state transition model. Arbitrary hazard rate functions can be approximated by piecewise constant hazard rates at successive time intervals, and are flexibly modeled as (fractional) polynomial functions of time. The proposed approach relaxes the proportional hazards assumption, extends to a network of more than two treatments, and simplifies the parameterization of decision and cost-effectiveness analyses. The data needed to run these analyses can be extracted directly from published survival curves.
Data Availability
Dataset is available upon reasonable request.
6 Conflict of interest
The authors have no conflict of interest related to this work to report.
7 Funding
This work was supported by the UCSF Academic Senate Committee on Research (Academic Senate RAP Grant). The funders had no role in the preparation, review, or approval of the manuscript or the decision to submit the manuscript for publication.
8 Data availability statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Appendices
A Constructing dataset for analyses
Data inputs required are the coordinates extracted from digitally scanned PFS and OS Kaplan-Meier curves: time points (u),corresponding survival probabilities (s_{u}), and corresponding population size at risk (n_{u}). These points must capture all steps in the curve, and may require adjustments to the extracted coordinates to ensure the survival probabilities are decreasing with time. For both curves it should include the times at which numbers at risk are reported below the curve.
The total follow-up time can be partitioned into M successive non-overlapping intervals indexed by m = 1, …, M. We refer to interval m as U_{m} and write u ∈ U_{m} to denote u_{m} ≤ u < u_{m+1}. The length of U_{m} is Δu_{m} = u_{m+1} − u_{m}. For each time interval m, we want to obtain four data points: At the beginning of the interval, u_{m}; at 1/3 of the length of the interval, , which we define as ; at 2/3 of the length of the interval, , which we define as ; and at the end of the interval, u_{m+1}. It is desirable to have the time intervals defined in such a way that (some) of these time points are aligned with the time point for which the size of the at-risk population is reported below the published Kaplan-Meier curves, and are the same for PFS and OS where available. For the current study, we used intervals with a length of three months.
If no PFS or OS proportion have been recorded for a specific time point of interest (i.e. whole months), a corresponding value for s_{u} can be obtained by linear interpolation of the first available extracted scanned survival proportions before and after this time point.
When the population n_{u} is not reported below the PFS and OS Kaplan-Meier curve for certain time points u, it can be imputed. First, based on the reported size of the at-risk population at subsequent time points (n_{u+1}), n_{u} will be estimated according to . With this ‘backward calculation’ approach we implicitly assume that censoring occurs before the events happen within a time interval. However, this approach is not feasible if there is no information regarding the at-risk population for time intervals beyond the at-risk population reported at a certain time point. In other words, this approach is only feasibly for intervals up to the latest time point for which population is reported. Next, n_{u} will be estimated according to . The disadvantage of this ‘forward calculation’ approach is that censoring is ignored and the sample size potentially too large for those timepoints. For intervals where both and was calculated, the actual estimate for the population at-risk is calculated as: to ensure the sample size is not overestimated. For time points where could not be calculated, .
Based on the subsequent s_{u} for the four points at each interval (i.e. , and ) three conditional survival proportions are obtained: , and . The corresponding sample sizes are defined as . The corresponding observed number of patients who have not yet experienced progression or death are calculated according to .
Applying this algorithm to PFS and OS of each arm i of each trial k, we get a data set with and .
We set-up the event dataset such that every row represents one time interval with , and corresponding to , and u_{m+1}. In addition, each row has a variable related to follow-up time , three variables related to , and , the study number, and study-arm number within that study.
In additon to the event dataset, we create a study dataset indicating the compared interventions in each study along with the number of study arms.
B States and between-state transition rates
B.1 Dynamic transitions - Problem specification
Figure 1 represents a closed dynamic system (S_{ik}(u) + P_{ik}(u) + D_{ik}(u) = 1) whose evolution is determined by a known initial condition at time u = 0 and three differential equations: with , and the time-varying hazard rates for the transitions in the figure.
B.2 Aproximating arbitrary , and
We can approximate arbitrary hazard rate functions with a set of discontinuous constant hazard rates over successive time intervals. We prefer this approximation because the system Equation A1 can be solved analytically when the transition rates are constant using the the eigenvalue method for first-order differential equations. For u ∈ U_{m} Equation A1 become:
B.3 Analytic solutions for S_{ik}(u), P_{ik}(u), and D_{ik}(u) where u ∈ U_{m}
Write the system in Equation A2 in matrix form: with the obvious notational correspondence between the two equations. For u ∈ U_{m} the system is homogenous and its general solution is the superposition: where λ_{1,ik}, λ_{2,ik}, and λ_{3,ik} are the eigenvalues of the coefficient matrix A_{ik}. v_{1ik}, v_{2ik} and v_{3ik} are the corresponding eigenvectors, and c_{1,ik}, c_{2,ik}, c_{3,ik} scalar constants to be identified from the initial condition in Equation A2. In our case: The eigenvectors are:
B.3.1 Identification of constants in the general solution
The constants c_{1,ik}, c_{2,ik}, c_{3,ik} are identied from the proportions at the beginning of U_{m}. Setting u = u_{m} in the general solution, and using the initial condition in Equation A1 and Equation A2 we obtain: where v_{xy,ik} is element x of eigenvector y.
B.3.2 Solution for S_{ik}(u), u ∈ U_{m}
Substituting c_{1,ik}, c_{2,ik}, c_{3,ik} from Equation A7 in Equation A4 we obtain for S_{ik}(u) :
B.3.3 Solution for P_{ik}(u), u ∈ U_{m}
Substituting c_{1,ik}, c_{2,ik}, c_{3,ik} from Equation A7 in Equation A4 we obtain for P_{ik}(u):
B.3.4 Solution for D_{ik}(u), u ∈ U_{m}
Using Equation A1, Equation A8, and Equation A9 we obtain:
C Online supplement
C.1 Kaplan-Meier curves
C.2 JAGS code random effects NMA model
Example JAGS code corresponding to model SP Weibull RE2; SD exponential; PD Weibull FE1(scale) has been presented here.
model{ # Likelihood for conditional survival probabilities for (j in 1:Nd){ r_cond_pfs1[j]∼dbinom(cond_pfs[j,1], n_cond_pfs1[j]) r_cond_pfs2[j]∼dbinom(cond_pfs[j,2], n_cond_pfs2[j]) r_cond_pfs3[j]∼dbinom(cond_pfs[j,3], n_cond_pfs3[j]) r_cond_os1[j]∼dbinom(cond_os[j,1], n_cond_os1[j]) r_cond_os2[j]∼dbinom(cond_os[j,2], n_cond_os2[j]) r_cond_os3[j]∼dbinom(cond_os[j,3], n_cond_os3[j]) } # transformation of conditional survival probabilities to survival probabilities p4[1]<-1 p5[1]<-0 for (j in 2:Nd){ p4[j]<-equals(a[j]-a[j-1],0)*p[(j-1),4] + (1-equals(a[j]-a[j-1],0))*1 p5[j]<-equals(a[j]-a[j-1],0)*p[(j-1),5] + (1-equals(a[j]-a[j-1],0))*0 } for (j in 1:Nd){ cond_pfs[j,1]<-p[j,1]/p4[j] cond_pfs[j,2]<-p[j,4]/p4[j] cond_pfs[j,3]<-p[j,7]/p4[j] cond_os[j,1]<-(p[j,1]+p[j,2])/(p4[j]+p5[j]) cond_os[j,2]<-(p[j,4]+p[j,5])/(p4[j]+p5[j]) cond_os[j,3]<-(p[j,7]+p[j,8])/(p4[j]+p5[j]) # transformation of survival probabilities in time-varying hazards # for transitions between health states p[j,1]<- p4[j]*exp(-(h.sd[j]+h.sp[j])*dt[j,1]) p[j,2]<- p5[j]*exp(-h.pd[j]*dt[j,1])+p4[j]*h.sp[j]*(exp(-(h.sd[j]+h.sp[j])*dt[j,1]) -exp(-h.pd[j]*dt[j,1]))/(h.pd[j]-h.sp[j]-h.sd[j]) p[j,3]<-1-(p[j,1]+p[j,2]) p[j,4]<- p4[j]*exp(-(h.sd[j]+h.sp[j])*dt[j,2]) p[j,5]<- p5[j]*exp(-h.pd[j]*dt[j,2])+p4[j]*h.sp[j]*(exp(-(h.sd[j]+h.sp[j])*dt[j,2]) -exp(-h.pd[j]*dt[j,2]))/(h.pd[j]-h.sp[j]-h.sd[j]) p[j,6]<-1-(p[j,4]+p[j,5]) p[j,7]<- p4[j]*exp(-(h.sd[j]+h.sp[j])*dt[j,3]) p[j,8]<- p5[j]*exp(-h.pd[j]*dt[j,3])+p4[j]*h.sp[j]*(exp(-(h.sd[j]+h.sp[j])*dt[j,3]) -exp(-h.pd[j]*dt[j,3]))/(h.pd[j]-h.sp[j]-h.sd[j]) p[j,9]<-1-(p[j,7]+p[j,8]) # constraint to avoid “closing off” SD or PD path constraint.sd[j] ∼ dinterval(h.sd[j], EPSILON[1]) # h.sd[j] > EPSILON[1] (e.g. >0.001) constraint.pd[j] ∼ dinterval(h.pd[j], EPSILON[2]) # h.pd[j] > EPSILON[2] (e.g. >0.001) # decribe hazards as a function of time log(h.sp[j])<- alpha[s[j],a[j],1]+alpha[s[j],a[j],2]*timetrans1[j] log(h.sd[j])<- alpha[s[j],a[j],3] log(h.pd[j])<- alpha[s[j],a[j],4]+alpha[s[j],a[j],5]*timetrans1[j] } # random effects model for (i in 1:Ns){ for (k in 1:na[i]){ alpha[i,k,1]<-mu[i,1]+delta[i,k] alpha[i,k,2]<-mu[i,2]+d[t[i,k],2]-d[t[i,1],2] alpha[i,k,3]<-mu[i,3] alpha[i,k,4]<-mu[i,4]+d[t[i,k],3]-d[t[i,1],3] alpha[i,k,5]<-mu[i,5] } w[i,1]<-0 delta[i,1]<-0 for (k in 2:na[i]){ delta[i,k]∼dnorm(md[i,k],taud[i,k]) md[i,k]<-d[t[i,k],1]-d[t[i,1],1] +sw[i,k] w[i,k] <- (delta[i,k] - d[t[i,k],1] + d[t[i,1],1]) sw[i,k] <- sum(w[i,1:(k-1)])/(k-1) taud[i,k] <- tau *2*(k-1)/k } } # priors for (i in 1:Ns){ mu[i,1:5] ∼ dmnorm(prior_mean_mu[1:5],prior_varcov_mu[,]) } d[1,1]<-0 d[1,2]<-0 d[1,3]<-0 for (T in 2:Nt){ d[T,1:3] ∼ dmnorm(prior_mean_d[1:3],prior_varcov_d[,]) } sd∼dunif(0,2) tau<-1/(sd*sd) # output for (T in 2:Nt){ for (u in 1:48){ log(HR.SP[1,T,u])<-(d[T,1]-d[1,1])+(d[T,2]-d[1,2])*log(u) log(HR.SD[1,T,u])<-0 log(HR.PD[1,T,u])<-(d[T,3]-d[1,3]) } } Mu_mean[1]<- -3.563312141 Mu_mean[2]<- 0.486802196 Mu_mean[3]<- -4.580999291 Mu_mean[4]<- -3.488028822 Mu_mean[5]<- 0.174942184 for (T in 1:Nt){ Alpha1[T]<-Mu_mean[1]+d[T,1] Alpha2[T]<-Mu_mean[2]+d[T,2] Alpha3[T]<-Mu_mean[3] Alpha4[T]<-Mu_mean[4]+d[T,3] Alpha5[T]<-Mu_mean[5] } for (T in 1:Nt){ for (u in 1:48){ log(HAZARD.SP[T,u])<-(Alpha1[T])+(Alpha2[T])*log(u) log(HAZARD.SD[T,u])<-(Alpha3[T]) log(HAZARD.PD[T,u])<-(Alpha4[T])+(Alpha5[T])*log(u) }} for (T in 1:Nt){ P.S[T,1]<-1*exp(-(HAZARD.SD[T,1]+HAZARD.SP[T,1])) P.P[T,1]<-0*exp(-HAZARD.PD[T,1])+1*HAZARD.SP[T,1]*(exp(-(HAZARD.SD[T,1]+HAZARD.SP[T,1])) -exp(-HAZARD.PD[T,1]))/(HAZARD.PD[T,1]-HAZARD.SP[T,1]-HAZARD.SD[T,1]) PFS[T,1]<-P.S[T,1] OS[T,1]<-P.S[T,1]+P.P[T,1] for (u in 2:48){ P.S[T,u]<-P.S[T,(u-1)]*exp(-(HAZARD.SD[T,u]+HAZARD.SP[T,u])) P.P[T,u]<-P.P[T,(u-1)]*exp(-HAZARD.PD[T,u])+P.S[T,(u-1)]*HAZARD.SP[T,u] *(exp(-(HAZARD.SD[T,u]+HAZARD.SP[T,u])) -exp(-HAZARD.PD[T,u]))/(HAZARD.PD[T,u]-HAZARD.SP[T,u]-HAZARD.SD[T,u]) PFS[T,u]<-P.S[T,u] OS[T,u]<-P.S[T,u]+P.P[T,u] } } }C.3 Simulation to assess performance of estimating stable-to-progression, stable-to-death, and progression-to-death hazard rates simultaneously based on conditional PFS and OS data for a given time interval
In order to assess the performance of estimating interval specific hazard rates for the stable-to-progression, stable-to-death, and progression-to-death transitions based on conditional PFS and OS data with the approach as proposed in this paper, we performed a simple simulation study for a single time interval. We set true values for each of the three transitions as follows: h^{SP} = {0.1, 0.2, 0.3, 0.4, 0.5}, h^{SD} = {0.02, 0.05, 0.1}, and h^{SP} = {0.05, 0.1, 0.15, 0.2, 0.25}. The sample size was assumed to be 1000. Based on the true rates and sample size, the number of PFS and OS events for four timepoints over a 3-month time interval were determined and subsequently transformed in three conditional PFS events and three conditional OS events according the approach described in the Appendix. This data was used to estimate h^{SP}, h^{SD}, and h^{P D} according to Equation 4, Equation 5, and Equation 6 for each simulation run. One hundred simulation runs were performed for each scenario.
Results are presented in Table A1, Figure A14, Figure A15, and Figure A16. The mean of the estimates for h^{SP}, h^{SD}, and h^{P D} across the hundred simulaton runs are close to the true hazard rates. The mean of the standard deviation of the posterior distribution for log(h^{SP}) (equivalent to the mean of the standard error of the estimate with a frequentist analysis) across the hundred simulaton runs and the standard deviation of the estimates for log(h^{SP}) across the same simulation runs were relatively similar across the different scenarios. However, for h^{SD}, and expecially for h^{P D}, the mean standard deviation of the posterior distribution is larger than the standard deviation of the estimates indicating that the obtained 95% credible intervals for these transitions with the proposed approach seem conservative. The proportion of 95% credible intervals that include the true hazard rate over the hundred simulation-runs is about 95% for h^{SP} but may be higher or less than 90% depending on the true rate of each of the three transitions as depicted in Figure A14. For h^{SD} and h^{P D}, the proportion of 95% credible intervals that include the true hazard rates over the hundred simulation runs seems greater than 95%, indiciting some degree of overcoverage.
Footnotes
Section 2.2 updated: Used three (rather than two) conditional PFS and OS data points per time interval; Section 3.4 and Section C2 updated: Tweaked the jags code to improve estimation. Section C3: Added simple simulation study to assess performance of estimating stable-to-progression, stable-to-death, and progression-to-death hazard rates simultaneously based on conditional PFS and OS data for a given time interval