Conditional Generative Adversarial Networks for Individualized Treatment Effect Estimation and Treatment Selection

Qiyang Ge; Xuelin Huang; Shenying Fang; Shicheng Guo; Yuanyuan Liu; Wei Lin; Momiao Xiong

doi:10.1101/2020.09.28.20203075

Abstract

Treatment response is heterogeneous. However the classical methods treat the treatment response as homogeneous and estimate the average treatment effects. The traditional methods are difficult to apply to precision oncology. The artificial intelligence (AI) is a powerful tool for precision oncology. It can accurately estimate the individualized treatment effects and learn optimal treatment choices. Therefore, the AI approach can substantially improve progress and treatment outcomes of patients. As one of AI approach, conditional generative adversarial nets for inference of individualized treatment effects (GANITE) have been developed. However, the GANITE can only deal with binary treatment and does not provide a tool for optimal treatment selection. To overcome these limitations, we modify conditional generative adversarial networks (MCGANs) to allow estimation of individualized effects of any types of treatments including binary, categorical and continuous treatments. We propose to use sparse techniques for selection of biomarkers that predict the best treatment for each patient. Simulations show that the CGANs outperform seven other state-of-the-art methods: linear regression (LR), Bayesian linear ridge regression (BLR), KNN, random forest classification (RF (C)), random forest regression (RF (R)), logistic regression (LogR) and support vector machine (SVM). To illustrate their applications, the proposed CGANs were applied to 256 patients with newly diagnosed acute myeloid leukemia (AML) who were treated with high dose ara-C (HDAC), Idarubicin (IDA) and both of these two treatments (HDAC+IDA) at M. D. Anderson Cancer Center. Our results showed that the MCGAN can more accurately and robustly estimate the individualized treatment effects than other state-of-the art methods. Several biomarkers such as GSK3, BILIRUBIN, SMAC are identified and a total of 30 biomarkers can explain 36.8% of treatment effect variation.

Introduction

The traditional clinical management estimates the average treatment effects from observational data, assuming the complex disease is homogeneous (Hansen 2024; Kennedy et al. 2017; Liu et al. 2018; Luo and Zhu 2017; Rosenbaum and Rubin 1983; Diamond and Sekhon 2013). Alternative to traditional clinical management, “precision medicine” or “precision oncology” attempts to match the most accurate and effective treatments with the individual patient (Ali and Aittokallio 2019; Shi et al. 2017), rather than using monotherapy that treats all patients. In real world, treatment response is heterogeneous. Therapy should be tailored with the best response possible and highest safety margin to ensure that the right therapy is offered to “the right patient at the right time”(Subbiah and Kurzrock 2018). Precision oncology can substantially improve progress and treatment outcomes of patients. It plays a central role in revolutionizing cancer research. Consequently, alternative to calculating the average effect of an intervention over a population, many recent methods attempt to estimate individualized treatment effects (ITEs) or conditional average treatment effects from observational data (Makar et al. 2019). To accurately estimate the individualized treatment effects and learn optimal treatment choices are a key issue for precision oncology. More accurate estimation of individualized treatment effects, which provide information to guide the individual selection of the target therapies, is essential for the success of precision medicine (Kornblau et al. 2009).

Methods for estimation of individualized treatment effects (ITEs) using observational data largely differ from standard statistical estimation methods. Estimating ITEs and learning optimal treatment strategies raise a great challenge for the following reasons. First, a common framework for treatment effect estimation is the potential outcomes assumptions (Ray and Szabo 2019) where every individual has two ‘potential outcomes’ covering the hypothesized individual’s outcomes with and without treatment. Estimation of ITEs requires estimation of both factual and counterfactual outcomes for each individual. However, only the factual outcome is actually observed. We never observe the counterfactual outcomes (Rosenbaum and Rubin 1983; Chen and Paschalidis 2018; Yoon et al. 2018).

If the effect of each treatment in the subpopulation which is separately estimated is taken as an individual effect, this can create large biases. The estimated effect of each treatment in the subpopulation is still the average effect of the treatment in that subpopulation and is not individualized treatment effect in the subpopulation.

Second, clinical data often have many missing values. Simultaneously imputing both counterfactual values and missing values is not easy. Third, the function forms of the treatment effects which are often nonlinear functions are unknown (Ray and Szabo 2019). Statistical methods and computational algorithms that can efficiently deal with unknown forms of nonlinear functions are still lacking (Lengerich et al. 2019).

Classical works such as random forest and hierarchical models are adapted to estimate heterogeneous treatment effects (Wager and Athey 2015). Recently, machine learning and neural network methods are used to move away from average treatment effect estimation to personalized estimation (Alaa et al. 2017; Johansson et al. 2016; Shalit et al. 2016). AI and causal inferences are becoming a driving force for innovation in precision oncology (Seyhan and Carini 2019). A key issue for ITE estimation is to learn unobserved (missing) counterfactuals. The idea of using generative adversarial networks (GANs) for handling missing data is a very promising approach to imputing counterfactual (Goodfellow et al. 2014; Yoon et al. 2018a). Using conditional GAN (CGAN) to estimate the individualized treatment effects (GANITE) has been developed (Yoon et al. 2018a). The CGANs consist of a generator and a discriminator. The generator (G) observes the factual part of real data and imputes the counterfactuals (missing part) conditioned on observed factual data, and outputs the complete dataset. The discriminator (D) inputs the real dataset and tries to determine which part was actually observed and which part was imputed counterfactuals. The discriminator enforces generator to learn the desired distribution (hidden data distribution) (Yoon et al. 2018b).

However, the original GANITE was designed for estimation of the effects of binary treatment and cannot be applied to continuous and categorical treatments. The treatment variable in the original GANITE is binary variable which only represents the presence and absence of treatment. Therefore, the treatment variable in the original GANITE is unable to quantify the dosage of the treatment, and hence the original GANITE cannot be applied to continuous treatment. To overcome this limitation, we introduce treatment assignment indicator variable and treatment quantity variable. The treatment quantity variable can represent binary treatment, categorical treatment and continuous treatment. We change mathematical formulations of the generator and discriminator and extend GANITE from binary treatment to all types of treatments including binary, categorical and continuous treatments. The modified GANITE is abbreviated as MGANITE.

The GANITE or in general, CGAN has not systematically investigated the estimation of ITE for chemotherapy and other type of treatments in Cancer and compare the results from causal inference using observed data with the results of randomized clinical trial. One of our goals in this manuscript is to examine whether MGANITE still works well in cancer research.

In MGANITE, biomarkers that serve as conditioned variables, will be used to estimate the ITEs of both single and multiple treatments (Yoon et al. 2018a; Mirza and Osindero 2014). Sparse techniques will be employed to select biomarkers for prediction of treatment effects and to learn optimal treatment choices of patients (Emmert-Streib F, Dehmer 2019).

In summary, Novelty of the modified GANITE (MGANITE) is summarized as below.

The previous conditional generative adversarial network (CGAN)-based causal inference methods (GANITE) only can estimate the individualized effects of binary treatment and cannot estimate the individualized effects of continuous treatments. The proposed MGANITE is the first time to use modified CGANs for estimation of individualized effects of continuous treatments.
This is the first time to apply CGANs to estimation of ITE in cancer research.
We use randomized clinical tried data to validate the CGAN-based methods for estimation of ITE using observational data.
We develop new network structures for generator and discriminator in the CGANs.
We combined sparse techniques for selection of biomarkers with MGANITE to predict the best treatment for each patient

To evaluate its performance for estimating ITEs, simulations are conducted to estimate ITEs using simulated data and the MGANITE, and to compare its estimation accuracy with seven other state-of-the-art methods (LR, KNN, BLR, RF, and SVM). To further evaluate its performance, the MGANITE is applied to 256 newly diagnosed acute myeloid leukemia (AML) patients, treated with high dose ara-C (HDAC), Idarubicin (IDA) and HDAC+IDA at M. D. Anderson Cancer Center to estimate ITEs and identify the optimal treatment strategy for each patient. Preliminary results from simulations and real data analysis show that the MGANITE outperforms five other state-of-the-art methods. A program for implementing the proposed MGANITE for ITE estimation and optimal treatment selection can be downloaded from our website https://sph.uth.edu/research/centers/hgc/xiong/software.htm.

Materials and Methods

Potential outcome framework for estimation of treatment effects

We assume the Rubin causal model for estimation of treatment effects (Rubin 1974) and modified the approach to individualized treatment effect estimation in Yoon et al (Yoon et al, 2018a). The original GANITE only can estimate ITE of binary treatments, but cannot be applied to categorical and continuous treatments. We develop MGANITE which can estimate ITE of all types of treatments including binary, categorical and continuous treatments by introducing treatment assignment indicator variable and changing formulation of generator and discriminator. Consider K treatments. Let T_k be the k^th treatment variable that can be binary, categorical or continuous, and T = [T₁, …, T_K]^T be the treatment vector. We assume that there is precisely one non-zero component of the treatment vector T, which is denoted by T_η, where η is the index of this component. Each sample has one and only one assigned treatment T_η. To extend the binary treatment to including categorical and continuous treatments, we define the treatment assignment indicator vector M = [M₁, …, M_k, …, M_K]^T as where .

For example, if then η = 2 and

If we consider treated and untreated cases, then K = 2. Let T₁ denote the treatment and T₂ denote no treatment where T₂ = 1. For the sample with the treatment, we have

For the sample with no treatment, we have

Define the vector of potential outcome Y(T) = [Y(T₁), …, Y(T_K)]^T, where Y(T_k) is the potential outcome of the sample under the treatment T_k. When K = 2, the potential outcome Y(T₁) corresponds to the widely used notation for one treatment Y¹, the potential outcome of the treated sample, while the potential outcome Y(T₂) corresponds to Y⁰, the potential outcome of the untreated sample. Only one of the potential outcomes can be observed. The observed outcome that corresponds to the potential outcome of the individual receiving the treatment T_η is denoted by Y(T_η). The observed outcome is called the factual outcome and unobserved potential outcomes are called counterfactual outcomes, or simply counterfactuals. For the convenience of notation, the factual outcome is also denoted by Y_f and the counterfactuals are denoted by Y_cf.

The observed outcome Y_f can be expressed as

When K = 2, we have M₂ = 1 − M₁. The above equation becomes which coincides with the standard expression of the observed outcome for one treatment.

Let X = [X₁, …, X_q]^T be the q-dimensional feature vector. Assume that n individuals are sampled. Let and be the treatment vector, the vector of potential outcomes, and feature vector of the i^th individual, respectively.

The most widely used measure of the treatment effect for the multiple treatment is the pairwise treatment effect. The individual effect between the pairwise treatments: T_j and T _k is defined as , the average pairwise treatment effect . The average pairwise treatment effect τ_jk|T_j on the patients treated with T_j is defined as

The focus of this paper is on the conditional distribution of treatment effect, given the feature vector X. Let F_Y|X(T_k) be the conditional distribution of the potential outcome Y(T_k) under the treatment T_k, given the feature vector X, and F_Y|X(T) be the conditional joint distribution of the potential outcome vector Y(T) under the K treatment T, given the feature vector X. Assume that n individuals are sampled. For the i^th individual, T_η treatment (M_η = 1) is assigned. Let X⁽ⁱ⁾ and be the observed feature vector and the observed potential outcome of the i^th individual. Therefore, the observed dataset is given by . The factual and counterfactual outcomes of thei^th individual are denoted by and , respectively.

To estimate the treatment effects, we often make the following three assumptions (Rubin 1974; Yoon et al. 2018a):

Assumption 1. (Ignorability Assumption). Conditional on X, the potential outcomes, Y(T) and the treatment T are independent,

This assumption requires no unmeasured confounding variables.

Assumption 2. (Common Support). For the feature vector X and all treatment,

Assumption 3. (Stable Unit Treatment Value Assumption). No interference (units do not interfere with each other).

Conditional generative adversarial networks as a general framework for estimation of individualized treatment effects

The key issue for the estimation of individualized treatment effects is unbiased counterfactual estimation. Counterfactuals will never be observed and cannot be tested by data. The true counterfactuals are unknown. Recently developed generative adversarial networks (GANs) started a revolution in deep learning (Luo and Zhu 2017). GANs are a perfect tool for missing data imputation. An incredible potential of GANs is to accurately generate the hidden (missing) data distribution given some of the features in the data. Therefore, we can use GANs to generate counterfactual outcomes.

GANs consist of two parts: “generative” part that is called the generator and “adversarial” part that is called the discriminator. Both generator and discriminator are implemented by neural networks. Typically, a K-dimensional noise vector is input to the generator network that converts the noise vector to a new fake data instance. Then the generated new data instance is input to the discriminator network to evaluate them for authenticity. The generator constantly learns to generate better fake data instances while the discriminator constantly obtains both real data and fake data and improves accuracy of evaluation for authenticity.

Architecture of conditional generative adversarial networks (CGANs) for generating potential outcomes

Features provide essential information for estimation of counterfactual outcomes. Therefore, we use conditional generative adversarial networks (CGANs) (Mirza and Osindero 2014) as a general framework for individualized treatment effect (ITE) estimation. The CGANs for ITE estimation consist of two blocks. The first imputation block is to impute the counterfactual outcomes. The second ITE block is to estimate distribution of the treatment effects using the complete dataset that is generated in the imputation block. The architecture of CGANs is shown in Figure 1.

Figure 1. Scheme of MGANITE for the estimation of potential outcomes.

Both generator and discriminator are implemented by feedforward neural networks. The architectures of the neural networks are described as follows. The generator consists of seven layers of feedforward neural network. The first layer is the covariate input layer that input a vector × of covariates. The second and third layers are hidden layers, each layer with 64 nodes. The fourth layer concatenated the output of the third layer, the response vector Y, treatment vector T and treatment assignment indicator vector M and noise vector Z. The fifth and sixth layers are hidden layers, each layer with 64 nodes. Finally, the seventh layer is the output layer. All activation function of neurons were sigmoid function.

The architecture of the discriminator is similar to the architecture of the generator except for adding one more output layer with sigmoid nonlinear activation function.

Imputation block

To extend the GANTITE from binary treatments to all types of treatments, we introduce the treatment assignment vector and change some mathematic formulation of the generator. A counterfactual generator in the imputation block is a nonlinear function of the feature vector, treatment vector T, treatment assignment indicator vector M, observed factual outcome y_f and K dimensional random vector z_G with uniform distribution z_G∼U((−1,1)^K) where Y_f = Y_η. The generator is denoted by where output represents a sample of G. It can take binary values, categorical values or continuous values. 1 is a vector of 1, ⊙ denotes element-wise multiplication and θ_G is the parameters in the generator. We use to denote the complete dataset that is obtained by replacing with Y_f.

The distribution of depends on the determinant of the Jacobian matrix of the transformation function G(X, Y_f, T, M, z_G, θ_G). Changing the transformation function can change the distribution of the generated counterfactual outcomes. Let (y) be the conditional distribution of the potential outcomes, given X = x, T = t, M = m, Y_f = y_f. The goal of the generator is to learn the neural network G such that .

Unlike discriminator in the standard CGANs where the discriminator evaluates the input data for their authenticity (real or fake data), the counterfactual discriminator D_G that maps pairs to vectors in [0,1]^k attempts to distinguish the factual component from the counterfactual components. The output of the counterfactual discriminator D_G is a vector of probabilities that the component represents the factual outcome. Let represent the probability that the i^thcomponent of is the factual outcome, i.e., i = η, where θ_d denotes the parameters in the discriminator. The goal of the counterfactual discriminator is to maximize the probability for correctly identifying the factual component η via changing the parameters in the discriminator neural network D_G.

Loss function

The imputation block in the MGANITE attempts to impute counterfactual outcomes by extending the loss function of binary treatment in the GANITE (Yoon et al. 2018 a) to all types of treatments: binary, categorical or continuous treatments, we define loss function V(D_G, G) as where log is an element-wise operation.

The goal of imputation block is to maximize the counterfactual discriminator D_G and then minimize the counterfactual generator G:

In other words, we train the counterfactual discriminator D_G to maximize the probability of correctly identifying the assigned treatment M_η and quantity of the treatment T_η or Y_f(Y_η), and then train the counterfactual generator G to minimize the probability of correctly identifying M_η and T_η. After the imputation block is performed, the counterfactual generator G produces the complete dataset . Next, we use the imputed complete dataset to generate distribution of potential outcomes and to estimate the ITE via CGANs which is called ITE block.

ITE block

The CGANs consist of three parts: generator, discriminator and loss function which are summarized as follows (Yoon et al. 2018a).

ITE generator

Unlike the ITE in the GANITE where the ITE generator is a nonlinear transform function of only X and Z_I, the ITE generator G_I in the MGANITE is a nonlinear transform function of X, T and Z_I: where is the generated K-dimensional vector of potential outcomes, X is a feature vector, T is a treatment vector, and Z_I is a K-dimensional vector of random variables and follow the uniform distribution Z_I∼𝒰((−1,1)^K). The ITE generator attempts to find the transformation such that .

ITE discriminator

Following the CGANs, we define a discriminator D_I as a nonlinear classifier with or (X,T,Y* = Ŷ) as input and a scalar that outputs the probability of Y^* being from the complete dataset .

Loss function

Again, unlike the loss function in the GANITE where decision function is D_I(X, Y^*), a decision function in the MGANITE is defined as D(X, T, Y^*). The loss function for the ITE block in the MGANITE is then defined as where D_I(X, T, Y^*) is nonlinear classifier that determine whether Y^* is from the complete dataset or from generator G_I.

The goal of ITE block is to maximize the probability of correctly identifying that Y^* is from the complete dataset and to minimize the probability of a correct classification. Mathematically, ITE attempts

The algorithms for numerically solving the optimization problems (4) and (7) are summarized in Supplementary Note.

The learning parameters for the feedforward neural networks are given below. We set batch size equal to 16. We assumed that the learning rates for discriminator and generator were 0.0001 and 0.001, respectively. We further assume that decay rate was 0.1. Learning rate decayed (exponentially) to 10% of the starting learning rate during 70% of total batches, and stay 10% during last 30% batches. The total number of batches was 1,000,000. Adam Optimizer was used to perform optimization. We assume that 20% of the nodes are dropped randomly during training process.

Sparse techniques for biomarker identification

The LASSO (least absolute shrinkage and selection operator) that performs both variable selection and regularization in order to enhance the prediction accuracy and interpretability of the results can be used to select biomarkers for optimal treatment selection (Ali et al. 2019). Let and X⁽ⁱ⁾denote the estimated effect of the k^th treatment and feature vector of the i^th individual, respectively. Let

The outputs of the neural networks are in general continuous function even if the potential outcomes are binary. For the convenience of presentation, we assume that the treatment effects are continuous regardless of the potential outcomes are binary, categorical or continuous.

The LASSO estimators for identifying biomarkers that predict treatment effects are given by

Where ‖. ‖_F is the Frobenius norm of the matrix

Nonzero elements β_jl ≠ 0 predict treatment effect variation and hence its correspondence X_j = can be used as biomarkers for investigation of the l^th treatment.

For the continuous treatment, we define the treatment matrix T and its associated coefficient matrix Γ:

Equation (8) should be changed to where λ₁, λ₂ are penalty parameters.

Biomarker identification for optimal treatment selection

Consider K treatments. Let be the K-diemnsional vector of the estimated potential outcomes for the i^th individual and be the index of the optimal potential outcomes of the i^th individual. To select biomarkers for optimal treatment selection, we define the following LASSO:

Solving the above categorical LASSO problem, we obtain a set of non-zero coefficients that are denoted as , L. The covariates that correspond to the non-zero coefficients of the LASSO solution are chosen as biomarkers for optimal treatment selection.

Again, for the continuous treatment, equation (10) needs to be changed to

Data collection

The proposed MGANITE was applied to 256 newly diagnosed acute myeloid leukemia (AML) patients, treated with high dose ara-C (HDAC), Idarubicin (IDA) and HDAC+IDA at M. D. Anderson Cancer Center. There were 212 valid samples and 85 useable features (14 discrete and 71 continuous), including 51 total and phosphoprotein from several biological processes such as apoptosis, cell-cycle and signal transduction pathways (Kornblau et al. 2009). Among 212 valid samples, 37 were treated with HDAC, 9 were treated with IDA and 54 were treated with HDAC+IDA, and 112 were treated by other drugs.

Prediction accuracy was defined as the proportions of correctly predicted potential outcomes. False positive rate that was defined as the proportion of individuals who were wrongly classified as positively treatment response. Discriminator accuracy is defined as proportion of correctly classified real or fake samples. Replication error is defined as cross entropy where ŷ_f = , t = t_* and separate distance is defined as where , t ≠ t_*.

Results

Simulations

We first examine the performance of the MGANITE in estimating the ITE of binary treatment using simulations. A synthetic dataset was generated as follows. A total of 10,000 individuals with 30-dimentinal feature vectors followed the normal distributions N(0, I). Let and

Then, the potential outcomes were generated as

Treatment was assigned by the Bernoulli distribution: where , n_t∼N(0,0.1), and Bern represented the Bernoulli distribution. Treatment effect can take three values of 1, 0 and -1. In other words,

We compared MGANITE with linear regression (LR) (Makar et al. 2019), logistic regression (LogR) (Emmert-Streib and Dehmer 2019; Makar et al. 2019), support vector machine (SVM) (Makar et al. 2019), k-nearest neighbor (k-NN) (Crump et al. 2008), Bayesian linear regression (BLR) (Johansson et al. 2016), causal forest (CForest) (Wager and Athey 2015), and random forest (RForest) (Breiman 2001). We used six methods: MGANITE, LR, LogR, SVM, kNN and RForest to estimate the counterfactual potential outcomes and calculated the mean square error (MSE) between the estimated treatment effect and the true treatment effect, standard deviation (STD) and prediction accuracy defined as the proportions of correctly predicted potential outcomes. Table 1 presents MSE, STD and prediction accuracy of six methods to fit the generated data. We observed that the CGANs more accurately estimated the potential outcomes than the other five state-of-the-art methods. Figure 2 presented the true counterfactuals and estimated counterfactuals using MGANITE. We observed that the MGANITE reached remarkably high accuracy for estimating counterfactuals.

View this table:

Table 1. Performance of six methods for estimating the potential outcomes.

Figure 2A.

The true potential outcomes with treatment Y¹ and estimated potential outcomes ŷ¹ using MGANITE, where x axis denoted a value of covariate X₁, y axis denoted the potential outcome, dot in blue color represented the true outcome Y¹ and dot in red color represented the estimated outcomes ŷ¹.

Figure 2B.

The true potential outcomes without treatment Y⁰and estimated potential outcomes ŷ⁰ using MGANITE, where x axis denoted a value of covariate X₁, y axis denoted the potential outcome, dot in blue color represented the true outcome Y⁰ and dot in red color represented the estimated outcomes ŷ⁰.

The treatment effect estimation of eight methods were summarized in Table 2. Table 2 showed that the MGANITE had the highest accuracy of estimation of all treatment effects: average treatment effect (ATE), average treatment effects on the treated (ATT) and average treatment effect on the control (ATC), followed by random forests. We observed that the estimation of ATE using all methods were inflated. The inflation rates of ATE using MGANITE and RForect were 3.9% and 7.9%, respectively. The SVM reached the inflation rate of the estimation of ATE as high as 29.8%. All inflation rates of estimation of ATE using LR, LogR, SVM, KNN and BLR were very high. The simulations also showed that the false positive rate that was defined as the proportion of individuals who were wrongly classified as positively treatment response using CGANs, LR, LogR, SVC, KNN(5), KNN(10), BLR, causal forest and random forest were 3.9%, 24.7%, 28.1%, 29.8%, 28/1%, 19.7%, 25.3%, 9% and 8.4%, respectively. The results showed that false positive rates of LR, LogR, SVM, KNN and BLR for prediction of positive treatment response were too high to be applied to treatment selection. Even random forest reached the false positive rate as high as 8.4%. Table 2 also showed that the number of individuals that showed positive treatment effect increased while the number of individuals that showed no treatment effect decreased from ground truth.

View this table:

Table 2. Treatment effects estimated for simulation data using nine methods.

Next we examine the performance of the MGANITE in estimating the ITE of continuous treatment using simulations. A synthetic dataset was generated as follows.

Draw the covariate variable X from the standard normal distribution for 10,000 individuals.
The treatment T is exponentially distributed as P(t) = e^−(t−1), t ≥ 1. Define g(t) = 0.1t².
Define a nonlinear function
Define , where is a randomly sampled noise variable from a normal distribution N(0, 0.01).
Define , where is a randomly sampled noise variable from a normal distribution N(0, 0.01).
Treatment assignment indicator variable M_i is drawn from a Bernoulli distribution with P = 0.5 for each subject.

The mean square errors (MSE) for the MGANITE, Linear Regression, KNN, Bayesian ridge regression, random forest regression and support vector machine regression were 0.011004916, 0.08500695, 0.012520364, 0.085007192, 0.014281599, 0.013962992, respectively. Figures 3 A and 3B plotted the true ITE and estimated ITE for in-samples and out-of-samples data, using six methods: MGANITE, LR, KNN, BLR, RF (R), and SVM, respectively, where dash straight line indicated that the true ITE and estimated ITE were equal. We observed from Figures 3A and 3B that many green cross points for both in-sample and out-of-sample data were much closer to the dash straight line than other types of points. This showed that the estimated ITE points using MGANITE were much closer to the true ITE point than using other five methods. In other words, the estimator of ITE using MGANITE was more accurate than that using other five methods. The results clearly demonstrated that the MGANITE outperformed 5 other state-of-the-art treatment effect estimation methods.

Figure 3A.

True ITE and estimated ITE for in-sample data using six methods: MGANITE, LR, KNN, BLR, RF (R), and SVM, where MGANTE was denoted by green cross point, LR was denoted by orange point, KNN was denoted by green point, BLR was denoted by red point, RF (R) was denoted by purple point and SVM was denoted by dark red point, x axis denoted the true ITE and y axis denoted the estimated ITE.

Figure 3B.

True ITE and estimated ITE for out-of-sample data using six methods: MGANITE, LR, KNN, BLR, RF (R), and SVM, where MGANTE was denoted by green cross point, LR was denoted by orange point, KNN was denoted by green point, BLR was denoted by red point, RF (R) was denoted by purple point and SVM was denoted by dark red point, x axis denoted the true ITE and y axis denoted the estimated ITE.

To further evaluate the performance of MGANITE, we provided Figure 4 that plotted receiver operating characteristic (ROC) curve for evaluation of the ability of the MGANITE to predict potential outcomes of treatment. Our calculation showed that area under the ROC curve (AUC) for MGANITE reached 0.98, a very high value. The ROC curve and AUC value demonstrated that power of the MGANITE for prediction of the potential outcomes of the treatments was very high.

Figure 4. Receiver operating characteristic (ROC) curve for evaluation of performance of MGANITE.

Real Data Analysis

The proposed MGANITE was applied to 256 newly diagnosed acute myeloid leukemia (AML) patients for clinical trial dataset (Kornblau et al. 2009).We first presented the results of treatment HDAC, HDAC+IDA (101) vs all other drugs (111). A key issue for MGANITE is how to train MGABITE. To tracking the training process of MGANITE, we presented Figure 5 that showed ATE, discriminator accuracy, replication error and separate distance curves as a function of number of batches.

Figure 5.

ATE, discriminator accuracy, replication error and separate distance curves as a function of number of batches where x axis denoted the number of batches, y axis denoted values of ATE, discriminator accuracy, replication error and separation distance for ATE, discriminator, replication and separation curves, respectively, red, orange, blue and green curves were ATE, discriminator, replication and separation curves.

We observed from Figure 5 that discriminator accuracy converged to 1, replication error converged to zero, separation distance converged to a constant and ATE converged to a stable value. Figure 4 demonstrated that MGANITE was trained very well.

Next we compare the treatment effect estimations using nine methods: MGANITE, LR, LogR, SVM, KNN(5), KNN(10), BLR, Rforest (C) and Rforest (R) where 5 and 10 were the number of neighbors. Treatment was HDAC or HDAC+IDA, and 85 protein expressions and other geographical variables were used as covariates. The response status (response or no response) was used as the outcome.

Table 3 summarized results of estimation of HDAC treatment effect using MGANITE and other eight methods where individuals with HDAC or HDAC+IDA were taken as the treated population, individuals with other drugs were taken as the control population. Comparison of treatment effect estimation algorithms on real data analysis is not easy because of the lack of ground truth treatment effects and small sample sizes. In general, using MGANITE, we observed that the majority of individuals who were treated by other drugs did not show any response and that 65% of the individuals who were treated by HDAC or HDAC+IDA responded. Only 13.5% of individuals who were treated by other drugs responded. To illustrate the difference between the estimated treatment effect and treatment response, we presented Figure 6 that showed the histogram of the estimated effects of the treatments HDAC or HDAC+IDA vs other drugs using MGANITE (Figure 6A), and observed the number of the response of the individuals in the population who were treated with HDAC or HDAC+IDA vs other drugs (Figure 6B). ITE was calculated based on both of factual and counterfactual. We observed that ITE = 0 consisted of two scenarios: (1) no response of the patients to any drugs and (2) response of the patients to both HDAC or HDAC+IDA, and other drugs. A proportion of the patients with response to HDAC or HDAC+IDA on the right side of Figure 6B and the patient with response to other drugs on the left side of the Figure 6B had ITE = 0. The observed response of the patients to one drug did not imply that these patients would not respond to other drugs. However, ITE = 1 or ITE = 0 implied that the patients responded to only one type of drug.

View this table:

Table 3. Treatment effects estimated for AML dataset using nine methods.

Figure 6A.

Histogram of estimated drug treatment effect using CGANs, where x axis denoted the value of ITE and y axis denoted the number of patients, ITE = +1 denoted the ITE of patients treated with HDAC or HDAC+IDA, ITE = −1 denoted the ITE of patients treated with other drugs, and ITE = 0 denoted the ITE of two groups of patients: one group of the patients treated with HDAC or HDAC+IDA and another group of the patients treated with other drugs.

Figure 6B.

Histogram of observed drug treatment response where x axis indicated three scenarios as described in Figure 4B and y axis denoted the number of patients, the right side in the Figure 4B denoted the number of the patients only responding to the HDAC or HDAC+IDA, the middle denoted the number of the patients that responds to both (HDAC or HDAC+IDA) and other drugs or did not respond to both (HDAC or HDAC+IDA) and other drugs, and the left side denoted the number of patients only responding to the other drugs.

To further compare the performance of the MGANITE and other methods for evaluation of ITE, we split a given data set into an in-sample dataset (190 samples), used for the initial parameter estimation and model selection, and an out-of-sample dataset (22 samples), used to evaluate performance of ITE estimation. The results were summarized in Table 4. We observed that difference in the estimated ATT, ATC, ATE and proportions of ITE between In-samples and out-of-samples using MGANITE were much smaller than using other methods. This showed that the ITE estimation using MGANITE was more robust than using other methods. We calculated the Kullback-Leibler (K-L) divergence between the distributions of ITE using in-sample and out-of-samples and nine methods. The results were summarized in Table 5. Table 5 showed that K-L divergence using MGANITE was much smaller than that using other methods, which implied that MGANITE was more robust than other eight methods.

View this table:

Table 4. Treatment effects estimated for AML dataset using nine methods.

View this table:

Table 5. K-L divergence between the distribution of ITEs using in-samples and out-of-samples.

LASSO was used to identify biomarkers for prediction of treatment effect and treatment selection. Table 6 listed the top 30 biomarkers identified by LASSO. All top 30 biomarkers explained 36.82% of variation of HDAC or HDAC+IDA treatment effect. The top Gene GSK3 accounted for 4.4% of the explanation of treatment effect variation.

View this table:

Table 6. Top ranking variables for explanation of treatment effect variation.

Garson’s algorithm (Garson 1991; Zhang et al. 2018; Siu 2017) that describes the relative magnitude of the importance of input variables (biomarkers) in its connection with outcome variables (ITE) of the neural network can also be used to identify biomarkers for predicting the ITE. Top 30 biomarkers identified by Garson algorithm were listed in Table S1 where relative contribution of each biomarker to the ITE variation and cumulative contribution of biomarkers to the ITE variation were also listed in Table S1. The correlation coefficient between the importance ranking of the markers using Garson algorithm and LASSO was only -0.05.

View this table:

Table S1. Top 30 biomarkers identified by Garson algorithm.

Next, we study the joint estimation of effects of the multiple treatments. The number of individuals that were treated with HDAC, HDAC+IDA and other dugs was 37, 54 and 121, respectively. The widely used treatment estimation methods with multiple treatments are simultaneous estimation of effects of pairwise treatments. We estimated the effects of the pairwise treatments HDAC versus HDAC+IDA, HDAC versus other drugs, and HDAC+IDA versus other drugs. The results were summarized in Table 7. Pairwise comparisons listed in Table 7 did not present the results of the treatment compared with a placebo (without using any drugs). We compared the effect of one treatment with another treatment. Specifically, we made pairwise comparisons: HDAC vs other drugs, HDAC+IDA vs other drugs, and HDAC+IDA vs HDAC. The average treatment effects (ATE) of these three pairwise treatments: HDAC vs other drugs, HDAC+IDA vs other drugs, and HDAC+IDA vs HDAC using MGANITE, were 0.1001, 0.2311 and 0.1310, respectively. This demonstrated that on the average, the effect of the HDAC+IDA was the largest among the three treatments: HDAC+IDA, HDAC and other drugs, followed by the treatment HDAC. In other words, the treatment HDAC was better than other drugs, in turn, the combination of HDAC and IDA was better than HDAC. It was also noted that

View this table:

Table 7. Multiple treatment effects estimated for AML dataset using nine methods.

The effect of HDAC+IDA vs other drugs – effect of HDAC vs other drugs =0.2311-0.1001 = 0.1310 = effect of HDAC+IDA vs HDAC.

However, using LR, LogR, SVM, Rforest (C) and Rforest (R), we observed that HDAC was the best treatment. This conclusion violated the biological interpretation. We explain the reasons that caused this incorrect conclusion as follows. The traditional methods for treatment estimation are mainly based on the population average of the treatment responses. The number of observed responses and no responses of the individuals treated with other drugs was 66 and 55, respectively. The average response rate of the other drugs was 0.545. The number of observed responses and no responses of the individuals treated with HDAC was 29 and 8, respectively. The average response rate of HDAC was 0.784. The number of observed response and no response of individuals treated with HDAC + IDA was 33 and 21, respectively. The average response rate of HDAC +IDA was 0.611. Therefore, estimators of ATE of treatment HDAC vs other drugs using LR, LogR, SVM, Rforest (C) and Rforest (R) were higher than the estimators of ATE of treatment HDAC + IDA. However, the individuals treated with HDAC+IDA usually did not respond to the treatment HDAC. Therefore, the number of individuals with no response should be adjusted to be 62. After adjustment, the response rate of HDAC was changed to 0.319. Therefore, after adjustment, the ATE of HDAC vs other drugs was smaller than the ATE of HDAC +IDA. Then, the estimators of the pair-wise treatments using MGANITE were consistent with the treatment responses after the data were adjusted. This example showed that these traditional methods that were designed for single treatment effect estimation should be modified when they are applied to multiple treatment effect estimation.

Enrichment analysis to top ranking variables for explanation of treatment effect variation was performed by hypergeometric test via Reactome Pathway Database (RPD) (Jassal et al. 2020) to assess whether the number of identified biomarkers associated with Reactome pathway is over-represented than expected. Original P-value from hypergeometric test was then adjusted by FDR for multiple test correction. We found top ranking biomarkers for explanation of treatment effect variation were enriched in multiple cancer related pathways (Figure 7A), including intrinsic pathway for apoptosis (R-HSA-109606, P-value=2.86×10⁻¹⁴), Signaling by Interleukins (R-HSA-449147, P-value=2.86×10⁻¹⁴), Programmed Cell Death (R-HSA-5357801, P-value=9.7×10⁻¹¹), PIP3 activates AKT signaling (R-HSA-1257604, P-value=2.98×10⁻⁸), RUNX3 regulates WNT signaling (R-HSA-8951430, P-value=1.03×10⁻⁵), RNA Polymerase II Transcription (R-HSA-73857, P-value=9.4×10⁻⁵). In addition, we could find the drug target of idarubicin (TOP2A) and Cytarabine (POLB) form significant protein-protein interaction network (P<1.0×10⁻¹⁶), indicating that the predictive biomarkers worked as the direct interactive proteins of cancer drug targets (Figure 7B).

Figure 7.

Reactome pathway analysis and protein-protein interaction (PPI) network analysis to top ranking biomarkers for explanation of treatment effect variation.

Figure 7A. Enrichment analysis to top 44 ranking biomarkers for explanation of treatment effect variation with Reactome pathway database by hypergeometric test to assess whether the number of identified biomarkers associated with Reactome pathway was over-represented than expected. Original P-value from hypergeometric test was then adjusted by FDR for multiple test correction. Top 15 most significant enriched pathways had been showed.

Figure 7B. PPI network analysis was performed by String 11.0 to show the protein-protein interaction among top ranking biomarkers. We found that these proteins were highly interacted which was consistent with pathway enrichment analysis (PPI enrichment P-value is 1.0e-16).

Discussion

In this paper, we presented the MGANITE coupled with sparse techniques as a framework to estimate the ITEs and select the optimal treatments. We demonstrated that the proposed MGANITE had several remarkable features.

First, the MGANITE extended the GANITE from binary treatment to all types of treatments: binary, categorical and continuous treatments. We showed that the MGANITE had much higher accuracy for estimation of ITE than other state-of-the-art methods.

Second, in-sample and out-of-sample analysis showed that the the K-L divergence between the distributions of ITE in-sample and out-of-samples for MGANITE was much smaller than that for other methods, which implied that MGANITE was more robust than other state--of-the art methods.

Third, unlike many popular methods that are usually used to estimate the average effect of the single treatment, the MGANITE not only can estimate the ITE of single treatment, but also can accurately and jointly estimate the ITE of multiple treatments. We also showed that the results of the joint estimation of multiple treatments suing other classical methods were inconsistent and might violated the biological interpretation.

Fourth, precision oncology is the identification of the right treatment for the right patient. The essential aim is to discover biomarkers that can accurately predict individual treatment effect for each individual. Our results showed that the MGANITE with sparse techniques can identify a set of biomarkers with significant biological features. The following identified biomarkers were such typical examples.

GSK3 is a kinase so adaptable that it has been recruited evolutionarily to phosphorylate over 100 substrates, and can regulate numerous cellular functions (Beurel et al. 2015). GSK3 phosphorylates HDAC3 and promotes its activity, including the neurotoxic activity of HDAC3 (Bardai and D’Mello 2011). GSK3 also phosphorylates HDAC6 to modify its activity and the link between GSK3beta and HDAC6 involved in neurodegenerative disorders (Chen et al. 2010).

Bilirubin is a reddish yellow pigment generated when the normal red blood cells break. Normal levels ranged from 0.2 to 1.2 mg/dL(Davis 2020). In adults, indirect hyperbilirubinemia can be due to overproduction, impaired liver uptake or abnormalities of conjugation (Gondal 2016). For AML patients, enasidenib is an inhibitor of mutant IDH2 proteins used to treat newly diagnosed mutant-IDH2 AML patients. The most common treatment-related adverse events were indirect hyperbilirubinemia (31%), nausea (23%), and fatigue (Steinwascher et al. 2015). Therefore, bilirubin is an important biomarker for monitoring adverse effect in AML patients who receive treatment.

Preclinical studies have discovered that Smac mimetics can directly cause cancer cell death, or make tumor cells become more sensitive to various cytotoxic treatment agents, including conventional chemotherapy, radiotherapy, or new drugs (Gulda 2015). There is synergistic interaction of Smac mimetic and HDAC inhibitors in AML cell lines, and Smac mimetic and HDAC inhibitors can trigger necroptosis when caspase activation is blocked (Meng et al. 2016).

AKT.p308 and Src.p527 are phosphorylated signal transduction proteins. These two proteins were found to have lower expression in M0, M1, M2, but they had higher levels in the other AML French-American-British (FAB) types. The expression of those two proteins, together with 22 other proteins, can be used to define distinct signatures for each FAB type (Kornblau et al. 2009).

PTEN is a tumor suppressor protein. Promising anti-cancer agents, HDAC inhibitors, particularly trichostatin A (TSA), can promote PTEN membrane translocation. Meng et al (Meng et al. 2016) revealed that non-selective HDAC inhibitors, such as TSA or suberoylanilide hydroxamic acid (SAHA), induced PTEN membrane translocation through PTEN acetylation at K163 by inhibiting HDAC67. Similarly, treatment with an HDAC6 inhibitor alone promoted PTEN membrane translocation and correspondingly dephosphorylated AKT. The combination of celecoxib and an HDAC6 inhibitor synergistically increased PTEN membrane translocation and inactivated AKT (Zhang and Gan 2017).

Our results showed that multiple treatments improved efficiency of drugs for curing ANL. This can be biologically explained. HDAC inhibitors have emerged as a potent and promising strategy for the treatment of leukemia via inducing differentiation and apoptosis in tumor cells (Jin et al. 2016). A phase II study with 37 refractory acute myelogenous leukemia (AML) patients showed only minimal activity of Vorinostat(HDACi), and Vorinostat failed to control the leucokyte count among most AML patients (Schaefer et al. 2009). A preclinical study revealed that the combination regimen of chidamide (a novel orally active HDAC inhibitor) and IDA could rapidly diminish tumor burden in patients with refractory or relapsed AML (Li et al. 2017). A Phase II trial of Vorinostat with idarubicin (IDA) and Ara-C for patients with newly diagnosed AML or myelodysplastic syndrome revealed good activity with an overall response rates of 85%. No excess toxicity due to Vorinostat was observed (Garcia-Manero et al. 2012). Taken together, HDACs in combination therapy with IDA or other chemotherapeutic drugs showed encouraging clinical activity in different haematologic maligancies. This explained that the combination of HDAC and IDA was the best treatment.

Although the MGANITE showed remarkable features in ITE estimation and optimal treatment selection, the results in this paper were very preliminary. Training stable GANs was a challenging task. The training process was inherently unstable, resulting in inaccurate estimation of ITEs. In this study, we ignored unobserved confounders, unmeasured variables that affect both patients’ medical prescription and their outcome. Overlooking the presence of unobserved confounders may lead to biased results. The main purpose of this paper is to stimulate discussion about how to use AI as a powerful tool to improve the estimation of ITEs and optimal treatment selection. We hope that our results will greatly increase the confidence in using AI as a driving force to facilitate the development of precision oncology.

Data Availability Statement

Data with total of 256 newly diagnosed acute myeloid leukemia (AML) patients, treated with high dose ara-C (HDAC), Idarubicin (IDA) and HDAC+IDA were downloaded from M. D. Anderson Cancer Center (http://bioinformatics.mdanderson.org/Supplements/Kornblau-AML-RPPA/aml-rppa.xls).

Data Availability

Data with total of 256 newly diagnosed acute myeloid leukemia (AML) patients, treated with high dose ara-C (HDAC), Idarubicin (IDA) and HDAC+IDA were downloaded from M. D. Anderson Cancer Center.

http://bioinformatics.mdanderson.org/Supplements/Kornblau-AML-RPPA/aml-rppa.xls

Authors’ Contributions

Conception and design: M. Xiong and W. Lin

Development of methodology: Q. Ge, M. Xiong, S. Fang

Acquisition of data: X. Huang

Analysis and interpretation of data: Q. Ge, S. Guo, Y. Liu and S. Fang.

Writing, review, and/or revision of the manuscript: M. Xiong, Q, Ge, S, Guo, S. Fang, Y. Liu, W. Lin and X. Huang.

Conflict of interest

We have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

Dr. Wei Lin is supported by the National Key R&D Program of China (Grant no. 2018YFC0116600), the National Natural Science Foundation of China (Grant no. 11925103), and by the STCSM (Grant no. 18DZ1201000). The authors thank Sara Barton for editing the manuscript and the Texas Advanced Computing Center for computation support.

References

Alaa, A.M., and van der Schaar, M. 2017. Bayesian Inference of Individualized Treatment Effects using Multi-task Gaussian Processes. arXiv e-prints [Online]. Available: https://ui.adsabs.harvard.edu/abs/2017arXiv170402801A [Accessed April 01, 2017].
↵
Ali, M., and Aittokallio, T. (2019). Machine learning and feature selection for drug response prediction in precision oncology applications. Biophysical reviews 11(1), 31–39. doi: 10.1007/s12551-018-0446-z.
OpenUrl CrossRef PubMed
↵
Bardai, F.H., and D’Mello, S.R. (2011). Selective toxicity by HDAC3 in neurons: regulation by Akt and GSK3beta. J Neurosci 31(5), 1746–1751. doi: 10.1523/JNEUROSCI.5704-10.2011.
OpenUrl Abstract/FREE Full Text
↵
Beurel, E., Grieco, S.F., and Jope, R.S. (2015). Glycogen synthase kinase-3 (GSK3): regulation, actions, and diseases. Pharmacology & therapeutics 148, 114–131. doi: 10.1016/j.pharmthera.2014.11.016.
OpenUrl CrossRef PubMed
↵
Breiman, L. (2001). Random Forests. Mach. Learn. 45(1), 5–32. doi: 10.1023/a:1010933404324.
OpenUrl CrossRef
↵
Chen, R., & Paschalidis, I. (2018). Learning optimal personalized treatment rules using robust regression informed K-NN. arXiv preprint arXiv:1811.06083.
↵
Chen, S., Owens, G.C., Makarenkova, H., and Edelman, D.B. (2010). HDAC6 regulates mitochondrial transport in hippocampal neurons. PLoS One 5(5), e10848. doi: 10.1371/journal.pone.0010848.
OpenUrl CrossRef PubMed
Chen, S., Owens, G.C., Makarenkova, H., and Edelman, D.B. (2010). HDAC6 regulates mitochondrial transport in hippocampal neurons. PLoS One 5(5), e10848. doi: 10.1371/journal.pone.0010848.
OpenUrl CrossRef PubMed
↵
Crump, R.K., Hotz, V.J., Imbens, G.W., and Mitnik, O.A. (2008). Nonparametric Tests for Treatment Effect Heterogeneity. The Review of Economics and Statistics 90(3), 389–405.
OpenUrl CrossRef Web of Science
↵
Davis, C.P. (2020). Bilirubin and Bilirubin Blood Test. MedicineNet. https://www.medicinenet.com/bilirubin_and_bilirubin_blood_test/article.htm.
↵
Diamond, A., and Sekhon, J.S. (2013). Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies. The Review of Economics and Statistics 95(3), 932–945. doi: 10.1162/REST_a_00318.
OpenUrl CrossRef
Ding, P., and Li, F. 2017. Causal Inference: A Missing Data Perspective. arXiv e-prints [Online]. Available: https://ui.adsabs.harvard.edu/abs/2017arXiv171206170D [Accessed December 01, 2017].
↵
Emmert-Streib, F., and Dehmer, M. (2019). High-Dimensional LASSO-Based Computational Regression Models: Regularization, Shrinkage, and Selection. Machine Learning and Knowledge Extraction 1(1), 359–383. doi: 10.3390/make1010021.
OpenUrl CrossRef
Fulda, S. (2015). Promises and Challenges of Smac Mimetics as Cancer Therapeutics. Clinical Cancer Research 21(22), 5030–5036. doi: 10.1158/1078-0432.Ccr-15-0365.
OpenUrl Abstract/FREE Full Text
↵
Garcia-Manero, G., Tambaro, F.P., Bekele, N.B., Yang, H., Ravandi, F., Jabbour, E., et al. (2012). Phase II trial of vorinostat with idarubicin and cytarabine for patients with newly diagnosed acute myelogenous leukemia or myelodysplastic syndrome. Journal of clinical oncology : official journal of the American Society of Clinical Oncology 30(18), 2204–2210. doi: 10.1200/JCO.2011.38.3265.
OpenUrl Abstract/FREE Full Text
↵
Garson, GD. (1991). Interpreting neural network connection weights. Artif. Intell. Expert, 6, 47–51.
OpenUrl
Gondal, B., and Aronsohn, A. (2016). A Systematic Approach to Patients with Jaundice. Semin Intervent Radiol 33(4), 253–258. doi: 10.1055/s-0036-1592331.
OpenUrl CrossRef
↵
Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., et al. 2014. Generative Adversarial Networks. arXiv e-prints [Online]. Available: https://ui.adsabs.harvard.edu/abs/2014arXiv1406.2661G [Accessed June 01, 2014].
Hansen, B. (2004). Full Matching in an Observational Study of Coaching for the SAT. Journal of the American Statistical Association 99, 609–618. doi: 10.1198/016214504000000647.
OpenUrl CrossRef
Jassal, B., et al. (2020). The reactome pathway knowledgebase. Nucleic Acids Res. 48, 498–503. doi: 10.1093/nar/gkx1132.
OpenUrl CrossRef
↵
Jin, Y., Yao, Y., Chen, L., Zhu, X., Jin, B., Shen, Y., et al. (2016). Depletion of γ-catenin by Histone Deacetylase Inhibition Confers Elimination of CML Stem Cells in Combination with Imatinib. Theranostics 6(11), 1947–1962. doi: 10.7150/thno.16139.
OpenUrl CrossRef
↵
Johansson, F.D., Shalit, U., and Sontag, D. 2016. Learning Representations for Counterfactual Inference. arXiv e-prints [Online]. Available: https://ui.adsabs.harvard.edu/abs/2016arXiv160503661J [Accessed May 01, 2016].
↵
Kennedy, E.H., Ma, Z., McHugh, M.D., and Small, D.S. (2017). Nonparametric methods for doubly robust estimation of continuous treatment effects. Journal of the Royal Statistical Society. Series B, Statistical methodology 79(4), 1229–1245. doi: 10.1111/rssb.12212.
OpenUrl CrossRef PubMed
↵
Kornblau, S.M., Tibes, R., Qiu, Y.H., Chen, W., Kantarjian, H.M., Andreeff, M., et al. (2009). Functional proteomic profiling of AML predicts response and survival. Blood 113(1), 154–164. doi: 10.1182/blood-2007-10-119438.
OpenUrl Abstract/FREE Full Text
↵
Lengerich, B., Aragam, B., and Xing, E.P. 2019. Learning Sample-Specific Models with Low-Rank Personalized Regression. arXiv e-prints [Online]. Available: https://ui.adsabs.harvard.edu/abs/2019arXiv191006939L [Accessed October 01, 2019].
↵
Li, Y., Wang, Y., Zhou, Y., Li, J., Chen, K., Zhang, L., et al. (2017). Cooperative effect of chidamide and chemotherapeutic drugs induce apoptosis by DNA damage accumulation and repair defects in acute myeloid leukemia stem and progenitor cells. Clinical epigenetics 9, 83–83. doi: 10.1186/s13148-017-0377-8.
OpenUrl CrossRef
↵
Liu, J., Ma, Y., and Wang, L. (2018). An alternative robust estimator of average treatment effect in causal inference. Biometrics 74(3), 910–923. doi: 10.1111/biom.12859.
OpenUrl CrossRef
Luo, W., and Zhu, Y. (2019). Matching Using Sufficient Dimension Reduction for Causal Inference. Journal of Business & Economic Statistics, 1–13. doi: 10.1080/07350015.2019.1609974.
OpenUrl CrossRef
↵
Makar, M., Swaminathan, A., and Kiciman, E. (2019). A Distillation Approach to Data Efficient Individual Treatment Effect Estimation. Proceedings of the AAAI Conference on Artificial Intelligence 33, 4544–4551. doi: 10.1609/aaai.v33i01.33014544.
OpenUrl CrossRef
↵
Meng, Z., Jia, L.F., and Gan, Y.H. (2016). PTEN activation through K163 acetylation by inhibiting HDAC6 contributes to tumour inhibition. Oncogene 35(18), 2333–2344. doi: 10.1038/onc.2015.293.
OpenUrl CrossRef
↵
Mirza, M., and Osindero, S. 2014. Conditional Generative Adversarial Nets. arXiv e-prints [Online]. Available: https://ui.adsabs.harvard.edu/abs/2014arXiv1411.1784M [Accessed November 01, 2014].
Pollyea, D.A., Tallman, M.S., de Botton, S., Kantarjian, H.M., Collins, R., Stein, A.S., et al. (2019). Enasidenib, an inhibitor of mutant IDH2 proteins, induces durable remissions in older patients with newly diagnosed acute myeloid leukemia. Leukemia. doi: 10.1038/s41375-019-0472-2.
OpenUrl CrossRef
↵
Ray, K., and Szabo, B. 2019. Debiased Bayesian inference for average treatment effects. arXiv e-prints [Online]. Available: https://ui.adsabs.harvard.edu/abs/2019arXiv190912078R [Accessed September 01, 2019].
↵
Rosenbaum, P.R., and Rubin, D.B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika 70(1), 41–55. doi: 10.1093/biomet/70.1.41.
OpenUrl CrossRef Web of Science
↵
Rubin, D.B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5), 688. doi: 10.1037/h0037350.
OpenUrl CrossRef
↵
Schaefer, E.W., Loaiza-Bonilla, A., Juckett, M., DiPersio, J.F., Roy, V., Slack, J., et al. (2009). A phase 2 study of vorinostat in acute myeloid leukemia. Haematologica 94(10), 1375–1382. doi: 10.3324/haematol.2009.009217.
OpenUrl Abstract/FREE Full Text
↵
Seyhan, A.A., and Carini, C. (2019). Are innovation and new technologies in precision medicine paving a new era in patients centric care? Journal of Translational Medicine 17(1), 114. doi: 10.1186/s12967-019-1864-9.
OpenUrl CrossRef
↵
Shalit, U., Johansson, F.D., and Sontag, D. 2016. Estimating individual treatment effect: generalization bounds and algorithms. arXiv e-prints [Online]. Available: https://ui.adsabs.harvard.edu/abs/2016arXiv160603976S [Accessed June 01, 2016].
↵
Shin, S.H., Bode, A.M., and Dong, Z. (2017). Addressing the challenges of applying precision oncology. NPJ precision oncology 1(1), 28–28. doi: 10.1038/s41698-017-0032-z.
OpenUrl CrossRef
↵
Siu, C. (2017). Day33: Garson’s algorithm. March 29, 2017. https://csiu.github.io/blog/update/2017/03/29/day33.html.
↵
Steinwascher, S., Nugues, A.L., Schoeneberger, H., and Fulda, S. (2015). Identification of a novel synergistic induction of cell death by Smac mimetic and HDAC inhibitors in acute myeloid leukemia cells. Cancer Lett 366(1), 32–43. doi: 10.1016/j.canlet.2015.05.020.
OpenUrl CrossRef PubMed
↵
Subbiah, V., and Kurzrock, R. (2018). Challenging Standard-of-Care Paradigms in the Precision Oncology Era. Trends in cancer 4(2), 101–109. doi: 10.1016/j.trecan.2017.12.004.
OpenUrl CrossRef
Szklarczyk, D., et al. (2019). STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 47, 607–613, DOI: 10.1093/nar/gky1131.
OpenUrl CrossRef
↵
Wager, S., and Athey, S. 2015. Estimation and Inference of Heterogeneous Treatment Effects using Random Forests. arXiv e-prints [Online]. Available: https://ui.adsabs.harvard.edu/abs/2015arXiv151004342W [Accessed October 01, 2015].
↵
Yoon, J., Jordon, J., and van der Schaar, M. 2018a. GAIN: Missing Data Imputation using Generative Adversarial Nets. arXiv e-prints [Online]. Available: https://ui.adsabs.harvard.edu/abs/2018arXiv180602920Y [Accessed June 01, 2018].
Yoon, J., Jordon, J., and van der Schaar, M. (Year). “GANITE: Estimation of Individualized Treatment Effects using Generative Adversarial Nets. “, in: ICLR).
↵
Zhang, G., and Gan, Y.H. (2017). Synergistic antitumor effects of the combined treatment with an HDAC6 inhibitor and a COX-2 inhibitor through activation of PTEN. Oncol Rep 38(5), 2657–2666. doi: 10.3892/or.2017.5981.
OpenUrl CrossRef
↵
Zhang Z, Beck MW, Winkler DA, Huang B, Sibanda W, et al. (2018). Opening the black box of neural networks: methods for interpreting neural network models in clinical applications. Ann Transl Med. 6(11):216. doi: 10.21037/atm.2018.05.32.
OpenUrl CrossRef PubMed

View the discussion thread.

Posted September 29, 2020.

Download PDF

Data/Code

Citation Tools

Subject Area

Oncology

Subject Areas

All Articles

Addiction Medicine (354)
Allergy and Immunology (680)
Anesthesia (182)
Cardiovascular Medicine (2692)
Dentistry and Oral Medicine (318)
Dermatology (227)
Emergency Medicine (404)
Endocrinology (including Diabetes Mellitus and Metabolic Disease) (956)
Epidemiology (12326)
Forensic Medicine (10)
Gastroenterology (771)
Genetic and Genomic Medicine (4164)
Geriatric Medicine (392)
Health Economics (688)
Health Informatics (2702)
Health Policy (1010)
Health Systems and Quality Improvement (1004)
Hematology (366)
HIV/AIDS (864)
Infectious Diseases (except HIV/AIDS) (13771)
Intensive Care and Critical Care Medicine (805)
Medical Education (401)
Medical Ethics (110)
Nephrology (446)
Neurology (3947)
Nursing (214)
Nutrition (587)
Obstetrics and Gynecology (751)
Occupational and Environmental Health (704)
Oncology (2076)
Ophthalmology (596)
Orthopedics (244)
Otolaryngology (308)
Pain Medicine (253)
Palliative Medicine (75)
Pathology (474)
Pediatrics (1133)
Pharmacology and Therapeutics (473)
Primary Care Research (462)
Psychiatry and Clinical Psychology (3495)
Public and Global Health (6590)
Radiology and Imaging (1428)
Rehabilitation Medicine and Physical Therapy (835)
Respiratory Medicine (877)
Rheumatology (415)
Sexual and Reproductive Health (414)
Sports Medicine (345)
Surgery (455)
Toxicology (55)
Transplantation (191)
Urology (170)

[1] Alaa, A.M., and van der Schaar, M. 2017. Bayesian Inference of Individualized Treatment Effects using Multi-task Gaussian Processes. arXiv e-prints [Online]. Available: https://ui.adsabs.harvard.edu/abs/2017arXiv170402801A [Accessed April 01, 2017].

[2] ↵
Ali, M., and Aittokallio, T. (2019). Machine learning and feature selection for drug response prediction in precision oncology applications. Biophysical reviews 11(1), 31–39. doi: 10.1007/s12551-018-0446-z.
OpenUrl CrossRef PubMed

[3] ↵
Bardai, F.H., and D’Mello, S.R. (2011). Selective toxicity by HDAC3 in neurons: regulation by Akt and GSK3beta. J Neurosci 31(5), 1746–1751. doi: 10.1523/JNEUROSCI.5704-10.2011.
OpenUrl Abstract/FREE Full Text

[4] ↵
Beurel, E., Grieco, S.F., and Jope, R.S. (2015). Glycogen synthase kinase-3 (GSK3): regulation, actions, and diseases. Pharmacology & therapeutics 148, 114–131. doi: 10.1016/j.pharmthera.2014.11.016.
OpenUrl CrossRef PubMed

[5] ↵
Breiman, L. (2001). Random Forests. Mach. Learn. 45(1), 5–32. doi: 10.1023/a:1010933404324.
OpenUrl CrossRef

[6] ↵
Chen, R., & Paschalidis, I. (2018). Learning optimal personalized treatment rules using robust regression informed K-NN. arXiv preprint arXiv:1811.06083.

[7] ↵
Chen, S., Owens, G.C., Makarenkova, H., and Edelman, D.B. (2010). HDAC6 regulates mitochondrial transport in hippocampal neurons. PLoS One 5(5), e10848. doi: 10.1371/journal.pone.0010848.
OpenUrl CrossRef PubMed

[8] Chen, S., Owens, G.C., Makarenkova, H., and Edelman, D.B. (2010). HDAC6 regulates mitochondrial transport in hippocampal neurons. PLoS One 5(5), e10848. doi: 10.1371/journal.pone.0010848.
OpenUrl CrossRef PubMed

[9] ↵
Crump, R.K., Hotz, V.J., Imbens, G.W., and Mitnik, O.A. (2008). Nonparametric Tests for Treatment Effect Heterogeneity. The Review of Economics and Statistics 90(3), 389–405.
OpenUrl CrossRef Web of Science

[10] ↵
Davis, C.P. (2020). Bilirubin and Bilirubin Blood Test. MedicineNet. https://www.medicinenet.com/bilirubin_and_bilirubin_blood_test/article.htm.

[11] ↵
Diamond, A., and Sekhon, J.S. (2013). Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies. The Review of Economics and Statistics 95(3), 932–945. doi: 10.1162/REST_a_00318.
OpenUrl CrossRef

[12] Ding, P., and Li, F. 2017. Causal Inference: A Missing Data Perspective. arXiv e-prints [Online]. Available: https://ui.adsabs.harvard.edu/abs/2017arXiv171206170D [Accessed December 01, 2017].

[13] ↵
Emmert-Streib, F., and Dehmer, M. (2019). High-Dimensional LASSO-Based Computational Regression Models: Regularization, Shrinkage, and Selection. Machine Learning and Knowledge Extraction 1(1), 359–383. doi: 10.3390/make1010021.
OpenUrl CrossRef

[14] Fulda, S. (2015). Promises and Challenges of Smac Mimetics as Cancer Therapeutics. Clinical Cancer Research 21(22), 5030–5036. doi: 10.1158/1078-0432.Ccr-15-0365.
OpenUrl Abstract/FREE Full Text

[15] ↵
Garcia-Manero, G., Tambaro, F.P., Bekele, N.B., Yang, H., Ravandi, F., Jabbour, E., et al. (2012). Phase II trial of vorinostat with idarubicin and cytarabine for patients with newly diagnosed acute myelogenous leukemia or myelodysplastic syndrome. Journal of clinical oncology : official journal of the American Society of Clinical Oncology 30(18), 2204–2210. doi: 10.1200/JCO.2011.38.3265.
OpenUrl Abstract/FREE Full Text

[16] ↵
Garson, GD. (1991). Interpreting neural network connection weights. Artif. Intell. Expert, 6, 47–51.
OpenUrl

[17] Gondal, B., and Aronsohn, A. (2016). A Systematic Approach to Patients with Jaundice. Semin Intervent Radiol 33(4), 253–258. doi: 10.1055/s-0036-1592331.
OpenUrl CrossRef

[18] ↵
Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., et al. 2014. Generative Adversarial Networks. arXiv e-prints [Online]. Available: https://ui.adsabs.harvard.edu/abs/2014arXiv1406.2661G [Accessed June 01, 2014].

[19] Hansen, B. (2004). Full Matching in an Observational Study of Coaching for the SAT. Journal of the American Statistical Association 99, 609–618. doi: 10.1198/016214504000000647.
OpenUrl CrossRef

[20] Jassal, B., et al. (2020). The reactome pathway knowledgebase. Nucleic Acids Res. 48, 498–503. doi: 10.1093/nar/gkx1132.
OpenUrl CrossRef

[21] ↵
Jin, Y., Yao, Y., Chen, L., Zhu, X., Jin, B., Shen, Y., et al. (2016). Depletion of γ-catenin by Histone Deacetylase Inhibition Confers Elimination of CML Stem Cells in Combination with Imatinib. Theranostics 6(11), 1947–1962. doi: 10.7150/thno.16139.
OpenUrl CrossRef

[22] ↵
Johansson, F.D., Shalit, U., and Sontag, D. 2016. Learning Representations for Counterfactual Inference. arXiv e-prints [Online]. Available: https://ui.adsabs.harvard.edu/abs/2016arXiv160503661J [Accessed May 01, 2016].

[23] ↵
Kennedy, E.H., Ma, Z., McHugh, M.D., and Small, D.S. (2017). Nonparametric methods for doubly robust estimation of continuous treatment effects. Journal of the Royal Statistical Society. Series B, Statistical methodology 79(4), 1229–1245. doi: 10.1111/rssb.12212.
OpenUrl CrossRef PubMed

[24] ↵
Kornblau, S.M., Tibes, R., Qiu, Y.H., Chen, W., Kantarjian, H.M., Andreeff, M., et al. (2009). Functional proteomic profiling of AML predicts response and survival. Blood 113(1), 154–164. doi: 10.1182/blood-2007-10-119438.
OpenUrl Abstract/FREE Full Text

[25] ↵
Lengerich, B., Aragam, B., and Xing, E.P. 2019. Learning Sample-Specific Models with Low-Rank Personalized Regression. arXiv e-prints [Online]. Available: https://ui.adsabs.harvard.edu/abs/2019arXiv191006939L [Accessed October 01, 2019].

[26] ↵
Li, Y., Wang, Y., Zhou, Y., Li, J., Chen, K., Zhang, L., et al. (2017). Cooperative effect of chidamide and chemotherapeutic drugs induce apoptosis by DNA damage accumulation and repair defects in acute myeloid leukemia stem and progenitor cells. Clinical epigenetics 9, 83–83. doi: 10.1186/s13148-017-0377-8.
OpenUrl CrossRef

[27] ↵
Liu, J., Ma, Y., and Wang, L. (2018). An alternative robust estimator of average treatment effect in causal inference. Biometrics 74(3), 910–923. doi: 10.1111/biom.12859.
OpenUrl CrossRef

[28] Luo, W., and Zhu, Y. (2019). Matching Using Sufficient Dimension Reduction for Causal Inference. Journal of Business & Economic Statistics, 1–13. doi: 10.1080/07350015.2019.1609974.
OpenUrl CrossRef

[29] ↵
Makar, M., Swaminathan, A., and Kiciman, E. (2019). A Distillation Approach to Data Efficient Individual Treatment Effect Estimation. Proceedings of the AAAI Conference on Artificial Intelligence 33, 4544–4551. doi: 10.1609/aaai.v33i01.33014544.
OpenUrl CrossRef

[30] ↵
Meng, Z., Jia, L.F., and Gan, Y.H. (2016). PTEN activation through K163 acetylation by inhibiting HDAC6 contributes to tumour inhibition. Oncogene 35(18), 2333–2344. doi: 10.1038/onc.2015.293.
OpenUrl CrossRef

[31] ↵
Mirza, M., and Osindero, S. 2014. Conditional Generative Adversarial Nets. arXiv e-prints [Online]. Available: https://ui.adsabs.harvard.edu/abs/2014arXiv1411.1784M [Accessed November 01, 2014].

[32] Pollyea, D.A., Tallman, M.S., de Botton, S., Kantarjian, H.M., Collins, R., Stein, A.S., et al. (2019). Enasidenib, an inhibitor of mutant IDH2 proteins, induces durable remissions in older patients with newly diagnosed acute myeloid leukemia. Leukemia. doi: 10.1038/s41375-019-0472-2.
OpenUrl CrossRef

[33] ↵
Ray, K., and Szabo, B. 2019. Debiased Bayesian inference for average treatment effects. arXiv e-prints [Online]. Available: https://ui.adsabs.harvard.edu/abs/2019arXiv190912078R [Accessed September 01, 2019].

[34] ↵
Rosenbaum, P.R., and Rubin, D.B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika 70(1), 41–55. doi: 10.1093/biomet/70.1.41.
OpenUrl CrossRef Web of Science

[35] ↵
Rubin, D.B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5), 688. doi: 10.1037/h0037350.
OpenUrl CrossRef

[36] ↵
Schaefer, E.W., Loaiza-Bonilla, A., Juckett, M., DiPersio, J.F., Roy, V., Slack, J., et al. (2009). A phase 2 study of vorinostat in acute myeloid leukemia. Haematologica 94(10), 1375–1382. doi: 10.3324/haematol.2009.009217.
OpenUrl Abstract/FREE Full Text

[37] ↵
Seyhan, A.A., and Carini, C. (2019). Are innovation and new technologies in precision medicine paving a new era in patients centric care? Journal of Translational Medicine 17(1), 114. doi: 10.1186/s12967-019-1864-9.
OpenUrl CrossRef

[38] ↵
Shalit, U., Johansson, F.D., and Sontag, D. 2016. Estimating individual treatment effect: generalization bounds and algorithms. arXiv e-prints [Online]. Available: https://ui.adsabs.harvard.edu/abs/2016arXiv160603976S [Accessed June 01, 2016].

[39] ↵
Shin, S.H., Bode, A.M., and Dong, Z. (2017). Addressing the challenges of applying precision oncology. NPJ precision oncology 1(1), 28–28. doi: 10.1038/s41698-017-0032-z.
OpenUrl CrossRef

[40] ↵
Siu, C. (2017). Day33: Garson’s algorithm. March 29, 2017. https://csiu.github.io/blog/update/2017/03/29/day33.html.

[41] ↵
Steinwascher, S., Nugues, A.L., Schoeneberger, H., and Fulda, S. (2015). Identification of a novel synergistic induction of cell death by Smac mimetic and HDAC inhibitors in acute myeloid leukemia cells. Cancer Lett 366(1), 32–43. doi: 10.1016/j.canlet.2015.05.020.
OpenUrl CrossRef PubMed

[42] ↵
Subbiah, V., and Kurzrock, R. (2018). Challenging Standard-of-Care Paradigms in the Precision Oncology Era. Trends in cancer 4(2), 101–109. doi: 10.1016/j.trecan.2017.12.004.
OpenUrl CrossRef

[43] Szklarczyk, D., et al. (2019). STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 47, 607–613, DOI: 10.1093/nar/gky1131.
OpenUrl CrossRef

[44] ↵
Wager, S., and Athey, S. 2015. Estimation and Inference of Heterogeneous Treatment Effects using Random Forests. arXiv e-prints [Online]. Available: https://ui.adsabs.harvard.edu/abs/2015arXiv151004342W [Accessed October 01, 2015].

[45] ↵
Yoon, J., Jordon, J., and van der Schaar, M. 2018a. GAIN: Missing Data Imputation using Generative Adversarial Nets. arXiv e-prints [Online]. Available: https://ui.adsabs.harvard.edu/abs/2018arXiv180602920Y [Accessed June 01, 2018].

[46] Yoon, J., Jordon, J., and van der Schaar, M. (Year). “GANITE: Estimation of Individualized Treatment Effects using Generative Adversarial Nets. “, in: ICLR).

[47] ↵
Zhang, G., and Gan, Y.H. (2017). Synergistic antitumor effects of the combined treatment with an HDAC6 inhibitor and a COX-2 inhibitor through activation of PTEN. Oncol Rep 38(5), 2657–2666. doi: 10.3892/or.2017.5981.
OpenUrl CrossRef

[48] ↵
Zhang Z, Beck MW, Winkler DA, Huang B, Sibanda W, et al. (2018). Opening the black box of neural networks: methods for interpreting neural network models in clinical applications. Ann Transl Med. 6(11):216. doi: 10.21037/atm.2018.05.32.
OpenUrl CrossRef PubMed

Conditional Generative Adversarial Networks for Individualized Treatment Effect Estimation and Treatment Selection

Abstract

Introduction

Materials and Methods

Potential outcome framework for estimation of treatment effects

Conditional generative adversarial networks as a general framework for estimation of individualized treatment effects

Architecture of conditional generative adversarial networks (CGANs) for generating potential outcomes

Imputation block

Loss function

ITE block

ITE generator

ITE discriminator

Loss function

Sparse techniques for biomarker identification

Biomarker identification for optimal treatment selection

Data collection

Results

Simulations

Real Data Analysis

Discussion

Data Availability Statement

Data Availability

Authors’ Contributions

Conflict of interest

Acknowledgements

References

Citation Manager Formats

Subject Area