Abstract
T-acute lymphoblastic leukemia (T-ALL) is an aggressive hematological malignancy associated with poor outcome. To unravel the gene-expression profiles of immunophenotypic subtypes of T-ALL, we performed transcriptome analysis in 35 cases. We also analyzed the prognostic relevance of 23 targets (protein-coding genes, histone modifiers and long non-coding RNAs (lncRNAs)) identified on RNA sequencing in an independent cohort of 99 T-ALL cases. We found high expression of MEF2C to be associated with prednisolone resistance (p=0.048) and CD34 expression (p=0.012). BAALC expression was associated with expression of CD34 (p=0.032) and myeloid markers (p=0.021). XIST and KDM6A expression levels were higher in females (p=0.047 and 0.011, respectively). Post-induction minimal residual disease (MRD) positivity was associated with high expression of lncRNA PCAT18 (p=0.04), HHEX (p=0.027) and MEF2C (p=0.007). Early thymic precursor (ETP-ALL) immunophenotype was associated with high expression of MEF2C (p=0.003), BAALC (p=0.003), LYL1 (p=0.01), LYN (p=0.01) and XIST (p=0.02), and with low levels of ST20 (p=0.007) and EML4 (p=0.03). On survival analysis, high MEF2C expression emerged as a significant predictor of inferior 3-year event-free survival (EFS) (low 71.78±6.58% vs high 36.57±10.74%, HR 3.5, p=0.0003) and overall survival (OS) (low 94.77±2.96% vs high 78.75±8.45%, HR 4.88, p=0.016) in our patients. Low expression of lncRNA MALAT1 also emerged as a predictor of inferior OS (low 76.02±10.48% vs high 94.11±3.31%, HR 0.22, p=0.027).
Introduction
T-lineage acute lymphoblastic leukemia (T-ALL) accounts for ∼25% of adult and 10%–15% of pediatric ALL cases1. In the past several years, immunological, cytogenetic and genomic approaches have been extensively utilized to better understand the genomic organization of functional elements and their alterations in leukemia. Combined with techniques of cell biology, these approaches have led to major advances in our understanding of T-ALL and have allowed the development of novel therapeutic approaches.
The malignant transformation that culminates in T-ALL is a multi-step process in which both genetic and epigenetic alterations of key cellular pathways coordinate to produce the T-ALL phenotype (Peirs et al., 2015; Girardi et al., 2017). T-ALL is characterized by aberrant and highly variable gene expression profiles. The genetic alterations and transcriptomic signatures are often used to classify T-ALL patients into molecular subtypes exhibiting distinct clinical outcomes, which are partly determined by their underlying oncogenic signaling pathways, such as IL7R/JAK/STAT, PI3K/AKT5-7 and RAS/MEK/ERK8 signaling. Aberrant expression of a diverse group of transcription factors such as LYL1, LMO1, LMO2, TAL1, TLX1, TLX3, HOXA, NKX2.1, NKX2.2, NKX2.5, MYC, MYB and SPI1 has been reported in distinct T-ALL subtypes3,4. Some of the less understood facets of T-ALL include epigenetic deregulation, ribosomal dysfunction, and altered expression of oncogenic miRNAs and long non-coding RNAs. Apart from genetic alterations, epigenetic alterations such as DNA methylation, histone modifications, chromosomal remodelling and chromosomal topology have also been shown to be involved in T-ALL pathogenesis and modulation of clinical response (Van der Meulen et al., 2014; Kloetgen et al., 2020).
More recently, a greater understanding of molecular pathophysiology and improved immunophenotyping methods have led to refinement of the classification of T-ALL, but the clinical relevance of these subtypes remains controversial1,9-11. The only subtype of T-ALL recognized in the 2016 revision of the WHO classification is early thymic precursor ALL (ETP-ALL)11,12. In this study, we analyzed the relevance of T-ALL immunophenotyping and the expression pattern of protein-coding genes, epigenetic modifiers and long non-coding RNAs in T-ALL classification, and also determined their association with patient prognosis.
Materials and Methods
A total of 134 T-ALL cases, newly diagnosed based on morphology, cytochemistry and immunophenotyping, were enrolled in this study. These cases were immunophenotypically classified into immature (pro T- and pre T-), cortical and mature T-ALL based on the EGIL criteria13. ETP-ALL was recognized based on the previously defined criteria12. The patients were divided into 2 cohorts: a discovery cohort (n=35) and a validation cohort (n=99). Total RNA sequencing was done in the discovery cohort. The clinical and prognostic relevance of the transcriptomic features identified in the discovery cohort was tested in the validation cohort. All patients or guardians gave informed consent for blood/bone marrow collection and biological analyses, in agreement with the Declaration of Helsinki. The study was approved by the Institutional Ethical Committee, All India Institute of Medical Sciences, New Delhi, India. Transcriptome data of pooled RNA extracted from 5 normal human thymus samples were used as a non-cancer control (kind courtesy Dr Jan Cools, Belgium).
Discovery cohort
In the discovery cohort (n=35), there were 13 immature (including 5 ETP-ALL), 17 cortical and 5 mature T-ALL cases, with a median age of 14 years (range, 1 to 55 years). There were 26 males (19 pediatric, 7 adult) and 9 females (7 pediatric, 2 adult).
RNA sequencing and analysis of the discovery cohort
RNA was extracted from freshly isolated patient blood samples by the TRIzol method (Thermo Fisher Scientific, Massachusetts, USA). Transcriptome data of pooled total RNA from 5 normal human thymus samples were used as a control (kindly gifted by Dr Jan Cools, Belgium) for analysis of epigenetic modifiers and miRNA. Paired-end whole transcriptome sequencing was performed on the Illumina HiSeq2000 platform using the TruSeq RNA sample preparation kit (Illumina, San Diego, California, U.S.). Sequence reads were processed to identify the expression profile of protein-coding and non-coding RNAs by supervised and unsupervised approaches (supplementary methods). To investigate the role of histone modifiers in T-ALL development, transcript abundance of epigenetic modifiers was measured across the mature, cortical and immature subtypes of T-ALL (supplementary methods). To identify novel lncRNA transcripts, we used a computational pipeline that combines Open Reading Frame (ORF) prediction with the coding potential calculator (CPC algorithm) to annotate the protein-coding potential of transcripts (supplementary methods). Biomart was utilized to identify the annotated miRNA and transcripts.
Transcripts with ≥2-fold change (FC) were selected as differentially expressed putative coding and non-coding transcripts in the given subgroup, and heatmaps were plotted using MeV. Gene Ontology enrichment and KEGG pathway analyses were used to explore the potential biological processes, cellular components and molecular functions of differentially expressed genes (DEG). Gene functional analysis was done on the DAVID 6.8 database14,15. To predict the functions of the highly overexpressed, validated protein-coding genes, histone modifiers and lncRNAs in all three subgroups of T-ALL, a correlation network analysis was performed using the Rcorr package in R based on Spearman correlation and visualized using Cytoscape16. Detailed methods are described in the supplementary methods.
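The fold-change selection described above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the gene names, the abundance values and the pseudocount of 1 are hypothetical, and the sketch assumes that the ≥2-fold criterion was applied in both directions (up- and down-regulation) to mean subgroup abundances.

```python
import math

def log2_fold_change(group_mean, reference_mean, pseudocount=1.0):
    """log2 ratio of two mean abundances; the pseudocount (an assumption)
    avoids division by zero for transcripts with no reads."""
    return math.log2((group_mean + pseudocount) / (reference_mean + pseudocount))

def select_differential(expression, group, reference, min_fold=2.0, pseudocount=1.0):
    """Keep transcripts whose abundance differs by at least `min_fold`
    between `group` and `reference`, in either direction.

    expression: dict mapping transcript -> {subgroup: mean abundance}
    Returns a dict mapping transcript -> log2 fold change.
    """
    threshold = math.log2(min_fold)
    selected = {}
    for transcript, means in expression.items():
        lfc = log2_fold_change(means[group], means[reference], pseudocount)
        if abs(lfc) >= threshold:
            selected[transcript] = lfc
    return selected

# Hypothetical mean abundances per subgroup (not data from this study).
toy = {
    "GENE_A": {"immature": 31.0, "cortical": 3.0},
    "GENE_B": {"immature": 5.0, "cortical": 5.0},
    "GENE_C": {"immature": 1.0, "cortical": 7.0},
}
hits = select_differential(toy, "immature", "cortical")
```

In this toy input, GENE_A and GENE_C pass the threshold (8-fold up and 4-fold down after adding the pseudocount), while GENE_B is unchanged and is dropped.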
Validation cohort
Candidate genes found differentially expressed in the discovery cohort were further validated by real-time PCR in an independent cohort of 99 T-ALL cases {46 immature (including 15 ETP-ALL), 43 cortical, 10 mature}, comprising 70 pediatric patients (61 males, 9 females) and 29 adults (25 males, 4 females). The age of the patients ranged from 3 to 65 years, with a median age of 12 years. The expression levels of 23 targets (BAALC, HHEX, MEF2C, FAT1, LYL1, LMO2, LYN, TAL1, DOT1L, XIST, PCAT18, PCAT14, LINC00202, LINC00461, LINC00648, MEF2C-AS1, ST20, RAG1, EP300, EML4, EZH2, MALAT1 and KDM6A) were compared with patient characteristics and survival (Supplementary Methods). The relative quantitation method by real-time PCR was used to calculate the fold change in gene expression relative to the housekeeping gene ABL1. The selected genes showed a similar distribution of expression in the discovery and validation cohorts. The detailed experimental plan used to study the discovery and validation cohorts is described in the Supplementary Methods.
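The text states only that a relative quantitation method with ABL1 as the housekeeping gene was used. As an illustration, the sketch below implements the widely used 2^-ΔΔCt calculation; treating this as the method applied here is an assumption, as is the roughly 100% amplification efficiency the formula requires. The Ct values and the choice of calibrator are hypothetical.

```python
def relative_expression(ct_target, ct_reference):
    """Expression of a target normalized to the reference gene: 2^-(dCt)."""
    return 2.0 ** -(ct_target - ct_reference)

def fold_change(ct_target_sample, ct_ref_sample,
                ct_target_calibrator, ct_ref_calibrator):
    """2^-(ddCt): normalized expression in a sample relative to a calibrator.

    The reference gene would be ABL1 in the setting described in the text;
    the calibrator (for example, a control sample) is an assumption.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    return 2.0 ** -(delta_sample - delta_calibrator)

# Hypothetical Ct values: the target crosses threshold 2 cycles before ABL1
# in the patient sample and 1 cycle after ABL1 in the calibrator.
patient_vs_calibrator = fold_change(24.0, 26.0, 27.0, 26.0)
```

With these values the ΔΔCt is -3, so the target is 8-fold higher in the patient sample than in the calibrator.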
Treatment
In the discovery cohort, 14 patients were treated with the ICICLE protocol, 3 with the Berlin-Frankfurt-Munster-90 (BFM-90) protocol, 1 with INCTR, 1 with Holzer’s protocol and 3 with hyper CVAD. Thirteen patients either did not receive treatment or succumbed to the disease before starting treatment.
In the validation cohort, seventy-one patients were treated with ICICLE protocol, 15 with Berlin-Frankfurt-Munster-90 (BFM-90) protocol and 3 with Hyper CVAD. Ten patients did not take any treatment. Two patients died during induction chemotherapy. Complete remission was defined as bone marrow blasts <5% with a recovery of blood counts at the end of 4 weeks of induction chemotherapy. Any failure to do so (including the persistence of leukemic blasts in an extramedullary site), or death during induction therapy due to any cause, was considered as induction failure. Patients who failed with one protocol were re-induced with another.
Statistical analyses
Fisher’s exact test for categorical data and the nonparametric Mann-Whitney U test for continuous variables were used to compare baseline clinical variables across groups in the validation cohort. A P-value ≤0.05 (two-sided) was considered significant. Event-free survival (EFS) was defined as the time from diagnosis to the date of the last follow-up in complete remission or the first event (i.e., induction failure, relapse, secondary neoplasm, or death from any cause). Failure to achieve remission due to non-response was considered an event at time zero. Overall survival (OS) was defined as the time from diagnosis to death or the last follow-up. Patients lost to follow-up were censored at the last contact. The last follow-up was carried out in April 2020. The Kaplan-Meier method was used to estimate survival rates, with the differences compared using a two-sided log-rank test. Univariate and multivariate Cox proportional hazard models were constructed for EFS and OS. Covariates included sex, WBC count (<50×10⁹/L vs. ≥50×10⁹/L), age (<12 years vs. ≥12 years), gene expression, immunophenotype, response to prednisolone treatment and presence of minimal residual disease at the end of induction chemotherapy. Patients with high and low expression were delineated using maximally selected rank statistics as implemented in the maxstat R package (http://cran.r-project.org/web/packages/maxstat/index.html) for each target (BAALC, HHEX, MEF2C, FAT1, LYL1, LMO2, LYN, TAL1, DOT1L, XIST, PCAT18, PCAT14, LINC00202, LINC00461, LINC00648, MEF2C-AS1, ST20, RAG1, EP300, EML4, EZH2, MALAT1 and KDM6A). Statistical analyses were performed using the SPSS statistical software package (version 20.0), STATA software (version 11) and R statistical software (version).
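To make the survival estimates reported as rate ± SE concrete, the following is a minimal sketch of the Kaplan-Meier product-limit estimator. It assumes Greenwood's formula for the standard error; the software actually used for these estimates may have computed it differently. The follow-up times and event indicators are hypothetical, and the function name is illustrative.

```python
import math

def kaplan_meier(times, events):
    """Kaplan-Meier estimate with Greenwood standard errors.

    times:  follow-up durations (e.g. months from diagnosis)
    events: 1 if the event occurred at that time, 0 if censored
    Returns a list of (time, survival, standard_error), one per event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    greenwood_sum = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all subjects sharing this time; subjects censored at t
        # are still counted as at risk at t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / n_at_risk
            if n_at_risk > n_events:
                greenwood_sum += n_events / (n_at_risk * (n_at_risk - n_events))
            curve.append((t, survival, survival * math.sqrt(greenwood_sum)))
        n_at_risk -= n_leaving
    return curve

# Hypothetical follow-up (months) and event indicators for five patients.
curve = kaplan_meier([6, 12, 12, 20, 30], [1, 1, 0, 1, 0])
```

Each tuple gives the estimated probability of remaining event-free just after that time point. Comparing two such curves (for example, high versus low expression) would additionally require a log-rank test, which is not shown here.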
Data Availability
The RNA-Seq raw data that support the findings of this study are available from the corresponding author upon reasonable request.
RESULTS
A. Discovery cohort
Distinct profiling of differentially expressed genes expression among T-ALL immunophenotype
Comparison of gene expression profiles among the three subtypes defined by immunophenotyping revealed a total of 2,318 differentially expressed genes (Figure 2a) (supplementary1-4). In the immature T-ALL subtype, transcription factors which control early hematopoiesis, such as MEF2C, TP63, HHEX, RUNX2, HOXA10, HOXA9, RUNX1T1 and ZBTB16; the homeobox gene HOPX; and LYL1, LMO2 and LYN (a tyrosine kinase coding gene) were highly expressed. Interestingly, LMO2 was previously reported to be expressed only in the presence of LYL1, a gene reported to be critical for the oncogenic functions of LMO2, but in our data LMO2 was upregulated in the absence of significant overexpression of LYL1. BAALC and MN1, previously reported in AML17, were overexpressed. BAALC-associated genes such as IGFBP7 and PROM1 (CD133), which are known to confer chemoresistance, were also overexpressed18. In addition, MAML3, NT5E (CD73) and ARID5B were significantly overexpressed in immature T-ALL. Moreover, genes not previously described in T-ALL, such as PLD4 and TP63, were overexpressed. In cortical T-ALL, the cortical thymocyte-defining gene CD1A was overexpressed. The homeobox domain genes NKX2-1, TLX1 and TLX3, which are involved in T-cell development, were overexpressed. FAT1, FAT3, RAG1, EREG, CD1C, AKAP-2, IL-4, PRTG, TCL-6, ZP1 and TRAV genes were also overexpressed. Comparison of the CD1a+/sCD3+ and CD1a+/sCD3- groups revealed that TLX1, in contrast to the reported literature (Ferrando 2002), was overexpressed in the former group whereas TLX3 was overexpressed in the latter. In mature T-ALL, APC2, BCL3, CCR4, CDKN2A, EML4, HIST1H4G, HIST2H2B, NCOR2, ST20 and TRAV22 were overexpressed (figure 2a). TAL1 expression was seen in T-ALL cases with a mature immunophenotype. A complete list of DEG is shown in the supplementary file.
Unsupervised immunophenotype analysis segregates mixed T-ALL gene expression profile into three distinct clusters
Principal component analysis (PCA) was performed to determine the variability in gene expression profiles (GEPs) in T-ALL relative to normal thymus tissue (kind courtesy, Dr Jan Cools, Belgium). We observed that these GEPs could be broadly separated into 3 major clusters (figure 2b), in which Cluster 1, Cluster 2 and Cluster 3 comprised 9, 5 and 19 cases, respectively. Three samples, including the normal thymus sample, were not present in any cluster. In major cluster 1, all 9 samples had an immature T-ALL immunophenotype; this cluster further encompassed 2 sub-clusters of 4 samples each, while the one remaining sample did not fall into either sub-cluster. In the 1st sub-cluster, 3 samples had an ETP-ALL CD5- immunophenotype and one sample was CD5+ near-ETP-ALL (pre-T-ALL). In the 2nd sub-cluster, three samples were near-ETP-ALL (pre/pro-T-ALL) and one sample was ETP-ALL. Major cluster 2 consisted of 5 samples, all of which were cortical T-ALL. In major cluster 3, out of 19 samples, 3 were immature, 4 were mature and the remaining were cortical T-ALL. The three samples that did not fall into any of the major clusters were normal thymus, immature T-ALL and mature T-ALL, respectively. Interestingly, principal component analysis placed the mixed T-ALL cases into three distinct categories of T-ALL subclasses.
mRNA expression profiling of epigenetic modifiers in T-ALL
Analysis of genes primarily involved in epigenetic regulation revealed overexpression of SETD2, ATM, ASH1L, KDM6A, PHF6, SUZ12 and HDAC4 in all subtypes of T-ALL relative to normal thymic tissue. Furthermore, in immature T-ALL, HDAC9 and SMYD3 were overexpressed while EZH2 was under-expressed. HDAC10 was under-expressed in cortical T-ALL. In mature T-ALL, EP300, PKN1, EML4 and DOT1L were overexpressed, while HDAC7 was under-expressed in immature and cortical T-ALL (figure 3b).
Functional gene annotation and pathway analysis
To determine the biological role of the DEG observed on comparison of GEPs in the different subtypes, we performed gene ontology analysis. These DEG were strongly involved in various biological processes, including the inflammatory response, immune response, T cell co-stimulation, positive regulation of interferon-gamma production, signal transduction, cell-cell signaling, cell adhesion and migration. Their involvement may lead to abnormal function of various cellular components, comprising the plasma membrane, cell surface, extracellular space, integral components of the plasma membrane, extracellular region, transcription factor complex, integral components of the membrane and secretory granules. A complete list of biological processes and cellular components is given in figure 4a,b. We further used KEGG to dissect the molecular pathways in which the immunophenotype-associated DEG could be involved. This analysis demonstrated enrichment of cellular pathways such as hematopoietic cell lineage, allograft rejection, transcriptional dysregulation in cancer, toll-like receptor signaling pathway, graft-versus-host disease, cytokine-cytokine receptor interaction and cell adhesion molecules (CAMs). Detailed pathway information is shown in figure 4c.
Co-expression network analysis
Co-expression networks were constructed for differentially expressed genes between the immature, cortical and mature subgroups (with correlation score ≥0.9). The genes included in the analysis were highly expressed in our RNA-seq data of T-ALL patients. Table 1 shows the relation among the protein-coding and non-coding genes. Co-expression networks of BAALC, MEF2C, lncRNA and epigenetic modifiers are described in supplementary figures 1a-b and 2a-b, respectively.
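The network construction can be sketched as follows: compute the Spearman correlation for every gene pair and keep the pairs scoring at or above 0.9 as edges. This is a minimal illustration, not the analysis performed in this study; the gene names and expression vectors are hypothetical, and keeping only positive correlations (rather than absolute values) is an assumption about how the threshold was applied.

```python
from itertools import combinations

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def coexpression_edges(expression, min_rho=0.9):
    """Return (gene_a, gene_b, rho) for every pair with rho >= min_rho."""
    edges = []
    for a, b in combinations(sorted(expression), 2):
        rho = spearman(expression[a], expression[b])
        if rho >= min_rho:
            edges.append((a, b, rho))
    return edges

# Hypothetical expression values across five samples.
toy = {
    "GENE_1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_2": [2.1, 3.9, 6.5, 8.0, 11.0],
    "GENE_3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
edges = coexpression_edges(toy)
```

The resulting edge list can be exported as a simple table (source, target, weight) for visualization in a network tool such as Cytoscape.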
Long non-coding RNA (lncRNA)
RNA-seq analysis, after annotation and differential expression analysis, identified 2,243 lncRNAs that were expressed in the T-ALL samples.
A total of 223 lncRNAs were retained after filtering on the criterion of >2 FPKM. We observed lncRNAs that were differentially enriched and overexpressed in subgroups of T-ALL, such as HOTTIP in immature T-ALL; LINC01221, LINC00202, LINC00461 and LINC00648 in cortical T-ALL; and MALAT1, ST20 and TRBV11 in mature T-ALL. Interestingly, these lncRNAs have not been reported earlier in T-ALL (figure 3a). X-inactive specific transcript (XIST), which is known to have a role in multiple cancers, was expressed in both immature and cortical T-ALL19. LUNAR1, which is known to be a specific NOTCH1-regulated lncRNA, was expressed in cortical and mature T-ALL20. The co-expression network constructed for HOTTIP, which was highly expressed in immature T-ALL, showed a strong positive correlation with important transcription factors previously reported to be involved in T-ALL pathogenesis (supplementary figure 2a). In comparison with the normal thymus transcriptome, we found the lncRNAs PCAT14 and PCAT18 to be significantly overexpressed in T-ALL cases.
In addition to the annotated lncRNAs, which remain little explored in T-ALL, we also identified 1,290 novel putative non-coding transcripts or lncRNAs in our data. All transcripts that showed coding potential were removed from the analysis, as these could potentially contribute to false-positive annotations.
Validation cohort
Correlation between gene expression, response to chemotherapy and outcome
The 23 differentially expressed targets selected from the discovery cohort based on their variable expression pattern in distinct immunophenotypes, including protein-coding genes, epigenetic modifiers and lncRNA transcripts (BAALC, HHEX, MEF2C, FAT1, LYL1, LMO2, LYN, TAL1, DOT1L, XIST, PCAT18, PCAT14, LINC00202, LINC00461, LINC00648, MEF2C-AS1, ST20, RAG1, EP300, EML4, EZH2, MALAT1 and KDM6A), were further tested in the validation cohort to assess their clinical significance. Consistent with the results obtained in the RNA-seq analysis, we observed a similar pattern of overexpression in distinct immunophenotypes in the validation cohort. High expression of BAALC (p=0.001), MEF2C (p=0.002), LYL1 (p=0.018) and HHEX (p=0.007) and low expression of EZH2 (p=0.005) were significantly associated with the immature T-ALL immunophenotype.
RAG1 and FAT1 expression were higher in cortical T-ALL (p=0.004 and 0.033, respectively). DOT1L expression was higher in mature T-ALL (p=0.025). ETP-ALL immunophenotype was associated with high levels of BAALC (p=0.003), MEF2C (p=0.003), LYL1 (p=0.01), LYN (p=0.01) and XIST (p=0.02), and with lower levels of ST20 (p=0.007) and EML4 (p=0.03). We also observed an association between CD34 positivity on immunophenotyping and expression levels of BAALC (p=0.032) and MEF2C (p=0.012). Expression of myeloid markers (CD13/CD33) on immunophenotyping was associated with high BAALC (p=0.021), low ST20 (p=0.007) and low KDM6A (p=0.026). We did not find any significant association between T-ALL subtype and expression levels of PCAT14, PCAT18, TAL1, LMO2, XIST, ST20, EP300, EML4, KDM6A, LINC00202, LINC00461 and LINC00648. Of the 99 T-ALL patients in the validation cohort, RNA of adequate quality and quantity was available for the determination of MEF2C in 99 cases; BAALC, HHEX, LYL1, TAL1, FAT1 and XIST in 87; LMO2, DOT1L and LYN in 78; LINC00648, PCAT18 and LINC00461 in 72; MEF2C-AS1, PCAT14 and LINC00202 in 76; ST20, RAG1, EP300 and EML4 in 84; EZH2 and KDM6A in 82; and MALAT1 in 81 cases. The remaining samples were inadequate.
Association of protein-coding and non-coding RNA levels with patient variables
On analysis of the potential association of patient characteristics with expression levels of protein-coding and non-coding RNAs, we found an association between RAG1 expression and patient age. RAG1 expression was higher in patients aged <12 years than in those aged ≥12 years (p=0.034). XIST and KDM6A expression were higher in females (p=0.047 and 0.011, respectively). We did not find any association between WBC count at diagnosis and any of the parameters tested. Patients with low XIST expression and high TAL1 expression more frequently had NCI high-risk disease (p=0.01). Prednisolone resistance was associated with high MEF2C expression (p=0.048). Post-induction MRD positivity (≥0.01%) was associated with high expression of PCAT18 (p=0.04), HHEX (p=0.027) and MEF2C (p=0.007) (Table 2).
Survival analysis
Complete remission was achieved in 78 (87.64%) patients with induction chemotherapy. Median follow-up was 22 months. The 3-year EFS (±SE) and OS (±SE) were 62.23±5.86% and 90.40±3.24%, respectively. On univariate analysis, we observed that high MEF2C expression (low 71.78±6.58% vs high 36.57±10.74%, HR 3.5, 95% confidence interval 1.67-7.3, p=0.0003), high LYL1 expression (low 63.14±6.65% vs high 26.67±15.9%, HR 2.69, 95% confidence interval 1.08-6.69, p=0.029), low ST20 expression (low 43.21±13.56% vs high 61.02±7.26%, HR 0.45, 95% confidence interval 0.19-1.03, p=0.049), low RAG1 expression (low 41.67±12.9% vs high 61.24±7.49%, HR 0.45, 95% confidence interval 0.20-0.99, p=0.037), low EML4 expression (low 45.29±8.92% vs high 83.33±8.78%, HR 0.26, 95% confidence interval 0.078-0.88, p=0.018) and low KDM6A expression (low 50.48±7.58% vs high 83.33±8.78%, HR 0.27, 95% confidence interval 0.062-1.12, p=0.049) were significantly associated with poor 3-year EFS (Table 3). In addition, age ≥12 years and ETP-ALL immunophenotype were also associated with poor 3-year EFS (Table 3). We also found high MEF2C expression (low 94.77±2.96% vs high 78.75±8.45%, HR 4.88, 95% confidence interval 1.16-20.40, p=0.016), low DOT1L expression (low 68.38±3.15% vs high 92.64±3.55%, HR 0.22, 95% confidence interval 0.06-0.88, p=0.019), low RAG1 expression (low 75±10.83% vs high 92.54±3.6%, HR 0.25, 95% confidence interval 0.06-0.98, p=0.03) and low MALAT1 expression (low 76.02±10.48% vs high 94.11±3.31%, HR 0.22, 95% confidence interval 0.048-0.97, p=0.027) to be significantly associated with poor 3-year OS (Table 3). On multivariate analysis for EFS, we found high MEF2C expression (HR 3.25, p=0.017) to be significantly associated with inferior EFS (Table 4a). We also found high MEF2C expression (HR 6.73, p=0.04) and low MALAT1 expression (HR 0.16, p=0.031) to be significantly associated with inferior OS (Table 4b).
Discussion
With the advancement of molecular techniques, T-ALL has been extensively characterized at the molecular level. Although genetically heterogeneous, T-ALL can be categorized into various subtypes based on gene expression profiles. However, unlike in B-ALL, molecular features in T-ALL have not been utilized for risk stratification in clinical practice. Furthermore, only a limited number of studies have reported the prognostic relevance of lncRNAs and epigenetic modifiers in T-ALL. In this study of 134 T-ALL cases, we performed high-throughput RNA sequencing in the discovery cohort (n=35) and identified several protein-coding and non-coding transcripts which exhibit differential expression among the immunophenotypic subtypes of T-ALL, viz. immature, cortical and mature T-ALL. Furthermore, we validated the expression of 23 identified targets in T-ALL patients and assessed their clinical significance.
The key genes which served as transcription factors in early hematopoiesis like MEF2C, LYL1, LMO2, HHEX, RUNX2, HOXA10, HOXA9, RUNX1T1 and ZBTB16 were upregulated in immature T-ALL.
MEF2C dysregulation has been previously shown in immature T-ALL21-27. Our previous study also showed the clinical importance of MEF2C in predicting the prognosis of ETP-ALL patients (Singh et al 2020). Colomer-Lahiguera S, et al22, reported that MEF2C dysregulation in T-ALL is associated with CDKN1B deletion and poor response to prednisolone therapy. We also found an association between prednisolone resistance and high MEF2C expression. Starza et al28 showed its upregulation in T-ALL cases with interstitial deletion of 5q. Nagel et al27 proposed distinct mechanisms for aberrant MEF2C gene expression, either by NKX2-5 signaling or by chromosomal deletion of 5q. They also showed that MEF2C inhibits BCL2-regulated apoptosis by inhibition of NR4A1/NUR7727. In addition, Kawashima-Goto et al, 201523, reported that BCL2 inhibitors may be useful for treating T-ALL with high expression levels of MEF2C. On network analysis, MEF2C was found to be co-expressed with protein-coding and non-protein-coding partners: HOPX, KIT, BAALC, HHEX, EMP1, LYL1, BCL, SMYD3, HDAC9, HOTTIP, HOTAIR, LINC01021, XIST, MIR3142HG, MIR3132, MIR4741, SNORD100 and SNORD101.
In our study, MN1, BAALC and IGFBP7 were overexpressed in immature T-ALL. The upregulation of these genes is believed to arise from T-cell progenitors retaining myeloid differentiation potential29-31. Previous studies suggest that overexpression of these genes is associated with poor outcome and resistance to chemotherapy29,32-35. Baldus et al, 200733, reported that high BAALC expression was associated with poor long-term survival in T-ALL. Contrary to their observation, we did not find any significant association between BAALC expression and prednisolone sensitivity. Like previous studies, we found BAALC overexpression to be associated with the expression of CD34 and myeloid markers29,31.
However, we did not find any association between BAALC expression and patient outcome. We also found overexpression of ZBTB16 (PLZF) in our patients with immature T-ALL; although not emphasized in previous Western studies, this was a notable finding in a recently reported study21,36. ZBTB16 (or promyelocytic leukemia zinc finger, PLZF) contains one BTB domain and nine zinc finger motifs. In that study, its overexpression was shown to result from a ZBTB16-ABL1 translocation and occurred in different patients along with other mutations, including NOTCH1, ZEB2, PTEN, MYCN and PIK3CD. Laboratory studies in both in vitro and mouse models suggest that ZBTB16-ABL1 is a driver leukemogenic lesion that causes increased proliferation and heightened protein tyrosine kinase (PTK) activity that is amenable to tyrosine kinase inhibitor (TKI) treatment21. These findings suggest that our ZBTB16-overexpressing patients may harbor the same translocation. Our finding is also of significance because, as with LYN overexpression, T-ALL patients with ZBTB16 overexpression may benefit from TKIs.
Apart from these known genes, we identified aberrant expression of some genes which have not been reported in T-ALL patients, such as RUNX1T1, RUNX2, PLD4, NT5E (CD73), HOPX, TP63 and HOXA11-AS. A role for RUNX2 in T-ALL has been suggested in a study by Nagel et al, 2011, who, to uncover additional target genes, investigated in detail the aberrant expression of MEF2C mediated by a complex deletion at 5q, del(5)(q14), in a T-ALL cell line26. This may provide evidence that RUNX2, rather than RUNX1, is involved in the manifestation of ETP-ALL, in a setting that allows in vivo functional evaluation of putative oncogenes and preclinical drug testing.
Further, some of the differentially expressed genes observed in cortical T-ALL, such as CD1A, CD1C, CD4, CFTR, FAT3, NKX2-1, TLX1, TLX3 and RAG1, have been previously reported, while three additional genes, EREG, PAX and ZIC2, were found to be upregulated in the present study.
Neumann et al 2013, in a study of adult ETP-ALL, showed that the cadherins FAT1 (25%) and FAT3 (20%) were mutated, implicating alterations in cell adhesion and activation of the Wnt pathway37. Neumann et al, 201438, showed that FAT1 expression was correlated with a more mature leukemic immunophenotype in T-ALL, with 74% of patients with thymic T-ALL being FAT1 positive compared with 45% of patients with mature T-ALL and only 4% of early T-ALL patients. This is in line with our results, as FAT1 was associated with the cortical immunophenotype in our study. As in the previous study38, we did not find any correlation between FAT1 expression and patient outcome.
Mature T-ALL is a rare subgroup that is diagnosed immunophenotypically as CD1a- and sCD3+. At the molecular level, TAL1 has been identified as a driver gene for late cortical T-ALL1. We observed TAL1 to be overexpressed in both mature and cortical T-ALL in our study. Among the protein-coding genes, APC2, BCL3, CCR4, ST20, EML4 and NCOR2 were some of the key upregulated genes.
Aberrant histone modifications are a hallmark of cancer and are associated with dysregulated expression of histone modifiers. Therefore, we also studied the expression pattern of epigenetic modifiers to identify a set of histone-modifying enzymes that are upregulated or downregulated specifically in a given subtype of T-ALL. EZH2, a member of the polycomb repressor complex, was under-expressed in our immature T-ALL cases. This may be related to the previously reported EZH2 mutations in immature T-ALL39. Danis et al40 mechanistically linked EZH2 inactivation to stem-cell-associated transcriptional programs and increased growth/survival signaling, features that convey an adverse prognosis in patients. However, we did not observe an association between EZH2 expression and clinical outcome. Loss-of-function mutations and deletions in SETD2 have been shown to lead to chemotherapy tolerance and clonal survival by cell cycle arrest followed by apoptosis. Further, overexpression of SETD2 has been demonstrated to confer chemotherapy resistance in a variety of cancers including leukemias39,41-43. We also observed overexpression of SETD2 in T-ALL as compared to the normal thymus. Therefore, the potential role of SETD2 overexpression in therapeutic resistance in T-ALL requires further investigation. In pediatric ALL, higher expression of HDAC7 and HDAC9 is associated with poor prognosis. In our study, we observed overexpression of HDAC9 in our immature and cortical cases. CREBBP, EP300, ASH1L, ATM, PKN1, KDM2B, KDM4B and DOT1L showed significant differential expression in mature T-ALL. In the context of transcription coactivation, EP300 and CREBBP have lysine acetyltransferase activity44-47. Targeted histone lysine acetylation by EP300 and CREBBP can influence chromatin conformation46, and concomitant binding of EP300 and acetylation of H3K27 is a hallmark of promoter or enhancer activation48.
We also found low expression of ST20 (figure 5c) and EML4 (figure 5e) to be associated with poor EFS. This has not been reported before.
DOT1-like (DOT1L) histone lysine methyltransferase methylates H3K79 and plays an important role in embryogenesis and hematopoiesis (Ref). Its aberrant activation is associated with acute leukemias49,50, but its function in T-ALL is unknown. DOT1L catalytic activity depends on the mono-ubiquitination of lysine 120 in histone H2B (H2BK120Ub), which provides crosstalk between various histone post-translational modifications51. Recent studies of the role of DOT1L in H3K79 methylation and of H2BK120 mono-ubiquitination may pave the way for the development of novel DOT1L-driven anti-leukemia therapies52-56. DOT1L was overexpressed in our mature T-ALL patients, and it may be worth investigating whether they could be candidates for DOT1L-driven anti-leukemia therapy. We found low DOT1L expression to be associated with poor OS. This has not been reported before.
Apart from proteins, the non-coding RNA repertoire forms another regulatory layer in normal cell homeostasis. Using RNA-seq, we identified differentially expressed non-coding RNAs, especially lncRNAs, whose role in cancers has been well documented. Our analysis revealed 223 lncRNAs showing differential expression among the various T-ALL subtypes. The NOTCH1-regulated lncRNA LUNAR1 was overexpressed in cortical and mature T-ALL20. This may be related to a higher incidence of activating NOTCH1 mutations in these T-ALL subtypes1. HOTTIP and MEF2C-AS1 were overexpressed in immature T-ALL; LINC00202, LINC0648 and LINC00461 in cortical T-ALL; and MALAT1 in mature T-ALL. These have not been reported in T-ALL before in the English literature. We did not find any significant association between their expression and immunophenotypic subtypes.
HOTTIP has been reported to be aberrantly activated in AML. It promotes hematopoietic stem cell renewal, leading to AML-like disease in mice57. This may explain its overexpression in our study in immature T-ALL, which has myeloid potential. MALAT1 is known to be involved in a plethora of biological processes, including alternative splicing, nuclear organization and epigenetic regulation of gene expression. It is also associated with various pathological conditions such as breast cancer, lung adenocarcinoma, hepatocellular carcinoma, bladder cancer and diabetes58-60. Several studies suggest MALAT1 expression as a prognostic marker for various cancer types61. At the molecular level, MALAT1 plays an important role in modulating several signaling pathways, such as MAPK/ERK, PI3K/AKT, WNT and NF-κB, thereby altering proliferation, cell death, cell cycle, migration, invasion, immunity, angiogenesis and tumorigenicity. The exact mechanism by which MALAT1 contributes to cancer development and progression is not fully known. MALAT1 may be a therapeutic target and a potential diagnostic and prognostic biomarker for cancers59,62,63.
Out of the 23 targets tested, MEF2C gene expression emerged as a significant predictor of EFS and OS (figure 5a and 6a). Although MEF2C overexpression is associated with chemoresistance and poor outcome in AML, its prognostic relevance in T-ALL has not been reported, to the best of our knowledge. Low MALAT1 expression also emerged as a marker of poor OS. This has also not been reported before. Apart from these, the prognostic relevance of other markers such as ST20 (figure 5c), DOT1L, RAG1 (figure 5d), EML4 (figure 5e) and KDM6a (figure 5f) should be studied in a larger number of patients. Previous studies using high-throughput sequencing and microarray gene expression profiling have shown that the immature gene signature is associated with inferior survival in T-ALL. Both of these methods are time-, labour- and cost-intensive. Based on our results, we suggest that MEF2C gene expression analysis by real-time PCR is a reliable and inexpensive alternative that can be easily integrated into routine clinical practice.
Taken together, our study provides a comprehensive transcriptional map of coding as well as long non-coding RNAs in T-ALL. We have identified unique gene signatures that have not been described in Western populations. This may be related to ethnic variation in Indian patients. Along with protein-coding genes, we identified novel as well as known lncRNAs that were differentially expressed in T-ALL patients. Experimental validation and survival analysis for some of the candidates confirmed the RNA-seq results, while co-expression analysis gave insight into the putative functional roles and pathways involved in T-ALL. High MEF2C expression emerged as a significant predictor of poor EFS and OS.
Data Availability
Data will be made available on request after publication.
Conflict of Interest
The authors declare that there are no competing interests.
Ethics approval and consent to participate
We confirm that all relevant ethical guidelines have been followed and that the necessary ethical approval (IESC/T-395/28-11-2014) has been obtained from the Institute Ethics Committee for Post Graduate Research, All India Institute of Medical Sciences, New Delhi.
Author Contribution
DV performed the wet-lab experiments related to RNA-seq and analyzed the data with SKp and DS. JS, GS, SKr and MA performed the validation experiments and data collection. SB, RS, BN, AS, RP, JP and RK contributed to project design, clinical data, patient recruitment and experiment management. SS and VS provided support through the high-throughput computing facility. AC is the principal project investigator.
Acknowledgements
None