Discovery of a simple two-gene expression biomarker in whole blood predictive of the need for treatment escalation in inflammatory bowel disease

Background and aims: The IBD-Character consortium has recruited large internationally based inception cohorts of treatment-naive inflammatory bowel disease (IBD) patients, providing a unique resource to derive a simple transcriptome signature in the field of prognostication. Methods: The discovery cohort (n=160) was recruited in Norway, Sweden and Spain. The replication inception cohort from the United Kingdom (n=97) was followed-up for a mean (SD) of 350 (228) days. Treatment escalation was formally defined as the need for a biologic agent, ciclosporin and/or surgery, instituted for disease flare after initial remission, or colectomy during the index admission for ulcerative colitis. Whole blood RNA was subject to paired-end sequencing. In the discovery cohort a simple procedure was applied, which exploited differences of transcript ratios. The ten top performing ratios were tested using Cox regression models in the validation cohort. Results: Newly diagnosed IBD patients with high CACNA1E/LRRC42 expression ratio had an increased risk of treatment intensification (validation cohort: HR=19.3, 95%CI 2.6-143.9, p=0.000005; AUC 0.76, 95%CI 0.66-0.86). In 51 patients with CRP < 3.5 mg/L, CACNA1E/LRRC42 still predicted escalation (HR=10.4; 95%CI 1.2-86.5, p=0.007). The second best performing transcript ratio was CACNA1E/CEACAM21 yielding a HR of 10.9 (95%CI 2.5-46.7, p=0.00002) and an AUC of 0.76 (95%CI 0.65-0.86) in the validation cohort. Conclusion: Transcriptomic profiling of an IBD inception cohort identified gene expression ratios CACNA1E/LRRC42 and CACNA1E/CEACAM21 as prognostic biomarkers. These were validated in a replication cohort as strongly associated with short- and long-term risk of treatment intensification and may provide valuable information in clinical decision-making.


Introduction
The field of therapeutics in inflammatory bowel disease (IBD) has seen marked expansion over the last decade. In parallel, expectations have developed that biomarker discovery may help in rationalizing therapeutic choice in individual patients. A number of multi-omic analyses have been proposed to be of potential clinical use in predicting disease course in IBD. Discoveries in recent studies include genetic, epigenetic, transcriptomic and proteomic signals associated with treatment escalation at diagnosis 1,2 . Of particular note is the whole blood-derived transcriptomic signature, described by Biasci and colleagues, which is now being assessed in a prospective clinical trial in Crohn's disease 3 (CD).
The IBD-Character consortium has recruited large internationally based inception cohorts of treatment-naïve IBD patients with extensive multi-omic characterization, thereby providing a unique resource to derive a simple transcriptome signature in the field of prognostication. In this context, IBD-Character has to date enabled the identification of prognostic proteomic and micro-RNA biomarkers 2,4 . In these cohorts, we now show that simple gene expression signatures in whole blood taken at initial presentation can define future disease course and provide complementing data to be used in clinical practice.

Methods
The discovery cohort (n=160) was recruited in Oslo (Norway; n=87), Örebro (Sweden; n=52), and Zaragoza (Spain; n=21) and included newly diagnosed patients who escalated treatment within the first year of diagnosis or did not require escalation but were followed-up for at least a year (Supplementary Table 1).
In the replication inception cohort from Edinburgh (United Kingdom; n=97) the mean (± SD) follow-up was 350 ± 228 days. All the centers employed the step-up . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 10, 2021. treatment strategy. Treatment escalation was formally defined as the need for a biologic agent, ciclosporin and/or surgery, instituted for disease flare after initial remission. In ulcerative colitis (UC), the definition of treatment escalation also included any patient requiring colectomy during the index admission.
Whole blood was collected to Paxgene tubes, and the obtained RNA was subject to paired-end sequencing (AmpliSeq). In the discovery cohort, reads were filtered for highly expressed genes only (n=13,071), and a simple procedure was applied, which exploited differences of transcript ratios (described in detail in Supplementary Methods). In the validation cohort, the signals were tested in all 21,913 annotated transcripts using Cox regression models.

Results
Eighty-five million ratios were analyzed to identify the top 10 ratios with the highest areas under the curve (AUCs) in the discovery cohort; of these, two ratios demonstrated highly significant validation in the replication cohort ( Figure 1). Six other ratios were validated with smaller effects (Supplementary Methods).
Newly diagnosed IBD patients with high CACNA1E/LRRC42 expression ratio (Calcium Voltage-Gated Channel Subunit Alpha1 E / Leucine Rich Repeat Containing 42; NM_001205293/NM_052940 > 0.066) had an increased risk of treatment intensification (validation cohort: HR 19.3, 95%CI 2.6-143.9, p=0.000005; AUC 0.76, 95%CI 0.66-0.86). Furthermore, the ratio was complementary to baseline C-reactive protein (CRP) in prediction. All patients with CRP above a cut-off based on maximum sensitivity and specificity (>65 mg/L) also had high CACNA1E/LRRC42. However, importantly among patients with lower CRP concentrations, CACNA1E/LRRC42 was able to identify a validation subgroup predisposed to escalation in the long term ( Figure   . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 10, 2021. After adjustment for CRP, drug-naïve status, age and sex, CACNA1E/LRRC42 retained HR of 12.4 (95%CI 1.6-96.0; p=0.000004) and CACNA1E/CEACAM21associated HR was 6.9 (95%CI 1.5-31.8, p=0.00001), both in the validation cohort.

Discussion
In clinical practice, CRP has been shown to be the most accessible prognostic biomarker in blood with respect to future treatment escalation in patients with IBD 5 , whilst calprotectin is the most informative faecal biomarker. We confirm the predictive ability of CRP in this context. Additionally, we demonstrate that in patients in whom CRP is not elevated, an at-risk subgroup can be discerned using the gene expression ratio CACNA1E/LRRC42. Crucially, even patients who required treatment escalation after as long as two years were identified at diagnosis using This finding provides renewed optimism for the potential for clinical translation of transcriptomic IBD biomarkers in view of current concerns 6 regarding the organizational and technological complexity of replication efforts, as well as their cost . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 10, 2021. ; and the intricacy of models. Our findings set the stage for future validation of this simple, low-cost transcript ratio that could be easily translated to the clinic.
Mechanisms underlying the observed relationships remain to be characterized. In CD, a polymorphism in CACNA1E has been proposed to correlate with progression from inflammatory to stricturing or penetrating phenotype 7 . Further work is needed to elucidate its role in IBD. As the CACAN1E blocker topiramate was inefficacious in IBD, it appears that either CACNA1E plays no causative role or that a more refined approach is required to target its function specifically in leukocytes. In conclusion, transcriptomic profiling of an IBD inception cohort identified gene expression ratios CACNA1E/LRRC42 and CACNA1E/CEACAM21 as prognostic biomarkers. These were validated in a replication cohort as strongly associated with short-and long-term risk of treatment intensification and may provide valuable information in clinical decision-making.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 10, 2021. ; https://doi.org/10.1101/2021.07.09.21259804 doi: medRxiv preprint . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 10, 2021.   Replication of top 10 ratios from the discovery step was attempted and most highly significant results were obtained for two ratios (CACNA1E/LRRC42, CACNA1E/CEACAM21). (Right) CACNA1E/LRRC42 performs better than CRP in identification of newly diagnosed patients with IBD at risk of future treatment escalation. CRP cut-off 65 mg/L was selected automatically as maximizing the sum of sensitivity and specificity. The capacity of CACNA1E/LRRC42 to identify at-risk patients among those with low-CRP persists even after the CRP threshold is set at 3.5 mg/L (see main text). Confidence intervals (95%) for the survival curves are shown.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 10, 2021. ; Supplementary Figure 1. In the replication cohort, CACNA1E/LRRC42 was able to identify patients at risk of treatment escalation among 51 persons with CRP < 3.5 mg/L (a highly-performing threshold that we determined previously in Kalla R et al. Am J Gastroenterol. 2016; 111:1796-1805). Low CACNA1E/LRRC42 strongly associates with the absence of the need for escalation, even in the longer term (HR = 10.4; 95%CI 1.2-86.5, p=0.007).
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Supplementary methods
The patients were recruited in 2013-2016. All the participants gave informed written consent for their inclusion in the study. Whole blood was collected to Paxgene tubes.
The obtained RNA was quantified (Qubit 2.0), checked for quality (RNA integrity number) and finally subject to paired-end sequencing using Ion AmpliSeq Human Gene Expression Core Panel (Thermo Fisher Scientific, Waltham, USA). Each pool contained eight equimolar samples. A library quality check was performed with the Ion Library TaqmanTM Quantitation kit (Thermo Fisher Scientific). Alignment of reads to hg19 was achieved with Torrent Software v. 4.6.
Four patients with mismatched sex were removed, as were: one patient with "possible IBD" diagnosis and one patient with "other inflammation" diagnosis. Highsensitivity CRP was assayed in one batch. Smoking status was self-declared.
Analyses were conducted in R 4.0.4 (R Software Foundation, Vienna, Austria).

Discovery cohort filtering and normalization
Patients from Oslo, Örebro and Zaragoza had a follow-up period of 7-365 days and treatment escalation or >365 days and no escalation detected at any timepoint. The data were checked for duplicates. Genes with raw mean expression ≥10 and less than 5% of null values were processed further. Globin genes were removed. After obtaining gene names using org.Hs.eg.db package, biomaRt was queried for gene lengths, enabling transcripts-per-million (TPM) normalization using code adapted from bedapub/ribiosNGS. The mean sample correlations were all >0.80.

Discovery procedure
In the discovery cohort, all transcript ratios (~85 mln) were compared between patients who escalated and who did not, using the fast implementation of two sample . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 10, 2021.

Replication cohort normalization
Gene expression profiles of patients from Edinburgh with follow-up ≥7 days were included. Globin genes were removed, gene lengths were obtained as above, and the data was TPM-normalized as in bedapub/ribiosNGS. PCA and correlation plots were inspected. All the average sample correlations were >0.80.

Validation in the replication cohort
In the replication cohort, the optimal cut-off was selected for each ratio by the criterion of maximum sum of sensitivity and specificity based on output from pROC.
Packages survival and survminer were employed to analyze survival and prepare figures. Cox models were built, and from the total of ten ratios, the top two best performing models (lowest p) were selected. CACNA1E/LRRC42 and CRP performance were visualized in the Edinburgh cohort. CRP threshold of 65 mg/L was automatically determined in the replication group by identifying the maximum sum of sensitivity and specificity. To complement these analyses, another threshold of 3.5 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 10, 2021. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 10, 2021. ; https://doi.org/10.1101/2021.07.09.21259804 doi: medRxiv preprint