Sex-specific transcriptional differences and loss of gene imprinting in pancreatic neuroendocrine tumors

Pancreatic neuroendocrine tumors (PNETs) occur more frequently in men and are associated with higher mortality in males; however, the molecular basis for these sexual dimorphisms is unclear. Here, we demonstrate that PNETs are associated with the emergence of unique sex-specific transcriptomic differences that are not observed in non-neoplastic pancreatic islet tissues. We also show that while widespread sex-specific differences are present in the DNA methylation landscapes of control pancreatic islets, they are erased in PNETs. This includes a loss of imprinting with regards to many genes. These results implicate an emergence of sex-associated genetic and epigenetic dysregulations in PNETs.

In our discovery dataset, we found 559 genes differentially expressed (DE) by sex at genome-wide significance (FDR £ 0.1), with 60.5% of them overexpressed in males. In the validation dataset, 659 genes were DE by sex, with 60.1% overexpressed in males. In the control dataset, only 173 genes were DE by sex (60.1% overexpressed in males) (Figure 1a-c). Interestingly, in both the discovery and validation PNET datasets, there were significantly more genes differentially expressed by sex as compared to control pancreatic islet tissues (discovery vs control: p = 6.5x10 -5 ; validation vs control: p = 2.0x10 -48 ), despite greater sample size (and therefore power) in the control dataset (Figure 1a and Supplementary Table 3). Additionally, there were significantly more transcription factors (TFs) differentially expressed between sexes in the PNET datasets than in control data (55 and 65 differentially expressed TFs in the discovery and validation datasets respectively; 15 TFs in the control dataset) (discovery vs control: p = 2.7x10 -6 ; validation vs control: p = 3.4x10 -8 ). Eight TFs were differentially expressed by sex in both discovery and validation datasets, and while 5 of them are located on sex chromosomes and were also differentially expressed by sex in controls, 3 autosomal TFs (HSPA1A, SOX15, and APOA4) were uniquely DE by sex only in PNET samples (see Supplementary Table 3). Notably, SOX15 plays an important role in tumorigenesis in various cancers (13), potentially including pancreatic cancer (14).
In order to investigate whether specific biological programs are differentially regulated by sex, we performed gene set enrichment analysis on the DE genes in our 4 discovery dataset using goseq (see Methods). We found significant enrichment for the Gene Ontology (GO) biological process (BP) 'detoxification of copper ion' (FDR = 0.019). Interestingly, copper transport has been previously implicated in cancer development (15). In the validation dataset, there was significant enrichment for the GO BP 'lipoprotein metabolic process' (FDR = 0.0079), which has also been linked to oncogenesis (16).
Of all the genes differentially expressed by sex in PNETs, 56 were found to be differentially expressed in both the discovery and validation cohorts. Of these, 37 were differentially expressed by sex in PNETs but not in controls. (All 37 genes are located on autosomal chromosomes.) Sixteen of these genes exhibited concordant foldchanges between the discovery and validation cohorts, meaning that they were differentially expressed by sex in the same direction (i.e., overexpressed in the same sex) in both datasets ( Table 1). Some of these genes play known roles in tumorigenesis. For instance, RASSF7 (overexpressed in males with a large effect size: log2FC of 30 and 23 in the discovery and validation analyses respectively) is an oncogene that promotes non-small cell lung cancer development and metastasis (17).
RXRB is another gene that we found to be overexpressed in males, and it has been shown to promote cancer stemness by serving as a major effector of the oncogene RAB39A in the RAB39A-RXRB axis (18). Not all differentially expressed genes favored worse outcomes in males. CD52, overexpressed in female PNETs, is a lymphocyte surface glycoprotein known to inhibit effector T-cells (19). It is upregulated in breast cancer and is associated with reduced survival rate (20). The mitogen IGF2, whose expression was significantly overexpressed in male-derived PNET tissues, is . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)  Table 2). We analyzed data across 456,928 probes targeting single methylation loci after excluding 17,120 loci with common single nucleotide polymorphisms and probes on the sex chromosomes (see Methods).
We found widespread sex specific DNAm differences in the control pancreatic islet dataset, with 38,623 cytosines (8.4% of all interrogated probes) differentially methylated by sex at genome-wide significance (FDR £ 0.1) (Figure 1d). Of note, since the Illumina 450k array probes target single cytosines (CpGs or to a much lesser extent CHs), differentially methylated cytosines are also commonly referred to as a differentially methylated probes (DMPs). Many of the sex-DMPs (16,800) were associated with regulatory elements (promoters or enhancers) and 35,611 overlapped or were within 5 kb of genes (Supplementary Table 4). None of the sex-DMPs clustered together into differentially methylated regions (see Methods). There were also no long-. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 15, 2021.
6 range regions with consistent significant methylation changes between sexes (DNA methylation 'blocks') (see Methods).
In contrast to controls, we found sparse DNAm differences by sex in PNET samples, with only 4 cytosines differentially methylated by sex at genome-wide significance (Figure 1e), 3 of which were also differentially methylated by sex in controls. Of the 4 sex-specific sex-DMPs, 3 overlapped or were within 5 kb of at least one gene (Supplementary Table 5), however, none of these proximal genes exhibited significant correlation between DMP methylation levels and gene expression (Supplementary Figure 2).
As expected, IGF2 was differentially methylated by sex in the control dataset (4 sex-DMPs overlapped IGF2, see Supplementary Table 5). However, IGF2 was not differentially methylated in the PNET dataset, suggesting a loss of imprinting in the PNET state. In the control dataset, 605 sex-DMPs overlapped or were within 5kb of 87 unique imprinted genes (Supplementary Table 5), whereas no imprinted genes were differentially methylated by sex in PNET samples.
In conclusion, we show that sex-specific differences emerge in PNET transcriptomes. Gene expression in pancreatic islets is known to differ by sex (24), however, our analysis indicates that the transcriptional differences in PNETs are distinct from those that are observed in non-neoplastic tissue counterparts. Interestingly, these PNET gene expression differences by sex are not related to DNA methylation. In fact, while control islets harbor >38,600 sex-specific methylome differences, they appear to be almost completely erased in the cancer state.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 15, 2021. ; 7 Effects of sex on tumorigenesis and clinical outcomes has emerged as an important component in our understanding of oncologic disease (25,26). However, as yet, there are no sex-specific cancer therapy protocols and no clinical trials investigating outcome differences by sex. It is therefore the logical next step to identify sex-driven differences that would affect prognosis and response to therapy and incorporate that information into clinical trial design. Thus, achieving a greater understanding of molecular differences in PNETs by sex is invaluable for advancing personalized therapy and informing future clinical trial design. Lastly, pancreatic neuroendocrine tumors encompass several heterogeneous subgroups, with distinct genomic, clinical, and prognostic profiles. Future studies should analyze sex differences in these PNET subtypes individually, yielding results with a high degree of internal validity, which are applicable to a narrow range of patients with specific cancer subtypes, a concept that underlies the essence of personalized medicine.

Human PNET tissue collection
Studies were conducted in accordance with the Declaration of Helsinki. Informed written consents were obtained from patients with a diagnosis of PNETs pre-operatively for study enrollment under the Weill Cornell Medicine Institutional Regulatory Board approval. Clinical specimens (derived from primary PNETs) were collected from patients undergoing surgery at Weill Cornell Medicine/NewYork Presbyterian Hospital.
Tumor RNA sequencing data generation . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) Quantified data was then imported into R and transcript-level quantification was summarized to gene-level with the tximeta R package (version 1.8.4). Differential . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Gene set over representation analysis
Over Representation Analysis (ORA) for Gene Ontology terms and KEGG pathways was performed with the goseq Bioconductor package (version 1.42.0) (32), adjusting for gene length. The probability weighing function (PWF) was calculated with the data bias (bias.data argument in the nullp function) set to the median transcript length for each gene. The median transcript length of each gene was calculated from Ensembl (release 97) data. Category enrichment scores were calculated using the . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.  . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 15, 2021.
where Mi and Ui are intensities measured by the i th Methylated and Unmethylated probes, respectively, and 100 is an offset constant which is added to regularize the βvalue when M and U intensities are low, per Illumina's recommendation (34). All of our samples exhibited a bimodal distribution of β-values, with enrichment of low β-values (close to zero) and high β-values (close to 1) (Supplementary Figure 4), consistent with expected results (33).
Next, the data was normalized via functional normalization, an unsupervised method which removes unwanted technical variation by utilizing control probes on the array. Functional normalization was shown to outperform existing normalization methods when analyzing data that is expected to have global DNAm differences, such as cancer data (35). After normalization, we removed 17,120 probes that contained common SNPs (based on dbSNP build 147) at target cytosines or single base extension (SBE) sites. We also dropped the probes on sex chromosomes, as they are harder to accurately normalize (36). At the end of data processing, we ended up with in a total of 456,928 probes (each one targeting a unique CpG/CH) on 87 samples.

Differential methylation analysis
To find differentially methylated cytosines (referred to as Differentially Methylated Positions/Probes, or DMPs) we fit the following linear model for every cytosine interrogated by the array: . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 15, 2021. ; https://doi.org/10.1101/2021.06.09.21258573 doi: medRxiv preprint where, yij is the normalized proportion methylation at probe i and sample j, Sex is a binary variable (Female=0; Male=1) blockFinder collapses neighboring CpGs/probes into a single probe-group . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 15, 2021. ; 13 measurement, and fits Eqn. 1 for each i th probe group (thus, yij now represents normalized proportion methylation at probe-group i and sample j). Similar to DMRs, we required a 0.1 difference in DNAm levels between sexes for DNAm blocks (argument: cutoff=0.1), 1000 linear model bootstrapping permutations were performed (arguments: nullMethod = 'bootstrap' and B = 1000), and blocks with FWER ≤ 0.1 were called significant.
DMPs and regions of interest were mapped to genes annotated by GENCODE (version 31). All downloaded 450k datasets were annotated relative to Human Genome version 19 (hg19), so all DNAm analyses were done relative to hg19 for consistency.
Whether or not CpGs/probes are associated with promoters or enhancers was determined from Illumina annotation, which is in turn determined using data from the Methylation Consortium (for enhancers) and ENCODE Consortium (for promoters).

Statistical tests
To calculate the significance of differences in number of differentially expressed genes and TFs between datasets, the Chi squared test with continuity correction was utilized. Computations were performed in R (version 4.0.3).
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 15, 2021. ;

Data availability
Processed RNA sequencing data (log2(TPM+1) values) for all samples that were sequenced for this study is provided in Supplementary Data 1.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)  . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 15, 2021. ; * Concordance refers to genes being differentially expressed by sex in the same direction (i.e., overexpressed in the same sex) in both the discovery and validation datasets.
** log2 Fold Change > 0 denotes gene overexpression in males . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 15, 2021. ; https://doi.org/10.1101/2021.06.09.21258573 doi: medRxiv preprint