Abstract
Adult granulosa cell tumors (AGCTs) harbor a somatic FOXL2 c.402C>G mutation in ∼95% of cases and are mainly surgically removed due to limited systemic treatment effect. In this study, potentially targetable genomic alterations in AGCTs were investigated by whole genome sequencing on 46 tumor samples and matched normal DNA. Copy number variant (CNV) analysis confirmed gain of chromosome 12 and 14, and loss of 22. Pathogenic TP53 mutations were identified in three patients with highest tumor mutational burden and mitotic activity, defining a high-grade AGCT subgroup. Within-patient tumor comparisons showed 29-80% unique somatic mutations per sample, suggesting tumor heterogeneity. A higher mutational burden was found in recurrent tumors, as compared to primary AGCTs. FOXL2-wildtype AGCTs harbored DICER1, TERT(C228T) and TP53 mutations and similar CNV profiles as FOXL2-mutant tumors. Our study confirms that absence of the FOXL2 c.402C>G mutation does not exclude AGCT diagnosis. The lack of overlapping variants in targetable cancer genes indicates the need for personalized treatment for AGCT patients.
Introduction
Adult granulosa cell tumors (AGCTs) belong to the sex cord-stromal tumors of the ovary and account for 2-5% of ovarian malignancies with an estimated incidence of 0.6-1.0 per 100.000 women per year worldwide1–4. Patients may develop symptoms, such as vaginal bleeding, caused by prolonged exposure to tumor-derived estrogen which may result in early detection of the disease. However, AGCTs are usually not preoperatively suspected and a histopathological diagnosis is made after surgical resection of an ovarian mass. Granulosa cell tumors also exist as a juvenile subtype, which generally occurs in young women and respresents only 5% of this tumor type4. AGCTs are microscopically defined by grooved, uniform and pale nuclei and a variety and mixture of histologic patterns can be found including microfollicular, trabecular, insular and diffuse patterns4. Although the disease is frequently described as indolent in behavior, recurrences occur in approximately 50% of the patients of whom 50-80% ultimately die of their disease5–7. Recurrences often require repeated debulking surgeries since alternative treatment options including chemotherapy, radiotherapy and hormone therapy have thus far shown only limited effect8. The current lack of effective systemic treatment emphasizes the need for novel therapeutic strategies. Molecular characterization of AGCTs could help to identify potential targets for treatment.
In contrast with other ovarian malignancies, AGCTs harbor a specific FOXL2 c.402C>G point mutation (C134W), which has been reported in 94-97% of patients6,9. FOXL2 is a transcription factor involved in ovarian function and granulosa cell differentiation10,11. Until now, efforts to target this gene have not been successful. Previous studies on genomic alterations in AGCTs identified copy number gains in chromosome 12 and 14 and loss of chromosome 2212–18. Furthermore, mutations were detected in genes that are known for their involvement in other cancer types, such as TERT, KMT2D, PIK3CA, AKT1, CTNNB1 and NR1D113,14,19–22. These studies included a subset of AGCTs within a larger ovarian cancer cohort and/or performed targeted or whole exome sequencing. Importantly, most studies did not analyze the corresponding normal reference DNA, essential for identifying true somatic variants.
We present the largest molecularly characterized cohort of AGCTs to date, in which we perform whole genome sequencing (WGS) on fresh frozen tumor material with matched normal reference DNA. We use this comprehensive method to investigate copy number changes, frequently mutated genes, mutational signatures and tumor heterogeneity. We define a subgroup of patients with high-grade AGCTs, harboring a pathogenic TP53 mutation. In addition, we identify potential driver mutations in AGCTs without the FOXL2 c.402C>G mutation, and find more variants in recurrent AGCTs as compared to primary tumors. Finally, we detect a high degree of intra-patient tumor heterogeneity.
Results
Description of WGS cohort
WGS was performed on 46 fresh frozen tumor samples from 33 patients (Table 1). We analyzed 11 patients with a primary tumor and 22 patients with recurrent disease. Microscopically assessed average tumor purity was 80% (range 40-90%). The whole genome was sequenced to an average read depth of 35X per sample (range: 26X–107X), with 97.5% of bases covered >10-fold (range: 93.9%-98.3%, Supplementary File 1). Matched normal reference DNA was obtained for all AGCTs. The median age at diagnosis was 53 years (range 29-75) and at the time of study, 21 patients had no evidence of disease, 11 were alive with disease and one patient had died of her disease. From five patients, tumors at multiple time points and/or multiple tumor locations were analyzed.
In our cohort, we detected a median number of somatic single nucleotide variants (SNVs) and small insertions/deletions (INDELs) of 3,579 variants per genome (range 1,346-21,452), of which 29 (range 13-238) were exonic, resulting in a tumor mutational burden (TMB) of approximately 1.25 per mega base (Mb, range 0.47-7.5). We identified 20 structural variants (SVs) per sample (median, range 0-314). The median number of mutations (SNVs and INDELs) detected in primary tumors was 1.5x higher than reported in a previous WGS study including 10 primary AGCTs which identified 1,578 variants per tumor (range 630-2,706)19. The TMB in our AGCT cohort was comparable to the TMB reported in a previous whole exome sequencing based study (1.2 mutations per Mb in primary AGCTs and 2.1 mutations per Mb in recurrences)13. The number of variants detected in AGCTs falls within the same range of variants described in low grade serous ovarian cancer (median: 3,064, range 1,641-7,398)23.
Previous studies reported conflicting results on the difference in mutational burden between primary and recurrent tumors13,14. In our study, primary tumors harbored 2,199 SNVs (median, range 1,346-4,120) and recurrent tumors 4,279 SNVs per sample (median, range 2,114-21,452; p<0.001). In addition, primary tumors carried fewer SVs (median 10, range 2-34) as compared to recurrent tumors (median 22, range 0-314; p=0.018). Our WGS data suggest that recurrences harbor significantly more variants than primary tumors.
Copy number alterations
Copy number analysis was performed on WGS data of 27 patients fulfilling the CNV caller pipeline requirements (see Methods). The majority of copy number alterations were duplications or losses of entire chromosome arms or chromosomes. In most patients, copy number loss of chromosome 22(q) (15/27, 56%) or gain of chromosome 14 (15/27, 56%) was found (Fig. 1). Concurrent gain of chromosome 14 and loss of 22 was seen in 11/27 patients (41%). Loss of chromosome 16(q) was seen in 4/27 patients (15%). Concurrent copy number gain of chromosomes 8, 9 and 12 was detected in 7/27 patients (26%) and gain of both chromosome 18 and 20 was seen in 5/27 patients (19%). Within patients, copy number profiles remained stable between different time points and tumor locations (patient 8 and 13, Fig. 1), or differed slightly between time points (patient 11 T1 versus T2 and T3). CNVs were detected in 7/9 (78%) patients with a primary tumor and in 16/18 (89%) patients with a recurrence. This study confirmed previously reported CNVs in either chromosome 12, 14 or 22 in 22/27 patients (81%)12–18. Trisomy 12 is also often detected in other sex cord-stromal tumors and is usually the single copy number alteration in benign sex cord-stromal tumors such as fibromas and thecomas24. Monosomy 22 was identified as the sole anomaly in a mixed germ cell sex cord-stromal tumor of the ovary and in a fibrothecoma25,26, and trisomy 8 as the single copy number variant in a Sertoli-Leydig cell tumor27. The effect of concomitant gain of chromosome 14 and loss of 22 in AGCTs is unknown and requires further investigation. In our cohort, copy number alterations are equally present in primary tumors and recurrences. It therefore remains unclear whether chromosome gains and losses are a cause or a consequence of tumor evolution.
Mutational signatures
In addition to the number of mutations detected per sample, we investigated which mutational processes generated specific single base substitutions (C>A, C>G, C>T, T>A, T>C, and T>G). We applied de novo signature extraction and compared these signatures to COSMIC mutational signatures version 3 (Fig. 2a,b). Four different major signatures were identified in the tumor samples. Two of the derived signatures were related to normal ageing processes being present in all cells (COSMIC 3, 5, 37 and COSMIC 3, 5, 40, respectively), one to platinum treatment (COSMIC 31, 35) and one signature with a yet unestablished cause (COSMIC 4, 20, 38, 45). No signatures related to microsatellite instability (MSI) or homologous recombination (HR) deficiency were detected (see Methods). Patients treated with chemotherapy demonstrated a significantly higher total number of variants (median 5,463 SNVs) as compared to chemotherapy naïve patients (median 2,861 SNVs, p<0.001). Approximately 50% of the variants observed in these chemotherapy-exposed tumors can be explained by the platinum signature. In contrast, two patients (patient 30 and 49) treated with platinum-based chemotherapy did not show single base changes related to platinum treatment. These patients received only three out of six cycles of chemotherapy due to disease progression during treatment, whereas the patients that do show the platinum signature had a prolonged platinum exposure and received up to 2×6 cycles of chemotherapy. A recent study established that the contribution of platinum signature is dependent on the duration of treatment28. The absence of the platinum signature in these patients could possibly be explained by their resistance to platinum and the short duration of treatment.
Variants in known cancer genes
A total of 239 variants in known cancer genes from COSMIC29 were detected in our AGCT cohort (Supplementary File 2). Most recurrently mutated genes included FOXL2, TERT, KMT2D, PIK3CA and TP53 (Fig. 2c). We confirmed the FOXL2 c.402C>G mutation in 29/33 patients (88%). Two patients had a second mutation in FOXL2 consisting of a frameshift mutation and a variant in the UTR5 region, respectively (Supplementary File 2). TERT C228T and C250T promoter mutations were present in respectively eight and three patients (together 33%), and were mutually exclusive. These variants were present in 2/11 (18%) patients with a primary tumor and in 9/22 (41%) of the patients with a recurrence. The majority of patients with active disease or who died of disease harbored a TERT promoter variant (7/12, 58%), while only 3/18 patients with no evidence of disease had a TERT promoter mutation (17%, Fig. 2c,d). Although the two TERT promoter variants are known hotspots in cancer, this study is the first to describe the C250T variant in AGCTs. Our results corroborate the findings of a previous study that detected the TERT C228T variant more often in recurrences (41%) than in primary tumors (22%) and associated this mutation with impaired prognosis20.
Subgroup of patients with high-grade AGCT characterized by TP53 mutation
Out of the five patients with the highest mutational load, three had a TP53 mutation combined with loss of heterozygosity (R248G, H179R and C135Y, respectively. Fig. 2). These tumors harbored numerous copy number alterations, and increased mitotic activity was seen on hematoxylin and eosin (H&E) slides from two patients (18 and 70 per 2mm2, respectively) as compared to TP53-wildtype patients (range 2-12 per 2mm2) (Fig. 3). The TP53 mutant tumors harbored a higher number of both SNVs (median 12,027, range 7,100-21,452) and SVs (median 188, range 66-314) as compared to the TP53-wildtype tumors (median 3,336 SNVs, range 1,346-9,211, median 20 SVs, range 0-120, Fig. 4).
We investigated the clinical disease course in the patients harboring a TP53 mutant tumor. The first patient (patient 46) was diagnosed with AGCT four years prior to study participation and was re-treated with chemotherapy after multiple surgeries and different hormonal and cytostatic treatment regimens (Supplementary File 1). The second patient (patient 2) had a total of four disease recurrences in eight years and is alive with no evidence of disease. The third patient (patient 16) presented with metastases at diagnosis (stage IIB disease), which is unusual since the vast majority of patients is diagnosed with a primary tumor confined to one ovary. This might indicate an early aggressive behavior of the tumor. The patient is being treated for her fifth recurrence and developed breast cancer as second primary tumor.
In the literature, cases of AGCTs with areas of high-grade malignant morphology in either the primary tumor or recurrence have been reported. One study detected a TP53 mutation in these high-grade components of 2/4 FOXL2 mutant AGCTs and stated that TP53 mutation likely plays a role in the high-grade transformation30. A recent case report describes an aggressive AGCT without a FOXL2 mutation with strongly positive p53 immunohistochemistry and numerous mitoses31, possibly similar to our finding of one FOXL2-wildtype TP53-mutant AGCT with high mitotic activity. TP53 mutations have previously been found in 4-9% of AGCTs13,14,32. In conclusion, we found an association between the occurrence of a TP53 mutation, high mutational burden, copy number alterations and mitotic activity. These characteristics define a subgroup of high-grade AGCTs with a potentially more aggressive tumor behavior.
FOXL2-wildtype AGCTs
Four clinically and histopathologically confirmed AGCTs did not harbor the specific FOXL2 variant (Fig. 2c). As described above, we detected a damaging TP53 variant in one FOXL2 wildtype tumor. Interestingly, in one other tumor both the TERT C228T variant and two DICER1 variants were found (E1813D and a stopgain variant Q1230* predicted to result in nonsense-mediated decay33, Supplementary File 2). The TERT C228T variant has been previously detected in 26% of AGCTs20 and similarly in 8/33 (24%) patients in our study. DICER1 variants are detected in 2.38% of all cancers with non-small cell lung cancer, colorectal cancer, endometrial cancer, breast cancer, and melanoma having the greatest prevalence34. Somatic heterozygous mutations in DICER1 are present in 29% of non-epithelial ovarian cancers, including Sertoli-Leydig cell tumors (60%, the majority harboring a E1705K or D1709N hotspot mutation), juvenile GCTs (7%), yolk sac tumors (13%) and mature teratomas (12%)35. Specifically, DICER1 mutations containing nonsense and missense variants in one of the four catalytic residues in the RNase IIIB domain (E1705, D1709, D1810, E1813), as we find, are the major known DICER1 events in cancer36. No germline or somatic exonic truncating or missense DICER1 variants were detected in the three remaining FOXL2-wildtype patients. FOXL2-wildtype patients also harbored somatic mutations in the cancer genes ARID1B, STK11, TP53, PIK3R1, LRP1B, GATA1, NOTCH2, CTNNB1 and PAX3 (Supplementary File 2). However, none of the FOXL2-wildtype patients shared a variant in the same gene. Additionally, no SVs were detected in the vicinity of FOXL2 (+/-10,000 base pairs) with a predicted effect on expression and/or regulation of FOXL2.
FOXL2 is thought to be a major tumor driver in GCT, although the absence of a FOXL2 mutation in a small subset of AGCTs suggests potential alternative mechanisms for AGCT tumorigenesis. In a recent study, GCT tumorigenesis was associated with the combined inactivation of p53 and Rb pathways, with FOXL2 still present in newly developed AGCTs and FOXL2 downregulation starting during AGCT growth37. Cluzet et al. suggest that FOXL2 downregulation may be a late event in GCTs and therefore possibly not a prerequisite for tumor development, but rather for tumor growth. This finding supports the hypothesis of alternative mechanisms for AGCT tumorigenesis.
The copy number profiles, number of SNVs and SVs of the FOXL2-wildtype tumors are similar to the tumors with FOXL2 mutation (Fig. 1 and Fig. 4). FOXL2-wildtype tumors harbored 2,791 SNVs (median, range 1,919-12,027) and FOXL2-mutant 3,662 SNVs per tumor (median, range 1,346-21,452). Three out of four FOXL2-wildtype patients had CNVs in both chromosome 12 and 14 and one patient had additional loss of chromosome 22. One patient had monosomy X, which was previously detected in 1/17 (6%) AGCTs15 and present in 3/28 (11%) of patients in our cohort. In all FOXL2-wildtype tumors, the diagnosis AGCT was reconfirmed by the pathologist. For example, the diagnosis Sertoli-Leydig cell tumor was considered and excluded in the DICER1 mutated tumor because the tumor harbored the pathological characteristics of an AGCT (Fig. 3b) and did not show signs of another ovarian tumor. In addition, this patient was postmenopausal when the ovarian mass was diagnosed, while the peak incidence of a Sertoli-Leydig cell tumor lies around the age of 25, and also had pre-operatively elevated circulating inhibin and estrogen levels. Our study confirms that the absence of the FOXL2 c.402C>G mutation does not exclude the diagnosis AGCT. Moreover, in two patients we identified TP53 and DICER1, respectively, as potential tumor drivers. These findings suggest there may be alternative mechanisms for AGCT development beside the FOXL2 c.402C>G mutation.
Shared variants between AGCT patients
Besides the specific missense variant in FOXL2, no overlapping somatic variants between ≥4 AGCT patients were detected in the coding regions of the genome. All variants detected by WGS with a CADD score >5 (Supplementary File 3a) were analyzed. Hereby, we identified 39 shared variants between ≥3 patients including the two TERT promoter variants (Supplementary File 3b). The sole detected coding variant shared by 3 patients, TSPYL2 L34Pfs*28, was concluded to be a technical artefact upon Sanger sequencing validation. We identified 305 non-coding variants present in only two patients. However, it remains challenging to interpret the effect of potential non-coding mutations on gene function. Variation in the non-coding areas of the genome is incompletely characterized and may be a result of sequencing and mapping artefacts as well as poorly understood localized hypermutation processes, and the functional effect that these variants may have on cis or trans regulatory elements is unknown38,39. We detected full loss of single or multiple genes in 7/33 (21%) of patients, although there was no overlap in deleted genes between patients, and specific breakpoints on chromosome 1 in nine patients (Supplementary Fig. 1a,b). This study confirms the lack of overlap in mutations in AGCTs, as shown in previous studies13,14. These studies suggest that “second-hit” mutations leading to recurrence may be random.
Novel candidate gene analysis
Cohort analysis resulted in the identification of 13 genes with mutations in ≥3 patients that could possibly be deleterious, as predicted by the CADD PHRED score >5 (Supplementary File 3c). This list included two previously identified genes involved in AGCTs, FOXL2 and KMT2D, and the newly described gene TP53. Our study confirms the previously reported limited number of recurrent mutations in individual genes in AGCTs14.
Intra-patient comparisons reveal tumor heterogeneity
To investigate tumor heterogeneity and tumor evolution over time, tumor tissue from multiple time points and/or multiple tumor locations of five different patients was sequenced and analyzed. In patients 8 and 23, we investigated tumor tissue obtained from two different recurrences and detected that 54-80% of the somatic variants (SNVs and INDELs) in these tumors were unique to a single time point (Supplementary Fig. 3). Surprisingly, two samples taken from the left and right side within a primary AGCT (patient 22) had the least overlap and carried 75-80% unique somatic variants (456 overlapping and 1,377-1,786 unique variants). The SVs also differed in this early stage of disease (1 and 2 SVs with no overlap, respectively, Supplementary Fig. 3). We investigated the variants in 7 and 5 different recurrences and tumor locations of patient 11 and 13, respectively (Fig. 5). These patients showed a similar pattern of distribution between the proportion of overlapping (25-41% and 25-35%) and unique variants (29-73% and 31-66%). The detected larger structural variants also capture the heterogeneity of this tumor, with only 1 and 4 shared SVs between all samples of patient 11 and 13, respectively (Supplementary Fig. 2c,d). Interestingly, all samples from patient 13 share the same SVs present at the initial time point and acquired additional SVs over time.
Patient 11 had active recurrent disease during the study period and required alternative treatment options after multiple surgeries and different chemotherapy and hormone treatment combinations (Fig. 6). In this patient, recurrent PIK3CA mutations were found and this gene was identified as potential target for future treatment. However, during the course of her disease, different variants in the PIK3CA pathway emerged and disappeared. The PI3K/AKT pathway was thought to be involved in AGCT tumorigenesis since PI3K activity within oocytes irreversibly transforms granulosa cells into AGCTs in mice40. Mutations in this pathway, however, have been detected in only a small proportion of patients (2/33, 6% in our study). Although this pathway was recurrently hit in this patient, it remains difficult to assess whether these mutations are drivers or passenger mutations. A previous study identified KMT2D inactivation as a driver event in AGCTs and suggest that mutation of this gene may increase the risk of disease recurrence13. In our study, mutations in KMT2D were present in 6/33 patients (18%). Patient 13 had different exonic inactivating mutations in KMT2D in 3/5 tumor samples (Fig. 6). In this patient, KRAS was the only additional cancer gene besides FOXL2 and TERT that was mutated in all samples. These examples illustrate both inter- and intra-patient heterogeneity in AGCTs.
We also compared the dispersion of mutational signatures within the longitudinally obtained tumor samples from patient 11 and 13. Patient 11 was treated with chemotherapy after the first two tissue samples had been obtained (Fig. 6a). Only the tissue samples obtained at later recurrences showed the platinum signature (Fig. 1b), while tissue obtained before treatment did not show the platinum signature. In line with these results, patient 13 had not been treated with chemotherapy before or during the study and none of her tumor samples harbored the platinum signature. This data confirms that chemotherapy can increase the number of variants, induces specific base changes and potentially plays a role in the heterogeneity between tumor metastases28. The lack of overlapping variants within and between patients provides evidence for tumor heterogeneity. This challenges the paradigm of AGCT as being a homogenous tumor. Our study shows that AGCTs, despite the microscopically homogenous cell population, are not homogenous in the genomic alterations they harbor. This can be a challenge for designing effective treatment strategies.
Discussion
This is the first study to investigate the whole genome of a large cohort of AGCTs by sequencing both tumor and matched reference DNA. AGCT patients did not share specific variants or affected genes except for the FOXL2, KMT2D and TERT promoter variants. A higher mutational burden was found in recurrent tumors, as compared to primary AGCTs. The molecular differences we detected between and within patients confirm tumor heterogeneity which is an established characteristic of cancer41 and rejects the view of AGCTs as being a homogenous tumor. This study also illustrates the complexity of treating recurrent AGCTs, emerging with decreasing time intervals, and therefore suggests it should not be described as an indolent tumor. We confirm that absence of the FOXL2 c.402C>G mutation does not exclude AGCT diagnosis and identified TP53 and DICER1 as potential drivers in these tumors. Furthermore, we define a subgroup of high-grade AGCTs characterized by a damaging TP53 mutation, high tumor mutational burden and mitotic activity.
Whole genome sequencing has emerged as a comprehensive test over the past decade, enabling the detection of both exonic, intronic, and intergenic variation. Recent studies suggest that deeper sequencing (e.g. 90X) of the genome is needed to detect subclonal events with a low allele frequency in tumors42. In our study clonal tumor mutations could reliably be detected by 35X coverage of tumor samples with matched reference DNA. Moreover, the AGCTs in our study were high in tumor content, with the majority of tissues having a tumor percentage of 80-90% (31/48 samples). Variant allele frequency (VAF) calling at 30X in 80-90% tumor samples might be more accurate than VAF calling at 90X in a 30% tumor sample. As a result, this coverage is sufficient to detect clonal mutations in our cohort of high tumor purity samples and is currently used in large cancer studies43,44.
The lack of overlapping variants in targetable cancer genes in AGCTs indicates the need for personalized treatment. TERT is essential for the maintenance of telomere length and thus influences cellular immortality. TERT activity is an important mechanism for cancer to escape apoptosis and a promising therapeutic target for cancer, as it is highly expressed in most tumor cells and hardly expressed in normal cells,. TERT mutations have frequently been detected in many cancers45 and in our study a damaging promoter variant was consistently present in 33% of patients and in 41% of recurrences. Although it can be difficult to target a promoter variant, TERT silencing has been successful in NRAS mutant melanoma46 and might be applied to other tumors in the future.
In epithelial ovarian cancer, the presence of a TP53 mutation determines high-grade disease. It can be hypothesized that AGCT patients harboring characteristics of a more aggressive tumor, such as a TP53 mutation, might respond better to chemotherapy than patients without this variant since chemotherapy targets rapidly dividing cells. This requires further investigation in a larger cohort of TP53-mutant and wildtype tumors.
In conclusion, AGCTs harbor significant genetic heterogeneity between and within patients, and are not a homogenous and stable tumor type. This heterogeneity can be a challenge for future targeted treatment in AGCT patients, and suggests that a personalized, genotype-guided approach is required. Future personalized in vitro drug screens on patient-derived tumor tissue could facilitate the identification of potential patient-specific treatment and targeted sequencing of these tumors could identify actionable mutations.
Methods
Patient cohort and study inclusion criteria
A national prospective study was performed to obtain patient derived fresh frozen tumor tissue and corresponding germline DNA (blood or saliva). Patients were included consecutively during their hospital consultation in 5 hospitals between 2018-2019. In addition, patients with fresh frozen tumor material available in the hospital’s pathology archive were asked for consent. Ethical approval was obtained by the Institutional Review Board of the University Medical Center Utrecht (UMCU METC 17-868) and by the board of directors of the participating centers. All participants provided written informed consent. Clinical data was acquired from patient reports.
Tissue acquisition
All tumor material was obtained directly from surgery, fresh frozen and transported in dry ice and stored in the −80 degrees freezer. From the tissue, 20x 5 μm slices were cut for DNA isolation. Hematoxylin and eosin (H&E) staining slides were made and reviewed by a pathologist (GNJ) to confirm AGCT diagnosis and assess tumor percentage. Minimal tumor percentage for study inclusion was 40%, with the majority of tissues having a tumor percentage of 80-90% (31/48 samples, Table 1). In parallel, fresh frozen tumor material was crushed in a liquid nitrogen cooled mortar using a pestle, and pulverized tissue was collected for DNA isolation. Material for normal reference DNA isolation was acquired by prospective blood draw during patient follow-up visit. In cases where patients no longer required follow up at the hospital, Oragene-DNA OG-500 saliva DNA isolation kits (DNAGenotek, Ottawa, Ontario, Canada) were mailed to the patient residence and taken by the patient after instructions, so that a hospital visit and blood draw were not required.
DNA isolation, quantification and qualification
DNA from the fresh frozen pulverized tumor tissue was isolated using Genomic-tip 100/G (Qiagen, Venlo, NL). Isolated DNA was quantified by Qubit 2.0 dsDNA broad assay kit (ThermoFischer Scientific, Waltham, MA, USA) and ensured to be of high molecular weight by visualization on a 1.5% agarose gel. Samples were then concentrated if needed to a minimal concentration of 15 ng/μl and a total of 1 μg was aliquoted for WGS. Normal reference DNA isolation from blood was performed according to the DNeasy blood isolation protocol (Qiagen, Venlo, NL) and normal reference DNA isolation from saliva samples performed according to the prepIT-L2P protocol (DNAGenotek, Ottawa, Ontario, Canada).
Whole-genome sequencing and variant calling
DNA was sent for 2X150bp paired-end sequencing on Illumina HiSeq X or NovaSeq 6000 instrument (Illumina, San Diego, CA, USA) to the Hartwig Medical Foundation (HMF, Amsterdam, NL, 23 tumor samples with normal reference DNA) or Novogene (Novogene, Beijing, China, 23 tumor samples with normal reference DNA). Mapping and variant calling from raw fastQ reads was performed using a pipeline (v4.8) from the Hartwig Medical Foundation (HMF) installed locally at the UMCU using GNU Guix (https://github.com/UMCUGenetics/guix-additions; https://github.com/hartwigmedical/pipeline42). Sequence reads were mapped with Burrows-Wheeler Alignment v0.75a47 against human reference genome GRCh37. Realignment of insertions and deletions and base recalibration was performed with the Genome Analysis Toolkit (GATK, v3.8.1) (https://www.ncbi.nlm.nih.gov/pubmed?term=20644199). Somatic single nucleotide variants (SNVs) and small insertions and deletions (INDELS) were called with Strelka (v1.0.14)48. All SNVs labelled as “PASS” were included in the analysis. The functional effect of the somatic SNVs and INDELS were predicted with SnpEff (v.4.3)49. Somatic structural variants (SVs) were called using GRIDSS (v1.8.0) and copy number variation called by using PURPLE50.
WGS data analysis
The tumor mutational burden (TMB; mutations per Mb) for each sample was derived by dividing the sum of mutations across the entire genome (SNVs, MNVs (multiple nucleotide variants) and INDELS) by the total mappable sequence length of the GRCh37 FASTA file (2858674662) divided by 106, as has been described previously51. Wilcoxon rank-sum test was performed for between-group comparisons, two tailed, with a 0.05 significance level.
CNV analysis
The results obtained from PURPLE (CNVs per breakpoint and per gene) were processed and plotted using in-house R tools for inter- and intra-patient comparison of CNVs. Samples not fulfilling PURPLE quality control criteria for CNV calling (status: “FAIL_SEGMENT”) were not plotted.
Exonic variant analysis
The exonic and UTR variant analysis was performed using Alissa (Agilent Technologies Alissa Interpret v5.1.4). Variants in the exonic regions, ±100 base pairs into the intronic regions, and in the 5’ and 3’ UTR of Refseq transcripts were retained for analysis. Variants present in the COSMIC Cancer Gene Census (release v89) as somatic mutations in cancer were annotated (Supplementary File 2a,b).
To find novel candidate genes in which mutations could contribute to oncogenesis and/or recurrence, we performed a cohort analysis to identify genes with a somatic variant in multiple samples. As this study contained multiple samples from individual patients, this variant list was further filtered to remove any genes in which variation occurred solely within samples of a particular patient. Variants were annotated with the CADD PHRED (Combined Annotation Dependent Depletion)52 score and genes with no or only one variant >5 were removed.
Whole genome variant analysis
Variation throughout the entire genome was assessed to identify recurrent variations and prioritized by CADD score. Briefly, somatic Variant Call Format (VCF) file of each sample was combined into one complete cohort specific VCF containing somatic variation across all samples. As the effect of non-coding variation is difficult to predict, we annotated all variant positions with the CADD PHRED score, a score that is suitable for assessing the effect of exonic, intronic, regulatory and intergenic variation. To exclude mutations with minimal predicted deleteriousness, we selected the variants with a CADD PHRED score > 5 and identified mutations shared by multiple patients. Candidatevariants shared by ≥3 patients were validated by Sanger sequencing, as recurrent unannotated variation can be a result of platform or library preparation technical artifacts53.
Mutational Signature analysis
Mutational signatures were created using the R-package mutationalPatterns54. We derived mutational signatures from the SNV data and selected the optimal number of signatures by inspecting the Non-negative Matrix Factorization (NMF) rank survey as described in the vignette of the MutationalPatterns R-package. In detail, We examined the “rss” (residual sum of squares) plot and the “cophronetic” (cophronetic correlation) plot. We checked for an elbow in the rss plot and aimed for a high cophronetic correlation between model and data. We created four signatures per mutation type and compared them to the signatures from COSMIC mutation signatures version 3 using the cosine-similarity measure. Derived signatures were named according to the proposed etiology of the closest COSMIC signatures.
Tumor heterogeneity assessment
SNV heterogeneity was assessed by comparing the number of overlapping SNVs for the patients with multiple tumor samples from different time points and/or tumor locations. SV heterogeneity was investigated using in house tools based on the R-package StructuralVariantAnnotation (Bioconductor version: Release 3.10).
Homologous Recombination
We investigated Homologous Recombination by the CHORD (https://github.com/UMCUGenetics/CHORD) method, which is a random forest model that predicts homologous recombination deficiency (HRD) using the relative counts of specific somatic mutation contexts. The main contexts used by CHORD are small deletions with flanking microhomology and 1-100kb structural duplications55.
Data availability
WGS Binary Alignment Map (BAM) files are available through controlled access at the European Genome-phenome Archive (EGA), hosted at the EBI and the CRG (https://ega-archive.org), with accession number EGAS00001004249. Requests for data access will be evaluated by the UMCU Department of Genetics Data Access Board (EGAC00001000432) and transferred on completion of a material transfer agreement and authorization by the medical ethical committee of the UMCU to ensure compliance with the Dutch ‘medical research involving human subjects’ act.
Data Availability
WGS Binary Alignment Map (BAM) files are available through controlled access at the European Genome-phenome Archive (EGA), hosted at the EBI and the CRG (https://ega-archive.org), with accession number EGAS00001004249. Requests for data access will be evaluated by the UMCU Department of Genetics Data Access Board (EGAC00001000432) and transferred on completion of a material transfer agreement and authorization by the medical ethical committee of the UMCU to ensure compliance with the Dutch ‘medical research involving human subjects’ act.
Author’s contributions
JFR, JWG, ES, RHMV and RPZ designed the study and obtained ethical approval of the study protocol. JFR, STP, HWN, HSM, LRCWL, JMJP, CARL, GNJ, POW and RPZ collected clinical data, tissue and reference DNA samples. JFR, GRM and JK performed data analysis and bioinformatics. GNJ assessed tumor purity and reviewed all specimens for histological pathology. GRM performed DNA isolations. JFR, GRM, JWG, GWH and RPZ wrote the manuscript. All authors revised the manuscript and approved final version for publication.
Competing interests
The authors declare no competing interests.
List of Supplementary Files and Figures
Supplementary File 1. Clinical characteristics and sequencing details of aGCT samples
Supplementary File 2. Variants in COSMIC somatic cancer genes
Supplementary File 3. Shared variants and candidate genes
Variants with CADD score > 5
Variants shared in ≥ 2 patients
Genes with variation in ≥ 3 patients
Supplementary Figure 1. Full gene loss
Supplementary Figure 2. Structural Variants
Structural Variants in more than 1 sample
Structural Variants in chromosome 1
Structural Variants in patient 11
Structural Variants in patient 13
Supplementary Figure 3. Circos plots and venn diagrams from all intra-patient comparisons
Acknowledgements
The authors wish to thank the Granulosa Foundation Philine van Esch, which provided funding that was critical to this project. The pipeline filtering software used in this study was provided by the Hartwig Medical Foundation. We gratefully thank Edith D.J. Peters for PCR validations, Wigard P. Kloosterman and Edwin P.J.G. Cuppen for scientific input, Arne van Hoeck for his expertise in the mutational signature analysis, and Sander W. Boymans for bioinformatic support.