Abstract
The majority of patients with ovarian cancer ultimately develop recurrent chemotherapy resistant disease. Treatment stratification is mainly based on histological subtype and stage, prior response to platinum-based chemotherapy and time to recurrent disease. Here, we integrated clinical treatment, treatment response and survival data with whole genome sequencing profiles of 132 solid tumor biopsies of metastatic epithelial ovarian cancer to explore genome-informed stratification opportunities. Samples from primary and recurrent disease harbored comparable numbers of single nucleotide variants and structural variants. Mutational signatures represented platinum exposure, homologous recombination deficiency and aging. Unsupervised hierarchical clustering based on genomic input data identified specific ovarian cancer subgroups, characterized by homologous recombination deficiency, genome stability and duplications. The clusters exhibited distinct response rates and survival probabilities which according to our analysis could potentially be improved by genome-informed treatment stratification.
Introduction
Epithelial ovarian cancer (EOC) is a genetically heterogeneous disease that is characterized by high recurrence rates, the development of chemotherapy resistance and subsequently, poor survival. Over the past decades, survival rates have hardly improved1. Worldwide, yearly ~295,000 women are diagnosed with ovarian cancer, and ~185,000 die of the disease2. Primary treatment of advanced disease consists of surgery and platinum-based chemotherapy. The combination carboplatin and paclitaxel is the most frequently used regimen and most patients respond well to this treatment. However, with every recurrence the response rates to platinum treatment decline, due to the emergence of chemotherapy resistance. Subsequent disease free intervals shorten, eventually leading to death from disease in the majority of patients 3,4 There is an urgent need to find new treatment options, especially for recurrent disease. Patient stratification at recurrence is still mainly based on prior response to platinum treatment and time to recurrent disease, and not on molecular or genetic characteristics5.
Due to advances in sequencing technologies and their decreasing costs, our understanding of the molecular basis of ovarian cancer has improved. Targeted and whole exome sequencing of primary high grade serous ovarian cancer (HGSC) revealed mutations in TP53 in nearly all patients, as well as extensive copy number variation6,7. Mutations in BRCA1 and BRCA2 occur in ±20% of patients (due to a combination of germline and somatic events) and are the most prominent cause of defective homologous recombination (HR), which is associated with improved response to platinum-based chemotherapy and PARP-inhibition6,8. Potential driver genes, other than TP53 and BRCA1/2, are mostly observed in only a small subset of patients, illustrating extensive interpatient tumor heterogeneity6,7. Recently, the genomic landscape across different cancers has been studied in unprecedented detail9,10, and it was shown that clustering based on genomic data can divide patients with the same cancer into clinically meaningful subgroups11–14.
Here, we studied the genomic landscape of 132 prospectively collected solid tissue biopsies of 127 patients with metastatic ovarian cancer. Due to integration of detailed clinical (treatment) data, we were able to analyse the impact of previous treatments on the genome and to evaluate the response to subsequent treatment in relation to genomic characteristics. We identified subgroups of patients that harbored distinct genomic features such as HR-deficiency and genomic stability that could benefit from traditional chemotherapeutics, PARP-inhibitors and other targeted drugs.
Results
Patients with metastatic ovarian cancer have diverse treatment histories
We analyzed the whole genome sequencing (WGS) and clinical data of 132 tissue biopsy samples from 127 patients with metastatic ovarian cancer, partially included in the pan-cancer analysis described by Priestley et al. (methods section)10. In this study a biopsy of metastatic disease was obtained and response to subsequent treatment was monitored as depicted in the flowchart (Fig. 1a). The majority of samples were collected at time of recurrent disease (86%, N=113/132), while a minority was included at time of primary disease, prior to any systemic treatment (14%, N=19/132) (Table 1, Table S1). The most common histopathological subtype at time of diagnosis was high grade serous carcinoma (HGSC; 56%, 74/132) followed by low grade serous carcinoma (LGSC; 12%, 16/132) and serous carcinoma not otherwise specified (serous NOS; 10%, 13/132). Non-serous subtypes included clear cell carcinoma (N=5) and endometrioid carcinoma (N=5) (Table 1). The majority of tumors were initially diagnosed at advanced disease (FIGO stage III (N=69) and IV(N=40)). Nearly half of the biopsies were taken from metastatic sites in the peritoneum or omentum (48%, 63/132), followed by lymph node (25%, 33/132) and liver metastases (11%, 14/132). The median patient age at the time of biopsy was 63 (range 31-85).
Most patients with recurrent disease had been heavily treated prior to the time of biopsy, and treatment history varied widely between patients (Fig. 1 and S1a). The median time between diagnosis and study biopsy in patients with recurrent disease was 37 months (including 10 patients with an interval of more than 10 years). Patients were exposed to a mean of 4.1 drugs in this period (range 1-20) and a longer diagnosis-biopsy interval was associated with a higher number of drugs (p=0.01, R2=0.058)(Fig. 1, Table S1). All patients with recurrent disease were treated with chemotherapeutic agents prior to biopsy (Fig. 1B). A subset of the cohort was additionally treated with targeted drugs (34%, 38/113), hormonal therapy (19%, 22/113) and immunotherapy (5%, 6/113). In total, 28 different drugs were administered to the patients in this cohort (Fig. S1a, Table S1). Nearly all patients were exposed to the standard first line treatment for ovarian cancer, carboplatin (110/113, 97%) and paclitaxel (103/113, 91%). Other commonly administered drugs were pegylated liposomal doxorubicin (PLD) (34/113, 30%), bevacizumab (31/113, 27%), gemcitabine (28/113, 25%) and tamoxifen (19/113, 17%). The remaining 22 drugs were distributed to less than 10 patients each. The majority of patients received a unique combination of systemic treatments, illustrating the heterogeneity in therapies even in the absence of molecular guidance, and reflecting the clinical challenges in defining optimal treatment strategies for this patient group (Fig. S1a). A subset of patients was additionally treated with radiotherapy (20%, 23/113) (Table S1).
The correlation of known clinical determinants and drug response
We evaluated which treatments were administered after the biopsy was obtained, and how patients responded to the given treatment (Fig. 2a, Table S1). In total, 109 patients started a treatment following biopsy collection. Similar to the time period prior to biopsy collection, there was great variety in the treatments administered after biopsy. The most commonly used treatment regimens consisted of carboplatin and paclitaxel ((N=26), including 16 patients with primary disease), carboplatin, gemcitabine and bevacizumab (N=15), and carboplatin and PLD (N=11) (Fig. 2a). Response to treatment was monitored radiologically by CT-scanning and assessed according to the RECIST criteria version 1.115(Fig. 2a). For 95/109 patients (87%) the RECIST response was available, which ranged from complete response (CR 4/95, 4%), partial response (PR 31/95, 33%), and stable disease (SD 43/95, 45%) to progressive disease (PD 17/95, 18%). We confirmed that the group with a favorable response (CR+PR) contained a higher proportion of patients with primary disease compared to the poor response group (SD+PD), 29% vs 13% (Fig. 2b)16. Further, all tumors that were initially diagnosed as well differentiated exhibited a poor response (Fig. 2b).
A shorter platinum free interval (PFI), defined as the time between last platinum treatment and next recurrence, has been associated with a poorer response to subsequent platinum treatment5. We investigated whether the PFI correlated with response based on radiological RECIST assessment in 27 patients15. Only two patients experienced a recurrence within six months, of whom one experienced a favorable response to platinum-based (combination) therapy (Fig. 2c). Of the patients with a PFI of over six months, 56% (14/25) had a favorable response. Our analysis did not support the clinical prognostic value of the PFI, likely due to a combination of small numbers and a mix of the number of relapses (Fig. S1b).
Primary and recurrent samples have comparable mutational loads
We subsequently analyzed the WGS data of the 132 EOC tissue biopsies. On average, a total of 10,729 mutations were detected per sample. The total number of mutations did not differ significantly for primary versus recurrent samples (primary mean 10,202 (SD 7,548); recurrent mean 10,818 (SD 6,108); Wilcoxon rank sum test p=0.4, Fig. S2c), in line with recent observations in other tumor types10. In contrast, a previous WGS study in ovarian cancer identified a higher mutation frequency in recurrent versus primary samples7. Notably, in that study the majority of samples in the recurrent disease cohort were derived from ascites, while our cohort was entirely made up of solid tumor biopsies. Our primary and recurrent samples contained a comparable number of mutations with high or moderate predicted impact (primary mean 84, SD 72); recurrent mean 92, SD 55) (Wilcoxon rank sum test p=0.27, Fig. S2c).
To detect driver genes in our cohort based on mutational status, we assessed which genes had a higher somatic mutation rate than expected based on the background mutation rate17. We identified four genes in this cohort, TP53, KRAS, PIK3CA and NF1, all known drivers in ovarian cancer (Table S2)6,7. In a subanalysis restricted to primary samples, only TP53 reached significance, whereas in the recurrent group all four genes were detected. Of note, the power to detect driver genes relies on the sample size which was markedly larger in the recurrent group (compared to the primary group (N=120 vs N=19).
Copy number aberrations characterize ovarian cancer beyond HGSC
HGSC is known for extensive copy number aberrations (CNA). We therefore evaluated CNA on a genome-wide and gene level both across the cohort and per subtype. The average tumor genome ploidy in our cohort was 2.8 (range 1.6-5.9) and the majority of samples (61%, N=80/132) had undergone whole genome duplication (WGD). This was recently also observed in multiple other types of metastatic cancer10. WGD was observed in both primary (68%) and recurrent samples (59%), suggesting that WGD is an early event in tumor evolution, in line with the results of a recent detailed analysis on the evolution of ovarian cancer by Gerstung et al18. There was no difference in the median tumor genome ploidy of samples from primary disease versus samples from recurrent disease (primary 3.1 (SD 1.1), recurrent 2.7 (SD 0.8), Wilcoxon rank sum test p=0.28, Fig. S2c). In a cohort-wide analysis, multiple genomic regions containing recurring amplifications and deletions were identified, encompassing copy number gains in genes marked as drivers in ovarian cancer such as MYC and CCNE (Table S4)9,19. On average, 412 structural variants (SVs) were detected (range 1-2135), including balanced and unbalanced events. The number of SVs was comparable in the primary disease group (mean 478, SD 495) versus the recurrent disease group (mean 400, SD 338) (Wilcoxon rank sum test p=0.64, Fig. S2c).
Next, we evaluated differences in copy number state between the different histological subtypes. As expected, genome-wide copy number aberrations were most distinct in HGSC. Interestingly, the other subtypes also harbored extensive copy number aberrations, although in LGSC samples to a lesser extent (Fig. S3a). The average non-diploid fraction (combined haploid and polyploid fraction) was 0.58 for HGSC samples, compared to 0.26 for LGSC and 0.38 and 0.37 for endometrioid and clear cell carcinoma samples respectively. Merged CNA profiles per subtype revealed common changes such as a gain of chromosomes 1q and 8q (Fig. S3b).
Mutational signatures reflect treatment history and tumor biology
To assess the footprint of biological factors and treatment effects on the tumor genome, we assessed which mutational signatures were present in our cohort. We derived the contribution of COSMIC v.3 mutation profiles to our samples and identified major contributing profiles for eight single bases substitutions (SBS), five double base substitutions (DBS) and five indel (ID) signatures (Fig. S4a-c, Table S5). The identified signatures (Fig. S4d-f) represented mutational processes previously linked to aging (SBS1, 5, 40, DBS2 and 4, ID1, 2, 5, 8), platinum exposure (SBS31 and 35, DBS5) and defective HR (SBS3, ID6). Our findings confirm a recent analysis that identified these mutational signatures in a similar sized cohort of primary and metastatic ovarian cancer samples20. Increased exposure to platinum was associated with higher absolute contributions of the platinum-associated signatures SBS31, SBS35 and DBS5 (Fig. 3a-c). Further, samples classified as homologous recombination (HR) deficient according to HR-classifier CHORD21 had a significantly higher contribution of SBS3, ID6 and ID8 (Fig. 3d-f). Both ID6 and ID8 have been linked to double strand break repair through non-homologous end joining (NHEJ)20. Double strand breaks can be induced by radiotherapy22, though we did not observe an association between ID8 and prior exposure to ionizing radiation therapy (Fig. 3g). Taken together, the genomic landscape of metastatic EOC is shaped by both endogenous and exogenous clinically relevant mutational processes as multiple mutational processes are simultaneously active in every patient.
Unsupervised clustering based on genomic input reveals seven distinct clusters
We subsequently performed unsupervised hierarchical clustering analysis of the entire cohort based on genomic features to identify subgroups with distinct characteristics (Fig. 4). Input consisted of sample ploidy (haploid/diploid/polyploid fraction), the number and relative mutation frequencies of single base substitutions (C>A, C>G, C>T, T>A, T>C, T>G), double base substitutions, indels (insertions and deletions in the context of repetitive regions, regions with microhomology and others) and SV categories (deletions, duplications, inversions, insertions and translocations, divided in long and short events (≥ or <10,000bp)). PCA analysis on the clustered data identified short structural deletions (<10,000bp), small deletions with microhomology, and ploidy state as the most important discriminating features of the cohort (Fig. S5b-c). We annotated the cluster plot with clinical and genetic data and identified seven distinct clusters (Fig. 4 and S6). Bootstrapping analysis revealed a high degree of stability of the clusters (Fig. S5d). Next, we analyzed the distribution of clinical features across the seven clusters. Primary disease samples did not cluster together but were distributed across six clusters. This is in line with our observation that primary samples are comparable to samples from recurrent disease, regarding average ploidy, WGD status and total number of SNVs and SVs. While the main histological subtype, HGSC, was present in all clusters, LGSC samples concentrated in cluster III. No obvious clustering of the other subtypes was observed, likely due to low numbers in these groups. Further, no clustering was observed according to the biopsy site, patient age at biopsy or disease spread at primary diagnosis (FIGO stage).
Clusters with specific genomic and clinical features have potential clinical impact
Assessment of the individual clusters revealed distinct genomic and clinical characteristics with potential prognostic importance. The first two clusters, I and II, were characterized by a large number of mutations and structural variants. Of these, the most prominent were small deletions with flanking microhomology and short structural deletions and duplications (<10,000bp) (Fig. 4 and S6). On average, the length of structural deletions in cluster I and II (202 and 235 bp) was markedly shorter compared to the remaining clusters (24,473 bp) (p=2.2e-16) (Fig. S7a). Notably, the HR-classifier CHORD independently classified 100% of the samples in cluster I and II as HR-deficient based on genome-wide mutation patterns21. In contrast, all of the samples in the remaining clusters were classified as HR-proficient. Within CHORD, a further distinction can be made between BRCA1- and BRCA2-subtype HR-deficiency. BRCA1- and BRCA2-type HR-deficient samples were randomly distributed across cluster I and II. Biallelic inactivation of BRCA1 or BRCA2 that explained the HR-deficiency genotype was observed in 16/42 (38%) samples in cohort I and II, while 3/42 (7%) samples harbored a single somatic variant. Samples in cluster I and II were among the samples with the highest contribution of mutational signature ID6, which is characterized by deletions with flanking microhomology and associated with HR-deficiency (Fig. S7b). Additionally SBS3, which has been attributed to HR-deficiency, was highly represented in these two clusters (Fig. S7b). Cluster I and II consisted mainly of HGSC, but also included a poorly differentiated endometrioid and a clear cell carcinoma. The majority of samples in the HR-deficient clusters had an aberration in TP53 (40/42, 95%), in line with the high number of HGSC samples in these clusters. The main difference between cluster I and II was related to sample ploidy. Cluster II was largely polyploid and all samples had undergone whole genome duplication (WGD), whereas the average genome ploidy of cluster I was two. Additionally, NF1 aberrations were present in the majority of cluster I (9/17, 53%), while they were rarely observed in cluster II (2/25, 8%). Despite missing data, a trend of improved response to treatment after biopsy was observed in cluster I and II. 17/35 tumors (49%; N=7 missing data) were sensitive to subsequent treatment (CR or PR), versus 18/60 tumors (30%; N=30 missing data) in the remaining cohort (Fig. 4). Additionally, survival in the two HR-deficient clusters was among the highest in the cohort. One-year survival from biopsy was 88% (15/17) and 72% (18/25) in cluster I and II, respectively, compared to 52% (47/90) in the remaining cohort, indicating a prognostic advantage for patients within these clusters (Fig. S8).
In contrast to cluster I and II, cluster III comprises a genomically stable subgroup (Fig. 4). The tumors in this cluster harbor the lowest numbers of SNVs and SVs and have diploid genomes (Fig. S6). The majority of samples were wild type for TP53 (16/20, 80%). Most samples in this cluster were of the LGSC subtype. However, also five HGSC were present in this cluster, of which three harbored a mutation in TP53. Genomically stable tumors tend to respond poorly to chemotherapeutic treatment23. Overall response to treatment in this cluster was poor (SD/PD 11/14, 79%; N=6 missing data) and the one-year survival from biopsy was 53% (10/19) (Fig. 4 and S8).
The distinguishing feature of cluster IV is the abundance of long duplications (Fig. 4, S6, S7a and c). Long duplications comprised 51% of SVs in cluster IV compared to 16% in the remaining clusters. A tandem duplicator phenotype (TDP) was previously identified in ovarian cancer and distinguished six subgroups based on tandem duplication span size24. The duplication length in cluster IV peaked around 231Kb, correlating with TDP group two. TDP group two is characterized by TP53 mutations and additionally CCNE1 pathway activation in a third of samples. Likewise, the majority of samples in our duplication cluster IV harbored a TP53 mutation (10/11, 91%) and in 27% (3/11) the CCNE1 pathway was activated, either through CCNE1 amplification (HMF002954) or a mutation in FBXW7 (HMF001177 and HMF002316) (Table S6). In contrast, only 7% (8/121) of the samples outside cluster IV had an amplification of CCNE1 and no other FBXW7 mutations were observed.
Cluster VII harbored most indels at repeat regions and most C>T SNVs (Fig. 4, Fig. S6). The majority of patients either had a mutation in TP53 or a mutation in KRAS (15/18, 83%). Mutations in KRAS and TP53 were mutually exclusive, indicating distinct underlying tumor biology processes. Notably, across all cohorts, missense mutations in KRAS and aberrations of TP53 were rarely observed simultaneously (2/16, 13%), while amplification of KRAS was only observed in the presence of TP53 aberrations (7/7, 100%). Cluster VII further consisted of a mix of all histological subtypes. The majority of patients in this cluster were treated with platinum-based agents after biopsy. Response to treatment was missing for most patients, but importantly the one-year survival in this cluster was the worst in the cohort (8/18, 44%) (Fig. 4 and S8).
High frequency of actionable targets in the poor response cluster
We assessed whether the genomic data revealed actionable targets in our cohort. We evaluated both level A (approved) and level B (experimental) drugs, for on- and off-label indications. We restricted this analysis to drugs not available as standard of care for patients with metastatic ovarian cancer (excluding platinum agents and PARP-inhibitors). In total, 15 actionable genes were identified for which 59 different therapies were available (Table S7). Nine genes were affected by mutations, three by an amplification and one by a deletion. Also, four fusion genes were detected. In 57 samples (43%) an actionable target was identified, yet none of these patients were actually treated accordingly. In nearly all cases it concerned category B off label drugs, which refers to drugs for which experimental evidence is available based on studies on other cancer types. The genes with a potential actionable target in most people were KRAS (hotspot mutation in 16 patients, amplified in seven additional patients), NF1 (11 patients) and PIK3CA (seven patients) (Table S7). Targetability per cluster varied widely (16-83%) (Fig. 5). Despite a high number of duplications in cluster IV, the patients in this cluster did not harbor any targetable amplified genes. Importantly, in the cluster with the worst survival most patients were treated with platinum-based agents (cluster VII), while this cluster contained the highest fraction of patients with an actionable target, 83% (15/18 samples). Genome-informed treatment stratification might therefore have improved the prognosis of the patients in this cluster.
Intrapatient genomic stability and actionability over time
For five patients two subsequent time points during recurrent disease were sequenced. Four pairs were assigned next to each other in the same cluster, indicating that for these patients genomic profiling of recurrent disease was not influenced by the timing of the biopsy (Fig. 4, sampleIDs indicated with an asterisk). In line with this, no difference in targetability between the two biopsies of these patients was observed. In contrast, the two biopsies obtained from the fifth patient (HMF000019) had distinct genomic profiles and were clustered apart from each other in cluster II and V, respectively (Fig. 4, Table S6). These biopsies were sampled 10 months apart from distinct tumor locations. Additionally, the mutational profile of both samples showed a completely different pattern. The biopsy obtained at the first time point presented with inactivation of TP53 and KMT2C (HMF000019B), while the second biopsy lacked these inactivations and harbored an actionable KRAS variant (HMF000019A), indicating the presence of two primary tumors.
Discussion
In conclusion, we analyzed WGS data and clinical data of patients with metastatic ovarian cancer. The described cohort is representative of the mixed clinical population of patients with (recurrent) ovarian cancer, including both serous and non-serous histological subtypes and varying treatment histories. Simple genomic clustering of this heterogeneous cohort resulted in subgroups with potential clinical impact. We identified seven clusters including two HR-deficiency clusters (I+II), a genomically stable cluster (III), a duplication cluster (IV) and a poor survival cluster (VII) in which clustering-based stratification could potentially improve outcomes. Patients in the HR-deficiency cluster will benefit more from standard chemotherapeutics and PARP-inhibitors compared to patients from the genome stable cluster. Actionability analysis revealed targets for treatment, with special promise for the cluster VII in which an actionable target was identified in 83% of the patients. Importantly, most of the patients in this cluster were treated with platinum, and one-year survival from biopsy was only 44%. In contrast to a previous hypothesis on a higher rate of actionable amplified genes in samples with a duplication phenotype24, we did not find evidence to support this claim in cluster IV, possibly due to low numbers.
We showed that genomic clustering can classify tumors beyond the traditional histopathological classification parameters. While most LGSC samples clustered in the genomically stable cluster (III), two LGSC samples (both wildtype for TP53 and KRAS mutant) ended up in the cluster with the worst prognosis, cluster VII. These samples had few SVs like the other LGSC samples, but much more nucleotide variants and one of them had undergone whole genome duplication. Furthermore, we identified five HGSC samples in an unexpected cluster, the genomically stable cluster (III). These five samples had lower mutation and SVs counts compared to most HGSC samples. Notably, two out of five samples were wildtype for TP53, which according to updated histopathological guidelines (taking P53 staining into account) might classify them as LGSC. However, the other three samples are likely true HGSC samples, based on biallelic inactivation of TP53 in two of them and a TP53 missense mutation in the other. Identifying these outliers could improve treatment stratification. While (HGSC) samples in cluster III could benefit from a more targeted approach as they might not respond well to standard chemotherapeutics, (LGSC) samples in cluster VII might benefit from early targeted treatment to improve their poor prognosis. These hypotheses should be confirmed in prospective trials with larger cohorts.
Previously, Tothill et al. identified six molecular subtypes of ovarian cancer based on gene expression data, of which four were confirmed as HGSC subtypes in the TCGA dataset6,25. The data in our cohort was restricted to WGS and therefore we were not able to correlate our clusters with these transcriptomic subtypes. Integrated WGS and RNAseq analysis of larger datasets should be conducted to identify overlap between the transcriptomic subtypes and the genomic clusters described here. As the applications of exome and WGS are emerging in the clinic, this opens the way for actionability analysis which can identify therapeutic options for individual patients beyond standard treatment regimens26–28. Integrating WGS in randomized controlled trials will allow us to evaluate whether treatment allocation can be improved using both histopathological and genomic information. The results from such trials will bring us closer to an individualized treatment approach, and ultimately, to increasing survival rates for patients with ovarian cancer.
Methods
Patient inclusion, sample selection and clinical data collection
For this study we obtained the WGS data of patients with advanced or metastatic ovarian cancer that were included in clinical studies CPCT-02 (NCT0855477) and DRUP (NCT02925234). The Institutional Review Board of the UMC Utrecht and the Netherlands Cancer Institute (IRB UMCU/NCI) approved both studies and all patients provided written informed consent. Part of our cohort (N=95) was previously included and described as part of the pan-cancer analysis paper by Priestley et al10. Clinical data was obtained from the CPCT-02 and DRUP study registries. Additional clinical data was acquired through a query at the Dutch Cancer Registration (DCR) with permission of the local CPCT-02/DRUP principal investigators. This concerned data routinely collected for clinical use, including data related to primary disease (date of diagnosis, histopathological diagnosis, FIGO stage and treatment details), as well as vital status obtained from the population registry and updated till January 31st, 2020. Patients with non-epithelial tumors were excluded (N=7), to limit our cohort to EOC. Patients with serous ovarian carcinoma were further classified according to differentiation grade. Poorly (grade 3) and moderately (grade 2) differentiated serous carcinomas were classified as HGSC, while well (grade 1) differentiated serous carcinomas were classified as LGSC. In case the differentiation grade was unknown, we classified serous carcinomas as serous carcinoma not otherwise specified (NOS). Disease status was defined based on treatment history prior to biopsy collection and subsequent treatment. Patients that had received systemic treatment prior to biopsy were regarded as having recurrent disease. Patients that were treatment-naive prior to biopsy were regarded as having primary disease. For these patients, it was confirmed that subsequent treatment consisted of standard first line carboplatin and paclitaxel combination treatment. Response to post-biopsy treatment was registered according to the RECIST criteria for solid tumors based on CT-imaging and ranged from complete response (CR), partial response (PR) and stable disease (SD) to progressive disease (PD). In one case clinical progression was observed, which was regarded as PD.
WGS and data analysis
DNA isolation, WGS, and variant calling (SNV, indel, SV, CNA) were performed as described previously by Priestley et al. on solid tumor biopsies with more than 30% tumor purity (according to histopathological assessment) and matching blood samples10. Tumors were sequenced to a median depth of 103x (range 74-137x) and paired germline reference samples to 38x (range 27-54x) (Fig. S2a, Table S2). Lower tumor purity scores did not result in a decreased variant detection rate (Fig. S2b). In contrast to Priestley et al., we kept multiple samples from individual patients. For five patients, two subsequent timepoints of recurrent disease were included (Table S1, biopsy count). The ploidy value and tumor purity (>20%) for each sample were obtained from PURPLE. For the ploidy fraction we defined all ranges with a copy number between 1.5 and 2.5 as diploid, the range below as haploid and the range above as polyploid.
Gene mutation burden analysis
To find significantly mutated driver genes in our cohort we used the R package dNdSCV 17. We combined the VCFs for each of the five patients that had two samples. We ran dNdSCV on the whole cohort and also on the primary and recurrent subsets. We used a qglobal_cv cut-off of <0.1. To identify genes with driver capacities on a per sample basis, we employed the method described in Priestley et al10. Additionally, we extracted germline mutations in BRCA1 and BRCA2 with expected pathogenic effect, by selecting for frameshift and nonsense variants. Samples with a germline variant in BRCA1 or BRCA2 were additionally assessed for loss of heterozygosity. The top 10 genes that were affected in most of our samples were included in the annotation of the cluster plot, as well as BRCA1 and BRCA2.
Mutational signatures
We counted the occurrence of SNV, indel, and di-nucleotide variation (DNV) contexts that are described at https://cancer.sanger.ac.uk/cosmic/signatures. This included: (i) 96 trinucleotide contexts, which are composed of one of six classes of base substitutions (C>A, C>G, C>T, T>A, T>C, T>G) in combination with the immediate 5’ and 3’ flanking nucleotides, di-nucleotide variation (DNV); (ii) indels stratified by homopolymer length, the presence of flanking repeats or the presence of flanking homology. Additionally, structural variation contexts were extracted by counting translocations, insertions, inversions, deletions and duplications, categorized below and above 10Kb.
Mutational signature analysis was performed with the bioconductor R-package “MutationalPatterns”29. We derived the absolute contribution of the COSMIC v.3 signatures from the mutation context counts for each sample using the “golden ratio search” method. We initially included all COSMIC signatures, and ranked them in a top-list by their overall contribution to our samples. We then evaluated which COSMIC signatures contribute consistently to a high number of our samples in runs with fewer signatures from that top-list (Fig. S4a). We decided on eight SBS, five DBS and five ID COSMIC signatures that contribute consistently to our samples.
Determining HR-status
We assessed HR-status using the Classifier of HOmologous Recombination Deficiency (CHORD), a random forest model that uses the relative counts of various mutation contexts (primarily deletions with flanking microhomology and 1-100kb structural duplications) for predicting HR-deficiency21.
Cluster analysis
As input for the clustering we used the categorized SNV, DNV, indels and SV counts, and the ploidy fraction. Normalization for each feature was done by mean-centering the values and then dividing them by the standard deviation. Features that are not independent from each other were centered and normalized together instead of feature-wise to preserve individual differences, i.e., the SNVs (C>A, C>G, C>T, T>A, T>C, T>G) were processed together, as well as the indels and the SVs. The distance measure used was (1 - pearson correlation) and hierarchical clustering was performed with the “ward.D” method. To identify the optimal number of clusters we employed the elbow method. In the elbow plot (see supplementary plots) no clear elbow was visible (Fig. S5a). We therefore applied bootstrapping using the package “pvclust”30 to determine at which cluster number stable clusters (p-value above 95%) occur (Fig. S5d). We also performed principle component analysis (PCA) on the data and found that seven principal components explained 95% of the variance (Fig. S5b). We decided on seven clusters as the optimal number after assessing the results from the bootstrapping and the PCA.
Actionability analysis
To identify actionable targets per patient, driver genes were explored in three databases: CIViC31, CGI32 and OncoKB33. Both approved (level A - validated association) and experimental (level B - clinical evidence) therapies were extracted, and a further differentiation was made based on on- and off-label availability for ovarian cancer patients. Drugs available as standard treatments for primary and or recurrent ovarian cancer (platinum agents and PARP-inhibitors) were excluded from this actionability analysis.
Statistics
Plotting and statistical analyses were performed using R (software package, version 3.5.2). Kaplan meier plots were created using the R-package Survival (https://cran.r-project.org/web/packages/survival/). Statistical tests were performed with the Wilcoxon rank sum test. Correction for multiple testing was performed with the Bonferroni correction.
Data availability
All data described in this study are freely available for academic use from the Hartwig Medical Foundation through standardized procedures and request forms that can be found at https://www.hartwigmedicalfoundation.nl/en/appyling-for-data/.
Data Availability
All data described in this study are freely available for academic use from the Hartwig Medical Foundation through standardized procedures and request forms that can be found at https://www.hartwigmedicalfoundation.nl/en/appyling-for-data/.
Author contributions
Conceptualization: CW, JK, RZ, EC, PW
Methodology: CW, JK, RZ, EC, PW
Software: CW, JK, AH, LN
Validation: CW, JK
Formal analysis: CW, JK, AH, LN
Investigation: CW, JK
Resources: CW, IB, MJ, PO, CSM, MS, RZ, PW
Data curation: CW, JK
Writing - original draft preparation: CW, EC, RZ, PW
Writing - review and editing: all authors
Visualization: CW, JK
Supervision: WK, RZ, EC, PW
Project administration: CW, JK
Funding acquisition: WK, RZ, PW
Competing Interests Statement
M. Jalving reports the following competing interests: Advisory Board, honoraria to institution: Merck, BMS, Novartis, Pierre Fabre, Tesaro, AstraZeneca. Clinical studies: BMS, AbbVie, Merck, Cristal Therapeutics. The other authors declare no competing interests.
Supplementary data
Figure S1. Treatments before biopsy and recurrence patterns.
Figure S2. WGS quality control and primary-recurrence comparison.
Figure S3. Genome-wide copy number aberrations per histological subtype.
Figure S4. Mutational signatures.
Figure S5. Unsupervised hierarchical clustering.
Figure S6. Distribution of features of the unsupervised hierarchical clustering.
Figure S7. Distribution of SVs and mutational signatures per cluster.
Figure S8. Survival probability per cluster.
Table S1. Clinical data.
Table S2. Genomic data.
Table S3. Significant driver genes on cohort level - dNdSCV.
Table S4. Significant copy number aberration peaks - GISTIC.
Table S5. COSMIC mutational signatures - relative contributions.
Table S6. Driver genes analysis.
Table S7. Actionability analysis.
Acknowledgements
This publication and the underlying data have been made possible partly on the basis of the data that Hartwig Medical Foundation and the Center of Personalized Cancer Treatment (CPCT, The Netherlands) have made available to the study. We thank all local principal investigators, medical specialists and nurses of all participating centers for their contribution to patient inclusion for this study. We thank the members of the Cuppen and van Haaften laboratories for their valuable input. We acknowledge the Dutch Cancer Registration and thank Maaike van der Aa and Reini Bretveld for their help with obtaining additional clinical data. Figure 1a was created with icons from www.flaticon.com. This work was supported by the Gieskes Strijbis Foundation (1816199) and the Dutch Cancer Society (UU2015-7743).