Abstract
Despite the abundance of somatic structural variations (SVs) in cancer, the molecular mechanisms underlying their formation remain unclear. Here, we use 6,193 whole-genome sequenced tumors to study how collisions between transcription and DNA replication contribute to genome instability. After deconvolving robust SV signatures in three independent pan-cancer cohorts, we detect transcription-dependent replicative strand bias, the expected footprint of transcription-replication collision (TRC), in large tandem duplications (TDs). Large TDs are abundant in female-enriched cancers, upper gastrointestinal tract cancers, and prostate cancers. They are associated with poor patient survival and with mutations in TP53, CDK12, and SPOP. Cells in which CDK12 is inactivated or suppressed display significantly more TRCs and R-loops. Inhibition of WEE1, a cell cycle regulator that promotes DNA repair, selectively inhibits the growth of cells that have lost CDK12. Our data suggest that large TDs in cancer form as a result of TRC, and that their presence can be used as a biomarker for prognosis and treatment.
Competing Interest Statement
A. Ashworth reports personal fees from Tango Therapeutics, Azkarra Therapeutics, Ovibio, Kytarro, Cytomx, Cambridge Science Corporation, Genentech, Gladiator, Circle, Bluestar, Earli, Ambagon, Trial Library, Phoenix Molecular Designs, GSK, and Prolynx, and grants from SPARC and AstraZeneca, outside the submitted work; in addition, he holds patents on the use of PARP inhibitors, held jointly with AstraZeneca, from which he has benefited financially (and may do so in the future). F.Y. Feng reports personal fees from Bluestar Genomics, Astellas, Foundation Medicine, Exact Sciences, Tempus, POINT Biopharma, Janssen, Bayer, Myovant, Roivant, SerImmune, Bristol Myers Squibb, and Novartis outside the submitted work, and other support from Artera. J. Chou reports consulting fees from Exai Bio outside the submitted work.
Funding Statement
The work was supported by the University of Chicago and UChicago Comprehensive Cancer Center (L.Y.). The work was also supported by Young Investigator Awards from the Prostate Cancer Foundation (to J.C. and H.L.), a Challenge Award from the Prostate Cancer Foundation (to F.Y.F.), the Department of Defense Physician's Research Award (W81XWH-20-1-0136 to J.C.), as well as funds from the Benioff Initiative for Prostate Cancer Research at UCSF (to J.C., M.K., F.Y.F. and A.A.) and the Martha and Bruce Atwater Breast Cancer Research Program at UCSF (to M.K.).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
All patient data used in this study were publicly available prior to this manuscript.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
The patient cohorts used in this study are summarized in Supplementary Table S1. There were 2,583 primary tumors in the PCAWG cohort [33], 2,367 metastatic tumors in the Hartwig Medical Foundation cohort [34], 570 metastatic tumors in the POG570 cohort [35], 532 primary breast cancers (BRCA-EU), 115 primary ovarian cancers (OV-AU) and 218 primary prostate cancers (PRAD-UK). Somatic variants (SNVs, CNVs and SVs), gene expression quantifications and clinical information (tumor histology and patient survival) were obtained from https://dcc.icgc.org/pcawg for the PCAWG cohort, https://hartwigmedical.github.io for the Hartwig cohort, https://www.bcgsc.ca/downloads/POG570/ for the POG570 cohort and https://dcc.icgc.org/ for the OV-AU and PRAD-UK cohorts. Gene expression data for BRCA-EU were provided by Dr. Marcel Smid. Molecular subtypes for a subset of breast cancers in the PCAWG cohort were annotated in a previous study [67]. Gene expression quantifications from tumor-adjacent normal samples of The Cancer Genome Atlas (TCGA) cohort were downloaded from the Genomic Data Commons (https://portal.gdc.cancer.gov/). Somatic SVs (CCLE2019 release) in 328 cancer cell lines and drug response data (PRISM Repurposing 19Q4) [54] were downloaded from the Cancer Dependency Map (DepMap) portal (https://depmap.org/portal/download/all/). Human reference genome assembly GRCh37 (hg19) was used throughout the study. Reference genome mappability was obtained from https://bismap.hoffmanlab.org/. Oncogenes were obtained from the COSMIC Cancer Gene Census (https://cancer.sanger.ac.uk/census). GC content was calculated using bedtools for 50-bp windows flanking both sides of each SV breakpoint. CpG islands, centromeres, telomeres, repeat annotations and lamina-associated domains (LADs) for human Tig3 lung fibroblasts were downloaded from the UCSC Table Browser (https://genome.ucsc.edu/). Non-B DNA annotations were downloaded from non-B DB [68] (https://nonb-abcc.ncifcrf.gov/).
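The breakpoint GC-content step can be illustrated with a small, self-contained sketch. The exact bedtools commands used in the study are not given here, so the code below is an assumption-laden stand-in, not the study's pipeline: it takes an in-memory reference sequence, treats the breakpoint as a 0-based position between two bases (mirroring the half-open intervals bedtools works with), and reports GC fraction for the 50 bp on each side. The function names `gc_fraction` and `flank_gc` are hypothetical, and excluding ambiguous `N` bases from the denominator is our own choice.

```python
from typing import Optional, Tuple


def gc_fraction(window: str) -> Optional[float]:
    """Fraction of G/C among unambiguous A/C/G/T bases (hypothetical helper).

    Ambiguous bases such as N are excluded from the denominator (an assumption);
    returns None if the window contains no A/C/G/T bases at all.
    """
    window = window.upper()
    acgt = sum(window.count(base) for base in "ACGT")
    if acgt == 0:
        return None
    return (window.count("G") + window.count("C")) / acgt


def flank_gc(seq: str, breakpoint: int, window: int = 50) -> Tuple[Optional[float], Optional[float]]:
    """GC fraction of the `window` bp immediately left and right of a breakpoint.

    `breakpoint` is 0-based: the junction lies between seq[breakpoint - 1] and
    seq[breakpoint], so the flanks are the half-open intervals
    [breakpoint - window, breakpoint) and [breakpoint, breakpoint + window).
    A flank that would run off either end of `seq` is reported as None.
    """
    left = seq[breakpoint - window:breakpoint] if breakpoint - window >= 0 else None
    right = seq[breakpoint:breakpoint + window] if breakpoint + window <= len(seq) else None
    return (
        gc_fraction(left) if left is not None else None,
        gc_fraction(right) if right is not None else None,
    )


if __name__ == "__main__":
    # Toy sequence: 50 A's followed by 50 G's, breakpoint exactly at the switch.
    print(flank_gc("A" * 50 + "G" * 50, 50))  # (0.0, 1.0)
```

On real data one would instead extract the flanking intervals from the GRCh37 FASTA (for example with bedtools, whose `flank` and `nuc` subcommands handle interval construction and base composition); the sketch only makes the window definition explicit.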
ENCODE chromatin immunoprecipitation sequencing (ChIP-seq) data for H3K27ac, H3K27me3, H3K36me3, H3K4me1, H3K4me2, H3K4me3, H3K79me2, H3K9ac, H3K9me3 and H4K20me1 in the GM12878 cell line were downloaded from the UCSC composite track (http://genome.ucsc.edu/cgi-bin/hgFileUi?db=hg19&g=wgEncodeUwHistone). Nucleosome occupancy for the K562 cell line was downloaded from ENCODE (https://www.encodeproject.org/files/ENCFF000VNN/). Topologically associating domain (TAD) boundaries of GM12878 [69] were downloaded from NCBI GEO (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE63525). Whole-exome sequencing (WES) data for the CDK12 knockout (KO) clones have been deposited in the NIH Sequence Read Archive (SRA) under BioProject ID PRJNA932332.