Abstract
Most current studies rely on short-read sequencing to detect somatic structural variation (SV) in cancer genomes. Long-read sequencing offers better mappability and long-range phasing, which substantially improve germline SV detection. However, current long-read SV detection methods do not generalize well to the analysis of somatic SVs in tumor genomes with complex rearrangements, heterogeneity, and aneuploidy. Here, we present Severus: a method for the accurate detection of different types of somatic SVs using a phased breakpoint graph approach. To benchmark various short- and long-read SV detection methods, we sequenced five tumor/normal cell line pairs with Illumina, Nanopore, and PacBio sequencing platforms; on this benchmark, Severus showed the highest F1 scores (harmonic mean of precision and recall) compared with other long-read and short-read methods. We then applied Severus to three clinical cases of pediatric cancer, demonstrating concordance with known genetic findings as well as revealing clinically relevant cryptic rearrangements missed by standard genomic panels.
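The F1 score mentioned above can be illustrated with a minimal sketch. It assumes SV calls have already been matched against a truth set and counted as true positives, false positives, and false negatives; the function name `f1_score` and the counts in the usage note are hypothetical and are not taken from the benchmark.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return F1, the harmonic mean of precision and recall, for one call set.

    The three counts are assumed to come from an upstream comparison of
    SV calls against a truth set; that matching step is not shown here.
    """
    called = true_positives + false_positives  # all calls made by the method
    truth = true_positives + false_negatives   # all SVs in the truth set
    if true_positives == 0 or called == 0 or truth == 0:
        return 0.0  # avoid division by zero; no correct calls means F1 = 0
    precision = true_positives / called
    recall = true_positives / truth
    return 2 * precision * recall / (precision + recall)
```

For example, with hypothetical counts of 90 true positives, 10 false positives, and 30 false negatives, precision is 0.90, recall is 0.75, and `f1_score(90, 10, 30)` is about 0.818.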
Competing Interest Statement
S.A. is an employee and stockholder of Oxford Nanopore Technologies. A.K., P.C., K.S., D.C., and A.C. are employees of Google LLC and own Alphabet stock as part of the standard compensation package. E.G. served on advisory boards for Jazz Pharmaceuticals and Syndax Pharmaceuticals. M.S.F. is part of the speakers bureau for Bayer and PacBio. The remaining authors declare no competing interests.
Funding Statement
This work was supported in part by the Intramural Research Program of the NIH. This work utilized the computational resources of the NIH HPC Biowulf cluster (http://hpc.nih.gov). ONT sequencing of the HCC1395 cell line was supported by the National Cancer Institute of the National Institutes of Health under Award Number U01CA253405. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The authors would like to thank the patients and families who donated their samples for this research. M.S.F. and E.G. would like to thank Braden's Hope for Childhood Cancer, Elizabeth and Monte McDowell, the Black & Veatch Foundation, and Big Slick for their generous support. M.S.F., E.G., and L.L. would also like to thank Children's Mercy Oncology Biorepository study personnel Judy Vun, Amie Hatfield, and Robin Ryan, as well as Jason Seymour and Keiondra Sanders in the Children's Mercy Research Institute (CMRI) Biorepository, for their assistance with sample collection and processing; and Maggie Gibson, Adam Walter, and Laura Puckett in the CMRI Genomics Core for their assistance with sequencing. Y.L. is funded by the NCI-UMD Partnership Program. E.K.M. was supported by the State of Maryland. B.P. was supported by the National Human Genome Research Institute (NHGRI) under award numbers R01HG010485, U01HG013748, U24HG011853, U24HG010262, and U41HG010972, and by NIH award OT2OD033761.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
For the cell line analysis, the Institutional Review Board of the National Institutes of Health considers patient-derived cell lines to be non-human subjects, and no approval was required. There are, however, ethical considerations, as the cell lines were derived prior to establishing the research use consent mechanism, and no such consent was received. Commercially available cell lines used in this study are anonymized, and the risks of identifying original patients or their immediate family members are low. On the other hand, openly releasing these data will significantly benefit research into developing new methods for detecting somatic variants, a critical task in current and future precision cancer therapies. We concluded that the benefits outweigh the risks and followed the practices established by the NCI and NHGRI in the TCGA tumor cell line data release (https://www.cancer.gov/ccg/research/genome-sequencing/tcga/history/ethics-policies). For the three leukemia/lymphoma cases, patients were enrolled by Children's Mercy Hospital (CMH) into its institutional Tumor Bank research study, which was approved by the CMH Institutional Review Board and included patient consent for the collection, processing, storage, and sequencing of patient samples.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes