ABSTRACT
Mutational signature analysis is commonly performed in genomic studies surveying cancer and normal somatic tissues. Here we present SigProfilerExtractor, an automated tool for accurate de novo extraction of mutational signatures for all types of somatic mutations. Benchmarking across 33 distinct scenarios encompassing 1,900 simulated signatures operative in more than 60,000 unique synthetic genomes demonstrates that SigProfilerExtractor outperforms thirteen other tools across all datasets, with and without noise. For simulations with 5% noise, reflecting high-quality genomic datasets, SigProfilerExtractor outperforms other approaches by elucidating between 20% and 50% more true-positive signatures while yielding more than 5-fold fewer false-positive signatures. Applying SigProfilerExtractor to 2,778 whole-genome sequenced cancers reveals three previously missed mutational signatures. Two of the signatures are confirmed in independent cohorts, and one of these is associated with tobacco smoking. In summary, this report provides a reference tool for analysis of mutational signatures, a comprehensive benchmarking of bioinformatics tools for extracting mutational signatures, and several novel mutational signatures, including a signature putatively attributed to direct tobacco smoking mutagenesis in bladder cancer and in normal bladder epithelium.
Competing Interest Statement
MV is an employee of NVIDIA Corporation. BSA and LBA are inventors on US Patent 10,776,718 for source identification by non-negative matrix factorization. All other authors declare no competing interests.
Footnotes
This version adds benchmarking of 3 additional tools (total of 13 benchmarked tools), several additional analyses, and minor updates throughout the manuscript.