Data Availability
For our discovery cohort, we used case data from TCGA and control data from twelve population-based studies in the database of Genotypes and Phenotypes (dbGaP; http://www.ncbi.nlm.nih.gov/gap). We downloaded TCGA germline WES BAM files from the National Cancer Institute Cancer Genomics Hub (cgHub), which is no longer online; it was the predecessor to the Genomic Data Commons (https://portal.gdc.cancer.gov). We extracted control FASTQ files from the NCBI Sequence Read Archive (SRA) for the following dbGaP studies: phs000209, phs000276, phs000296, phs000298, phs000424, phs000654, phs000687, phs000806, phs000876, phs000971, phs001000 and phs001101. For replication, we used the exome calls from the BioMe Biobank of the Icahn School of Medicine at Mount Sinai (ISMMS; http://biomebiobank.mssm.edu:8080/biobank/service/biobankHome.jsp).