PT - JOURNAL ARTICLE
AU - Ryan L. Collins
AU - Joseph T. Glessner
AU - Eleonora Porcu
AU - Lisa-Marie Niestroj
AU - Jacob Ulirsch
AU - Georgios Kellaris
AU - Daniel P. Howrigan
AU - Selin Everett
AU - Kiana Mohajeri
AU - Xander Nuttle
AU - Chelsea Lowther
AU - Jack Fu
AU - Philip M. Boone
AU - Farid Ullah
AU - Kaitlin E. Samocha
AU - Konrad Karczewski
AU - Diane Lucente
AU - Epi25 Consortium
AU - James F. Gusella
AU - Hilary Finucane
AU - Ludmilla Matyakhina
AU - Swaroop Aradhya
AU - Jeanne Meck
AU - Dennis Lal
AU - Benjamin M. Neale
AU - Jennelle C. Hodge
AU - Alexandre Reymond
AU - Zoltan Kutalik
AU - Nicholas Katsanis
AU - Erica E. Davis
AU - Hakon Hakonarson
AU - Shamil Sunyaev
AU - Harrison Brand
AU - Michael E. Talkowski
TI - A cross-disorder dosage sensitivity map of the human genome
AID - 10.1101/2021.01.26.21250098
DP - 2021 Jan 01
TA - medRxiv
PG - 2021.01.26.21250098
4099 - http://medrxiv.org/content/early/2021/01/28/2021.01.26.21250098.short
4100 - http://medrxiv.org/content/early/2021/01/28/2021.01.26.21250098.full
AB - Rare deletions and duplications of genomic segments, collectively known as rare copy number variants (rCNVs), contribute to a broad spectrum of human diseases. To date, most disease-association studies of rCNVs have focused on recognized genomic disorders or on the impact of haploinsufficiency caused by deletions. By comparison, our understanding of duplications in disease remains rudimentary as very few individual genes are known to be triplosensitive (i.e., duplication intolerant). In this study, we meta-analyzed rCNVs from 753,994 individuals across 30 primarily neurological disease phenotypes to create a genome-wide catalog of rCNV association statistics across disorders. We discovered 114 rCNV-disease associations at 52 distinct loci surpassing genome-wide significance (P = 3.72×10⁻⁶), 42% of which involve duplications. 
Using Bayesian fine-mapping methods, we further prioritized 38 novel triplosensitive disease genes (e.g., GMEB2 in brain abnormalities), including three known haploinsufficient genes that we now reveal as bidirectionally dosage sensitive (e.g., ANKRD11 in growth abnormalities). By integrating our results with prior literature, we found that disease-associated rCNV segments were enriched for genes constrained against damaging coding variation and identified likely dominant driver genes for about one-third (32%) of rCNV segments based on de novo mutations from exome sequencing studies of developmental disorders. However, while the presence of constrained driver genes was a common feature of many pathogenic large rCNVs across disorders, most of the rCNVs showing genome-wide significant association were incompletely penetrant (mean odds ratio=11.6) and we also identified two examples of noncoding disease-associated rCNVs (e.g., intronic CADM2 deletions in behavioral disorders). Finally, we developed a statistical model to predict dosage sensitivity for all genes, which defined 3,006 haploinsufficient and 295 triplosensitive genes where the effect sizes of rCNVs were comparable to deletions of genes constrained against truncating mutations. These dosage sensitivity scores classified disease genes across molecular mechanisms, prioritized pathogenic de novo rCNVs in children with autism, and revealed features that distinguished haploinsufficient and triplosensitive genes, such as insulation from other genes and local cis-regulatory complexity. 
Collectively, the cross-disorder rCNV maps and metrics derived in this study provide the most comprehensive assessment of dosage sensitive genomic segments and genes in disease to date and set the foundation for future studies of dosage sensitivity throughout the human genome.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: These studies were supported by the National Institutes of Health grants HD081256, NS093200, HD096326, and MH106826. R.L.C. was supported by NHGRI T32HG002295 and NSF GRFP #2017240332. H.B. was supported by NIDCR K99DE026824. This work was supported by grants from the Swiss National Science Foundation (31003A_182632 to A.R. and 310030-189147, 32473B-166450 to Z.K.). M.E.T. was supported by Desmond and Ann Heathwood.
Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: This study was approved by the Partners Healthcare Institutional Review Board Protocol #2013P000323. Data from the UK BioBank was accessed via application #50765 (PI: Talkowski), and data from the Simons Foundation for Autism Research Initiative was accessed via SFARIbase application #573206. All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. 
I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes.
Code Availability: All code used in this study has been provided in a single repository on GitHub (https://github.com/talkowski-lab/rCNV2). Where applicable, scripts have been provided with documentation and help text. We also have provided a Docker image hosted on DockerHub (https://hub.docker.com/r/talkowski/rcnv) and Google Container Registry (https://gcr.io/gnomad-wgs-v2-sv/rcnv) that contains all dependencies necessary to execute the code identically as presented in this study.
Data Availability: Most data generated in this study, including summary statistics from association tests, have been provided as Supplemental Tables or Supplemental Files. Large Supplemental Data Files have been temporarily hosted in a public Google Cloud Storage Bucket until formal publication in a peer-reviewed journal, as described in the Supplemental Information. Data from existing publications or public resources can be accessed according to their original source, as described in the corresponding Methods section detailing their curation. All other data not otherwise described here or in the Methods will be made available upon request.
https://storage.googleapis.com/rcnv_project/public/collins_medrxiv_2021/sliding_window_sumstats.tar.gz
https://storage.googleapis.com/rcnv_project/public/collins_medrxiv_2021/gene_based_sumstats.tar.gz