Mono- and bi-allelic effects of coding variants on disease in 176,899 Finns

Identifying Mendelian diseases with recessive inheritance is challenging because the majority of cases are caused by compound heterozygous genotypes, which can be definitively identified only with sequencing data from families. Bottleneck events, such as the one in the Finnish population, enrich specific homozygous variants to higher frequencies and thus facilitate identification of disease associations through easily recognized homozygous genotypes. Here, we study homozygous and heterozygous effects of 82,516 coding variants on 2,444 disease endpoints using nationwide electronic health record (EHR) data of 176,899 Finns. We find known and novel associations with homozygous genotypes across a broad spectrum of phenotypes, such as retinal dystrophy, adult-onset cataract and female infertility (13/20 of which would have been missed by the traditional additive GWAS model). With these results, and supporting simulations, we demonstrate the added benefit of homozygous scans in GWAS. We further use these results to explore inheritance patterns of known Mendelian variants. We find many Mendelian variants whose inheritance cannot be adequately described with the traditional definition of dominant or recessive. In particular, we find disease risk in heterozygous carriers of variants known to cause disease with recessive inheritance, as well as for variants labeled benign in ClinVar. Our results demonstrate how biobank efforts, particularly in founder populations, can broaden our understanding of the impact of genetic variants.
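
To make the contrast between the additive model and a homozygous (recessive) scan concrete, the following minimal sketch shows how the same genotypes are coded under each model. The function names and the toy data are illustrative assumptions and do not reproduce the study's analysis pipeline.

```python
from collections import Counter


def additive_coding(alt_allele_count):
    """Additive model: the predictor is the alternate allele count (0, 1 or 2)."""
    if alt_allele_count not in (0, 1, 2):
        raise ValueError("expected 0, 1 or 2 alternate alleles")
    return alt_allele_count


def recessive_coding(alt_allele_count):
    """Recessive model: only alternate homozygotes are coded as 1."""
    if alt_allele_count not in (0, 1, 2):
        raise ValueError("expected 0, 1 or 2 alternate alleles")
    return 1 if alt_allele_count == 2 else 0


def case_fraction_by_genotype(genotypes, is_case):
    """Fraction of cases within each observed genotype class."""
    totals = Counter(genotypes)
    cases = Counter(g for g, c in zip(genotypes, is_case) if c)
    return {g: cases[g] / totals[g] for g in sorted(totals)}


# Toy data (made up for illustration): risk is elevated only in homozygotes.
genotypes = [0] * 8 + [1] * 4 + [2] * 2
is_case = [True, True] + [False] * 6 + [True] + [False] * 3 + [True, True]

fractions = case_fraction_by_genotype(genotypes, is_case)
recessive_predictor = [recessive_coding(g) for g in genotypes]
```

In this toy example heterozygotes have the same case fraction as non-carriers, so a per-allele trend has less to work with than the recessive predictor, which isolates the homozygous group directly.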


medRxiv preprint (not certified by peer review); this version posted November 11, 2021; doi: https://doi.org/10.1101/2021.11.06.21265920. The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint.

Collectively, these currently 36 "founder diseases" are referred to as the "Finnish disease [...]

[...] NFE at same sample sizes. As expected, however, observed P/LP variants were at higher MAF in FIN (median MAF = 9.2 × 10⁻⁵) than in NFE (median MAF = 4.7 × 10⁻⁵, p-value = 6 × 10⁻⁷, Wilcoxon rank test), with the difference particularly pronounced in 133 variants in Finnish disease heritage genes [17] (see Supplementary Figure S1 [...]

[...] data of 26 variants, labeled as P/LP in ClinVar by at least one submitter, with genome-wide significant associations in FinnGen (Supplementary Figure S2). We then investigated global effects of known ClinVar variants in a phenome-wide association analysis (pheWAS) of 2,444 disease phenotypes derived from health registry data using SAIGE [27], cognizant of the fact that for many rare variants we may only be powered to identify their disease effects with moderate and not genome-wide levels of significance. To characterize the broader impact of these variants, we compared the ClinVar variants to randomly sampled intergenic variants in 15 MAF bins and in the same 3 Mb windows using different p-value thresholds. As anticipated, we found significantly more phenotype associations than expected at all p-value thresholds for variants that were labeled as P/LP in ClinVar in genes described to cause disease with dominant inheritance (classification: OMIM). We also found a global association with disease phenotypes for variants listed as benign or likely benign (B/LB) in ClinVar, which are regarded as "not implicated in monogenic disease" [30] and often considered neutral [31]. Sixteen B/LB variants were even the most probable causal SNP of a GWAS locus following statistical fine-mapping [24] (see Table 1). The ClinVar annotation [...]
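
The comparison against randomly sampled intergenic variants, matched on MAF bin and 3 Mb window, can be sketched as follows. This is a minimal illustration under stated assumptions: the bin edges, the fixed window grid, the dictionary fields and all function names are hypothetical, since the text does not specify how the 15 MAF bins or the windows were defined.

```python
import bisect
import random

WINDOW_BP = 3_000_000  # 3 Mb window size from the text; the fixed grid is an assumption


def maf_bin(maf, bin_edges):
    """Index of the MAF bin containing `maf`; `bin_edges` are sorted upper bounds."""
    return bisect.bisect_left(bin_edges, maf)


def match_key(variant, bin_edges):
    """Variants are comparable if they share chromosome, window and MAF bin."""
    window = variant["pos"] // WINDOW_BP
    return (variant["chrom"], window, maf_bin(variant["maf"], bin_edges))


def sample_matched_controls(targets, intergenic, bin_edges, rng):
    """Draw one random intergenic variant per target from its matching pool.

    Returns None for a target whose pool is empty.
    """
    pools = {}
    for v in intergenic:
        pools.setdefault(match_key(v, bin_edges), []).append(v)
    matched = []
    for t in targets:
        pool = pools.get(match_key(t, bin_edges), [])
        matched.append(rng.choice(pool) if pool else None)
    return matched


def count_associations(pvalues, threshold):
    """Number of phenotype associations passing a given p-value threshold."""
    return sum(1 for p in pvalues if p < threshold)


# Hypothetical example with three MAF bins instead of fifteen.
edges = [0.001, 0.01, 0.5]
targets = [{"id": "clinvar_1", "chrom": "1", "pos": 4_500_000, "maf": 0.0004}]
intergenic = [
    {"id": "ig_a", "chrom": "1", "pos": 3_200_000, "maf": 0.0006},  # same window and bin
    {"id": "ig_b", "chrom": "1", "pos": 4_900_000, "maf": 0.02},    # same window, other bin
    {"id": "ig_c", "chrom": "1", "pos": 7_000_000, "maf": 0.0005},  # other window
]
controls = sample_matched_controls(targets, intergenic, edges, random.Random(0))
```

Repeating the draw many times and applying `count_associations` to each matched set at several thresholds would give an empirical expectation against which the observed count for the ClinVar variants could be compared.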

In summary, we find global as well as individual association signals of variants previously

We first studied effects of variants previously reported in human disease in ClinVar [28] in >170k participants of FinnGen. Here, we found that multiple variants