Analysis of 200,000 exome-sequenced UK Biobank subjects implicates genes involved in increased and decreased risk of hypertension
=================================================================================================================================

* David Curtis

## Abstract

**Background** Previous analyses have identified common variants, along with some specific genes and rare variants, which are associated with risk of hypertension but much remains to be discovered.

**Methods and Results** Exome-sequenced UK Biobank participants were phenotyped based on having a diagnosis of hypertension or taking anti-hypertensive medication to produce a sample of 66,123 cases and 134,504 controls. Variants with minor allele frequency (MAF) < 0.01 were subjected to a gene-wise weighted burden analysis, with higher weights assigned to variants which are rarer and/or predicted to have more severe effects. Of 20,384 genes analysed, two genes were exome-wide significant, *DNMT3A* and *FES*. Also strongly implicated were *GUCY1A1* and *GUCY1B1*, which code for the subunits of soluble guanylate cyclase. There was further support for the previously reported effects of variants in *NPR1* and protective effects of variants in *DBH*. An inframe deletion in *CACNA1D* with MAF = 0.005, rs72556363, is associated with modestly increased risk of hypertension. Other biologically plausible genes highlighted consist of *CSK, AGTR1, ZYX* and *PREP*. All variants implicated were rare and cumulatively they are not predicted to make a large contribution to the population risk of hypertension.

**Conclusions** This approach confirms and clarifies previously reported findings and also offers novel insights into biological processes influencing hypertension risk, potentially facilitating the development of improved therapeutic interventions.

This research has been conducted using the UK Biobank Resource.
Keywords

* Hypertension
* biobank
* exome
* *DNMT3A*
* *GUCY1A1*
* *GUCY1B1*

## Introduction

Hypertension is an important risk factor for disease and has a heritable component. A recent large genome-wide association study of common variant effects identified 901 loci, with enrichment in relevant tissues (blood vessels, heart, adrenal tissue and adipose tissue) and pathways (angiotensinogen, calcium channels, progesterone, natriuretic peptide receptor, angiotensin converting enzyme, angiotensin receptors and endothelin receptors) (1). Selection pressures tend to mean that common variants individually have small effect sizes and it can be difficult to interpret their biological effects (2). By contrast, rare variants can potentially have large effect sizes and clear biological mechanisms, as exemplified by a number of monogenic causes of hypertension such as congenital adrenal hyperplasia, familial hyperaldosteronism and pseudohypoaldosteronism, which can be caused by variants in *CYP11B1, CYP11B2, WNK1, WNK4, KLHL3, CUL3, SCNN1B, SCNN1G, CYP17A1, HSD11B2, NR3C2* and *KCNJ5* (3). Additionally, variants in *CACNA1H, CACNA1D* and *CLCN2* have now also been identified as causes of familial hyperaldosteronism, while somatic mutations in *ATP1A1* or *ATP2B3* can produce aldosterone-producing adrenal adenomas with consequent hypertension (4). Recessively acting variants in *GUCY1A1* (previously labelled *GUCY1A3*) can cause moyamoya disease, and two unrelated subjects with moyamoya disease who also had achalasia and hypertension were found to have compound heterozygous variants in this gene (5). A more recent study has shown that moyamoya disease is itself a risk factor for hypertension (6).
In a large meta-analysis using hypertension or blood pressure as phenotypes, gene-based analyses aggregating rare, nonsynonymous variants implicated *PTPMT1*, *DBH* and *NPR1*. The same study also showed that the minor allele of a rare, nonsynonymous variant in *DBH*, rs3025380, was associated with lower blood pressure (7). Another study reported that three individual nonsynonymous variants in *NPR1* were associated with increased (rs35479618 and rs116245325) and decreased (rs61757359) blood pressure and showed that this could be explained by the effects of these variants on guanylate cyclase activity (8).

The growing availability of sequence data means that it may become possible to study the wider effects of rare, functional variants in the general population. This may implicate novel genes or may demonstrate a wider role for genes already implicated in severe familial disorders. Exome sequence data is now available for 200,000 of the 500,000 UK Biobank subjects (9). We have recently analysed this in order to illuminate the effect of rare, coding variants on susceptibility to hyperlipidaemia and a number of other common traits with complex inheritance and we now apply the same approach to study the contribution of rare variants to risk of developing hypertension (10).

## Methods

The UK Biobank dataset was downloaded along with the variant call files for 200,632 subjects who had undergone exome-sequencing and genotyping by the UK Biobank Exome Sequencing Consortium using the GRCh38 assembly with coverage 20X at 95.6% of sites on average (9). UK Biobank had obtained ethics approval from the North West Multi-centre Research Ethics Committee which covers the UK (approval number: 11/NW/0382) and had obtained informed consent from all participants. The UK Biobank approved an application for use of the data (ID 51119) and ethics approval for the analyses was obtained from the UCL Research Ethics Committee (11527/001).
All variants were annotated using the standard software packages VEP, PolyPhen and SIFT (11–13). To obtain population principal components reflecting ancestry, version 2.0 of *plink* ([https://www.cog-genomics.org/plink/2.0/](https://www.cog-genomics.org/plink/2.0/)) was run with the options *--maf 0.1 --pca 20 approx* (14,15). The UK Biobank sample contains 503,317 subjects of whom 94.6% are of white ethnicity. As we have discussed previously, it has become standard practice for investigators to simply discard data from participants with other ancestries and we regard this as regrettable (16). We demonstrated that if population principal components are included as covariates then it is possible to include all participants, regardless of ancestry, in the type of weighted burden analysis described here without inflation of the test statistic.

To define cases, an approach similar to that previously implemented for the investigation of hyperlipidaemia and T2D was used (10,17,18). The hypertension phenotype was determined from four sources in the dataset: self-reported diagnosis recorded as hypertension or essential hypertension; reporting taking medication for high blood pressure; reporting taking any of a list of named medications commonly used to treat high blood pressure ([https://www.nhs.uk/conditions/high-blood-pressure-hypertension/](https://www.nhs.uk/conditions/high-blood-pressure-hypertension/)); having an ICD10 diagnosis of essential hypertension, hypertensive heart disease or hypertensive renal disease in hospital records or as a cause of death. Subjects in any of these categories were deemed to be cases with hypertension while all other subjects were taken to be controls.
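The case definition amounts to a logical OR over the four sources. The following is a minimal sketch of that rule, assuming a hypothetical per-participant record with one boolean flag per source. The class and field names are illustrative only and are not UK Biobank field identifiers.

```python
from dataclasses import dataclass


@dataclass
class ParticipantRecord:
    """Hypothetical summary of one participant (field names are assumptions)."""
    self_reported_hypertension: bool  # self-reported (essential) hypertension
    reports_bp_medication: bool       # reports taking medication for high blood pressure
    on_named_antihypertensive: bool   # reports any of the named anti-hypertensive drugs
    icd10_hypertension: bool          # ICD10 code in hospital records or cause of death


def is_case(rec: ParticipantRecord) -> bool:
    """A participant is a case if any one of the four sources is positive."""
    return (
        rec.self_reported_hypertension
        or rec.reports_bp_medication
        or rec.on_named_antihypertensive
        or rec.icd10_hypertension
    )
```

Under this rule a participant with no positive source is a control, so undiagnosed hypertension is counted among controls.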
The same analytic methods as had been used previously were applied, with the description repeated here for the reader’s convenience. The SCOREASSOC program was used to carry out a weighted burden analysis to test whether, in each gene, sequence variants which were rarer and/or predicted to have more severe functional effects occurred more commonly in cases than controls. Attention was restricted to rare variants with minor allele frequency (MAF) <= 0.01 in both cases and controls. As previously described, variants were weighted by overall MAF so that variants with MAF=0.01 were given a weight of 1 while very rare variants with MAF close to zero were given a weight of 10 (16). Variants were also weighted according to their functional annotation using the GENEVARASSOC program, which was used to generate input files for weighted burden analysis by SCOREASSOC (19,20). The weights were informed from the analysis of the effects of different categories of variant in *LDLR* on hyperlipidaemia risk (10). Variants predicted to cause complete loss of function (LOF) of the gene were assigned a weight of 100. Nonsynonymous variants were assigned a weight of 5 but if PolyPhen annotated them as possibly or probably damaging then 5 or 10 was added to this and if SIFT annotated them as deleterious then 20 was added. In order to allow exploration of the effects of different types of variant on disease risk the variants were also grouped into broader categories to be used in multivariate analyses as described below. The full set of weights and categories is displayed in Table 1. As described previously, the weight due to MAF and the weight due to functional annotation were multiplied together to provide an overall weight for each variant. Variants were excluded if there were more than 10% of genotypes missing in the controls or if the heterozygote count was smaller than both homozygote counts in the controls.
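The weighting scheme can be illustrated with a short sketch. It covers only the LOF and nonsynonymous weights quoted in the text; the remaining categories are in Table 1 and are not reproduced here. It reads "5 or 10" as applying to possibly and probably damaging respectively. The text gives only the two end points of the MAF weight (10 near MAF 0, 1 at MAF 0.01), so the straight-line interpolation below is an assumption made for illustration and may not match the function used by SCOREASSOC. All function names and annotation labels are hypothetical.

```python
LOF_WEIGHT = 100          # variants predicted to cause complete loss of function
NONSYNONYMOUS_WEIGHT = 5  # base weight for a nonsynonymous variant
POLYPHEN_BONUS = {"possibly_damaging": 5, "probably_damaging": 10}
SIFT_DELETERIOUS_BONUS = 20


def functional_weight(is_lof, polyphen=None, sift_deleterious=False):
    """Weight from functional annotation (LOF and nonsynonymous cases only)."""
    if is_lof:
        return LOF_WEIGHT
    weight = NONSYNONYMOUS_WEIGHT
    weight += POLYPHEN_BONUS.get(polyphen, 0)
    if sift_deleterious:
        weight += SIFT_DELETERIOUS_BONUS
    return weight


def maf_weight_linear(maf, threshold=0.01, max_weight=10.0):
    """ASSUMPTION: linear fall from max_weight at MAF 0 to 1 at the threshold."""
    return max_weight - (max_weight - 1.0) * (maf / threshold)


def overall_weight(maf, is_lof, polyphen=None, sift_deleterious=False):
    """Overall weight is the product of the MAF weight and the functional weight."""
    return maf_weight_linear(maf) * functional_weight(is_lof, polyphen, sift_deleterious)
```

For example, a nonsynonymous variant annotated as probably damaging and deleterious has a functional weight of 5 + 10 + 20 = 35, which is then scaled by the MAF weight.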
If a subject was not genotyped for a variant then they were assigned the subject-wise average score for that variant. For each subject a gene-wise weighted burden score was derived as the sum of the variant-wise weights, each multiplied by the number of alleles of the variant which the given subject possessed. For variants on the X chromosome, hemizygous males were treated as homozygotes.

[Table 1.](http://medrxiv.org/content/early/2021/02/12/2021.02.10.21251503/T1) The table shows the weight which was assigned to each type of variant as annotated by VEP, PolyPhen and SIFT as well as the broad categories which were used for multivariate analyses of variant effects (11–13).

For each gene, logistic regression analysis was carried out including the first 20 population principal components and sex as covariates and a likelihood ratio test was performed comparing the likelihoods of the models with and without the gene-wise burden score. The statistical significance was summarised as a signed log p value (SLP), which is the absolute value of the log base 10 of the p value, given a positive sign if the score is higher in cases and a negative sign if it is higher in controls.

Gene set analyses were carried out as before using the 1454 “all GO gene sets, gene symbols” pathways as listed in the file *c5*.*all*.*v5*.**.*symbols*.*gmt* downloaded from the Molecular Signatures Database at [http://www.broadinstitute.org/gsea/msigdb/collections.jsp](http://www.broadinstitute.org/gsea/msigdb/collections.jsp) (21). For each set of genes, the natural logs of the gene-wise p values were summed and multiplied by -2, according to Fisher’s method, to produce a chi-squared statistic with degrees of freedom equal to twice the number of genes in the set. The p value associated with this chi-squared statistic was expressed as a minus log10 p (MLP) as a test of association of the set with the hypertension phenotype.
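Fisher's method as used for the gene set analysis can be sketched with the standard library alone. This is an illustration and not the author's code: the function names are hypothetical, and the closed-form tail probability relies on the degrees of freedom always being even (twice the number of genes).

```python
import math


def fisher_combined_p(p_values):
    """Combine p values with Fisher's method.

    The statistic is -2 * sum(ln p), which is chi-squared with 2k degrees of
    freedom for k p values. For even degrees of freedom the upper tail is
    P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # chi-squared statistic / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half_stat) * total


def minus_log10_p(p_values):
    """MLP for a gene set: minus log10 of the combined p value."""
    return -math.log10(fisher_combined_p(p_values))
```

With a single p value the method returns that p value unchanged, which is a convenient sanity check. For very large sets with many extremely small p values, `math.exp` can underflow, and a log-space implementation would be safer.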
For selected genes, additional analyses were carried out to clarify the contribution of different categories of variant. As described previously, logistic regression analyses were performed on the counts of the separate categories of variant as listed in Table 1, again including principal components and sex as covariates, to estimate the effect size for each category (10). The odds ratios (ORs) associated with each category were estimated along with their standard errors and the Wald statistic was used to obtain a p value, except for categories in which variants occurred fewer than 50 times in which case Fisher’s exact test was applied to the variant counts. The associated p value was converted to an SLP, again with the sign being positive if the OR was greater than 1, indicating that variants in that category tended to increase risk.

Data manipulation and statistical analyses were performed using GENEVARASSOC, SCOREASSOC and R (22).

## Results

There were 66,123 cases and 134,504 controls. There were 20,384 genes for which there were qualifying variants. Given that there were 20,384 informative genes, the critical threshold for the absolute value of the SLP to declare a result as formally statistically significant is -log10(0.05/20384) = 5.61 and this was achieved by two genes, *DNMT3A* (SLP = 8.21) and *FES* (SLP = 6.10). The quantile-quantile (QQ) plot for the SLPs obtained for all genes except *DNMT3A* is shown in Figure 1. This shows that the test appears to be well-behaved and conforms well with the expected distribution. Omitting the genes with the 100 highest and 100 lowest SLPs, which might be capturing a real biological effect, the gradient for positive SLPs is 1.096 with intercept at 0.006 and the gradient for negative SLPs is 1.080 with intercept at 0.02, indicating only modest inflation of the test statistic in spite of the fact that participants of all ancestries are included.
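The exome-wide threshold is a Bonferroni correction expressed on the SLP scale, and the arithmetic can be checked directly. In the sketch below the variable names are illustrative. The SLPs for *DNMT3A* and *FES* are those quoted above, and the value for *GUCY1A1* (5.54) is the one reported elsewhere in the paper, included to show a gene that falls just short.

```python
import math

N_GENES = 20384  # number of informative genes tested
ALPHA = 0.05     # family-wise error rate

# Bonferroni-corrected threshold on the absolute SLP scale
slp_threshold = -math.log10(ALPHA / N_GENES)

reported_slps = {"DNMT3A": 8.21, "FES": 6.10, "GUCY1A1": 5.54}
significant = [gene for gene, slp in reported_slps.items() if abs(slp) > slp_threshold]

print(round(slp_threshold, 2))  # 5.61
print(significant)              # ['DNMT3A', 'FES']
```

Taking the absolute value means the same threshold applies to risk-increasing (positive SLP) and protective (negative SLP) results.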
![Figure 1.](https://www.medrxiv.org/content/medrxiv/early/2021/02/12/2021.02.10.21251503/F1.medium.gif)

[Figure 1.](http://medrxiv.org/content/early/2021/02/12/2021.02.10.21251503/F1) QQ plot of SLPs obtained for weighted burden analysis of association with hypertension showing observed against expected SLP for each gene, omitting results for *DNMT3A*, which has SLP = 8.21.

Table 2 shows all the genes achieving SLP with absolute value greater than 3, equivalent to an uncorrected p value of 0.001. Given that 20,384 genes were tested, one would expect that by chance about 20 would reach this level of significance whereas in fact there are 42. Thus it is possible that some of these highly ranked genes do demonstrate a biological signal which fails to reach statistical significance after correction for multiple testing. For *NPR1*, the analysis was repeated excluding data from the previously reported individually significant variants rs35479618, rs116245325 and rs61757359. This resulted in a reduction of the SLP from 5.14 to 4.38. For *DBH*, the analysis was repeated without rs3025380 and this resulted in a change in SLP from -3.40 to -2.11. The full list of results for all genes is provided in Supplementary Table S1.

[Table 2.](http://medrxiv.org/content/early/2021/02/12/2021.02.10.21251503/T2) Genes with absolute value of SLP exceeding 3 (equivalent to p<0.001) for test of association of weighted burden score with hypertension.

In order to see if any additional genes were highlighted by analysing gene sets, gene set analysis was performed as described above after first dividing the gene-wise log p values by the average inflation factor of 1.09 before combining them using Fisher’s method. Given that 1,454 sets were tested, the critical MLP required to declare a result significant after correction for multiple testing would be log10(1454*20) = 4.46 and this was achieved by just one set, labelled CHROMATIN.
This set contains 35 genes, including *DNMT3A*. The sets achieving MLP > 3 are shown in Table 3 and the full results of the gene set analyses are listed in Supplementary Table S2. The results for all sets in Table 3 were examined in more detail and all genes with an absolute value SLP > 1.3 (equivalent to p < 0.05) were listed. All of these sets contained *DNMT3A* and none of the other nominally significant genes appeared to be obviously relevant to hypertension. These results are presented in Supplementary Table S3.

For the genes listed in Table 2 which appeared to be of interest, additional multivariate analyses were performed to elucidate the contribution to the overall result from different categories of variant. The results of this analysis for *DNMT3A* are shown in Table 4A. From this it can be seen that the signal comes from disruptive and splice site variants which are predicted to cause LOF and which are between them associated with an OR of about 1.9. However, variants annotated as probably damaging by PolyPhen are also commoner in cases and are associated with an OR of 1.5. Table 4B shows that the result for *FES* is driven by a small number of disruptive variants which are commoner in cases with OR of 2.8. It is striking that both *GUCY1A1* and *GUCY1B1* are ranked among the top 7 genes since they code for subunits of the same guanylate cyclase and their results are shown in full in Table 4C and 4D. This shows that while the effect for *GUCY1A1* is driven by LOF variants these are very rare in *GUCY1B1* and for this gene there seems to be an additional contribution from an excess of 5 prime UTR variants among cases. These occur at 44 different locations and detailed inspection of the output file showed these all to be individually rare, such that there was no single variant which could be seen to be making a significant contribution to the overall effect.
For the other genes thought to be of interest, Table 5 provides a summary of the results for LOF variants along with any other variant category individually significant at p < 0.05. Full results for analyses of variant categories are presented in Supplementary Table S4.

[Table 4.](http://medrxiv.org/content/early/2021/02/12/2021.02.10.21251503/T3) Results from logistic regression analysis showing the effects on risk of hypertension of different categories of variant within *DNMT3A, FES, GUCY1A1* and *GUCY1B1*. Odds ratios for each category are estimated including principal components and sex as covariates. The SLP is also obtained from this multivariate analysis except when there are fewer than 50 variants in a category, when Fisher’s exact test is used instead.

[Table 4A.](http://medrxiv.org/content/early/2021/02/12/2021.02.10.21251503/T4) Results for *DNMT3A*.

[Table 4B.](http://medrxiv.org/content/early/2021/02/12/2021.02.10.21251503/T5) Results for *FES*.

[Table 4C.](http://medrxiv.org/content/early/2021/02/12/2021.02.10.21251503/T6) Results for *GUCY1A1*.

[Table 4D.](http://medrxiv.org/content/early/2021/02/12/2021.02.10.21251503/T7) Results for *GUCY1B1*.

[Table 5.](http://medrxiv.org/content/early/2021/02/12/2021.02.10.21251503/T8) Summary results of variant category analysis for additional genes of interest. Results are shown for disruptive and splice site variants and for any other variant categories significant at p < 0.05.

Of the previously implicated genes listed in the Introduction, aside from *GUCY1A1* only the following were significant at p < 0.05: *CYP11B1* (SLP = 1.63), *NR3C2* (SLP = 1.67) and *CACNA1H* (SLP = -2.05).
Analyses of the variant categories were carried out for these genes and a summary of the results is shown in Table 6, which provides the results for LOF variants and for other categories which were significant at p < 0.05. These are mostly unremarkable and it is difficult to draw firm conclusions although a few results are worth noting. For most genes LOF variants are very rare so that one cannot gain a clear estimate of their effect. However the results for *WNK1, WNK4, CLCN2* and *ATP2B3* suggest that LOF variants in these genes do not have a very major effect on risk of hypertension. The most striking result is that for *CACNA1D* the InDel category produces SLP = 8.10 with OR = 1.30. InDel variants occur at 9 locations in this gene and inspection of the detailed results revealed that 8 of these are very rare so the result reflects the effect of a single inframe deletion, 3:53808664-CCTT>C. This is rs72556363, which results in the loss of a phenylalanine residue, p.Phe1923del, and according to gnomAD it has allele frequency of 0.0043 in non-Finnish Europeans and is extremely rare or absent from other populations. In the UK Biobank subjects we observe a frequency of 0.0049 in controls and 0.0064 in cases. Full results for analyses of variant categories in these genes are presented in Supplementary Table S5.

[Table 6.](http://medrxiv.org/content/early/2021/02/12/2021.02.10.21251503/T9) Summary results of variant category analysis for previously implicated genes. Results are shown for disruptive and splice site variants and for any other variant categories significant at p < 0.05.

## Discussion

These analyses provide a broad overview of some impacts of very rare genetic variants in a large sample broadly representative of the population. It should be pointed out that the analytic approach used assumes that all variants in a gene have the same direction of effect.
Since any random variant is more likely to impair the function of a gene than to enhance it, weighted burden analysis is not expected to detect effects of gain of function variants because these are likely to be swamped by other variants. Likewise the method is expected to be relatively insensitive at detecting variants which have a purely recessive effect. Of course it is quite possible that the effect of complete loss of function of one gene may be modified by functional variants affecting the other copy of the gene but such complex heterozygous effects are difficult to detect. Additionally, a population sample differs from a specially recruited case-control sample in that it is not enriched for the phenotype in question and hence will have less power to detect either rare recessively acting variants or extremely rare variants with a dominant effect. Thus, we expect that there may well be important rare variant effects in addition to the ones which this study has highlighted. Another contrast with specifically defined case-control studies is that the phenotype needs to be derived from measures which are provided and in order to improve power it is desirable to use information which is available for a large number of participants. The phenotype used here attempts to reflect a clinical diagnosis of hypertension but of course will not do this as accurately as could be achieved in a specifically ascertained sample. Some participants may be on antihypertensive medication for indications other than hypertension, some may have undiagnosed hypertension and for some the diagnosis may simply be incorrect. Measured blood pressure itself was not used to inform the phenotype, in part because it might reflect blood pressure on medication and in part because single measurements may be less informative than indirectly relying on the clinical decisions which have fed into assigning a diagnosis or starting a prescription. 
The advantages of having a large sample to some extent balance the disadvantage of a less accurate phenotype.

When considering the contribution to risk within the population, we should note that the variants involved are rare. Although a MAF threshold of 0.01 was used, the majority of variants analysed are very much rarer than this and for the variant categories with the most severe consequences the cumulative frequency of variants in the category is also low. This means that few subjects will carry more than one variant with a severe consequence and we can say that the mean count of variants of a particular category is a good approximation for the proportion of the subjects who carry a variant of that category.

The result for *DNMT3A* seems unlikely to be due to chance since it would remain significant at p = 0.00013 even after correction for the number of genes tested. The results show that LOF variants in this gene are associated with nearly doubling the OR for hypertension and are present in about 1 in a thousand people, while a slightly larger number will have a variant annotated as probably damaging by PolyPhen which moderately increases hypertension risk. *DNMT3A* is a DNA methyltransferase and nonsynonymous variants in it have been reported as causes of two different syndromes, Tatton-Brown-Rahman syndrome with overgrowth and intellectual disability and Heyn-Sproul-Jackson syndrome with microcephalic dwarfism, neither of which has hypertension as part of the phenotype (23,24). In a series of 210 patients with an overgrowth syndrome similar to Sotos syndrome but with no *NSD1* mutation, four had de novo nonsynonymous mutations in *DNMT3A* and two had stop variants (25). One of the stop variants was inherited from a normal mother in whom it was thought a somatic mutation had occurred and for the other the father’s DNA was not available.
Given the frequency of LOF variants in *DNMT3A* observed in our samples, it seems possible that the observation of two patients with stop variants was coincidental. Thus it may be that, while certain specific nonsynonymous variants can cause severe phenotypes, generally reduced functioning of *DNMT3A* does not cause marked problems but is associated with increased risk of hypertension by mechanisms which are at present unclear. Although *FES* (SLP = 6.10) only just reaches criteria for exome-wide significance, confidence in this result is somewhat increased by the fact that a nearby SNP, rs2521501, shows robust evidence for association (26). However the mechanisms by which it might influence hypertension risk are unclear as it codes for a tyrosine kinase which is involved in various signalling pathways and which may have a role in haematopoiesis and regulating the innate immune response (27). The results for *GUCY1A1* (SLP = 5.54) and *GUCY1B1* (SLP = 3.92) are more compelling, given that they code for two different subunits of the same protein, soluble guanylate cyclase, and given that recessively acting variants in *GUCY1A1* have previously been reported in cases of moyamoya disease with hypertension (5). Soluble guanylate cyclase is responsible for detecting NO signalling in order to produce vasodilation and other responses, and the central role of this pathway in the control of blood pressure is well-established from animal studies while guanylate cyclase stimulators have been developed as treatments for pulmonary hypertension (28). The findings reported here are the first to directly demonstrate that impaired functioning of either of these genes represents a risk factor for systemic hypertension in the general population, with nearly 1 in a thousand people carrying a LOF variant in one of them which approximately doubles the OR for hypertension. 
The results for *NPR1* (SLP = 5.14) represent a replication of the previously reported findings and confirm that, although certain nonsynonymous variants such as rs61757359 may be associated with reduced blood pressure, in general variants which impair the functioning of this gene increase the risk of hypertension (7,8). Around 1 in 500 people carries a variant annotated by PolyPhen as probably damaging and overall such variants are associated with a modest increase in risk with OR = 1.32 (1.05 - 1.66).

*SMAD6* (SLP = 4.10) has a role in signalling pathways and has not previously been clearly implicated in hypertension risk, although a recent report describes how exome sequencing of 37 children with renovascular hypertension revealed a frameshift variant in *SMAD6*, classified as likely pathogenic, in one patient (29). *SMAD6* variants are known to predispose to cardiovascular malformations including bicuspid aortic valve related aortopathy (30–32). The results reported here suggest that LOF variants in this gene may have a moderate effect on increased risk of hypertension in the general population.

Recessively acting variants in *IFT172* (SLP = 3.39) can cause ciliopathies and there is a report of a child with compound heterozygous variants who presented with growth retardation and subsequently developed retinopathy, metaphyseal dysplasia and, at the age of 11, hypertension (33). However there does not seem to be other evidence to implicate *IFT172* in hypertension risk so this result would require replication in other samples.

*CSK* (SLP = 3.37) is located in the 15q24 locus which, as reviewed recently, is implicated by multiple GWAS for hypertension (34). Following up the results of eQTL analyses, these authors demonstrated that mice with gene-silencing or haploinsufficiency of *Csk* had increased blood pressure and showed that this effect could be moderated by PP2, an inhibitor of Src.
Although the results reported here are not formally significant after correction for multiple testing, the additional support provided by these GWAS findings and functional studies does suggest that variants in *CSK*, in particular those annotated as deleterious by SIFT, might be a risk factor for hypertension.

The results for *DBH* (SLP = -3.40) provide further support for the previously reported findings that variants in this gene are associated with blood pressure (7). In particular, the results suggest that variants annotated as deleterious by SIFT are on average associated with a slightly reduced risk of developing hypertension. Although these variants are individually rare, about 1 person in 20 carries one of them.

*AGTR1* (SLP = -3.77) codes for a receptor for angiotensin II so it seems very plausible from a biological point of view that variants impairing its function might be protective against hypertension in spite of the fact that no association with common variants has been detected (35). The results suggest that very rare gene disruptive variants can about halve the OR for hypertension.

*ZYX* (SLP = -3.83) is potentially of interest because it codes for zyxin, the protein responsible for sensing stretch in endothelial cells and vascular smooth muscle cells, as occurs in hypertension, and mediating their response to this by changing the expression of other genes (36). The findings reported here suggest that impaired functioning of this gene may reduce risk of hypertension.

*PREP* (SLP = -5.03) narrowly fails to meet conventional criteria for exome-wide significance but is clearly of interest because its product, prolyl endopeptidase, also known as prolyl oligopeptidase or post-proline cleaving enzyme, has recently been shown to be responsible for converting angiotensin II to angiotensin-(1-7) in the circulation and in the lungs, a process which is largely independent of ACE2, which carries out this conversion in the kidney (37).
The results we report suggest that rare, functional variants in *PREP* are protective against hypertension but it is not clear which categories of variant are responsible and although LOF variants are commoner in controls they are too rare for conclusions to be drawn. In mice, loss of this gene results in reduced ability to metabolise angiotensin II and hence a more prolonged systemic hypertensive response to exogenously administered angiotensin II (37). While it is not obvious why impaired functioning of this gene might be protective against hypertension these findings do seem worthy of further exploration.

The finding that an inframe deletion in *CACNA1D*, rs72556363, is associated with increased risk of hypertension is consistent with reports that very rare germline and somatic variants in this gene can result in aldosterone-producing adenomas and primary aldosteronism (38). Although the variant varies in frequency between populations, essentially being restricted to those with European ancestry, this result does not seem likely to be due to an artefact of population stratification because the frequency in controls is similar to that reported in non-Finnish Europeans whereas the frequency in cases is even higher. The variant, which is found in about 1% of subjects with European ancestry, seems to represent a modest risk factor for hypertension without producing severe hyperaldosteronism.

Overall, these analyses provide an overview of some of the impacts rare, coding variants may have on the risk of hypertension in the general population. The validity of some novel findings will become clearer when exome sequence data is released for the remaining 300,000 UK Biobank participants or if they can be tested in other samples or followed up in functional studies. All the variants implicated are very rare and in view of their effect sizes arguably do not make an important contribution to risk from a public health point of view.
Nor are they likely to be helpful as individual measures of risk, partly because they can only be detected by sequencing. Although it may be reasonable to assume that LOF variants within a given gene will tend to have a similar effect on phenotype, the same cannot be said of nonsynonymous variants, and even within a given category the effect of such variants is likely to vary considerably. Thus, for variants which are individually extremely rare it is in general not possible to make a clear interpretation regarding their likely effect. The main value of these findings is probably in highlighting genes and biological pathways of relevance in order ultimately to inform improved therapeutic approaches.

## Data Availability

The raw data is available on application to UK Biobank. Detailed results with variant counts cannot be made available because they might be used for subject identification. Scripts and relevant derived variables will be deposited in UK Biobank. Software and scripts used to carry out the analyses are also available at https://github.com/davenomiddlenamecurtis.

## Conflicts of interest

The author declares he has no conflict of interest.

## Acknowledgments

This research has been conducted using the UK Biobank Resource. The author wishes to acknowledge the staff supporting the High Performance Computing Cluster, Computer Science Department, University College London. This work was carried out in part using resources provided by BBSRC equipment grant BB/R01356X/1.
The author wishes to thank the participants who volunteered for the UK Biobank project.

* Received February 10, 2021.
* Revision received February 10, 2021.
* Accepted February 12, 2021.
* © 2021, Posted by Cold Spring Harbor Laboratory

The copyright holder for this pre-print is the author. All rights reserved. The material may not be redistributed, re-used or adapted without the author's permission.

## References

1. Evangelou E, Warren HR, Mosen-Ansorena D, Mifsud B, Pazoki R, Gao H, et al. Genetic analysis of over 1 million people identifies 535 new loci associated with blood pressure traits. Nat Genet. 2018 Oct 1;50(10):1412–25. doi:10.1038/s41588-018-0205-x. PMID: 30224653.
2. Wang Y, Wang J-G. Genome-Wide Association Studies of Hypertension and Several Other Cardiovascular Diseases. Pulse. 2018;6(3–4):169–86.
3. Ehret GB, Caulfield MJ. Genes for blood pressure: An opportunity to understand hypertension. Vol. 34, European Heart Journal. Eur Heart J; 2013. p. 951–61.
4. Scholl UI, Stölting G, Schewe J, Thiel A, Tan H, Nelson-Williams C, et al. CLCN2 chloride channel mutations in familial hyperaldosteronism type II. Nat Genet. 2018 Mar 1;50(3):349–54. doi:10.1038/s41588-018-0048-5.
5. Wallace S, Guo DC, Regalado E, Mellor-Crummey L, Bamshad M, Nickerson DA, et al. Disrupted nitric oxide signaling due to GUCY1A3 mutations increases risk for moyamoya disease, achalasia and hypertension. Clin Genet. 2016 Oct 1;90(4):351–60.
6. Lee J, Kim SK, Kang HG, Ha IS, Wang KC, Lee JY, et al. High prevalence of systemic hypertension in pediatric patients with moyamoya disease years after surgical treatment. J Neurosurg Pediatr. 2020 Nov 8;25(2):131–7.
7. Liu C, Kraja AT, Smith JA, Brody JA, Franceschini N, Bis JC, et al. Meta-analysis identifies common and rare variants influencing blood pressure and overlapping with metabolic trait loci. Nat Genet. 2016 Oct 1;48(10):1162–70. doi:10.1038/ng.3660. PMID: 27618448.
8. Vandenwijngaert S, Ledsky CD, Lahrouchi N, Khan MAF, Wunderer F, Ames L, et al. Blood Pressure-Associated Genetic Variants in the Natriuretic Peptide Receptor 1 Gene Modulate Guanylate Cyclase Activity. Circ Genomic Precis Med. 2019 Aug 1;12(8):e002472.
9. Szustakowski JD, Balasubramanian S, Sasson A, Khalid S, Bronson PG, Kvikstad E, et al. Advancing Human Genetics Research and Drug Discovery through Exome Sequencing of the UK Biobank. medRxiv. 2020 Jan 1;2020.11.02.20222232.
10. Curtis D. Analysis of 200,000 exome-sequenced UK Biobank subjects illustrates the contribution of rare genetic variants to hyperlipidaemia. medRxiv. 2021.
11. McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GRS, Thormann A, et al. The Ensembl Variant Effect Predictor. Genome Biol. 2016 Jun 6;17(1):122. doi:10.1186/s13059-016-0974-4. PMID: 27268795.
12. Adzhubei I, Jordan DM, Sunyaev SR. Predicting functional effect of human missense mutations using PolyPhen-2. Curr Protoc Hum Genet. 2013 Jan;7 Unit7.20.
13. Kumar P, Henikoff S, Ng PC. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc. 2009 Jun 25;4(8):1073–81. doi:10.1038/nprot.2009.86. PMID: 19561590.
14. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015 Dec 25;4(1):7. doi:10.1186/s13742-015-0047-8. PMID: 25722852.
15. Galinsky KJ, Bhatia G, Loh PR, Georgiev S, Mukherjee S, Patterson NJ, et al. Fast Principal-Component Analysis Reveals Convergent Evolution of ADH1B in Europe and East Asia. Am J Hum Genet. 2016 Mar 3;98(3):456–72. doi:10.1016/j.ajhg.2015.12.022. PMID: 26924531.
16. Curtis D. Multiple Linear Regression Allows Weighted Burden Analysis of Rare Coding Variants in an Ethnically Heterogeneous Population. Hum Hered. 2021 Jan 7;1–10.
17. Curtis D. A weighted burden test using logistic regression for integrated analysis of sequence variants, copy number variants and polygenic risk score. Eur J Hum Genet. 2019 Jan 26;27(1):114–24. doi:10.1038/s41431-018-0272-6.
18. Curtis D. Weighted burden analysis in 200,000 exome-sequenced subjects characterises rare variant effects on risk of type 2 diabetes. medRxiv. 2021 Jan 21;2021.01.08.21249453.
19. Curtis D. A rapid method for combined analysis of common and rare variants at the level of a region, gene, or pathway. Adv Appl Bioinform Chem. 2012;5:1–9. PMID: 22888262.
20. Curtis D. Pathway analysis of whole exome sequence data provides further support for the involvement of histone modification in the aetiology of schizophrenia. Psychiatr Genet. 2016;26:223–7. doi:10.1097/YPG.0000000000000132.
21. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102(43):15545–50.
22. R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2014.
23. Tatton-Brown K, Seal S, Ruark E, Harmer J, Ramsay E, Del Vecchio Duarte S, et al. Mutations in the DNA methyltransferase gene DNMT3A cause an overgrowth syndrome with intellectual disability. Nat Genet. 2014 Mar 9;46(4):385–8. doi:10.1038/ng.2917. PMID: 24614070.
24. Heyn P, Logan CV, Fluteau A, Challis RC, Auchynnikava T, Martin CA, et al. Gain-of-function DNMT3A mutations cause microcephalic dwarfism and hypermethylation of Polycomb-regulated regions. Nat Genet. 2019 Jan 1;51(1):96–105. doi:10.1038/s41588-018-0274-x.
25. Tlemsani C, Luscan A, Leulliot N, Bieth E, Afenjar A, Baujat G, et al. SETD2 and DNMT3A screen in the Sotos-like syndrome French cohort. J Med Genet. 2016 Nov 1;53(11):743–51.
26. Ehret GB, Munroe PB, Rice KM, Bochud M, Johnson AD, Chasman DI, et al. Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk. Nature. 2011 Oct 6;478(7367):103–9. doi:10.1038/nature10405. PMID: 21909115.
27. Zirngibl RA, Senis Y, Greer PA. Enhanced Endotoxin Sensitivity in Fps/Fes-Null Mice with Minimal Defects in Hematopoietic Homeostasis. Mol Cell Biol. 2002 Apr 15;22(8):2472–86.
28. Buys E, Sips P. New insights into the role of soluble guanylate cyclase in blood pressure regulation. Vol. 23, Current Opinion in Nephrology and Hypertension. NIH Public Access; 2014. p. 135–42.
29. Viering DHHM, Chan MMY, Hoogenboom L, Iancu D, de Baaij JHF, Tullus K, et al. Genetics of renovascular hypertension in children. J Hypertens. 2020 Oct 1;38(10):1964–70.
30. Tan HL, Glen E, Töpf A, Hall D, O’Sullivan JJ, Sneddon L, et al. Nonsynonymous variants in the SMAD6 gene predispose to congenital cardiovascular malformation. Hum Mutat. 2012 Apr;33(4):720–7. doi:10.1002/humu.22030. PMID: 22275001.
31. Luyckx I, MacCarrick G, Kempers M, Meester J, Geryl C, Rombouts O, et al. Confirmation of the role of pathogenic SMAD6 variants in bicuspid aortic valve-related aortopathy. Eur J Hum Genet. 2019 Jul 1;27(7):1044–53.
32. Gillis E, Kumar AA, Luyckx I, Preuss C, Cannaerts E, Beek G van de, et al. Candidate gene resequencing in a large bicuspid aortic valve-associated thoracic aortic aneurysm cohort: SMAD6 as an important contributor. Front Physiol. 2017 Jun 13;8(JUN).
33. Lucas-Herald AK, Kinning E, Iida A, Wang Z, Miyake N, Ikegawa S, et al. A Case of Functional Growth Hormone Deficiency and Early Growth Retardation in a Child With IFT172 Mutations. J Clin Endocrinol Metab. 2015 Apr 1;100(4):1221–4. doi:10.1210/jc.2014-3852. PMID: 25664603.
34. Lee HJ, Kang JO, Kim SM, Ji SM, Park SY, Kim ME, et al. Gene silencing and haploinsufficiency of Csk increase blood pressure. PLoS One. 2016 Jan 11;11(1).
35. Ji LD, Li JY, Yao BB, Cai XB, Shen QJ, Xu J. Are genetic polymorphisms in the renin-angiotensin-aldosterone system associated with essential hypertension? Evidence from genome-wide association studies. Vol. 31, Journal of Human Hypertension. Nature Publishing Group; 2017. p. 695–8. doi:10.1038/jhh.2017.29. PMID: 28425437.
36. Ghosh S, Kollar B, Nahar T, Suresh Babu S, Wojtowicz A, Sticht C, et al. Loss of the mechanotransducer zyxin promotes a synthetic phenotype of vascular smooth muscle cells. J Am Heart Assoc. 2015 Jun 12;4(6):e001712.
37. Serfozo P, Wysocki J, Gulua G, Schulze A, Ye M, Liu P, et al. Ang II (Angiotensin II) Conversion to Angiotensin-(1-7) in the Circulation Is POP (Prolyloligopeptidase)-Dependent and ACE2 (Angiotensin-Converting Enzyme 2)-Independent. Hypertens (Dallas, Tex 1979). 2020 Jan 1;75(1):173–82.
38. Scholl UI, Goh G, Stölting G, De Oliveira RC, Choi M, Overton JD, et al. Somatic and germline CACNA1D calcium channel mutations in aldosterone-producing adenomas and primary aldosteronism. Nat Genet. 2013 Sep;45(9):1050–4. doi:10.1038/ng.2695. PMID: 23913001.