Abstract
Background Previous analyses have identified common variants along with some specific genes and rare variants which are associated with risk of hypertension but much remains to be discovered.
Methods and Results Exome-sequenced UK Biobank participants were phenotyped based on having a diagnosis of hypertension or taking anti-hypertensive medication to produce a sample of 66,123 cases and 134,504 controls. Variants with minor allele frequency (MAF) < 0.01 were subjected to a gene-wise weighted burden analysis, with higher weights assigned to variants which are rarer and/or predicted to have more severe effects. Of 20,384 genes analysed, two genes were exome-wide significant, DNMT3A and FES. Also strongly implicated were GUCY1A1 and GUCY1B1, which code for the subunits of soluble guanylate cyclase. There was further support for the previously reported effects of variants in NPR1 and protective effects of variants in DBH. An inframe deletion in CACNA1D with MAF = 0.005, rs72556363, is associated with modestly increased risk of hypertension. Other biologically plausible genes highlighted consist of CSK, AGTR1, ZYX and PREP. All variants implicated were rare and cumulatively they are not predicted to make a large contribution to the population risk of hypertension.
Conclusions This approach confirms and clarifies previously reported findings and also offers novel insights into biological processes influencing hypertension risk, potentially facilitating the development of improved therapeutic interventions. This research has been conducted using the UK Biobank Resource.
Introduction
Hypertension is an important risk factor for disease which has a heritable component and a recent large genome wide association study of common variant effects identified 901 loci with enrichment in relevant tissues (blood vessels, heart, adrenal tissue and adipose tissue) and pathways (angiotensinogen, calcium channels, progesterone, natriuretic peptide receptor, angiotensin converting enzyme, angiotensin receptors and endothelin receptors) (Evangelou et al., 2018). Selection pressures tend to mean that common variants individually have small effect sizes and it can be difficult to interpret their biological effects (Wang and Wang, 2018). By contrast, rare variants can potentially have large effect sizes and clear biological mechanisms as exemplified by a number of monogenic causes of hypertension such as congenital adrenal hyperplasia, familial hyperaldosteronism and pseudohypoaldosteronism, which can be caused by variants in CYP11B1, CYP11B2, WNK1, WNK4, KLHL3, CUL3, SCNN1B, SCNN1G, CYP17A1, HSD11B2, NR3C2 and KCNJ5 (Ehret and Caulfield, 2013). Additionally, variants in CACNA1H, CACNA1D and CLCN2 have now also been identified as causes of familial hyperaldosteronism while somatic mutations in ATP1A1 or ATP2B3 can produce aldosterone-producing adrenal adenomas with consequent hypertension (Scholl et al., 2018). Recessively acting variants in GUCY1A1 (previously labelled GUCY1A3) can cause moyamoya disease and two unrelated subjects with moyamoya disease who also had achalasia and hypertension were found to have compound heterozygote variants in this gene (Wallace et al., 2016). A more recent study has shown that moyamoya disease is itself a risk factor for hypertension (Lee et al., 2020). Using hypertension or blood pressure as phenotypes, gene-based analyses aggregating rare, nonsynonymous variants implicated PTMT1, DBH and NPR1 in a large meta-analysis and also showed that the minor allele of a rare, nonsynonymous variant in DBH, rs3025380, was associated with lower blood pressure (Liu et al., 2016). Another study reported that three individual nonsynonymous variants in NPR1 were associated with increased (rs35479618 and rs116245325) and decreased (rs61757359) blood pressure and showed that this could be explained by the effects of these variants on guanylate cyclase activity (Vandenwijngaert et al., 2019). KDM1A codes for LSD1, which removes methyl groups from the methylated lysine 4 residue of histone 3 (H3K4), and there have been reports of association between variants in KDM1A and salt-sensitive hypertension in humans, while heterozygous Lsd1 knock-out mice have salt-sensitive hypertension with increased aldosterone production (Huang et al., 2019; Pojoga et al., 2011; Williams et al., 2012).
The growing availability of sequence data means that it may become possible to study the wider effects of rare, functional variants in the general population. This may implicate novel genes or may demonstrate a wider role for genes already implicated in severe familial disorders. Exome sequence data is now available for 200,000 of the 500,000 UK Biobank subjects (Szustakowski et al., 2020). We have recently analysed this in order to illuminate the effect of rare, coding variants on susceptibility to hyperlipidaemia and a number of other common traits with complex inheritance and we now apply the same approach to study the contribution of rare variants to risk of developing hypertension (Curtis, 2021a).
Methods
The UK Biobank dataset was downloaded along with the variant call files for 200,632 subjects who had undergone exome-sequencing and genotyping by the UK Biobank Exome Sequencing Consortium using the GRCh38 assembly with coverage 20X at 95.6% of sites on average (Szustakowski et al., 2020). UK Biobank had obtained ethics approval from the North West Multi-centre Research Ethics Committee which covers the UK (approval number: 11/NW/0382) and had obtained informed consent from all participants. The UK Biobank approved an application for use of the data (ID 51119) and ethics approval for the analyses was obtained from the UCL Research Ethics Committee (11527/001). All variants were annotated using the standard software packages VEP, PolyPhen and SIFT (Adzhubei et al., 2013; Kumar et al., 2009; McLaren et al., 2016). To obtain population principal components reflecting ancestry, version 2.0 of plink (https://www.cog-genomics.org/plink/2.0/) was run with the options --maf 0.1 --pca 20 approx (Chang et al., 2015; Galinsky et al., 2016).
The UK Biobank sample contains 503,317 subjects of whom 94.6% are of white ethnicity. As we have discussed previously, it has become standard practice for investigators to simply discard data from participants with other ancestries and we regard this as regrettable (Curtis, 2021b). We demonstrated that if population principal components are included as covariates then it is possible to include all participants, regardless of ancestry, in the type of weighted burden analysis described here without inflation of the test statistic.
To define cases, a similar approach was used as was previously implemented for the investigation of hyperlipidaemia and T2D (Curtis, 2021a, 2021c, 2019). The hypertension phenotype was determined from four sources in the dataset: self-reported diagnosis recorded as hypertension or essential hypertension; reporting taking medication for high blood pressure; reporting taking any of a list of named medications commonly used to treat high blood pressure (https://www.nhs.uk/conditions/high-blood-pressure-hypertension/); having an ICD10 diagnosis of essential hypertension, hypertensive heart disease or hypertensive renal disease in hospital records or as a cause of death. Subjects in any of these categories were deemed to be cases with hypertension while all other subjects were taken to be controls.
The same analytic methods as had been used previously were applied, with the description repeated here for the reader’s convenience. The SCOREASSOC program was used to carry out a weighted burden analysis to test whether, in each gene, sequence variants which were rarer and/or predicted to have more severe functional effects occurred more commonly in cases than controls. Attention was restricted to rare variants with minor allele frequency (MAF) <= 0.01 in both cases and controls. As previously described, variants were weighted by overall MAF so that variants with MAF=0.01 were given a weight of 1 while very rare variants with MAF close to zero were given a weight of 10 (Curtis, 2021b). Variants were also weighted according to their functional annotation using the GENEVARASSOC program, which was used to generate input files for weighted burden analysis by SCOREASSOC (Curtis, 2016, 2012). The weights were informed from the analysis of the effects of different categories of variant in LDLR on hyperlipidaemia risk (Curtis, 2021a). Variants predicted to cause complete loss of function (LOF) of the gene were assigned a weight of 100. Nonsynonymous variants were assigned a weight of 5 but if PolyPhen annotated them as possibly or probably damaging then 5 or 10 was added to this and if SIFT annotated them as deleterious then 20 was added. In order to allow exploration of the effects of different types of variant on disease risk the variants were also grouped into broader categories to be used in multivariate analyses as described below. The full set of weights and categories is displayed in Table 1. As described previously, the weight due to MAF and the weight due to functional annotation were multiplied together to provide an overall weight for each variant. Variants were excluded if there were more than 10% of genotypes missing in the controls or if the heterozygote count was smaller than both homozygote counts in the controls. If a subject was not genotyped for a variant then they were assigned the subject-wise average score for that variant. For each subject a gene-wise weighted burden score was derived as the sum of the variant-wise weights, each multiplied by the number of alleles of the variant which the given subject possessed. For variants on the X chromosome, hemizygous males were treated as homozygotes.
For each gene, logistic regression analysis was carried out including the first 20 population principal components and sex as covariates and a likelihood ratio test was performed comparing the likelihoods of the models with and without the gene-wise burden score. The statistical significance was summarised as a signed log p value (SLP), which is the log base 10 of the p value given a positive sign if the score is higher in cases and negative if it is higher in controls.
Gene set analyses were carried out as before using the 1454 “all GO gene sets, gene symbols” pathways as listed in the file c5.all.v5.0.symbols.gmt downloaded from the Molecular Signatures Database at http://www.broadinstitute.org/gsea/msigdb/collections.jsp (Subramanian et al., 2005). For each set of genes, the natural logs of the gene-wise p values were summed according to Fisher’s method to produce a chi-squared statistic with degrees of freedom equal to twice the number of genes in the set. The p value associated with this chi-squared statistic was expressed as a minus log10 p (MLP) as a test of association of the set with the hyperlipidaemia phenotype.
For selected genes, additional analyses were carried out to clarify the contribution of different categories of variant. As described previously, logistic regression analyses were performed on the counts of the separate categories of variant as listed in Table 1, again including principal components and sex as covariates, to estimate the effect size for each category (Curtis, 2021a). The odds ratios (ORs) associated with each category were estimated along with their standard errors and the Wald statistic was used to obtain a p value, except for categories in which variants occurred fewer than 50 times in which case Fisher’s exact test was applied to the variant counts. The associated p value was converted to an SLP, again with the sign being positive if the OR was greater than 1, indicating that variants in that category tended to increase risk.
Data manipulation and statistical analyses were performed using GENEVARASSOC, SCOREASSOC and R (R Core Team, 2014).
Results
There were 66,123 cases and 134,504 controls. There were 20,384 genes for which there were qualifying variants. Given that there were 20,384 informative genes, the critical threshold for the absolute value of the SLP to declare a result as formally statistically significant is -log10(0.05/20384) = 5.61 and this was achieved by two genes, DNMT3A (SLP = 8.21) and FES (SLP = 6.10). The quantile-quantile (QQ) plot for the SLPs obtained for all genes except DNMT3A is shown in Figure 1. This shows that the test appears to be well-behaved and conforms well with the expected distribution. Omitting the genes with the 100 highest and 100 lowest SLPs, which might be capturing a real biological effect, the gradient for positive SLPs is 1.096 with intercept at 0.006 and the gradient for negative SLPs is 1.080 with intercept at 0.02, indicating only modest inflation of the test statistic in spite of the fact that participants of all ancestries are included.
Table 2 shows all the genes achieving SLP with absolute value greater than 3, equivalent to an uncorrected p value of 0.001. Given that 20,384 genes were tested, one would expect that by chance about 20 would reach this level of significance whereas in fact there are 42. Thus it is possible that some of these highly ranked genes do demonstrate a biological signal which fails to reach statistical significance after correction for multiple testing. For NPR1, the analysis was repeated excluding data from the previously reported individually significant variants rs35479618, rs116245325 and rs61757359. This resulted in a reduction of the SLP from 5.14 to 4.38. For DBH, the analysis was repeated without rs3025380 and this resulted in a change in SLP from −3.40 to −2.11. The full list of results for all genes is provided in Supplementary Table S1.
In order to see if any additional genes were highlighted by analysing gene sets, gene set analysis was performed as described above after first dividing the gene-wise log p values by the average inflation factor of 1.09 before combining them using Fisher’s method. Given that 1,454 sets were tested, a critical MLP to achieve to declare results significant after correction for multiple testing would be log10(1454*20) = 4.46 and this was achieved by just one set, labelled CHROMATIN. This set contains 35 genes, including DNMT3A. The full results of the gene set analyses are listed in Supplementary Table S2. Four other sets achieved SLP > 3 and these and for these all genes with an absolute value SLP > 1.3 (equivalent to p < 0.05) were listed. All of these sets also contained DNMT3A and none of the other nominally significant genes appeared to be obviously relevant to hypertension. These results are presented in Supplementary Table S3.
For the genes listed in Table 2 which appeared to be of interest, additional multivariate analyses were performed to elucidate the contribution to the overall result from different categories of variant. The results of this analysis for DNMT3A are shown in Table 3A. From this it can be seen that the signal comes from disruptive and splice site variants which are predicted to cause LOF and which are between them associated with an OR of about 1.9. However, variants annotated as probably damaging by PolyPhen are also commoner in cases and are associated with an OR of 1.5. Table 3B shows that the result for FES is driven by a small number of disruptive variants which are commoner in cases with OR of 2.8. It is striking that both GUCY1A1 and GUCY1B1 are ranked among the top 7 genes since they code for subunits of the same guanylate cyclase and their results are shown in full in Table 3C and 3D. This shows that while the effect for GUCY1A1 is driven by LOF variants these are very rare in GUCY1B1 and for this gene there seems to be an additional contribution from an excess of 5 prime UTR variants among cases. These occur at 44 different locations and detailed inspection of the output file showed these all to be individually rare, such that there was no single variant which could be seen to making a significant contribution to the overall effect. For the other genes thought to be of interest, Table 4 provides a summary of the results for LOF variants along with any other variant category individually significant at p < 0.05. Full results for analyses of variant categories are presented in Supplementary Table S4.
Of the previously implicated genes listed in the Introduction, aside from GUCY1A1 only the following were significant at p < 0.05: CYP11B1 (SLP = 1.63), NR3C2 (SLP = 1.67) and CACNA1H (SLP = −2.05). Analyses of the variant categories were carried out for these genes and a summary of the results is shown in Table 5, which provides the results for LOF variants and for other categories which were significant at p < 0.05. These are mostly unremarkable and it is difficult to draw firm conclusions although a few results are worth noting. For most genes LOF variants are very rare so that one cannot gain a clear estimate of their effect. However the results for WNK1, WNK4, CLCN2 and ATP2B3 suggest that LOF variants in these genes do not have a very major effect on risk of hypertension. The most striking result is that for CACNA1D the InDel category produces SLP = 8.10 with OR = 1.30. InDel variants occur at 9 locations in this gene and inspection of the detailed results revealed that 8 of these are very rare so the result reflects the effect of a single inframe deletion, 3:53808664-CCTT>C. This is rs72556363, which results in the loss of a phenylalanine residue, p.Phe1923del, and according to gnomAD it has allele frequency of 0.0043 in non-Finnish Europeans and is extremely rare or absent from other populations. In the UK Biobank subjects we observe a frequency of 0.0049 in controls and 0.0064 in cases. Full results for analyses of variant categories in these genes are presented in Supplementary Table S5.
Discussion
These analyses provide a broad overview of some impacts of very rare genetic variants in a large sample broadly representative of the population. It should be pointed out that the analytic approach used assumes that all variants in gene have the same direction of effect. Since any random variant is more likely to impair the function of a gene than to enhance it, weighted burden analysis is not expected to detect effects of gain of function variants because these are likely to be swamped by other variants. Likewise the method is expected to be relatively insensitive at detecting variants which have a purely recessive effect. Of course it is quite possible that the effect of complete loss of function of one gene may be modified by functional variants affecting the other copy of the gene but such complex heterozygous effects are difficult to detect. Additionally, a population sample differs from a specially recruited case-control sample in that it is not enriched for the phenotype in question and hence will have less power to detect either rare recessively acting variants or extremely rare variants with a dominant effect. Thus, we expect that there may well be important rare variant effects in addition to the ones which this study has highlighted.
Another contrast with specifically defined case-control studies is that the phenotype needs to be derived from measures which are provided and in order to improve power it is desirable to use information which is available for a large number of participants. The phenotype used here attempts to reflect a clinical diagnosis of hypertension but of course will not do this as accurately as could be achieved in a specifically ascertained sample. Some participants may be on antihypertensive medication for indications other than hypertension, some may have undiagnosed hypertension and for some the diagnosis may simply be incorrect. Measured blood pressure itself was not used to inform the phenotype, in part because it might reflect blood pressure on medication and in part because single measurements may be less informative than indirectly relying on the clinical decisions which have fed into assigning a diagnosis or starting a prescription. The advantages of having a large sample to some extent balance the disadvantage of a less accurate phenotype.
When considering the contribution of risk within the population, we should note that the variants involved are rare. Although a MAF threshold of 0.01 was used, the majority of variants analysed are very much rarer than this and for the variant categories with the most severe consequences the cumulative frequency of variants in the category is also low. This means that few subjects will carry more than one variant with a severe consequence and we can say that the mean count of variants of a particular category is a good approximation for the proportion of the subjects who carry a variant of that category.
The result for DNMT3A seems unlikely to be due to chance since it would remain significant at p = 0.00016 even after correction for the number of genes tested. The results show that LOF variants in this gene are associated with nearly doubling the OR for hypertension and are present in about 1 in a thousand people, while a slightly larger number will have a variant annotated as probably damaging by PolyPhen which moderately increases hypertension risk. DNMT3A is a DNA methyltransferase and nonsynonymous variants in it have been reported as causes of two different syndromes, Tatton-Brown-Rahman syndrome with overgrowth and intellectual disability and Heyn-Sproul-Jackson syndrome with microcephalic dwarfism, neither of which has hypertension as part of the phenotype (Heyn et al., 2019; Tatton-Brown et al., 2014). Missense variants in DNMT3A have also been reported in autism spectrum disorder and mice with Dnmt3a haploinsufficiency produced by heterozygous deletion of exon 19 have increased body weight and some behavioural alterations (Christian et al., 2020). In a series of 210 patients with an overgrowth syndrome similar to Sotos syndrome but with no NSD1 mutation, four had de novo nonsynonymous mutations in DNMT3A and two had stop variants (Tlemsani et al., 2016). One of the stop variants was inherited from a normal mother in whom it was thought a somatic mutation had occurred and for the other the father’s DNA was not available. Given the frequency of LOF variants in DNMT3A observed in our samples, it seems possible that the observation of two patients with stop variants was coincidental. Thus it may be that, while certain specific nonsynonymous variants can cause severe phenotypes, generally reduced functioning of DNMT3A does not cause marked problems but is associated with increased risk of hypertension.
Although DNMT3A variants have not previously been reported to be associated with hypertension, as reviewed recently these findings are consistent with a body of results relating to the histone lysine demethylase LSD1, which together implicate LSD1 hypofunction in salt-sensitive hypertension (Hirohama and Fujita, 2019). In our data disruptive variants in KDM1A, the gene for LSD1, are associated with an OR of 1.9 but they are too rare for firm conclusions to be drawn. DNMT3A methylates DNA conditional on the associated H3K4 residue being unmethylated and LSD1 accomplishes this demethylation, meaning that reduced function of LSD1 is expected to reduce DNMT3A-dependent DNA methylation (Hashimoto et al., 2010; Shi et al., 2004). In mice, Dnmt3a deficiency, which can be produced by knockdown or as a consequence of foetal exposure to dexamethasone or low protein maternal diet, has been shown to lead to reduced methylation of the gene for angiotensin receptor type 1a, Agtr1a, leading to increased Agtr1a expression and salt-induced hypertension (Kawakami-Mori et al., 2018). Thus, a consistent picture emerges that reduced DNMT3A activity can increase risk for hypertension, whether this is due to genetic variants in DNMT3A itself, LSD1 hypofunction or the uterine environment.
Although FES (SLP = 6.10) only just reaches criteria for exome-wide significance, confidence in this result is somewhat increased by the fact that a nearby SNP, rs2521501, shows robust evidence for association (Ehret et al., 2011). However the mechanisms by which it might influence hypertension risk are unclear as it codes for a tyrosine kinase which is involved in various signalling pathways and which may have a role in haematopoiesis and regulating the innate immune response (Zirngibl et al., 2002).
The results for GUCY1A1 (SLP = 5.54) and GUCY1B1 (SLP = 3.92) are more compelling, given that they code for two different subunits of the same protein, soluble guanylate cyclase, and given that recessively acting variants in GUCY1A1 have previously been reported in cases of moyamoya disease with hypertension (Wallace et al., 2016). Soluble guanylate cyclase is responsible for detecting NO signalling in order to produce vasodilation and other responses, and the central role of this pathway in the control of blood pressure is well-established from animal studies while guanylate cyclase stimulators have been developed as treatments for pulmonary hypertension (Buys and Sips, 2014). The findings reported here are the first to directly demonstrate that impaired functioning of either of these genes represents a risk factor for systemic hypertension in the general population, with nearly 1 in a thousand people carrying a LOF variant in one of them which approximately doubles the OR for hypertension.
The results for NPR1 (SLP = 5.14) represent a replication of the previously reported findings and confirm that, although certain nonsynonymous variants such as rs61757359 may be associated with reduced blood pressure, in general variants which impair the functioning of this gene increase the risk of hypertension (Liu et al., 2016; Vandenwijngaert et al., 2019). Around 1 in 500 people carries a variant annotated by PolyPhen as probably damaging and overall such variants are associated with a modest increase in risk with OR = 1.32 (1.05 - 1.66).
SMAD6 (SLP = 4.10) has a role in signalling pathways and has not previously been clearly implicated in hypertension risk although a recent report describes how exome sequencing of 37 children with renovascular hypertension revealed a frameshift variant classified as likely pathogenic variant in SMAD6 in one patient (Viering et al., 2020). SMAD6 variants are known to predispose to cardiovascular malformations including bicuspid aortic valve related aortopathy (Gillis et al., 2017; Luyckx et al., 2019; Tan et al., 2012). The results reported here suggest that LOF variants in this gene may have a moderate effect on increased risk of hypertension in the general population.
Recessively acting variants in IFT172 (SLP = 3.39) can cause ciliopathies and there is a report of a child with compound heterozygous variant who presented with growth retardation and subsequently developed retinopathy, metaphyseal dysplasia and, at the age of 11, hypertension (Lucas-Herald et al., 2015). However there does not seem to be other evidence to implicate IFT172 in hypertension risk so this result would require replication in other samples.
CSK (SLP = 3.37) is located in the 15q24 locus which, as reviewed recently, is implicated by multiple GWAS for hypertension (Lee et al., 2016). Following up the results of eQTL analyses, these authors demonstrated that mice with gene-silencing or haploinsufficiency of Csk had increased blood pressure and showed that this effect could be moderated by PP2, an inhibitor of Src. Although the results reported here are not formally significant after correction for multiple testing, the additional support provided by these GWAS findings and functional studies does suggest that variants in CSK, in particular those annotated as deleterious by SIFT, might be a risk factor for hypertension.
The results for DBH (SLP = −3.40) provide further support for the previously reported findings that variants in this gene are associated with blood pressure (Liu et al., 2016). In particular, the results suggest that variants annotated as deleterious by SIFT are on average associated with a slightly reduced risk of developing hypertension. Although these variants are individually rare, about 1 person in 20 carries one of them.
AGTR1 (SLP = −3.77) codes for a receptor for angiotensin II so it seems very plausible from a biological point of view that variants impairing its function might be protective against hypertension in spite of the fact that no association with common variants has been detected (Ji et al., 2017). The results suggest that very rare gene disruptive variants can about halve the OR for hypertension.
ZYX (SLP = −3.83) is potentially of interest because it codes for zyxin, the protein responsible for sensing stretch in endothelial cells and vascular smooth muscle cells, as occurs in hypertension, and mediating their response to this by changing the expression of other genes (Ghosh et al., 2015). The findings reported here suggest that impaired functioning of this gene may reduce risk of hypertension.
PREP (SLP = −5.03) narrowly fails to meet conventional criteria for exome-wide significance but is clearly of interest because its product, prolyl endopeptidase, also known as prolyl oligopeptidase or post-proline cleaving enzyme, has recently been shown to be responsible for converting circulating angiotensin II to angiotensin-(1-7) in the circulation and in lungs, a process which is largely independent of ACE2, which carries out this conversion in the kidney (Serfozo et al., 2020). The results we report suggest that rare, functional variants in PREP are protective against hypertension but it is not clear which categories of variant are responsible and although LOF variants are commoner in controls they are too rare for conclusions to be drawn. In mice, loss of this gene results in reduced ability to metabolise ACE2 and hence a more prolonged systemic hypertensive response to exogenously administered ACE2 (Serfozo et al., 2020). While it is not obvious why impaired functioning of this gene might be protective against hypertension these findings do seem worthy of further exploration.
The finding that an inframe deletion in CACNA1D, rs72556363, is associated with increased of hypertension is consistent with reports that very rare germline and somatic variants in this gene can result in aldosterone-producing adenomas and primary aldosteronism (Scholl et al., 2013). Although the variant varies in frequency between populations, essentially being restricted to those with European ancestry, this result does not seem likely to be due to an artefact of population stratification because the frequency in controls is similar to that reported in non-Finnish Europeans whereas the frequency in cases is even higher. It seems to represent a modest risk factor for hypertension without producing severe hyperaldosteronism, which is found in about 1% of subjects with European ancestry.
Overall, these analyses provide an overview of some of the impacts rare, coding variants may have on the risk of hypertension in the general population. The validity of some novel findings will become clearer when exome sequence data is released for the remaining 300,000 UK Biobank participants or if they can be tested in other samples or followed up in functional studies. All the variants implicated are very rare and in view of their effect sizes arguably do not make an important contribution to risk from a public health point of view. Nor are they probably helpful as individual measures of risk, partly because they can only be detected by sequencing. Although it may be reasonable to assume that LOF variants within a given gene will tend to have a similar effect on phenotype, the same cannot be said of nonsynonymous variants and even within a given category the effect of such variants is likely to be vary considerably. Thus, for variants which are individually extremely rare it is in general not possible to make a clear interpretation regarding their likely effect. The main value of these findings is probably in highlighting genes and biological pathways of relevance in order ultimately to inform improved therapeutic approaches.
Data Availability
The raw data is available on application to UK Biobank. Detailed results with variant counts cannot be made available because they might be used for subject identification. Scripts and relevant derived variables will be deposited in UK Biobank. Software and scripts used to carry out the analyses are also available at https://github.com/davenomiddlenamecurtis.
Conflicts of interest
The author declares he has no conflict of interest.
Data availability
The raw data is available on application to UK Biobank. Detailed results with variant counts cannot be made available because they might be used for subject identification. Scripts and relevant derived variables will be deposited in UK Biobank. Software and scripts used to carry out the analyses are also available at https://github.com/davenomiddlenamecurtis.
Preprint availability
A preprint describing this work has been posted at medRxiv with doi 10.1101/2021.02.10.21251503 (Curtis, 2021d).
Ethics statement
UK Biobank had obtained ethics approval from the North West Multi-centre Research Ethics Committee which covers the UK (approval number: 11/NW/0382) and had obtained informed consent from all participants. The UK Biobank approved an application for use of the data (ID 51119) and ethics approval for the analyses was obtained from the UCL Research Ethics Committee (11527/001).
Author contributions
DC carried out the analyses and prepared the manuscript.
Acknowledgments
This research has been conducted using the UK Biobank Resource. The author wishes to acknowledge the staff supporting the High Performance Computing Cluster, Computer Science Department, University College London. This work was carried out in part using resources provided by BBSRC equipment grant BB/R01356X/1. The author wishes to thank the participants who volunteered for the UK Biobank project.
Footnotes
There was a typo in the name of CACNA1D