A factorial Mendelian randomization study to systematically prioritize genetic targets for the treatment of cardiovascular disease

Abstract
Importance: New drugs that provide benefit alongside statin monotherapy are needed to reduce the risk of cardiovascular disease.
Objective: To systematically evaluate the genetically predicted effects of 8,851 genes on cardiovascular disease risk factors using data from the UK Biobank, and subsequently to prioritize their potential to reduce cardiovascular disease risk in addition to statin therapy.
Design, Setting, and Participants: A factorial Mendelian randomization study using individual-level data from the UK Biobank study. This population-based cohort includes a total of 502,602 individuals aged between 40 and 96 years at baseline who were recruited between 2006 and 2010.
Exposures: Genetic variants robustly associated with the expression of target genes in whole blood (P<5×10^-8) in the eQTLGen consortium (n=31,684) were used to construct gene scores.


Keywords
Mendelian randomization, factorial study design, drug validation, phenome-wide association study, UK Biobank
The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license. This version posted February 20, 2020. doi: https://doi.org/10.1101/2020.02.16.20023010

Introduction
Cardiovascular disease (CVD) is an increasingly prevalent public health concern and remains the leading cause of death worldwide 1. Cholesterol-lowering drugs such as statins (3-hydroxy-3-methylglutaryl coenzyme A reductase [HMGCR] inhibitors) are regarded as the gold-standard treatment for lowering the risk of CVD, including myocardial infarction and stroke 2,3. Whilst the effectiveness of statins in both primary and secondary prevention of CVD has been established in randomized controlled trials (RCTs) 4,5, they have also been reported to cause adverse side effects in certain patients, such as an increased risk of developing type 2 diabetes 6 and weight gain 7. Furthermore, there is a major unmet clinical need to identify additional drugs that adequately lower CVD risk in patients receiving statin monotherapy 8, or that offer a viable alternative to it 9.
Mendelian randomization (MR) is a causal inference technique that uses naturally occurring genetic variation to investigate the relationship between modifiable exposures (such as the anticipated effect of a drug) and disease outcomes 10,11. Because genetic alleles are randomly assorted at conception, MR is often considered analogous to allocating individuals to drug and placebo groups in an RCT, without concerns about non-adherence 12.
As such, findings from MR are less prone to confounding and reverse causation, which can bias conventional observational studies.
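The analogy with randomization can be made concrete with a small simulation. The sketch below is illustrative only: the data-generating model, effect sizes and allele frequency are all hypothetical. An unmeasured confounder U biases the naive regression of outcome on exposure, whereas the Wald ratio (the genotype-outcome slope divided by the genotype-exposure slope), which relies on genotype being independent of U, recovers the assumed causal effect of 0.5.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x (model with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n=100_000, beta_true=0.5, seed=1):
    """Hypothetical model: genotype G -> exposure X -> outcome Y,
    with an unmeasured confounder U affecting both X and Y."""
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        gi = (rng.random() < 0.3) + (rng.random() < 0.3)  # allele count, freq 0.3
        ui = rng.gauss(0, 1)                               # unmeasured confounder
        xi = 0.4 * gi + ui + rng.gauss(0, 1)               # exposure
        yi = beta_true * xi + ui + rng.gauss(0, 1)         # outcome
        g.append(gi)
        x.append(xi)
        y.append(yi)
    return g, x, y


g, x, y = simulate()
naive = ols_slope(x, y)                   # confounded observational estimate
wald = ols_slope(g, y) / ols_slope(g, x)  # Wald ratio: (G -> Y) / (G -> X)
print(f"naive={naive:.3f}  wald={wald:.3f}  assumed true effect=0.5")
```

Under this model the naive slope is pulled upwards by U (to roughly 1.0), while the Wald ratio stays close to 0.5 because the genotype is generated independently of U.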
Recent studies have demonstrated the value of genetic and MR analyses for mimicking the putative effects of therapeutic intervention 13-15. Multiple studies have compared the effects of lifelong genetic inhibition of HMGCR with those of alternative genetic targets 16,17. These studies used a 2x2 factorial approach, which stratifies the study population by allelic risk scores to estimate the separate and combined effects of genetic proxies for therapeutic intervention on CVD outcomes. Such developments have established MR as a powerful approach for drug discovery and for improving our understanding of disease aetiology.
In this study, we describe an MR framework designed to systematically prioritize putative genetic targets that are predicted to have an independent and additive benefit alongside statin treatment. We applied this approach to evaluate the genetically predicted effects of 8,851 genetic targets on measures of CVD risk in the UK Biobank study (UKB) 18. Genes identified in this analysis were subsequently evaluated using 2x2 factorial MR to assess whether therapeutically targeting them is predicted to reduce CVD risk in addition to statin therapy. Finally, we performed a phenome-wide association study (PheWAS) to highlight any putative adverse effects of the prioritized targets. In doing so, we demonstrate the ability of this framework to capture diverse biological functions and to recapitulate findings from preclinical studies, supporting the validity of this approach for prioritizing (and deprioritizing) drug targets for therapeutic validation.

Study populations and outcomes
Single nucleotide polymorphisms (SNPs) robustly associated with changes in gene expression (P<5×10^-8) were selected as instrumental variables using findings from the eQTLGen consortium (n=31,684) 19. We included only genes whose expression could be instrumented by at least 2 independent SNPs located within 1Mb of the gene's transcription start site (known as cis-eQTL). This criterion was intended to improve the robustness of findings in line with the assumptions of MR; further details can be found in the supplementary methods. A reference panel of European individuals from the 1000 Genomes Project (phase 3) was used to identify independent SNPs based on r²<0.01 20.
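The instrument-selection rule can be sketched as a greedy clumping procedure. This is a minimal sketch that assumes LD values have already been computed from a reference panel; in practice clumping is usually delegated to dedicated software such as PLINK, and the SNP identifiers, positions, p-values and LD values below are hypothetical.

```python
def select_cis_instruments(snps, tss, r2, p_thresh=5e-8, window=1_000_000,
                           r2_thresh=0.01, min_snps=2):
    """Greedy LD clumping of cis-eQTL for a single gene.

    snps: iterable of (snp_id, position_bp, eqtl_p_value) tuples.
    tss:  transcription start site of the gene (bp, same chromosome).
    r2:   callable returning reference-panel LD r^2 between two SNP ids.
    Returns the retained SNP ids, or [] if fewer than `min_snps` remain.
    """
    # Keep genome-wide significant eQTL within the cis window around the TSS.
    cis = [s for s in snps if s[2] < p_thresh and abs(s[1] - tss) <= window]
    kept = []
    # Walk from most to least significant; keep a SNP only if it is
    # approximately independent (r^2 below threshold) of every SNP kept so far.
    for snp_id, _pos, _p in sorted(cis, key=lambda s: s[2]):
        if all(r2(snp_id, k) < r2_thresh for k in kept):
            kept.append(snp_id)
    # Genes with too few independent instruments are dropped entirely.
    return kept if len(kept) >= min_snps else []


# Hypothetical LD values standing in for a 1000 Genomes European panel.
LD = {frozenset({"rsA", "rsB"}): 0.85,
      frozenset({"rsA", "rsC"}): 0.002,
      frozenset({"rsB", "rsC"}): 0.001}


def ld_r2(a, b):
    """Hypothetical LD lookup; unlisted pairs are treated as r^2 = 0."""
    return LD.get(frozenset({a, b}), 0.0)


snps = [("rsA", 1_050_000, 1e-20), ("rsB", 1_060_000, 1e-15),
        ("rsC", 1_400_000, 3e-9), ("rsD", 2_600_000, 1e-30),
        ("rsE", 1_200_000, 1e-4)]
print(select_cis_instruments(snps, tss=1_000_000, r2=ld_r2))  # ['rsA', 'rsC']
```

Here rsD lies outside the 1Mb window, rsE is not genome-wide significant, and rsB is discarded because it is in strong LD with the more significant rsA.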
The UK Biobank is a prospective cohort study with detailed genotype and phenotype data on up to 500,000 participants 18. The CVD risk factors evaluated in our initial analysis using UKB data were body mass index (BMI), diastolic blood pressure (DBP), systolic blood pressure (SBP), low-density lipoprotein (LDL) cholesterol and triglycerides (TG). A derived outcome encompassing all CVD outcomes recorded in UKB field 20002 (such as coronary heart disease, hypertension and hypercholesterolemia) was used in the 2x2 factorial MR analysis: individuals were categorized as cases for this derived variable if they were a case for any CVD outcome in this field. The phenome-wide analysis used data on 569 outcomes from UKB as described previously 21; a full list can be found in Supplementary Table 11.
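The derived case definition amounts to a simple "any of" rule over each participant's reported codes. A minimal sketch, in which the code values and participant records are hypothetical placeholders rather than the real UK Biobank field 20002 coding:

```python
from typing import Iterable

# Hypothetical placeholder codes: NOT the actual UK Biobank field 20002 coding.
CVD_CODES = frozenset({9001, 9002, 9003})


def derived_cvd_case(reported_codes: Iterable[int],
                     cvd_codes: frozenset = CVD_CODES) -> int:
    """Return 1 if any reported code is one of the CVD codes, else 0."""
    return int(not cvd_codes.isdisjoint(reported_codes))


# Hypothetical participants and their reported illness codes.
participants = {"p1": [9002, 1234], "p2": [], "p3": [5555, 9003], "p4": [1234]}
status = {pid: derived_cvd_case(codes) for pid, codes in participants.items()}
print(status)  # {'p1': 1, 'p2': 0, 'p3': 1, 'p4': 0}
```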

Statistical analysis
Identifying associations between genetically predicted gene expression and risk factors for cardiovascular disease
To assess whether changes in gene expression have a putative causal role in CVD risk, cis-eQTL were used as instrumental variables in a two-sample MR analysis 22. We applied the inverse variance weighted (IVW) method 23 to estimate the effect of genetically predicted gene expression on each of the 5 CVD traits in turn using the 'TwoSampleMR' package 24. Genetically predicted effects that survived correction for multiple testing (P < 0.05/(number of genes analyzed × number of traits)) were then filtered to identify candidates for therapeutic intervention, which were carried forward to downstream analyses. Targets were also filtered to identify those that were "druggable", using a comprehensive list of genes collated from recent data-driven drug-discovery and target selection strategies 25-28.
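For readers unfamiliar with the IVW method, its fixed-effect form and the multiple-testing threshold can be sketched as follows. This is an illustrative re-implementation, not the 'TwoSampleMR' code used in the analysis (whose IVW standard error may be scaled for heterogeneity between SNPs and so differ from this fixed-effect version), and the summary statistics are hypothetical.

```python
import math


def ivw_fixed(beta_exp, beta_out, se_out):
    """Fixed-effect inverse variance weighted (IVW) MR estimate.

    beta_exp: per-SNP effects on gene expression (e.g. from eQTLGen).
    beta_out: per-SNP effects on the outcome trait.
    se_out:   standard errors of beta_out.
    Returns (estimate, standard error, two-sided p-value).
    """
    # Each SNP's ratio estimate beta_out/beta_exp is weighted by 1/variance,
    # using the first-order approximation variance = se_out^2 / beta_exp^2.
    weights = [bx ** 2 / s ** 2 for bx, s in zip(beta_exp, se_out)]
    ratios = [by / bx for bx, by in zip(beta_exp, beta_out)]
    est = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    se = 1 / math.sqrt(sum(weights))
    p = math.erfc(abs(est / se) / math.sqrt(2))  # two-sided normal p-value
    return est, se, p


# Bonferroni threshold across all gene-trait tests.
n_genes, n_traits = 8851, 5
alpha = 0.05 / (n_genes * n_traits)  # about 1.13e-6

# Hypothetical summary statistics for three independent cis-eQTL of one gene.
est, se, p = ivw_fixed([0.5, 0.3, 0.4], [0.05, 0.03, 0.04], [0.01, 0.01, 0.01])
print(f"IVW={est:.3f} SE={se:.4f} P={p:.2e} passes threshold: {p < alpha}")
```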

Systematic factorial Mendelian randomization analysis to prioritize therapeutic targets
Individual-level genetic data from UKB participants were used to construct genetic risk scores (GRS) for each 'druggable' gene associated with any of the 5 traits in the previous analysis. Each GRS was constructed as the sum of effect alleles for the gene's eQTL SNPs, weighted by their eQTLGen regression coefficients. To reduce the multiple testing burden in the factorial analysis, we first estimated the genetically predicted effect of each GRS on CVD in UKB (based on data from field 20002, as described above) using logistic regression adjusted for age, sex and the first 10 principal components (PCs) of ancestry. Only genes whose GRS was associated with the derived CVD outcome after correction for multiple testing (P<0.05/number of GRS evaluated) were analyzed in the factorial MR analysis. We constructed a previously published HMGCR score using 6 LDL-associated variants within 100kb of this gene to mimic the CVD-lowering effect of statin therapy 16.
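The score construction described above can be sketched as below. It assumes that genotypes are already coded as counts of the same effect allele that each eQTLGen coefficient refers to (allele harmonization is not shown), and the SNP identifiers, weights and genotypes are hypothetical. The 68-gene Bonferroni threshold is the one reported in the Results.

```python
def weighted_grs(dosages, weights):
    """Weighted genetic risk score for one individual.

    dosages: dict SNP id -> count of the eQTL effect allele (0, 1 or 2).
    weights: dict SNP id -> eQTL regression coefficient for that same allele.
    """
    missing = set(weights) - set(dosages)
    if missing:
        raise KeyError(f"no genotype for: {sorted(missing)}")
    # Sum of effect-allele counts, each weighted by its eQTL coefficient.
    return sum(weights[snp] * dosages[snp] for snp in weights)


# Hypothetical eQTL weights and genotypes for a two-SNP score.
weights = {"rsA": 0.12, "rsC": -0.08}
cohort = {
    "p1": {"rsA": 2, "rsC": 1},
    "p2": {"rsA": 0, "rsC": 2},
    "p3": {"rsA": 1, "rsC": 0},
}
grs = {pid: weighted_grs(g, weights) for pid, g in cohort.items()}
print(grs)

# Bonferroni threshold for the GRS-CVD associations (68 druggable genes).
alpha_grs = 0.05 / 68  # about 7.35e-4
```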
We applied 2x2 factorial MR to systematically evaluate the effect of each gene associated with CVD in the previous analysis after accounting for the effect of the HMGCR score. The UKB dataset of 334,915 individuals (described in further detail in the supplementary methods) was first split into halves at the median of the HMGCR score, and then into quarters at the median of the candidate gene's GRS. The group of individuals with below-median values for both the HMGCR and novel gene scores was used as the baseline group in further analyses. Each of the other three groups was compared in turn with the baseline group to estimate the genetically predicted effects of statin therapy (HMGCR inhibition), the new drug and combined therapy. A graphical illustration of this approach can be found in Figure 1. Analyses used logistic regression of the CVD outcome, adjusted for the same covariates as before.
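The group assignment and the comparison with the baseline group can be sketched as follows. This is a simplified illustration: it computes crude (unadjusted) odds ratios, whereas the analysis described above used logistic regression adjusted for age, sex and 10 PCs (a logistic regression with only the group indicator would reproduce the crude odds ratio). The group names, scores and outcomes are hypothetical, and both scores are assumed to be oriented so that higher values mimic drug action.

```python
from statistics import median

# Group label for (above-median HMGCR score, above-median target score).
GROUPS = {
    (False, False): "placebo",
    (True, False): "statin",
    (False, True): "new_drug",
    (True, True): "combined",
}


def factorial_groups(hmgcr_scores, target_scores):
    """Assign each individual to one of the four 2x2 factorial groups.

    Values equal to the median fall into the below-median half.
    """
    m_h, m_t = median(hmgcr_scores), median(target_scores)
    return [GROUPS[(h > m_h, t > m_t)]
            for h, t in zip(hmgcr_scores, target_scores)]


def crude_odds_ratio(labels, outcome, group, reference="placebo"):
    """Unadjusted odds ratio of the outcome in `group` versus `reference`."""
    def odds(g):
        ys = [y for lab, y in zip(labels, outcome) if lab == g]
        cases = sum(ys)
        return cases / (len(ys) - cases)
    return odds(group) / odds(reference)


# Hypothetical scores and binary CVD outcomes for 16 individuals.
hmgcr = [0.2, 0.2, 0.2, 0.2, 0.8, 0.8, 0.8, 0.8] * 2
target = [0.3, 0.3, 0.7, 0.7] * 4
outcome = [1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

labels = factorial_groups(hmgcr, target)
for g in ("statin", "new_drug", "combined"):
    print(g, round(crude_odds_ratio(labels, outcome, g), 3))
```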

Phenome-wide association study (PheWAS)
Finally, we undertook a PheWAS analysis for each GRS prioritized in the previous analyses to highlight any potentially unanticipated adverse effects of therapeutic intervention.
This was undertaken by evaluating the association between each GRS and 569 outcomes from the UKB. Continuous, binary and categorical traits were analyzed using linear, logistic and ordinal/multinomial logistic regression respectively. All analyses were adjusted for age, sex and the top 10 PCs. All analyses were undertaken using R (version 3.5.1) and all plots were created using the package 'ggplot2'.
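Two bookkeeping steps of the PheWAS, orienting effect estimates so that each score mimics the therapeutic direction and applying the Bonferroni threshold across 569 outcomes, can be sketched as below. The orientation rule (flip all signs so that the score lowers a chosen reference trait, e.g. LDL cholesterol for HMGCR) is an assumption about how this step may be done, and the trait names, effect sizes and p-values are hypothetical.

```python
import math

N_OUTCOMES = 569
ALPHA = 0.05 / N_OUTCOMES  # Bonferroni threshold, about 8.79e-5


def orient(betas, reference_trait):
    """Flip signs so that the score lowers `reference_trait`."""
    flip = -1.0 if betas[reference_trait] > 0 else 1.0
    return {trait: flip * b for trait, b in betas.items()}


def significant(pvalues, alpha=ALPHA):
    """Traits passing the Bonferroni threshold, in input order."""
    return [trait for trait, p in pvalues.items() if p < alpha]


# Hypothetical PheWAS results for one genetic score.
betas = {"LDL": 0.20, "HbA1c": -0.05, "BMI": 0.01}
pvals = {"LDL": 1e-50, "HbA1c": 2e-6, "BMI": 0.3}
print(orient(betas, "LDL"))          # LDL now negative, HbA1c positive
print(significant(pvals))            # ['LDL', 'HbA1c']
print(round(-math.log10(ALPHA), 2))  # threshold on the -log10 scale
```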

A systematic Mendelian Randomization analysis to identify novel candidate genes associated with cardiovascular disease
The putative causal effect between the genetically predicted expression of 8,851 genes (Supplementary Table 1 Table 7). For example, the association between FADS1 and triglycerides would have been identified in a GWAS based on the strongest individual SNP effect for this gene (P=5.5×10^-109). However, the association for POM121C would have been overlooked by conventional single-SNP analyses (lowest individual P=9.40×10^-5), whereas the combined effect of all its eQTL provided strong evidence of association (IVW P=5.83×10^-13). POM121C encodes a nucleoporin whose expression may be involved in regulating insulin sensitivity and adipogenesis 29.
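Why several individually modest eQTL can jointly provide strong evidence follows from the fixed-effect IVW estimator: if all independent SNPs give the same ratio estimate, the magnitude of the IVW z-statistic equals the square root of the sum of the squared per-SNP outcome z-statistics. The sketch below uses this idealized case with four hypothetical SNPs; it is not a reproduction of the POM121C result, whose per-SNP statistics are not shown here.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal z-statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def ivw_z_if_consistent(snp_z):
    """Magnitude of the IVW z-statistic when every SNP gives the same ratio.

    With fixed-effect weights (beta_exp / se_out)^2 and identical ratio
    estimates, |z_IVW| = sqrt(sum of squared per-SNP outcome z-statistics).
    """
    return math.sqrt(sum(z * z for z in snp_z))


# Hypothetical: four independent eQTL, each only modestly associated with the trait.
snp_z = [3.9, 3.9, 3.9, 3.9]
best_single_p = two_sided_p(max(snp_z, key=abs))
combined_p = two_sided_p(ivw_z_if_consistent(snp_z))
print(f"best single-SNP P = {best_single_p:.1e}")
print(f"IVW P             = {combined_p:.1e}")
```

No single SNP here reaches P<5×10^-8, yet the combined statistic (z=7.8) comfortably does.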
Of the 377 putative effects, 68 involved candidate genes that were "druggable", as defined by recent data-driven drug-discovery and target selection strategies 25-28 (Supplementary Table 8). Of these, our GRS analysis provided strong evidence of a genetically predicted effect of the expression of 20 genes on the CVD outcome in UKB (Bonferroni P=0.05/68=7.35×10^-4).
Prioritizing cardiovascular genes for therapeutic intervention using factorial Mendelian randomization
Next, we sought to determine whether there was genetic evidence that the 20 identified candidate genes have a CVD risk-lowering effect additive to that of an HMGCR score acting as a proxy for statin therapy (Supplementary Table 9). The HMGCR score was strongly
benefit of combined therapy (i.e. statins + new drug) for these novel gene targets would result in a 6-7% reduction in CVD risk compared with the placebo group.
Exploring phenome-wide associations to predict putative side effects of genetically targeted therapeutics
As a proof of concept, we first undertook a PheWAS to evaluate the genetically predicted effects of HMGCR using the score from the previous analysis ( Figure  Full results from the PheWAS analyses for each of the 20 genes taken forward from the previous analysis can be found in Supplementary Tables 13-32). As expected, the PheWAS results for prioritized genes were enriched for traits with known roles in mediating CVD risk, in addition to novel secondary effects that may raise concerns for the development of compounds targeting multiple pathways. For instance, FDFT1 and NEGR1 were associated with 55 and 50 traits, respectively (Bonferroni P=0.05/569=8.79×10^-5).
which are markers of liver disease 31. There was also evidence that NEGR1 expression is associated with fluid intelligence score (b=0.026; CI, 0.014-0.039; P=3.69×10^-5) in the same direction as anthropometric traits, suggesting that inhibiting NEGR1 may have deleterious consequences for this trait.
In contrast, findings from the PheWAS suggested that other targets, such as SLC5A11 and PRKCE, may offer more viable therapeutic opportunities (Figure 5C).

Discussion
In this study, we present a comprehensive analytical pipeline that harnesses Mendelian randomization to assess whether inhibiting novel drug targets may further reduce the risk of cardiovascular disease in addition to statin treatment. Applying this framework in the UK Biobank study prioritized 20 genetic targets that were predicted to provide additional therapeutic benefit in combination with statins, including known cooperative drug interactions 32. Exploring the putative effects of these targets on 569 outcomes supported the validity of this methodology for estimating potential secondary drug effects. For example, the HMGCR score was strongly associated with HbA1c in the opposite direction to LDL cholesterol, consistent with clinical evidence that statin therapy may increase the risk of type 2 diabetes 33. This approach was subsequently applied to each of the 20 identified genes to further prioritize their potential as therapeutic targets.
The use of large-scale genotype-phenotype datasets is becoming increasingly important as an early drug-development tool for informed target validation 34
function. SQS inhibitors reached late-stage clinical trials but were discontinued owing to treatment-associated hepatotoxicity 36.
Similarly, while the factorial MR analyses provided evidence that NEGR1 may be an effective therapeutic target for lowering CVD risk in combination with statin treatment, its predicted effects on neurological/psychiatric traits may limit its suitability as a drug target. NEGR1 (neuronal growth regulator 1) is expressed in the hypothalamus, plays a role in energy balance and food intake 37,38, and has previously been linked with major depressive disorder 39.
NEGR1 was associated with fluid intelligence score in our analysis, which may raise concerns about the wider neurological safety of targeting this pathway.
Our results also highlight genes that may make worthwhile targets. For example, predicted SLC5A11 activation was strongly associated with reductions in anthropometric (e.g. BMI) and cardiovascular traits but was not associated with any potentially adverse secondary effects in the PheWAS. SLC transporters are widely implicated in health and disease, and there is growing interest in strategies for the therapeutic inhibition and activation of this protein family 40. Similarly, drugs targeting the PRKCE pathway may provide clinical benefit based on our analysis. PRKCE encodes PKC-ε, a member of the protein kinase C (PKC) family of serine/threonine kinases. Our PheWAS results predict that PRKCE inhibition is unlikely to elicit adverse secondary effects and may protect against high blood pressure. PKC-ε is known to be expressed in the heart and may play a cardioprotective role in ischemic heart failure 41. Various in vitro and in vivo studies have provided evidence for a further role in mediating hypertrophy, which may depend on PRKCE expression levels in the ischemic heart 42.

Limitations
The study has several methodological limitations. Firstly, genetically predicted molecular traits represent the cumulative effect of lifelong exposure on the outcome and therefore cannot be used to directly predict the short-term benefit of a putative drug 43.
Secondly, the CVD outcomes derived from field 20002 in the UKB cohort are based on self-report. To mitigate the risk of bias from using self-reported data, we first analyzed measured CVD risk factors in this cohort.
Finally, owing to data availability, our analysis was restricted to blood-derived cis-eQTL, which may lack sensitivity to detect tissue-specific effects on disease susceptibility 44. For example, our results suggest that PSRC1 expression at the 1p13.3 locus may be associated with LDL cholesterol, although previous functional studies and analyses using liver-derived expression data suggest that the likely causal gene for this signal is SORT1 45,46.

Conclusions
The use of large-scale genetic studies employing Mendelian randomization can provide a cost-effective approach to accelerate the identification of viable drug targets to treat disease.

Figure 1 - An illustration of the concept behind factorial Mendelian randomization
Using naturally occurring genetic variation in a population, individuals are allocated to 4 groups depending on whether they carry variants known to influence the regulation of two target genes. These variants are combined into two scores, one for each gene, which can be used to mimic the potential impact of pharmacologically targeting each gene. As such, people allocated to the placebo group have scores below the median for both genes; the Drug A and Drug B groups have a score above the median for one gene but not the other; and those in the combined therapy group have scores above the median for both genes.
By comparing the incidence of disease in each group with that in the placebo group, it is possible to infer whether developing a drug for a novel gene target would yield an additive therapeutic benefit over current treatments. We applied this approach to assess whether novel drug targets may be worthwhile for treating coronary heart disease on top of HMGCR inhibition (i.e. statin therapy).

Figure 4 - A forest plot illustrating 2x2 factorial Mendelian randomization estimates
A comparison of findings from the 2x2 factorial Mendelian randomization analysis for 4 drug targets (FDFT1, NEGR1, PRKCE and SLC5A11). Genetic risk scores for each of these genes were constructed in the UK Biobank cohort and, in turn, compared with a previously devised HMGCR score (Ference et al., 2015) to assess whether targeting them may lower cardiovascular disease risk in addition to statin therapy.
Each point on the plot represents the association between the respective genetic score and a complex trait in UKB. The y-axis indicates the -log10 p-value for each association, after orienting the direction of effect in line with the predicted therapeutic effect (i.e. statins lower LDL cholesterol). Points are grouped and colored according to the corresponding subcategory for each trait. The horizontal dashed red line indicates the Bonferroni-corrected threshold for multiple testing (-log10(0.05/569) = -log10(8.79×10^-5) ≈ 4.06).