Evidence for Causal Effects of Smoking Initiation and Alcohol Consumption on Substance Use Outcomes

Background and Aims: The 'gateway' hypothesis proposes that initial use of drugs such as tobacco and alcohol can lead to subsequent more problematic drug use. However, it is unclear whether true causal pathways exist, or whether there is instead a shared underlying risk factor. We used bidirectional Mendelian Randomisation (MR) to test these two competing hypotheses. Methods: We conducted two-sample MR analyses, using genome-wide association data for smoking initiation, alcoholic drinks per week, cannabis use and dependence, cocaine and opioid dependence. We used several MR methods that rely on different assumptions: inverse-variance weighted (IVW), MR-Egger, weighted median, simple mode and weighted mode. Consistent results across these methods would support stronger inference. Results: We found evidence of causal effects from smoking initiation to increased drinks per week (IVW: beta=0.06; 95% CI 0.03 to 0.09; p-value=9.44x10^-06), cannabis use (IVW: OR=1.34; 95% CI 1.24 to 1.44; p-value=1.95x10^-14), and cannabis dependence (IVW: OR=1.68; 95% CI 1.12 to 2.51; p-value=0.01). We also found evidence of an effect of cannabis use on increased likelihood of smoking initiation (IVW: OR=1.39; 95% CI 1.08 to 1.80; p-value=0.01). We did not find evidence of an effect of drinks per week on substance use outcomes, except for weak evidence of an effect on cannabis use. We also found evidence of an effect of opioid dependence on increased drinks per week (IVW: beta=0.002; 95% CI 0.0005 to 0.003; p-value=8.61x10^-03). Conclusions: Overall, we found evidence suggesting a causal pathway from smoking initiation to alcohol consumption, and both cannabis use and dependence, which may support the gateway hypothesis. However, we also found causal effects of cannabis use on smoking initiation, and opioid dependence on alcohol consumption, which suggests the existence of a shared risk factor.
Further research should explore whether this is the case, and in particular the nature of any shared risk factors.


Introduction
Illicit substance use and substance use disorders result in a substantial global burden on a range of health conditions (1,2). Identifying causal risk factors in the development of problematic substance use is important in order to design successful interventions and prevent subsequent health problems.
The gateway hypothesis, in its simplest form, is the theory that initial use of legal 'gateway' drugs, which include tobacco and alcohol, may lead to illicit drug use, such as cannabis, cocaine and opioids (3-5). Previous studies have found associations between smoking initiation and use of a number of other substances, including alcohol (6), cannabis (7,8), cocaine (9) and opioids (10). Whilst these studies may support the gateway hypothesis, it is equally plausible that there are underlying shared risk factors, for example risk taking or impulsive behaviours. Previous studies have reported an association of ADHD with several substance use outcomes (11,12), and ADHD genetic risk with smoking initiation (13,14), supporting impulsivity as a potential shared risk factor. To our knowledge, the associations of smoking initiation and alcohol consumption with use of other substances have not previously been examined using causal inference methods. However, there is some evidence (e.g., from randomised controlled trials) that smoking cessation may also result in improved substance use outcomes, such as reduced substance use or abstinence (15); this supports a possible causal effect of smoking on substance use outcomes. Investigating this further may provide evidence as to whether these relationships are due to the gateway hypothesis or a shared underlying risk factor.
medRxiv preprint (not certified by peer review); this version posted January 13, 2021; doi: https://doi.org/10.1101/2021.01.12.21249649. Made available under a CC-BY 4.0 International license.
Mendelian Randomisation (MR) is a well-established method for causal inference, based on instrumental variable (IV) analysis, which attempts to overcome issues of residual confounding and reverse causation (16-19). MR uses genetic variants, which are assigned randomly at conception, as IVs for an exposure of interest to estimate the causal relationship with an outcome of interest. In two-sample MR (20) the single nucleotide polymorphism (SNP)-exposure and SNP-outcome estimates are obtained from genome-wide association studies (GWAS) in independent samples and used to estimate the possible causal effects.
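As an illustration of how SNP-exposure and SNP-outcome estimates combine in two-sample MR, the following minimal sketch computes a per-SNP ratio (Wald) estimate. The function name and the summary statistics are hypothetical and are not taken from any of the GWAS used here; the standard error uses a first-order approximation that ignores uncertainty in the SNP-exposure estimate.

```python
def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Per-SNP causal estimate: SNP-outcome effect divided by SNP-exposure effect."""
    estimate = beta_outcome / beta_exposure
    # First-order approximation: treats the SNP-exposure estimate as known without error.
    se = se_outcome / abs(beta_exposure)
    return estimate, se


# Hypothetical summary statistics for a single SNP (illustration only).
estimate, se = wald_ratio(beta_exposure=0.10, beta_outcome=0.03, se_outcome=0.01)
print(round(estimate, 3), round(se, 3))
```

Methods such as IVW combine these per-SNP ratios across all SNPs in the instrument.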
Here we apply this two-sample MR approach to investigate the possible causal effect between both smoking initiation and alcohol consumption (defined as drinks per week) and substance use outcomes of cannabis use and dependence, cocaine dependence and opioid dependence. We also examine the association between smoking initiation and alcohol consumption. We use a bidirectional approach (Figure 1) to assess whether there is evidence supporting the gateway hypothesis (i.e., that smoking initiation or alcohol consumption can lead to use of other substances and dependence), or whether there is evidence of a shared risk factor for both smoking initiation or alcohol consumption and other substance use outcomes. Some pathways (e.g., from opioid use to smoking initiation) are implausible, so that analyses in this direction act more as a sensitivity analysis, which could help identify a shared risk factor rather than a causal effect.

Data sources
We used a number of GWAS obtained from several consortia and other samples, the details of which are shown in Table 1. All GWAS were conducted in samples of European ancestry.
To avoid sample overlap in our analyses we used GWAS with some of the samples excluded from the consortia (see Table 1).

Smoking initiation
The smoking initiation GWAS (21) identified 378 genome-wide significant SNPs associated with ever being a smoker, i.e., where participants reported ever being a regular smoker in their life. See Supplementary Materials for further details. The total sample size was 1,232,091 for the GSCAN consortium; however, the sample size included for the GWAS in each of our analyses varied to avoid sample overlap (see Table 1). Where smoking initiation was the outcome in analyses with full data, we used GWAS results from a meta-analysis of 23andMe only and all results excluding 23andMe. The meta-analysis was conducted using the genome-wide association meta-analysis (GWAMA) software (22).

Drinks per week
The drinks per week GWAS (21) [...]

Cannabis use
The cannabis use GWAS (23) identified 8 genome-wide significant SNPs associated with ever using cannabis. See Supplementary Materials for further details.

Cannabis dependence
The cannabis dependence GWAS (24) did not identify any genome-wide significant SNPs associated with cannabis dependence. Cases were established based on meeting three or more criteria for Diagnostic and Statistical Manual of Mental Disorders, 4th edition (DSM-IV) cannabis dependence.

Cocaine dependence
The cocaine dependence GWAS identified one genome-wide significant SNP associated with cocaine dependence. All participants were interviewed using the Semi-structured Assessment for Drug Dependence and Alcoholism (SSADA) and cocaine dependent cases were established based on responses according to the DSM-IV criteria. Cases reflect lifetime cocaine dependence.

Opioid dependence
The opioid dependence GWAS did not identify any genome-wide significant SNPs associated with opioid dependence. All participants were interviewed using the SSADA and opioid dependent cases were established based on responses according to the DSM-IV criteria.
Cases reflect lifetime opioid dependence.
Units for all binary measures were in log odds ratios and for the continuous drinks per week measure were per SD increase in the number of drinks per week.
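Because effects for binary measures are on the log odds scale, they are exponentiated for reporting as odds ratios. The sketch below shows this conversion, and the reverse step of recovering a log-scale standard error from a reported 95% CI, using the IVW estimate for smoking initiation on cannabis use reported in this paper (OR=1.34; 95% CI 1.24 to 1.44). The function names are hypothetical, and the recovered standard error is only approximate because the reported values are rounded.

```python
import math


def log_odds_to_or(beta, se, z=1.96):
    """Convert a log odds ratio and its standard error to an OR with a 95% CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_ci(lower, upper, z=1.96):
    """Back-calculate the log-scale standard error from a reported 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Reported IVW estimate: smoking initiation on cannabis use.
beta = math.log(1.34)
se = se_from_ci(1.24, 1.44)
odds_ratio, lower, upper = log_odds_to_or(beta, se)
print(f"OR={odds_ratio:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```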

Statistical analyses
Multiple MR methods were used to assess the causal effects of: i) the exposure of smoking initiation/alcohol consumption on substance use outcomes, and ii) substance use exposures on smoking initiation/alcohol consumption. These were the inverse-variance weighted (IVW) (28), MR-Egger (29), weighted median (30), simple mode and weighted mode (31) methods. These methods make varying assumptions about instrument validity and therefore a consistent effect estimate across all methods would provide greater evidence for a causal effect. However, the IVW approach is the main method used here, with the others being sensitivity analyses for different assumptions. The IVW method constrains the intercept to be zero and assumes that all instruments are valid with no horizontal pleiotropy. Therefore, we also include results for Cochran's Q test of heterogeneity between the individual SNPs included in the instrument, which can indicate possible horizontal pleiotropy. The MR-Egger method tests whether there is overall directional pleiotropy by not constraining the intercept, where a non-zero intercept would indicate the presence of directional horizontal pleiotropy. We also use Rücker's Q test to assess heterogeneity between the individual SNPs whilst adjusting for this directional pleiotropy.
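The difference between the constrained (IVW) and unconstrained (MR-Egger) intercept can be sketched as two weighted regressions of the SNP-outcome effects on the SNP-exposure effects. This is a simplified illustration and not the software used for the analyses in this paper: it uses fixed-effect weights of 1/se_y^2, the function names are hypothetical, and the data are simulated so that every SNP has the same pleiotropic offset of 0.01 around a true slope of 0.5.

```python
import math


def ivw(bx, by, se_y):
    """Fixed-effect IVW: weighted regression of by on bx through the origin."""
    w = [1 / s ** 2 for s in se_y]
    denom = sum(wi * x * x for wi, x in zip(w, bx))
    beta = sum(wi * x * y for wi, x, y in zip(w, bx, by)) / denom
    se = math.sqrt(1 / denom)
    # Cochran's Q: weighted squared deviations of each SNP from the IVW line
    # (compared against a chi-squared distribution with n - 1 degrees of freedom).
    q = sum(wi * (y - beta * x) ** 2 for wi, x, y in zip(w, bx, by))
    return beta, se, q


def mr_egger(bx, by, se_y):
    """MR-Egger: the same weighted regression, but with a free intercept."""
    # Orient each SNP so that its exposure effect is positive.
    signs = [1 if x >= 0 else -1 for x in bx]
    x = [s * v for s, v in zip(signs, bx)]
    y = [s * v for s, v in zip(signs, by)]
    w = [1 / s ** 2 for s in se_y]
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    denom = sw * swxx - swx ** 2
    if denom == 0:
        raise ValueError("SNP-exposure effects must not all be identical")
    slope = (sw * swxy - swx * swy) / denom
    intercept = (swy - slope * swx) / sw
    return slope, intercept


# Hypothetical data: by = 0.5 * bx + 0.01 (directional pleiotropy of 0.01).
bx = [0.05, 0.08, 0.10, 0.12]
by = [0.035, 0.05, 0.06, 0.07]
se_y = [0.01, 0.01, 0.01, 0.01]

ivw_beta, ivw_se, cochran_q = ivw(bx, by, se_y)
egger_slope, egger_intercept = mr_egger(bx, by, se_y)
print(ivw_beta, egger_slope, egger_intercept)
```

In this constructed example the IVW estimate is pulled away from 0.5 by the shared offset, whereas MR-Egger recovers the slope and reports the offset as a non-zero intercept.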
The weighted median method provides an estimate under the assumption that at least 50% of the SNPs are valid instruments (i.e., satisfy the IV assumptions). Finally, the mode-based approaches provide an estimate for the largest cluster of similar SNPs, where the SNPs not in that cluster could be invalid, with the weighted method taking into account the largest weights of SNPs. In addition, we also estimate effects for single SNP and leave-one-out analyses, with plots for these results included in the Supplementary Material, where there is evidence for a causal effect.
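To illustrate why the weighted median tolerates a minority of invalid instruments, the sketch below takes the weighted median of the per-SNP ratio estimates with linear interpolation between neighbouring estimates. It is a point estimate only (in practice standard errors are usually obtained by bootstrapping), the function name is hypothetical, and the data are made up so that three of five equally weighted SNPs agree on an effect of 0.2.

```python
def weighted_median(bx, by, se_y):
    """Weighted median of per-SNP ratio estimates (point estimate only)."""
    ratios = [y / x for x, y in zip(bx, by)]
    # Inverse-variance weights of the ratio estimates (first-order approximation).
    weights = [(x / s) ** 2 for x, s in zip(bx, se_y)]
    order = sorted(range(len(ratios)), key=lambda i: ratios[i])
    r = [ratios[i] for i in order]
    total = sum(weights)
    w = [weights[i] / total for i in order]
    # Percentile position of each ordered estimate:
    # cumulative weight minus half of its own weight.
    positions = []
    cumulative = 0.0
    for wi in w:
        cumulative += wi
        positions.append(cumulative - wi / 2)
    if positions[0] >= 0.5:
        return r[0]
    for j in range(1, len(r)):
        if positions[j] >= 0.5:
            # Interpolate between the two estimates that straddle the 50% point.
            frac = (0.5 - positions[j - 1]) / (positions[j] - positions[j - 1])
            return r[j - 1] + frac * (r[j] - r[j - 1])
    return r[-1]


# Hypothetical data: three valid SNPs (ratio 0.2) and two invalid ones (1.0 and 2.0).
bx = [0.1, 0.1, 0.1, 0.1, 0.1]
by = [0.02, 0.02, 0.02, 0.10, 0.20]
se_y = [0.01, 0.01, 0.01, 0.01, 0.01]
estimate = weighted_median(bx, by, se_y)
print(round(estimate, 3))
```

An unweighted mean of the same five ratios would be 0.72, so the two invalid SNPs would dominate a simple average.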
We also estimated the mean F statistic and the unweighted and weighted I-squared values for each of the analyses (32). These are presented in Table S1 for all MR analyses. The I-squared value falls between zero and one and indicates the amount of bias in the MR-Egger estimate arising from violation of the 'NO Measurement Error' (NOME) assumption. A value of 0.9 or above would indicate that there is minimal bias in the MR-Egger estimate. For values between 0.6 and 0.9 we have run simulation extrapolation (SIMEX) corrections and present these in place of the MR-Egger results. Specifically, we present these for whichever of the weighted and unweighted I-squared values is higher. SIMEX allows for the calculation of bias-adjusted point estimates for MR-Egger. However, for any values below 0.6 (where this bias is too large) it is not appropriate to run the SIMEX correction. We have indicated where this is the case, as these MR-Egger results cannot be interpreted with confidence.
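The instrument-strength checks and the decision rule described above can be sketched as follows. This is an illustration under stated assumptions rather than the code used for Table S1: the per-SNP F statistic is approximated as (beta/se)^2, the I-squared statistic is computed from a Cochran-type Q on the SNP-exposure effects, the weighted version rescales those effects by the SNP-outcome standard errors, and all function names and data are hypothetical.

```python
def mean_f_statistic(bx, se_x):
    """Mean per-SNP F statistic, approximated as (beta / se)^2."""
    return sum((b / s) ** 2 for b, s in zip(bx, se_x)) / len(bx)


def i_squared(effects, ses):
    """I-squared of a set of effects, truncated at zero."""
    k = len(effects)
    w = [1 / s ** 2 for s in ses]
    mu = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - mu) ** 2 for wi, e in zip(w, effects))
    if q == 0:
        return 0.0
    return max(0.0, (q - (k - 1)) / q)


def egger_strategy(bx, se_x, se_y):
    """Apply the thresholds from the text to the higher of the two I-squared values."""
    unweighted = i_squared(bx, se_x)
    weighted = i_squared(
        [b / sy for b, sy in zip(bx, se_y)],
        [sx / sy for sx, sy in zip(se_x, se_y)],
    )
    best = max(unweighted, weighted)
    if best >= 0.9:
        return best, "MR-Egger"
    if best >= 0.6:
        return best, "SIMEX-corrected MR-Egger"
    return best, "MR-Egger not interpretable"


# Hypothetical SNP-exposure effects and standard errors.
bx = [0.02, 0.05, 0.08, 0.11]
se_x = [0.005, 0.005, 0.005, 0.005]
se_y = [0.01, 0.01, 0.01, 0.01]

mean_f = mean_f_statistic(bx, se_x)
best_i2, label = egger_strategy(bx, se_x, se_y)
print(round(mean_f, 1), round(best_i2, 3), label)
```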
Finally, we conducted multivariable MR (MVMR) to investigate whether the causal effect of smoking initiation was independent of that for the drinks per week exposure for any substance use outcomes where both these exposures were associated with the outcome.
MVMR is an extension of MR which allows one to estimate the causal effect of multiple exposures on an outcome and assess whether each exposure is independent of the others (33).
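One common way to implement MVMR with summary data is a weighted regression of the SNP-outcome effects on the SNP-exposure effects for both exposures at once, with no intercept. The sketch below does this for two exposures by solving the 2x2 normal equations directly. It is an assumed, simplified form and not necessarily the implementation used in this paper; the function name and data are hypothetical, and the data are constructed so that only the first exposure affects the outcome.

```python
def mvmr_ivw(bx1, bx2, by, se_y):
    """Two-exposure multivariable IVW: regress by on bx1 and bx2 with no intercept."""
    w = [1 / s ** 2 for s in se_y]
    a11 = sum(wi * x1 * x1 for wi, x1 in zip(w, bx1))
    a12 = sum(wi * x1 * x2 for wi, x1, x2 in zip(w, bx1, bx2))
    a22 = sum(wi * x2 * x2 for wi, x2 in zip(w, bx2))
    c1 = sum(wi * x1 * y for wi, x1, y in zip(w, bx1, by))
    c2 = sum(wi * x2 * y for wi, x2, y in zip(w, bx2, by))
    det = a11 * a22 - a12 ** 2
    if det == 0:
        raise ValueError("SNP effects on the two exposures are collinear")
    # Solve the weighted normal equations for the two direct effects.
    beta1 = (c1 * a22 - c2 * a12) / det
    beta2 = (a11 * c2 - a12 * c1) / det
    return beta1, beta2


# Hypothetical data: by = 0.3 * bx1 + 0.0 * bx2.
bx1 = [0.10, 0.05, 0.08, 0.02]
bx2 = [0.02, 0.09, 0.04, 0.07]
by = [0.030, 0.015, 0.024, 0.006]
se_y = [0.01, 0.01, 0.01, 0.01]

beta1, beta2 = mvmr_ivw(bx1, bx2, by, se_y)
print(round(beta1, 3), round(beta2, 3))
```

Here the second exposure has no effect once the first is included in the model, which is the pattern described later for drinks per week after accounting for smoking initiation.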

Evidence of causal effects of smoking initiation on substance use outcomes
Our two-sample MR results (Table S2 and Figure 2) indicated a causal effect of smoking initiation on increased drinks per week (IVW: β=0.06; 95% CI 0.03 to 0.09; p-value=9.44x10^-06). These results were in a consistent direction across the different MR analyses (see also Figure S1), and there was evidence of a causal effect for all methods except MR-Egger. We observed evidence of heterogeneity in these results for the IVW and MR-Egger methods (see also Figure S2), but not of horizontal pleiotropy (see also Figure S3). Leave-one-out analyses did not reveal that any particular SNP was driving the association (Figure S4).
We also found evidence of a causal effect of smoking initiation on cannabis use (IVW: OR=1.34; 95% CI 1.24 to 1.44; p-value=1.95x10^-14). These results were in a consistent direction for all MR analyses (see also Figure S5), although evidence for this causal effect was only found additionally for the weighted median method. There was evidence of heterogeneity with both the IVW and MR-Egger methods (see also Figure S6) but not of horizontal pleiotropy (see also Figure S7). Leave-one-out analyses did not reveal that any particular SNP was driving the association (Figure S8).
We found evidence of a causal effect of smoking initiation on cannabis dependence (IVW: OR=1.68; 95% CI 1.12 to 2.51; p-value=0.01). These results were in a consistent direction for the MR-Egger and weighted median methods, although evidence from these methods was weak (see also Figure S9). There was no evidence of heterogeneity or horizontal pleiotropy (see also Figures S10 and S11) for these results. Leave-one-out analyses did not reveal that any particular SNP was driving the association (Figure S12).
Finally, we did not find evidence of a causal effect of smoking initiation on cocaine dependence (IVW: OR=1.21; 95% CI 0.58 to 2.53; p-value=0.60) or opioid dependence (IVW: OR=1.41; 95% CI 0.62 to 3.20; p-value=0.41) with any of the MR analyses, except for some weak evidence from the SIMEX-adjusted MR-Egger analysis for cocaine dependence. There was no evidence of heterogeneity or horizontal pleiotropy for cocaine or opioid dependence.

Causal effects of substance use exposures on smoking initiation
For the direction of substance use to smoking initiation (Table S3 and Figure 3) we found evidence of a causal effect of cannabis use on smoking initiation (IVW: OR=1.39; 95% CI 1.08 to 1.80; p-value=0.01) for all MR analyses except MR-Egger. These results were in a consistent direction across the different MR analyses (see also Figure S13). We observed evidence of heterogeneity in these results for the IVW and MR-Egger methods (see also Figure S14), but not of horizontal pleiotropy (see also Figure S15). Leave-one-out analyses did not reveal that any particular SNP was driving the association (Figure S16).
There was evidence to suggest a causal effect of opioid dependence on drinks per week (IVW: β=0.002; 95% CI 0.0005 to 0.003; p-value=8.61x10^-03), although the effect size was very small and this was not the case for any of the other MR analyses (see also Figure S17). There was no evidence of heterogeneity (see also Figure S18) or horizontal pleiotropy (Figure S19). Leave-one-out analyses did not reveal that any particular SNP was driving the association (Figure S20).

Discussion
We examined whether there was evidence for causal effects of smoking initiation and alcohol consumption on the use of cannabis and dependence on cannabis, cocaine and opioids, which may support the 'gateway' hypothesis. We also examined the reverse direction, where evidence of an association, particularly in both directions, may be indicative of some other underlying common risk factor.
Our main findings were those for cannabis use and cannabis dependence outcomes, which suggest that ever smoking, in particular, may act as a gateway to subsequent cannabis use and perhaps even dependence, although evidence was weaker for the latter. This supports previous observational studies demonstrating an association between these phenotypes (7,8,34); however, our MR analyses have greater power and support stronger causal inference. Our results are in line with previous findings suggesting that tobacco is a gateway drug to other more problematic substance use (5,6,8,10). There was also some suggestion that alcohol consumption may also be causally associated with cannabis use; however, when examining this using MVMR there was no evidence for independent effects of alcohol consumption, only evidence for a causal effect of smoking initiation on cannabis use.
We also found evidence for a causal pathway from cannabis use to smoking initiation. It has been previously suggested that cannabis use may act as a gateway to the initiation of tobacco use, which could be due to the form in which cannabis is used, for example if it is smoked with tobacco (35); our results support this suggestion. However, the fact that we find causal pathways between cannabis use and smoking initiation in both directions may point to this association being due to an underlying common risk factor, as opposed to either tobacco or cannabis being a gateway drug. Previous studies have suggested that impulsive or risk-taking behaviours may be associated with smoking initiation and substance use (36-38). In addition, the cannabis use measure we used may capture underlying risk-taking behaviours more than the dependence measures, particularly for opioid and cocaine dependence, and this may be why we see a more consistent association with this measure. However, further research is needed to establish whether there could be an underlying common cause, and if this might be related to risk-taking behaviours. Other potential shared risk factors should also be considered, and these may be genetic or environmental in origin and may also vary between the substance use and dependence phenotypes. In addition, it may be the case that smoking initiation, for example, only acts as a gateway to other substances in the presence of mediators such as stressful life events or adverse circumstances. Therefore, the mechanisms behind these associations need to be examined further.
Our finding of a causal effect of smoking initiation on increased drinks per week would suggest that ever smoking causally affects the amount of alcohol consumed. We do not find an association in the reverse direction, but it is plausible to suggest this may indicate an underlying risk-taking behaviour which affects alcohol consumption via smoking. However, it may also be the case that there is a biological mechanism behind this association, which should be studied further. Finally, we do see evidence of a causal effect of opioid dependence on increased drinks per week; however, due to the low power for the opioid dependence GWAS and the small effect size we would interpret this with caution. Opioid dependence (compared with ever use) is less likely to be explained by underlying risk-taking behaviour. Therefore, further research into alternative shared risk factors is warranted.

Limitations
Our study is the first, to our knowledge, to examine whether causal pathways may exist between smoking initiation/alcohol consumption and various substance use phenotypes, using an MR approach. However, there are several limitations which should be noted. For example, some of our analyses may be limited in their power to detect a causal effect. This is particularly the case where smoking initiation was the outcome for the substance use dependence measures, where the discovery samples used for the GWAS were much smaller than those for drinks per week, cannabis use and smoking initiation. In addition, due to the lower number of SNPs used for the dependence exposures, there may also be issues with the instruments used being weak, which may be particularly problematic for MR-Egger. Therefore, our findings relating to these should be interpreted with caution and it would be useful to revisit this in the future once larger GWAS become available. In addition, we did find some evidence of heterogeneity and horizontal pleiotropy for different analyses, meaning that these results should be interpreted in light of this, as some of the SNPs used may be associated with the outcome other than via the exposure. However, the additional MR analyses, which account for this, were generally in the same direction as our main results, although we were unable to formally test for directional pleiotropy in some cases where the I-squared estimate was too low.
Another consideration is that we found that the SNPs used in the cannabis use instrument are in high LD with those that were genome-wide significant in the smoking initiation GWAS.
Specifically, all of the SNPs used in our instrument for cannabis use were in high LD (r² > 0.1, 500kb window) with the published genome-wide significant SNPs for smoking initiation. This overlap does not help us to disentangle the two possible explanations here (i.e., that smoking initiation is a gateway to cannabis use or that there is a common underlying risk factor). In addition, the MR instruments used may not be valid for smoking as they may be picking up risk-taking behaviours more than smoking itself (39). Therefore, it would be useful to examine this further with other smoking related phenotypes such as smoking heaviness in stratified samples.
Finally, the MR analysis itself is subject to several limitations (17). For example, the GWAS used for MR may suffer from 'Winner's curse', where the SNP-exposure estimates are overestimated because SNPs are selected for having the smallest p-values; as a result, the MR estimate will be biased towards the null. Thus, the actual effect sizes reported may not be accurate, and interpreting the direction of effect, as opposed to the effect size itself, is more valid in MR analyses. The MR effect estimate may also be biased due to trait heterogeneity: different aspects of substance use behaviours may be associated with the same genetic variants, so it is difficult to get a precise estimate for a single aspect of any substance use behaviour.

Conclusion
Whilst our findings support the gateway hypothesis to some extent, they also point to a potential underlying common risk factor. With better powered GWAS, or those with more precise instruments, and additional research we may be able to interrogate this further.
Triangulating our results with other approaches would be useful to help answer this question (40,41). By doing so we may be able to identify risk factors for substance use more generally, which could ultimately help with intervention design.