ACTR1A has pleiotropic effects on risk of leprosy, inflammatory bowel disease and atopy

Leprosy is a chronic infection of the skin and peripheral nerves caused by Mycobacterium leprae. Despite recent improvements in disease control, leprosy remains an important cause of infectious disability globally. Large-scale genetic association studies in Chinese, Vietnamese and Indian populations have identified over 30 susceptibility loci for leprosy. There is a significant burden of leprosy in Africa; however, it is uncertain whether the findings of published genetic association studies are generalizable to African populations. To address this, we conducted a genome-wide association study (GWAS) of leprosy in Malawian (327 cases, 436 controls) and Malian (247 cases, 368 controls) individuals. In that analysis, we replicated five risk loci previously reported in China, Vietnam and India: MHC Class I and II, LACC1 (2 independent loci) and SLC29A3. We further identified a novel leprosy susceptibility locus at 10q24 (rs2015583: combined p = 8.81 × 10⁻⁹; OR = 0.51 [95% CI 0.40-0.64]). The leprosy risk locus is a determinant of ACTR1A RNA expression in CD4+ T cells (posterior probability of colocalization, PP = 0.96). Furthermore, it demonstrates pleiotropy with established risk loci for inflammatory bowel disease and atopic disease. Reduced ACTR1A expression decreases susceptibility to leprosy and atopy but increases risk of inflammatory bowel disease. A shared genetic architecture for leprosy and inflammatory bowel disease has been previously described. We expand on this, strengthening the evidence that selection pressure driven by leprosy has shaped the evolution of autoimmune and atopic disease in modern populations. More broadly, our data highlight the importance of defining the genetic architecture of disease across genetically diverse populations, and show that disease insights derived from GWAS in one population may not translate to all affected populations.


Introduction
Figure 1: Manhattan plot of leprosy in Malawi and Mali. Evidence for association with leprosy at genotyped and imputed autosomal SNPs and indels (n = 9,616,523) in Malawi and Mali (492 cases, 639 controls). Association statistics represent a fixed-effects meta-analysis of additive association with disease in Malawi and Mali. The red, dashed line denotes genome-wide significance (p = 5 × 10⁻⁸).

GWAS replication and meta-analysis
We sought to replicate evidence for leprosy association observed in Malawi among leprosy cases and healthy controls in Mali. Individuals with leprosy were recruited to the study at Mali's former national [...] in Malawi and Mali using a fixed-effects meta-analysis (Fig. 1). Of the 142 leprosy-associated loci identified in the discovery analysis, 18 SNPs, at a single genomic locus at 10q24.32 (Fig. 2A), showed evidence of replication in Mali (p < 0.05) and overall evidence of association with leprosy exceeding genome-wide significance (p < 5 × 10⁻⁸). The variant with the strongest evidence for leprosy association at that locus is rs2015583: p = 8.81 × 10⁻⁹, OR = 0.51 (95% CI 0.40-0.64). There is no evidence for heterogeneity of effect between populations at rs2015583 (heterogeneity p = 0.871, Fig. 2B), and the data best support a model in which rs2015583 modifies risk of both paucibacillary and multibacillary leprosy (log10 Bayes factor = 6.01, Fig. 2B).

Figure 2: (A) Regional association plot of leprosy association at chr10q24.32. Association statistics represent a fixed-effects meta-analysis of additive association with disease in Malawi and Mali. SNPs are coloured according to linkage disequilibrium to rs2015583, and genotyped SNPs marked with black plusses. (B) Log-transformed odds ratios and 95% confidence intervals of rs2015583 association with leprosy in Malawi and Mali (top) and stratified by multibacillary and paucibacillary disease (middle). Posterior probabilities of models of rs2015583 association with leprosy: "Null", no association with leprosy; "MB", non-zero effect in multibacillary leprosy alone; "PB", non-zero effect in paucibacillary leprosy alone; "Both", the same non-zero effect is shared by individuals with multibacillary and paucibacillary leprosy. (C) Log-transformed odds ratios and 95% confidence intervals of rs2015583 association with gene expression in primary immune cells (top). Associations which colocalize with the leprosy association signal (ACTR1A expression in CD4+ T cells) are highlighted in pink. The ACTR1A eQTL in CD4+ T cells colocalizes with the risk locus for leprosy at chr10q24.32 (bottom). SNPs are coloured according to linkage disequilibrium to rs2015583 as above. (D) Log-transformed odds ratios and 95% confidence intervals of rs2015583 association (top) with immune-mediated diseases (IBD, inflammatory bowel disease; atopy; hayfever; childhood-onset asthma) and hematological indices (WCC, white cell count; Neut, neutrophil count; MCV, mean corpuscular volume; MCH, mean corpuscular hemoglobin). The ACTR1A eQTL in CD4+ T cells colocalizes with the GWAS locus for each trait at chr10q24.32 (bottom). SNPs are coloured according to linkage disequilibrium to rs2015583 as above.
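medRxiv preprint doi: https://doi.org/10.1101/2022.01.31.22270046; this version posted January 31, 2022. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.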
[...] evidence for leprosy association in the HLA in Malawi and Mali at the level of SNPs and classical HLA alleles. In a fixed-effects meta-analysis of leprosy association in Malawi and Mali (Fig. 3A, Supplementary Table 5), the peak classical allele association is a class II allele: HLA-DQB1*04:02 (p = 6.74 × 10⁻⁵, FDR = 0.0063, OR = 2.1, 95% CI 1.75-2.51). We also observed a leprosy association in the class I region, at HLA-B*49:01 (p = 6.02 × 10⁻⁴, FDR = 0.0156), which is independent of HLA-DQB1*04:02 (Supplementary Fig. 3). No significant residual associations were observed after conditioning on both HLA-DQB1*04:02 and HLA-B*49:01 (Supplementary Fig. 4). There is no evidence for heterogeneity of effect [...]

Our identification of a genetic variant modifying leprosy risk in African populations, but not in Chinese populations, highlights the inter-population heterogeneity that has been observed across large-scale genetic studies of leprosy susceptibility (Gzara et al., 2020; Wang et al., 2016; Wong et al., 2010). Our study has adequate power (> 80%) to replicate (p < 0.05) findings at 7 of 34 previously-published leprosy risk loci outside the HLA (Supplementary Table 6). We were able to replicate leprosy associations at 2 loci (Fig. 4A): a missense SNP in LACC1, rs3764147 (p = 0.004, OR = 1.36, 95% CI 1.10-1.67), and a missense SNP in SLC29A3, rs780668 (p = 0.034, OR = 1.28, 95% CI 1.02-1.60). There is no evidence for heterogeneity of effect between populations at rs3764147 or rs780668 (heterogeneity p = 0.444 and p = 0.159, Fig. 4A), and the data best support a model in which both rs3764147 and rs780668 modify risk of both paucibacillary and multibacillary leprosy (log10 Bayes factors = 1.64 and 0.68 respectively, Fig. 4B,C). Among the 6 loci at which we were not able to demonstrate replication of previously-published leprosy association despite adequate study power, there is no evidence that our lack of replication reflects effects restricted to multibacillary or paucibacillary disease (Supplementary Table 7). We further considered whether our failure to replicate previously-reported leprosy associations could represent differential linkage disequilibrium to an undefined causal locus between study populations. To test this, we examined evidence for leprosy association within 250kb of each previously-reported leprosy risk locus outside the HLA. In that analysis we identified a promoter variant in RAB32, rs34271799, [...]

Figure 3: MHC leprosy association in Malawi and Mali. (A) Regional association plot of leprosy association across the HLA region. Association statistics represent a fixed-effects meta-analysis of additive association with disease in Malawi and Mali. SNPs are coloured according to linkage disequilibrium to rs9270926, and genotyped SNPs marked with black plusses. Imputed classical HLA alleles are plotted as diamonds, with significantly associated (FDR < 0.05) alleles highlighted in blue. (B) Log-transformed odds ratios and 95% confidence intervals of HLA-DQB1*04:02 and HLA-B*49:01 associations with leprosy in Malawi and Mali. (C) Log-transformed odds ratios and 95% confidence intervals of HLA-DQB1*04:02 and HLA-B*49:01 associations with leprosy stratified by multibacillary and paucibacillary disease. (D) Posterior probabilities of models of HLA-DQB1*04:02 and HLA-B*49:01 associations with leprosy: "Null", no association with leprosy; "MB", non-zero effect in multibacillary leprosy alone; "PB", non-zero effect in paucibacillary leprosy alone; "Both", the same non-zero effect is shared by individuals with multibacillary and paucibacillary leprosy.
Figure 4: Replication of leprosy associations at LACC1 and SLC29A3 in Malawi and Mali. (A) Log-transformed odds ratios and 95% confidence intervals of rs3764147 and rs780668 associations with leprosy in Malawi and Mali. (B) Log-transformed odds ratios and 95% confidence intervals of rs3764147 and rs780668 associations with leprosy stratified by multibacillary and paucibacillary disease. (C) Posterior probabilities of models of rs3764147 and rs780668 associations with leprosy: "Null", no association with leprosy; "MB", non-zero effect in multibacillary leprosy alone; "PB", non-zero effect in paucibacillary leprosy alone; "Both", the same non-zero effect is shared by individuals with multibacillary and paucibacillary leprosy.

[...] risk in African populations. In common with many examples of trait-associated genetic variation identified by GWAS (Maurano et al., 2012), variation at 10q24.32 modifies risk of leprosy through regulatory effects on gene expression, specifically ACTR1A expression in CD4+ T cells. We expand upon this, identifying evidence of pleiotropy at 10q24.32, demonstrating a shared genetic architecture of leprosy, inflammatory bowel disease and atopy at this locus. Furthermore, we replicate previously identified leprosy susceptibility loci at LACC1, SLC29A3, and with HLA Class I and II alleles. [...]

[...] observe no evidence of leprosy association in Malawi or Mali at 6 of the 7 non-HLA loci at which we have adequate study power to assess this. We hypothesized that some of these inter-population differences may reflect differential effects of genetic risk loci on multibacillary and paucibacillary disease; however, we find no evidence to support this in our data. Differential linkage disequilibrium between assayed variation and a shared causal locus may explain some of the observed inter-population genetic heterogeneity of leprosy risk. In keeping with this, we observe modest evidence of leprosy association at RAB32, which is distinct from that reported in Chinese populations. Understanding whether genetic variation at RAB32 is associated with leprosy risk in African populations, and whether this is distinct from that observed in Chinese populations, will require replication in additional study populations.

Here we define regulatory variation at ACTR1A as a novel determinant of leprosy susceptibility in African populations. Moreover, regulatory variation at ACTR1A has pleiotropic effects on hematological indices in European populations and risk of IBD and atopy. A shared genetic architecture for leprosy and IBD has been previously described. We expand on this, strengthening the evidence that selection pressure driven by leprosy has shaped the evolution of immune-mediated disease in modern populations.

Our colocalization analyses identify ACTR1A as a potential therapeutic target for autoimmune and atopic disease, and deepen our understanding of leprosy biology, which will be key in informing the development of novel control strategies. More broadly, our data highlight the importance of defining the genetic architecture of disease across genetically diverse populations, and show that disease insights derived from GWAS in one population may not readily translate to all affected populations.

Ethics and consent
All cases and controls were recruited following informed consent of the participant or their parent/guardian.

The study protocol detailing recruitment and sample collection within KPS, Malawi was approved by the [...]

[...] genotypes such that all alleles are on the forward strand. Throughout, genetic positions reflect GRCh37.

Sample quality control
We calculated per-sample quality control (QC) metrics in PLINK (Purcell et al., 2007). For each sample we calculated the proportion of missing genotype calls, heterozygosity and the mean X and Y channel intensities. We plotted mean X and Y channel intensities (Supplementary Fig. 6) and missingness against heterozygosity (Supplementary Fig. 7), defining outlier samples using ABERRANT (Bellenguez et al., [...] Fig. 10).

SNP quality control
Prior to genome-wide imputation, we extracted genotypes from non-duplicated, autosomal SNPs and applied the following SNP QC exclusion filters: SNP missingness > 10%, minor allele frequency (MAF) < 1%, Hardy-Weinberg equilibrium (HWE) p < 1 × 10⁻²⁰ and plate effect p < 1 × 10⁻⁶. HWE was calculated among control samples for each cohort. Plate effect represents an association test of nondifferential missingness with the plate on which each sample was genotyped. [...]

[...] alleles (42 class I and 51 class II) in downstream analysis.
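The SNP QC thresholds listed in this section (missingness, MAF, HWE in controls, plate effect) can be illustrated with a minimal Python sketch. This is not the pipeline used in the study: the `SnpCounts` container and function names are hypothetical, the plate-effect p-value is assumed to come from a separate test, and a 1-d.f. chi-square test is assumed for HWE because the text does not state which HWE test was applied.

```python
import math
from dataclasses import dataclass


@dataclass
class SnpCounts:
    """Genotype counts for one SNP (hypothetical container, not from the paper)."""
    n_aa: int        # homozygous for allele A, all samples
    n_ab: int        # heterozygous, all samples
    n_bb: int        # homozygous for allele B, all samples
    n_missing: int   # samples with no genotype call
    ctrl_aa: int     # the same three genotype counts, controls only
    ctrl_ab: int
    ctrl_bb: int
    plate_p: float   # p-value from a separately computed plate-effect test


def hwe_chisq_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """1-d.f. chi-square test of Hardy-Weinberg equilibrium (assumed test)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic: no departure from HWE can be measured
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 d.f.
    return math.erfc(math.sqrt(stat / 2.0))


def passes_snp_qc(snp: SnpCounts) -> bool:
    """Return True if the SNP survives all four exclusion filters."""
    called = snp.n_aa + snp.n_ab + snp.n_bb
    total = called + snp.n_missing
    if called == 0:
        return False
    missingness = snp.n_missing / total
    freq_a = (2 * snp.n_aa + snp.n_ab) / (2 * called)
    maf = min(freq_a, 1.0 - freq_a)
    # HWE is evaluated in controls only, as described in the text.
    hwe_p = hwe_chisq_p(snp.ctrl_aa, snp.ctrl_ab, snp.ctrl_bb)
    return (
        missingness <= 0.10      # exclude missingness > 10%
        and maf >= 0.01          # exclude MAF < 1%
        and hwe_p >= 1e-20       # exclude HWE p < 1e-20
        and snp.plate_p >= 1e-6  # exclude plate effect p < 1e-6
    )
```

In practice these metrics would be computed per cohort by dedicated tools; the sketch only makes the direction of each threshold explicit.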

Additional cross-platform quality control
We noted that relatively few Mali control samples (n=142) were available for inclusion in the association analysis. To address this, we used genotypes from additional control samples (n=183) [...] Epidemiology, 2019) is highly analogous to the QC we applied to our study samples. MalariaGEN SNP QC excluded poorly genotyped SNPs using the following metrics: SNP missingness (thresholds 2.5-10% dependent on study population), MAF < 1%, HWE p < 1 × 10⁻²⁰, plate effect p < 1 × 10⁻³ and a recall test quantifying changes in genotype following a re-clustering process p < 1 × 10⁻⁶. MalariaGEN [...]

[...] variant passing QC we tested for association with leprosy case-control status in a logistic regression model in SNPTEST (Marchini et al., 2007) in each cohort. At loci of interest, we used multinomial logistic regression, implemented in SNPTEST, to estimate the effect of the genetic variation on leprosy risk stratified by multibacillary and paucibacillary disease. We used control status as the baseline stratum and cases of multibacillary and paucibacillary leprosy as strata. To account for confounding variation, in particular population structure, we included the six major principal components of genotyping data in all models. In addition, in Mali, we included genotyping platform as an additional categorical covariate.

At variants passing QC thresholds in both cohorts, we then performed genome-wide meta-analysis under a frequentist fixed-effects model using BINGWA (Band et al., 2015). For association analysis using HLA allele imputations we coded posterior probabilities of each HLA allele to represent carriage of 0, 1 or 2 copies of that allele. Association analysis and meta-analysis were performed in SNPTEST and BINGWA as above. For HLA association analysis we corrected for the number of classical alleles tested (n=93) and considered FDR < 0.05 to be significant.
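As an illustration of the frequentist fixed-effects meta-analysis described above, the following Python sketch combines per-cohort log odds ratios by inverse-variance weighting and reports Cochran's Q for heterogeneity. It is a sketch of the standard estimator, assumed here to correspond to the fixed-effects model; it is not BINGWA's implementation, and the function name and the input values in the usage example are hypothetical.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted meta-analysis of per-cohort log odds ratios.

    betas: per-cohort effect estimates (log odds ratios)
    ses:   per-cohort standard errors
    """
    if len(betas) != len(ses) or len(betas) < 2:
        raise ValueError("need matching estimates and standard errors for >= 2 cohorts")
    weights = [1.0 / (se * se) for se in ses]
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    z = beta / se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    # Cochran's Q; with two cohorts it is chi-square distributed with 1 d.f.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    het_p = math.erfc(math.sqrt(q / 2.0)) if len(betas) == 2 else None
    return {
        "beta": beta,
        "se": se,
        "or": math.exp(beta),
        "ci_low": math.exp(beta - 1.96 * se),
        "ci_high": math.exp(beta + 1.96 * se),
        "p": p,
        "q": q,
        "het_p": het_p,
    }


# Hypothetical per-cohort estimates for two cohorts (not values from the study).
result = fixed_effects_meta(betas=[-0.70, -0.65], ses=[0.16, 0.18])
print(result["or"], result["p"], result["het_p"])
```

The heterogeneity p-value is only returned for the two-cohort case, which matches the Malawi and Mali design; more cohorts would need a general chi-square survival function.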

Bayesian comparison of models of association
We compared models of association at loci of interest with multibacillary and paucibacillary leprosy, as estimated by multinomial logistic regression, using a Bayesian approach. We considered four models of effect, defined by the prior distributions on the effect size:
"Null": effect size = 0, i.e. no association with leprosy.
"MB": non-zero effect in multibacillary leprosy alone.
"PB": non-zero effect in paucibacillary leprosy alone.
"Both": the same non-zero effect is shared by individuals with multibacillary and paucibacillary leprosy.

For each model we calculated approximate Bayes factors (Wakefield, 2009) and posterior probabilities, assuming each model to be equally likely a priori. Statistical analysis was performed in R.
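The model comparison described above can be sketched in Python using normal approximations to the likelihood, in the spirit of Wakefield's approximate Bayes factors. Several choices here are assumptions rather than details taken from the study: the prior standard deviation of 0.2 on the log odds ratio, the treatment of the multibacillary and paucibacillary estimates as independent (in reality they share controls and are correlated), and all function and parameter names.

```python
import math


def log_norm_pdf(x: float, var: float) -> float:
    """Log density of a zero-mean normal with variance `var`, evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + x * x / var)


def model_posteriors(b_mb, se_mb, b_pb, se_pb, prior_sd=0.2):
    """Compare Null / MB / PB / Both models for one variant.

    b_mb, b_pb:   log odds ratios for multibacillary and paucibacillary leprosy
    se_mb, se_pb: their standard errors
    prior_sd:     assumed prior s.d. of a non-zero effect (hypothetical default)
    Returns (log10 Bayes factors vs Null, posterior probabilities).
    """
    v1, v2 = se_mb ** 2, se_pb ** 2
    w = prior_sd ** 2
    # Marginal likelihood of the two estimates under each model.
    log_ml = {
        "Null": log_norm_pdf(b_mb, v1) + log_norm_pdf(b_pb, v2),
        "MB": log_norm_pdf(b_mb, v1 + w) + log_norm_pdf(b_pb, v2),
        "PB": log_norm_pdf(b_mb, v1) + log_norm_pdf(b_pb, v2 + w),
    }
    # "Both": a single shared effect induces covariance w between the estimates,
    # giving a bivariate normal with covariance [[v1 + w, w], [w, v2 + w]].
    det = (v1 + w) * (v2 + w) - w * w
    quad = (b_mb ** 2 * (v2 + w) - 2.0 * b_mb * b_pb * w + b_pb ** 2 * (v1 + w)) / det
    log_ml["Both"] = -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad

    log10_bf = {m: (v - log_ml["Null"]) / math.log(10.0) for m, v in log_ml.items()}
    # Equal prior model probabilities: posteriors are normalised marginal likelihoods.
    top = max(log_ml.values())
    unnorm = {m: math.exp(v - top) for m, v in log_ml.items()}
    total = sum(unnorm.values())
    posterior = {m: u / total for m, u in unnorm.items()}
    return log10_bf, posterior


# Hypothetical estimates (not values from the study).
bf, post = model_posteriors(b_mb=-0.7, se_mb=0.15, b_pb=-0.7, se_pb=0.15)
print(max(post, key=post.get))
```

Subtracting the largest log marginal likelihood before exponentiating keeps the normalisation numerically stable when one model dominates.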

To assess evidence for pleiotropy with other disease traits we again used coloc to test for the presence of [...]

[...] association and population-based linkage analyses. Am J Hum Genet, 81 (3), 559-575. https:

Figure S5: Evidence for leprosy association at the RAB32 locus. (A) Log-transformed odds ratios and 95% confidence intervals of rs2275606 association (peak association in Chinese GWAS data) with leprosy in Malawi, Mali and China. (B) Log-transformed odds ratios and 95% confidence intervals of rs34271799 association with leprosy in Malawi and Mali. (C) Regional association plot of leprosy association at the RAB32 locus. Association statistics represent a fixed-effects meta-analysis of additive association with disease in Malawi and Mali. SNPs are coloured according to linkage disequilibrium to rs34271799, and genotyped SNPs marked with black plusses.

Figure S6: Sample X and Y channel intensities. (A) Mean X and Y channel intensities for Malawi (top) and Mali (bottom) samples. Outlying samples were identified using ABERRANT and are highlighted (orange).

Figure S7: Sample missingness and heterozygosity. (A) Mean sample genotype missingness plotted against heterozygosity for Malawi (top) and Mali (bottom) samples. Outlying samples were identified using ABERRANT and are highlighted (orange).

Figure S8: Population outliers. Plot of the major two principal components of genome-wide genotyping data. Malawi study samples are plotted in orange and Mali study samples in green, against a background of African Genome Variation Project samples (gray). Outliers are highlighted (black rings).

Figure S9: Principal components of Malawian genome-wide genotyping data. Individuals are color-coded according to self-reported ethnicity (top) and case-control status; cases in pink, controls in green (bottom).

Figure S10: Principal components of Malian genome-wide genotyping data. Individuals are color-coded according to self-reported ethnicity (top), case-control status (middle; cases in pink, controls in green), and genotyping platform (bottom; Omni 2.5M in purple, Africa Diaspora Power Chip in gray).
