Genome-wide association study identifies five risk loci for pernicious anemia and implicates the role of HLA-DR15 haplotype

Pernicious anemia is a rare condition characterized by vitamin B12 deficiency anemia due to lack of intrinsic factor, often caused by autoimmune gastritis. Patients with pernicious anemia have a higher incidence of other autoimmune disorders, such as type 1 diabetes, vitiligo and autoimmune thyroid issues. Therefore, the disease has a clear autoimmune basis, although the genetic susceptibility factors have thus far remained poorly studied. We conducted a genome-wide association study meta-analysis in 2,166 cases and 659,516 European controls from population-based biobanks and identified genome-wide significant signals in or near the PTPN22 (rs6679677, p = 1.91 × 10⁻²⁴, OR = 1.63), PNPT1 (rs12616502, p = 3.14 × 10⁻⁸, OR = 1.70), HLA-DQB1 (rs28414666, p = 1.40 × 10⁻¹⁶, OR = 1.38), IL2RA (rs2476491, p = 1.90 × 10⁻⁸, OR = 1.22) and AIRE (rs74203920, p = 2.33 × 10⁻⁹, OR = 1.83) genes, thus providing the first robust associations between pernicious anemia and genetic risk factors. We further mapped the susceptibility in the HLA region to the HLA-DR15 haplotype. Analysis of associated diagnoses and disease trajectories confirms the association between pernicious anemia and thyroid issues, vitiligo, gastritis, stomach cancer, osteoporosis and other diagnoses.

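medRxiv preprint, this version posted October 14, 2020; doi: https://doi.org/10.1101/2020.10.13.20211912. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Figure 2).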
On chromosome 2, the lead signal rs12616502 is in high LD with exonic non-synonymous variants in CCDC104 (r² = 0.94, rs1045920) and PNPT1 (r² = 0.94, rs7594497), and this region has previously been associated with vitiligo, hypothyroidism and myelodysplastic syndrome (Supplementary Tables 1 and 3).
On chromosome 10, the sentinel variant rs2476491 is intronic to IL2RA. IL2RA encodes the interleukin-2 receptor alpha chain and is thus involved in the regulation of regulatory T cells and immune tolerance, since regulatory T cells suppress autoreactive T cells. Accordingly, this locus has previously been associated with multiple sclerosis, juvenile idiopathic arthritis, vitiligo and hypothyroidism (Supplementary Table 1).
Finally, the association on chromosome 21, rs74203920, is a missense variant in the AIRE gene, a known autoimmune regulator. Interestingly, mutations in AIRE are a known cause of autoimmune polyendocrinopathy syndrome type 1 (APS-1), a rare autosomal recessive syndrome that sometimes includes pernicious anemia among its components [16].
We used the individual-level data in EstBB to evaluate the association between pernicious anemia and other diseases (defined by ICD-10 codes). According to our analysis, individuals with pernicious anemia have more diagnoses of other anemias and vitamin deficiencies (vitamin D deficiency, E61 deficiency of other nutrient elements). The majority of these diagnoses reflect the etiology of pernicious anemia (gastritis), its symptoms (skin problems, syncope and collapse, depression, stomatitis), known comorbidities (vitiligo, thyroid issues [4,5]), or diseases for which pernicious anemia is a known risk factor (such as osteoporosis [17] and stomach cancer [18]).
The association with spontaneous abortion is interesting: although there is some evidence that B12 deficiency and pernicious anemia could cause recurrent miscarriage [19][20][21], the data are scarce and the link with spontaneous miscarriage has not been explored in depth.
In summary, our analysis of 2,166 cases and 659,516 controls identifies robust risk loci for pernicious anemia in or near candidate genes with a known role in autoimmune conditions (PTPN22, HLA, IL2RA, AIRE). We further narrow the signal in the HLA region to the HLA-DR15 haplotype and propose PNPT1 as the most likely causal gene in the 2p16.1 locus. The associations between the identified loci and other autoimmune conditions, such as type 1 diabetes, vitiligo, and autoimmune thyroid conditions, help to clarify the link between pernicious anemia and its common comorbidities. Analysis of associated diagnoses and disease trajectories confirms the association between pernicious anemia and thyroid issues, vitiligo, gastritis, stomach cancer, osteoporosis and other diagnoses, but also between pernicious anemia and spontaneous abortion.

Estonian Biobank
The Estonian Biobank is a population-based biobank with over 200,000 participants. The 150K data freeze was used for the analyses described in this paper. All biobank participants have signed a broad informed consent form. Individuals with pernicious anemia were identified using the ICD-10 code D51.0, and all biobank participants who did not have this diagnosis were considered controls. Information on ICD codes is obtained via regular linking with the national Health Insurance Fund and other relevant databases [8].
Samples were genotyped and PLINK format files were created using Illumina GenomeStudio v2.0.4. Individuals were excluded from the analysis if their call rate was < 95% or if the sex inferred from X-chromosome heterozygosity did not match the sex in the phenotype data. Before imputation, variants were excluded if they had a call rate < 95%, an HWE p-value < 1e-4 (autosomal variants only), or a minor allele frequency < 1%. Variant positions were updated to b37 and all variants were changed to be from the TOP strand using the GSAMD-24v1-0_20011747_A1-b37.strand.RefAlt.zip files from the https://www.well.ox.ac.uk/~wrayner/strand/ webpage. Prephasing was done using Eagle v2.3 [22] (the number of conditioning haplotypes Eagle2 uses when phasing each sample was set with --Kpbwt=20000) and imputation was done using Beagle v.28Sep18.793 [23] with effective population size ne=20,000. A population-specific imputation reference panel of 2,297 WGS samples was used.
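The pre-imputation variant filters described above can be illustrated with a minimal Python sketch. This is not the pipeline used in the study (which relied on GenomeStudio and PLINK-format files); the `VariantQC` record, its field names and the example variants are hypothetical, and only the three thresholds are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class VariantQC:
    """Hypothetical per-variant QC summary (field names are assumptions)."""
    variant_id: str
    call_rate: float   # fraction of samples with a genotype call
    hwe_p: float       # Hardy-Weinberg equilibrium p-value
    maf: float         # minor allele frequency
    autosomal: bool    # the HWE filter is applied to autosomal variants only


def passes_qc(v, min_call_rate=0.95, min_hwe_p=1e-4, min_maf=0.01):
    """Return True if the variant survives the three filters from the text."""
    if v.call_rate < min_call_rate:
        return False
    if v.autosomal and v.hwe_p < min_hwe_p:
        return False
    if v.maf < min_maf:
        return False
    return True


# Made-up example variants, one per filter outcome.
variants = [
    VariantQC("var_ok", 0.99, 0.5, 0.20, True),
    VariantQC("var_low_call", 0.90, 0.5, 0.20, True),
    VariantQC("var_hwe", 0.99, 1e-6, 0.20, True),
    VariantQC("var_x_hwe", 0.99, 1e-6, 0.20, False),  # non-autosomal: HWE not applied
    VariantQC("var_rare", 0.99, 0.5, 0.005, True),
]
kept = [v.variant_id for v in variants if passes_qc(v)]
```

In this toy input only `var_ok` and `var_x_hwe` are retained, which shows that the HWE filter is skipped for non-autosomal variants.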
Association analysis was carried out using SAIGE (v0.38) [10], which implements a mixed logistic regression model with the leave-one-chromosome-out (LOCO) option, using sex, year of birth and 10 principal components (PCs) as covariates in step I.

UK Biobank
The

FinnGen
FinnGen is a public-private partnership project combining data from Finnish biobanks and electronic health records from different registries. After a one-year embargo, the FinnGen summary statistics are available for download. In this study, we used the results from the FinnGen release R3, which includes data from 135,638 individuals and more than 1,800 disease endpoints.
FinnGen individuals have been genotyped with Illumina and Affymetrix arrays and imputed to the population-specific SISu v3 imputation reference panel. Genetic association testing has been carried out with SAIGE [10]. The FinnGen disease endpoint "Vitamin B12 deficiency anemia" included all individuals with the ICD-10 D51 diagnosis as cases. For more information on genotype data, disease endpoints and GWAS analyses, please see https://finngen.gitbook.io/documentation/.

GWAS meta-analysis
We extracted all genetic variants with an rs-number from the summary statistics of the three participating cohorts and conducted an inverse-variance-weighted fixed-effects meta-analysis without genomic control using GWAMA [11]. A total of 30,907,385 variants were included in the meta-analysis. Genome-wide significance was set to p < 5 × 10⁻⁸.
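For readers unfamiliar with the method, the following minimal Python sketch shows the standard inverse-variance-weighted fixed-effects estimator for a single variant. It is not GWAMA's implementation; the function name and the example effect sizes are assumptions made for illustration, and the inputs are per-cohort log odds ratios with their standard errors.

```python
import math


def ivw_fixed_effects(betas, ses):
    """Combine per-cohort effects (log odds ratios) with inverse-variance weights.

    Returns the pooled effect, its standard error, the z-score and the
    two-sided p-value under a normal approximation.
    """
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return beta, se, z, p


# Made-up effects for one variant in three cohorts (not values from the study).
beta, se, z, p = ivw_fixed_effects([0.5, 0.4, 0.6], [0.1, 0.2, 0.1])
odds_ratio = math.exp(beta)
genome_wide_significant = p < 5e-8
```

Cohorts with smaller standard errors (typically those with more cases) receive larger weights, so the pooled estimate is driven mainly by the most informative cohorts.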

HLA allele imputation in the EstBB
Imputation of HLA alleles from SNP data was carried out at the Broad Institute using the SNP2HLA tool [24].
In the analysis, we compared our significant GWAS loci to all eQTL Catalogue datasets. We lifted the GWAS summary statistics over to the hg38 build to match the eQTL Catalogue.
For each genome-wide significant (p < 5 × 10⁻⁸) GWAS locus, we extracted the 1 Mbp radius of its top hit from QTL datasets and ran the colocalization analysis for those eQTL Catalogue traits that had at least one cis-QTL within this region with p < 1 × 10⁻⁶. We considered two signals to colocalize if the posterior probability for a shared causal variant (PP4) was 0.8 or higher. All results with a PP4 > 0.8 are shown in Supplementary Table 3.
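The selection logic described above (a 1 Mbp window around the top hit, a cis-QTL p-value filter, and a PP4 cut-off) can be sketched as follows. This is an illustrative sketch only: it does not compute posterior probabilities, and the record layout, function names, trait labels and positions are all hypothetical.

```python
def region_around(top_hit_pos, radius_bp=1_000_000):
    """Return the (start, end) window of +/- radius_bp around the top hit."""
    return max(0, top_hit_pos - radius_bp), top_hit_pos + radius_bp


def traits_to_test(qtl_records, chrom, top_hit_pos, p_threshold=1e-6):
    """Select traits with at least one cis-QTL below p_threshold in the window.

    qtl_records is a list of dicts with hypothetical keys:
    'trait', 'chrom', 'pos' and 'p'.
    """
    start, end = region_around(top_hit_pos)
    selected = set()
    for rec in qtl_records:
        in_window = rec["chrom"] == chrom and start <= rec["pos"] <= end
        if in_window and rec["p"] < p_threshold:
            selected.add(rec["trait"])
    return sorted(selected)


def colocalizes(pp4, threshold=0.8):
    """Signals are taken to colocalize if PP4 is 0.8 or higher."""
    return pp4 >= threshold


# Made-up QTL records; trait names and coordinates are placeholders.
qtl_records = [
    {"trait": "GENE_A", "chrom": "2", "pos": 55_500_000, "p": 3e-9},
    {"trait": "GENE_B", "chrom": "2", "pos": 55_900_000, "p": 1e-3},    # too weak
    {"trait": "GENE_C", "chrom": "2", "pos": 58_000_000, "p": 1e-12},   # outside window
    {"trait": "GENE_D", "chrom": "10", "pos": 55_500_000, "p": 1e-12},  # other chromosome
]
selected = traits_to_test(qtl_records, chrom="2", top_hit_pos=55_600_000)
```

Only the traits returned by `traits_to_test` would be passed on to a colocalization method, whose PP4 output is then compared with the 0.8 threshold.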

Mouse phenotypes
We used the Mouse Genome Database [12] (http://www.informatics.jax.org) to evaluate the phenotypic effects of PNPT1 in mouse models.

Analysis of associated phenotypes in EstBB
Using the individual-level data in the EstBB, we conducted an analysis to find ICD-10 diagnosis codes associated with the D51.0 diagnosis. We tested the association between pernicious anemia status (defined as ICD-10 D51.0) and other ICD-10 codes using logistic regression, adjusting for sex, age and 10 PCs. Bonferroni correction was applied to select statistically significant associations (number of tested ICD main codes: 1,944; corrected p-value threshold: 2.5 × 10 -
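As a worked illustration of the Bonferroni correction, the sketch below divides a family-wise significance level by the number of tested codes. The significance level of 0.05 is an assumption (the text does not state it), and the function name is hypothetical; only the number of tests (1,944) comes from the text.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test p-value threshold controlling the family-wise error rate at alpha.

    alpha=0.05 is an assumed default, not a value stated in the source.
    """
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# 1,944 tested ICD main codes, as reported in the text.
threshold = bonferroni_threshold(1944)


def is_significant(p_value, n_tests=1944, alpha=0.05):
    """True if p_value passes the Bonferroni-corrected threshold."""
    return p_value < bonferroni_threshold(n_tests, alpha)
```

Under this assumed significance level, the threshold is roughly 2.57e-05.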