Genome-wide association study stratified by MAPT haplotypes identifies potential novel loci in Parkinson’s disease

Objective: To identify genetic factors that may modify the effects of the MAPT locus in Parkinson’s disease (PD). Methods: We used data from the International Parkinson’s Disease Genomics Consortium (IPDGC) and the UK biobank (UKBB). We stratified the IPDGC cohort for carriers of the H1/H1 genotype (PD patients n=8,492 and controls n=6,765) and carriers of the H2 haplotype (with either H1/H2 or H2/H2 genotypes, patients n=4,779 and controls n=4,849) to perform genome-wide association studies (GWASs). Then, we performed replication analyses in the UKBB data. To study the association of rare variants in the new nominated genes, we performed burden analyses in two cohorts (Accelerating Medicines Partnership – Parkinson Disease and UKBB) with a total sample size PD patients n=2,943 and controls n=18,486. Results: We identified a novel locus associated with PD among MAPT H1/H1 carriers near EMP1 (rs56312722, OR=0.88, 95%CI= 0.84–0.92, p= 1.80E-08), and a novel locus associated with PD among MAPT H2 carriers near VANGL1 (rs11590278, OR=1.69 95%CI=1.40–2.03, p=2.72E-08). Similar analysis of the UKBB data did not replicate these results and rs11590278 near VANGL1 did have similar effect size and direction in carriers of H2 haplotype, albeit not statistically significant (OR= 1.32, 95%CI= 0.94–1.86, p=0.17). Rare EMP1 variants with high CADD scores were associated with PD in the MAPT H2 stratified analysis (p=9.46E-05), mainly driven by the p.V11G variant. Interpretation: We identified several loci potentially associated with PD stratified by MAPT haplotype and larger replication studies are required to confirm these associations.

. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 15, 2023. ;https://doi.org/10.1101https://doi.org/10. /2023 NOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Introduction
Parkinson's disease (PD) is a neurodegenerative disorder with complex genetic background, neuropathology, and clinical features. It is possible that what we define today as PD, which is based on clinical presentation only, is a combination of multiple disorders with different etiologies and pathologies but with similar clinical phenotype. 1 PD is often categorized as an alpha-synucleinopathy, due to the typical accumulation of alpha-synuclein in Lewy bodies and Lewy neuritis. 2 However, other pathologies such as tau pathology are also common in PD, as ~50% of PD patients also show tau accumulation and deposition. 3 In specific genetic subtypes of PD, such as LRRK2-associated PD, alpha-synuclein pathology can be found in only ~50% of the cases, 4 whereas tau pathology can be abundant in more than 90% of patients. 5 In six brains from PD patients with the LRRK2 p.I2020T mutation, only one had alpha-synucleinopathy, while tau pathology was very common. 6 In contrast to LRRK2, in GBA1-associated PD, alpha-synuclein accumulation is very common, found in nearly all patients with GBA1 variants, and tau pathology is likely less common. 7 These genetic-neuropathological differences may suggest that tau is associated with specific genetic subtypes/variants. Tau is encoded by MAPT (microtubule associated protein tau), a gene associated with multiple neurological disorders, including PD. 8 The association with PD is mediated by the H1 and H2 MAPT haplotypes, created by an inversion of a large genomic region around MAPT. 9 The H1 haplotype is the most common haplotype and H2 is more rare and absent in non-Caucasian populations. 10 In PD, the H1 haplotype is associated with an increased risk of PD, whereas the H2 haplotype is associated with a reduced PD risk. 8 The H1 haplotype and specific subhaplotypes have been linked to more severe tau pathology or increased tau expression in some neurodegenerative diseases, 11-13 but whether or not these haplotypes affect tau pathology in PD is unclear. Recently, MAPT stratified analysis was done for Alzheimer's disease, 14 demonstrating potential novel loci and interactions.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 15, 2023. ;https://doi.org/10.1101https://doi.org/10. /2023 Since MAPT encodes tau, which is found in brains of many PD patients, but with differential distribution in different genetic subtypes (e.g. LRRK2 and GBA1), and since MAPT H1 and H2 haplotypes are associated with PD, we hypothesized that some genetic variants may be either more or less important in carriers of these haplotypes. To test this hypothesis, we performed stratified genome-wide association studies (GWAS) separately analyzing carriers of the H1/H1 genotype (PD patients n=8,492 and controls n=6,765) and carriers of the H2 haplotype (PD patients n=4,779 and controls n=4,849). We then examined whether specific variants are associated with PD in one of the genetically defined groups, but not in the other. To study the association of rare variants in the new nominated genes, we performed burden analyses in two cohorts with a total sample size PD patients n=2,943 and controls n=18,486.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted April 15, 2023. ;https://doi.org/10.1101https://doi.org/10. /2023 Methods:

Population and stratification
Our study included a total of 13,271 PD patients and 11,614 controls of European ancestry from the International Parkinson's Disease Genomics Consortium (IPDGC) with available sex, age and principal components data. Details on the cohorts composing the IPDGC data can be found To study the role of rare variants in genes potentially interacting with MAPT from the previous analyses, we used whole genome and whole exome sequencing data from two cohorts: the Accelerating Medicines Partnership -Parkinson Disease (AMP-PD, 2.5 release; PD patients n=2,341 and controls n=3,486), and the UK biobank (UKBB; PD patients n=602 and controls n=15,000). The AMP-PD cohorts are detailed in the acknowledgments. We then stratified selected cohorts for rare variant analysis by MAPT haplotype, including H2 haplotype carriers (PD patients n=215, controls n=6,081 for UKBB and PD patients n=786, controls n=1,438 for AMP-PD) and H1/H1 haplotype carriers (PD patients n=387, controls n=8,919 for UKBB and PD patients n=1,555, controls n=2,048 for AMP-PD).

Quality control and genome-wide association studies
Quality control procedures were performed for IPDGC genotyping data on individual and variant-level as previously described. 15 Two case-control GWASs were performed using logistic regression in plink 1.9, including SNPs with a minor allele frequency (MAF) of > 0.01. 16 We . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted April 15, 2023. ;https://doi.org/10.1101https://doi.org/10. /2023 applied sex, age, and 10 principal components (PC) as covariates. One GWAS compared PD patients and controls who carry the MAPT H2 haplotype, and the second GWAS compared PD patients and controls who carry the H1/H1 genotype. We calculated genomic inflation factor (λ) for GWASs in both cohorts. To account for the impact of sample size, we scaled the λ values to 1000 cases and 1000 controls (λ 1000 ) and created Quantile-Quantile plots as previously described. 17 SNPs that surpass the Bonferroni threshold for genome-wide studies (5.0 × 10 − 8 ) were considered statistically significant. We created mirror Miami plots representing GWASs for carriers and non-carriers using the Hudson package in R (https://github.com/anastasialucas/hudson). Independent SNPs and loci were defined using the functional mapping and annotation portal (FUMA). 18 To visually compare the effect size between carriers and noncarriers of the H2 haplotype, we constructed a beta-beta plot comparing the effect sizes in the two GWASs.
Variants with a difference of >5 fold in effect size were selected for further interaction analyses. We then examined whether interaction exists between the H2-tagging SNP rs8070723 and the SNPs that demonstrated differential effect sizes in the two GWASs. This analysis was done using plink and TLTO (Two-loci to OR) from VARI3 package. 19 To replicate these potential interactions, we used the UK biobank data and performed stratification analysis followed by interaction analysis. Participants of European ancestry were selected from data field 22006. Heterozygosity outliers, samples with high missingness rates and samples with sex chromosome aneuploidy according to data field 22027/22019 were also excluded. The cohort also underwent filtration for relatedness using Genome-wide Complex Trait Analysis (GCTA) with a threshold of 0.125. 20 In total, we included 344,597 unrelated participants of European origin, carriers of the H2 haplotype (PD patients n=550 and controls n=138,292), and carriers of the H1/H1 genotype (PD patients n=1,004 and controls n=204,749).
The power to replicate results in UKBB was calculated using Genetic Association Study (GAS) . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Rare variants burden analysis
We then studied rare variants in the novel genes that were nominated in the MAPT stratified GWASs. We performed quality control for whole-genome and whole-exome sequencing data using standard procedures on individual and variant levels as previously described. 23 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Identification of variants associated with carriers and non-carriers of the MAPT H2 haplotype
To examine whether specific genetic variants are associated with carriers or non-carriers of the MAPT H2 haplotype, we performed two stratified GWASs. One GWAS included only H1/H1 genotype carriers (PD patients n=8,492 and controls n=6,765) and the second GWAS included carriers of either H1/H2 or H2/H2 genotypes, (PD patients n=4,779 and controls n=4,849). The genomic inflation values for H1 haplotype GWAS and H2/H2 haplotype GWAS were λ = 1.17 and λ = 1.11, respectively. When the genomic inflation was scaled to 1,000 cases and 1,000 controls for both cohorts, the values were the same, with λ 1000 =1.02 (Supplementary Figure 1).

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 15, 2023. ;https://doi.org/10.1101https://doi.org/10. /2023 attributed to reduced power of the H2 haplotype GWAS, although the beta of the NUCKS1 locus was more than two-fold larger in the H1/H1 genotype carriers compared to H2 haplotype carriers (-0.161 vs. -0.078, respectively). Similarly, the MCCC1 locus was only associated with PD among carriers of the H2 haplotype, but not among carriers of the H1/H1. The beta for the SNP tagging the MCCC1 locus, rs9858038, was more than two-fold larger in carriers of the H2 haplotype compared to carriers of the H1/H1 genotype (-0.224 vs. -0.108, respectively).
We attempted to replicate these results, specifically the novel loci, in the UKBB data, stratified by carriers and non-carriers of the MAPT H2 haplotype. Considering the number of MAPT haplotype carriers, allele frequency and effect size of novel loci, we calculated the statistical power for replication. We did not have sufficient power to replicate neither rs11590278 (60%) nor rs56312722 (54%) and as expected both associations did not reach statistical significance in the UKBB. However, the SNP rs11590278 near VANGL1 had similar effect direction in carriers of H2 haplotype (OR= 1.32, 95%CI= 0.94-1.86, p=0.17). The SNP rs56312722 near EMP1 had different direction of effect in carriers of H1/H1 haplotype (OR= 1.07, 95%CI= 0.97-1.16, p= 0.10).

Interaction analysis of MAPT H1 and H2 haplotypes with SNPs in the VANGL1 and EMP1
loci To identify SNPs for interaction analyses with the MAPT H2-tagging SNP rs8070723, we generated a beta-beta plot ( Figure 1D). We selected independent SNPs that had an effect size (beta) >5-fold larger in one GWAS compared to the other and were significant in the GWAS level in one of the two GWASs. The SNP rs11590278 (VANGL1) had an 8-fold difference in effect size when comparing MAPT H2 haplotype carriers and H1/H1 genotype carriers (b=0.52, p=2.72E-08 vs b=0.062, p=0.4, respectively). The SNP rs56312722 (EMP1) had a ten-fold difference in beta compared to H2 haplotype carriers (b=0.133, p=1.80E-08 vs b=0.013, p=0.6582, respectively).
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Rare variant analysis further suggests an association of EMP1 with Parkinson's disease
To examine whether rare variants in EMP1 and VANGL1 play any role in PD, we performed burden analysis of variants with MAF<0.01. We found that rare variants with CADD score >20 were associated with PD in the meta-analysis of AMP-PD and UKBB cohorts (p=0.004; Supplementary Table 3). This association was driven by the UKBB cohort (p=0.0004).
Moreover, after we stratified the cohorts for H2 and H1/H1 MAPT haplotypes the same way we did for the GWAS, we found that the association of EMP1 with PD was driven by the H2 haplotype carriers in both AMP-PD (p=0.018) and UKBB (p=0.004) cohorts (meta-analysis p=9.46E-05). There were only two variants with high CADD score and only the p.V11G EMP1 variant was associated with PD in the UKBB cohort (OR=8.62, 95%CI=2.21-33.61; p=0.002) and in AMP-PD (OR=5.92, 95%CI=1.12-31.14, p=0.036) in the H2 stratified cohort (Supplementary Table 4). Furthermore, p.V11G was in linkage disequilibrium with the major allele of rs56312722 in EMP1 (D'=1.0). We did not find any association between rare variants in VANGL1 and PD in any of the cohorts.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Discussion:
In the current study we performed a stratified GWAS based on the MAPT H1 and H2 haplotypes, and identified three novel loci potentially associated with PD, one in carriers of the H1/H1 genotype (EMP1), one in carriers of the H2 haplotype (VANGL1), and one in both (SNAPC1).
These associations were not replicated in the UKBB data, which was underpowered to detect them. We also found that rare EMP1 variants were associated with PD in H2 haplotype carriers in both cohorts (AMP-PD and UKBB). Therefore, possibly there is a complex interaction between MAPT haplotypes and the EMP1 gene, with some variants being protective for H1/H1 haplotype and causative for H2 haplotype. Additional studies are needed to examine whether the VANGL1 and EMP1 loci are indeed associated with the H2 and H1 haplotypes in PD. As for the third locus, SNAPC1, it was not identified in a much larger GWAS, 15 suggesting that this could be a spurious association. However, it is also possible that methodological and/or population stratification-related issues due to inclusion of data from 23andMe masked the association in the larger GWAS. More studies are therefore needed into this locus to determine whether or not it is associated with PD.
MAPT stratified analysis has already been performed in Alzheimer's disease, 14 and an APOE-stratified GWAS has been recently performed in dementia with Lewy bodies, 29 both identified potential novel loci and interactions. Together with our current work, these studies demonstrate the advantage of a stratified GWAS approach for shedding more light on the genetics of complex neurological traits with multiple potential subtypes.
In the stratified analyses by MAPT H1 and H2 haplotypes, we nominated two novel loci potentially associated with PD, near the VANGL1 and EMP1 genes. EMP1 is encoding epithelial membrane protein 1 and this gene has not previously been implicated in the pathogenesis of PD.
In a recent RNA-seq study of cerebral white matter, upregulation of EMP1 was reported in multiple system atrophy patients. 30 GRIN2B, located in the EMP1 locus, was previously . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 15, 2023. ; https://doi.org/10.1101/2023.04.14.23288478 doi: medRxiv preprint associated with impulse control and related behaviors in PD patients. 31 The MAPT H1 haplotype has been repeatedly associated with cognitive decline in PD patients. 32-34 Likewise, the EMP1 locus has been previously associated with 176 traits including intelligence and cognitive performance (data retrieved from https://genetics.opentargets.org/). Thus, EMP1 could be an independent modifier of PD in MAPT H1 or H2 carriers. However, this hypothesis requires additional studies.
VANGL1 encodes the Van Gogh-like planar cell polarity protein 1 and was previously associated with neural tube defects, 35, 36 but not with neurodegenerative diseases. VANGL1 is a negative regulator of the canonical Wnt signaling pathway. 37 Notably, Wnt signaling has been suggested to be an important factor for dopaminergic neurons homeostasis and survival. 38,39 Therefore, the involvement of VANGL1 in PD pathogenesis should also be further studied. Both genes, EMP1 and VANGL1, are expressed in brain tissues based on GTEx (data retrieved from https://www.gtexportal.org/home). However, we did not find quantitative trait loci for SNPs in EMP1 and VANGL1 associated with PD in the current study.
The KANSL1 gene has recently been nominated as potentially driving an association for the H1 haplotype at the MAPT locus. 40 Additionally, it has been suggested that KANSL1 and KAT8 are two PINK1-dependent mitophagy regulators. 41 In our analysis, KAT8 almost reached genome-wide association significance in the H1/H1 haplotype analysis (p=5.21E-08) and was only nominally significant in the H2 haplotype analysis (p=0.0004), with no difference in effect direction ( Figure 1A). It is possible that this is due to an interaction between KAT8 and the H1 haplotype, but it could also be a power issue as the GWAS with H1/H1 carriers is much larger than H2. Therefore, further analysis is needed to determine whether KAT8 locus does indeed interact with the MAPT H1 haplotype.
Our analyses have several limitations. First, we have separated the discovery IPDGC cohort into smaller groups of carriers and non-carriers which caused imbalance in group size depending on the variant frequency and, thus, different power. We have also performed . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 15, 2023. ; https://doi.org/10.1101/2023.04.14.23288478 doi: medRxiv preprint replication analyses in UKBB, which only had 550 PD H2 haplotype carries and 1,004 PD H1/H1 haplotype carriers after filtering. This makes UKBB underpowered to detect the effects seen in our discovery cohorts. Since we only studied European samples, similar studies in non-European cohorts are warranted, as well as replication of our findings in independent and larger cohorts. Another limitation is that we had several GWAS-significant loci with low MAF < 5%.
Therefore, these signals require cautious interpretation and further confirmation.
To conclude, using a stratified GWAS approach, we nominated two potentially novel loci which are associated with specific MAPT gene haplotypes in PD. Our results along with previous similar studies suggest that risk loci stratification analysis could help us to genetically characterize subtypes of PD patients better and discover novel genetic associations.

Data Availability
Full code used in the analysis has been made available on git-hub (https://github.com/ganorlab/PD_stratified_GWAS).
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Acknowledgements:
We would like to thank the all the participants in the different cohorts. We would also like to thank all members of the International Parkinson Disease Genomics Consortium (IPDGC is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Financial Disclosure
Authors have nothing to report.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 15, 2023. ; https://doi.org/10.1101/2023.04.14.23288478 doi: medRxiv preprint References: . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 15, 2023. ; https://doi.org/10.1101/2023.04.14.23288478 doi: medRxiv preprint  t  h  r  o  u  g  h  p  u  t  s  e  q  u  e  n  c  i  n  g  d  a  t  a  .  N  u  c  l  e  i  c  a  c  i  d  s  r  e  s  e  a  r  c  h  .  2  0  1  0  ;  3  8  (  1  6  )  :  e  1  6  4  -e  .   2  6  .  R  e  n  t  z  s  c  h  P  ,  W  i  t  t  e  n  D  ,  C  o  o  p  e  r  G  M  ,  S  h  e  n  d  u  r  e  J  ,  K  i  r  c  h  e  r  M  .  C  A  D  D  :  p  r  e  d  i  c  t  i  n  g  t  h  e  d  e  l  e  t  e  r  i  o  u  s  n  e  s  s   o  f  v  a  r  i  a  n  t  s  t  h  r  o  u  g  h  o  u  t  t  h  e  h  u  m  a  n  g  e  n  o  m  e  .  N  u  c  l  e  i  c  a  c  i  d  s  r  e  s  e  a  r  c  h  .  2  0  1  9  ;  4  7  (  D  1  )  :  D  8  8  6  -D  9  4  .   2  7  .  L  e  e  S  ,  E  m  o  n  d  M  J  ,  B  a  m  s  h  a  d  M  J  ,  e  t  a  l  .  O  p  t  i  m  a  l  u  n  i  f  i  e  d  a  p  p  r  o  a  c  h  f  o  r  r  a  r  e  -v  a  r  i  a  n  t  a  s  s  o  c  i  a  t  i  o  n   t  e  s  t  i  n  g  w  i  t  h  a  p  p  l  i  c  a  t  i  o  n  t  o  s  m  a  l  l  -s  a  m  p  l  e  c  a  s  e  -c  o  n  t  r  o  l  w  h  o  l  e  -e  x  o  m  e  s  e  q  u  e  n  c  i  n  g  s  t  u  d  i  e  . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 15, 2023. ; https://doi.org/10.1101/2023.04.14.23288478 doi: medRxiv preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted April 15, 2023. ; https://doi.org/10.1101/2023.04.14.23288478 doi: medRxiv preprint . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted April 15, 2023. ; https://doi.org/10.1101/2023.04.14.23288478 doi: medRxiv preprint  . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted April 15, 2023. ;https://doi.org/10.1101https://doi.org/10. /2023