ABSTRACT
Reading and writing are crucial for many aspects of modern life but up to 1 in 10 children are affected by dyslexia [1, 2], which can persist into adulthood. Family studies of dyslexia suggest heritability up to 70% [3, 4], yet no convincing genetic markers have been found due to limited study power [5]. Here, we present a genome-wide association study representing a 20-fold increase in sample size from prior work, with 51,800 adults self-reporting a dyslexia diagnosis and 1,087,070 controls. We identified 42 independent genome-wide significant loci: 17 are in genes linked to or pleiotropic with cognitive ability/educational attainment; 25 are novel and may be more specifically associated with dyslexia. Twenty-three loci (12 novel) were validated in independent cohorts of Chinese and European ancestry. We confirmed a similar genetic aetiology of dyslexia between sexes, and found genetic covariance with many traits, including ambidexterity, but not neuroanatomical measures of language-related circuitry. Causal analyses revealed a directional effect of dyslexia on attention deficit hyperactivity disorder and bidirectional effects on socio-educational traits but these relationships require further investigation. Dyslexia polygenic scores explained up to 6% of variance in reading traits in independent cohorts, and might in future enable earlier identification and remediation of dyslexia.
INTRODUCTION
The ability to read is crucial for success at school and access to employment, information, and health/social services, and is related to attained socio-economic status [6]. Dyslexia is a neurodevelopmental disorder characterised by severe reading difficulties, present in between 5 and 17.5% of the population, depending on diagnostic criteria [1, 2]. It often involves impaired phonological processing (the decoding of sound units, or phonemes, within words) and frequently co-occurs with psychiatric and other developmental disorders [7], especially attention-deficit hyperactivity disorder (ADHD) [8, 9] and speech and language disorders [10, 11]. Dyslexia may represent the low extreme of a continuum of reading ability, a complex multifactorial trait with heritability estimates ranging from 40 to 80% [12, 13]. Identifying genetic risk factors not only aids increased understanding of the biological mechanisms, but may also expand diagnostic capabilities, facilitating earlier identification of individuals prone to dyslexia and comorbid disorders for specific support.
Prior genome-wide investigations of dyslexia have been limited to linkage analyses of affected families [14] or modest (N < 2,300 cases) association studies of diagnosed children and adolescents [5]. Candidate genes from linkage studies show inconsistent replication and genome-wide association studies (GWAS) have not found significant associations, although LOC388780 and VEPH1 were supported in gene-based tests [5]. Larger cohorts are vital for increasing sensitivity to detect novel genetic associations of small effect. We present the largest dyslexia GWAS to date, with 51,800 adults self-reporting a dyslexia diagnosis and 1,087,070 controls, all of whom are research participants with the personal genetics company 23andMe, Inc. We validate our association discoveries in independent cohorts, provide functional annotations of significant variants (mainly SNPs, single-nucleotide polymorphisms) and potential causal genes, and estimates of SNP-based heritability. Lastly, we investigate genetic correlations with reading and related skills, health, socio-economic, and psychiatric measures, and test for causal effects of dyslexia on ADHD, cognitive-educational abilities, and economic outcomes.
RESULTS
Genome-wide associations
The full dataset included 51,800 (21,513 males, 30,287 females) participants responding “yes” to the question “Have you been diagnosed with dyslexia?” (cases) and 1,087,070 (446,054 males, 641,016 females) participants responding “no” (controls). Participants were ≥18 years (mean ages of cases and controls were 49.9 years (SE 16.3) and 51.8 years (SE 16.6), respectively). We identified 42 independent genome-wide significant associated loci (p < 5 × 10−8) and 64 loci with suggestive significance (p < 1 × 10−6) (Figure 1; Supplementary Table 1). Genomic inflation was moderate (λGC = 1.18) and consistent with polygenicity (see Q-Q plot, Supplementary Figure 1). All significant variants were autosomal, except Xq27.3:rs5904158; their regional association plots are shown in Supplementary Figures 4.i.-xl. Seventeen index variants were in high linkage with published (genome-wide significant) associated SNPs in the NHGRI GWAS Catalog [15] (15 were associated with cognitive/educational traits; Supplementary Tables 1 and 2). Of these, four (rs4696277, rs34349354, rs9696811, rs72841395) were pleiotropic with general cognitive ability [16] based on HEIDI-outlier analysis [17]. The remaining 25 associated loci with no evidence of published genome-wide associations with traits expected to overlap with dyslexia or were not pleiotropic with cognitive ability were considered as novel (Table 1).
Of 38 associated loci (the 4 remaining were tagged by indels unavailable in validation cohorts), three (rs13082684; rs34349354; rs11393101) were significant at a Bonferroni-corrected level (P<.05/38) in the GenLang consortium GWAS meta-analysis (Eising et al., in preparation) of reading (N = 33,959) and spelling (N = 18,514) ability. At p < 0.05, 18 were observed in GenLang, three in the NeuroDys case-control GWAS [5] (N = 2,274 cases), and five in the Chinese Reading Study of reading accuracy and fluency (N = 2,270; Supplementary Method) (Table 1; Supplementary Tables 3-6).
Gene-based tests identified 173 significantly associated genes (Supplementary Table 7) but no significantly enriched biological pathways (Supplementary Table 8). Using LDSC, we estimated the liability-scale SNP-based heritability of dyslexia to be h2SNP = 0.152 (SE = 0.006), assuming a population prevalence of 5% (from 23andMe), and h2SNP = 0.189 (SE = 0.008) for a 10% prevalence [1, 2].
We also performed sex-specific GWAS and age-specific GWAS (younger or older than 55 years) because dyslexia prevalence was higher in our younger (5.34% in 20 to 30-year olds) than older (3.23% in 80 to 90-year olds) participants. These showed high consistency with the main GWAS (Supplementary Figures 2 and 3). Genetic correlation estimated by LDSC was 0.91 (p = 8.26 × 10−253) in males and females, and 0.97 (p = 2.32 × 10−268) between younger and older adults (Supplementary Table 16).
Fine-mapping and functional annotations
Within the credible variant set (Supplementary Table 1), missense variants were the most common (55%) of the coding variants; Supplementary Figure 5 summarises all predicted variant effects. Predicted deleterious variants (by SIFT score) were identified in R3HCC1L, SH2B3, CCDC171, C1orf87, LOXL4, DLAT, ALG9, and SORT1. Within the credible variant set no genes were especially intolerant to functional variation (smallest LoFtool percentile was .39). For the 42 associated loci, the most probable gene targets of each was estimated by the Overall V2G score from OpenTargets (Supplementary Table 9). Two of the index variants could be causal: the missense variant, rs12737449 (C1orf87), and variant rs3735260 (AUTS2) had Combined Annotation Dependent Depletion (CADD) scores suggestive of deleteriousness to gene function [18] (Supplementary Table 10). The AUTS2 variant RegulomeDB rank of 2b indicated a regulatory role; its chromatin state supported location at an active transcription start site [19, 20]. The function of genes located at (or nearby) significant index variants appears in Supplementary Table 11.
Of the 173 significant genes from genome-wide gene-based tests in MAGMA, 129 could be functionally annotated (Supplementary Table 12). Protein-coding and non-coding sequences are actively conserved in approximately three quarters of these genes, 63% are more intolerant to variation than average, and 33% are intolerant to loss-of-function mutations. Gene property analysis for general tissues confirmed the importance of brain and specific regions (Supplementary Tables 13 and 14). Levels of brain expression for the 136 genes proximal to the top single variants are shown in Supplementary Table 15. Twenty-one showed high general brain expression levels (of these, PPP1R1B, NPM1, PMVK, and WASF3 were significant in gene-based tests). Of the 12 brain regions assessed, gene expression was generally highest in the cerebellar hemisphere, cerebellum, and cerebral cortex (consistent with gene property analysis results). For the 173 significant genes from gene-based tests, there was enrichment of differentially expressed genes in brain subcortical and cortical areas (Supplementary Figure 6).
Partitioned heritability
SNP-based heritability of dyslexia partitioned by functional annotation showed significant enrichment for conserved regions and H3K4me1 clusters (Supplementary Table 16 and Supplementary Figure 7). There was enrichment in genes expressed in the frontal cortex, cortex, and anterior cingulate cortex (p < 4.17 × 10−3) (Supplementary Table 17 and Supplementary Figure 8), but not for brain cell type (Supplementary Table 18 and Supplementary Figure 9). Enrichment was seen in enhancer and promoter regions with the chromatin marks H3M4me1 and H3K4me3 chromatin marks in multiple central nervous system (CNS) tissues (Supplementary Tables 19 and 20; Supplementary Figures 10 and 11). Reading, an offshoot of spoken language, is a uniquely human trait, but there was no enrichment for a range of annotations related to human evolution spanning the last 30 million to 50,000 years [21] (Supplementary Table 21).
Genetic correlations (LDSC)
Genetic correlations were estimated for 98 traits (Supplementary Table 22 and Figure 2) including reading and spelling measures from GenLang (Supplementary Figure 12) and brain subcortical structure volumes, total cortical surface area and thickness from the Enhancing Neuro Imaging Genetic through Meta-Analysis (ENIGMA) consortium. Sixty-three traits showed genetic correlations with dyslexia at the Bonferroni-corrected significance threshold (P<0.05/98; Figure 2). Genetic correlations (rg) with quantitative reading and spelling measures ranged from -.70 to -.75, with phoneme awareness -0.62, and nonword repetition -.45. Childhood/adolescent performance IQ rg was lower (-.19) than adult verbal-numerical reasoning [22] (-.50) and work-related/vocational qualifications (.50) but similar to childhood IQ and educational attainment [23] (-.32 and -.22, respectively). Positive rg included: jobs involving heavy manual work [23] (.40), ADHD [24] (.53), equal use of right and left hands [23] (.38), and pain measures [23] (average = .31). Of the 11 ENIGMA measures tested, only intracranial volume was significantly correlated with dyslexia (rg = - .14). Targeted investigation of 80 UK Biobank structural neuroimaging measures including surface-based morphometry and diffusion-weighted imaging for brain circuitry linked to language were non-significant at a corrected significance level using spectral decomposition of matrices (Supplementary Table 23).
Mendelian randomisation
We investigated potential causal relationships between dyslexia and a selection of other relevant traits. Two-sample inverse variance weighted Mendelian Randomisation (MR) analyses were compatible with a causal effect of dyslexia on ADHD (OR = 1.43, 95% CI: 1.28-1.61) and adult general cognitive ability (β = -0.21, SE = 0.02) (Supplementary Figure 13 and Supplementary Table 24 for complete results). We detected a small causal effect of dyslexia on secondary school achievement (β = -0.03, SE = 0.01), but inconsistent support across inverse variance weighted MR, MR Egger, weighted median MR and weighted mode MR for educational attainment and income. Because ADHD and general cognitive ability can precede dyslexia, this directional path was tested, but was only significant for cognitive ability. The few available SNPs forming an instrument for ADHD likely biased that test towards the null, so we cannot rule out a bidirectional link. In tests for directional pleiotropy, Egger intercepts did not significantly differ from zero, supporting the assumption of no direct effect from the instrument to outcome. However, there may be confounding from dynastic, assortative mating and population stratification effects that could not be tested in this study.
Polygenic score analyses
Dyslexia polygenic scores (PGS) based on the 23andMe dyslexia GWAS were computed in four independent cohorts, and overall, higher PGS were associated with lower reading and spelling accuracy (Supplementary Table 25). In two Australian population-based samples (1,647 adolescents, 1,163 adults), the dyslexia PGS explained up to 3.6% of variance in nonword reading (phonological decoding). Dyslexia PGS did not correlate with nonword repetition (phonological short-term memory). In developmental cohorts enriched for reading difficulties, the dyslexia PGS explained 3.7% (UKdys; N=930) and 5.6% (CLDRC; N=717) of variance in word recognition tests.
Analyses of dyslexia associations from the literature
Of 75 previously reported dyslexia associations, none showed genome-wide significance in our analyses (Supplementary Table 26). Nineteen of these targeted variants—in ATP2C2, CMIP, CNTNAP2, DCDC2, DIP2A, DYX1C1, FOXP2, KIAA0319L and PCNT—showed association surviving Bonferroni-correction which accounted for LD (p < .05/68.7). In gene-based tests of 14 candidate genes from literature [25, 26] association at a Bonferroni level (p < .05/14) was seen for KIAA0319L (p = 1.84 × 10−4) and ROBO1 (p = 1.53 × 10−3) (Supplementary Table 27). CNTNAP2 was marginal (p = .004). Targeted gene-set analysis of three pathways previously implicated in dyslexia (Supplementary Table 28) showed replication-level support (p = 2.00 × 10−3) for the axon guidance pathway (comprising 216 genes).
DISCUSSION
In the largest GWAS of dyslexia to date (>50,000 self-reported diagnoses), we identified 42 significant independent loci. Twenty-five of these represent novel associations that have not been uncovered in GWAS of related cognitive traits; 11 of these were validated in the GenLang consortium GWAS meta-analysis of reading/spelling in English and other European languages, and another in a Chinese language cohort. 38% of the significant SNPs overlap with variants from general cognitive ability GWAS, consistent with twin studies that find genetic variation in reading disability is explained by general and reading-specific cognitive ability [13]. Similar to other complex traits, each significant locus showed small effects (ORs ranging 1.04 to 1.12), consistent with high polygenicity. Our estimated SNP-based heritability of 15% (5% dyslexia prevalence) was similar to the 19% reported in a smaller GWAS [5], but lower than heritability estimates from twin studies (40-80%) [27, 28]. This difference may be partly due to effects of rare and structural variants [29], which have been implicated in reading and related traits [30, 31].
Whereas AUTS2 has been implicated in autism [32], intellectual disability [33], and dyslexia [34], the variant we uncover (rs3735260) represents the strongest AUTS2 SNP association with a neurodevelopmental trait to date. Amongst our findings were other known neurodevelopmental genes, like TANC2 (implicated in language delay and intellectual disability [35, 36]), and especially GGNBP2 (linked to neurodevelopmental delay [37] and autism [38]) with its variant rs34349354 supported in all our validation cohorts. However, rs34349354 is pleiotropic with general cognitive ability, and based on eQTL evidence is more likely linked to ZNHIT3, associated with cognitive performance and co-localising with molecular QTLs [16]. Notably, none of the more established candidate genes for dyslexia approached genome-wide significance in our results.
Consistent with studies of other neurodevelopmental disorders [39, 40], partitioning of SNP-based heritability revealed enrichment in conserved regions, and at H3K4me1 and H3K4me3 clusters in the CNS (marking enhancers and promoters respectively). Since reading/writing systems are built on our capacities for spoken language, it is plausible that evolutionary changes on the human lineage helped shape the underlying genetic architecture [41]. However, we did not find enrichment of significant associations for curated annotations spanning different periods of hominin prehistory.
Our self-reported dyslexia diagnosis binary trait showed strong negative genetic correlations with quantitative reading and spelling measures, supporting the validity of this measure in the 23andMe cohort, and suggesting that reading skills and disorder are not qualitatively distinct. The positive genetic correlation between hearing difficulties and dyslexia is consistent with genetic correlations reported for childhood reading skill [42], suggesting that hearing problems at an early age could affect acquisition of phonological processing skills.
Dyslexia showed moderately negative genetic correlations with adult verbal-numerical reasoning. MR demonstrated a moderate causal effect of dyslexia on general cognitive ability and a larger reverse causal effect, highlighting the importance of general cognitive processes in reading acquisition. The lack of a strong genetic correlation of dyslexia with (nonverbal) performance IQ suggests that the causal effect is on verbal ability and that individuals with dyslexia are disadvantaged on verbal IQ tests [43]. This putative causal effect was less pronounced for secondary school achievement (perhaps due to school adjustments/support) and not robustly supported for college/university education.
There was little support for common genetic variation in dyslexia being related to inter-individual differences in subcortical volumes, or structural connectivity and morphometry for brain regions implicated in language processing in adults. Thus, the phenotypic correlations previously reported between dyslexia and aspects of neuroanatomy may in large part reflect environmental shaping of the brain, perhaps through the process of reading itself [44]. Left-handedness and ambidexterity show little genetic overlap with each other [45] yet are both phenotypically linked to neurodevelopmental disorders/cognitive abilities [46, 47]. We report a significant genetic correlation between dyslexia and self-reported equal hand use, but not left-handedness, supporting the ambidexterity theory [48].
Dyslexia and ADHD [8, 9] are comorbid (24% reporting ADHD in our cases versus 9% in controls), and we show a moderate genetic correlation, potentially reflecting shared endophenotypes like deficits in working memory and attention [49]. Our MR finding of a causal effect of dyslexia on ADHD implies that a reduction in dyslexia prevalence (achievable through remediation) could reduce ADHD prevalence. However, this result is tentative given the potential for unmeasured confounding with the genetic instrument. Although dyslexia and ASD were not genetically correlated, the latter encompasses diverse neurodevelopmental phenotypes including sub-groups with varying educational attainment and IQ [40]. Unexpected genetic correlations with pain-related traits suggest that individuals with dyslexia may have a lower threshold for pain perception. Links between pain and other neurodevelopmental disorders have been reported [50].
Dyslexia polygenic scores were correlated with lower achievement on reading and spelling tests in population-based and reading-disorder enriched samples, especially for nonword reading, a measure of phonological decoding typically impaired in dyslexia. Polygenic scores could become a valuable tool to help identify children with a propensity for dyslexia, enabling learning support prior to development of reading skills.
In summary, we report 42 novel independent significant loci associated with dyslexia, 25 of which have not been associated with cognitive-educational traits and should be prioritised for follow up as dyslexia candidates. Functional annotation highlights conserved and enhancer regions of the genome. Dyslexia shows positive genetic correlations with ADHD, vocational qualifications, physical occupations, ambidexterity, and pain perception, and negative correlations with academic qualifications and cognitive ability; family-based methods are needed to dissociate pleiotropic and causal effects.
Data Availability
The full summary statistics for each dyslexia GWAS presented in this paper will be made available through 23andMe to qualified researchers under an agreement with 23andMe that protects the privacy of the 23andMe participants. Interested investigators should email dataset-request@23andme.com and reference this paper for more information.
FINANCIAL SUPPORT
EE, GA, BM, BSP, CF and SEF are supported by the Max Planck Society (Germany). The Chinese Reading Study was supported by grants from the National Natural Science Foundation of China Youth Project (Grant No. 61807023), the Youth Fund for Humanities and Social Sciences Research of the Ministry of Education (Grant No. 19YJC190023 and 17XJC190010), and the Natural Science Basic Research Plan in Shaanxi Province of China (Grant No. 2021JQ-309).SP is funded by the Royal Society.
CONFLICTS OF INTEREST
Pierre Fontanillas, Adam Auton, and the 23andMe Research Team are employed by and hold stock or stock options in 23andMe, Inc.
ETHICAL STANDARDS
Participants provided informed consent and participated in the research online, under a protocol approved by the external AAHRPP-accredited IRB, Ethical & Independent Review Services (E&I Review). Participants were included in the analysis on the basis of consent status as checked at the time data analyses were initiated.
DATA AVAILABILITY
The full summary statistics for each dyslexia GWAS presented in this paper will be made available through 23andMe to qualified researchers under an agreement with 23andMe that protects the privacy of the 23andMe participants. Interested investigators should email dataset-request{at}23andme.com and reference this paper for more information.
ACKNOWLEDGEMENTS
We thank the research participants and employees of 23andMe Inc, the GenLang Consortium, the Brisbane Adults Reading Study, and the Chinese Reading Study.
The following members of the 23andMe Research Team contributed to this study: Stella Aslibekyan, Adam Auton, Elizabeth Babalola, Robert K. Bell, Jessica Bielenberg, Katarzyna Bryc, Emily Bullis, Daniella Coker, Gabriel Cuellar Partida, Devika Dhamija, Sayantan Das, Sarah L. Elson, Teresa Filshtein, Kipper Fletez-Brant, Pierre Fontanillas, Will Freyman, Pooja M. Gandhi, Karl Heilbron, Barry Hicks, David A. Hinds, Ethan M. Jewett, Yunxuan Jiang, Katelyn Kukar, Keng-Han Lin, Maya Lowe, Jey McCreight, Matthew H. McIntyre, Steven J. Micheletti, Meghan E. Moreno, Joanna L. Mountain, Priyanka Nandakumar, Elizabeth S. Noblin, Jared O’Connell, Aaron A. Petrakovitz, G. David Poznik, Morgan Schumacher, Anjali J. Shastri, Janie F. Shelton, Jingchunzi Shi, Suyash Shringarpure, Vinh Tran, Joyce Y. Tung, Xin Wang, Wei Wang, Catherine H. Weldon, Peter Wilton, Alejandro Hernandez, Corinna Wong, Christophe Toukam Tchakouté