Abstract
Both sex and gender are characteristics that play a key role in risk and resilience in health and well-being. Current research lacks the ability to quantitatively describe gender and gender diversity, and is limited to endorsement of categorical gender identities, which are contextually and culturally dependent. A more objective, dimensional approach to characterizing gender diversity will enable researchers to advance the health of gender-diverse people by better understanding how genetic factors interact with gender diversity to determine health outcomes. To address this research gap, we leveraged the Gender Self-Report (GSR), a questionnaire that captures multiple dimensions of gender diversity. We then performed polygenic score associations with brain-related traits like cognitive performance, personality, and neuropsychiatric conditions. The GSR was completed by N = 818 independent adults with or without autism in the SPARK cohort, and GSR factor analysis identified two factors: Binary (divergence from gender presumed by designated sex to the opposite) and Nonbinary (divergence from male and female gender norms) Gender Diversity (BGD and NGD, respectively). We performed polygenic associations (controlling for age, sex, and autism diagnostic status) in a subset of N = 452 individuals and found that higher polygenic propensity for cognitive performance was associated with greater BGD (β = 0.017, p = 0.049) and NGD (β = 0.036, p = 0.002), and that higher polygenic propensity for educational attainment was also associated with greater NGD (β = 0.030, p = 0.015). We did not observe any significant associations with personality or neuropsychiatric polygenic scores in this sample. Overall, our results suggest that cognitive processes and gender diversity share overlapping genetic factors, indicating the biological utility of the GSR while also underscoring the importance of quantitatively measuring gender diversity in health research contexts.
1 Introduction
Sex and gender can have major impacts on health outcomes [1] by impinging on the underlying molecular mechanisms of disease and well-being [2]. In health research, designated sex at birth is a more straightforward variable to include than gender, which is diverse in multiple dimensions and is complex to adequately characterize. Gender diversity, which, like designated sex, is a normal manifestation of human diversity, is an especially crucial variable to incorporate in mental health and neuropsychiatric research. Groups that express higher levels of gender diversity than the cisgender proportional majority, such as the LGBTQ+ (lesbian, gay, bisexual, transgender, and queer) community, often have greater rates of anxiety and depression and are more likely to attempt suicide [3]. It has also been shown that discrimination and resilience mediate these negative outcomes in LGBTQ+ college students [4], but the contribution of genetic factors is unknown. Importantly, variation in gender diversity is not uniquely confined to transgender or nonbinary identities. People who identify as cisgender also exhibit variation in gender diversity and expression [5] that would be lost in studies only reporting categorical descriptors of gender. Therefore, a quantitative, dimensional characterization of gender that is independent of categorical gender self-endorsement will enable genetic researchers to appropriately incorporate gender diversity in their analyses. Ultimately, this will advance the health of gender diverse people through a greater understanding of how genetic factors interact with gender diversity in determining health outcomes. Because an individual’s personal identity informs gender, it follows that gender experience is enabled and shaped by the brain. 
Therefore, if gender and gender diversity interact with genetic propensity in determining health outcomes, then brain-enabled traits like cognitive performance, personality, and neuropsychiatric conditions may offer insight into how these biological and experiential factors interact to promote well-being.
To address this question, we collected data from N = 818 adults regarding their gender identity and expression and sexual orientation. Additionally, the participants filled out the Gender Self-Report, a 30-item self-report questionnaire that captures multiple dimensions of gender diversity. The study participants were recruited from the SPARK cohort [6], which is a nationwide genetic study that includes over 270,000 individuals with and without autism. Previous sociological studies have shown that gender diversity is enriched in autistic samples [7] compared to the general population. Likewise, in general population samples, individuals who are transgender or nonbinary are more likely to be autistic and to have elevated autistic traits [8]. This extensive prior research showing the overlap of autism and gender diversity makes SPARK an ideal cohort for understanding the genetic factors that contribute to gender diversity.
2 Materials and methods
2.1 SPARK cohort description
SPARK [6] is a United States-based nationwide autism study of over 270,000 individuals. Independent adults in SPARK, with or without autism, were invited to an online Research Match in which they were given the Gender Self Report (GSR) and also asked other questions regarding their sexual orientation, gender identity, and gender expression. This Research Match study was approved by the University of Iowa IRB (IRB# 201611784) and SPARK is approved by the Western IRB (IRB# 20151664). All participants provided informed consent.
2.2 Derivation of Gender Self Report (GSR) factors
The Gender Self-Report (GSR) itemset was developed through a multi-step, multi-input, community-driven process with autistic cisgender, autistic gender-diverse, and non-autistic cisgender and gender-diverse collaborators. A purposive sampling approach was employed across 8 separate U.S. national recruitments (N = 1,654) to maximize the breadth of the GSR calibration sample and enrich the sample based on the following key characteristics: ASD, gender-diverse identities (binary and nonbinary), the intersection of ASD and gender-diverse identities, transition age/young adult age, and female designation at birth within the entire sample and within ASD, specifically. This purposive sampling resulted in an overall calibration sample that was 37.5% autistic, 32.6% gender diverse, and 38.9% cisgender sexual minority. A two-dimensional normal mixture with a graded response model adequately fit the data and yielded two valid and reliable factors: Female-Male Continuum (F-MC) and Nonbinary Gender Diversity (NGD). A transformation of the F-MC scores based on designated sex at birth produced Binary Gender Diversity (BGD) scores (i.e., the distance on the binary gender spectrum from an individual’s designated sex at birth). GSR calibration utilized differential item functioning, an equity-based psychometric method to identify and reduce bias, in this case by age as well as autism status. Empirical reliability coefficients for response pattern EAP scores were 0.75 for Nonbinary Gender Diversity and 0.85 for Binary Gender Diversity. GSR factors performed well across the following validation metrics: (1) construct validity: GSR scores followed expected score patterns when comparing gender identity subgroups; (2) convergent validity: GSR scores correlated with existing gender-related measures and in the expected directions; and (3) ecological validity: GSR scores aligned with reports of gender-related treatment requests/receipt.
These two factors, BGD and NGD, are the two phenotypes used in the subsequent analyses.
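The F-MC to BGD transformation described above can be illustrated with a minimal sketch. The exact transformation used for the GSR is not specified here, so the code below assumes a hypothetical F-MC score on a 0 to 1 scale with 0 as the female pole and 1 as the male pole; the function name and the scale orientation are illustrative assumptions only.

```python
def binary_gender_diversity(fmc_score: float, designated_sex: str) -> float:
    """Distance along a 0-1 female-male continuum from the designated-sex pole.

    Assumption (not taken from the GSR itself): 0.0 is the female pole and
    1.0 is the male pole of the F-MC score.
    """
    if not 0.0 <= fmc_score <= 1.0:
        raise ValueError("fmc_score must lie in [0, 1]")
    if designated_sex == "female":
        # Designated female: distance from the female pole (0.0).
        return fmc_score
    if designated_sex == "male":
        # Designated male: distance from the male pole (1.0).
        return 1.0 - fmc_score
    raise ValueError("designated_sex must be 'female' or 'male'")


# The same F-MC score maps to different BGD values depending on designated sex.
bgd_if_female = binary_gender_diversity(0.2, "female")
bgd_if_male = binary_gender_diversity(0.2, "male")
```

Under these assumptions, a score near the pole of one's designated sex yields a BGD near 0, which matches the 0 (no gender diversity) to 1 (high gender diversity) range reported for the factors.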
2.3 Genotype quality control and imputation
Version 3 Freeze (2019) and Version 4 (2020) genotypes were first merged using PLINK [9]. The merged genotypes were then lifted from hg38 to hg19 using the LiftOver tool [10]. The merged genotypes included 43,209 individuals and 616,321 variants and were then quality controlled using the BIGwas quality control pipeline [11]. The default parameters were used, except for skipping Hardy-Weinberg tests and including the flag due to the SPARK cohort being family-based and not a general population sample. The pre-QC annotation step removed 21 variants (N = 616,299 variants remaining). The SNP QC step removed 101,600 variants due to missingness at a threshold of 0.02 (N = 514,699 variants remaining). The sample QC step removed 1,114 individuals due to missingness, 67 individuals due to heterozygosity, and 176 individuals as duplicates (monozygotic twins). An additional 9,533 individuals were removed due to genetic ancestry from principal component projections (N = 32,422 individuals remaining). The QC’d set of N = 514,699 variants and N = 32,422 individuals was then imputed to the TopMed [12] reference panel using the Michigan Imputation Server [13], with the phasing and quality control steps included and with output restricted to variants with imputation quality r2 > 0.3. After genotype imputation, the variants were filtered to only the HapMap SNPs (N = 1,054,330 variants) with imputation quality r2 ≥ 0.8 using bcftools [14]. Next, they were lifted over from hg38 to hg19 using the VCF-liftover tool (https://github.com/hmgu-itg/VCF-liftover) and the alleles were normalized to the hg19 reference genome. Finally, the files were converted to PLINK files with N = 1,018,200 final variants.
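As an illustration of the variant-level missingness filter described above, the following minimal sketch keeps variants whose fraction of missing genotype calls does not exceed 0.02. It is not the BIGwas implementation; the data layout (a list of per-variant call lists, with `None` for a missing call) and the function names are assumptions made for the example.

```python
from typing import List, Optional

# A genotype call is the alternate-allele count (0, 1, or 2); None marks a missing call.
Genotype = Optional[int]


def missing_rate(calls: List[Genotype]) -> float:
    """Fraction of individuals with a missing call for one variant."""
    return sum(call is None for call in calls) / len(calls)


def filter_variants(genotypes: List[List[Genotype]], max_missing: float = 0.02) -> List[int]:
    """Return indices of variants whose missing rate is at most max_missing.

    genotypes[v][i] is the call for variant v in individual i.
    """
    return [v for v, calls in enumerate(genotypes) if missing_rate(calls) <= max_missing]


# Toy data for 100 individuals: no missing calls, 2 missing calls, 3 missing calls.
toy_genotypes = [
    [0] * 100,
    [None] * 2 + [1] * 98,
    [None] * 3 + [2] * 97,
]
kept_variants = filter_variants(toy_genotypes)
```

The same pattern applies to the sample-level missingness step by computing the missing rate across variants for each individual instead.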
2.4 Polygenic score (PGS) calculations
Polygenic scores were calculated using LDpred2 [15] and the bigsnpr tools [16] in R [17]. Because SPARK is family-based, an external LD reference based on 362,320 European individuals of the UK Biobank (provided by the developers of LDpred2) was used to calculate the genetic correlation matrix, estimate heritability, and calculate the infinitesimal beta weights. Polygenic scores were calculated from the following genome-wide association studies performed by the Psychiatric Genomics Consortium: ADHD (2019) [18], autism (2019) [19], bipolar disorder (2021) [20], anorexia nervosa (2019) [21], major depression (2019) [22], schizophrenia (2020) [23], and OCD (2018) [24]. Polygenic scores were calculated from genome-wide association studies performed by the Social Science Genetic Association Consortium for cognitive performance (2018) and educational attainment (2018) [25]. The public LDpred2 beta weights from the Polygenic Index Repository [26] were used to calculate polygenic scores for extraversion, neuroticism, openness, risk tolerance, depressive symptoms, loneliness, and subjective well-being.
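Once per-variant weights are available, a polygenic score is a weighted sum of an individual's effect-allele dosages. The sketch below shows only this final scoring step under simplifying assumptions; it does not reproduce the LD-aware weight adjustment performed by LDpred2, and the variant identifiers, weights, and dosages are hypothetical.

```python
from typing import Dict


def polygenic_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of effect-allele dosages over variants present in both inputs.

    dosages: variant ID -> effect-allele dosage (0 to 2) for one individual.
    weights: variant ID -> per-allele effect size (e.g., adjusted beta weights).
    """
    shared_variants = dosages.keys() & weights.keys()
    return sum(weights[variant] * dosages[variant] for variant in shared_variants)


# Hypothetical weights and dosages; rs3 has no genotype and is skipped.
example_weights = {"rs1": 0.5, "rs2": -0.25, "rs3": 0.1}
example_dosages = {"rs1": 2.0, "rs2": 1.0}
example_score = polygenic_score(example_dosages, example_weights)
```

Restricting the sum to shared variants mirrors the practical need to match the genotyped or imputed variant set against the variants that carry weights.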
2.5 Polygenic score associations with GSR factors
Phenotypes were available for N = 818, and N = 460 also had genetic data that passed quality control. This subset was pruned to remove related individuals using GCTA [27] with a relatedness threshold of 0.05 (N = 8 individuals removed, N = 452 remaining). The polygenic scores (PGS) for the 452 individuals were then centered to have a mean of 0 and scaled to have a standard deviation of 1. Polygenic score main effects were obtained by linear modeling with age in months, designated sex at birth, and autism diagnostic status (autism dx) included as covariates: GSR factor ∼ age + sex + autism dx + PGS.
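The relatedness pruning and score standardization steps can be sketched as follows. This is a simplified greedy illustration, not GCTA's algorithm: it drops the second member of each pair whose relatedness exceeds the cutoff, and it assumes standardization with the sample standard deviation. The IDs and relatedness values are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def prune_related(ids: List[str],
                  relatedness: Dict[Tuple[str, str], float],
                  cutoff: float = 0.05) -> List[str]:
    """Greedily remove one individual from each pair related above the cutoff."""
    removed = set()
    for (first, second), value in sorted(relatedness.items()):
        # Only act if both members of the pair are still retained.
        if value > cutoff and first not in removed and second not in removed:
            removed.add(second)
    return [individual for individual in ids if individual not in removed]


def standardize(values: List[float]) -> List[float]:
    """Center to mean 0 and scale to (sample) standard deviation 1."""
    center = mean(values)
    scale = stdev(values)
    return [(value - center) / scale for value in values]


# Hypothetical pairwise relatedness estimates for four individuals.
example_ids = ["a", "b", "c", "d"]
example_relatedness = {("a", "b"): 0.5, ("a", "c"): 0.0, ("c", "d"): 0.01}
unrelated_ids = prune_related(example_ids, example_relatedness)
scaled_scores = standardize([1.0, 2.0, 3.0])
```

After standardization, a regression coefficient for the PGS term is interpreted as the change in the GSR factor per one standard deviation increase in the polygenic score.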
3 Results
3.1 Distribution of the two quantitative gender diversity phenotypes
The demographic characteristics of the SPARK cohort are shown in Table 1. The cohort was majority autistic and designated female at birth. Approximately one-third of the cohort endorsed a non-cisgender gender identity.
The factor analyses of the Gender Self Report (GSR) generated two factors: Binary Gender Diversity (BGD) and Nonbinary Gender Diversity (NGD). BGD is the divergence from gender presumed by designated sex to the opposite, and NGD is the divergence from male and female gender norms. The distributions of the factor scores for the N = 818 individuals are shown in Figure 1. The factors range from 0 (no gender diversity) to 1 (high gender diversity), with the mode of both factors near 0 (Figure 1A). The overall trends show higher gender diversity in females (Figure 1B) and autistic individuals (Figure 1C). Individuals who self-identify as transgender trended towards higher BGD, and likewise individuals who self-identify as nonbinary and/or gender neutral trended towards higher NGD (Figure 1D). Non-heterosexual sexual orientation identities (lesbian, gay, bisexual, pansexual, queer) also trended towards higher BGD and NGD (Figure 1E).
3.2 Polygenic score associations with the Gender Self Report factors
Polygenic score main effects were obtained by linear modeling (lm) with age in months, designated sex at birth, and autism diagnostic status included as covariates: GSR factor ∼ age + sex + autism dx + PGS. Autism diagnostic status was included as a covariate because autism is a genetic confound for the PGS associations and, in our SPARK cohort, is also phenotypically confounded with gender diversity, since the non-autistic participants are the parents of an autistic child (see Discussion for further clarification). The beta coefficients for the PGS term are shown in Figure 2A. Higher polygenic scores for cognitive performance were significantly associated with higher BGD (lm: β = 0.017, p = 0.049) and NGD (lm: β = 0.036, p = 0.002), meaning that polygenic propensity for greater cognitive performance is predictive of elevated BGD and NGD. Additionally, higher polygenic scores for educational attainment were associated with higher NGD (lm: β = 0.030, p = 0.015). No neuropsychiatric polygenic scores were significantly associated with gender diversity, although higher obsessive compulsive disorder (OCD) polygenic scores trended towards higher BGD (lm: β = 0.013, p = 0.116), and higher bipolar disorder polygenic scores trended towards lower NGD (lm: β = −0.019, p = 0.12).
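The model GSR factor ∼ age + sex + autism dx + PGS can be illustrated with a minimal ordinary least squares sketch that solves the normal equations using only the Python standard library. It is not the analysis code used in this study, it returns coefficients only (no standard errors or p-values), and the data are synthetic values constructed from hypothetical coefficients so that the fit can be checked.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_ols(y: Sequence[float], predictors: Sequence[Sequence[float]]) -> List[float]:
    """Return [intercept, coefficient per predictor column] from the normal equations."""
    columns = [[1.0] * len(y)] + [list(col) for col in predictors]
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(p * v for p, v in zip(ci, y)) for ci in columns]
    return solve_linear_system(xtx, xty)


# Synthetic, noise-free data generated from hypothetical coefficients.
age_months = [300.0, 420.0, 510.0, 260.0, 610.0, 380.0, 450.0, 550.0]
sex = [0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]        # designated sex at birth (0/1 coded)
autism_dx = [1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0]  # autism diagnostic status (0/1 coded)
pgs = [-1.2, 0.4, 0.9, -0.3, 1.5, -0.8, 0.1, -0.6]    # standardized polygenic score
gsr_factor = [0.1 + 0.0002 * a + 0.05 * s + 0.1 * d + 0.03 * p
              for a, s, d, p in zip(age_months, sex, autism_dx, pgs)]

coefficients = fit_ols(gsr_factor, [age_months, sex, autism_dx, pgs])
pgs_beta = coefficients[-1]  # coefficient on the PGS term
```

Because the PGS is standardized, the coefficient on the PGS term corresponds to the reported β: the expected change in the GSR factor per one standard deviation increase in the polygenic score, holding the covariates fixed.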
4 Discussion
We analyzed two quantitative characterizations of gender diversity, Binary Gender Diversity (BGD) and Nonbinary Gender Diversity (NGD), from factor analysis of the Gender Self-Report (GSR) in a neurodiverse sample of N = 818 adults in the SPARK autism cohort. In this sample, we found greater gender diversity in female, autistic, and LGBTQ+ identifying individuals. Due to the structure of SPARK, we were only able to collect data from independent adults with autism or non-autistic parents of children with autism. Therefore, the elevated gender diversity in the autistic subset should be interpreted with the major caveat that the non-autistic parents were older and presumed to adhere to more traditional gender roles. Still, these results are in line with rigorous prior research that has shown the enrichment for gender diversity in autism [7]. Intriguingly, while our results showed higher gender diversity in LGBTQ+ individuals, many people who identify as cisgender also showed evidence of gender diversity. This underscores the value of the GSR in capturing dimensional gender diversity beyond self-endorsed identities. The formation of gender identity is a complex and multi-factorial process [28], and is contextualized by numerous factors like time (e.g., age, generation), region, and culture. Additionally, the conceptualization of these identities requires understanding of how they relate to other points of reference, which can be different for some autistic people [29] who may struggle with understanding social and gender norms.
We performed polygenic score (PGS) associations of GSR factors with brain-related PGS, including cognitive performance, educational attainment, personality traits, and neuropsychiatric conditions in N = 452 individuals, 44% of whom are autistic. Our linear models included age, designated sex at birth, and autism diagnostic status as covariates. For our analyses, we treated autism as a genetic confound with the PGS associations because autism and educational attainment are genetically correlated [19], and higher autism PGS have been associated with cognitive performance in a general population sample [30]. However, future work should analyze the interaction between autism and PGS in their associations with gender diversity. The major limitation of our PGS associations was the small sample size, and hence we were unable to power an analysis stratified on autism diagnostic status. Despite the small sample size, we did observe that higher polygenic propensity for cognitive performance was significantly associated with greater gender diversity (BGD and NGD). Higher polygenic propensity for educational attainment was also significantly associated with greater NGD. Overall, these first-of-their-kind results indicate that, of the genetic factors that may contribute to gender diversity, the most immediately detectable in our small sample were those related to cognitive performance. This finding may suggest that cognitive capacity is a necessary ingredient in the development of more complex and nuanced gender identities. We expect to see additional associations emerge in future studies with larger sample sizes that will illuminate both the genetic and environmental/experiential contributions to gender diversity.
Data availability statement
The SPARK genetic data can be obtained at SFARI Base: https://base.sfari.org
The SPARK Research Match data will be available to qualified, approved researchers through SFARI Base upon publication of this article.
Funding
This work was supported by the National Institutes of Health (MH105527 and DC014489 to JM) and the National Institute of Mental Health (R01MH100028 to JS), as well as grants from the Simons Foundation (SFARI 516716 to JM), the Clinical and Translational Science Award (KL2TR001877 to JM), the Fahs-Beck Fellow Grant to JS, and the National Institutes of Health Predoctoral training grant (T32GM008629 to TT). This work was also supported by the University of Iowa Hawkeye Intellectual and Developmental Disabilities Research Center (Hawk-IDDRC) through the Eunice Kennedy Shriver National Institute of Child Health and Human Development (P50HD103556).
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Acknowledgments
We are grateful to all of the individuals and families in SPARK, the SPARK clinical sites, and SPARK staff. We appreciate obtaining access to SPARK genetic and phenotypic data on SFARI Base.