Introduction

Extraversion is a personality trait characterized by the tendency to experience positive emotions, to be active and feel energetic, to be talkative and to enjoy social interactions. Extraversion is associated with numerous psychosocial, lifestyle and health outcomes, such as academic and job performance, well-being, obesity, substance use, physical activity, bipolar disorder, borderline personality disorder, Alzheimer’s disease, and longevity (De Moor et al. 2006, 2011; Distel et al. 2009a; Furnham et al. 2013; Judge et al. 2013; Middeldorp et al. 2011; Rhodes and Smith 2006; Sutin et al. 2011; Terracciano et al. 2008; Terracciano et al. 2014; Weiss et al. 2008).

Extraversion can be measured with multiple inventories that have been developed as part of different personality theories. For example, extraversion is one of the five personality domains as assessed with the Neuroticism–Extraversion–Openness to Experience (NEO) personality inventories (Costa and McCrae 1992). Extraversion is also included in Eysenck’s three-dimensional theory of personality (Eysenck and Eysenck 1964, 1975; Eysenck et al. 1985). In Cloninger’s theory on temperaments and characters (Cloninger 1987; Cloninger et al. 1993), Harm Avoidance, Novelty Seeking and Reward Dependence are related to extraversion (De Fruyt et al. 2000). Tellegen’s personality theory posits the higher order domain of Positive Emotionality (Patrick et al. 2002), which resembles and is highly correlated with extraversion (Church 1994).

We showed recently, by performing an Item Response Theory (IRT) analysis using test linking (Kolen and Brennan 2004), that item data on Extraversion, Reward dependence and Positive Emotionality can be harmonized to broadly assess the same underlying extraversion construct (van den Berg et al. 2014). This harmonization was performed in over 160,000 individuals from 23 cohorts participating in the Genetics of Personality Consortium (GPC). Briefly, harmonization was carried out in each cohort separately by first fitting an IRT model to data from individuals who had completed at least two different personality questionnaires. Next, based on calibrated item parameters, personality scores were estimated based on all available data for each individual, irrespective of what personality questionnaire was used. The harmonized extraversion phenotype was heritable. A broad-sense heritability of 49 % was estimated, based on a meta-analysis in six twin cohorts that are included in the GPC (29,501 twin pairs), of which 24 % was due to additive genetic variance and 25 % due to non-additive genetic variance. The broad-sense heritability estimate is similar to heritability estimates obtained for extraversion as assessed with single measurement instruments (Bouchard and Loehlin 2001; Distel et al. 2009b; Finkel and McGue 1997; Keller et al. 2005; Rettew et al. 2008; Yamagata et al. 2006). Some evidence for qualitative sex differences in the genetic influences on extraversion was suggested by a genetic correlation in opposite-sex twin pairs of 0.38 (van den Berg et al. 2014). Extraversion becomes more genetically stable during adolescence until it is almost perfectly genetically stable in adulthood (Briley and Tucker-Drob 2014; Kandler 2012), that is, the same genes are responsible for extraversion measured at different ages.

A handful of genome-wide association (GWA) studies for extraversion have been published, aimed at detecting specific single nucleotide polymorphisms (SNPs) that explain part of the heritability. The first GWA study for personality, which focused on the five NEO personality traits, was conducted in 3972 adults (Terracciano et al. 2010). No genome-wide significant SNP associations were found for extraversion, although some interesting associations with P-values <10−5 were seen with SNPs in two cadherin genes and the brain-derived neurotrophic factor (BDNF) gene. A subsequent meta-analysis of GWA results for the NEO personality traits, conducted in 17,375 subjects, also did not yield any genome-wide significant associations for extraversion (De Moor et al. 2012). Two other GWA studies reported a similar lack of genome-wide significance for Cloninger’s temperament scales (Service et al. 2012; Verweij et al. 2010). Interestingly, a study that performed a genome-wide complex trait analysis (GCTA; Yang et al. 2010) for neuroticism and extraversion in around 12,000 unrelated individuals reported that 12 % (SE = 3 %) of the variance in extraversion was explained by common SNPs of additive effect (Vinkhuyzen et al. 2012). Taken together, the results from twin and genome-wide studies suggest that common SNPs of additive effect are important, that genetic non-additivity may play a role, and that large sample sizes are likely to be required to identify specific variants.

In this paper, we report the results of the largest meta-analysis of GWA results for extraversion so far, carried out in 29 cohorts that participate in the GPC. A total of 63,030 subjects with harmonized extraversion and genome-wide genotype data were included in the meta-analysis. A 30th cohort was used for replication. In this consortium we reported earlier on a genome-wide significant hit for neuroticism (De Moor et al. 2015), indicating that samples may now be large enough to yield the first significant findings from GWA studies for personality. In addition to the meta-analysis of GWA results, we computed weighted polygenic scores in an independent cohort and associated them with extraversion, and estimated the variance explained by SNPs in two large cohorts.

Materials and methods

Cohorts

The full meta-analysis was performed on 63,030 subjects from 29 discovery cohorts. All samples were of European origin. Twenty-one cohorts were from Europe, six from the United States and two from Australia. Sample sizes of the individual cohorts ranged from 177 to 7210 subjects. Please note that some cohorts were also part of previously published GWA studies on extraversion. The Generation Scotland: Scottish Family Health Study (GS:SFHS) cohort was included as a replication sample (9,783 subjects). A brief overview of all cohorts is provided in Table 1. A description of each individual cohort is found in the Supplementary materials and methods (see also De Moor et al. 2015).

Table 1 Overview of 29 discovery cohorts and 1 replication cohort of the Genetics of Personality Consortium

Phenotyping

A harmonized latent extraversion score was estimated for all participants in all 29 cohorts that were included in the GWA meta-analysis. This score was based on all available extraversion item data for each individual (for a detailed description see van den Berg et al. 2014). Extraversion item data came from the extraversion scales of the NEO Personality Inventory, the NEO Five Factor Inventory, the 50-item Big-Five version of the International Personality Item Pool inventory, the Eysenck Personality Questionnaire and the Eysenck Personality Inventory, from the Reward Dependence scale of Cloninger’s Tridimensional Personality Questionnaire, and from the Positive Emotionality scale of the Multidimensional Personality Questionnaire (see van den Berg et al. 2014 and Supplementary materials and methods). In the GS:SFHS cohort that was included for replication of top signals, extraversion was based on the summed score of the extraversion scale of the EPQ Revised Short Form.

Genotyping and imputation

Genotyping in all cohorts was carried out on Illumina or Affymetrix platforms, after which quality control (QC) was performed, followed by imputation of genotypes. QC of genotype data was performed in each cohort separately, with comparable but cohort specific criteria. Standard QC checks included tests of European ancestry, sex inconsistencies, Mendelian errors, and high genome-wide homozygosity. Checks for relatedness were conducted in those cohorts that aimed to include unrelated individuals only. Other checks of genotype data were based on minor allele frequencies (MAF), SNP call rate (% of subjects with missing genotypes per SNP), sample call rate (% of missing SNPs per subject) and Hardy–Weinberg Equilibrium (HWE). Genotype data were imputed using the 1000Genomes phase 1 version 3 (build37, hg19) reference panel with standard software packages such as IMPUTE, MACH, or Minimac, see Supplementary Table 1.

Statistical analyses

GWA analysis per cohort

GWA analyses were conducted independently in each cohort. Since the cohorts used different research designs (case–control, population twin studies, extended pedigrees, etc.), GWA methods were optimized for each cohort. Extraversion scores were regressed on each SNP under an additive model, with sex and age included as covariates. Covariates such as ancestry Principal Components (PCs) were added if deemed necessary for a particular cohort. In all analyses, the uncertainty of the imputed genotypes was taken into account, either using dosage scores or mixtures of distributions. In those cohorts that included related individuals, the dependency among participants was accounted for using cohort-specific methods. Standard software packages for GWA analyses were used (see Supplementary Table 1).

Meta-analysis of GWA results across cohorts

A meta-analysis of the GWA results was conducted with the weighted inverse variance method in METAL (http://www.sph.umich.edu/csg/abecasis/metal/index.html). Excluded from meta-analysis were poorly imputed SNPs (r² < 0.30 or proper_info < 0.40) and SNPs with low MAF (MAF < √(5/N), which corresponds to less than 5 estimated individuals in the least frequent genotype group, under the assumption of HWE). This resulted in a total number of 7,460,147 unique SNPs in the final meta-analysis (with 1.1–6.6 M SNPs across cohorts). For 2182 SNPs, SNP locations could not be matched with rs names. For an additional 516,362 SNPs, results were based on one cohort only and were therefore left out of the analysis, so that the results are based on 6,941,603 SNPs. Genomic control inflation factors (lambda), Manhattan plots and quantile–quantile plots per cohort are provided in Supplementary Table 2 and Supplementary Figs. 1, 2. A P value of 5 × 10−8 was used as the threshold for genome-wide significance.
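The two quantitative rules in this paragraph can be sketched in a few lines. The following is a minimal illustration, not the METAL implementation: it assumes a fixed-effect inverse variance scheme applied to per-cohort effect estimates and standard errors for a single SNP, and it assumes that N in the MAF rule is the cohort sample size. The function names are hypothetical.

```python
import math


def min_maf_threshold(n_subjects, min_count=5.0):
    """MAF below which the expected minor-homozygote count is < min_count.

    Under HWE the expected number of minor homozygotes is N * MAF^2,
    so N * MAF^2 < min_count is equivalent to MAF < sqrt(min_count / N).
    """
    return math.sqrt(min_count / n_subjects)


def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance weighted meta-analysis of one SNP.

    betas, ses: per-cohort effect estimates and their standard errors
    (illustrative inputs, assumed to refer to the same effect allele).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided P-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p
```

Under the stated assumption about N, the MAF threshold is about 0.17 for the smallest cohort (N = 177) and about 0.026 for the largest (N = 7210), which illustrates why the number of SNPs retained differs so much across cohorts.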

The meta-analysis results (P-values per SNP) were used as the input to compute P-values at the gene level. We performed these analyses in KGG (Li et al. 2012). A P-value of 2.87 × 10−6 was used as the threshold for genome-wide significance in these gene-wide analyses, based on controlling for the false-discovery rate (Benjamini and Hochberg 1995).
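As an illustration of false-discovery rate control, the standard Benjamini and Hochberg step-up procedure can be sketched as follows. This is a generic sketch, not the KGG implementation; the function name and the FDR level q = 0.05 are assumptions made for the example.

```python
def benjamini_hochberg_threshold(p_values, q=0.05):
    """Return the largest P-value declared significant at FDR level q.

    Step-up rule: sort the m P-values, find the largest rank k with
    p_(k) <= q * k / m, and reject all hypotheses with rank <= k.
    Returns None if no P-value passes.
    """
    m = len(p_values)
    threshold = None
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= q * rank / m:
            threshold = p  # keep the largest P-value that still passes
    return threshold
```

In this procedure the significance threshold is data dependent: it equals the largest P-value that passes the step-up rule, so it can coincide with the P-value of the top-ranked gene when only one gene is significant.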

All GWAS SNP top hits with a P-value smaller than 1 × 10−5 were selected for replication in the GS:SFHS cohort.

Polygenic risk score analysis

Additional analyses were conducted to test whether extraversion could be predicted in an independent target cohort based on the GWA meta-analysis results. The target cohort was the Netherlands Twin Register (NTR) cohort (8648 subjects). Polygenic risk scores for this cohort were estimated using LDpred (Vilhjalmsson et al. 2015), which takes into account linkage disequilibrium among the SNPs. The estimation was based on a GWA meta-analysis in which the NTR and NESDA cohorts were excluded (further referred to as the discovery set). With the LD-corrected polygenic risk scores, generalized estimating equation (GEE) modeling was applied to test whether the polygenic risk scores predicted extraversion in the target cohort. The covariates age, sex and ten PCs were included as fixed effects in the model. The model also included a random intercept with family number as the cluster variable, to account for dependency among family members. Outliers on the PCs, including ethnic outliers, were excluded from the analysis.
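The scoring step itself is a weighted sum of allele dosages. The sketch below shows only that final step and assumes that per-SNP weights are already available; it does not reproduce the LD adjustment of the weights performed by LDpred, and the function name and inputs are hypothetical.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages for one individual.

    dosages: effect-allele dosages per SNP (values between 0 and 2).
    weights: per-SNP effect sizes from the discovery set (assumed to be
             given, e.g. after an LD adjustment done elsewhere).
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))
```

The resulting score per individual would then enter the prediction model as a predictor next to the covariates.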

Variance explained by SNPs

In the NTR cohort and the QIMR Berghofer Medical Research Institute (QIMR) adult cohort (see also Supplementary materials and methods), GCTA software (Visscher et al. 2010; Yang et al. 2010) was used to estimate the proportion of variance in extraversion that can be explained by common SNPs of additive effect. In the NTR, this analysis was carried out in a set of 3597 unrelated individuals and in the QIMR adult cohort this was done in 3369 unrelated individuals (in each cohort one member per family was selected with harmonized extraversion and genome-wide SNP data). GCTA analysis was based on best guess genotypes obtained in PLINK using a threshold of a maximum genotype probability >0.70, and additionally filtering on r-squared >0.80. Next, in estimating the additive genetic relationship matrix (GRM) in the GCTA software, SNPs with MAF <0.05 were excluded. The GRMs estimated based on SNPs for all individuals formed the basis to estimate the proportion of phenotypic variance explained by SNPs in the NTR and QIMR cohorts. In other words, it was determined to what extent phenotypic similarity between individuals corresponds to genetic similarity (at the SNP level). For both NTR and QIMR, sex, age and a set of population-specific PCs were included as covariates.
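To make the notion of genetic similarity at the SNP level concrete, the sketch below computes a GRM from best guess genotypes using the standardized cross-product described by Yang et al. (2010). It is a simplified illustration rather than the GCTA code: it assumes complete genotype data, estimates allele frequencies from the sample itself, and applies the same formula to diagonal and off-diagonal entries. The function name is hypothetical.

```python
def genetic_relationship_matrix(genotypes, min_maf=0.05):
    """Additive GRM from allele counts.

    genotypes: list of SNPs, each a list of allele counts (0, 1 or 2),
               one entry per individual, with no missing values.
    Entry (j, k) is the mean over SNPs of
        (x_j - 2p) * (x_k - 2p) / (2p * (1 - p)),
    where p is the sample allele frequency of the SNP.
    """
    n = len(genotypes[0])
    grm = [[0.0] * n for _ in range(n)]
    used = 0
    for snp in genotypes:
        p = sum(snp) / (2.0 * n)
        if min(p, 1.0 - p) < min_maf:
            continue  # skip rare and monomorphic SNPs
        denom = 2.0 * p * (1.0 - p)
        centered = [x - 2.0 * p for x in snp]
        for j in range(n):
            for k in range(n):
                grm[j][k] += centered[j] * centered[k] / denom
        used += 1
    if used == 0:
        raise ValueError("no SNPs passed the MAF filter")
    return [[value / used for value in row] for row in grm]
```

The variance explained by SNPs is then estimated by relating this matrix to the phenotypic covariance between individuals in a mixed linear model, a step not shown here.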

Results

Meta-analysis of GWA results

Meta-analysis of GWA results across the 29 discovery cohorts did not yield genome-wide significant SNPs associated with extraversion. The lowest P-value observed was 2.9 × 10−7 for a SNP located on chromosome 2. There were 74 SNPs with P-values <1 × 10−5. The Manhattan and quantile–quantile plots are provided in Figs. 1 and 2. A list with the top five SNPs is given in Table 2. A list with all SNPs that reached the level of suggestive genome-wide significance (P < 1 × 10−5) is found in Supplementary Table 3. The results of all SNPs can be downloaded from www.tweelingenregister.org/GPC. A gene-based test showed one significant hit for LOC101928162, a long non-coding RNA site, P = 2.87 × 10−6. A list with the top five genes from the gene-based analysis is provided in Table 3. Supplementary Table 4 provides the top 30 genes. Among the top 30 genes was Brain-Derived Neurotrophic Factor (BDNF, P = 0.0003), a gene also implicated, though not genome-wide significant, in Terracciano et al. (2010), as was the BDNF anti-sense RNA gene (P = 0.0001).

Fig. 1

Manhattan plot for meta-analysis results of 29 discovery cohorts for extraversion in the Genetics of Personality Consortium

Fig. 2

Quantile-Quantile plots for meta-analysis results of 29 discovery cohorts for extraversion in the Genetics of Personality Consortium

Table 2 Top SNPs from the meta-analysis of GWA results in 29 discovery cohorts for extraversion, and their replication in the GS:SFHS cohort, in the Genetics of Personality Consortium
Table 3 Top genes from the meta-analysis of GWA results in 29 discovery cohorts for Extraversion in the Genetics of Personality Consortium

Results of the follow-up analysis of the top five SNPs in the GS:SFHS cohort can be found in Table 2. Of the top five SNPs, none showed a significant effect. For an overview of the replication results of all top SNPs with P-value <1 × 10−5 see Supplementary Table 3. Of the 74 SNPs tested in the replication cohort, three SNPs showed nominal evidence of association (P < 0.05), which is less than the number expected based on chance alone (0.05 × 74 = 3.7).
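The chance expectation quoted above follows from a simple binomial argument, which can be sketched as below. The sketch assumes that the 74 tests are independent, which is a simplification because neighbouring top SNPs are often in linkage disequilibrium; the function names are hypothetical.

```python
import math


def expected_hits(n_tests, alpha):
    """Expected number of nominally significant tests under the null."""
    return n_tests * alpha


def prob_at_least(k, n_tests, alpha):
    """P(X >= k) for X ~ Binomial(n_tests, alpha), assuming independence."""
    return sum(
        math.comb(n_tests, i) * alpha ** i * (1.0 - alpha) ** (n_tests - i)
        for i in range(k, n_tests + 1)
    )
```

Calling `expected_hits(74, 0.05)` reproduces the 3.7 expected hits, and `prob_at_least(3, 74, 0.05)` gives the probability of observing three or more nominal hits by chance under the independence assumption.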

Polygenic risk score analysis

There were 8201 individuals with polygenic scores for prediction of extraversion. The LDpred-based genetic risk scores significantly predicted extraversion in the target cohort, B = 0.059, χ²(1) = 27.30, P < 0.001.

Variance explained by SNPs

In the NTR cohort, an estimated 5.0 % (SE = 7.2) of the variance in extraversion was explained by all SNPs, but this estimate was not significantly different from zero (P = 0.24). In the QIMR cohort, 0.0001 % (SE = 15) of the variance was explained by SNPs (P = 0.46).

Discussion

This study assessed the influence of common genetic variants on extraversion in 63,030 individuals from 29 cohorts in the GPC. First, a meta-analysis of GWA analyses across 29 discovery cohorts showed no genome-wide significant SNPs. Top SNPs detected in the meta-analysis of GWA results in the discovery phase were not replicated in the GS:SFHS cohort. The SNPs with lowest P-values have no previously reported relationship with personality, psychopathology or brain functioning. Polygenic risk scores based on the meta-analysis results predicted extraversion in an independent data set. SNP-based heritabilities for extraversion were not significantly different from zero in two large cohorts of the GPC.

Although there were no genome-wide significant results for individual SNPs, in the gene-based analysis, there was a significant hit for one locus, LOC101928162. This is a long noncoding RNA site whose function remains elusive. Interestingly, among the top 30 genes were genes previously implicated in extraversion or in psychiatric disorders associated with extraversion. The low P-value for CRTAC1 (P = 2.97 × 10−5) harks back to an interesting extraversion SNP (rs7088779) in a previous GWAS on personality (Amin et al. 2013) that is located between CRTAC1 and C10orf28. RELN (P = 5.69 × 10−5) has been reported to increase the risk for schizophrenia and bipolar disorder (Kuang et al. 2011; Ovadia and Shifman 2011), while ADAM12 (P = 7.65 × 10−5) was previously found to be involved in schizophrenia (Farkas et al. 2010) and bipolar disorder treatment (Nadri et al. 2007). The BDNF gene was also implicated in a previous extraversion GWAS (Terracciano et al. 2010), though not genome-wide significant. Liu et al. (2005) reported a trend towards association of BDNF variants with substance abuse, Jiao et al. (2011) reported an association with obesity, and Lang et al. (2007) and Beuten et al. (2005) reported associations with smoking behavior. As extraversion is known to be associated with lifestyle, obesity and substance abuse, we deem BDNF to be an interesting candidate gene for extraversion in future studies, along with CRTAC1, ADAM12 and RELN.

With the current meta-analysis we more than tripled the sample size as compared to the largest previously published meta-analysis for extraversion (De Moor et al. 2012). In contrast to neuroticism, no genome-wide significant SNPs were found. Some have argued (Turkheimer et al. 2014) that the heritability of personality traits represents nonspecific genetic background, which is composed of so many genetic variants with extremely small effect sizes that individually these have no causal biological interpretation. It may be that extraversion differs in this respect from neuroticism. One other difference was indicated from the analyses of the IRT-based extraversion and neuroticism scores: whereas for neuroticism no evidence for genotype × sex interaction was seen (van den Berg et al. 2014), for extraversion there was significant evidence for sex limitation. It is also interesting to note that although no genome-wide significant findings emerged for single SNPs, we were able to predict extraversion in an independent dataset, based on the polygenic risk scores from the discovery set. This indicates that the meta-analysis results contain some true signal.

The results of the polygenic risk score analysis are in contrast with the results from the GCTA analysis, in which no significant proportion of variance explained by SNPs was detected in two large cohorts of the GPC. Our study on neuroticism reported a SNP-based heritability of 15 % (De Moor et al. 2015). The current extraversion GCTA findings are also somewhat at odds with two previous GCTA studies for personality traits. One study focused on neuroticism and extraversion as measured with different instruments in four cohorts, and found on average 12 % explained variance for extraversion, although across cohorts these estimates varied widely (0–27 %) (Vinkhuyzen et al. 2012). Estimates for neuroticism also varied, but were generally lower than for extraversion in this study, with an average of 6 % explained variance. In another study, between 4.2 and 9.9 % of explained variances were found for the four Cloninger temperaments in a combined sample of four cohorts (Verweij et al. 2012). The proportions of variances for Harm Avoidance, Novelty Seeking and Persistence were significant at P < 0.05, whereas interestingly the proportion of variance for Reward Dependence was not. It should be noted that both these studies included the QIMR cohort in their analyses, so there is some overlap in subjects across studies. The difference is that in the earlier studies extraversion and reward dependence were based on single personality inventories, while in our study extraversion scores harmonized among different personality inventories were analyzed. What our results and the results in the previous studies have in common though, is that the estimates are considerably smaller than the heritability estimates based on twin studies. Given that about half of the heritability of extraversion consists of non-additive genetic variance (van den Berg et al. 2014), it is not unlikely that this discrepancy is caused by the influence of common variants that interact within loci (dominance) or across loci (epistasis). In addition, the influence of rare variants may be implicated. The relatively limited influence of common additive genetic variation, as well as a previously reported finding that higher levels of inbreeding are associated with less socially desirable personality trait levels, has led to the idea that the genetic variation in personality traits may have been maintained by mutation–selection balance (Verweij et al. 2012), and our results are consistent with this idea.

This study comes with some limitations. Genotyping, QC, and imputation were carried out separately in each cohort. Any difference in procedures may have caused some loss of statistical power to detect SNPs in the meta-analysis. Similarly, extraversion item data were harmonized as much as possible (van den Berg et al. 2014), but the Reward Dependence item data from the TCI were least successfully linked to the extraversion data from the other inventories. This may also have caused some loss in power. Importantly however, it should be noted that by combining genotype and phenotype data across cohorts as performed in this study, a substantial increase in sample size was obtained. It is nontrivial that the gain in power associated with this increase in sample size largely outweighs any potential loss in power due to any remaining genotyping or phenotyping differences across cohorts.

In conclusion, extraversion is a heritable, highly polygenic personality trait with a genetic background that may be qualitatively different from that of other complex behavioral traits. Future studies are required to increase our knowledge of which types of genetic variants, by which modes of gene action, constitute the heritable nature of extraversion. Ultimately, this knowledge can be used to increase our understanding of how extraversion is related to various important psychosocial and health outcomes.