Gender-Related Variables for Health Research

Mathias W. Nielsen; Marcia L. Stefanick; Diana Peragine; Torsten B. Neilands; John P. A. Ioannidis; Louise Pilote; Judith J. Prochaska; Mark R. Cullen; Gillian Einstein; Ineke Klinge; Hannah LeBlanc; Hee Y. Paik; Londa Schiebinger

doi:10.1101/2020.09.17.20196824

Abstract

This study develops a gender assessment tool—the Stanford Gender-Related Variables for Health Research—for use in clinical and population research, including large-scale health surveys involving diverse Western populations. While analyzing sex as a biological variable is widely mandated, gender as a sociocultural variable is not, largely because the field lacks quantitative tools for analyzing the influence of gender on health outcomes. We conducted a comprehensive review of English-language measures of gender from 1975 to 2015 to identify variables across three domains: gender norms, gender-related traits, and gender relations. This yielded 11 variables tested with 44 items in three US cross-sectional survey populations: two internet-based (N= 2,051; N= 2,135) and a patient-research registry (N= 489), conducted between May 2017 and January 2018. Exploratory and confirmatory factor analyses distilled 11 constructs to 7 gender-related variables: caregiver strain, work strain, independence, risk-taking, emotional intelligence, social support, and discrimination. Regression analyses, adjusted for age, ethnicity, income, education, sex assigned at birth, and self-reported gender identity, identified associations between these gender-related variables and self-rated general health, physical and mental health, and health-risk behaviors. Our new instrument can be used to develop health interventions based on a fuller understanding of gender associations with health.

Main Text

Introduction

Nearly 20 years ago the U.S. Institute of Medicine (IOM, now the Academies of Science) recognized that sex (biology) matters in determining health outcomes but also that gender (sociocultural factors related to social gender norms and expectations, gender-related traits and identities, and gender relations) interacts with sex to influence health and disease processes across the lifespan (1, 2). In the past decade, both the Canadian Institutes of Health Research (2010) (3) and the European Commission (2014) (4) have endorsed integrating sex and gender (usually as male/female binaries) into health research, and the US National Institutes of Health (NIH) has mandated the inclusion of Sex as a Biological Variable (SABV—2016) (5). Yet still today, sex and gender are often inappropriately conflated in the biomedical literature (6), and the NIH has not mandated Gender as a Sociocultural Variable (GASV) largely because the field lacks quantitative tools for analyzing the influence of gender on health outcomes.

Gender behaviors interact with biological sex factors in health outcomes. COVID-19 fatality rates, for example, are proportionally higher in men (about 60%) in most countries, despite a similar proportion of men and women contracting the virus. Sex differences in viral receptor, viral reproduction, antibody production and efficacy, among other factors related to basic biological sex differences, account for some differences, but gender plays a major role as well. For example, smoking rates (higher in men), preventive measures such as hand washing (lower in men), and healthcare seeking (delayed in men) likely contribute to these outcomes (7).

Despite several efforts to examine gender and health (8, 9), the field lacks adequate tools to assess gender. To address this problem, we set out to develop a new instrument – the Stanford Gender-Related Variables for Health Research (GVHR).

Our interest was piqued by a 2007 study that reported that men with higher “femininity” scores had lower risk of coronary heart disease. No such relationship was observed among women (10). Similarly, a 2015 study found that, independent of biological sex, young adults with a gender score more strongly associated with “feminine characteristics” were more likely to experience a recurrence of acute coronary syndrome (ACS)—regardless of whether they identified as a man or a woman (a non-binary option was not offered) (11, 12).

These innovative studies sought to account for both sex and gender. We were concerned, however, that the conclusions were based on outdated gender constructs, such as the Bem Sex Role Inventory (BSRI, 1974) (13), developed using predominantly white, higher socioeconomic U.S. undergraduate student participants and based on outdated notions of masculinity and femininity (9, 10) or their cognates. We also saw that the GENESIS-PRAXY gender score used in the 2015 study was developed and tested in a sample of 909 heart-disease patients and had not been cross-validated in broader patient, or non-patient, populations. We were further concerned that despite its focus on gender, the GENESIS-PRAXY study used logistic regression with biological sex as the outcome to determine which variables to include in its final score. We also found that the internal consistency of the measure had not been explicitly tested.

More importantly, we were concerned that gender was reduced to a bipolar index of “masculinity” and “femininity,” i.e. concepts which were historically construed as complementary oppositions (16), with a man being one thing and a woman being the opposite (for example, rational/emotional; public/private; mind/body). These concepts are too broad and imprecise to be useful in health research. If the goal is to provide physicians and policy makers with gender-related health interventions, measures of gender-related behaviors, such as caregiving or risk-taking, should be labelled as such and not reduced to a “masculinity” versus “femininity” score.

To address these limitations, we develop a new gender-variables instrument. In a systematic review of the English-language literature from 1975 to 2015, we identified 74 eligible scales used in gender-related measures (see Materials and Methods). Several limitations made it impossible to simply plug them into a new questionnaire. Very few existing scales (N=8) focus on associations between gender and health (17–21). The eligible gender-related scales are generally restricted to either men or women and assess either masculinity (22) or femininity (23, 24), or both as unipolar or bipolar constructs (13, 25–31) (e.g., hyper-masculinity or hyper-femininity) (19, 21, 32–42). Further, most scales rely on “agree-disagree” ratings, making them susceptible to acquiescence bias (43).

From the 74 scales, we distilled 11 composite gender constructs and developed 44 items to measure gender, which we subjected to exploratory and confirmatory factor analysis (EFA and CFA) in three independent U.S. survey samples. This reduced the original 11 gender-related variables to 7 factors: caregiver strain, work strain, independence, risk-taking, emotional intelligence, social support, and discrimination and the 44 survey items to 25 (see Materials and Methods). We then examined the relevance of the derived subscales in allowing for more precise analysis of variations in health-related quality of life, obesity, and risky health behaviors. This initial step toward developing more comprehensive and precise survey-based measures of gender, in relation to health, captures key aspects of three dimensions of gender that can be deployed quantitatively in diverse clinical research or large health surveys: 1) Gender Norms; 2) Gender-Related Traits and 3) Gender Relations (see Tables S1 and S2), each of which may correlate with sex assigned at birth or self-reported gender identity, without a predetermined coding of that behavior as masculine or feminine.

Box 1. Three interrelated dimensions of gender

Gender refers to culturally constructed norms and relations that shape the identities, attitudes, behaviors, bodily appearances, and habits of women, men, and gender-diverse individuals. Gender is multidimensional and complex, changing as social norms and values change. Gender also intersects with other sociocultural categories. The gender variables developed here appear alongside self-reported variables collected in our survey, including: sex assigned at birth, self-reported gender identity, sexual orientation, ethnicity, age, individual and household income, and education, among others (see SI text).

Gender Norms consist of legislated and both spoken and unspoken cultural rules produced through social institutions (such as governments, families, schools, workplaces, laboratories, universities, or boardrooms), cultural products (such as technologies, science, literature, and social media), and broader local and global cultures (77, 78). Here, we measure individuals’ adherence to gender norms through self-reported behaviors (not attitudes or stereotypes).
Gender-Related Traits refers to how individuals or groups perceive and present themselves in relation to gender norms. Our key interest is how individuals think and act vis-à-vis cultural meanings ascribed to gender (79, 80). Thus, we measure gender-related traits through self-reported assessments of personality attributes.
Gender Relations refer to how individuals interact with other people and institutions in specific sociocultural contexts. This dimension encompasses how gender shapes social interactions in families, schools, workplaces, and other public settings (78, 81). We measure gender relations through self-reported assessments of social relations between people of different gender identities within social institutions and societies at large.

These three meta-categories are not mutually exclusive. Gender is multidimensional: any given individual may experience different configurations of gender norms, traits, and relations that cannot be subsumed into a “masculine” or “feminine” score or considered “fixed”.

Results

EFA, CFA, and Reliability Assessment

Table 1 reports the standardized regression weights, Raykov’s ρ, and item scoring for the trimmed CFA model in samples 1, 2 and 3, with seven variables. The 7 factors listed above represent our final gender-related variables. Our analyses yielded low to moderate inter-factor correlations (Table S3), which is not unusual in multidimensional gender measures (19, 44, 45). We calculated mean-item subscale scores for each factor and used these as predictors in the regressions presented below. For subscales including continuous variables, mean-item scores were calculated based on standardized variables (z-scores). All variables are scored from lower to higher levels of the given constructs.

View this table:

Table 1.

Factor loadings for the trimmed CFA models in Samples 1, 2 and 3 and Raykov’s ρ for each factor.

Figure 1 displays the z-scores (averaged by group) for the 7 gender-related variables for respondents seeing themselves as men, women, and gender fluid/non-binary in sample one. The figure demonstrates the advantage of capturing specific gender-related behaviors and attitudes through multiple variables.

Fig 1.

Gender-related variables capturing specific behaviors and attitudes. The figure displays the z-scores for the seven gender-related variables for respondents seeing themselves as men (green), women (orange) and gender fluid/Non-binary (grey) in sample 1 (N= 1893).

Associations with self-rated health and health-risk behaviors

Existing research shows notable sex differences in health-related quality of life, obesity, and risky health behaviors, such as smoking and heavy drinking (46–48). Here, we examine the relevance of our gender-related variables in predicting self-related health and health-risk behaviors.

Associations between our 7 gender-related variables, sex assigned at birth, self-reported gender identity and physical health, mental health, and activity limitations due to poor physical or mental health were analyzed in samples 1 and 2 using negative binomial regressions, with birth year, personal income, education level, race, and ethnicity as covariates. Only associations that are consistent across samples 1 and 2 are reported here. Measures of reported birth sex and gender identity were highly correlated. Therefore, we ran all models twice: once including birth sex and once including gender identity. Here, we report the outcomes of the models including birth sex as a key predictor (see Tables S4-S6 for specifications on the regression models including gender identity).

As displayed in Table 2, caregiver strain and discrimination were associated with lower physical health, mental health, and activity levels in both samples, whereas social support was associated with higher mental health and activity levels in both samples. The results for the remaining associations were inconclusive in one or both samples.

View this table:

Table 2.

Adjusted incidence rate ratios and 95% confidence intervals of associations with health-related measures in negative binomial regressions

Associations between our gender-related variables, sex assigned at birth, self-reported gender identity and general health status, smoking, vaping, binge drinking, and BMI were analyzed using logistic regressions in samples 1 and 2 (Table 3, adjusted for year of birth, personal income, education level, ethnicity, and race), and we ran separate models with sex and gender identity (see Tables S7-S11 for specifications on the regression models including gender identity). In both samples, caregiver strain, discrimination, and male birth sex were associated with fair or poor self-rated health, while risk-taking and social support predicted good, very good, or excellent self-rated health. Further, caregiver strain and work strain were associated with smoking, while discrimination was associated with vaping and higher levels of risk-taking was associated with binge drinking. Caregiver strain, low levels of risk-taking, discrimination, and male birth sex were associated with overweight. The results for the remaining associations were inconclusive in one or both samples.

View this table:

Table 3.

Adjusted odds ratios and 95% CIs of associations with health-related measures in logistic regressions

Combining the data from samples 1 and 2 (and adjusting for sample in the regression models), all associations with self-related health and health-risk behaviors reported above persisted at the 99.9% confidence level (Figures 2 and 3), as did the following associations: emotional intelligence with smoking and male birth sex with vaping and binge drinking.

Fig 2.

Adjusted incidence rate ratios of associations with recent physical health, mental health and activity limitations. This figure displays the outcomes of the negative binomial regressions predicting health outcomes in the combined sample (sample 1 + sample 2) (Physical health, N=3,879; mental health, N=3,880; activity limitations, N=3,876). Error bars represent 99.9% confidence intervals. See Tables S20-S22 for model specifications.

Fig 3.

Adjusted odds ratios of associations with health status, smoking, vaping, binge drinking and BMI. This figure displays the outcomes of the binary logistic regressions predicting health outcomes in the combined sample (sample 1 + sample 2) (health status, N= 3,894; smoking, N=3,892 vaping, N=3,891; binge drinking, N=3,887; BMI, N= 3,851. Error bars represent 99.9% confidence intervals. See Tables S23-S27 for model specifications.

Discussion

Following a comprehensive review of gender measures from 1975 to 2015, we have applied a rigorous process to identify key aspects of gender for the purpose of developing a new gender assessment tool for use in clinical and population research, including large-scale health surveys involving diverse Western populations. Through exploratory and confirmatory factor analyses, we reduced the original 44 survey items to 25 and the 11 original constructs to 7 gender-related variables: caregiver strain, work strain, independence, risk-taking, emotional intelligence, social support, and discrimination.

Each variable captures an important aspect of gender within the populations studied. Each variable measures an individual participant’s self-reported behavior or attitude for that characteristic and is designed to be scored individually, as a distinct human behavior or attribute. Behaviors are not coded “masculine” or “feminine,” and we recommend against consolidating the variable scores into these unipolar or bipolar indices. Studies that reduce gender-related variables to “femininity” or “masculinity” scores give little guidance for behavioral interventions.

For example, if caregiver strain is found to be associated with higher risk of recurrence or death in patients with ACS, it should be reported as such. Subsuming gender-related factors into masculine or feminine indices will reduce, rather than improve, the precision and applicability of survey-based measures of health.

Clinicians and public health researchers can employ the Stanford GVHR to gain a fuller understanding of gender differences in health outcomes than can be captured by simply asking people to self-identify their sex or gender. Specifically, we recommend that researchers start by measuring all 7 variables alongside sex assigned at birth, self-reported gender identity, and other relevant factors, such as sexual orientation, ethnicity, age, income, and education, to specify which characteristics, traits and behaviors may be predictive of the health issue in focus. In subsequent studies and patient-based outcome measures, the focus may be restricted to a subset of variables of documented relevance to a specific disease or health condition.

The Stanford GVHR is specific to the survey cohorts, time, place, and culture in which it was developed and tested. Because gender norms, traits, and relations vary across and within cultures and change over time (49), we recommend that variables be updated with each generation or even more frequently. Obvious limitations are that our variables were developed from English-language literature and validated in nonprobability samples, recruited online and through a research registry in U.S. populations, and the age composition of our samples (median ages range from 36 to 50 years) does not represent typical patient populations encountered in clinical practice. We strongly encourage more research to test the validity of the gender-related variables within and across age groups and cultures, as well as across a wide range of global settings.

Any association of a gender-related variable with a health outcome does not infer causality. Our assessment was cross-sectional. Confounding or even reverse causality (health phenotypes affecting gender-related behaviors and attitudes) should be considered. Another limitation is the small number of items per construct which might limit generalizability; at the same time, a smaller number of items likely increases the usefulness of the questionnaire to practitioners and researchers. An avenue for further research is the expansion the number of items for each construct, especially emotional intelligence, which had less than ideal reliability across all three samples. In addition, a few of the initial gender-related constructs that were not retained in the exploratory factor analysis, such as competition and quality of family relationships, may have been deselected due to insufficient items in the study’s initial pool of attitudes and behaviors and should be reconsidered in future research.

In the future, the proposed list of variables to measure other gender-related factors could be expanded to integrate factors such as decision-making power (including over household resources and health expenditures) and the distribution of labor among both same- and different-sex cohabiting or romantic partners. Indeed, many of the measures that have informed our work may be outdated (noting that some date back to the 1970s and 1980s). It may be desirable to complement our proposed list of variables with more “timely” variables that better represent how specific patients and persons conceive of gender in 2020. It would also be interesting to explore associations between our gender-related variables and other health-related aspects such as health literacy, health-seeking behavior, and provider-patient interactions.

In conclusion, this project is an initial step toward developing more comprehensive and precise survey-based measures of gender in relation to health. Our questionnaire is designed to shed light on how specific gender-related behaviors and attitudes contribute to health and disease processes, irrespective of—or in addition to—biological sex and self-reported gender identity. Use of these gender-related variables in experimental studies, such as clinical trials, may also help us understand if gender has an important role as treatment effect modifiers and would thus need to be further considered in treatment decision-making.

Materials and Methods

Our research complies with all ethical regulations. Stanford University’s Institutional Review Board approved the study. We obtained informed consent from all participants.

Literature search

We searched PsycINFO, PsycTESTS, and PubMed for all English-language studies using gender-related tests and scales, from 1975 through 2015, to identify existing questionnaires or scales and construct a comprehensive list of typical traits and/or characteristics used in gender-related measures in both psychology and medicine. We screened an initial sample of 2,981 articles from PubMed, PsycTESTS, and PsycINFO from which 405 articles were deemed relevant for further interrogation, within which 127 unique gender-related test and scales were identified.

We also screened existing literature reviews published in books and found four additional scales. Altogether 131 gender-related questionnaires were sorted into three overarching categories for analyzing gender norms, gender-related traits, and gender relations. Further, we checked citation frequencies for each scale to determine how often it has been used in the literature. All gender scales with at least 20 Google Scholar citations within the last 10 years were selected for further investigation, of which 74 scales met the criteria. All articles published from 2006-2015 were retained for further investigation (the search methods, selection criteria, and review procedures are specified in SI text, Figure S1, and Tables S28-S30).

Questionnaire development

In developing the questionnaire, we recognized the need to minimize the time burden of completing the questionnaire; therefore, we limited the initial item pool to 3-6 items per construct (see Methods). To avoid acquiescence bias, we presented our items as construct-specific questions. The 74 scales guided our selection of core characteristics to be included among our gender-related measures.

1. Gender Norms

Guided by three of the 74 eligible scales, respondents’ adherence to gender norms was measured by three composite constructs (caregiver strain, time use, and work strain) consisting of 16 items.

Caregiver strain captures perceived consequences of responsibility for unpaid, long-term caregiving to children, partners, friends, and elderly (excluding housework and caregiving occupations) (50, 51) and consists of three items, adapted from Graessel and colleagues (52), recorded on a five-point scale: emotional exhaustion, physical exhaustion, and worries about the future caused by caregiving for someone in need, such as a child, elder, partner, or disabled family member. Higher scores on these items indicate higher levels of caregiver strain.

Time use measures individual hours spent per day, recorded as open-ended numerical estimates of daily time-spent (53), in the following categories: paid work, household activities, eating and drinking, leisure and sport, caring for others, sleeping, and commuting. The items were adapted from the American Time Use Survey.

Work strain measures job strain and emotional job demands as six items, recorded on a five-point scale: work speed, work repetition, emotional job demands, physical job demands, perceived risk, and physical hazards at work. The first four items were adapted from Karasek and Theorell (54); the last two were developed by the authors. Higher scores on each item indicate higher levels of work strain.

2. Gender-Related Traits

Guided by 31 of the original 74 eligible scales, gender-related traits were measured as five composite constructs: competitive, risk-taking, independence, communal, and expressive, consisting of 16 items.

Competitive consists of two items, recorded on a five-point scale, asking respondents how often they find themselves competing with others in situations that do not call for competition and how competitive they are in general, compared to others. The first item is modified from Ryckman et al. (55), the second is developed by the authors. Higher scores on each item indicate higher levels of competitiveness.

Risk-taking focuses on physical and behavioral, measured by three items adapted from Dohmen et al. (56), recorded on a five-point scale: general risk-taking behavior, risk-taking when making financial decisions, and risk-taking with respect to recreational activities. Higher scores on each item indicates higher levels of risk-taking.

Independence is a personality trait characterized by a focus on the person as an individual, not as part of a community or group, which includes agency, self-confidence, self-determination, and decision-making ability, but not self-control or self-esteem (57). Independence was based on three items, recorded on a five-point scale, adapted from Bakker et al. (58), Clark et al. (59), and Triandis et al. (60), asking respondents how important it is for them to be independent, how often they turn to others for help when in need, and how important it is for them to solve their problems independently. A higher score indicates a higher level of independence for all three items.

Communal is a trait characterized by a focus on the individual as part of a group or community, an orientation towards relationships and a concern for others’ needs and well-being (61). We used four items, scored on a five-point scale, asking respondents how often they worry about what other people think about them, how often they take other people’s needs into account when making important decisions, how often friends talk to them about their problems, and how easy it is for them to spot when someone in a group is feeling uncomfortable. Item one was developed by the authors, item two was adapted from Clark et al. (59), and items three and four were adapted from Baron-Cohen and Wheelwright (62). Higher scores indicate higher levels of communal orientation.

Expressive captures abilities to recognize and express emotions, such as sadness, anger, frustration, compassion, joy, or affection and includes aspects of emotional intelligence, i.e. individuals’ ability to recognize what they feel, manage those emotions, and use emotions in problem solving (63). We used four items, recorded on a five-point scale, asking respondents how often they talk to friends about their problems, how easy it is for them to understand their own feelings, how easy it is for them to express what they are feeling, and how easy it is for them to ask other people for help when in need. Item one was developed by the authors, item two was adapted from Salovey and colleagues (64), item three was adapted from Gross and John (65), and item four was adapted from Clark and colleagues (59). Higher scores indicate higher levels of expressivity.

3. Gender Relations

Guided by 40 eligible scales identified in the literature search, gender relations was measured by three composite constructs: social support, discrimination, and quality of family relationships, consisting of 13 items and a single-item measure of personal income.

Social support captures perceived satisfaction with the type (physical, emotional, informational, and financial), the availability, and level of support a person might receive. Support may come from partners, relatives, friends, coworkers, health-care systems, the larger community, etc.

Social support consists of four items recorded on a five-point-scale measuring the availability and level of social support a person might receive. We asked respondents how often, within the past year, they had someone they could ask for advice, someone to show them love and affection, someone to help them with daily chores, and how often they felt lonely. Item one, two, and three were adapted from the ENRICHD Social Support Inventory (66) and item four was developed by the authors. Higher scores indicate higher levels of social support.

Discrimination refers to “systemic unfair treatment” and can occur at multiple levels. We measured micro or interpersonal discrimination. Specifically, we asked the respondents how often they had felt discriminated against because of their gender, in general, when getting hired, when at school, when receiving medical care, in other public settings, and in their family. The items were recorded on a five-point-scale, with higher scores indicating more frequent experiences of gender discrimination.

Quality of family relationships measures experiences of harmony and conflict in familial settings, based on two self-developed items recorded on a five-point-scale asking respondents about how they would describe the quality of their relationship with close relatives (in the past year) and how often they had argued with close relatives (in the past year). Higher scores indicate higher levels of perceived quality in family relationships.

Initial testing of the questionnaire

Content validity of the first draft of the questionnaire was assessed by nine members of the author-group who had not taken part in the construction of the item-list, some with technical expertise in the construction of survey-questionnaires and some with expertise in gender-related aspects of health. Each coauthor was asked to rate each item with respect to its relevance in measuring a given variable construct. Ratings were made on a four-point scale (4= very relevant, 3= relevant but needs minor alteration, 2= unable to assess relevance without major revision, 1= not relevant). These coauthors were also invited to suggest additional items, if they considered the proposed item-list inadequate in capturing a given construct. To improve item quality, we conducted seven cognitive interviews with people recruited by posters in the local area who varied on demographic markers such as education-level, job, age, ethnicity, and gender. In the interviews, we used verbal probing techniques to identify questions that the interviewees found vague and unclear and to elicit how they arrived at answers to the questions. We did this by asking them to reflect on what the items meant to them, how they would rephrase the question in their own words, how they came up with their answers to the questions, and whether the questions were easy or hard to answer and why.

Survey Participants

Participants were recruited from the U.S. through two online services and a health research registry: Prolific, Amazon Mechanical Turk, and the Stanford Research Registry, which consists of ∼ 4,000 former adult patients at the Stanford University Medical Center who have agreed to be contacted for participation in research studies. We used the web-based Qualtrics software to collect data from Prolific (sample 1) in August and September 2017, Mechanical Turk (sample 2) in December 2017 and January 2018, and the Stanford Research Registry in May and June 2017 (sample 3). Sample 1 consisted of 2,051 respondents; 1,992 completed the survey. Sample 2 consisted of 2,135 respondents; 2,043 completed the survey. Sample 3 consisted of 489 respondents; 452 completed the survey. Sample characteristics are presented in Table 4.

View this table:

Table 4.

Sample Characteristics.

Procedures

Self-rated health was assessed using the Health-related Quality of Life Core Module (CDC HRQoL-4). This module consists of four items about perceived general health, recent physical health, recent mental health, and recent activity limitations. The ordinal question about perceived general health did not meet the assumption of proportionality of odds required for ordered logistic regressions. Hence, we dichotomized the item into (1) fair or poor and (2) good, very good or excellent. We measured current smoking and current vaping by number of cigarettes smoked per day and number of times vaping per day. These variables were dichotomized in the analysis (not smoking= 0, smoking= 1; not vaping= 0, vaping= 1) due to a high frequency of zero values (>75%). Binge drinking was measured by the frequency of consuming five or more drinks on one occasion for males and four or more drinks on one occasion for females (within the last three months) (67). We followed standard procedure and recoded these items into a unisex dichotomous variable (binge drinking less than monthly= 0, binge drinking monthly, weekly, or daily/almost daily= 1). BMI was calculated based on self-reported height and weight and dichotomized for analysis to reflect under or normal weight (BMI<25= 0) and overweight or obese (BMI≥25= 1) (item phrasing and response options for the health questions are reported in Table S31). Specifications on the nine demographic covariates (including item phrasing and response options) used in the regressions are presented in Table S32).

Exploratory factor analysis

We opted to start our analysis with EFA, rather than CFA, because we considered EFA to be the most appropriate first step for a survey measure of this novelty. While the systematic review allowed us to distill core gender-related attitudes and behaviors, we were uncertain how many latent factors would emerge in the subsequent testing. Moreover, we were uncertain how several of the items would distribute across factors (e.g. the items for time-use and quality in family relationships) and we wanted to leave open the possibility that various items would cross-load onto factors other than their parent factor, potentially leading to fewer latent factors than initially expected. As described in the results section, this happened to be the case. In addition, since we had the opportunity and resources to collect three survey samples, a complementary approach combining EFA and CFA allowed us to benefit from advantages of each method.

The EFA was based on iterated principal axis factoring as the extraction method and Promax (oblique) rotation to allow for correlated factors. Exploratory factor analysis and statistical analyzes were done in SPSS. In the EFA, we examined questions from all three gender categories in a common factor model with multiple factors. All respondents with missing data for relevant items were removed from sample 1 prior to the analysis. To allow for analysis of the largest possible sample, 10 items targeting caregivers and employees were recoded so that people not currently caring for someone in need or not currently employed (or employed in the past) were ascribed the value 1, which represents no strain due to caregiving or work (see Table S33).

We subjected the 44 questionnaire items to EFA in sample 1. Velicer’s minimum average partial test suggested a 7-factor structure (Table S34, screeplot, communalities, and unique variances are presented in Figure S2 and Table S35) (68). The conceptual clarity of this solution also best resembled the thematic gender dimensions identified in the literature review. This solution retained 35 of the 44 items subjected to EFA and explained 51% of the variance in item scores. For purposes of interpretability, we excluded all items with loadings below 0.40.

Factor one in this solution includes six items addressing perceived discrimination. Factor two consists of seven items capturing daily time spent on work and work strain-related characteristics. Factor three encompasses four items concerning perceived strain and time-use related to caregiving. Factor four includes five items capturing competitive and risk-taking behavior. Factor five encompasses three items concerning perceived social support. Factor six includes six items capturing empathy and expressive behavior. Finally, factor seven includes four questions about independence. The distribution of items on factors was consistent across alternative factor rotation methods (Tables S36-S39).

Confirmatory factor analysis

The CFA was carried out in SPSS AMOS Graphics 26 and based on Maximum Likelihood estimations. We allowed the factors to be correlated. The Likelihood-Ratio test (also known as the χ²-test) is highly sensitive to even small departures of the data from exact fit, especially in large-N samples. Moreover, χ² values increase with sample size and the number of variables in the model (69). Therefore, we followed Cheung and Rensvold (70) and Yuan and Bentler (71) and determined global model fit and invariance based on the approximate-fit statistics.

Specifically, we used the Tuckler Lewis Index (TLI), the root mean square error of approximation (RMSEA), and the standardized root mean square residual (SRMR), relying on conventional fit criteria (72). The 35 items and 7 factors retained in the EFA were submitted to CFA in samples 1, 2, and 3. Following Gerbing and Hamilton (73), we used CFA to refine the EFA solution identified in sample 1. Next, we cross-validated the outcomes of this CFA in samples 2 and 3. Configural invariance was examined separately in samples 2 and 3. We used Multiple-groups CFA to assess metric and scalar invariance across samples 2 and 3. We removed all observations with missing data for one or several of the 35 items subjected to CFA in sample 1, and for the 25 items subjected to CFA in samples 2 and 3. This slightly reduced sample 1 from n= 2,051 to n= 2,009, sample 2 from n= 2,135 to n= 2,054, and sample 3 from n=489 to n=449. One item concerning perceived gender discrimination in education had high rates of missing data in sample 3 (N>100). Hence, we restricted the cross-validation in sample 3, and the multiple-groups CFA in samples 2 and 3, to the remaining 24 variables. To avoid a Heywood estimate on the Social support factor, we followed Chen et al. (74) and constrained the error variance of one item (socsupchores) to 0.001 in samples 2 and 3.

The initial 35-item solution based on the EFA did not perform satisfactorily with respect to global model fit in sample 1 (χ²= 5897.38, df= 539, p= 0.00, TLI= 0.81, RMSEA= 0.07, SRMR= 0.07). To obtain acceptable model fit, we examined the factor loadings, removing all items with loadings ≤ 0.5. The “trimmed” solution, consisting of 25 items, exhibited good fit to the data (χ²= 1362.53, df= 254, p= 0.00, TLI= 0.95, RMSEA= 0.05, SRMR= 0.04) (Table S40). The cross-validation of this final model in samples 2 and 3, with no equality constraints, also exhibited reasonable fit to the data, indicating configural invariance (sample 2: χ²= 1440.7, df= 255, p= 0.00, TLI= 0.94, RMSEA= 0.05, SRMR= 0.04; sample 3: χ²= 496.5, df= 232, p= 0.00,TLI= 0.93, RMSEA=0.05, 3: SRMR=0.05) (Table S41).

A more restricted multiple-groups CFA with factor loadings assumed to be equal across samples 2 and 3 also supported metric invariance (χ²= 1899.849, df= 481, p= 0.00, TLI= 0.94, RMSEA= 0.03, SRMR= 0.04) (Table S42). Further restrictions with both factor loadings and intercepts assumed to be equal across samples 2 and 3 also indicated reasonable fit compared to prior models, supporting scalar invariance (χ²= 2167.051, df= 505, p= 0.00, TLI= 0.93, RMSEA= 0.04, SRMR= 0.04) (Table S43). As a sensitivity check, we also ran the multiple groups CFA in samples 2 and 3 with the item on perceived gender discrimination in education included (sample 2= 2,054; sample 3= 348) and obtained comparable model fit (metric invariance: χ²= 2149.723, df= 528, p= 0.00, TLI= 0.93, RMSEA= 0.04, SRMR= 0.04; scalar invariance: χ²= 2380.150, df= 553, p= 0.00, TLI= 0.93, RMSEA= 0.04, SRMR= 0.04) (Tables S44-S45).

Reliability

The reliabilities of the factors implied by the final CFA solution were assessed in all samples using Raykov’s ρ. ρ was computed using James Gaskin’s ‘Validity master tool’ (75), and following conventional criteria (76), we considered values >0.60 desirable.

Data Availability

Data and code needed to evaluate the conclusions are available here: https://osf.io/7yje9/

Data and materials availability

Data and code needed to evaluate the conclusions are available here: https://osf.io/7yje9/.

Supplementary Information for

Gender-Related Variables for Health Research

Supplementary Information Text

Materials

Questionnaire: The Stanford Gender-Related Variables for Health Research

Please specify your current employment status:

▢ Employed Full-time
▢ Employed Part-time
▢ Homemaker
▢ Retired
▢ Student
▢ Unemployed

People see themselves in different ways. There are no right or wrong answers to the following questions. We just want to know what’s true for you. Choose the answer that best describes you.

In general, how prepared are you to take risks?
- ◯ Not at all prepared
- ◯ Slightly prepared
- ◯ Moderately prepared
- ◯ Very prepared
- ◯ Completely prepared
How prepared are you to take risks….
View this table:
People see themselves in different ways. There are no right or wrong answers to the following questions. We just want to know what’s true for you. Choose the answer that best describes you.

i. How important is it for you to be independent?
- ◯ Extremely important
- ◯ Very important
- ◯ Moderately important
- ◯ Slightly important
- ◯ Not at all important

How important is it for you to solve your problems on your own?

◯ Extremely important
◯ Very important
◯ Moderately important
◯ Slightly important
◯ Not at all important

People see themselves in different ways. There are no right or wrong answers to the following questions. We just want to know what’s true for you. Choose the answer that best describes you.

iii. How often do friends talk to you about their problems?
- ◯ Never
- ◯ Once in a while
- ◯ Sometimes
- ◯ Most of the time
- ◯ Always
People see themselves in different ways. There are no right or wrong answers to the following questions. We just want to know what’s true for you. Choose the answer that best describes you.

How often do you talk to your friends about your problems?
- ◯ Never
- ◯ Once in a while
- ◯ Sometimes
- ◯ Most of the time
- ◯ Always
How easy is it for you to express what you are feeling to others?
- ◯ Not at all easy
- ◯ Slightly easy
- ◯ Moderately easy
- ◯ Very easy
- ◯ Extremely easy

Are you currently responsible for taking care of someone in need? By “taking care of someone in need” we mean providing unpaid assistance and support to someone who has physical or psychological needs, such as a child, elder, partner, or disabled family member.

◯ I am currently responsible for taking care of someone in need
◯ I have been responsible for taking care of someone in the past
◯ I have never been responsible for taking care of someone in need
◯ I am currently responsible for taking care of someone in need, and I have been responsible for taking care of someone in the past

Display This Question

If Are you currently responsible for taking care of someone in need? By “taking care of someone in n… != I have never been responsible for taking care of someone in need

In the past year, how often did you feel emotionally exhausted because of your caretaking responsibilities?

◯ Never
◯ Once in a while
◯ Sometimes
◯ Most of the time
◯ Always

In the past year, how often did you feel physically exhausted because of your caretaking responsibilities?

◯ Never
◯ Once in a while
◯ Sometimes
◯ Most of the time
◯ Always

In the past year, how often have your caretaking responsibilities caused you to worry about the future?

◯ Never
◯ Once in a while
◯ Sometimes
◯ Most of the time
◯ Always

We are interested in how you spend your time on an average weekday, Monday through Friday. Please provide your best estimate. On average, how many hours per weekday do you spend on the following.

Work (paid work, studying, internships, etc.):____________

Taking care of someone in need (caring for children, elders, partners in need, etc.):____________

We are interested in how you feel about your current job, including your daily work activities as an employee or student. For each of the following questions, select the answer that best describes your work activities. If you have several jobs, please think about the job that you spend most hours doing per week.

How often does your job require working fast?
- ◯ Never
- ◯ Once in a while
- ◯ Sometimes
- ◯ Most of the time
- ◯ Always
How often does your job involve repetitive tasks?
- ◯ Never
- ◯ Once in a while
- ◯ Sometimes
- ◯ Most of the time
- ◯ Always
How often do you feel emotionally exhausted from your work activities?
- ◯ Never
- ◯ Once in a while
- ◯ Sometimes
- ◯ Most of the time
- ◯ Always
How often do you feel physically exhausted from your work activities?
- ◯ Never
- ◯ Once in a while
- ◯ Sometimes
- ◯ Most of the time
- ◯ Always
People sometimes look to others for companionship, assistance, or other types of physical or emotional support. The following questions ask you about the support available to you when you need it. Choose the answer that best describes your situation.

In the past year, how often did you have someone…
View this table:
The following questions ask about how often you have felt discriminated against because of your gender. You may not be certain about your answers to these questions, but we would like you to choose the answer that best describes your experience.

Because of your gender, how often have you felt discriminated against?
- ◯ Never
- ◯ Once in a while
- ◯ Sometimes
- ◯ Most of the time
- ◯ Always
Because of your gender how often have you felt discriminated against…
View this table:

What was your birth sex?

◯ Male
◯ Female
◯ Intersex
◯ Other: Please specify:_____________________________
◯ Prefer not to state

What is your gender? Please select all that apply.

▢ Man
▢ Woman
▢ Gender fluid/Non-binary
▢ Other. Please specify:_____________________________
▢ Prefer not to state

In what year were you born?

▾ 2000 … 1900

Which categories describe you? (You may choose more than one.)

▢ White
▢ Hispanic, Latino, or Spanish
▢ Black or African American
▢ Asian
▢ Native American or Alaska Native
▢ Middle Eastern or North African
▢ Native Hawaiian or Pacific Islander
▢ Other

What was your income last calendar year?

Please combine all incomes. “Incomes” include wages, salaries, small business earnings, social security, armed forces pay, special cash bonuses and subsistence allowances.

◯ Less than $10,000
◯ $10,000 - $19,999
◯ $20,000 - $29,999
◯ $30,000 - $39,999
◯ $40,000 - $49,999
◯ $50,000 - $59,999
◯ $60,000 - $69,999
◯ $70,000 - $79,999
◯ $80,000 - $89,999
◯ $90,000 - $99,999
◯ $100,000 - $149,999
◯ $150,000 - $199,999
◯ More than $200,000
◯ Prefer not to say

What is the highest degree or level of school you have completed? If currently enrolled, please report the highest degree received.

◯ No schooling completed
◯ Preschool to 8th grade
◯ Some high school, no diploma
◯ High school graduate, diploma or equivalent (GED)
◯ Some college credit, no degree
◯ Trade/technical/vocational training
◯ Associate degree
◯ Bachelor’s degree
◯ Master’s degree
◯ Professional degree
◯ Doctorate degree
What is your height?
◯ Feet_________________________________________
◯ Inches_________________________________________
What is your weight (in lbs)?
How many cigarettes do you smoke per day? Not including e-cigarettes.
◯ Average #:_________________________________________
How many times do you vape or use an e-cigarette product per day?
________________________________________

Display This Question:

If What was your birth sex? = Male

In the past 3 months, how often have you had 5 or more drinks on one occasion?

◯ Never
◯ Less than monthly
◯ Monthly
◯ Weekly
◯ Daily or almost daily

Display This Question

If What was your birth sex? = Female

In the past 3 months, how often have you had 4 or more drinks on one occasion?

◯ Never
◯ Less than monthly
◯ Monthly
◯ Weekly
◯ Daily or almost daily

In general, would you say your health is…

◯ Excellent
◯ Very good
◯ Good
◯ Fair
◯ Poor

Methods: Documentation of Literature survey

Here we document the systematic literature search that guided our development of the gender variables. The description of the literature survey is divided into five steps. Step 1 specifies the test-specific literature searches in PsycInfo, PsycNet, and PubMed with respect to database specific search limitations, search terms and selection criteria. Step 2 lists all of the gender measures derived from the articles identified through the searches in PsycInfo, PsycNet and PubMed. In Step 3, we sort the identified gender measures into three overarching categories for analyzing gender (norms, identity and relations), and use citation frequencies to determine the prevalence and use of each scale in the literature. In Step 4, we identify the core traits and characteristics of relevance to each of the three categories of gender-related traits, gender norms, and gender relations. Finally, Step 5 documents the search strategy used to identify additional construct specific measures of relevance to our gender variables.

Step 1: Test-specific literature search in PsycINFO, PsycNET, and PubMed

This section specifies our search strategy. We carried out systematic searches in PsychINFO, PsycNET, and PubMED. We restricted the time span to the period from January 1975 through 2015.

Search limitations

We used the following database-specific search limitations:

PsycINFO

Document type: Research literature in peer reviewed scholarly journals; Language: English; Population: Human; Classification: “Personality Scales & Inventories”; “Tests & Testing”.

PsycNet

Age-group: Adulthood (18 years or older); Document type: Abstract collection, Peer-reviewed articles; PsycNET Classifiers: “Tests and testing”; “Personality Scales & Inventories”; “Clinical Psychological Testing”; “Health Psychology Testing”; “Educational measurement”; “Occupational and employment testing”; “Consumer opinion and attitudes”.

PUBMED

Ages: all ages; Species: Humans; Language: English; Article types: all.

The search was narrowed to the following gender/sex related MeSH-terms: “gender identity”; “sex”; “sexism”; “interpersonal relations”; “female”, “male”. We excluded: “sexual and gender disorders”; “gender dysphoria”; “sex reassignment surgery”; “sex reassignment procedures.
We narrowed our search to the following “test-related” MeSH terms: “psychological tests”; “neuropsychological tests”; “personality tests”; “psychometrics”; “pain measurement”; ‘behavior rating scale”; “thematic apperception test”; “test anxiety scale”; “personal construct theory”; “Wechsler scales”; “personality assessment”.

Search strings

We used the following search strings in the three databases:

PsycINFO

(“Gender Identity” OR “Gender Gap” OR “Gender Equality” OR “Sex Role Attitudes” OR “Sex Roles”)

PsycNet

(“Gender” OR “Sex”) NOT (“sexual”)

PubMed

(Psychological tests[mh] OR Neuropsychological tests[mh] OR Personality tests[mh] OR Psychometrics[mh] OR Pain measurement[mh] OR Behavior rating scale[mh] OR Thematic Apperception Test[mh] OR Test anxiety scale[mh] OR Personal Construct Theory[mh] OR Wechsler Scales[mh] OR Personality Assessment[mh]) AND (Sex[mh] OR Gender identity[mh] OR Interpersonal relations[mh] OR Sexism[mh] OR women[mh] OR men[mh] or female [mh] OR male [mh]) AND (scale*[ti] OR measure*[ti] OR compos*[ti] OR inventor*[ti] OR index*[ti] OR indic*[ti] OR score*[ti] OR psychomet*[ti] OR instrument*[ti] OR batter*[ti] OR assess*[ti] OR test*[ti] OR rating*[ti]) AND (Gender*[ti] OR sex[ti] OR sexes[ti] OR mascu*[ti] OR femin*[ti] OR male[ti] OR males[ti] OR female*[ti] OR man[ti] OR woman[ti] OR men[ti]] OR women[ti] OR androgy*[ti] OR boy[ti] OR girl[ti] OR boys[ti] OR girls[ti]).

The literature on gender- and sex-related scales is extremely extensive and for reasons of feasibility, it will not be possible to capture everything through a broad search in PubMed.

An important caveat related to our selected search string in PubMed is the emphasis on gender/ sex related search terms in abstract titles. To provide a concrete example: Our search string may not capture potentially relevant scales that do not include the following terms in their titles: (Gender*[ti] OR sex[ti] OR sexes[ti] OR mascu*[ti] OR femin*[ti] OR male[ti] OR males[ti] OR female*[ti] OR man[ti] OR woman[ti] OR men[ti]] OR women[ti] OR androgy*[ti] OR boy[ti] OR girl[ti] OR boys[ti] OR girls[ti]). However, for influential scales we will likely capture several follow up studies adapting or testing the validity of these scales. Test of coverage is captured in a later step.

The search strings presented above returned 519 studies in PsycINFO, 432 studies in PsycNET and 2030 studies in PubMed (Fig. S1).

Selection criteria

We read through abstract and titles and used the following selection criteria to identify relevant publications:

All studies with scales or tests measuring aspects of gender are eligible, regardless of purpose of use in article (+); articles using non-gender related measures to illuminate gender differences on a given trait or topic (-); articles focusing on children or adolescents (17 years old or younger) (-); articles capturing scales measuring issues of gender norms or relations (e.g. child-rearing, romantic relationships, division of domestic labor) (+); articles capturing scales concerning sexuality, sexual health, sexual violence, domestic violence and issues of gender- and sex development in childhood and adolescence (-) and body image (-).

Eligble publications

Using these, we identified 405 eligible publications: 315 in PsycINFO, 65 in PsychNET and 89 in PubMed. Of these, 64 publications were dublicates or triplicates (Fig. S1)

Step 2: Identifying measures of gender

As a second step in the review process, we searched through all of the 405 eligible references to identify relevant gender measures employed in each paper. We derived 127 unique gender measures through this approach. Here, we list all of the 127 gender measures and the publications in which they were used. Four additional gender measures were identified through literature reviews published in books. These additional measures are reported at the bottom of this section. The gender measures are listed in alphabetized order. Adherence to Extreme Gender Role Beliefs:

Step 3: Sorting and determining the impact of gender measures

As a third step in the review process, we sorted the 131 gender measures into three overarching categories for analyzing gender (norms, identity and relations). Given the large number of gender measures, we decided to use citation frequencies to determine the prevalence and use of each measure in the literature. All measures receiving at least 20 citations in Google Scholar within the last ten years were selected for further investigation, of which 74 scales met the criteria. All articles published from 2006-2015 were retained for further investigation. All articles published in 2005 or earlier, with less than 20 citations in Google Scholar were assorted into the “Ineligible Scales” category. This category also comprises measures that we found irrelevant after reading through abstract and full text. The scales assorted into each category (norms, identity and relations) are ranked based on their citation impact. Ineligible scales are listed at the bottom of this section.

Gender norms (N=3)

Coparenting and Family Rating System

Original scale: McHale, J. P., Kuersten-Hogan, R., & Lauretti, A. (2000). Evaluating coparenting and family-level dynamics during infancy and early childhood: The Coparenting and Family Rating System. Family observational coding systems: Resources for systemic research, 151-170.

Web of Science (2006->): Missing

Google scholar (2006->): 57

Parental Child-rearing Behaviour Scale

Original scale: Meunier, J. C., & Roskam, I. (2007). Psychometric properties of a parental childrearing behavior scale for French-speaking parents, children, and adolescents. European Journal of Psychological Assessment, 23(2), 113-124.

Web of Science (2006->): 17 (full database), 3 (health-related subject categories)

Google scholar (2006->): 29

The Orientation towards Domestic Labor

Original scale: Hawkins, A. J., Marshall, C. M., & Allen, S. M. (1998). The Orientation Toward Domestic Labor Questionnaire: Exploring dual-earner wives’ sense of fairness about family work. Journal of Family Psychology, 12(2), 244.

Web of Science (2006->): 7 (full database)

Google scholar (2006->): 28

Gender-related traits (N=31)

Bem Sex Role Inventory

Original scale: Bem, S. L. (1974). The measurement of psychological androgyny. Journal of consulting and clinical psychology, 42, 155-162 Web of Science (2006->): 802 (full database), 100 (health research) Google scholar (2006->): 3670

Minnesota Multiphasic Personality Inventory (broader, including masc. and fem.)

Original scale: Hathaway, S. R., & McKinley, J. C. (1951). Minnesota Multiphasic Personality Inventory; manual (Revised).

Web of Science (2006->): Missing

Google scholar (2006->): 937

Conformity to Masculine Norms Inventory

Original scale: Mahalik, J. R., Locke, B. D., Ludlow, L. H., Diemer, M. A., Scott, R. P., Gottfried, M., & Freitas, G. (2003). Development of the Conformity to Masculine Norms Inventory. Psychology of Men & Masculinity, 4(1), 3.

Web of Science (2006->) Missing

Google scholar (2006->): 676

Gender Role Conflict Scale

Original scale: O’Neil, J. M., Helms, B. J., Gable, R. K., David, L., & Wrightsman, L. S. (1986). Gender-Role Conflict Scale: College men’s fear of femininity. Sex roles,14(5-6), 335-350.

Web of Science (2006->): 185 (full database), 20 (health research)

Google scholar (2006->): 576

Personal Attributes Questionnaire

Original scale: Spence, J. T., Helmreich, R. L., & Stapp, J. (1974). The Personal Attributes Questionnaire: A measure of sex role stereotypes and masculinity-femininity.

Web of Science (2006->): Missing

Google scholar (2006->): 428

Male Role Norms Scale

Original scale: Thompson, E. H., & Pleck, J. H. (1986). The structure of male role norms. The American Behavioral Scientist, 29(5), 531.

Web of Science (2006->): 142 (full database) 15 (health)

Google scholar (2006->): 368

Masculinity Ideology

Original scale: Pleck, J. H., Sonenstein, F. L., & Ku, L. C. (1993). Masculinity ideology: Its impact on adolescent males’ heterosexual relationships. Journal of Social issues, 49(3), 11-29.

Web of Science (2006->): 131 (full database), 69 (health research)

Google scholar (2006->): 312

Californian Psychological Inventory

Original scale: Gough, H. G. (1975). Manual for the California psychological inventory. Consulting Psychologists Press.

Web of Science (2006->): Missing

Google scholar (2006->): 314

Masculine Gender Role Stress

Original scale: Eisler, R. M., & Skidmore, J. R. (1987). Masculine gender role stress scale development and component factors in the appraisal of stressful situations. Behavior Modification, 11(2), 123-136.

Web of Science (2006->): 100 (full database), 19 (health)

Google scholar (2006->): 301

Traditional Machismo and Caballerismo Scale

Original scale: Arciniega, G. M., Anderson, T. C., Tovar-Blank, Z. G., & Tracey, T. J. (2008). Toward a fuller conception of Machismo: Development of a traditional Machismo and Caballerismo Scale. Journal of Counseling Psychology, 55(1), 19.

Web of Science (2006->): 100 (full database), 33 (health research)

Google scholar (2006->): 276

Conformity to Feminine Norms Inventory

Original scale: Mahalik, J. R., Morray, E. B., Coonerty-Femiano, A., Ludlow, L. H., Slattery, S. M., & Smiler, A. (2005). Development of the conformity to feminine norms inventory. Sex Roles, 52(7-8), 417-435.

Web of Science (2006->): 82 (full database), 17 (health research)

Google scholar (2006->): 241

Male Role Norms Inventory

Original scale: Levant, R. F., Hirsch, L. S., Celentano, E., & Cozza, T. M. (1992). The male role: An investigation of contemporary norms. Journal of Mental Health Counseling.

Web of Science (2006->): Missing

Google scholar (2006->): 191

Hypermasculinity Inventory

Original scale: Mosher, D. L., & Sirkin, M. (1984). Measuring a macho personality constellation. Journal of Research in Personality, 18(2), 150-163.

Web of Science (2006=>): 83 (full database), 4 (health research)

Citations Google (2006=>): 251

Brannon Masculinity Scale

Original scale: Brannon, R., & Juni, S. (1984). A scale for measuring attitudes about masculinity. Psychological Documents, 14.

Web of Science (2006->): 2 (full database)

Google scholar (2006->): 141

Feminine Gender Role Stress

Original scale: Gillespie, B. L., & Eisler, R. M. (1992). Development of the feminine gender role stress scale: A cognitive-behavioral measure of stress, appraisal, and coping for women. Behavior Modification, 16(3), 426-438.

Web of Science (2006->): 31 (full database)

Google scholar (2006->): 78

Hyperfemininity

Original scale: Murnen, S. K., & Byrne, D. (1991). Hyperfeminity: Measurement and Initial Validation of the Construct. The Journal of Sex Research, 479-489.

Web of Science (2006->): 31 (full database) 4 (health)

Google scholar (2006->): 76

Adjective Check List (Masculinity and Femininity scales)

Original scale: Zuckerman, M. (1960). The development of an affect adjective check list for the measurement of anxiety. Journal of Consulting Psychology, 24(5), 457.

Web of Science (2006->): 14 (full database)

Google scholar (2006->): 72

Normative Male Alexithymia Scale

Original scale: Levant, R. F., Good, G. E., Cook, S. W., O’Neil, J. M., Smalley, K. B., Owen, K., & Richmond, K. (2006). The normative Male Alexithymia Scale: Measurement of a gender-linked syndrome. Psychology of Men & Masculinity, 7(4), 212.

Web of Science (2006->): Missing

Google scholar (2006->): 65

Japanese Gender Role Index

Original scale: Sugihara, Y., & Katsurada, E. (2002). Gender role development in Japanese culture: Diminishing gender role differences in a contemporary society. Sex roles, 47(9-10), 443-452.

Web of Science (2006->): 13 (full database) 2 (health)

Google scholar (2006->): 44

Femininity Ideology Scale

Original scale: Levant, R., Richmond, K., Cook, S., House, A. T., & Aupont, M. (2007). The femininity ideology scale: Factor structure, reliability, convergent and discriminant validity, and social contextual variation. Sex Roles, 57, 373–383.

Web of Science (2006=>): 15 (full database), 3 (health research)

Google scholar (2006=>): 42

Gender Role Journey Measure

Original scale: O’Neil, J. M., Egan, J., Owen, S. V., & Murry, V. M. (1993). The gender role journey measure: Scale development and psychometric evaluation. Sex Roles,28(3-4), 167-185.

Web of science (2006=>): 10 (full database), 1 (health research)

Google scholar (2006=>): 43

Auburn Differential Masculinity Inventory

Original scale: Burk, L. R., Burkhart, B. R., & Sikorski, J. F. (2004). Construction and Preliminary Validation of the Auburn Differential Masculinity Inventory. Psychology of Men & Masculinity, 5(1), 4.

Web of Science (2006->): Missing

Google scholar (2006=>): 36

PRF Andro Scale (Personality Research Form)

Original scale: Berzins, J. I., Welling, M. A., & Wetter, R. E. (1978). A new measure of psychological androgyny based on the Personality Research Form. Journal of consulting and clinical psychology, 46(1), 126.

Web of Science (2006->): 8 (full database), 1 (health research)

Google scholar (2006->): 33

Sex-Role Identity Scale

Original scale: Storms, M. D. (1979). Sex role identity and its relationships to sex role attributes and sex role stereotypes. Journal of Personality and Social Psychology, 37(10), 1779.

Web of science (2006=>): 15 (full database), 2 (health research)

Google scholar (2006=>): 27

New Masculine Gender Role Discrepancy

Original scale: Rummell, C. M., & Levant, R. F. (2014). Masculine gender role discrepancy strain and self-esteem. Psychology of Men & Masculinity, 15(4), 419.

Too new for citations

Russian Male Norms Inventory

Original scale: Janey, B. A., Kim, T., Jampolskaja, T., Khuda, A., Larionov, A., Maksimenko, A., … & Shipilova, A. (2013). Development of the Russian Male Norms Inventory. Psychology of Men & Masculinity, 14(2), 138.

Too new for citations

Measurement of Men’s Perceived Inexpressiveness

Original scale: Wong, Y. J., Horn, A. J., Gomory, A. M., & Ramos, E. (2013). Measure of Men’s Perceived Inexpressiveness Norms (M2PIN): Scale development and psychometric properties. Psychology of Men & Masculinity, 14(3), 288.

Too new for citations

Positive Negative Sex-role Inventory

Original scale: Berger, A., & Krahé, B. (2013). Negative attributes are gendered too: Conceptualizing and measuring positive and negative facets of sex-role identity. European Journal of Social Psychology, 43(6), 516-531.

Too new for citations

Social Roles Questionnaire

Original scale: Baber, K. M., & Tucker, C. J. (2006). The social roles questionnaire: A new approach to measuring attitudes toward gender. Sex Roles, 54(7-8), 459-467.

Too new for citations

Indian Gender role identity

Basu, J. (2010). Development of the Indian gender role identity scale. Journal of the Indian Academy of Applied Psychology, 36(1), 25-34.

Too new for citations

Inventory of Subjective Masculinity

Wong, Y. J., Shea, M., Lafollette, J. R., Hickman, S. J., Cruz, N., & Boghokian, T. (2011). The inventory of subjective masculinity experiences: Development and psychometric properties. The Journal of Men’s Studies,19(3), 236-255.

Too new for citations

Gender Relations (N=40)

The Ambivalent Sexism Inventory

Original scale: Glick, P., & Fiske, S. T. (1996). The ambivalent sexism inventory: Differentiating hostile and benevolent sexism. Journal of personality and social psychology, 70(3), 491.

Web of Science (2006=>): 792 (full database), 41 (health research)

Google scholar (2006=>): 1960

Modern and Old-fashioned Sexism

Original scale: Swim, J. K., Aikin, K. J., Hall, W. S., & Hunter, B. A. (1995). Sexism and racism: Old-fashioned and modern prejudices. Journal of Personality and Social Psychology, 68, 199–214 Web of Science (2006=>):306 (full database), 12 (health research)

Google scholar (2006=>): 801

Dyadic Adjustment Scale across Gender

Original scale: Christensen, A., & Heavey, C. L. (1990). Gender and social structure in the demand/withdraw pattern of marital conflict. Journal of personality and social psychology, 59(1), 73.

Web of science (2006=>): 193 (full database), 10 (health research)

Google scholar (2006=>): 488

Multidimensional Measure of Sexual Minority Identity

Original scale: Mohr, J., & Fassinger, R. (2000). Measuring dimensions of lesbian and gay male experience. Measurement and Evaluation in Counseling and Development, 33(2), 66-66.

Web of science (2006=>): 123 (full database), 32 (health research)

Google scholar (2006=>): 346

Relationship Assessment Scale

Original scale: Hendrick, S. S., Dicke, A., & Hendrick, C. (1998). The relationship assessment scale. Journal of Social and Personal Relationships, 15(1), 137-142.

Web of science (2006=>): 124 (full database), 34 (health research)

Google scholar (2006=>): 337

Neosexism Scale

Original scale: Tougas, F., Brown, R., Beaton, A. M., & Joly, S. (1995). Neosexism: Plus ça change, plus c’est pareil. Personality and Social Psychology Bulletin, 21(8), 842-849.

Web of science (2006=>): 111 (full database), 1 (health research)

Google scholar (2006=>): 308

Attitudes toward Women Scale

Original scale: Spence, J. T., Helmreich, R., & Stapp, J. (1973). A short version of the Attitudes toward Women Scale (AWS). Bulletin of the Psychonomic Society,2(4), 219-220.

Web of Science (2006->): 104 (full database), 41 (health research)

Google scholar (2006->): 289

Routine and Strategic Relational Maintenance Scale

Stafford, L., Dainton, M., & Haas, S. (2000). Measuring routine and strategic relational maintenance: Scale revision, sex versus gender roles, and the prediction of relational characteristics. Communications Monographs, 67(3), 306-323.

Web of Science (2006->): Missing

Google scholar (2006->): 211

Sex-Role Stereotypes

Original scale: Rosenkrantz, P., Vogel, S., Bee, H., Broverman, I., & Broverman, D. M. (1968). Sex-role stereotypes and self-concepts in college students. Journal of consulting and clinical psychology, 32(3), 287.

Web of Science (2006->): 43 (full database),

Google scholar (2006->): 210

Relationship Belief Inventory

Original scale: Eidelson, R. J., & Epstein, N. (1982). Cognition and relationship maladjustment: Development of a measure of dysfunctional relationship beliefs. Journal of consulting and clinical psychology, 50(5), 715.

Web of science (2006=>): 46 (full database), 1 (health research)

Google scholar (2006=>): 196

The Romantic Beliefs Scale

Original scale: Sprecher, S., & Metts, S. (1989). Development of the Romantic Beliefs Scale and examination of the effects of gender and gender-role orientation.Journal of Social and Personal Relationships, 6(4), 387-411.

Web of science (2006=>): 26 (full database), 2 (health research)

Google scholar (2006=>): 133

Relationship Authenticity

Original scale: Lopez, F. G., & Rice, K. G. (2006). Preliminary development and validation of a measure of relationship authenticity. Journal of Counseling Psychology, 53(3), 362.

Web of science (2006=>): 53 (full database), 5 (health research)

Google scholar (2006=>): 123

Sex-Role Egalitarianism Scale

Original scale: Beere, C. A., King, D. W., Beere, D. B., & King, L. A. (1984). The Sex-Role Egalitarianism Scale: A measure of attitudes toward equality between the sexes. Sex Roles, 10(7-8), 563-576.

Web of science (2006=>): 41 (full database), 5 (health research)

Google scholar (2006=>): 122

Quick Discrimination Index

Original scale: Ponterotto, J. G., Burkard, A., Rieger, B. P., Grieger, I., D’Onofrio, A., Dubuisson, A., Heenehan, M., Millstein, B., Parisi, M., Rath, J. F., & Sax, G. (1995). Development and initial validation of the Quick Discrimination Index (QDI). Educational and Psychological Measurement, 55, 1026-1031.

Web of science (2006=>): 31 (full database), 2 (health research)

Google scholar (2006=>): 110

Attitudes toward Traditional and Egalitarian Sex-Roles

Original scale: Larsen, K. S., & Long, E. (1988). Attitudes toward sex-roles: Traditional or egalitarian?. Sex Roles, 19(1-2), 1-12.

Web of science (2006=>): 43 (full database), 2 (health research)

Google scholar (2006=>): 110

Mutual Psychological Development Questionnaire

Original scale: Genero, N. P., Miller, J. B., Surrey, J., & Baldwin, L. M. (1992). Measuring perceived mutuality in close relationships: Validation of the Mutual Psychological Development Questionnaire. Journal of Family Psychology, 6(1), 36.

Web of science (2006=>): Missing

Google scholar (2006=>): 103

Swedish Classical and Modern Sexism Scales

Original scale: Ekehammar, B., Akrami, N., & Araya, T. (2000). Development and validation of Swedish classical and modern sexism scales. Scandinavian journal of psychology, 41(4), 307-314. Web of science (2006=>): 19 (full database), 3 (health research)

Google scholar (2006=>): 65

Sex-Role Ideology

Original scale: Kalin, R., & Tilby, P. J. (1978). Development and validation of a sex-role ideology scale. Psychological reports, 42(3), 731-738.

Web of Science (2006->): 19 (full database), 2 (health research)

Google scholar (2006->): 59

Quality in Relationships Inventory within Couples

Original scale: Pierce, G. R. (1994). The Quality of Relationships Inventory: Assessing the interpersonal context of social support.

Web of science (2006=>): Missing

Google scholar (2006=>): 52

Partner Behaviour Inventory

Original scale: Doss, B. D., & Christensen, A. (2006). Acceptance in romantic relationships: the frequency and acceptability of partner behavior inventory. Psychological Assessment, 18(3), 289. Web of science (2006=>): 13 (full database), 4 (health research)

Google scholar (2006=>): 45

Managing Affect and Differences Scale

Original scale: Arellano, C. M., & Markman, H. J. (1995). The Managing Affect and Differences Scale (MADS): A self-report measure assessing conflict management in couples. Journal of Family Psychology, 9(3), 319

Web of science (2006=>): 8 (full database)

Google scholar (2006=>): 33

Women as Managers Scale

Original scale: Peters, L. H., Terborg, J. R., & Taynor, J. (1974). Women as Managers Scale:(WAMS): a Measure of Attitudes Toward Women in Management Positions. Journal Supplement Abstract Service of the American Psychological Association.

Web of science (2006=>): Missing

Google scholar (2006=>): 54

Gender Role Egalitarian Attitudes

Original Scale: Chang, L. (1999). Gender role egalitarian attitudes in Beijing, Hong Kong, Florida, and Michigan. Journal of Cross-Cultural Psychology, 30(6), 722-741.

Web of Science (2006->): 22 (full database) 1 (health)

Google scholar (2006->): 54

Adherence to Extreme Gender Role Beliefs

Original scale: Hamburger, M. E., Hogben, M., McGowan, S., & Dawson, L. J. (1996). Assessing Hypergender Ideologies: Development and Initial Validation of a Gender-Neutral Measure of Adherence to Extreme Gender-Role Beliefs. Journal of Research in Personality, 30(2), 157-178.

Web of Science (2006->): 23 (full database) 3 (health)

Google scholar (2006->): 45

Gender Attitude Inventory

Original scale: Ashmore, R. D., Del Boca, F. K., & Bilder, S. M. (1995). Construction and validation of the Gender Attitude Inventory, a structured inventory to assess multiple dimensions of gender attitudes. Sex roles, 32(11-12), 753-785.

Web of Science (2006->): 15 (full database)

Google scholar (2006->): 45

Sex-Role Orientation

Original scale: Brogan, D., & Kutner, N. G. (1976). Measuring sex-role orientation: A normative approach. Journal of Marriage and the Family, 31-40.

Web of science (2006=>): 12 (full database) 4 (health)

Google scholar (2006=>): 44

Gender ideology as an identity

Original scale: Kroska, A. (2000). Conceptualizing and measuring gender ideology as an identity. Gender & Society, 14(3), 368-394.

Web of Science (2006->): 12 (full database) 2 (health)

Google scholar (2006->): 36

Gender Role Beliefs Scale

Original scale: Kerr, P. S., & Holden, R. R. (1996). Development of the gender role beliefs scale (GRBS). Journal of Social Behavior and Personality, 11(5), 3.

Web of Science (2006->): 9 (full database)

Google scholar (2006->): 34

Male Attitude Norms Inventory II

Original scale: Luyt, R. (2005). The Male Attitude Norms Inventory-II A Measure of Masculinity Ideology in South Africa. Men and Masculinities, 8(2), 208-229.

Web of science (2006=>): Missing

Google scholar (2006=>): 36

Sexist Attitudes toward Women

Original scale: Benson, P. L., Institute, S., & Vincent, S. (1980). Development and validation of the sexist attitudes toward women stale (SATWS). Psychology of Women Quarterly, 5(2), 276-291. Web of science (2006=>): 8 (full database), 4 (health)

Google scholar (2006=>): 31

Race and Gender-Specific Stress Measure for African-American Women

Original scale: Jackson, F. M. (2005). The Development of a Race and Gender-Specific Stress Measure for African-American Women: Jackson, Hogue, Phillips Contextualized Stress Measure. Ethnicity & disease, 15(4), 594-600.

Web of science (2006=>): 16 (full database), 10 (health research)

Google scholar (2006=>): 26

Egalitarian Sex Role Attitudes

Suzuki, A. (1991). Egalitarian sex role attitudes: Scale development and comparison of American and Japanese women. Sex Roles, 24(5-6), 245-259.

Web of science (2006=>): 15 (full database), 4 (health research)

Google scholar (2006=>): 26

Schedule of Sexist Events

Original scale: Yoder, J. D., & McDonald, T. W. (1998). Measuring sexist discrimination in the workplace: Support for the validity of the schedule of sexist events. Psychology of Women Quarterly, 22, 487–491.

Google scholar (2006=>): 22

Web of Science (2006=>): 8 (full database) 1 (health research)

German Relationship Assessment Scale

Original scale: Dinkel, A., & Balck, F. (2005). An evaluation of the German relationship assessment scale. Swiss Journal of Psychology, 64(4), 259-263.

Web of science (2006=>): 6 (full database), 1 (health research)

Google scholar (2006=>): 22

Concord-Index (measure of family relationships)

Original scale: Lee, P. H., Stewart, S. M., Lun, V., Bond, M. H., Yu, X., & Lam, T. H. (2012). Validating the concord index as a measure of family relationships in China.Journal of Family Psychology, 26(6), 906.

Too new for citations

Conflict Disengagement Inventory (CDI)

Original scale: Sanford, K. (2014). A latent change score model of conflict resolution in couples Are negative behaviors bad, benign, or beneficial?. Journal of Social and Personal Relationships, 0265407513518156.

Too new for citations

Gender Minority Stress and Resilience

Original scale: Testa, R. J., Habarth, J., Peta, J., Balsam, K., & Bockting, W. (2015). Development of the Gender Minority Stress and Resilience Measure. Psychology of Sexual Orientation and Gender Diversity, 2(1), 65.

Too new for citations

Gender-Based Attitudes Toward Marriage and Child Rearing

Original scale: Adams, M., Coltrane, S., & Parke, R. D. (2007). Cross-ethnic applicability of the gender-based attitudes toward marriage and child rearing scales. Sex roles,56(5-6), 325-339.

Too new for citations

Patriarchal Beliefs Scale

Original scale: Yoon, E., Adams, K., Hogge, I., Bruner, J. P., Surya, S., & Bryant, F. B. (2015). Development and validation of the Patriarchal Beliefs Scale. Journal of counseling psychology, 62(2), 264.

Too new for citations

African American Men’s Gendered Racism Stress Inventory

Original scale: Schwing, A. E., Wong, Y. J., & Fann, M. D. (2013). Development and validation of the African American Men’s Gendered Racism Stress Inventory. Psychology of Men & Masculinity, 14(1), 16.

Too new for citations

Ineligible Scales (N=57)

NEO Personality Inventory (broader scale)

Original scale: Costa, P. T., & McCrae, R. R. (1985). The NEO personality inventory: Manual, form S and form R. Psychological Assessment Resources.

Web of Science (2006->): Missing

Google scholar (2006->): 2200

Justification: Doesn’t measure gender.

Conformity to Traditional Masculinity and Relationship Satisfaction

Original scale: Burn, S. M., & Ward, A. Z. (2005). Men’s Conformity to Traditional Masculinity and Relationship Satisfaction. Psychology of Men & Masculinity, 6(4), 254.

Web of Science (2006->): Missing

Google scholar (2006->): 99

Justification: Its traits and dimensions are based on the Conformity to Masculine Norms scale. This scale is already included.

Gender Diagnosticity

Original scale: Lippa, R., & Connelly, S. (1990). Gender diagnosticity: A new Bayesian approach to gender-related individual differences. Journal of Personality and Social Psychology, 59(5), 1051. Web of Science (2006->): 44 (full database), 5 (health research)

Google scholar (2006->): 61

Justification: Measure used to predict not measure gender

Gender Differences in Sexual Attitudes

Original scale: Hendrick, S., Hendrick, C., Slapion-Foote, M. J., & Foote, F. H. (1985). Gender differences in sexual attitudes. Journal of Personality and Social Psychology,48(6), 1630.

Web of Science (2006->): 21 (full database), 9 (health research)

Google scholar (2006->):61

Justification: Focuses on sexuality, not gender.

Masculine Gender Identity In females

Original scale: Blanchard, R., & Freund, K. (1983). Measuring masculine gender identity in females. Journal of Consulting and Clinical Psychology, 51(2), 205.

Web of Science (2006->): 9 (full database) 5 (health)

Google scholar (2006->): 21

Justification: Specifically designed to measure transgender behavior. Items (behaviors and sentiments) do not capture gender related traits and characteristics.

Male Assessment of Self-Objectification

Original scale: Daniel, S., Bridges, S. K., & Martens, M. P. (2014). The development and validation of the Male Assessment of Self-Objectification (MASO). Psychology of Men & Masculinity, 15(1), 78.

Too new for citations

Justification: Measures Body-related aspects of gender (not deemed relevant for our measure)

Masculine Body Ideal Distress Scale

Original scale: Kimmel, S. B., & Mahalik, J. R. (2004). Measuring masculine body ideal distress: Development of a measure. International Journal of Men’s Health,3(1), 1.

Too new for citations

Justification: Measures Body-related aspects of gender (not deemed relevant for our measure)

Subjective Masculinity stress

Original scale: Wong, Y. J., Shea, M., Hickman, S. J., LaFollette, J. R., Cruz, N., & Boghokian, T. (2013). The Subjective Masculinity Stress Scale: Scale development and psychometric properties. Psychology of Men & Masculinity,14(2), 148.

Too new for citations

Justification: Does not rely on predefined gender characteristics and traits. Respondents define their own “gender stressors” and hereafter rate them.

Men’s Perceived Inexpressiveness Norms

Too new for citations

Justification: Measures expressivity in men, not gender.

Reference Group Identity Dependence Scale

Original scale: Wade, J. C., & Gelso, C. J. (1998). Reference Group Identity Dependence Scale A Measure of Male Identity. The Counseling Psychologist, 26(3), 384-412.

Web of Science (2006->): 11 (full database), 1 (health research)

Google scholar (2006->): 28

Justification: Does not address gender traits and characteristics

A Gender-Based Measurement Invariance Study of the Sociocultural Attitudes toward Appearance Questionnaire

Original scale

Wheeler, D. L., Vassar, M., & Hale, W. D. (2011). A gender-based measurement invariance study of the Sociocultural Attitudes Toward Appearance Questionnaire-3. Body image, 8(2), 168-172.

Too new for citations

Justification: Measures Body-related aspects of gender (not deemed relevant for our measure)

LGBT Ally Identity Measure

Original scale: Jones, K. N., Brewster, M. E., & Jones, J. A. (2014). The creation and validation of the LGBT Ally Identity Measure. Psychology of Sexual Orientation and Gender Diversity, 1(2), 181.

Too new for citations

Justification: Does not measure gender

O’ Kelly Women’s Belief Scale

Original scale: O’Kelly, M. (2011). Psychometric properties of the O’Kelly women’s belief scales. Journal of Rational-Emotive & Cognitive-Behavior Therapy, 29(3), 145-157.

Too new for citations

Justification: Does not address gender traits and characteristics

Lesbian and Gay Identity Scale

Original scale: Mohr, J., & Fassinger, R. (2000). Measuring dimensions of lesbian and gay male experience. Measurement and Evaluation in Counseling and Development, 33(2), 66-66.

Too new for citations

Justification: Doesn’t measure gender

Wiggins Interpersonal Behavior Circle

Original scale: Wiggins, J. S. (1979). A psychological taxonomy of trait-descriptive terms: The interpersonal domain. Journal of personality and social psychology, 37(3), 395.

Too new for citations

Justification: Doesn’t measure gender

Social Sex-Role Inventory (DSI)

Original scale: Shively, M. G., & De Cecco, J. P. (1977). Components of sexual identity. Journal of homosexuality, 3(1), 41-48.

Web of Science (2006->): 23 (full database), 7 (health research)

Google scholar (2006->): 142

Justification: Doesn’t measure gender

Masculine Gender Role Discrepancy

Original scale: Pleck, J. H. (1995). The gender role strain paradigm: An update. Web of Science (2006->): Missing

Google scholar (2006->): 590

Justification: This is a theory, not a scale

Omnibus Personality Inventory

Original scale: Heist, P., & Yonge, G. D. (1968). Omnibus personality inventory. Psychological Corporation.