Is there, or is there not, a gender gap in mathematics performance? And if there is, is it biological or cultural?
Although the presence of a gender gap in the U.S. tends to be regarded as an obvious truth, the evidence is rather more equivocal. One meta-analysis of studies published between 1990 and 2007, for example, found no gender differences in mean performance and nearly equal variability within each gender. Another meta-analysis, using 30 years of SAT and ACT scores, found a very large 13:1 ratio of middle school boys to girls at the highest levels of performance in the early 1980s, which declined to around 4:1 by 1991, where it has remained. A large longitudinal study found that males were doing better in math, across all socioeconomic classes, by the 3rd grade, with the ratio of boys to girls in the top 5% rising to 3:1 by 5th grade.
Regardless of the extent of any gender differences in the U.S., the more fundamental question is whether such differences are biological or cultural. The historical changes mentioned above certainly point to a large cultural component. Happily, because so many more countries now participate in the Trends in International Mathematics and Science Study (TIMSS) and the Programme for International Student Assessment (PISA), much better data is now available to answer this question. In 2007, for example, 4th graders from 38 countries and 8th graders from 52 countries participated in TIMSS. In 2009, 65 countries participated in PISA.
So what does all this new data reveal about the gender gap? Overall, there was no significant gender gap in the 2003 and 2007 TIMSS, with the exception of the 2007 8th graders, where girls outperformed boys.
There were, of course, significant gender gaps on a country basis. Researchers looked at several theories for what might underlie these.
Contradicting one theory, gender gaps did not correlate reliably with gender equity. In fact, both boys and girls tended to do better in math when raised in countries where women have greater equality. The primary contributor to this appears to be women’s income and rates of participation in the work force. This is in keeping with the idea that mothers’ education and employment opportunities benefit their children’s learning regardless of the child’s gender.
The researchers also looked at the more specific hypothesis put forward by Steven Levitt, that gender inequity doesn’t hurt girls' math performance in Muslim countries, where most students attend single-sex schools. This theory was not borne out by the evidence. There was no consistent link between school type and math performance across countries.
However, math performance in the 29 wealthier countries could be predicted to a very high degree by three factors: economic participation and opportunity, GDP per capita, and membership of one of three clusters:
- Middle Eastern (Bahrain, Kuwait, Oman, Qatar, Saudi Arabia)
- East Asian (Hong Kong, Japan, South Korea, Singapore, Taiwan)
- Rest (Russia, Hungary, Czech Republic, England, Canada, US, Australia, Sweden, Norway, Scotland, Cyprus, Italy, Malta, Israel, Spain, Lithuania, Malaysia, Slovenia, Dubai)
The Middle Eastern cluster scored lowest (note the exception of Dubai, which falls in the third cluster), and the East Asian cluster scored highest. While there are many cultural factors differentiating these clusters, it’s interesting to note that countries’ average performance tended to be higher when students attributed less importance to mastering math.
The investigators also looked at the male variability hypothesis: the idea that males are more variable in their performance, so that their predominance at the top is balanced by their predominance at the bottom. The study found, however, that the extent of greater male variation in math achievement differs widely across countries, and that in some countries it is not found at all.
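One common way to put a number on the male variability hypothesis is a variance ratio: the variance of boys' scores divided by the variance of girls' scores, with values above 1 indicating greater male variability. The sketch below illustrates that calculation only. The function name and the scores are invented for this example, and nothing here is taken from TIMSS, PISA, or the study's own analysis.

```python
from statistics import variance


def variance_ratio(male_scores, female_scores):
    """Return the male sample variance divided by the female sample variance."""
    return variance(male_scores) / variance(female_scores)


# Hypothetical scores, made up purely for illustration (not real test data).
boys = [440, 480, 500, 520, 560]
girls = [450, 480, 500, 520, 550]

vr = variance_ratio(boys, girls)

# Both groups have the same mean here, but the boys' scores are more spread out,
# so the ratio comes out above 1.
print(f"Variance ratio (boys / girls): {vr:.2f}")
```

Computed country by country, a ratio like this would hover near 1 wherever the hypothesis fails to hold, which is the kind of cross-country variation the study reports.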
In sum, the cross-country variability in performance in regard to gender indicates that the most likely cause of any differences lies in country-specific social factors. These could include the perception of abilities as fixed vs malleable, attitudes toward math, and gender beliefs.
A popular theory of women’s underachievement in math concerns stereotype threat (first proposed by Spencer, Steele, and Quinn in a 1999 paper). I have reported on this on several occasions. However, a recent review of this research claims that many of the studies were flawed in their methodology and statistical analysis.
Of the 141 studies that cited the original article and related to mathematics, only 23 met the criteria needed (in the reviewers’ opinion) to replicate the original study:
- Both genders tested
- Math test used
- Subjects recruited regardless of preexisting beliefs about gender stereotypes
- Subjects randomly assigned to experimental conditions
Of these 23, three involved younger participants (< 18 years) and were excluded. Of the remaining 20 studies, only 11 (55%) replicated the original effect (a significant interaction between gender and stereotype threat, with women performing significantly worse in the threat condition than in the control condition, compared to men).
Moreover, half the studies confounded the results by statistically adjusting for preexisting math scores. That is, the researchers tried to correct for any preexisting differences in math performance by using a previous math assessment measure, such as SAT score, to ‘tweak’ the baseline score. This practice has been the subject of some debate, and the reviewers come out firmly against it, arguing that “an important assumption of a covariate analysis is that the groups do not differ on the covariate. But that group difference is exactly what stereotype threat theory tries to explain!” Note, too, that the original study didn’t make such an adjustment.
So what happens if we exclude those studies that confounded the results? That leaves ten studies, of which only three found an effect (and one of these found the effect only in a subset of the math test). In other words, overwhelmingly, it was the studies that adjusted the scores that found an effect (8/10), while those that didn’t adjust them didn’t find the effect (7/10).
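The counts reported above can be cross-checked with a few lines of arithmetic. This is only a bookkeeping sketch: every number comes from the text, and the variable names are illustrative.

```python
# Study counts as reported in the text; variable names are illustrative only.
met_criteria = 23          # studies meeting the reviewers' replication criteria
excluded_under_18 = 3      # studies with participants younger than 18
eligible = met_criteria - excluded_under_18

adjusted_total = 10        # studies that adjusted for preexisting math scores
adjusted_effect = 8        # ...of which this many found the effect
unadjusted_total = 10      # studies that made no such adjustment
unadjusted_effect = 3      # ...of which this many found the effect

replicated = adjusted_effect + unadjusted_effect
replication_rate = replicated / eligible
unadjusted_no_effect = unadjusted_total - unadjusted_effect

print(f"Eligible studies: {eligible}")
print(f"Replications: {replicated} ({replication_rate:.0%})")
print(f"Effect found with adjustment: {adjusted_effect}/{adjusted_total}")
print(f"No effect without adjustment: {unadjusted_no_effect}/{unadjusted_total}")
```

The figures are internally consistent: the 8 adjusted and 3 unadjusted studies that found an effect account for all 11 replications among the 20 eligible studies.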
The power of the adjustment in producing the effect was confirmed in a meta-analysis.
Now these researchers aren’t saying that stereotype threat doesn’t exist, or that it doesn’t have an effect on women in this domain. Their point is that the size of the effect, and the evidence for it, have come to be regarded as greater and more robust than the research warrants.
At a practical level, this may have led to too much emphasis on tackling this problem at the expense of investigating other possible causes and designing other useful interventions.