GitHub is an online community of 12 million developers sharing and collaborating on projects. With so many users publicly contributing code to open source projects, GitHub is the perfect place to go data mining for all sorts of interesting user behaviour. A group of computer scientists from California Polytechnic State University and North Carolina State University have analysed millions of contributions to investigate gender bias on the website and are reporting some unexpected findings.
It’s no secret that software development is a male-dominated industry, and open source development environments are no different. The authors of this latest study were interested in gender bias on GitHub and expected that women might have a harder time getting their contributions accepted. Interestingly, women actually had their contributions accepted more often than men, but only if they weren’t known to be women at the time.
The gender of users isn’t actually recorded by GitHub, so the researchers used gender information from Google+ accounts. They did this by pulling over 4 million email addresses from GitHub users, 1.4 million of which were also linked to Google+ accounts. The list itself hasn’t been published due to privacy concerns.
Ignoring the fact that people might not be honest about their gender on Google+ and that gender itself is messier than simply male or female, the researchers now had gender identities for 1.4 million GitHub users. It took a lot of work to make the list, and most GitHub users will not be aware of the Google+ accounts of fellow contributors. The researchers then judged the GitHub profiles of these 1.4 million users as obviously male, obviously female, or gender-neutral. If the name was Bob2010 and the avatar was a bearded guy, they would list the user as obviously male. Sally2011 and an avatar of a woman would be listed as female. A username like “ilovepizza” (it’s true) with a default avatar would be listed as gender-neutral.
What all of this means is that the researchers knew, to some extent, the real gender and perceived gender of each user. One woman might be seen as a woman on GitHub due to her profile information, while another woman might appear gender-neutral. Using this information, the researchers could analyse millions of contribution requests to see how successful different groups were. They thought that maybe women would face gender bias and have their contributions blocked more often than men. This turned out to be true, but only if the women were perceived as women by other users. The women with gender-neutral profiles had contributions accepted more often than both the other women and the men.
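To make the comparison concrete, here is a rough sketch of how acceptance rates could be tallied for each combination of real and perceived gender. It is only an illustration and not the researchers’ actual method: the function name, the record layout and the toy records are all invented for this example.

```python
from collections import defaultdict


def acceptance_rates(contributions):
    """Return the share of accepted contributions per (actual, perceived) group.

    `contributions` is a hypothetical list of tuples:
    (actual_gender, perceived_gender, was_accepted).
    """
    accepted = defaultdict(int)
    total = defaultdict(int)
    for actual, perceived, was_accepted in contributions:
        group = (actual, perceived)
        total[group] += 1
        if was_accepted:
            accepted[group] += 1
    return {group: accepted[group] / total[group] for group in total}


# Invented toy records for illustration only, NOT data from the study.
toy_contributions = [
    ("woman", "neutral", True),
    ("woman", "neutral", True),
    ("woman", "woman", True),
    ("woman", "woman", False),
    ("man", "neutral", True),
    ("man", "man", False),
]

rates = acceptance_rates(toy_contributions)
for group, rate in sorted(rates.items()):
    print(group, f"{rate:.0%}")
```

Grouping by both labels at once is what lets the two effects be separated: a woman with a gender-neutral profile and a woman with an obviously female profile land in different groups, so their acceptance rates can be compared directly.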
There are more men than women taking part in GitHub and contributing, so obviously there are more contributions from men. When you take this into account, 78.6% of women’s contributions are accepted and become part of the project’s code compared to 74.% of men’s. The numbers of men and women who were highly successful contributors differed too. A minority of users are so good that they have nearly 100% of their contributions accepted and merged into the project. Only 13.5% of men were close to having 100% of their code merged, but 25% of the women were that successful. For almost everything the researchers analysed, women were kicking ass.
Making sense of the data
The researchers tried to explain these results by testing a few hypotheses. Were women making smaller, easier contributions? Were they only better at specific programming languages? Were they just fixing known bugs rather than providing more creative and meaningful contributions? The researchers found that none of this was the case. Men were more likely to fix known bugs; women’s contributions provided more lines of code on average; and women were more likely to have contributions accepted regardless of the programming language.
Another idea was that the bias was caused by women being more likely to give up developing, so the few who are left have been at it longer and are better at contributing. By contrast, men aren’t as likely to drop out, so there are plenty of men who haven’t been there for long and are less likely to provide successful contributions. The researchers found that this couldn’t be the case, as the women were more successful regardless of how long they had been contributing. Another angle to this argument could be that less successful women give up programming and move to other industries more often than men, so the average woman still contributing may have more programming experience or education than the average man on the website. Indeed, the researchers point out that women contributing to open source projects are more likely to have Masters and PhD degrees than men.
We could speculate with countless possible explanations. Maybe women aren’t held to the same standard as men? Maybe women entering open source are more prepared and educated going in? Maybe women are just awesome at coding? We can’t say for sure. All we can really take from this paper is that women on GitHub are more likely to have their work accepted than men but not because it’s short, easy, or less meaningful. For whatever reason, women kick ass at open source development on GitHub but have a tougher time when people can tell that they’re women.
Women are suffering due to gender bias, despite being better contributors.
The paper is yet to be peer-reviewed but a pre-print is available on PeerJ.
Main image © iStock/undrey