This document summarizes a quantitative study of gender representation and online participation. The study analyzed data from StackOverflow, WordPress, and Drupal to investigate gender ratios, levels of engagement, and other metrics. It found that women made up 7-10% of participants, compared with 1-5% for open source communities generally. Women asked more questions, but no other engagement metric differed significantly between genders. The study hypothesized that competitive elements and anonymity may discourage some women from greater participation.
Gender, Representation and Online Participation: a Quantitative Study
1. Gender, Representation and Online Participation: a Quantitative Study
Dr Andrea Capiluppi
30 Oct 2013
Dept of Information Systems and Computing (DISC)
2. My research background
• Software engineering
– Software maintenance & evolution
– Software architectures, components & reuse
– Effort estimation
– Quantitative studies
• Open processes
– Open source products
– Social networks
• Wikipedia
• Q&A sites
3. The Fastest Q&A Site in the West
• StackOverflow is a “Question & Answer site for
programmers”
– Part of the StackExchange network
• Most questions are answered
– StackOverflow (92.6%)
– Yahoo! Answers (88.2%)
– KiN (~66%)
• Median answer time of only 11 minutes!
Mamykina, L., Manoim, B., Mittal, M., Hripcsak, G., & Hartmann, B. (2011, May).
Design lessons from the fastest q&a site in the west. In Proceedings of the SIGCHI
conference on Human factors in computing systems (pp. 2857-2866). ACM.
4. Game Mechanisms in SO
• SO is based on points
– Reputation points
• Good answer
• Good comment
• Good question
• ...
– Badges
• Popular Question
• Commentator
• Necromancer
• …
– Privileges: more points give access to more features
• Voting
• Commenting
• Editing
5. How this work started
• Major conference, paper painting the awesomeness
of StackOverflow
Lotufo, R., Passos, L., & Czarnecki, K.
(2012, June). Towards improving bug
tracking systems with game mechanisms.
In Mining Software Repositories (MSR),
2012 9th IEEE Working Conference on
(pp. 2-11). IEEE.
6. How this work started
• Paper was well received
• Questions from the audience:
– is SO attracting a male-only crowd?
• Wider questions:
– Are prizes, badges, reputation creating an unbalanced
participation?
– Is “gaming” lethal for a social network? Making it less
sustainable?
8. A bit of a touchy topic...
Regarding the FLOSS community as a
whole, have you ever observed
discriminatory behaviour against women?
FLOSSPOLS Deliverable D16, Gender: Integrated Report of Findings.
http://www.flosspols.org/deliverables/D16HTML/FLOSSPOLSD16Gender_Integrated_Report_of_Findings.htm, 2006.
9. Demoted skills
• Online status and reputation: 'pro' and 'rookie'
– Technical skills: coding, debugging, etc.
– Non-technical skills: usability, web design, etc.
• (…) the skill of web design was demoted to a ‘non-technical’ status as it became a way in which women
described and approached their work [Kotamraju 2003]
Kotamraju, N. 2003. Art versus Code: The Gendered
Evolution of Web Design Skills. In Howard, P. and S. Jones
(eds) Society Online: The Internet in Context. London: Sage.
11. Aim of the study
• Provide quantifiable evidence of gender
participation and engagement
– Is gender ratio unbalanced?
– Is gender engagement unbalanced?
• Data sampling: Q&A sites
– StackOverflow
– WordPress
– Drupal
13. Research questions:
• RQ1: What are the challenges with identifying gender
in online communities?
• RQ2: What is the rate of participation by women in
online communities?
• RQ3: What is the level of engagement by women in
online communities?
… (trying to) avoid moralistic messages
16. Empirical approach
• Data mining/Name extraction
• Gender resolution
• Detection of activity on
– StackOverflow
– Drupal
– WordPress
• Statistical comparison between genders
17. Data and name extraction
• StackOverflow public data dump
– 1,078,708 registered users
– Too much noise to automatically assign gender
– Random sampling
• 2% margin of error
• 99% confidence level
• Subset of 4,144 SO users
• Manual gender resolution
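The slide does not say how the sample size was derived. As a minimal sketch, assuming the standard Cochran formula with a finite population correction, a worst-case proportion p = 0.5, and z rounded to 2.58 for 99% confidence (all three are assumptions, not stated in the source), the figures on this slide can be reproduced as follows. The function name `sample_size` is hypothetical.

```python
def sample_size(population, margin, z, p=0.5):
    """Sample size for estimating a proportion in a finite population.

    Assumptions (not taken from the slides): Cochran's formula,
    worst-case proportion p = 0.5, finite population correction.
    """
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Shrink it for a finite population of known size.
    n = n0 / (1 + (n0 - 1) / population)
    return round(n)


# 1,078,708 registered users, 2% margin of error, z = 2.58 (99% confidence).
print(sample_size(population=1_078_708, margin=0.02, z=2.58))
```

Under these assumptions the result matches the 4,144 users on the slide. A more precise z of 2.576 gives a slightly smaller number, so the rounding of z is part of the assumption.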
18. Data and name extraction II
• Drupal and WordPress
mailing lists
– Both separate Q&A into
various sub-lists
• Consulting
• Development
• Support
• …
– Name, Surname, email
address, text of email,
<<in_response_to>> tag
– All messages & authors
analysed
– Manual gender resolution
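A rough sketch of extracting the fields listed above from one mailing-list message. It assumes the archives are available as standard RFC 5322 email text and that the slide's <<in_response_to>> tag corresponds to the `In-Reply-To` header; both are assumptions, and the function name `extract_author` is hypothetical. Only the Python standard library is used.

```python
from email import message_from_string
from email.utils import parseaddr


def extract_author(raw_message):
    """Pull author name, address, reply link and body from one message."""
    msg = message_from_string(raw_message)
    # "Jane Doe <jane@example.org>" -> ("Jane Doe", "jane@example.org")
    display_name, address = parseaddr(msg.get("From", ""))
    return {
        "name": display_name,
        "email": address,
        # Assumed equivalent of the <<in_response_to>> tag; None for a new thread.
        "in_reply_to": msg.get("In-Reply-To"),
        "body": msg.get_payload(),
    }
```

The display name returned here would be the input to the manual gender resolution step; the reply link separates questions (no parent) from answers.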
24. 14/11/13
Heuristics: title + first h1
<title>Ben Kamens</title>
…
<h1>We’re willing to be embarrassed about what we <em>haven’t</em> done…</h1>
Ben Kamens We’re willing to be embarrassed about what we haven’t done…
Stanford Named Entity Tagger
<PERSON>Ben Kamens</PERSON> We’re willing to be embarrassed about what we haven’t done…
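The first step of this heuristic, concatenating the page title with the first h1, can be sketched with Python's standard `html.parser`. This is an illustration only: the class and function names are hypothetical, and the tooling actually used in the study is not given on the slide. The named-entity tagging step is not reproduced here.

```python
from html.parser import HTMLParser


class TitleAndFirstH1(HTMLParser):
    """Collect the text inside <title> and inside the first <h1> only."""

    def __init__(self):
        super().__init__()
        self.text = {"title": [], "h1": []}
        self.done = set()      # tags already captured once
        self.current = None    # tag being captured right now, if any

    def handle_starttag(self, tag, attrs):
        if tag in self.text and tag not in self.done and self.current is None:
            self.current = tag

    def handle_endtag(self, tag):
        if tag == self.current:
            self.done.add(tag)
            self.current = None

    def handle_data(self, data):
        # Nested tags such as <em> are ignored; their text is still kept.
        if self.current is not None:
            self.text[self.current].append(data)


def title_plus_first_h1(html):
    parser = TitleAndFirstH1()
    parser.feed(html)
    parser.close()
    joined = "".join(parser.text["title"]) + " " + "".join(parser.text["h1"])
    return " ".join(joined.split())  # normalise whitespace
```

The returned string is the kind of input that would then be passed to a named-entity tagger to find a PERSON span.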
26. Quality of gender resolution: Survey
                      As inferred
Self-identification   M    F    ?    Total
M                     60   3    43   106
F                     2    5    4    11
+ avatars, other social media sites (manually)
                      As inferred
Self-identification   M    F    ?    Total
M                     90   3    13   106
F                     2    9    0    11
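Reading the two survey tables with rows as self-identified gender and columns as inferred gender (M, F, ? for unresolved), the effect of the manual step can be summarised with a few sums. This is a small sketch over the counts on the slide; the function name `summarize` and the dictionary layout are illustrative choices.

```python
def summarize(table):
    """Count correct, unresolved and wrong inferences in a survey table."""
    total = sum(sum(row.values()) for row in table.values())
    correct = sum(table[g][g] for g in table)            # diagonal cells
    unresolved = sum(row["?"] for row in table.values()) # "?" column
    wrong = total - correct - unresolved
    return {"total": total, "correct": correct,
            "unresolved": unresolved, "wrong": wrong}


# Rows: self-identification; columns: gender as inferred.
name_only = {"M": {"M": 60, "F": 3, "?": 43},
             "F": {"M": 2, "F": 5, "?": 4}}
with_avatars = {"M": {"M": 90, "F": 3, "?": 13},
                "F": {"M": 2, "F": 9, "?": 0}}

print(summarize(name_only))
print(summarize(with_avatars))
```

On these counts, the manual step reduces unresolved respondents from 47 to 13 out of 117 and raises correct inferences from 65 to 99, while the number of wrong inferences stays at 5.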
34. Why?
• [Gneezy, Niederle, Rustichini 2003]: women are less
effective in mixed-gender competitive environments
• [Niederle, Vesterlund 2007]: women shy away from
competition and men embrace it
• To retain women we need different gamification
techniques
35. Threats to validity
• Gender inference:
– Automated: imprecise tooling
– Manual: Errare humanum est
• Gender swapping
• Images of other people as avatars
– Celebrities, children, porn stars…
38. Questions?
● Vasilescu, B., Capiluppi, A., Serebrenik, A. (2012): Gender, Representation and Online Participation: A Quantitative Study of StackOverflow. Social Informatics (SocialInformatics), 2012 International Conference on, pp. 332-338.
● Vasilescu, B., Capiluppi, A., Serebrenik, A. (2013): Men at work: the StackOverflow case. Tiny Transactions on Computer Science, 2.
● Vasilescu, B., Capiluppi, A., Serebrenik, A. (2013): Gender, Representation and Online Participation: A Quantitative Study. Interacting with Computers, 2013; doi: 10.1093/iwc/iwt047
Editor's Notes
Advantages: controlled sample
Disadvantages: representative?
In any case: direction for future work
However, what is common to both Drupal and
WordPress is that the differences in gender participation
occur mostly between mailing lists focussing on designing
technology (development, wp-hackers and wp-xmlrpc)
and using technology (consulting, wp-docs and wp-edu).