Many of us, including O’Reilly, Google, and the Python Software Foundation, believe that gender diversity in open source projects is important. (If you don’t, this isn’t going to convince you.) But what is correlated with improved gender diversity, and what can we learn from similar historic industries?
Holden Karau and Matt Hunt explore the diversity of different projects, examine historic Equal Employment Opportunity Commission (EEOC) complaints, and detail parallels and historic solutions. To keep things interesting, Holden and Matt conclude with a comparative analysis of the state of OSS and various complaints handled by the EEOC in the ’60s, along with the solutions, suggestions, and binding settlements that were reached for similar diversity problems in other industries. This comparison is not legal advice; it offers examples of what we can learn from early EEOC decisions.
Topics include:
Diversity of gender among the different levels of a given project’s leadership (committers, PMC, etc.)
The existence of codes of conduct
Language used in comments, code, and mailing lists
The rate of promotions for project participants
JupyterCon 2018: Diversity Analytics & OSS Adventures
1. What things are correlated with gender diversity
A data science stroll through the ASF and Jupyter projects
By @holdenkarau & @instantmatthew
2. What is this all about?
● Curiosity: few metrics on open source diversity exist
● Fun use of Jupyter, Spark, ML
● Pull requests welcome!
Lori Erickson
3. Who are you?
we have nothing in
common
Me: smart, funny,
straight, bald, New Yorker
Holden: trans, queer,
Canadian San Franciscan,
wants you to follow her on
YouTube … etc
4. Or do we?
● English speaking bi-coastal North American techies
● Breathe same air, mortal
● Distinctive fashion sense
● A shared appreciation for the Cheesecake Factory
● Whisky
● Neither of us is speaking on behalf of our employers today
5. Historical Perspective
● Quote from “The Good Girls Revolt”*
○ “Writers come to magazine over the transom,” he said, “and women aren’t coming. We can’t
do anything if they aren’t interested”
● And a similar quote from open source luminaries
○ “I don’t have any experience working with women in programming projects; I don’t think that
any volunteered to work on Emacs or GCC.” - RMS
*The Good Girls Revolt: How the Women of Newsweek Sued their Bosses and Changed the Workplace
by Lynn Povich
sheologian
6. Recent studies
GitHub 2017
“These researchers found that women’s coding suggestions was accepted 71.8% of the time
when their gender was kept a secret, but only 62.5% of the time when their gender was
revealed.”
“Only 3% of the 5500 randomly selected respondents were women. 25% of those women
reported being exposed to language or content that made them uncomfortable”
7. What have we done?
Pulled data from Git, Meetup, etc.
Did some ML magic to infer gender and get stats
Used Jupyter!
Made some pretty(ish) pictures
8. What can’t you get from this?
● Causation. Which correlation ain’t.
● Legal advice
● Academic quality data
Quirky Confectioner
Lawyer cat
objects!
9. Data sources/Methods
● Git commits and messages
● Inferred gender
● Gender from human review
● Project websites
● Mailing lists
● You can see our work - http://bit.ly/holdendDiversityAnalyticsRepo
○ And contribute… hint hint…..
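The first data source above, Git commits and messages, could be collected along these lines. This is a minimal sketch and not the code from the linked repository: the `git log` format string, the choice of separators, and the helper names `parse_git_log` and `commits_per_author` are assumptions made for illustration, and the sample records are made up.

```python
from collections import Counter
from typing import Dict, List

# Assumed separators: ASCII unit separator (0x1f) between fields and
# record separator (0x1e) between commits, so stray newlines are harmless.
FIELD_SEP = "\x1f"
RECORD_SEP = "\x1e"

# Hypothetical format string for `git log --pretty=format:...`:
# author name, author email, commit subject.
GIT_LOG_FORMAT = "%an%x1f%ae%x1f%s%x1e"


def parse_git_log(raw: str) -> List[Dict[str, str]]:
    """Turn raw `git log` output (in GIT_LOG_FORMAT) into a list of dicts."""
    commits = []
    for record in raw.split(RECORD_SEP):
        record = record.strip()
        if not record:
            continue  # skip the empty tail after the last separator
        name, email, subject = record.split(FIELD_SEP, 2)
        commits.append({"name": name, "email": email, "subject": subject})
    return commits


def commits_per_author(commits: List[Dict[str, str]]) -> Counter:
    """Count commits per author, keyed by lower-cased email address."""
    return Counter(c["email"].lower() for c in commits)


# Made-up sample in the assumed format (not real project data).
sample = (
    "Ada Example\x1fada@example.org\x1fFix typo in docs\x1e\n"
    "Bo Example\x1fbo@example.org\x1fAdd unit test\x1e\n"
    "Ada Example\x1fADA@example.org\x1fRefactor parser\x1e"
)
commits = parse_git_log(sample)
counts = commits_per_author(commits)
```

On a real checkout, the raw text could come from `subprocess.run(["git", "log", f"--pretty=format:{GIT_LOG_FORMAT}"], capture_output=True, text=True, check=True).stdout`. Keying on email is only one possible way to merge identities; people who commit under several addresses would still be counted separately.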
Melissa Wiese
10. Such Data
● ~50 projects
● ~30 GB of commits & posts
Human reviewed:
● Sampled down to ~1600 code contributors + all ~2600 committers
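One way to build a review set shaped like the one described above, keeping every committer and adding a random sample of other code contributors, is sketched here. The function name `build_review_set`, the fixed seed, and the toy identifiers are assumptions for illustration and not the procedure actually used for the talk.

```python
import random
from typing import Iterable, List, Set


def build_review_set(contributors: Iterable[str], committers: Set[str],
                     sample_size: int, seed: int = 42) -> List[str]:
    """Keep all committers and add a random sample of non-committers."""
    # Sort before sampling so the result is reproducible for a given seed.
    others = sorted(set(contributors) - committers)
    rng = random.Random(seed)
    k = min(sample_size, len(others))
    sampled = rng.sample(others, k)
    return sorted(committers) + sampled


# Toy identifiers, not real contributors.
contributors = [f"user{i}" for i in range(100)]
committers = {"user1", "user2", "maintainer"}
review_set = build_review_set(contributors, committers, sample_size=10)
```

Fixing the seed makes the sample repeatable, which helps when several people review the same list.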
Andrey Belenko
14. Some other things stand out quickly...
● Broad base of companies (maybe a different kind of diversity, or maybe correlated?)
● Easy to find community page
● Get involved link right on the home page
● Academic funding sources (NSF) + GSOC
16. What are some interesting project attributes?
● Does the project have a code of conduct?
● Does the project have a stated way for people to become committers?
● Does the project have a contributing guide?
● What’s the sentiment of the project’s user/dev lists?
● PR acceptance rate
● Your ideas/suggestions - seriously e-mail us (and/or make PRs to the
notebook!)
j0035001-2
17. What about gender related attributes?
● Gender %s of code contributors
● Gender %s of mailing list users
● Gender %s of PMC / committers
● And correlations
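The percentages and correlations listed above can be computed with very little code. The sketch below assumes a Pearson correlation between a yes/no project attribute and a percentage; the talk does not say which correlation measure was used, and the numbers are placeholders rather than results from the analysis. The names `share` and `pearson` are hypothetical helpers.

```python
from math import sqrt
from typing import Sequence


def share(count: int, total: int) -> float:
    """Percentage of `count` within `total` (0.0 when total is 0)."""
    return 100.0 * count / total if total else 0.0


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Placeholder rows: (attribute present? 1/0, group count, total count).
# These numbers are invented purely to exercise the functions.
rows = [(1, 6, 40), (0, 2, 50), (1, 9, 45), (0, 1, 30)]
attribute = [float(r[0]) for r in rows]
percentages = [share(r[1], r[2]) for r in rows]
r_value = pearson(attribute, percentages)
```

With only ~50 projects, any coefficient computed this way is noisy, and it says nothing about causation.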
charlene mcbride
21. Oh howdy, there’s some differences….
● Maybe it’s from our data collection methods
● Inferred gender is also known to have issues, especially with non-American
names, non-cis folks, etc.
● Sentiment detection is maybe not great either?
○ I just used NLTK’s VADER cause w/e
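VADER scores text from a lexicon of rated words plus a set of rules. The sketch below shows only the lexicon idea, to make it easier to see why such scores can mislead on technical mailing lists. It is a deliberately simplified stand-in: it is not VADER, and it is not the analysis used in the talk. The word list, the weights, and the name `toy_sentiment` are invented for illustration.

```python
import re
from typing import Dict

# Hypothetical mini-lexicon; real lexicons hold thousands of rated words.
TOY_LEXICON: Dict[str, float] = {
    "thanks": 1.5, "great": 2.0, "helpful": 1.5,
    "broken": -1.5, "wrong": -1.5, "terrible": -2.5,
}


def toy_sentiment(text: str, lexicon: Dict[str, float] = TOY_LEXICON) -> float:
    """Average lexicon score of the matched words in `text`; 0.0 if none match."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[w] for w in words if w in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


friendly = toy_sentiment("Thanks, great patch")
technical = toy_sentiment("The build is broken")
```

Here `technical` comes out negative even though "the build is broken" is a neutral status report, which is one reason to treat list-level sentiment numbers with caution.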
22. How was the human data collection done?
Instructions:
Find the gender of the user in question. You can look at the e-mails sent in
response to them, but also feel free to search online to find other information
about the user (use the project information to disambiguate cases of multiple people
with the same name).
List any additional links about the user that you used (e.g. LinkedIn, Twitter, etc.)
Provided with:
E-mails in response to user, project name, author name, and github name
(All depending on what could be found)
DocChewbacca
27. Stage 3: Solutions to historical challenges
Remember the parallels in quotes? Maybe there are parallels in solutions?
● Short answer: hire women
○ In OSS we sometimes pretend we are not paid…. but a lot of us are.
● Longer answer: make training/mentorship programs to promote internal
candidates
○ Strangely enough, the existence of mentoring programs was negatively correlated
● Explicit “try-outs”
○ (or ways of hiring people that weren’t just hiring friends)
● Not depending on randomly finding people
Nacho
28. Related work
● https://code.likeagirl.io/gender-bias-in-open-source-d1deda7dec28
● https://blog.bitergia.com/2016/10/11/gender-diversity-analysis-of-the-linux-kernel-technical-contributions/
● https://peerj.com/articles/cs-111/ (PR acceptance rates for women
insiders/outsiders)
● Livestreams of the data processing/collection -
http://bit.ly/holdenJupyterStreams
○ Did you know it’s perf season at Google? And Google is very metrics driven…. Also my
manager’s name is Steve.
Arthur Cruz
29. Special thanks!
Ann Spencer
Wrangler of cats and unicorns as the Head of Content at Domino Data Lab.
Formerly Data Editor at O'Reilly Media (aka Holden's editor).
Born and raised in San Francisco.
https://blog.dominodatalab.com/
30. Want to participate?
● New forum:
https://groups.google.com/forum/#!managemembers/oss-diversity-discussion
● Notebook code at https://github.com/holdenk/diversity-analytics /
http://bit.ly/holdendDiversityAnalyticsRepo
● Slides: https://www.slideshare.net/hkarau
● @holdenkarau & @instantmatthew
● And/or come say hi to us @ Strata
Melissa Wiese
31. High Performance Spark!
Unrelated to this talk. I’ll have a book signing @ 3:20pm at
the O’Reilly booth.
You can also buy it from that scrappy Seattle bookstore,
Jeff Bezos needs another newspaper and I want a cup of
coffee.
http://bit.ly/hkHighPerfSpark