What things are correlated
with gender diversity
A data science stroll through the ASF and Jupyter
projects
By
@holdenkarau & @instantmatthew
What is this all about?
● Curiosity: few metrics on open source diversity exist
● Fun use of Jupyter, Spark, ML
● Pull requests welcome!
Lori Erickson
Who are you?
we have nothing in
common
Me: smart, funny,
straight, bald, New Yorker
Holden: trans, queer,
Canadian San Franciscan,
wants you to follow her on
YouTube … etc
Or do we?
● English speaking bi-coastal North American techies
● Breathe same air, mortal
● Distinctive fashion sense
● A shared appreciation for the Cheesecake Factory
● Whisky
● Neither of us is speaking on behalf of our employers today
Historical Perspective
● Quote from “The Good Girls Revolt”*
○ “Writers come to magazine over the transom,” he said, “and women aren’t coming. We can’t
do anything if they aren’t interested”
● And a similar quote from open source luminaries
○ “I don’t have any experience working with women in programming projects; I don’t think that
any volunteered to work on Emacs or GCC.” - RMS
*The Good Girls Revolt: How the Women of Newsweek Sued their Bosses and Changed the Workplace
by Lynn Povich
sheologian
Recent studies
GitHub 2017
“These researchers found that women’s coding suggestions was accepted 71.8% of the time
when their gender was kept a secret, but only 62.5% of the time when their gender was
revealed.”
“Only 3% of the 5500 randomly selected respondents were women. 25% of those women
reported being exposed to language or content that made them uncomfortable”
What have we done?
Pulled data from git, meetup, etc,
done some ML magic to infer gender and get stats
Used Jupyter!
Made some pretty(ish) pictures
What you can’t get from this?
● Causation. Which correlation ain’t.
● Legal advice
● Academic quality data
Quirky Confectioner
Lawyer cat
objects!
Data sources/Methods
● Git commits and messages
● Inferred gender
● Gender from human review
● Project websites
● Mailing lists
● You can see our work - http://bit.ly/holdendDiversityAnalyticsRepo
○ And contribute… hint hint…..
Melissa Wiese
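As a rough illustration of the first data source, here is a minimal sketch of counting commits per author. It assumes the log was exported beforehand with something like `git log --format='%an|%ae'` (author name and author email); the function name and the sample log are hypothetical and are not taken from the notebook.

```python
from collections import Counter


def count_commits_by_author(log_text):
    """Count commits per (name, email) from `git log --format='%an|%ae'` output."""
    counts = Counter()
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last "|" so a name that contains "|" still parses.
        name, _, email = line.rpartition("|")
        # Lower-case the email so the same address is not counted twice.
        counts[(name.strip(), email.strip().lower())] += 1
    return counts


# Hypothetical sample log, for illustration only.
sample_log = """\
Alice Example|alice@example.org
Bob Example|bob@example.org
Alice Example|Alice@example.org
"""

commit_counts = count_commits_by_author(sample_log)
for (name, email), n in commit_counts.most_common():
    print(f"{n:3d}  {name} <{email}>")
```

The per-author counts are the kind of table that gender inference or human review could then be joined against.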
Such Data
● ~50 projects
● ~30 GB of commits & posts
Human reviewed:
● Sampled down to ~1600 code contributors + all ~2600 committers
Andrey Belenko
Stage One: Eyeballing Jennifer Morrow
So what do ASF & Jupyter projects look like?
Wait what’s that tall bar?
fabien duplan
Some other things stand out quickly...
● Broad base of companies (maybe different kinds of diversity are correlated?)
● Easy to find community page
● Get involved link right on the home page
● Academic funding sources (NSF) + GSOC
Stage 2: Science John Floyd
What are some interesting project attributes?
● Does the project have a code of conduct?
● Does the project have a stated way for people to become committers?
● Does the project have a contributing guide?
● What’s the sentiment of the project’s user/dev list?
● PR acceptance rate
● Your ideas/suggestions - seriously e-mail us (and/or make PRs to the
notebook!)
j0035001-2
What about gender related attributes?
● Gender %s of code contributors
● Gender %s of mailing list users
● Gender %s of PMC / committers
● And correlations
charlene mcbride
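For readers who want to see what "correlations" means here, a minimal sketch of the Pearson correlation coefficient between a per-project non-male percentage and a yes/no project attribute. The per-project numbers are made up for illustration, and the function is a plain Python stand-in, not the code used in the notebook.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column has no defined correlation.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-project values, one entry per project.
nonmale_pct = [4.0, 9.5, 2.0, 12.0, 6.5]
has_code_of_conduct = [0, 1, 0, 1, 1]  # 1 = yes, 0 = no

r = pearson(nonmale_pct, has_code_of_conduct)
print(f"r = {r:.3f}")
```

With only ~50 projects, a coefficient near zero (or even a moderate one) says little on its own, which is one reason to treat these numbers as hints rather than findings.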
Slides for Correlations
[Row(corr(sampled.nonmale_percentage, infered.nonmale_percentage)=0.8402836506347078, corr(sampled.nonmale_percentage,
Answer_code_of_conduct_easy)=-0.05088697801152734, corr(infered.nonmale_percentage,
Answer_code_of_conduct_easy)=0.004552341326140643, corr(sampled.nonmale_percentage,
Answer_code_of_conduct_exists)=-0.05088697801152734, corr(infered.nonmale_percentage,
Answer_code_of_conduct_exists)=0.004552341326140643, corr(sampled.nonmale_percentage,
Answer_committer_guide_easy)=-0.30915940064845393, corr(infered.nonmale_percentage,
Answer_committer_guide_easy)=-0.0381086842740672, corr(sampled.nonmale_percentage,
Answer_committer_guide_exists)=-0.34084081419416784, corr(infered.nonmale_percentage,
Answer_committer_guide_exists)=-0.03831572641820849, corr(sampled.nonmale_percentage,
Answer_contributing_guide_easy)=0.00950903602820991, corr(infered.nonmale_percentage,
Answer_contributing_guide_easy)=0.04837014770606781, corr(sampled.nonmale_percentage,
Answer_contributing_guide_exists)=0.0202429856533326, corr(infered.nonmale_percentage,
Answer_contributing_guide_exists)=0.03636869585244893, corr(sampled.nonmale_percentage,
Answer_mentoring_guide_easy)=-0.15392301526227192, corr(infered.nonmale_percentage,
Answer_mentoring_guide_easy)=-0.055002597763866734, corr(sampled.nonmale_percentage,
Answer_mentoring_guide_exists)=-0.15392301526227192, corr(infered.nonmale_percentage,
Answer_mentoring_guide_exists)=-0.055002597763866734, corr(sampled.nonmale_percentage,
has_female_or_enby_committer_magic)=0.18942118337810188, corr(infered.nonmale_percentage,
has_female_or_enby_committer_magic)=0.20349367651041672, corr(sampled.nonmale_percentage,
nonmale_committer_percentage_magic)=0.5441035627011365, corr(infered.nonmale_percentage,
nonmale_committer_percentage_magic)=0.35402599653343864, corr(sampled.nonmale_percentage,
R. Crap Mariner
This wasn’t much better
+------------------------------------------------------------.....
|corr(sampled.nonmale_percentage, infered.nonmale_percentage)|corr(sampled.nonmale_percentage,
Answer_code_of_conduct_easy)|corr(infered.nonmale_percentage, Answer_code_of_conduct_easy)|corr(sampled.nonmale_percentage,
Answer_code_of_conduct_exists)|corr(infered.nonmale_percentage, Answer_code_of_conduct_exists)|corr(sampled.nonmale_percentage,
Answer_committer_guide_easy)|corr(infered.nonmale_percentage, Answer_committer_guide_easy)|corr(sampled.nonmale_percentage,
Answer_committer_guide_exists)|.....
| 0.8402836506347078| -0.05088697801152734|
0.004552341326140643| -0.05088697801152734| 0.004552341326140643|
-0.30915940064845393| -0.0381086842740672| -0.34084081419416784|
-0.03831572641820849| 0.00950903602820991| 0.04837014770606781|
0.0202429856533326| 0.03636869585244893| -0.15392301526227192|
-0.05500259776386...| -0.15392301526227192| -0.05500259776386...|
0.18942118337810188| 0.20349367651041672| 0.5441035627011365|
0.35402599653343864| 0.27903907421646745| -0.19842388895891314|
0.018343520672052215| -0.0531287316430999| -0.04570527792465824|
-0.11407965948006175| -0.02941906552049...| 0.010923839206653968|
-0.19651751264222414| -0.2121016705878764| -0.20639989813410967|
-0.21973083941480384| -0.31067113317726425| -0.15172448698670876|
-0.31736988968372776| -0.17906926611311288| 0.14828713581114333|
-0.28798744559651446| 0.540848408698061| -0.11571044537290899|
0.5044867286902844| -0.44725076538864206| 0.4935819383384438|
R. Crap Mariner
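One way to make output like this readable is to reshape the single wide row into one line per pair, sorted by strength. The sketch below does that in plain Python; the values are a subset copied from the Row output above, while the function name and the regular expression are assumptions made for illustration.

```python
import re

# Subset of the correlations shown above, keyed by the original column names.
wide_row = {
    "corr(sampled.nonmale_percentage, infered.nonmale_percentage)": 0.8402836506347078,
    "corr(sampled.nonmale_percentage, Answer_code_of_conduct_exists)": -0.05088697801152734,
    "corr(infered.nonmale_percentage, Answer_code_of_conduct_exists)": 0.004552341326140643,
    "corr(sampled.nonmale_percentage, Answer_committer_guide_exists)": -0.34084081419416784,
    "corr(infered.nonmale_percentage, Answer_committer_guide_exists)": -0.03831572641820849,
    "corr(sampled.nonmale_percentage, Answer_mentoring_guide_exists)": -0.15392301526227192,
    "corr(infered.nonmale_percentage, Answer_mentoring_guide_exists)": -0.055002597763866734,
    "corr(sampled.nonmale_percentage, nonmale_committer_percentage_magic)": 0.5441035627011365,
    "corr(infered.nonmale_percentage, nonmale_committer_percentage_magic)": 0.35402599653343864,
}

# Matches "corr(left, right)" and captures the two column names.
COLUMN_PATTERN = re.compile(r"corr\(\s*([^,]+?)\s*,\s*([^)]+?)\s*\)")


def to_long(row):
    """Turn {"corr(a, b)": value} into (a, b, value) tuples, strongest first."""
    records = []
    for name, value in row.items():
        match = COLUMN_PATTERN.fullmatch(name)
        if match is None:
            continue  # skip columns that are not correlations
        records.append((match.group(1), match.group(2), value))
    return sorted(records, key=lambda rec: abs(rec[2]), reverse=True)


long_rows = to_long(wide_row)
for left, right, value in long_rows:
    print(f"{left:<28} {right:<36} {value:+.3f}")
```

Sorting by absolute value puts the sampled-versus-inferred agreement and the committer percentage at the top, and makes the gap between the sampled and inferred columns easier to spot.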
Slides for Correlations
Inferred gender information | Sampled gender information
Barry Badcock
Oh howdy, there’s some differences….
● Maybe it’s from our data collection methods
● Inferred gender is also known to have issues, especially with non-American
names, non-cis folks, etc.
● Inferred sentiment detection maybe not great?
○ I just used nltk vader cause w/e
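To give a feel for what a lexicon-based sentiment score is, and why it may be "not great", here is a toy sketch. It is a deliberately simplified stand-in, not VADER and not the notebook's code: the lexicon, function names, and sample posts are all hypothetical, and unlike VADER it ignores negation, punctuation, and intensity.

```python
import re

# Tiny hypothetical lexicon; real tools ship far larger, weighted ones.
TOY_LEXICON = {
    "thanks": 1.0,
    "great": 1.0,
    "helpful": 1.0,
    "broken": -1.0,
    "wrong": -1.0,
    "useless": -1.0,
}


def toy_sentiment(text):
    """Average lexicon score over all tokens; 0.0 for empty text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(TOY_LEXICON.get(token, 0.0) for token in tokens) / len(tokens)


def mean_list_sentiment(messages):
    """Mean per-message score for one mailing list."""
    if not messages:
        return 0.0
    return sum(toy_sentiment(m) for m in messages) / len(messages)


# Hypothetical posts grouped by mailing list.
posts = {
    "dev": ["Thanks, great patch", "This is broken and wrong"],
    "user": ["Very helpful answer, thanks"],
}

for list_name, messages in posts.items():
    print(f"{list_name}: {mean_list_sentiment(messages):+.3f}")
```

A message such as "not helpful" scores positive here, which is the kind of error that makes per-list sentiment a noisy project attribute.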
How was the human data collection done?
Instructions:
Find the gender of the user in question. You can look at the e-mails sent in
response to them, but also feel free to search online to find other information
about the user (use the project information to disambiguate cases of multiple people
with the same name).
List additional links possibly about the user used (e.g. linkedin, twitter, etc.)
Provided with:
E-mails in response to user, project name, author name, and github name
(All depending on what could be found)
DocChewbacca
First look Khairil Zhafri
Sentiment of mailing lists J. Triepke
And the rest…. Hajime NAKANO
What about that inferred data?
Stage 3: Solutions to historical challenges
Remember the parallels in quotes? Maybe there are parallels in solutions?
● Short answer: hire women
○ In OSS we sometimes pretend we are not paid…. but a lot of us are.
● Longer answer: make training/mentorship programs to promote internal
candidates
○ Strangely enough, the existence of a mentoring program was negatively correlated
● Explicit “try-outs”
○ (or ways of hiring people who weren’t just friends)
● Not depending on randomly finding people
Nacho
Related work
● https://code.likeagirl.io/gender-bias-in-open-source-d1deda7dec28
● https://blog.bitergia.com/2016/10/11/gender-diversity-analysis-of-the-linux-kernel-technical-contributions/
● https://peerj.com/articles/cs-111/ (PR acceptance rates for women
insiders/outsiders)
● Livestreams of the data processing/collection -
http://bit.ly/holdenJupyterStreams
○ Did you know it’s perf season at Google? And Google is very metrics driven…. Also my
manager’s name is Steve.
Arthur Cruz
Special thanks!
Ann Spencer
Wrangler of cats and unicorns as the Head of Content at Domino Data Lab.
Formerly Data Editor at O'Reilly Media (aka Holden's editor).
Born and raised in San Francisco.
https://blog.dominodatalab.com/
Want to participate?
● New forum:
https://groups.google.com/forum/#!managemembers/oss-diversity-discussion
● Notebook code at https://github.com/holdenk/diversity-analytics /
http://bit.ly/holdendDiversityAnalyticsRepo
● Slides: https://www.slideshare.net/hkarau
● @holdenkarau & @instantmatthew
● And/or come say hi to us @ Strata
Melissa Wiese
High Performance Spark!
Unrelated to this talk. I’ll have a book signing @ 3:20pm at
the O’Reilly booth.
You can also buy it from that scrappy Seattle bookstore,
Jeff Bezos needs another newspaper and I want a cup of
coffee.
http://bit.ly/hkHighPerfSpark
Questions?

Jupyter con 2018 Diversity Analytics & OSS Adventures

  • 1. What things are correlated with gender diversity A data science stroll through the ASF and Jupyter projects By @holdenkarau & @instantmatthew
  • 2. What is this all about? ● Curiosity: few metrics on open source diversity exist ● Fun use of Jupyter, Spark, ML ● Pull requests welcome! Lori Erickson
  • 3. Who are you? we have nothing in common Me: smart, funny, straight, bald, New Yorker Holden: trans, queer, canadian San Franciscan, wants you to follow her on YouTube … etc
  • 4. Or do we? ● English speaking bi-coastal North American techies ● Breathe same air, mortal ● Distinctive fashion sense ● A shared appreciation for the Cheesecake Factory ● Whisky ● Neither of us are talking on behalf of our employers today
  • 5. Historical Perspective ● quote from “The Goods Girls Revolt” ○ “Writers come to magazine over the transom,” he said, “and women aren’t coming. We can’t do anything if they aren’t interested” ● And a similar quote from open source luminaries ○ “I don’t have any experience working with women in programming projects; I don’t think that any volunteered to work on Emacs or GCC.” - RMS *The Good Girls Revolt: How the Women of Newsweek Sued their Bosses and Changed the Workplace by Lynn Povich sheologian
  • 6. Recent studies GitHub 2017 “These researchers found that women’s coding suggestions was accepted 71.8% of the time when their gender was kept a secret, but only 62.5% of the time when their gender was revealed.” “Only 3% of the 5500 randomly selected respondents were women. 25% of those women reported being exposed to language or content that made them uncomfortable”
  • 7. What have we done? Pulled data from git, meetup, etc, done some ML magic to infer gender and get stats Used Jupyter! Made some pretty(ish) pictures
  • 8. What you can’t get from this? ● Causation. Which correlation ain’t. ● Legal advice ● Academic quality data Quirky Confectioner Lawyer cat objects!
  • 9. Data sources/Methods ● Git commits and messages ● Inferred gender ● Gender from human review ● Project websites ● Mailing lists ● You can see our work - http://bit.ly/holdendDiversityAnalyticsRepo ○ And contribute… hint hint….. Melissa Wiese
  • 10. Such Data ● ~50 projects ● ~30 GB of commits & posts Human reviewed: ● Sampled down to ~1600 code contributors + all ~2600 committers Andrey Belenko
  • 11. Stage One: Eyeballing Jennifer Morrow
  • 12. So what do ASF & Jupyter projects look like?
  • 13. Wait what’s that tall bar? fabien duplan
  • 14. Some other things stand out quickly... ● Broad base of companies (maybe different kinds of diversity or correlated)? ● Easy-to-find community page ● “Get involved” link right on the home page ● Academic funding sources (NSF) + GSOC
  • 15. Stage 2: Science John Floyd
  • 16. What are some interesting project attributes? ● Does the project have a code of conduct? ● Does the project have a stated way for people to become committers? ● Does the project have a contributing guide? ● What’s the sentiment of the project’s user/dev list? ● PR acceptance rate ● Your ideas/suggestions - seriously e-mail us (and/or make PRs to the notebook!) j0035001-2
  • 17. What about gender related attributes? ● Gender %s of code contributors ● Gender %s of mailing list users ● Gender %s of PMC / committers ● And correlations charlene mcbride
  • 18. Slides for Correlations
      [Row(
      corr(sampled.nonmale_percentage, infered.nonmale_percentage)=0.8402836506347078,
      corr(sampled.nonmale_percentage, Answer_code_of_conduct_easy)=-0.05088697801152734,
      corr(infered.nonmale_percentage, Answer_code_of_conduct_easy)=0.004552341326140643,
      corr(sampled.nonmale_percentage, Answer_code_of_conduct_exists)=-0.05088697801152734,
      corr(infered.nonmale_percentage, Answer_code_of_conduct_exists)=0.004552341326140643,
      corr(sampled.nonmale_percentage, Answer_committer_guide_easy)=-0.30915940064845393,
      corr(infered.nonmale_percentage, Answer_committer_guide_easy)=-0.0381086842740672,
      corr(sampled.nonmale_percentage, Answer_committer_guide_exists)=-0.34084081419416784,
      corr(infered.nonmale_percentage, Answer_committer_guide_exists)=-0.03831572641820849,
      corr(sampled.nonmale_percentage, Answer_contributing_guide_easy)=0.00950903602820991,
      corr(infered.nonmale_percentage, Answer_contributing_guide_easy)=0.04837014770606781,
      corr(sampled.nonmale_percentage, Answer_contributing_guide_exists)=0.0202429856533326,
      corr(infered.nonmale_percentage, Answer_contributing_guide_exists)=0.03636869585244893,
      corr(sampled.nonmale_percentage, Answer_mentoring_guide_easy)=-0.15392301526227192,
      corr(infered.nonmale_percentage, Answer_mentoring_guide_easy)=-0.055002597763866734,
      corr(sampled.nonmale_percentage, Answer_mentoring_guide_exists)=-0.15392301526227192,
      corr(infered.nonmale_percentage, Answer_mentoring_guide_exists)=-0.055002597763866734,
      corr(sampled.nonmale_percentage, has_female_or_enby_committer_magic)=0.18942118337810188,
      corr(infered.nonmale_percentage, has_female_or_enby_committer_magic)=0.20349367651041672,
      corr(sampled.nonmale_percentage, nonmale_committer_percentage_magic)=0.5441035627011365,
      corr(infered.nonmale_percentage, nonmale_committer_percentage_magic)=0.35402599653343864,
      corr(sampled.nonmale_percentage,
      R. Crap Mariner
  • 19. This wasn’t much better +------------------------------------------------------------..... |corr(sampled.nonmale_percentage, infered.nonmale_percentage)|corr(sampled.nonmale_percentage, Answer_code_of_conduct_easy)|corr(infered.nonmale_percentage, Answer_code_of_conduct_easy)|corr(sampled.nonmale_percentage, Answer_code_of_conduct_exists)|corr(infered.nonmale_percentage, Answer_code_of_conduct_exists)|corr(sampled.nonmale_percentage, Answer_committer_guide_easy)|corr(infered.nonmale_percentage, Answer_committer_guide_easy)|corr(sampled.nonmale_percentage, Answer_committer_guide_exists)|..... | 0.8402836506347078| -0.05088697801152734| 0.004552341326140643| -0.05088697801152734| 0.004552341326140643| -0.30915940064845393| -0.0381086842740672| -0.34084081419416784| -0.03831572641820849| 0.00950903602820991| 0.04837014770606781| 0.0202429856533326| 0.03636869585244893| -0.15392301526227192| -0.05500259776386...| -0.15392301526227192| -0.05500259776386...| 0.18942118337810188| 0.20349367651041672| 0.5441035627011365| 0.35402599653343864| 0.27903907421646745| -0.19842388895891314| 0.018343520672052215| -0.0531287316430999| -0.04570527792465824| -0.11407965948006175| -0.02941906552049...| 0.010923839206653968| -0.19651751264222414| -0.2121016705878764| -0.20639989813410967| -0.21973083941480384| -0.31067113317726425| -0.15172448698670876| -0.31736988968372776| -0.17906926611311288| 0.14828713581114333| -0.28798744559651446| 0.540848408698061| -0.11571044537290899| 0.5044867286902844| -0.44725076538864206| 0.4935819383384438| R. Crap Mariner
  • 20. Slides for Correlations Inferred gender information / Sampled gender information Barry Badcock
  • 21. Oh howdy, there’s some differences…. ● Maybe it’s from our data collection methods ● Inferred gender is also known to have issues, especially with non-American names, non-cis folks, etc. ● Inferred sentiment detection maybe not great? ○ I just used nltk vader cause w/e
  • 22. How was the human data collection done? Instructions: Find the gender of the user in question. You can look at the e-mails sent in response to them, but also feel free to search online to find other information about the user (use the project information to disambiguate cases of multiple people with the same name). List additional links possibly about the user used (e.g. linkedin, twitter, etc.) Provided with: E-mails in response to user, project name, author name, and github name (All depending on what could be found) DocChewbacca
  • 24. Sentiment of mailing lists J. Triepke
  • 25. And the rest…. Hajime NAKANO
  • 26. What about that inferred data?
  • 27. Stage 3: Solutions to historical challenges Remember the parallels in quotes? Maybe there are parallels in solutions? ● Short answer: hire women ○ In OSS we sometimes pretend we are not paid…. but a lot of us are. ● Longer answer: make training/mentorship programs to promote internal candidates ○ Strangely enough, the existence of mentoring programs was negatively correlated ● Explicit “try-outs” ○ (or ways of hiring people who weren’t just friends) ● Not depending on randomly finding people Nacho
  • 28. Related work ● https://code.likeagirl.io/gender-bias-in-open-source-d1deda7dec28 ● https://blog.bitergia.com/2016/10/11/gender-diversity-analysis-of-the-linux-kernel-technical-contributions/ ● https://peerj.com/articles/cs-111/ (PR acceptance rates for women insiders/outsiders) ● Livestreams of the data processing/collection - http://bit.ly/holdenJupyterStreams ○ Did you know it’s perf season at Google? And Google is very metrics driven…. Also my manager’s name is Steve. Arthur Cruz
  • 29. Special thanks! Ann Spencer Wrangler of cats and unicorns as the Head of Content at Domino Data Lab. Formerly Data Editor at O'Reilly Media (aka Holden's editor). Born and raised in San Francisco. https://blog.dominodatalab.com/
  • 30. Want to participate? ● New forum: https://groups.google.com/forum/#!managemembers/oss-diversity-discussion ● Notebook code at https://github.com/holdenk/diversity-analytics / http://bit.ly/holdendDiversityAnalyticsRepo ● Slides: https://www.slideshare.net/hkarau ● @holdenkarau & @instantmatthew ● And/or come say hi to us @ Strata Melissa Wiese
  • 31. High Performance Spark! Unrelated to this talk. I’ll have a book signing @ 3:20pm at the O’Reilly booth. You can also buy it from that scrappy Seattle bookstore, Jeff Bezos needs another newspaper and I want a cup of coffee. http://bit.ly/hkHighPerfSpark
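
For readers who want to poke at the numbers on slides 17 to 19: the `Row(corr(...)=...)` output there has the shape of a Spark `corr` aggregate, which computes a Pearson correlation between two columns. Below is a minimal plain-Python sketch of the same idea, not the notebook's actual code. The project names, gender labels, and the 0/1 `has_guide` flag are made-up toy data, and the `pearson` and `nonmale_percentage` helpers are hypothetical names introduced only for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / sqrt(var_x * var_y)


def nonmale_percentage(genders):
    """Percentage of contributors not labelled 'male', skipping 'unknown' labels."""
    known = [g for g in genders if g != "unknown"]
    if not known:
        return None
    return 100.0 * sum(1 for g in known if g != "male") / len(known)


# Hypothetical toy data (NOT from the talk):
# project -> (contributor gender labels, 1 if a committer guide exists else 0)
projects = {
    "proj_a": (["male", "female", "male", "male"], 1),
    "proj_b": (["male", "male", "enby", "female"], 0),
    "proj_c": (["male", "male", "male", "unknown"], 1),
    "proj_d": (["female", "male", "female", "male"], 0),
}

percentages = [nonmale_percentage(labels) for labels, _ in projects.values()]
has_guide = [float(flag) for _, flag in projects.values()]

r = pearson(percentages, has_guide)
print(f"corr(nonmale_percentage, committer_guide_exists) = {r:.3f}")
```

With only a handful of projects (or even ~50, as in the talk), a coefficient like this is noisy, and as slide 8 says, it is correlation, not causation.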