Analyzing Rich-Club Behavior
in Open Source Projects
OpenSym 2019, the 15th International Symposium on Open Collaboration
Skövde, Sweden
Mattia Gasparini (1), Javier Luis Cànovas Izquierdo (2), Robert Clarisò (2), Marco Brambilla (1), Jordi Cabot (2)
(1) Politecnico di Milano
(2) Universitat Oberta de la Catalunya
Introduction
• Use Git and GitHub data to analyze the evolution, success and management of Open Source Software.
• Define developers' behavioral patterns.
• Discover how collaborations between developers work.
Problem Statement
• Analysis of collaboration networks
• Commits, issues and pull requests as sources
• Discover the presence of specific collaboration structures: rich-clubs
Rich-club coefficient
• Graph structural property: the tendency of well-connected nodes (i.e., hubs) to interact with other well-connected nodes.
• Formulation:
$$\phi(k) = \frac{2E_k}{N_k(N_k - 1)} \qquad \rho(k) = \frac{\phi(k)}{\phi_{\mathrm{random}}(k)}$$
$E_k$: number of edges between nodes of degree greater than or equal to $k$
$N_k$: number of nodes with degree greater than or equal to $k$
$\phi(k)$: rich-club coefficient
$\rho(k)$: normalized rich-club coefficient
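The formulation above can be sketched in a few lines of plain Python. This is only an illustration, assuming the graph is given as a list of undirected edges with no self-loops or duplicates; the function name `rich_club_phi` and the toy graph are hypothetical and not taken from the talk.

```python
from collections import Counter

def rich_club_phi(edges, k):
    """phi(k) = 2*E_k / (N_k*(N_k - 1)), using nodes of degree >= k."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # N_k: nodes whose degree is greater than or equal to k
    club = {node for node, d in degree.items() if d >= k}
    n_k = len(club)
    if n_k < 2:
        return None  # phi(k) is undefined for fewer than two nodes
    # E_k: edges with both endpoints inside the club
    e_k = sum(1 for u, v in edges if u in club and v in club)
    return 2 * e_k / (n_k * (n_k - 1))

# Toy graph: a triangle of hubs (a, b, c), each with one low-degree neighbor.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("a", "x"), ("b", "y"), ("c", "z")]
print(rich_club_phi(toy_edges, 3))  # the three hubs form a clique
print(rich_club_phi(toy_edges, 1))  # all six nodes, sparsely connected
```

The normalized coefficient is then the ratio of this value to the same quantity computed on a randomized graph with the same degree sequence. Note that the NetworkX function counts nodes with degree strictly greater than k, so its keys are shifted by one with respect to the "greater than or equal" definition used here.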
Related Work
• Rich-club phenomenon for a specific project [2],
or for a single FLOSS community [3].
• Study of the presence of a rich-club effect
across the whole GitHub social network [4].
• Analysis of open source communities based on email exchanges among participants [5].
[2] Weifeng Pan, Bing Li, Yutao Ma, and Jing Liu. 2011. Multi-granularity evolution analysis of software using complex network theory.
[3] Guido Conaldi. 2010. Flat for the few, steep for the many: Structural cohesion and Rich-Club effect as measures of hierarchy and control in FLOSS communities.
[4] Antonio Lima, Luca Rossi, and Mirco Musolesi. 2014. Coding Together at Scale: GitHub as a Collaborative Social Network.
[5] Sergi Valverde and Ricard V. Solé. 2007. Self-organization versus hierarchy in open-source social networks.
Case Study
• Top 100 starred projects on GitHub in 2016
• 926K commits produced by 50K Git users
• 1.3M issue-related events generated by 118K GitHub users
• 280K pull-request-related events generated by 20K GitHub users
Analysis Pipeline
Data Collection & Preprocessing
• Commit data: clone the Git repositories using Gitana
• Issue and PR activity: query GitHub events from GHArchive
• Identity resolution: handle duplicity (different names for the same user) and clashing (the same name shared by different users)
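The editor's notes describe resolving duplicity and clashing by using the commit SHA to associate Git commits with GitHub users. A minimal sketch of that idea follows, assuming two hypothetical record lists of `(sha, name)` and `(sha, login)` pairs; the actual Gitana and GHArchive schemas are not reproduced here, and the function name is illustrative.

```python
def resolve_git_authors(git_commits, github_commits):
    """Map each Git author name to the set of GitHub logins seen for its commits.

    git_commits:    iterable of (sha, git_author_name) pairs (hypothetical layout)
    github_commits: iterable of (sha, github_login) pairs (hypothetical layout)
    """
    login_by_sha = dict(github_commits)
    logins_by_name = {}
    for sha, name in git_commits:
        login = login_by_sha.get(sha)  # None when the commit is not found on GitHub
        if login is not None:
            logins_by_name.setdefault(name, set()).add(login)
    return logins_by_name

# Hypothetical toy records.
git_commits = [("c1", "Ann"), ("c2", "ann@laptop"), ("c3", "Bob"), ("c4", "Bob")]
github_commits = [("c1", "ann-dev"), ("c2", "ann-dev"), ("c3", "bob1"), ("c4", "bob2")]
mapping = resolve_git_authors(git_commits, github_commits)

# Duplicity: two Git names resolve to the same login.
print(mapping["Ann"] == mapping["ann@laptop"])
# Clashing: one Git name resolves to two different logins.
print(sorted(mapping["Bob"]))
```

Using the resolved login rather than the Git author name as the node identity is one way to keep a single node per person in the graphs built later.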
Graphs Construction
• Definition of 4 undirected graphs:
a. PR graph
b. Commits graph
c. Issues graph
d. Supergraph (a + b + c)
• Nodes: users
• Edges connect a pair of users if
they interacted on the same
element (issue, PR, file)
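The construction above can be sketched as follows. This is an illustration only: the input layout (a mapping from element identifier to the users who touched it) and all names are assumptions, and the supergraph is taken here to be the plain unweighted union of the three edge sets, which the slides do not specify in detail.

```python
from itertools import combinations

def build_graph(participants_by_element):
    """Undirected edge set: two users are linked if they share an element."""
    edges = set()
    for users in participants_by_element.values():
        # Deduplicate users, then emit each pair once in sorted order so
        # that (u, v) and (v, u) collapse into the same edge.
        for u, v in combinations(sorted(set(users)), 2):
            edges.add((u, v))
    return edges

# Hypothetical toy data: element id -> users who interacted on it.
pr_graph = build_graph({"pr-1": ["ann", "bob"], "pr-2": ["bob", "cat"]})
commits_graph = build_graph({"src/main.py": ["ann", "cat", "ann"]})
issues_graph = build_graph({"issue-7": ["dan", "ann"]})

# Supergraph (a + b + c): union of the three edge sets.
supergraph = pr_graph | commits_graph | issues_graph
print(sorted(supergraph))
```

Storing edges as sorted pairs keeps the graph simple (no self-loops, no parallel edges), which is the form that rich-club computations generally expect.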
Graphs Example
[Figure: graphs for the Materialize project: (a) PR graph, (b) commits graph, (c) issues graph, (d) supergraph]
Rich-club Coefficient Calculation
• Calculation using the algorithm implementation included in NetworkX [6]
• Normalized coefficient $\rho(k)$: the rich-club effect is relevant if $\rho(k) > 1$
• Discard networks for which randomization fails
[6] https://networkx.github.io/documentation/stable/reference/algorithms/rich_club.html
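A hedged sketch of this step with NetworkX is shown below. `nx.rich_club_coefficient` and its `normalized`, `Q` and `seed` parameters are part of NetworkX; the wrapper name `normalized_rich_club`, the default values, and the choice of which exceptions count as a failed randomization are assumptions made for illustration, not the authors' pipeline.

```python
import networkx as nx

def normalized_rich_club(graph, swaps_per_edge=100, seed=0):
    """Return {k: rho(k)}, or None when the randomized baseline cannot be built."""
    # NetworkX rejects directed graphs, multigraphs and graphs with self-loops,
    # so work on a simple undirected copy.
    simple = nx.Graph(graph)
    simple.remove_edges_from(list(nx.selfloop_edges(simple)))
    try:
        return nx.rich_club_coefficient(
            simple, normalized=True, Q=swaps_per_edge, seed=seed
        )
    except (nx.NetworkXException, ZeroDivisionError):
        # Assumption: the edge-swap randomization failed (graph too small or no
        # valid swaps found), or the random baseline was zero for some k.
        return None

# A three-node path is too small to randomize, so it would be discarded.
print(normalized_rich_club(nx.path_graph(3)))
```

NetworkX defines the coefficient over nodes with degree strictly greater than k, so the key k in the returned dictionary corresponds to degree at least k + 1 in the notation of the slides.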
Rich-club Coefficient Results
• 60 projects have a defined coefficient for the supergraph.
• Each of these graphs presents a rich-club effect, since $\rho(k) > 1$ for some $k$.
Materialize [7]: Rich-Club Supergraph Coefficient
The maximum normalized coefficient (k = 49) corresponds to the maximum club effect, among nodes of degree at least 49.
[7] https://materializecss.com
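Reading the peak off the curve and listing the corresponding club members can be sketched as below. The function name and the toy values are hypothetical (they are not the Materialize measurements), and membership uses the "degree at least k" convention of the slides.

```python
def strongest_club(rho_by_k, degree_by_node):
    """Return (k*, members): the degree maximizing rho(k) and nodes of degree >= k*."""
    k_star = max(rho_by_k, key=rho_by_k.get)
    members = {node for node, d in degree_by_node.items() if d >= k_star}
    return k_star, members

# Hypothetical toy values, not measurements from the study.
rho_by_k = {1: 1.0, 2: 1.1, 3: 1.3, 4: 0.9}
degree_by_node = {"ann": 5, "bob": 3, "cat": 2, "dan": 1}
k_star, members = strongest_club(rho_by_k, degree_by_node)
print(k_star, sorted(members))
```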
Materialize: Supergraph (figure)
Swift [8]: Rich-Club Supergraph Coefficient
[8] https://swift.org/
Swift: Supergraph (figure)
Rich-club Coefficient Results
Further insights
Maximum coefficient distribution
• Distribution of the maximum rich-club coefficient for each type of graph across the studied projects.
• Mean value around 1 for the issues and commits graphs: weak rich-club presence.
• Mean value around 1.4 for the PR graphs: strong rich-club presence.
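The statistic summarized above (the mean, over projects, of each project's maximum normalized coefficient) can be sketched as follows. The function name and all numbers are hypothetical toy values chosen for illustration; they are not results from the study.

```python
from statistics import mean

def mean_max_rho(rho_by_project):
    """Average, over projects, of each project's maximum rho(k)."""
    maxima = [max(rho.values()) for rho in rho_by_project.values() if rho]
    return mean(maxima) if maxima else None

# Hypothetical toy values for two projects and two graph types.
pr_rho = {"proj-a": {1: 1.0, 2: 1.5}, "proj-b": {1: 1.25, 2: 1.0}}
issues_rho = {"proj-a": {1: 1.0, 2: 0.75}, "proj-b": {1: 1.0}}

print(mean_max_rho(pr_rho))      # mean of 1.5 and 1.25
print(mean_max_rho(issues_rho))  # mean of 1.0 and 1.0
```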
Multi-club users
• 25 of the 60 projects have a set of users belonging to multiple rich-clubs.
• Distribution of multi-club users across the 25 projects.
• Developers form a community with strong influence at each level of the project.
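One way to identify multi-club users is sketched below, under the assumption that "multiple rich-clubs" means the rich-clubs of different graphs (PR, commits, issues) of the same project; the slides do not spell this out, and the function name and memberships are hypothetical.

```python
from collections import Counter

def multi_club_users(clubs_by_graph):
    """Users who appear in the rich-club of at least two graphs of one project."""
    counts = Counter(
        user for club in clubs_by_graph.values() for user in set(club)
    )
    return {user for user, n in counts.items() if n >= 2}

# Hypothetical rich-club memberships for one project.
clubs = {"pr": {"ann", "bob"}, "commits": {"ann", "cat"}, "issues": {"dan"}}
print(sorted(multi_club_users(clubs)))
```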
Conclusions
First systematic evaluation of rich-club behavior in open source projects:
• 60% of the projects show rich-clubs in the supergraph, mostly with a slight effect.
• Rich-club behavior could undermine the open source paradigm, but the phenomenon requires further analysis.
• The strong rich-club presence in PR graphs may be due to the criticality of the activity.
• 25 of the 60 projects have users belonging to multiple rich-clubs.
Future Work
• Weighted rich-club coefficient
• Rich-club effect at module and ecosystem level
• Time dimension to highlight temporal clubs
Questions?

Editor's Notes

  1. GitHub is the most popular service to develop and maintain open source software. Each user interacts with many other users in the project development process (commits, issues, pr), defining collaboration networks. Studying collaboration networks helps in discovering properties and behaviors that influence development, management and success of an OSS project.
  2. Developers collaborate mostly with the same fixed subset of other important colleagues, instead of spreading the cooperation to each component of the team.
  3. Formally, it cab be measured by the so called rich-club coefficient ϕ(k). Intuitively, ϕ(k) measures how far the set of nodes with degree k is from being a complete subgraph. The value of ϕ(k) ranges from 0 (all nodes are disconnected) to 1 (a clique), with higher values showing a stronger rich-club behavior in the network. It is monotonically increasing even for random networks, so a normalized coefficient has been introduced in literature: ϕ(k) is divided by the coefficient calculated for a random network with same degree distribution of the original one.
  4. The presence or absence of rich-clubs in open source projects has not been studied in a systematic way, nor on a dataset as large as the one that GitHub can now provide.
  5. Clashing: the same name used by different users. Duplicity: different names for the same user. Solution: use the SHA value to associate git commits with GitHub users (if still present).
  6. Two users are connected in the PR graph if they commented/interacted on the same PR…
  7. The rich-club coefficient is calculated for each project’s supergraph to obtain a global view of the effect. The maximum value for each project is shown: each of the 60 graphs presents rich-club behavior, even if most of them have values only slightly higher than 1. For this reason, we want to better understand the correspondence between the coefficient and the actual graphs.
  8. The first example is the materialize repository: the rich-club coefficient is shown as a function of node degree. A rich-club behavior is visible for a range of degrees, with a peak at k=49, which should correspond to groups of nodes with degree at least 49 connected to each other.
  9. This seems to go against the open source paradigm: the project is “owned” by a few users. It was established in 2014 by a team of 4 developers and has 3,853 commits and 252 contributors. Nevertheless, the project only has two top contributors (more than 1,000 commits), who belong to the original team, and no frequent contributors.
  10. Mixed behavior: the coefficient is slightly above 1, then drops dramatically. The overall intuition is that the graph does not present rich-clubs.
  11. It was publicly announced by Apple in 2014 and was later open sourced in December 2015. Currently, the project has more than 84k commits and 674 contributors, with 14 top contributors (more than 1,000 commits) and 44 frequent contributors (between 100 and 1,000 commits). Remarkably, 4 of the top contributors and 21 of the frequent contributors do not belong to Apple according to their GitHub profile. This is a sign that the project has successfully attracted and retained external talent.
  12. This table presents the 10 projects with the highest coefficient for the supergraph. Along with them, the coefficient for the other kinds of graphs is calculated when possible. In fact, these other graphs can also «hide» other club structures.
  13. Maximum coefficient distribution for each kind of graph as a further insight. The blue line is the one already discussed. The green and orange lines show the maximum coefficient distribution for commits and issues: the density peaks at 1, meaning that most of the graphs do not present strong rich-clubs. The red line peaks around 1.4: most of the projects present evident rich-club structures. This behavior could be related to the fact that PR is the most critical level in open-source software development and a few trustworthy developers are in charge of most of the tasks.
  14. We also focused our attention on the users: 25 of the 60 projects (about 42%) have users that belong to multiple clubs. The distribution presents the number of users shared across all the projects’ clubs: this means that, on average, 7 developers are in the PR club, as well as in the commits and issues clubs. These developers form a sub-community inside the project that has strong influence at all the project’s levels.
  15. As the rich-club phenomenon is quite complex and its application to OSS communities is relatively new, plenty of further work can be done. First, we want to apply the weighted version of the coefficient to check if other patterns arise. Second, we want to extend the analysis to the module and ecosystem levels. Third, we want to introduce the time variable: in this work the graphs are built using the entire data as a 1-year snapshot, but it is possible to build monthly graphs and check if temporal clubs show up.
  16. With this, I have concluded the presentation. Thank you for the attention.
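
As a companion to notes 3 and 6, the following is a minimal sketch of how a PR graph and its rich-club coefficients could be computed. It is not the authors' implementation: the function names, the toy PR data, and the use of double edge swaps to obtain a random network with the same degrees are assumptions made for illustration. Following the slide definition, the club contains nodes with degree greater than or equal to k.

```python
import random
from itertools import combinations


def build_pr_graph(participants_by_pr):
    """Connect two users if they took part in the same pull request."""
    edges = set()
    for users in participants_by_pr.values():
        for u, v in combinations(sorted(set(users)), 2):
            edges.add((u, v))
    return sorted(edges)


def rich_club(edges, k):
    """phi(k) = 2 * E_k / (N_k * (N_k - 1)), over nodes with degree >= k."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    rich = {node for node, d in degree.items() if d >= k}
    if len(rich) < 2:
        return None  # phi(k) is undefined for fewer than two nodes
    e_k = sum(1 for u, v in edges if u in rich and v in rich)
    return 2 * e_k / (len(rich) * (len(rich) - 1))


def rewire(edges, n_swaps, seed=0):
    """Degree-preserving randomization via double edge swaps (assumed method)."""
    rng = random.Random(seed)
    edge_list = list(edges)
    if len(edge_list) < 2:
        return edge_list
    present = {frozenset(e) for e in edge_list}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # pick one of the two possible swap orientations
        # Proposed swap: (a, b), (c, d) -> (a, d), (c, b)
        if len({a, b, c, d}) < 4:
            continue  # edges share a node: skip to avoid self-loops
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # swap would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_list


def normalized_rich_club(edges, k, n_random=20, seed=0):
    """rho(k) = phi(k) / mean phi(k) over degree-preserving random graphs."""
    phi = rich_club(edges, k)
    if phi is None:
        return None
    samples = [
        rich_club(rewire(edges, 10 * len(edges), seed + r), k)
        for r in range(n_random)
    ]
    phi_random = sum(samples) / len(samples)
    return phi / phi_random if phi_random > 0 else None


# Hypothetical PR participation data: PR number -> users who interacted on it.
prs = {101: ["ana", "bo", "cy"], 102: ["ana", "bo", "cy", "di"], 103: ["di", "ed"]}
edges = build_pr_graph(prs)
phi_3 = rich_club(edges, 3)
rho_3 = normalized_rich_club(edges, 3)
```

In this toy graph the four users with degree at least 3 form a clique, so `phi_3` is 1.0. The degree sequence admits no other simple graph, so every swap is rejected and `rho_3` is also 1.0: a reminder that a high ϕ(k) alone does not indicate a rich-club, which is why the normalized coefficient is used. Libraries such as NetworkX provide a rich-club function as well, but their degree threshold convention may differ from the one on the slide.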