These are the slides of my ICSME 2016 keynote, presented on 5 October 2016 in Raleigh, North Carolina. I focus on the difficulties of maintaining and evolving software ecosystems, large collections of interacting software components that are maintained by a large and active community of contributors and that evolve together in the same environment. Software ecosystems are becoming ubiquitous due to the omnipresence of open source software. I present several problems that arise during maintenance and evolution of software ecosystems, and I argue how some of these challenges should be addressed by adopting a socio-technical view and by relying on a multidisciplinary and mixed methods research approach. I illustrate this with examples of social network analysis, complex systems research, ecological biodiversity, and survival analysis.
7. Research Context
2012-2017 ongoing research project
“Ecological Studies of Open Source Software Ecosystems”
- Interdisciplinary research
- Use ideas from biological ecology to understand and
improve evolution of software ecosystems
A software ecosystem is a collection
of software projects that are
developed and evolve together in
the same environment.
Mircea Lungu
(PhD, 2008)
11. CRAN
• Increasing number of R packages hosted on GitHub
“non-transparent nature of the CRAN submission / rejection
process”
“CRAN […] is revealing some limitations of the current design. One
such problem is the general lack of dependency versioning in the
infrastructure.”
• Problems with breaking dependencies
“It is more and more of a pain if the package I’m depending on
breaks”
“One recent example was the forced roll-back of the ggplot2
update to version 0.9.0, because the introduced changes caused
several other packages to break.”
Decan et al. “When GitHub Meets CRAN: An Analysis of Inter-Repository Package
Dependency Problems.” SANER 2016
15. • Package leftpad
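function leftpad (str, len, ch) {
  str = String(str);
  var i = -1;
  // default to a space as pad character, but accept 0 as a valid one
  if (!ch && ch !== 0) ch = ' ';
  len = len - str.length;
  // prepend the pad character until the requested length is reached
  while (++i < len) { str = ch + str; }
  return str;
}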
• What happened?
– Its developer unpublished all his modules from npm
“This impacted many thousands of projects. [...] We
began observing hundreds of failures per minute, as
dependent projects – and their dependents, and their
dependents... – all failed when requesting the now-
unpublished package.”
http://blog.npmjs.org/post/141577284765/kik-left-pad-and-npm
Example: leftpad
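The cascade described in the npm quote ("dependent projects – and their dependents") can be sketched as a reverse reachability search over a dependency graph. This is only an illustration under assumptions: apart from leftpad, all package names are made up, and `transitiveDependents` is a hypothetical helper, not part of npm.

```javascript
// Hypothetical dependency map: package -> packages it depends on.
// Only "leftpad" comes from the slide; the other names are invented.
const dependsOn = {
  leftpad: [],
  "line-numbers": ["leftpad"],
  "cli-tool": ["line-numbers"],
  "web-app": ["cli-tool", "logger"],
  logger: [],
};

// Return every package that directly or transitively depends on `target`.
function transitiveDependents(graph, target) {
  // Invert the edges: package -> packages that depend on it.
  const dependents = {};
  for (const [pkg, deps] of Object.entries(graph)) {
    for (const dep of deps) {
      (dependents[dep] = dependents[dep] || []).push(pkg);
    }
  }
  // Breadth-first search over the inverted edges.
  const affected = new Set();
  const queue = [target];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const pkg of dependents[current] || []) {
      if (!affected.has(pkg)) {
        affected.add(pkg);
        queue.push(pkg);
      }
    }
  }
  return [...affected].sort();
}

// Packages that would break if "leftpad" were unpublished.
console.log(transitiveDependents(dependsOn, "leftpad"));
```

The size of the returned set is one possible proxy for how critical a single package is to the rest of the ecosystem.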
16. Departure of a
central contributor
• All bug handling became concentrated in 1 contributor
• Contributor suddenly left project, being dissatisfied
• Lasting negative impact on bug handling performance
Zanetti et al. “The rise and fall of a central contributor: Dynamics of social
organization and performance in the Gentoo community.” CHASE 2013
17. Strict policy and tools for ensuring
backward compatibility
• “Prime Directive: When evolving the Component API
from release to release, do not break existing clients”
Bogart et al. “How to break an API: Cost negotiation and community values in
three software ecosystems.” FSE 2016
18. May lead to stagnation
and drive away developers
– Coordination around synchronized yearly releases
slows down development
“If you have hip things, then you get people who create new
APIs on top of that […] These things don’t happen on the
Eclipse platform anymore.”
“you have to be very patient and know who to talk with […] in
order to get your patches accepted, and I think it’s very
intimidating for some new people to come on.”
Bogart et al. “How to break an API: Cost negotiation and community values in
three software ecosystems.” FSE 2016
19. Socio-Technical View
• Software ecosystems suffer
from problems because of
technical factors, social
reasons, or both.
• A socio-technical view
is therefore essential for
software ecosystem evolution
research.
20. Socio-Technical View
• Socio-technical analyses can benefit from
mixed method research
– Combine quantitative and qualitative methods
into a single study
• Empirical analysis of objective data
• user surveys and interviews
– Exploiting their complementarity increases
confidence of the findings
Johnson et al. Mixed methods research: A research paradigm whose time has
come. Educational Researcher 33(7): 14–26, 2004
21. Software Ecosystem (SECO)
Research Challenges
Understanding SECOs
• How are SECOs structured?
• What are their tools, habits, values, boundaries?
• How do they emerge and evolve over time?
• What are the mechanisms driving their dynamics?
• How do different SECOs compare?
• How to face technical challenges?
Serebrenik et al. “Challenges in Software Ecosystems Research.” IWSECO-WEA 2015
22. Software Ecosystem
Research Challenges
Supporting SECO communities
• How can they be made more sustainable and
resilient?
• How can we predict their evolution?
• How can we improve the SECO?
– In terms of productivity, quality, diversity,
maintainability, survival, popularity, …
Serebrenik et al. “Challenges in Software Ecosystems Research.” IWSECO-WEA 2015
23. Supporting SECOs
Increasing resilience & sustainability
Can the SECO
• resist major disturbances?
• return to a stable equilibrium after a major
disturbance?
Possible approach:
• Estimate, predict and reduce the risk posed by a low bus factor
24. Bus factor
Social view
Specific activity concentrated in few persons.
Examples:
– A single person responsible for bug handling in Gentoo
– Only one developer knows some part of the code
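A social bus factor can be approximated with a simple greedy heuristic: keep removing the contributor who covers the most files until more than half of the files have nobody left. The sketch below is a simplification for illustration only; it is neither the algorithm of Cosentino et al. nor that of Avelino et al., and the data, the 50% threshold and the name `estimateBusFactor` are assumptions.

```javascript
// Hypothetical knowledge map: file -> developers able to maintain it.
const knowledge = {
  "bugs/triage.js": ["alice"],
  "bugs/assign.js": ["alice"],
  "bugs/close.js": ["alice"],
  "docs/readme.md": ["bob", "carol"],
};

// Greedy estimate: repeatedly remove the developer who covers the most files
// until more than `threshold` of the files have no developer left.
function estimateBusFactor(fileOwners, threshold = 0.5) {
  const files = Object.keys(fileOwners);
  const remaining = new Map(files.map((f) => [f, new Set(fileOwners[f])]));
  const orphanedShare = () =>
    files.filter((f) => remaining.get(f).size === 0).length / files.length;

  let removed = 0;
  while (files.length > 0 && orphanedShare() <= threshold) {
    // Count how many files each remaining developer covers.
    const coverage = new Map();
    for (const owners of remaining.values()) {
      for (const dev of owners) coverage.set(dev, (coverage.get(dev) || 0) + 1);
    }
    if (coverage.size === 0) break; // nobody left to remove
    const [topDev] = [...coverage.entries()].sort((a, b) => b[1] - a[1])[0];
    for (const owners of remaining.values()) owners.delete(topDev);
    removed += 1;
  }
  return removed;
}

console.log(estimateBusFactor(knowledge));
```

In this made-up example a single departure already orphans three of the four files, which mirrors the Gentoo situation where one person handled the bugs.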
25. Bus factor
Technical view
Too many software components depend on a
single software component.
– Makes components more brittle to future changes
– npm leftpad example
26. Bus factor
Active area of research
At least 4 GitHub projects compute (social) bus
factor.
Cosentino et al. “Assessing the bus factor of Git repositories.”
SANER 2015
Avelino et al. “A novel approach for estimating truck factors.”
ICPC 2016
29. Supporting SECOs
Improving quality
By increasing technical wealth
through reducing technical debt
“a concept in programming that reflects the extra
development work that arises when code that is
easy to implement in the short run is used instead
of applying the best overall solution”
(Ward Cunningham, 1992)
http://legacycoderocks.libsyn.com/technical-wealth-with-declan-wheelan
32. Supporting SECOs
Improving quality
Reducing social debt by removing community smells
– Organisational silo
• High decoupling and lack of communication between tasks
– Black cloud
• lack of people able to bridge the knowledge and experience gap
between distinct communities
– Prima-donnas
• Seemingly condescending and egotistical behaviour, irreceptiveness to
collaboration
– Sharing villainy
• Lack of knowledge exchange incentives
– Organisational skirmish
• Misalignment of organisational cultures between distinct communities
– …
33. Interdisciplinary research
“Many challenges we face are not solvable by people
remaining in their single discipline silos”…
www.newscientist.com/article/mg20928002-100-open-your-mind-to-interdisciplinary-research/
36. Social Network Analysis
Social network centrality measures
Degree
Number of in- or outgoing dependencies of a node.
Betweenness
Quantifies number of times a node acts as a bridge along the
shortest path between two other nodes.
Closeness
The more central a node, the lower its total distance from all
other nodes.
Eigenvector centrality and PageRank
Measures the influence of a node in a network.
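Two of these measures, degree and closeness, are easy to sketch on a small undirected network. The network below is hypothetical, and closeness is computed here as the number of other reachable nodes divided by the sum of the distances to them (one common normalisation, chosen as an assumption).

```javascript
// Hypothetical undirected collaboration network as an adjacency list.
const network = {
  ann: ["bob", "cat", "dan"],
  bob: ["ann"],
  cat: ["ann"],
  dan: ["ann", "eve"],
  eve: ["dan"],
};

// Degree: number of direct neighbours of a node.
function degree(graph, node) {
  return graph[node].length;
}

// Shortest-path distances (in hops) from `source`, by breadth-first search.
function distancesFrom(graph, source) {
  const dist = { [source]: 0 };
  const queue = [source];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const next of graph[current]) {
      if (!(next in dist)) {
        dist[next] = dist[current] + 1;
        queue.push(next);
      }
    }
  }
  return dist;
}

// Closeness: (number of other reachable nodes) / (sum of distances to them).
function closeness(graph, node) {
  const dist = distancesFrom(graph, node);
  const others = Object.keys(dist).filter((n) => n !== node);
  const total = others.reduce((sum, n) => sum + dist[n], 0);
  return total === 0 ? 0 : others.length / total;
}

for (const node of Object.keys(network)) {
  console.log(node, degree(network, node), closeness(network, node).toFixed(2));
}
```

Betweenness, eigenvector centrality and PageRank need more machinery and are left out of this sketch.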
38. Social Network Analysis
Can be used to
– detect social debt
– identify social bus factor
– predict software failures
– … and many more …
39. Social Network Analysis
Social bus factor in Gentoo Linux
– All bug handling became concentrated in one contributor
– Measured by significant increase of centralization and
performance.
Zanetti et al. “The rise and fall of a central contributor: Dynamics of social
organization and performance in the Gentoo community.” CHASE 2013
40. Social Network Analysis
Social bus factor in Gentoo Linux
– Contributor suddenly left the project, being
dissatisfied
– Sentiment analysis showed correlation with negative
emotions
– Lasting negative impact on the bug handling
performance of the community.
Zanetti et al. “The rise and fall of a central contributor: Dynamics of social
organization and performance in the Gentoo community.” CHASE 2013
41. Use of SNA to better predict software failures
– By combining program dependency information
with social network information
Social Network Analysis
Bird et al. “Putting it All Together: Using Socio-Technical Networks
to Predict Failures.” ISSRE 2009
Pinzger et al. “Can developer-module networks predict failures?”
FSE 2008
42. Mirroring hypothesis
Conway’s law
Software structure tends to mirror the
organisational/social structure
A.k.a. socio-technical congruence
alignment between technical dependencies and
social coordination in a project
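Socio-technical congruence can be illustrated as a ratio: of all developer pairs that should coordinate because their modules depend on each other, which fraction actually communicates? The sketch below is a strong simplification (one owner per module, hypothetical data and names) and not the full framework from the literature.

```javascript
// Hypothetical inputs: module owners, technical dependencies between modules,
// and observed communication between developers.
const moduleOwner = { parser: "ann", lexer: "bob", cli: "cat" };
const moduleDependencies = [
  ["parser", "lexer"],
  ["cli", "parser"],
  ["cli", "lexer"],
];
const communication = [
  ["ann", "bob"],
  ["ann", "cat"],
];

// Order-independent key for a pair of developers.
const pairKey = (a, b) => [a, b].sort().join("|");

function congruence(owners, dependencies, talks) {
  // Coordination needs: owner pairs implied by a technical dependency.
  const needed = new Set();
  for (const [from, to] of dependencies) {
    const a = owners[from];
    const b = owners[to];
    if (a !== b) needed.add(pairKey(a, b));
  }
  if (needed.size === 0) return 1; // nothing to coordinate
  const actual = new Set(talks.map(([a, b]) => pairKey(a, b)));
  const matched = [...needed].filter((pair) => actual.has(pair)).length;
  return matched / needed.size;
}

console.log(congruence(moduleOwner, moduleDependencies, communication));
```

A value of 1 would mean that the social structure fully mirrors the technical one; lower values indicate coordination gaps.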
43. Mirroring hypothesis
Conway’s law
• Evidence in favor: commercial “in-house” development
• Evidence against: “community-based” development
More modular software
=> emergent “complex network” structure?
MacCormack et al. “Exploring the duality between product and
organizational architectures: A test of the mirroring hypothesis.” Research
Policy, 2012.
Colfer et al. “The mirroring hypothesis: Theory, evidence and
exceptions.” Harvard Business School, 2010.
46. Interdisciplinary research
Complex Systems
“A new approach to science that investigates how
relationships between parts give rise to the
collective behaviors of a system and how the
system interacts and forms relationships with its
environment.”
Emergence: process whereby larger entities,
patterns, and regularities arise through interactions
among smaller or simpler entities that themselves
do not exhibit such properties.
47. Complexity Theory
Network Theory
Citation from Mitchell’s book:
“network thinking is providing novel ways to think
about difficult problems such as how to do efficient
search on the Web, […] how to manage large
organisations, how to preserve ecosystems, […]
and, more generally, what kind of resilience and
vulnerabilities are intrinsic to natural, social, and
technological networks, and how to exploit and
protect such systems.”
48. Complexity Theory
Network Theory
Some characteristics of complex networks:
Small-world property
• Low average path length between any two nodes
• Highly-clustered components linked through hubs
Skewed distributions (power law behaviour)
• Few nodes with very high in-degree (resp. out-degree),
many nodes with very small in-degree (resp. out)
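The two ingredients of the small-world property, short paths and high clustering, can be measured directly. The sketch below assumes an undirected graph given as an adjacency list; the example graph (two triangles joined through a hub) and all function names are hypothetical.

```javascript
// Hypothetical graph: two tightly knit clusters linked through a hub node.
const clusteredGraph = {
  a: ["b", "c", "hub"],
  b: ["a", "c"],
  c: ["a", "b"],
  hub: ["a", "x"],
  x: ["hub", "y", "z"],
  y: ["x", "z"],
  z: ["x", "y"],
};

// Hop distances from `source` to every reachable node.
function bfsDistances(g, source) {
  const dist = { [source]: 0 };
  const queue = [source];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const next of g[current]) {
      if (!(next in dist)) {
        dist[next] = dist[current] + 1;
        queue.push(next);
      }
    }
  }
  return dist;
}

// Mean shortest-path length over all ordered pairs of connected nodes.
function averagePathLength(g) {
  const nodes = Object.keys(g);
  let total = 0;
  let pairs = 0;
  for (const source of nodes) {
    const dist = bfsDistances(g, source);
    for (const target of nodes) {
      if (target !== source && target in dist) {
        total += dist[target];
        pairs += 1;
      }
    }
  }
  return pairs === 0 ? 0 : total / pairs;
}

// Fraction of a node's neighbour pairs that are themselves connected.
function localClustering(g, node) {
  const nbrs = g[node];
  if (nbrs.length < 2) return 0;
  let links = 0;
  for (let i = 0; i < nbrs.length; i++) {
    for (let j = i + 1; j < nbrs.length; j++) {
      if (g[nbrs[i]].includes(nbrs[j])) links += 1;
    }
  }
  return links / ((nbrs.length * (nbrs.length - 1)) / 2);
}

function averageClustering(g) {
  const nodes = Object.keys(g);
  return nodes.reduce((sum, n) => sum + localClustering(g, n), 0) / nodes.length;
}

console.log(averagePathLength(clusteredGraph), averageClustering(clusteredGraph));
```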
49. Complexity Theory
Network Theory
Some characteristics of complex networks:
Scale-freeness
• Observed degree distribution is very similar
regardless of the scale of the observation
Scale-free networks are resilient
• Robust to deletion of random (non-hub) nodes
• vulnerable to the deletion of hubs
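The difference between deleting a hub and deleting a peripheral node can be made concrete by measuring the largest connected component that survives the deletion. The tiny network and the function name below are assumptions chosen for illustration.

```javascript
// Hypothetical network: one hub and four peripheral nodes.
const net = {
  hub: ["p1", "p2", "p3", "p4"],
  p1: ["hub"],
  p2: ["hub"],
  p3: ["hub", "p4"],
  p4: ["hub", "p3"],
};

// Size of the largest connected component once `removed` is deleted.
function largestComponentAfterRemoval(g, removed) {
  const alive = Object.keys(g).filter((n) => n !== removed);
  const seen = new Set();
  let largest = 0;
  for (const start of alive) {
    if (seen.has(start)) continue;
    // Depth-first exploration of the component containing `start`.
    let size = 0;
    const stack = [start];
    seen.add(start);
    while (stack.length > 0) {
      const current = stack.pop();
      size += 1;
      for (const next of g[current]) {
        if (next !== removed && !seen.has(next)) {
          seen.add(next);
          stack.push(next);
        }
      }
    }
    largest = Math.max(largest, size);
  }
  return largest;
}

console.log("without p1: ", largestComponentAfterRemoval(net, "p1"));
console.log("without hub:", largestComponentAfterRemoval(net, "hub"));
```

Deleting the peripheral node leaves the rest connected, while deleting the hub fragments the network into small pieces.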
50. Complexity Theory
Network Theory
Examples of complex networks exhibiting these
characteristics
– World-Wide Web
– (Technical) software dependency graphs
– Social networks (e.g. Facebook)
– (Socio-technical) software ecosystems
52. Network Theory
Possible applications for SECOs
• Provide prediction/forecasting models
– of how SECOs emerge
– of how SECOs grow/evolve
• Estimate the resilience and sustainability of
SECOs after major disturbances
• Assess the risk of deleting hub nodes: bus factor!
53. Network Theory
Possible applications for SECOs
How do SECOs emerge and grow?
A popular model is preferential attachment
Over time, nodes with higher degree receive more links
than nodes with lower degree.
Extensions of this model have been proposed to
simulate the growth of complex software systems
By mimicking the principle of coupling & cohesion
Barabasi et al. Emergence of Scaling in Random Networks.
Science 286, 1999
Li et al. Multi-Level Formation of Complex Software Systems.
Entropy 18(178), 2016
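The basic preferential attachment mechanism can be simulated in a few lines: every new node links to one existing node, chosen with probability proportional to that node's current degree. This sketch covers only the basic mechanism with a single link per new node, not the extension of Li et al.; the seeded generator and all names are assumptions made to keep runs repeatable.

```javascript
// Small seeded pseudo-random generator (linear congruential) so that
// repeated runs give the same network. Not suitable for serious statistics.
function makeRandom(seed) {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// Grow a network node by node; each new node attaches to one existing node
// picked with probability proportional to its degree.
function growNetwork(nodeCount, random) {
  const edges = [[0, 1]];
  const endpoints = [0, 1]; // each node appears once per incident edge
  for (let node = 2; node < nodeCount; node++) {
    const target = endpoints[Math.floor(random() * endpoints.length)];
    edges.push([node, target]);
    endpoints.push(node, target);
  }
  return edges;
}

// Degree of every node in an edge list.
function degrees(edges) {
  const deg = new Map();
  for (const [a, b] of edges) {
    deg.set(a, (deg.get(a) || 0) + 1);
    deg.set(b, (deg.get(b) || 0) + 1);
  }
  return deg;
}

const grown = growNetwork(200, makeRandom(2016));
const sortedDegrees = [...degrees(grown).values()].sort((a, b) => b - a);
console.log("five highest degrees:", sortedDegrees.slice(0, 5));
```

Picking uniformly from the `endpoints` list is what makes the choice degree-proportional, since a node of degree d appears d times in it.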
56. Ecology and natural ecosystems
Biodiversity of species
E.g. hosts – parasites / plants – pollinators
Mutual dependency and
functional redundancy
Disappearing species may be
compensated by others if there is
sufficient diversity in both layers.
57. Ecology and natural ecosystems
Diversity metrics
• species richness = number of different species in the ecosystem
• species evenness (entropy) = relative abundance of the
population of each species in the ecosystem
• Shannon diversity index (relative entropy) = specialisation of a
given species in relation to the species in the other level
• Simpson index = degree of concentration when individuals are
classified into species
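These metrics can be computed from a table of abundances. The sketch below assumes the common textbook formulas (Shannon index H = -Σ p ln p, evenness H / ln S, Simpson index Σ p²), which are phrased somewhat differently from the descriptions above; the commit counts and the name `diversity` are hypothetical.

```javascript
// Diversity metrics for a map of species -> abundance.
function diversity(counts) {
  const values = Object.values(counts).filter((c) => c > 0);
  const total = values.reduce((sum, c) => sum + c, 0);
  const shares = values.map((c) => c / total);

  const richness = values.length; // number of species present
  const shannon = -shares.reduce((sum, p) => sum + p * Math.log(p), 0);
  const evenness = richness > 1 ? shannon / Math.log(richness) : 0;
  const simpson = shares.reduce((sum, p) => sum + p * p, 0); // concentration

  return { richness, shannon, evenness, simpson };
}

// Software analogy: contributors as "species", commits as abundance.
const commitsPerContributor = { ann: 120, bob: 15, cat: 10, dan: 5 };
console.log(diversity(commitsPerContributor));
```

A Simpson value close to 1 signals that activity is concentrated in very few contributors.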
58. Software Ecosystems
Diversity in software ecosystems
Mutual dependency and
functional redundancy
Disappearance of projects or
contributors may be
compensated by others.
59. Software Ecosystems
Diversity
Are software project teams diverse?
– In terms of code ownership, types of activity,
gender balance, seniority, …
How does this diversity affect …
– defect-proneness?
– productivity?
– …
60. Software Ecosystems
Diversity
Success story of diversity measures:
Assess defect-proneness in software projects
• More focused developers introduce fewer defects.
• Modules receiving narrowly focused activity
are more likely to contain defects.
Posnett et al. Dual Ecological Measures of Focus in Software development.
ICSE 2013
61. Software Ecosystems
Gender Diversity
Effect of gender diversity on productivity?
Women underrepresented in programming
– industry: 16-18% female developers
– open source: ~10%
– social coding platforms:
• GitHub: ~9%
• StackOverflow: ~7%
Vasilescu et al. Gender and tenure diversity in GitHub teams. CHI 2015
A Data Set for Social Diversity Studies of GitHub Teams (MSR’15)
62. Software Ecosystems
Gender Diversity
Success story of diversity measures:
– Gender and tenure diversity are positive and
significant predictors of productivity
– Teams that are more balanced in terms of gender
and seniority have higher productivity rates
Vasilescu et al. Gender and tenure diversity in GitHub teams. CHI 2015
63. Interdisciplinary research
Survival Analysis
Statistical technique used in many disciplines to
analyze the time until the occurrence of an
event of interest
• Medicine
– Effect of treatment or medicine to cure disease
– Effect of disease on patient mortality
• Sociology
– Factors influencing marriage or divorce
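A standard tool in survival analysis is the Kaplan-Meier estimator, which handles subjects that are still "alive" at the end of the observation period (censored). The following is a minimal sketch with made-up durations; the field names `duration` and `event` are assumptions.

```javascript
// Kaplan-Meier estimate of the survival function.
// Each observation has a duration and a flag telling whether the event of
// interest was observed (true) or the observation was censored (false).
function kaplanMeier(observations) {
  const eventTimes = [
    ...new Set(observations.filter((o) => o.event).map((o) => o.duration)),
  ].sort((a, b) => a - b);

  let survival = 1;
  const curve = [];
  for (const t of eventTimes) {
    // Subjects still under observation just before time t.
    const atRisk = observations.filter((o) => o.duration >= t).length;
    // Events occurring exactly at time t.
    const events = observations.filter((o) => o.event && o.duration === t).length;
    survival *= 1 - events / atRisk;
    curve.push({ time: t, survival });
  }
  return curve;
}

// Hypothetical projects: months of activity, and whether abandonment was seen.
const projects = [
  { duration: 6, event: true },
  { duration: 12, event: false },
  { duration: 12, event: true },
  { duration: 24, event: true },
  { duration: 30, event: false },
];
console.log(kaplanMeier(projects));
```

Censored projects still count as "at risk" up to their last observed month, which is what distinguishes this estimator from a naive survival percentage.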
65. Interdisciplinary research
Survival Analysis
Success story:
OSS project survival
Factors positively
influencing survival:
#contributors
Project age
Basis for prediction
models
Samoladas et al. Survival analysis on the duration of open source projects.
IST 2010
66. SECO Research Challenges
continued…
Understanding SECOs
• How do different SECOs compare?
• How to face technical challenges?
– Big data
– Privacy versus reproducibility
Serebrenik et al. “Challenges in Software Ecosystems Research.” IWSECO-WEA 2015
67. Research Challenge
Comparing SECOs
• Each software ecosystem
– has specific habits, expectations, change policies
– uses specific tools
• Taking into account these differences is
important
– to support SECO maintenance and evolution
– to generalise research findings across SECOs
Bogart et al. “How to break an API: Cost negotiation and community values in
three software ecosystems.” FSE 2016
Decan et al. “On the topology of package dependency networks – A
comparison of three programming language ecosystems.” WEA 2016
70. Research Challenge
Privacy vs reproducibility
How to preserve privacy of individuals?
– EU 2016/679 regulation on the protection of natural
persons with regard to the processing of personal data
and on the free movement of such data
“The principles of data protection should apply to any information
concerning an identified or identifiable natural person.”
– Appropriate anonymisation and privacy-preserving data
mining techniques needed
Fung et al. Privacy-preserving data publishing: A survey of recent
developments. ACM Computing Surveys 2010
Malik et al. Privacy preserving data mining techniques: Current scenario and
future prospects. IC3T 2012
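One privacy model commonly discussed in the privacy-preserving data publishing literature is k-anonymity: every combination of quasi-identifying attributes must be shared by at least k records. The sketch below only checks that property on a hypothetical contributor data set; the attribute names are assumptions, and passing such a check does not by itself make a data set anonymous in the legal sense.

```javascript
// Smallest group size when records are grouped by their quasi-identifiers.
// A data set is k-anonymous for every k up to the returned value.
function kAnonymity(records, quasiIdentifiers) {
  const groups = new Map();
  for (const record of records) {
    const key = JSON.stringify(quasiIdentifiers.map((q) => record[q]));
    groups.set(key, (groups.get(key) || 0) + 1);
  }
  return groups.size === 0 ? 0 : Math.min(...groups.values());
}

// Hypothetical, already generalised contributor data.
const contributors = [
  { country: "BE", tenure: "0-2y", commits: 14 },
  { country: "BE", tenure: "0-2y", commits: 230 },
  { country: "NL", tenure: "3-5y", commits: 57 },
  { country: "NL", tenure: "3-5y", commits: 8 },
  { country: "NL", tenure: "3-5y", commits: 91 },
];

console.log(kAnonymity(contributors, ["country", "tenure"]));
```

Adding a more precise attribute to the quasi-identifiers typically lowers k, which is the tension with reproducibility: the more detail is published, the easier individuals are to single out.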
71. Research Challenge
Privacy vs reproducibility
• Increase/ensure reproducible research results
– Awareness is increasing
– Solutions are being put into place
– Big data problems remain an issue
• How to reconcile privacy with reproducibility?
Gonzalez-Barahona et al. On the reproducibility of empirical software
engineering studies based on data retrieved from development repositories.
Emp. Softw. Eng. 2012
72. Wrap-up
Research on SECO evolution requires
– A socio-technical view
– Mixed method research
– Interdisciplinary research
Many technical challenges need to be faced
Are you willing to take up the challenge?
Editor's Notes
Put a picture of Belgium (comparing its size with the rest of Europe or the rest of the world), maybe with some nice picture of the important characteristics of Belgium (beer, frieten, mosselen, wafels, chocolade; kuifje, Magritte, …)
Locate Gent (place where I live), Aalst (where I was born), Brussels (where I studied), Mons and Charleroi (where I work) on this map and indicate the period (Brussels: 1989-1993 studies; 1993-1999 PhD; 2000-2003 postdoc; 2003-2016 prof at UMONS)
Put a timeline of my life indicating the main milestones and compare them with important milestones in CS and SE:
1970 birth (mention twin brother)
1988 studies at VUB
1993 PhD studies at VUB
1999 PhD obtained -postdoc started
2003 position at UMONS
2016 now
Talk about main research achievements/topics studied during my career:
1994 – 2004 foundations of OO programming, OO design patterns, refactoring
1998 – now : model-driven software engineering: software modeling (UML), graph transformation, model transformation, model refactoring, model-inconsistency management
2010- 2016 software ecosystems, empirical software engineering
Add a slide with all my research collaborators over time
- people whose PhD I have (co-)directed
Ragnhild Van Der Straeten (2005), Werner Van Belle (2003), Tom Tourwé (2002), Mathieu Goeminne (2013), Jorge Pinna Puissant (2012), Romuald Deshayes (2015), Maelick Claes (2016)
- people I have collaborated with
Alexander Serebrenik (2014), Bogdan Vasilescu (2014), Serge Demeyer (2002, 2005, 2014), Ekaterina Pek (2014), Hans Vandierendonck (2011-2012), Anthony Cleve (2010, 2014), Xavier Blanc (2009), Gabriele Taentzer (2005, 2007), Amnon Eden (2005, 2006), Pieter Van Gorp (2003, 2006), Dirk Janssens (2002, 2003)
All of these ecosystems are quite large, containing (tens of) thousands of different software components, with many interdependencies, an evolution history of many years, a large and active community of contributors.
Studying such software ecosystems can be quite challenging
Developing and maintaining components within these ecosystems can also be quite challenging.
CRAN only supports sequential version numbering, causing some developers to fork their own packages (e.g., ‘reshape’ to ‘reshape2’).
For JavaScript, we chose its NPM package manager (see www.npmjs.com).
isarray is downloaded >6M times a week, >25M times a month!
“In a lot of JavaScript environments, space is at a premium. [...] Several larger libraries like Underscore (and Lodash) have actually intentionally split themselves into sub-modules because people usually only ever load them to use a single merge function.”
The package leftpad essentially contains a few lines of source code but has thousands of dependent projects, including Node and Babel.
When its developer decided to unpublish all his modules from npm, this had important consequences, “almost breaking the internet”.
What happened?
- Everything started with the disagreement over a module name “kik”
Its developer unpublished *all* his 272 modules from npm, including leftpad
This caused thousands of dependent projects to break, including Node and Babel
The community stepped in within minutes to fix the problem.
Required NPM managers to go against their own policy by un-unpublishing the module
Gentoo, one of the open source Linux distributions, is another example of a popular ecosystem that has witnessed important problems during its evolution history. Zanetti et al. studied the social organisation structure, by analyzing the collaboration structure and dynamics of Gentoo’s bug tracking system over a period of ten years [16]. An increasing centralisation towards a single central contributor, followed by an unexpected departure of this contributor, caused a major disruption in the community’s bug handling performance. This case study reveals that, next to analyzing the technical aspects of an ecosystem (such as its package dependencies), it is equally important to address the social aspects.
M. S. Zanetti, I. Scholtes, C. J. Tessone, and F. Schweitzer, “The rise and fall of a central contributor: Dynamics of social organization and performance in the Gentoo community,” in Int’l Workshop on Cooperative and Human Aspects of Software Engineering, May 2013, pp. 49–56.
D. Garcia et al. The Role of Emotions in Contributors Activity: A Case Study of the Gentoo Community (SocialCom 2013)
C. Bogart, C. Kästner, J. Herbsleb, and F. Thung, “How to break an API: Cost negotiation and community values in three software ecosystems,” in Int’l Symp. Foundations of Software Engineering, 2016.
ADD INFO ABOUT PYPI SIZE
Software ecosystems are rarely studied as such
The social aspects of these ecosystems are very rarely studied
To understand the behaviour of an ecosystem, one must study the social behaviours that drive its evolution.
The picture only shows the relation between contributors and projects, but obviously there are also communication relations directly between the contributors, and dependency relations between the projects.
Mixed methods research is defined as “the class of research where the researcher combines quantitative and qualitative research methods or techniques into a single study”
Mechanisms driving the dynamics: Which mechanisms are favorable for their quality/evolution/popularity/survival?
How do SECOs compare? How can one generalise findings of one SECO to other SECOs?
Which aspects of a SECO are (domain-)specific and which are generic?
- Technical challenges: will be explained later on, if enough time available.
How can we better predict software failures?
How can we reduce the number of bugs?
Need for tool support… (prediction models, dashboards, …)
The bus factor is the number of key contributors who would need to be incapacitated (“get run over by a bus”) to make a project unable to proceed.
Experimental support on GitHub
https://libraries.io/bus-factor
- While technical debt has been studied for software systems, it of course makes sense to extend it to software ECOSYSTEMS as well.
Talk about “bad smells” as possible indicators of technical debt
Quote by Ward Cunningham in 1992: “Shipping first-time code is like going into debt. A little debt speeds development so long as it is paid back promptly with a rewrite. Objects make the cost of this transaction tolerable. The danger occurs when the debt is not repaid. Every minute spent on not-quite-right code counts as interest on that debt. Entire engineering organizations can be brought to a stand-still under the debt load of an unconsolidated implementation, object-oriented or otherwise.” The concept does not mean that debt should never be incurred. Just as leverage can help a company when used correctly, a quick solution can mean a faster time to market in software development. In addition, technical debt is not just poor code. Bad code is bad code, and technical debt can result from the work of good programmers under unrealistic project constraints.
Use the “community smell” of “organisational silo” as a transition to the next slide, to explain that members of the “research community” should not stay within their own silo either (their own specific research discipline), but should communicate and collaborate with (and learn from) researchers from other disciplines.
Challenge: More Interdisciplinary research
Talk about borrowing ideas from other disciplines
Examples:
(analogy with research inspired from social network science that has managed to provide interesting new results in …)
draw inspiration from biology => diversity metrics
draw inspiration from medicine => survival analysis studies
Talk about borrowing ideas from other disciplines
Examples:
social network analysis =>
Study by Pinzger et al. “Can developer-module networks predict failures?” => study on Windows Vista; using network centrality measures
Study by Bird et al. 117 citations in Google Scholar (Int. Symp. Software Reliability Engineering): method performs as well or better and is able to more accurately identify those software components that have more post-release failures, with precision and recall rates as high as 85%.
Using so-called “network centrality measures” like betweenness centrality, closeness centrality, eigenvector centrality, degree centrality
Preliminary study on Windows Vista and Eclipse
Tamburri claims that many of his “community smells”, that are indicators of social debt, could be detectable using social network analysis, i.e. by detecting specific patterns in the social network graph.
Social bus factor is probably related to a combination of high betweenness and low degree centrality.
Sentiment analysis was done based on messages sent to the gentoo-dev mailing list
Study by Pinzger et al. on Windows Vista; using network centrality measures
Study by Bird et al. (Int. Symp. Software Reliability Engineering): method performs as well or better and is able to more accurately identify those software components that have more post-release failures, with precision and recall rates as high as 85%.
Using so-called “network centrality measures” like betweenness centrality, closeness centrality, eigenvector centrality, degree centrality
Preliminary study on Windows Vista and Eclipse
See http://blog.graphcommons.com/analyzing-the-npm-dependency-network/
M. Cataldo, J. D. Herbsleb, and K. M. Carley. Socio-technical congruence: A framework for assessing the impact of technical and work dependencies on software development productivity. In Int’l Symp. Empirical Software Engineering and Measurement, pages 2–11. ACM , 2008.
Another evidence against can be found in the paper “Socio-Technical Congruence in the Ruby Ecosystem” by Syeed et al. in OpenSym 2014. (Based on an analysis of the Ruby software ecosystem.)
The behaviour of a complex system is more than the sum of its parts: the behaviour of the system as a whole cannot be understood by looking at the individual entities that compose it in isolation.
The concept of a small world was originally observed in the late 1960’s by the social psychologist Stanley Milgram.
- S. Milgram, “The Small World Problem,” Psychology Today, 2, 1967 pp. 60–67.
- J. Travers and S. Milgram, “An Experimental Study of the Small World Problem,” Sociometry, 32(4), 1969 pp. 425–443.
Robustness to deletion in the sense that it does not change the structural/topological properties of the network, which remains scale-free, small-world, and skewed distribution after the deletion…
The vulnerability to deletion of hub nodes could be linked easily to the aforementioned notions of technical and social bus factors. Hub nodes have a considerably higher bus factor, since the ecosystem/network is much more vulnerable to their deletion. This implies that managers of the (eco)system should take care to “protect” these hub nodes from getting deleted…
Several models have been proposed that lead to scale-free networks.
A popular model is “preferential attachment”.
The idea of preferential attachment was proposed in 1999 by Barabasi et al.
A.-L. Barabási, R. Albert. Emergence of scaling in random networks. Science 286, 1999, pp. 509-512.
Li et al. [8] proposed an extended model of preferential attachment adapted to software systems, and used it to simulate growth models that mimic the well-known design principle of low coupling and high cohesion. If software developers strive towards this principle, they will naturally obtain systems containing highly cohesive components that are loosely coupled, reminiscent of the hubs and clusters structure presented in Section 3.
While this growth mechanism seems plausible, other mechanisms have been proposed. It remains an open question which mechanism actually causes the scale-free networks we can observe.
Preferential attachment has been used in software evolution research by several authors:
Valverde et al. [20] suggest that the emergence of scaling arises from a logical optimisation process.
Myers et al. [15] proposed the process of refactoring to improve the structure of existing code as a possible explanation for the emergence of scale-free networks in software.
Inspired by Darwin’s ideas of evolutionary adaptation, Venkatasubramanian et al. proposed a generic model based on network parameters such as efficiency, robustness, cost, and environmental selection pressure [21]. Using a genetic algorithm their model was able to generate different types of network structures, depending on the chosen parameters.
Obtained topological network structures for varying values of the “coupling ratio” Λ, representing the possibility that a new edge connects two nodes in different modules when new nodes are added to the existing network. A larger value of Λ means a larger proportion of edges between nodes in different modules, i.e. nodes are more likely to connect to nodes in other modules. Conversely, a smaller value of Λ means a smaller proportion of edges between nodes in different modules, i.e. nodes are more likely to connect to nodes in the same module. Lower values of the coupling ratio (e.g. 0.1 for (a)) lead to a more modular network structure.
Talk about borrowing ideas from other disciplines
Examples:
social network analysis =>
Study by Pinzger et al. “Can developer-module networks predict failures?” => study on Windows Vista; using network centrality measures
Study by Bird et al. (Int. Symp. Software Reliability Engineering; 117 citations in Google Scholar): the method performs as well or better, and more accurately identifies the software components that have more post-release failures, with precision and recall rates as high as 85%.
Using so-called “network centrality measures” like betweenness centrality, closeness centrality, eigenvector centrality, degree centrality
Preliminary study on Windows Vista and Eclipse
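To make two of these centrality measures concrete, here is a minimal sketch in plain Python. The toy developer-module network is invented for illustration and is not data from the studies above; the graph is assumed to be undirected and connected.

```python
from collections import deque

def degree_centrality(graph):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(graph)
    return {v: len(neighbours) / (n - 1) for v, neighbours in graph.items()}

def closeness_centrality(graph):
    """Number of reachable nodes divided by the sum of shortest-path distances."""
    result = {}
    for source in graph:
        # Breadth-first search gives shortest distances in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in graph[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return result

# Hypothetical network: developers are linked to the modules they changed.
toy_graph = {
    "dev1": {"modA"},
    "modA": {"dev1", "dev2"},
    "dev2": {"modA", "modB"},
    "modB": {"dev2", "dev3"},
    "dev3": {"modB"},
}

if __name__ == "__main__":
    print(degree_centrality(toy_graph))
    print(closeness_centrality(toy_graph))
```

Here `dev2` bridges both modules and therefore gets the highest closeness score. Betweenness and eigenvector centrality need more machinery and are left out of this sketch.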
Ecological systems include a large number of bipartite relationships, such as host-parasitoid or plant-pollinator [15]. These relationships are also modeled as bipartite graphs in which nodes represent species and edges represent a specific kind of relationship. For example, figure 3.3 represents bee and flower species, as well as the pollinating relationships. Ecological bipartite networks are very robust to perturbations because of the large diversity [58] of species and the functional redundancy of species in the network [41]. In many cases, if a flower species disappears, most bee species that relied on it for pollen can find pollen in other species. The diversity of flower species increases the chances that the extinction of a particular species will not lead to the extinction of the others, while the functional redundancy increases the chances that bees can find similar pollen in other flowers.
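The role of functional redundancy can be sketched with a toy bipartite network. The species names and links below are invented for illustration; the function simply reports which bee species are left without any flower once some flower species disappear.

```python
def secondary_extinctions(pollinates, removed_flowers):
    """Return the bee species that lose all their flower species."""
    removed = set(removed_flowers)
    return sorted(
        bee for bee, flowers in pollinates.items()
        if flowers and not (set(flowers) - removed)
    )

# Hypothetical network with redundancy: every bee visits two flower species.
redundant = {
    "bee1": {"flower1", "flower2"},
    "bee2": {"flower2", "flower3"},
    "bee3": {"flower1", "flower3"},
}

# Hypothetical network without redundancy: every bee depends on one flower.
fragile = {
    "bee1": {"flower1"},
    "bee2": {"flower1"},
    "bee3": {"flower2"},
}

if __name__ == "__main__":
    print(secondary_extinctions(redundant, ["flower1"]))
    print(secondary_extinctions(fragile, ["flower1"]))
```

Removing the same flower species causes no secondary extinction in the redundant network, but takes two bee species with it in the fragile one.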
Mutual dependency and functional redundancy:
disappearance of one species may be compensated by other species
if there is sufficient diversity in both layers
Can be used to study resistance and resilience of natural ecosystems
By studying diversity of species
Based on species analogy
Contributors are species that thrive in their environment of projects
Projects are species that thrive in their environment of contributors (human resources)
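One way to quantify diversity under this species analogy is the Shannon index, a standard measure from ecology. The notes do not say which diversity metric is meant, so the choice of index and the commit counts below are assumptions made for illustration.

```python
import math

def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln p_i) over non-zero proportions."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)

def evenness(counts):
    """Pielou evenness: H divided by its maximum ln(S), S = number of species."""
    species = sum(1 for c in counts if c > 0)
    if species < 2:
        return 0.0
    return shannon_diversity(counts) / math.log(species)

# Hypothetical commit counts per contributor for two projects.
balanced_project = [25, 25, 25, 25]
dominated_project = [97, 1, 1, 1]

if __name__ == "__main__":
    print(shannon_diversity(balanced_project), evenness(balanced_project))
    print(shannon_diversity(dominated_project), evenness(dominated_project))
```

A project whose activity is spread evenly over its contributors scores higher than one dominated by a single contributor, even when both have the same number of contributors.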
Talk about borrowing ideas from other disciplines
Examples:
draw inspiration from biology => diversity metrics
draw inspiration from medicine => survival analysis studies
Projects with more contributors tend to survive longer
Projects that are older (i.e. more mature) are more likely to survive than younger projects. In the beginning, the survival curve goes down rapidly, then stabilises.
The application domain (project type) may play a role as well, but there is no statistically significant evidence for this.
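Survival curves of this kind are commonly estimated with the Kaplan-Meier estimator. The notes do not name the method used, so the sketch below is an assumption, and the project lifetimes are invented. A project that is still active at the end of the observation period is treated as censored (`observed` is False).

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival probability)] at each time with an event."""
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        time = data[i][0]
        events = 0
        leaving = 0
        # Group all projects that share the same duration.
        while i < len(data) and data[i][0] == time:
            if data[i][1]:
                events += 1
            leaving += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((time, survival))
        # Abandoned and censored projects both leave the risk set.
        at_risk -= leaving
    return curve

# Hypothetical lifetimes in months; False means still active (censored).
durations = [1, 2, 3, 4]
observed = [True, False, True, True]

if __name__ == "__main__":
    for time, probability in kaplan_meier(durations, observed):
        print(time, probability)
```

Comparing such curves between groups (for example projects with few versus many contributors) is how statements like the ones above are typically checked.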
Which mechanisms are favorable for their quality/evolution/popularity/survival?
Volume: need to store, analyse and manipulate huge quantities of data when studying software ecosystems (containing tens of thousands of components and dependencies, a huge number of commits, thousands of contributors, millions of lines of code, …)
Variety: need to deal with very heterogeneous data: structured data (e.g. programs); semi-structured (e.g. e-mails); unstructured (e.g. unformatted texts). The data comes from a wide variety of sources, including version control, issue trackers, mailing lists, Q&A, Twitter communication, surveys and interviews. Even screen capture software, video/audio recordings, photographs, and field notes of software developers collaborating in situ [Socha et al., “Wide-field ethnography: Studying software engineering in 2025 and beyond,” ICSE, 2016, pp. 797–802]
Veracity: Dealing with uncertain, inconsistent or missing data.
Velocity: new commits are made to GitHub several times every second. This may be less of an issue for empirical studies, in which the data is typically analyzed off-line. For automated tools that support the activities of a software ecosystem community (e.g. web-based dashboards), however, it may be important to rely on the most recent data in order to make informed decisions.
Therefore, appropriate techniques need to be developed and put into place to guarantee anonymity. Fung et al. presented a survey of research results and future directions in privacy-preserving data publishing [70]. Malik et al. provided an overview of privacy-preserving data mining tools and techniques, and proposed future research directions [71].
[70] B. C. M. Fung, K. Wang, R. Chen, and P. S. Yu, “Privacy-preserving data publishing: A survey of recent developments,” ACM Comput. Surv., vol. 42, no. 4, pp. 14:1–14:53, Jun. 2010.
[71] M. B. Malik, M. A. Ghazi, and R. Ali, “Privacy preserving data mining techniques: Current scenario and future prospects,” in Int’l Conf. Computer and Communication Technology, Nov. 2012, pp. 26–32.