Software development is a largely collaborative effort, of which the actual encoding of program logic in source code is a relatively small part. Software developers have to collaborate effectively and communicate with their peers in order to avoid coordination problems. To date, little is known how developer communication during software development activities impacts the quality and evolution of a software.
In this thesis, we present and evaluate tools and techniques to recover communication data from traces of the software development activities. With this data, we study the impact of developer communication on the quality and evolution of the software through an in-depth investigation of the role of developer communication during software development activities. Through multiple case-studies on a broad spectrum of open-source software projects, we find that communication between developers stands in a direct relationship to the quality of the software. Our findings demonstrate that our models based on developer communication explain software defects as well as state-of-the art models that are based on technical information such as code and process metrics, and that social information metrics are orthogonal to these traditional metrics, leading to a more complete and integrated view on software defects. In addition, we find that communication between developers plays a important role in maintaining a healthy contribution management process, which is one of the key factors to the successful evolution of the software. Source code contributors who are part of the community surrounding open-source projects are available for limited times, and long communication times can lead to the loss of valuable contributions.
Our thesis illustrates that software development is an intricate and complex process that is strongly influenced by the social interactions between the stakeholders involved in the development activities. A traditional view based solely on technical aspects of software development such as source code size and complexity, while valuable, limits our understanding of software development activities. The research presented in this thesis consists of a first step towards gaining a more holistic view on software development activities.
Ph.D. Dissertation - Studying the Impact of Developer Communication on the Quality and Evolution of a Software
1. T H E I M PA C T O F D E V E L O P E R C O M M U N I C AT I O N
O N T H E Q U A L I T Y A N D E V O L U T I O N O F A
S O F T WA R E S Y S T E M
Nicolas Bettenburg
May 20th, 2014
Queen’s University
School of Computing
A T H E S I S P R E S E N T E D T O T H E S C H O O L O F C O M P U T I N G
I N C O N F O R M I T Y W I T H T H E R E Q U I R E M E N T S
F O R T H E D E G R E E O F P H I L O S O P H I C A L D O C T O R
2. !2
OUTLINEPRESENTATION
B E G I N
I
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
I I
I V
E N D
MOTIVATION
LITERATURE
REVIEW
ANALYSIS &
RESULTS
I I I
TOOLS
V
CONCLUSIONS
i
5. !4
MOTIVATIONRESEARCH
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
• Software Development is a highly social, collaborative process.
[Cataldo et al., CSCW 2006]
6. !4
MOTIVATIONRESEARCH
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
• Developers spend up to 50% of their time communicating.
[Perry et al., IEEE Software 1994]
• Software Development is a highly social, collaborative process.
[Cataldo et al., CSCW 2006]
7. !4
MOTIVATIONRESEARCH
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
• Developers spend up to 50% of their time communicating.
[Perry et al., IEEE Software 1994]
If communication plays such an integral part in developers’ work lives,
does it affect software development? How?
• Software Development is a highly social, collaborative process.
[Cataldo et al., CSCW 2006]
9. !5
MOTIVATIONRESEARCH
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
How can we put
communication data
in a relationship to
software quality?
10. !5
MOTIVATIONRESEARCH
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
How can we put
communication data
in a relationship to
software quality?
How can we put
communication data
in a relationship to
software evolution?
11. !5
MOTIVATIONRESEARCH
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
Where do we find
communication
data? How can we
mine communication
data ?
How can we put
communication data
in a relationship to
software quality?
How can we put
communication data
in a relationship to
software evolution?
12. !6
HYPOTHESISRESEARCH
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
“The communication between software developers plays a
key role in the quality and evolution of the software.”
13. !6
HYPOTHESISRESEARCH
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
“The communication between software developers plays a
key role in the quality and evolution of the software.”
Literature Review
What is already known
How that knowledge was gained
14. !6
HYPOTHESISRESEARCH
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
“The communication between software developers plays a
key role in the quality and evolution of the software.”
Literature Review
What is already known
How that knowledge was gained
Tools & Techniques
Mining Communication Data
Preparation for use in Experiments
15. !6
HYPOTHESISRESEARCH
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
“The communication between software developers plays a
key role in the quality and evolution of the software.”
Literature Review
What is already known
How that knowledge was gained
Tools & Techniques
Mining Communication Data
Preparation for use in Experiments
Software Quality
Internal Developers Communication
Discussing Bugs in the Software
16. !6
HYPOTHESISRESEARCH
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
“The communication between software developers plays a
key role in the quality and evolution of the software.”
Literature Review
What is already known
How that knowledge was gained
Tools & Techniques
Mining Communication Data
Preparation for use in Experiments
Software Quality
Internal Developers Communication
Discussing Bugs in the Software
Software Evolution
Communication between Developers
and External Contributors
Discussing Source Code Contributions
18. !8
REVIEWLITERATURE
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
Establish what is already known within the field of study about the relationships
between socio-technical information about the software, the software development
process, and the stakeholders, and software quality.
GOAL
• To determine what has already been investigated
• To gain a better understanding of the extraction of socio-technical information
• To identify challenges and open problems
20. !1 0Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
REVIEWLITERATURE
MAJOR FINDINGS
21. !1 0Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
REVIEWLITERATURE
MAJOR FINDINGS
• Social and Socio-Technical information can explain quality aspects
(software vulnerabilities) that technical information on its own cannot.
22. !1 0Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
REVIEWLITERATURE
MAJOR FINDINGS
• Social and Socio-Technical information can explain quality aspects
(software vulnerabilities) that technical information on its own cannot.
• Combinations of Social and Technical Information yield better
Models than using each source of information on its own.
23. !1 0Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
REVIEWLITERATURE
MAJOR FINDINGS
• Social and Socio-Technical information can explain quality aspects
(software vulnerabilities) that technical information on its own cannot.
• Socio-Technical Networks Together with Congruence Measures
provide empirical evidence that Conway’s Law does exist.
• Combinations of Social and Technical Information yield better
Models than using each source of information on its own.
25. !1 2
TOOLS AND TECHNIQUES
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
Off-The-Shelf Tools from Information Retrieval and Natural Language Processing
are not readily usable for mining communication data found in software repositories.
REALIZATION
26. !1 2
TOOLS AND TECHNIQUES
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
Off-The-Shelf Tools from Information Retrieval and Natural Language Processing
are not readily usable for mining communication data found in software repositories.
REALIZATION
• Emails and Chat messages are not newspaper articles!
27. !1 2
TOOLS AND TECHNIQUES
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
Off-The-Shelf Tools from Information Retrieval and Natural Language Processing
are not readily usable for mining communication data found in software repositories.
REALIZATION
• Emails and Chat messages are not newspaper articles!
• Unstructured data = mixtures of Natural Language text and Technical Information
28. !1 2
TOOLS AND TECHNIQUES
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
Off-The-Shelf Tools from Information Retrieval and Natural Language Processing
are not readily usable for mining communication data found in software repositories.
REALIZATION
• Emails and Chat messages are not newspaper articles!
• Unstructured data = mixtures of Natural Language text and Technical Information
• Automated processing of extracted communication data
29. !1 3Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
UNSTRUCTURED DATAMINING
From geek+@cmu.edu Wed Jan 21 08:11:26 1998
Date: Mon, 27 Jan 1997 12:50:44 -0500 (EST)
From: "Brian E. Gallew" <geek+@cmu.edu>
Subject: Re: [HACKERS] configure
!- ---559023410-851401618-854387445=:824
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII
!> If you can grab a copy and run it on your machine, and send me
> the output, that would help alot.
!Here is a gzip'ed tar of the results.
!!=====================================================================
| Please do not shoot at the thermonuclear weapons! -- Deacon |
=====================================================================
| Finger geek@andrew.cmu.edu for my public key. |
=====================================================================
!- ---559023410-851401618-854387445=:824
Content-Type: APPLICATION/x-gzip
Content-Transfer-Encoding: BASE64
Content-Description: m88k-dg-dgux5.4R3.10.tar.gz
!H4sIAHDq7DICA+xba3vaSLLOV/MrepzsGHiQuNomOJ4MdnDMrC8csB17HQ8W
UgM9FpJWFxsmyX8/Vd0tIYGwye5kP+w5fp4EaHW9XV1dXbdu6bY1ZCNV1/Qx
ffWD/sql0k6tRl4RUq6IT1KWn4R/L5UI2ansVirVSrlWhpZqqVZ5RUqv/gN/
gedrLiGvHNvzRy71VvUbUYu6mvnqv+zvNbkYM48MmUkJfGrEG1PTJJ7uMscn
/ljzCdcND75TAvIJTN8j9pDoXHECl2ZeE5960OgGFrEt6Ac43szz6YR4NpLN
!
30. !1 3Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
UNSTRUCTURED DATAMINING
From geek+@cmu.edu Wed Jan 21 08:11:26 1998
Date: Mon, 27 Jan 1997 12:50:44 -0500 (EST)
From: "Brian E. Gallew" <geek+@cmu.edu>
Subject: Re: [HACKERS] configure
!- ---559023410-851401618-854387445=:824
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII
!> If you can grab a copy and run it on your machine, and send me
> the output, that would help alot.
!Here is a gzip'ed tar of the results.
!!=====================================================================
| Please do not shoot at the thermonuclear weapons! -- Deacon |
=====================================================================
| Finger geek@andrew.cmu.edu for my public key. |
=====================================================================
!- ---559023410-851401618-854387445=:824
Content-Type: APPLICATION/x-gzip
Content-Transfer-Encoding: BASE64
Content-Description: m88k-dg-dgux5.4R3.10.tar.gz
!H4sIAHDq7DICA+xba3vaSLLOV/MrepzsGHiQuNomOJ4MdnDMrC8csB17HQ8W
UgM9FpJWFxsmyX8/Vd0tIYGwye5kP+w5fp4EaHW9XV1dXbdu6bY1ZCNV1/Qx
ffWD/sql0k6tRl4RUq6IT1KWn4R/L5UI2ansVirVSrlWhpZqqVZ5RUqv/gN/
gedrLiGvHNvzRy71VvUbUYu6mvnqv+zvNbkYM48MmUkJfGrEG1PTJJ7uMscn
/ljzCdcND75TAvIJTN8j9pDoXHECl2ZeE5960OgGFrEt6Ac43szz6YR4NpLN
!
31. !1 4Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
TECHNICAL INFORMATIONEXTRACTING
1
2
3
32. !1 5Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
COMMUNICATION DATA TO CODELINKING
Description
Comment 1
Depends on:
Blocks:
Show dependency tree
Attachments
Add an attachment (proposed patch, testcase, etc.)
Note
You need to log in before you can comment on or make changes to this bug.
Markus Keller 2004-11-30 13:27:58 EST
I20041130-0800
Wrong compiler error when interface overrides two methods with same signature
but different thrown exceptions: The call to ij.m() is OK, but eclipse flags it
with "Unhandled exception type IOException".
public class Over {
void x() throws ZipException {
IandJ ij= new K();
ij.m(); //wrong compile error
}
void y() throws ZipException {
K k= new K();
k.m();
}
}
interface I { void m() throws IOException; }
interface J { void m() throws ZipException; }
interface IandJ extends I, J {} // swap I and J to make compile error disappear
class K implements IandJ { public void m() throws ZipException { } }
Kent Johnson 2004-12-01 14:14:02 EST
This is not a MethodVerifier problem.
This error is thrown from FlowContext.checkExceptionHandlers()
X.java
Y.java
Over.java
ZipExcepti
on.java
IOExceptio
n.java
I.java J.javaK.java
33. !1 6
SUMMARYTOOLS AND TECHNIQUES
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
34. !1 6
SUMMARYTOOLS AND TECHNIQUES
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
Data Mining
Tools and
Techniques
for mining
communication
data from
software
repositories
35. !1 6
SUMMARYTOOLS AND TECHNIQUES
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
Data Mining
Tools and
Techniques
for mining
communication
data from
software
repositories
Data Extraction
Tools and
Techniques
for extracting
technical
information
data from
unstructured data
36. !1 6
SUMMARYTOOLS AND TECHNIQUES
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
Data Mining
Tools and
Techniques
for mining
communication
data from
software
repositories
Data Extraction
Tools and
Techniques
for extracting
technical
information
data from
unstructured data
Linking
Different
conceptual
approaches
for linking
communication
data to source
code
37. !1 6
SUMMARYTOOLS AND TECHNIQUES
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
Data Mining
Tools and
Techniques
for mining
communication
data from
software
repositories
Data Extraction
Tools and
Techniques
for extracting
technical
information
data from
unstructured data
Linking
Different
conceptual
approaches
for linking
communication
data to source
code
MUD Workshop
Advancing the
state of the art of
mining
unstructured data
for software
engineering
research
39. !1 8
SOFTWARE QUALITY
The Relationships Between Communication
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
And
PART 1
40. !1 9
OVERVIEWEXPERIMENTAL SETUP
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
DISCUSSION
TOOLS
METRICS
MODELPREDICTIONS
41. !1 9
OVERVIEWEXPERIMENTAL SETUP
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
DISCUSSION
TOOLS
METRICS
MODELPREDICTIONS
VARIABLES
42. !1 9
OVERVIEWEXPERIMENTAL SETUP
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
DISCUSSION
TOOLS
METRICS
MODELPREDICTIONS
VARIABLES
UNDERSTANDING
44. !2 0
OVERVIEWSOCIO-TECHNICAL METRICS
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
Socio-Technical
DIMENSIONS
Discussion Contents
• Source Code
• Patches
• Stack Traces
45. !2 0
OVERVIEWSOCIO-TECHNICAL METRICS
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
Socio-Technical
DIMENSIONS
Discussion Contents
• Source Code
• Patches
• Stack Traces
Communication Dynamics
• # Message
• Reply Time
• Length
• Subscribers
46. !2 0
OVERVIEWSOCIO-TECHNICAL METRICS
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
Socio-Technical
DIMENSIONS
Discussion Contents
• Source Code
• Patches
• Stack Traces
Social Structures
• # Discussion Participants
• Role in Project
• Reputation
• SNA Centrality
Communication Dynamics
• # Message
• Reply Time
• Length
• Subscribers
47. !2 0
OVERVIEWSOCIO-TECHNICAL METRICS
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
Socio-Technical
DIMENSIONS
Discussion Contents
• Source Code
• Patches
• Stack Traces
Social Structures
• # Discussion Participants
• Role in Project
• Reputation
• SNA Centrality
Communication Dynamics
• # Message
• Reply Time
• Length
• Subscribers
Workflow & Coordination
• (Re-)Assignments in
Lifecycle Model
49. !2 1
FINDINGSMAIN
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
• We can use Statistical Models for Prediction AND Understanding
50. !2 1
FINDINGSMAIN
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
• We can use Statistical Models for Prediction AND Understanding
• Communication Metrics can explain Software Defects as well as traditional
product and process metrics.
51. !2 1
FINDINGSMAIN
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
• We can use Statistical Models for Prediction AND Understanding
• Communication Metrics can explain Software Defects as well as traditional
product and process metrics.
• Communication Metrics do NOT describe the same information as traditional
product and process metrics.
52. !2 1
FINDINGSMAIN
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
• We can use Statistical Models for Prediction AND Understanding
• Communication Metrics can explain Software Defects as well as traditional
product and process metrics.
• Communication Metrics do NOT describe the same information as traditional
product and process metrics.
• Combinations of communication metrics with traditional metrics results
in more powerful models.
53. !2 2
SOFTWARE EVOLUTION
The Relationships Between Communication
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
And
PART 2
54. !2 3
SETUPEXPERIMENTAL
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
MAILING
LIST (LINUX)
TOOLS
METRICS
MODELS
VARIABLES
GERRIT
(ANDROID)
MANUAL ANALYSIS
LITERATURE
&
DOCUMENTATION
CONTRIBUTION
MANAGEMENT
MODEL
56. !2 4
FINDINGSMAIN
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
• External Contributors are available for communication only for a limited time.
57. !2 4
FINDINGSMAIN
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
• External Contributors are available for communication only for a limited time.
• Communication problems can lead to wasting efforts in implementing
undesired or already existing functionality.
58. !2 4
FINDINGSMAIN
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
• External Contributors are available for communication only for a limited time.
• Communication problems can lead to wasting efforts in implementing
undesired or already existing functionality.
• Delays in communicating review outcome are a key factor for abandoning
desirable code contributions.
61. !2 6
CONCLUSIONSSUMMARY
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
• Software Development has a SOCIAL SIDE, too!
62. !2 6
CONCLUSIONSSUMMARY
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
• Software Development has a SOCIAL SIDE, too!
• Socio-Technical Information is valuable and can help us gain a more
holistic understanding of software quality.
63. !2 6
CONCLUSIONSSUMMARY
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
• Software Development has a SOCIAL SIDE, too!
• Socio-Technical Information is valuable and can help us gain a more
holistic understanding of software quality.
• Communication delays found to be a strong explainer for
software defects.
64. !2 6
CONCLUSIONSSUMMARY
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
• Software Development has a SOCIAL SIDE, too!
• Socio-Technical Information is valuable and can help us gain a more
holistic understanding of software quality.
• Communication delays found to be a strong explainer for
software defects.
• Communication delays found to be a key factor for abandoning
code contributions.
65. !2 7
AHEADTHE ROAD
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
66. !2 7
AHEADTHE ROAD
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
• What other Socio-Technical Metrics exist?
67. !2 7
AHEADTHE ROAD
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
• What other Socio-Technical Metrics exist?
• How can we get them?
68. !2 7
AHEADTHE ROAD
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
• What other Socio-Technical Metrics exist?
• How can we get them?
• Measuring, Predicting, and Understanding not Enough!
69. !2 7
AHEADTHE ROAD
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
• What other Socio-Technical Metrics exist?
• How can we get them?
• Measuring, Predicting, and Understanding not Enough!
• How can we make social metrics actionable?
70. !2 7
AHEADTHE ROAD
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
• What other Socio-Technical Metrics exist?
• How can we get them?
• Measuring, Predicting, and Understanding not Enough!
• How can we make social metrics actionable?
• Looking at Repositories is not enough!
71. !2 7
AHEADTHE ROAD
Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
• What other Socio-Technical Metrics exist?
• How can we get them?
• Measuring, Predicting, and Understanding not Enough!
• How can we make social metrics actionable?
• Looking at Repositories is not enough!
• Go to Google, use Google Glass to record and document “water cooler chat”.
• Go to Mozilla and IBM, corroborate findings with the actual developers.
• Use voice recognition software to mine face-to-face communication data.
PROJECTSFUN STUDENT
72. ?B E G I N
I I I
I V
E N D
MOTIVATION
LITERATURE
REVIEW
ANALYSIS &
RESULTS
I I I
TOOLS
V
CONCLUSIONS
i
!