On The Relation of Test Smells to
Software Code Quality
Seneca
Davide Spadini, Fabio Palomba,
Andy Zaidman, Magiel Bruntink, Alberto Bacchelli
@DavideSpadini ishepard
Refactoring Test Code
Arie van Deursen, Leon Moonen (CWI, The Netherlands) — Alex van den Bergh, Gerard Kok (Software Improvement Group, The Netherlands)
http://www.cwi.nl/~{arie,leon}/ — {arie,leon}@cwi.nl
http://www.software-improvers.com/ — {alex,gerard}@software-improvers.com
ABSTRACT
Two key aspects of extreme programming (XP) are unit testing and merciless refactoring. Given the fact that the ideal test code / production code ratio approaches 1:1, it is not surprising that unit tests are being refactored. We found that refactoring test code is different from refactoring production code in two ways: (1) there is a distinct set of bad smells involved, and (2) improving test code involves additional test-specific refactorings. To share our experiences with other XP practitioners, we describe a set of bad smells that indicate trouble in test code, and a collection of test refactorings to remove these smells.
Keywords
Refactoring, unit testing, extreme programming.
1 INTRODUCTION
“If there is a technique at the heart of extreme programming (XP), it is unit testing” [1]. As part of their programming activity, XP developers write and maintain (white box) unit tests continually. These tests are automated, written in the same programming language as the production code, considered an explicit part of the code, and put under revision control.
The XP process encourages writing a test class for every class in the system. Methods in these test classes are used to verify complicated functionality and unusual circumstances. Moreover, they are used to document code by explicitly indicating what the expected results of a method should be for typical cases. Last but not least, tests are added upon receiving a bug report to check for the bug and to check the bug fix [2]. A typical test for a particular method includes: (1) code to set up the fixture (the data used for testing), (2) the call of the method, (3) a comparison of the actual results with the expected values, and (4) code to tear down the fixture. Writing tests is usually supported by frameworks such as JUnit [3].
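As a concrete illustration of this four-phase structure, here is a minimal sketch in Python's xUnit-style unittest framework (the excerpt discusses JUnit; the ShoppingCart class and all names below are invented for illustration):

import unittest

class ShoppingCart:
    # Toy production class, defined here only so the example is runnable.
    def __init__(self):
        self.items = {}
    def add_item(self, name, price):
        self.items[name] = price
    def total_price(self):
        return sum(self.items.values())
    def clear(self):
        self.items.clear()

class ShoppingCartTest(unittest.TestCase):
    def setUp(self):
        # (1) set up the fixture: the data used for testing
        self.cart = ShoppingCart()
        self.cart.add_item("book", price=10.0)

    def test_total_price(self):
        # (2) the call of the method under test
        total = self.cart.total_price()
        # (3) comparison of the actual result with the expected value
        self.assertEqual(total, 10.0)

    def tearDown(self):
        # (4) tear down the fixture
        self.cart.clear()

if __name__ == "__main__":
    unittest.main()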
The test code / production code ratio may vary from project to project, but is ideally considered to approach a ratio of 1:1. In our project we currently have a 2:3 ratio, although others have reported a lower ratio¹. One of the cornerstones of XP is that having many tests available helps the developers to overcome their fear of change: the tests will provide immediate feedback if the system gets broken at a critical place. The downside of having many tests, however, is that changes in functionality will typically involve changes in the test code as well. The more test code we get, the more important it becomes that this test code is as easily modifiable as the production code.
The key XP practice to keep code flexible is “refactor mercilessly”: transforming the code in order to bring it into the simplest possible state. To support this, a catalog of “code smells” and a wide range of refactorings is available, varying from simple modifications up to ways to introduce design patterns systematically in existing code [5].
When trying to apply refactorings to the test code of our project we discovered that refactoring test code is different from refactoring production code. Test code has a distinct set of smells, dealing with the ways in which test cases are organized, how they are implemented, and how they interact with each other. Moreover, improving test code involves a mixture of refactorings from [5] specialized to test code improvements, as well as a set of additional refactorings, involving the modification of test classes, ways of grouping test cases, and so on.
The goal of this paper is to share our experience in improving our test code with other XP practitioners. To that end, we describe a set of test smells indicating trouble in test code, and a collection of test refactorings explaining how to overcome some of these problems through a simple program modification.
This paper assumes some familiarity with the xUnit framework [3] and refactorings as described by Fowler [5]. We will refer to refactorings described in this book using Name […]
¹ This project started a year ago and involves the development of a product called DocGen [4]. Development is done by a small team of five people using XP techniques. Code is written in Java and we use the JUnit […]
Test smells
Does Refactoring of Test Smells
Induce Fixing Flaky Tests?
Fabio Palomba and Andy Zaidman
Delft University of Technology, The Netherlands
f.palomba@tudelft.nl, a.e.zaidman@tudelft.nl
Abstract—Regression testing is a core activity that allows developers to ensure that source code changes do not introduce bugs. An important prerequisite then is that test cases are deterministic. However, this is not always the case, as some tests suffer from so-called flakiness. Flaky tests have serious consequences, as they can hide real bugs and increase software inspection costs. Existing research has focused on understanding the root causes of test flakiness and devising techniques to automatically fix flaky tests, a key area of investigation being concurrency. In this paper, we investigate the relationship between flaky tests and three previously defined test smells, namely Resource Optimism, Indirect Testing and Test Run War. We have set up a study involving 19,532 JUnit test methods belonging to 18 software systems. A key result of our investigation is that 54% of tests that are flaky contain a test code smell that can cause the flakiness. Moreover, we found that refactoring the test smells not only removed the design flaws, but also fixed all 54% of flaky tests causally co-occurring with test smells.
Index Terms—Test Smells; Flaky Tests; Refactoring;
I. INTRODUCTION
Test cases form the first line of defense against the introduction of software faults, especially when testing for regression faults [1], [2]. As such, with the help of testing frameworks […] but just flaky [19]. Perhaps most importantly, from a psychological point of view flaky tests can reduce a developer’s confidence in the tests, possibly leading to ignoring actual test failures [17]. Because of this, the research community has spent considerable effort on trying to understand the causes behind test flakiness [18], [20], [21], [22] and on devising automated techniques able to fix flaky tests [23], [24], [25]. However, most of this research has focused on specific causes that can lead to the introduction of flaky tests, such as concurrency [26], [25], [27] or test order dependency [22] issues, thus proposing ad-hoc solutions that cannot be used to fix flaky tests characterized by other root causes. Indeed, according to the findings by Luo et al. [18], who conducted an empirical study on the motivations behind test code flakiness, the problems faced by previous research represent only part of the whole story: a deeper analysis of possible fixing strategies for other root causes (e.g., flakiness due to wrong usage of external resources) is still missing.
In this paper, we aim to take a further step toward the comprehension of test flakiness by investigating the role of so-called test smells [28], [29], [30], i.e., poor design or implementation choices applied by programmers during the […]
Empir Software Eng (2015) 20:1052–1094
DOI 10.1007/s10664-014-9313-0
Are test smells really harmful? An empirical study
Gabriele Bavota · Abdallah Qusef · Rocco Oliveto · Andrea De Lucia · Dave Binkley
Published online: 31 May 2014
© Springer Science+Business Media New York 2014
Abstract Bad code smells have been defined as indicators of potential problems in source code. Techniques to identify and mitigate bad code smells have been proposed and studied. Recently, bad test code smells (test smells for short) have been put forward as a kind of bad code smell specific to tests, such as unit tests. What has been missing is empirical investigation into the prevalence and impact of bad test code smells. Two studies aimed at providing this missing empirical data are presented. The first study finds that there is a high diffusion of test smells in both open source and industrial software systems, with 86% of JUnit tests exhibiting at least one test smell and six tests having six distinct test smells. The second study provides evidence that test smells have a strong negative impact on program comprehension and maintenance. Highlights from this second study include the finding that comprehension is 30% better in the absence of test smells.
On The Relation of Test Smells to Software Code Quality
Davide Spadini,*‡ Fabio Palomba,§ Andy Zaidman,* Magiel Bruntink,‡ Alberto Bacchelli§
‡Software Improvement Group, *Delft University of Technology, §University of Zurich
*{d.spadini, a.e.zaidman}@tudelft.nl, ‡m.bruntink@sig.eu, §{palomba, bacchelli}@ifi.uzh.ch
Abstract—Test smells are sub-optimal design choices in the implementation of test code. As reported by recent studies, their presence might not only negatively affect the comprehension of test suites but can also lead to test cases being less effective in finding bugs in production code. Although significant steps have been made toward understanding test smells, there is still a notable absence of studies assessing their association with software quality.
In this paper, we investigate the relationship between the presence of test smells and the change- and defect-proneness of test code, as well as the defect-proneness of the tested production code. To this aim, we collect data on 221 releases of ten software systems and we analyze more than a million test cases to investigate the association of six test smells and their co-occurrence with software quality. Key results of our study include: (i) tests with smells are more change- and defect-prone, (ii) ‘Indirect Testing’, ‘Eager Test’, and ‘Assertion Roulette’ are the most significant smells for change-proneness, and (iii) production code is more defect-prone when tested by smelly tests.
I. INTRODUCTION
Automated testing (hereafter referred to as just testing) has become an essential process for improving the quality of software systems [12], [47]. In fact, testing can help to point out defects and to ensure that production code is robust under many usage conditions [12], [16]. Writing tests, however, is as challenging as writing production code, and developers should maintain test code with the same care they use for production […] found evidence of a negative impact of test smells on both comprehensibility and maintainability of test code [7].
Although the study by Bavota et al. [7] made a first, necessary step toward the understanding of maintainability aspects of test smells, our empirical knowledge on whether and how test smells are associated with software quality aspects is still limited. Indeed, van Deursen et al. [74] based their definition of test smells on their anecdotal experience, without extensive evidence on whether and how such smells are negatively associated with the overall system quality.
To fill this gap, in this paper we quantitatively investigate the relationship between the presence of smells in test methods and the change- and defect-proneness of both these test methods and the production code they intend to test. Similar to several previous studies on software quality [24], [62], we employ the proxy metrics change-proneness (i.e., the number of times a method changes between two releases) and defect-proneness (i.e., the number of defects the method had between two releases). We conduct an extensive observational study [15], collecting data from 221 releases of ten open source software systems, analyze more than a million test cases, and investigate the association between six test smell types and the aforementioned proxy metrics. Based on the experience and reasoning reported by van […]
Research questions
RQ1: Are test smells associated with change/defect proneness of test code?
RQ2: Are test smells associated with defect proneness of production code?
Methodology — subject systems
• 10 OSS projects
• 221 major releases

         # Releases   # Classes   # Methods     KLOC
  Total  221          9 - 2,072   68 - 19,445   1 - 334

• All the metrics are calculated at method level!
Methodology — test smells
• We calculate which test methods are affected by test smells in every release, using the detector by Bavota et al.

  method          is_smelly   type
  file1.java:m1   FALSE
  file1.java:m2   TRUE        Mystery Guest
  file2.java:m1   TRUE        Eager Test, Indirect Testing
  file2.java:m2   FALSE

• Types of smells (two of them are sketched in code below):
  1. Mystery Guest
  2. Resource Optimism
  3. Eager Test
  4. Assertion Roulette
  5. Indirect Testing
  6. Sensitive Equality
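To make two of these smells concrete, the following sketch shows them in Python's xUnit-style unittest (the study itself targets JUnit, and every name here is hypothetical): test_load_report is a Mystery Guest, since it silently depends on an external file it did not create, while test_everything_about_orders combines Eager Test (it exercises several production methods at once) with Assertion Roulette (a stack of assertions with no explanatory message):

import unittest

class Order:
    # Toy production class, present only so the smelly tests below can run.
    def __init__(self):
        self.lines = []
        self.discount = 0.0
        self.status = "OPEN"
    def add_line(self, name, qty, unit_price):
        self.lines.append((name, qty, unit_price))
    def apply_discount(self, fraction):
        self.discount = fraction
    def finalize(self):
        self.status = "FINAL"
    def total(self):
        gross = sum(qty * price for _, qty, price in self.lines)
        return gross * (1 - self.discount)

class SmellyTest(unittest.TestCase):
    def test_load_report(self):
        # Mystery Guest: depends on an external file that the test
        # neither creates nor controls; it breaks if the file is absent.
        with open("/tmp/report_fixture.csv") as f:
            self.assertTrue(len(f.read()) > 0)

    def test_everything_about_orders(self):
        order = Order()
        # Eager Test: one test method exercises several production methods.
        order.add_line("book", 2, 10.0)
        order.apply_discount(0.1)
        order.finalize()
        # Assertion Roulette: several assertions without messages,
        # so a failure does not say which expectation broke.
        self.assertEqual(order.total(), 18.0)
        self.assertEqual(order.status, "FINAL")
        self.assertEqual(len(order.lines), 1)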
Methodology — change proneness of test code
• We define the change proneness of a test method Ti in release Ri as the number of times Ti changed between Ri-1 and Ri.
• To count these changes, we walk through the commits between Ri-1 and Ri (e.g., commit #00abc45) and compare the old and new version of each test file (e.g., ATest.java) method by method:
  1. Methods with the same name in both versions are matched directly. If their bodies are identical (e.g., both are "sum = a + b; return sum"), nothing is counted; if the bodies differ (e.g., "diff = a - b" became "diff = b - a"), the method's change counter is incremented (method2 changes ++).
  2. Methods left without a name match are compared pairwise across the two versions using the cosine similarity of their bodies. A pair with similarity > 0.9 is treated as the same method under a new name, and its change counter is incremented (method5 changes ++); a pair with similarity < 0.9 is left unmatched.
  3. A method in the new version that matches nothing in the old version is classified as Method Added (method6); symmetrically, an old method left unmatched (method3) no longer exists in the new version.
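A minimal sketch of this matching procedure, assuming each file version is given as a name-to-body mapping (the 0.9 threshold comes from the slides; the tokenization and all helper names are my own assumptions, not the paper's implementation):

from collections import Counter
from math import sqrt

SIMILARITY_THRESHOLD = 0.9  # threshold shown in the slides

def cosine_similarity(body_a, body_b):
    # Cosine similarity between token-frequency vectors of two method bodies.
    va, vb = Counter(body_a.split()), Counter(body_b.split())
    dot = sum(va[tok] * vb[tok] for tok in va)
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

def count_changes(old, new, changes):
    # old, new: dicts mapping method name -> body text for the two file versions.
    # changes: Counter of per-method change counts, updated in place.
    # 1) Match by name; a differing body counts as a change.
    for name in old.keys() & new.keys():
        if old[name] != new[name]:
            changes[name] += 1
    leftover_old = {n: b for n, b in old.items() if n not in new}
    leftover_new = {n: b for n, b in new.items() if n not in old}
    # 2) Treat a highly similar leftover pair as a renamed method (= a change).
    for name, body in list(leftover_new.items()):
        best = max(leftover_old, default=None,
                   key=lambda o: cosine_similarity(leftover_old[o], body))
        if best is not None and cosine_similarity(leftover_old[best], body) > SIMILARITY_THRESHOLD:
            changes[name] += 1
            del leftover_old[best], leftover_new[name]
    # 3) Whatever remains in leftover_new was added; in leftover_old, removed.

Applying count_changes commit by commit between Ri-1 and Ri, starting from changes = Counter(), yields the change proneness of every test method in the release.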
Methodology — defect proneness
• We define the defect proneness of a (test and production) method Ti in release Ri as the number of defects Ti contained in Ri.
• We first obtain the bug-fixing commits, and then we apply SZZ to trace each fix back to the bug-inducing commits.
[Timeline: release Ri annotated with bug#1 and bug#2.]
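For the SZZ step, here is a hedged sketch using PyDriller, a repository-mining library by the deck's first author (whether the paper used this exact tooling is my assumption, and the repository path and commit hashes below are placeholders). In recent PyDriller versions, Git.get_commits_last_modified_lines implements the SZZ-style heuristic of blaming the lines a fix touched:

from pydriller import Git  # pip install pydriller

repo = Git("path/to/subject-system")    # placeholder repository path
fixing_hashes = ["abc123", "def456"]    # bug-fixing commits, e.g. mined from the issue tracker

for fix_hash in fixing_hashes:
    fix = repo.get_commit(fix_hash)
    # Blame the lines modified by the fix to find the commits that last
    # touched them: the likely bug-inducing commits.
    inducing = repo.get_commits_last_modified_lines(fix)
    for path, hashes in inducing.items():
        print(fix_hash, path, sorted(hashes))

Each bug-inducing commit can then be mapped to the methods it touched, and a method's defect proneness in Ri is the number of such bugs it contained.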
Research questions
RQ1: Are test smells associated with change/defect proneness of test code?
RQ1.1: To what extent are test smells associated with the change- and defect-proneness of test code?
RQ1.1: To what extent are test smells associated with the change- and defect-proneness of test code?
Change proneness of smelly tests, relative to non-smelly tests (= 1):

  size      Change Proneness   Conf. Int.
  overall   1.47               1.46-1.50
  small     1.31               1.29-1.32
  average   1.95               1.86-2.04
  large     2.02               1.84-2.19
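The slides do not state how these ratios are derived; one standard way to obtain such a relative risk with a 95% confidence interval is the log risk-ratio estimator sketched below (the estimator choice and the counts are my assumptions, shown only to make the table's columns concrete):

from math import exp, log, sqrt

def relative_risk_ci(events_smelly, total_smelly, events_clean, total_clean, z=1.96):
    # Risk ratio of e.g. being change-prone for smelly vs. non-smelly tests,
    # with a normal-approximation confidence interval on the log scale.
    rr = (events_smelly / total_smelly) / (events_clean / total_clean)
    se = sqrt(1 / events_smelly - 1 / total_smelly + 1 / events_clean - 1 / total_clean)
    return rr, (exp(log(rr) - z * se), exp(log(rr) + z * se))

# Hypothetical counts, chosen only to show the shape of the computation:
print(relative_risk_ci(1470, 10000, 1000, 10000))
# -> approximately (1.47, (1.36, 1.59))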
RQ1.1: To what extent are test smells associated with the change- and defect-proneness of test code?
Defect proneness of smelly tests, relative to non-smelly tests (= 1):

  group                     Defect Proneness   Conf. Int.
  overall                   1.63               1.54-1.71
  small                     1.56               1.50-1.63
  average                   2.37               2.05-2.75
  large                     3.55               2.74-4.61
  C.P. (change-prone): no   1.45               1.37-1.53
  C.P. (change-prone): yes  1.81               1.74-1.89
Research questions
RQ1.2: Is the co-occurrence of test smells associated with the change- and defect-proneness of test code?
RQ1.2: Is the co-occurrence of test smells associated with the change- and defect-proneness of test code?
[Plot: number of changes (y axis, 0 to 12.5) against number of test smells in the method (x axis, 0 to 6).]
RQ1.2: Is the co-occurrence of test smells associated with the change- and defect-proneness of test code?
[Plot: number of bugs (y axis, 0 to 40) against number of test smells in the method (x axis, 0 to 6).]
Research questions
RQ1.3: Are certain test smell types more associated with the change- and defect-proneness of test code?
RQ1.3: Are certain test smell types more associated with the change- and defect-proneness of test code?
[Chart: relation with maintainability (number of changes and number of bugs, y axis 0 to 60) per smell type: Assertion Roulette, Eager Test, Indirect Testing, Mystery Guest, Sensitive Equality.]
Research questions
RQ2: Are test smells associated with defect proneness of production code?
RQ2.1: To what extent are test smells associated with the defect-proneness of production code?
RQ2.1: To what extent are test smells associated with the defect-proneness of production code?
Defect proneness of production code tested by smelly tests, relative to non-smelly (= 1):

  size      Defect Proneness   Conf. Int.
  overall   1.71               1.67-1.75
  small     1.56               1.52-1.60
  average   2.17               2.03-2.46
  large     2.23               1.84-2.54
RQ2.1: To what extent are test smells associated with the defect-proneness of production code?
[Plot: number of bugs in production methods (y axis, 0 to 10) for methods tested by non-smelly vs. smelly tests.]
Research questions
RQ2.2: Is the co-occurrence of test smells associated with the defect-proneness of production code?
RQ2.2: Is the co-occurrence of test smells associated with the defect-proneness of production code?
[Plot: number of bugs in the production methods (y axis, 0 to 10) against number of test smells in the associated tests (x axis, 0 to 6).]
Research questions
RQ2.3: Are certain test smell types more associated with the defect-proneness of production code?
RQ2.3: Are certain test smell types more associated with the defect-proneness of production code?
[Chart: number of bugs in the production methods (y axis, 0 to 10) per smell type: Assertion Roulette, Eager Test, Indirect Testing, Mystery Guest, Sensitive Equality.]
Summary
Test code:
• More change- and defect-prone if affected by smells
• Slightly more change-prone if affected by more smells
Production code:
• More defect-prone if exercised by test code affected by test smells
• Especially defect-prone when exercised by tests with the ‘Indirect Testing’ and ‘Eager Test’ smells