Revisiting Test Smells in Automatically
Generated Tests: Limitations, Pitfalls,
and Opportunities
A. Panichella, S. Panichella, G. Fraser,
A. A. Sawant, and V. J. Hellendoorn
Related Work
[Grano et al., JSS 2019]
Test Case Generation Tools: JTExpert
Test Smell Detection Tool from previous work [EMSE 2015]: GPD (Grano, Palomba, Di Nucci)
Main Results [Grano et al., JSS 2019]
• 81%: GPD precision in detecting test smells (100% recall)
• 88% of the JUnit test suites by EvoSuite contain test smells
• The tests [by EvoSuite] are scented since the beginning as "crossover and mutation operations […] do not change the structure of the tests"
Threats To Validity
• Warnings raised by GPD are not manually validated
• EvoSuite was misconfigured:
  - Old search algorithm
  - Tests and assertions are not minimized
• Mutation and crossover alter the test structure by adding/removing statements [Arcuri and Fraser, TSE 2012]
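The point that mutation and crossover change test structure can be illustrated with a toy model. The sketch below is not EvoSuite's implementation: it assumes a test case is nothing more than an ordered list of statement strings, and the class name `TestStructureOps` and its methods are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (assumption): a test case is an ordered list of statements. */
final class TestStructureOps {

    /** Mutation that inserts a statement at the given position. */
    static List<String> insertStatement(List<String> test, int index, String statement) {
        List<String> copy = new ArrayList<>(test);
        copy.add(index, statement);
        return copy;
    }

    /** Mutation that removes the statement at the given position. */
    static List<String> removeStatement(List<String> test, int index) {
        List<String> copy = new ArrayList<>(test);
        copy.remove(index);
        return copy;
    }

    /** Simplified single-point crossover: first cutA statements of a, then b from cutB onwards. */
    static List<String> crossover(List<String> a, List<String> b, int cutA, int cutB) {
        List<String> child = new ArrayList<>(a.subList(0, cutA));
        child.addAll(b.subList(cutB, b.size()));
        return child;
    }
}
```

In this model every operator returns a test with a different number or order of statements, so a smell that depends on structure (for example, how many assertions or method calls a test contains) can appear or disappear during the search.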
Time To Revisit These Results
Our Study
• RQ1: How widespread are test smells in
automatically generated tests?
• RQ2: How accurate are automated tools in
detecting code smells in automatically generated
tests?
• RQ3: How well do test smells reflect real problems in test suites?
Manually analysing generated tests rather than relying on detection tools
Assessing smell detection accuracy based on the manual oracle
Manual Analysis
100 Java classes from SF110 (the same classes used by Grano et al.)
100 generated test suites
Manual validation by four validators (Validator 1 to Validator 4), shown in two rows with different orderings in the original diagram
Cross-validated oracle
RQ1: Distributions of Test Smells
[Figure: bar chart of % smelly test suites (axis from 0 to 100) for Eager Test, Assertion Roulette, Indirect Testing, Sensitive Equality, Mystery Guest, and Resource Optimism, comparing our results based on a manually validated dataset with the results by Grano et al. (based on automated tool warnings).]
RQ2: Accuracy of Smell Detection Tools
GPD: large false positive rate for Assertion Roulette and Eager Test
TABLE IV: Detection performance of different automated test smell detection tools for test cases generated by EVOSUITE.
FPR denotes the False Positive Rate and FNR is the False Negative Rate. The best values are highlighted in grey colour.

Tool used by Grano et al. [6]
Test smell           FPR    FNR    Precision  Recall  F-measure
Assertion Roulette   0.72   0.00   0.22       1.00    0.36
Eager Test           0.53   0.05   0.33       0.95    0.49
Mystery Guest        0.12   -      -          -       -
Sensitive Equality   0.00   0.67   1.00       0.33    0.50
Resource Optimism    0.02   -      -          -       -
Indirect Testing     0.00   1.00   -          0.00    -

TSDETECT calibrated by Spadini et al. [2]
Test smell           FPR    FNR    Precision  Recall  F-measure
Assertion Roulette   0.05   0.50   0.67       0.50    0.57
Eager Test           0.05   0.45   0.73       0.55    0.63
Mystery Guest        0.03   -      -          -       -
Sensitive Equality   0.00   0.67   1.00       0.33    0.50
Resource Optimism    0.02   -      -          -       -
Indirect Testing     -      -      -          -       -
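For readers who want to check the figures in Table IV, the sketch below uses the standard confusion-matrix definitions of the reported metrics. The slides do not show the underlying counts, so the class `DetectionMetrics` and its parameter names (tp, fp, tn, fn) are assumptions introduced here for illustration only.

```java
/** Standard confusion-matrix metrics; names are illustrative, not taken from the paper. */
final class DetectionMetrics {

    /** Fraction of raised warnings that are true smells. */
    static double precision(int tp, int fp) {
        return tp / (double) (tp + fp);
    }

    /** Fraction of true smells that are detected. */
    static double recall(int tp, int fn) {
        return tp / (double) (tp + fn);
    }

    /** Fraction of non-smelly items that are wrongly flagged. */
    static double falsePositiveRate(int fp, int tn) {
        return fp / (double) (fp + tn);
    }

    /** Fraction of true smells that are missed; equals 1 - recall. */
    static double falseNegativeRate(int fn, int tp) {
        return fn / (double) (fn + tp);
    }

    /** Harmonic mean of precision and recall. */
    static double fMeasure(double precision, double recall) {
        return 2 * precision * recall / (precision + recall);
    }
}
```

Plugging in precision 0.22 and recall 1.00 gives an F-measure of about 0.36, which matches the Assertion Roulette row for the tool used by Grano et al. When the gold standard contains no instance of a smell, tp + fn is zero, so recall and FNR are undefined (NaN in this sketch).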
@Test(timeout = 4000)
public void test07() throws Throwable {
    ScriptOrFnScope s0 = new ScriptOrFnScope((-806), (ScriptOrFnScope) null);
    ScriptOrFnScope s1 = new ScriptOrFnScope((-330), s0);
    s1.preventMunging();
    s1.munge();
    assertNotSame(s0, s1);
}
Fig. 2: Example of false positive for the tool used by Grano
et al. for Eager Test
@Test(timeout = 4000)
public void test00() throws Throwable {
Show show0 = new Show();
File file0 = MockFile.createTempFile("...");
Mystery Guest and Resource Optimism. For these two types of smells, both detection
tools raise several warnings. However, they are all false positives by definition, as our
gold standard does not contain any instances of such smells. The detection tools both
annotate test methods that contain specific strings or objects, such as “File”,
“FileOutputStream”, “DB”, and “HttpClient”, as smelly; however, EVOSUITE separates the
test code from environmental dependencies (e.g., external files) in a fully automated
fashion through bytecode instrumentation [43]. In particular, it uses two mechanisms:
(1) mocking, and (2) customized test runners. For one, classes that access the
filesystem (e.g., java.io.File) […]
GPD: large false negative rate for Sensitive Equality and Indirect Testing
TsDetector: low false positive rate, but large false negative rate for most of the test smells
Limitations of Test Smell Detection Tools
According to GPD warnings:
• 12% of the JUnit test suites by EvoSuite contain Mystery Guest
• 2% of the JUnit test suites by EvoSuite contain Resource Optimism
These warnings are FALSE POSITIVES: EvoSuite does not use external resources or files thanks to:
• Sandbox and scaffolding
• Automated mock generation
• The use of a customized JUnit runner
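To see why keyword matching produces these false positives, consider the minimal sketch below. It is not the code of GPD or TsDetector: it only assumes, as the paper excerpt states, that a test is flagged when it mentions identifiers such as File, FileOutputStream, DB, or HttpClient. The class `KeywordMysteryGuestCheck` and the mock-aware variant (which assumes mock classes are recognisable by a "Mock" name prefix, as in `MockFile`) are hypothetical.

```java
import java.util.regex.Pattern;

/** Illustrative keyword-based check; not the implementation of any real tool. */
final class KeywordMysteryGuestCheck {

    // Identifiers quoted in the paper excerpt; the tools' complete lists are not shown.
    private static final Pattern KEYWORD =
            Pattern.compile("\\b(File|FileOutputStream|DB|HttpClient)\\b");

    // Assumption: mock classes can be recognised by a "Mock" name prefix.
    private static final Pattern MOCK_USE = Pattern.compile("\\bMock\\w+");

    /** Flags a test as soon as any keyword appears as a whole identifier. */
    static boolean flagsByKeyword(String testBody) {
        return KEYWORD.matcher(testBody).find();
    }

    /** Hypothetical refinement: skip statements whose resource comes from a mock class. */
    static boolean flagsIgnoringMocks(String testBody) {
        for (String statement : testBody.split(";")) {
            if (!MOCK_USE.matcher(statement).find() && flagsByKeyword(statement)) {
                return true;
            }
        }
        return false;
    }
}
```

Applied to a statement such as `File file0 = MockFile.createTempFile("...");`, the plain keyword rule raises a warning because of the `File` type, even though the file is created through a mock. The refinement is only one possible heuristic and splits statements naively on semicolons.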
Limitations of Test Smell Detection Tools
GPD and TsDetector fail to detect instances of Sensitive Equality

Test generated by EvoSuite but not detected by the two tools:
@Test(timeout = 4000)
public void test62() throws Throwable {
    SubstringLabeler.Match substringLabeler_Match0 = new SubstringLabeler.Match();
    String string0 = substringLabeler_Match0.toString();
    assertEquals("Substring: [Atts: ]", string0);
}

This test would be detected:
public void test62() throws Throwable {
    SubstringLabeler.Match substringLabeler_Match0 = new SubstringLabeler.Match();
    assertEquals("Substring: [Atts: ]", substringLabeler_Match0.toString());
}
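The difference between the two variants of test62 can be reproduced with a small text-based sketch. This is an assumption about the kind of rule that would behave this way, not the actual logic of GPD or TsDetector: `directOnly` looks for a toString() call written inside an assertion, while the hypothetical `withLocalTracking` also follows local variables assigned from toString(). The class name `SensitiveEqualitySketch` is illustrative.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative regex-based checks on test source text; not a real tool's implementation. */
final class SensitiveEqualitySketch {

    // An assertion whose argument list directly contains a toString() call.
    private static final Pattern DIRECT =
            Pattern.compile("assert\\w*\\([^;]*\\.toString\\(\\)[^;]*\\)\\s*;");
    // A local assignment of the form "name = <expr>.toString();".
    private static final Pattern ASSIGNED =
            Pattern.compile("(\\w+)\\s*=\\s*[^;]*\\.toString\\(\\)\\s*;");
    // Any assertion call, capturing its argument list.
    private static final Pattern ASSERT_CALL =
            Pattern.compile("assert\\w*\\(([^;]*)\\)\\s*;");

    /** Flags only assertions that call toString() inline. */
    static boolean directOnly(String body) {
        return DIRECT.matcher(body).find();
    }

    /** Also flags assertions on variables that were assigned from toString(). */
    static boolean withLocalTracking(String body) {
        if (directOnly(body)) {
            return true;
        }
        Set<String> fromToString = new HashSet<>();
        Matcher assigned = ASSIGNED.matcher(body);
        while (assigned.find()) {
            fromToString.add(assigned.group(1));
        }
        Matcher call = ASSERT_CALL.matcher(body);
        while (call.find()) {
            for (String name : fromToString) {
                Pattern use = Pattern.compile("\\b" + Pattern.quote(name) + "\\b");
                if (use.matcher(call.group(1)).find()) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

On the EvoSuite-generated variant, `directOnly` reports nothing because the toString() result first passes through `string0`, whereas `withLocalTracking` flags it. A production detector would work on the syntax tree rather than on text, since these patterns break on semicolons inside string literals.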
Discussion
• In the paper we further discuss the limitations of test smell detection
tools (GPD and TsDetector) with more examples
• Our results disagree with the conclusions by Grano et al.

• Only 32% (not 80%) of generated tests contain test smells

• Researchers should avoid self-assessing their test smell detection tools

• The involvement of human participants (preferably in industrial contexts)
is critical for improving the accuracy of detection tools