Nicolas Erni, Mohammed Al-Ameen, Christian Birchler, Pouria Derakhshanfar, Stephan Lukasczyk, Sebastiano Panichella: SBFT Tool Competition 2024 -- Python Test Case Generation Track. 17th International Workshop on Search-Based and Fuzz Testing
SBFT Tool Competition 2024 -- Python Test Case Generation Track
1. Search-Based and Fuzz Testing Tool Competition 2024
Nicolas Erni, Zurich University of Applied Sciences (ZHAW)
Christian Birchler, Zurich University of Applied Sciences (ZHAW)
Pouria Derakhshanfar, JetBrains
Stephan Lukasczyk, University of Passau
Mohammed Al-Ameen, Zurich University of Applied Sciences (ZHAW)
Sebastiano Panichella, Zurich University of Applied Sciences (ZHAW)
Co-located with the 46th International Conference on Software Engineering (ICSE 2024)
2. History of the SBFT Python Tool Competition
Round   | Year | Venue | Coverage tool | Mutation tool      | #CUTs | #Projects | #Participants (+ baseline)
Round 1 | 2024 | SBFT  | PyTest        | MutPy / Cosmic Ray | 35    | 7         | 4
3. SBFT Tool Competition - 2024
Python tool competition: For the first time ever, we are inviting researchers to participate in our competition with their test generation tools for Python. Tools are assessed on a benchmark that measures code coverage and mutation score.
What is New?
Figure 1: Example of test generation for simple Python functions (panels: Software Under Test, Generated Test Code).
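To make Figure 1 concrete, here is a minimal sketch of the kind of input and output involved. The function `classify_triangle` is a hypothetical software under test, not one of the competition's CUTs, and the tests are hand-written in the pytest style a generator might emit; they are not output from any participating tool.

```python
# Hypothetical software under test (SUT): a simple function.
def classify_triangle(a: int, b: int, c: int) -> str:
    """Classify a triangle by the lengths of its sides."""
    if a <= 0 or b <= 0 or c <= 0 or a + b <= c or a + c <= b or b + c <= a:
        return "invalid"
    if a == b == c:
        return "equilateral"
    if a == b or b == c or a == c:
        return "isosceles"
    return "scalene"


# Tests of the kind a generator might emit, in pytest style:
# plain functions whose names start with test_ and that use bare asserts.
def test_invalid_non_positive_side():
    assert classify_triangle(0, 1, 1) == "invalid"


def test_invalid_inequality():
    assert classify_triangle(1, 2, 3) == "invalid"


def test_equilateral():
    assert classify_triangle(2, 2, 2) == "equilateral"


def test_isosceles():
    assert classify_triangle(2, 2, 3) == "isosceles"


def test_scalene():
    assert classify_triangle(3, 4, 5) == "scalene"


if __name__ == "__main__":
    # Without pytest installed, the tests can still be called directly.
    for test in (test_invalid_non_positive_side, test_invalid_inequality,
                 test_equilateral, test_isosceles, test_scalene):
        test()
    print("all tests passed")
```

Each test exercises a different branch of the SUT, which is what the line and branch coverage metrics reward.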
4. Python tool competition Infrastructure
[Figure: python-tool-competition-2024 infrastructure. Each tool (Klara, …, Tool n) is run on each CUT within a time budget and produces generated tests.]
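The loop in the diagram can be sketched roughly as follows, assuming each tool is an external command that receives the CUT path as its last argument and is stopped when the time budget expires. `run_tool`, `run_all`, and `RunResult` are hypothetical names for illustration; this is not the API of python-tool-competition-2024.

```python
import subprocess
import sys
import time
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RunResult:
    tool: str
    cut: str
    gen_time: float            # wall-clock seconds spent generating tests
    timed_out: bool
    returncode: Optional[int]  # None when the run was killed by the timeout


def run_tool(tool: str, command: List[str], cut: str, budget_s: float) -> RunResult:
    """Run one (hypothetical) tool command on one CUT under a time budget."""
    start = time.monotonic()
    try:
        proc = subprocess.run(command + [cut], capture_output=True,
                              timeout=budget_s, check=False)
        timed_out, code = False, proc.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        timed_out, code = True, None
    return RunResult(tool, cut, time.monotonic() - start, timed_out, code)


def run_all(tools: Dict[str, List[str]], cuts: List[str], budget_s: float) -> List[RunResult]:
    """Run every tool on every CUT, as in the diagram (Klara, ..., Tool n)."""
    return [run_tool(name, cmd, cut, budget_s)
            for name, cmd in tools.items() for cut in cuts]


if __name__ == "__main__":
    # A stand-in "tool" that just echoes the CUT path it was given.
    echo = [sys.executable, "-c", "import sys; print(sys.argv[1])"]
    for result in run_all({"echo-tool": echo}, ["cut_a.py", "cut_b.py"], budget_s=30):
        print(result)
```

Recording the wall-clock generation time per run is what a time-dependent score needs as its genTime input.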
5. Python tool competition Infrastructure
[Figure: python-tool-competition-2024 infrastructure. Each tool (Klara, …, Tool n) is run on each CUT within a time budget; the generated tests are then measured for line and branch coverage metrics and, with MutPy / Cosmic Ray, for mutation metrics.]
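Mutation metrics ask how many small, deliberately injected faults (mutants) the generated tests detect. The sketch below assumes a single illustrative mutation operator (reversing comparison operators, e.g. `<` to `>`) and counts a mutant as killed when the test suite no longer passes on it, i.e. a strong-mutation criterion. It uses only the standard `ast` module (Python 3.9+ for `ast.unparse`) and does not reproduce how MutPy or Cosmic Ray work internally; `clamp`, `strong_suite`, and `weak_suite` are hypothetical, and equivalent mutants are not handled.

```python
import ast
import copy
from typing import Callable, Iterator, List

# Hypothetical CUT used only for illustration.
SOURCE = """
def clamp(x, lo, hi):
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x
"""

# Illustrative operator: reverse a comparison. Real tools apply many more.
SWAPS = {ast.Lt: ast.Gt, ast.Gt: ast.Lt, ast.LtE: ast.GtE, ast.GtE: ast.LtE}


def _compares(tree: ast.AST) -> List[ast.Compare]:
    # ast.walk visits nodes in a fixed order, so indices match across copies.
    return [n for n in ast.walk(tree) if isinstance(n, ast.Compare)]


def mutants(source: str) -> Iterator[str]:
    """Yield one mutant (as source code) per swappable comparison operator."""
    tree = ast.parse(source)
    sites = [(i, j)
             for i, node in enumerate(_compares(tree))
             for j, op in enumerate(node.ops)
             if type(op) in SWAPS]
    for i, j in sites:
        mutant = copy.deepcopy(tree)
        node = _compares(mutant)[i]
        node.ops[j] = SWAPS[type(node.ops[j])]()
        yield ast.unparse(mutant)


def load(source: str, name: str) -> Callable:
    """Execute `source` in a fresh namespace and return the function `name`."""
    namespace: dict = {}
    exec(compile(source, "<cut>", "exec"), namespace)
    return namespace[name]


def mutation_score(source: str, name: str, suite: Callable[[Callable], bool]) -> float:
    """Fraction of mutants killed, i.e. on which the suite no longer passes."""
    if not suite(load(source, name)):
        raise ValueError("the test suite must pass on the original code")
    passed = [suite(load(m, name)) for m in mutants(source)]
    return passed.count(False) / len(passed) if passed else 0.0


def strong_suite(clamp: Callable) -> bool:
    return clamp(-5, 0, 10) == 0 and clamp(15, 0, 10) == 10 and clamp(5, 0, 10) == 5


def weak_suite(clamp: Callable) -> bool:
    return clamp(-5, 0, 10) == 0


if __name__ == "__main__":
    print(mutation_score(SOURCE, "clamp", strong_suite))
    print(mutation_score(SOURCE, "clamp", weak_suite))
```

Here the stronger suite kills both mutants (score 1.0), while the weaker suite lets the `x < hi` mutant survive (score 0.5), even though both suites pass on the original code.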
6. Scoring Formula
T = generated test
B = search budget
C = class under test
R = independent run
Cov_i = statement coverage
Cov_b = branch coverage
Cov_m = strong mutation score
genTime = generation time
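$$\mathit{covScore}(T, B, C, R) = 1 \times \mathit{Cov}_i + 2 \times \mathit{Cov}_b + 4 \times \mathit{Cov}_m$$
$$\mathit{tScore}(T, B, C, R) = \mathit{covScore}(T, B, C, R) \times \min\left(1, \frac{2 \times B}{\mathit{genTime}}\right)$$
$$\mathit{Score}(T, B, C, R) = \mathit{tScore}(T, B, C, R) + \mathit{penalty}(T, B, C, R)$$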
Xavier Devroey, Alessio Gambi, Juan Pablo Galeotti, René Just, Fitsum Meshesha Kifetew, Annibale Panichella, Sebastiano Panichella: JUGE: An infrastructure for benchmarking Java unit test generators. Softw. Test. Verification Reliab. 33(3) (2023)
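A direct transcription of the three formulas, as a sketch: it assumes coverage values are fractions in [0, 1] and that B and genTime use the same unit (seconds here). The slide does not define penalty, so it is simply a number passed in by the caller; the function names and the guard for a zero generation time are hypothetical choices.

```python
def cov_score(cov_i: float, cov_b: float, cov_m: float) -> float:
    """covScore = 1 x Cov_i + 2 x Cov_b + 4 x Cov_m."""
    return 1 * cov_i + 2 * cov_b + 4 * cov_m


def t_score(cov: float, budget: float, gen_time: float) -> float:
    """tScore = covScore x min(1, 2B / genTime)."""
    if gen_time <= 0:
        # Assumption: an instantaneous run is not scaled down.
        return cov
    return cov * min(1.0, 2 * budget / gen_time)


def score(cov_i: float, cov_b: float, cov_m: float,
          budget: float, gen_time: float, penalty: float = 0.0) -> float:
    """Score = tScore + penalty (penalty supplied by the caller, default 0)."""
    return t_score(cov_score(cov_i, cov_b, cov_m), budget, gen_time) + penalty


if __name__ == "__main__":
    # Coverage as fractions; budget and generation time in seconds.
    print(score(0.5, 0.25, 0.125, budget=60, gen_time=100))  # 1.5 (2B/genTime = 1.2, capped at 1)
    print(score(0.5, 0.25, 0.125, budget=60, gen_time=240))  # 0.75 (scaled by 120/240 = 0.5)
```

With these weights the coverage part is at most 7, and the time factor only starts reducing it once genTime exceeds 2 × B.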
18. Lessons Learned
• Identified aspects to improve and bugs to fix in the infrastructure
• Docker will simplify the evaluation procedure
• More participants in the competition!
• From Academia & Industry
19. What’s Next?
• Contest Infrastructure
• https://github.com/ThunderKey/python-tool-competition-2024
• Improve usability
• Facilitate setting up an evaluation
• Facilitate evaluation in other contexts
• Update the user documentation
• For the next edition
• More tools
• More CUTs
• Time budgets
• Time penalty