Which components of a large software system are the
most defect-prone? In a study on a large SAP Java system,
we evaluated and compared a number of defect predictors,
based on code features such as complexity metrics, static
error detectors, change frequency, or component imports,
thus replicating a number of earlier case studies in an industrial
context. We found the overall predictive power to
be lower than expected; still, the resulting regression models
successfully predicted 50–60% of the 20% most defect-prone
components.
Predicting Defects in SAP Java Code: An Experience Report
1. Predicting Defects in SAP Java Code: An Experience Report
by Tilman Holschuh (SQS AG), Markus Päuser (SAP AG), Kim Herzig (Saarland University), Thomas Zimmermann (Microsoft Research), Rahul Premraj (Vrije Universiteit Amsterdam), and Andreas Zeller (Saarland University)
12.–16. Replicated 2 Studies
[Diagram, built up across slides 12–16]
Study 1: metrics extracted from the source code (McCabe, FanOut, LoC, Coupling), combined with defect data mined from the version archive and bug database, train a predictor of component quality.
Study 2: the same setup, but with component dependencies as the input features.
17. The Product
‣ SAP standard software
‣ Large-scale Java software system ( > 10M LoC )
‣ Separated into projects
‣ Service-pack release cycles
18.–21. Defect Distribution
20% of the code contains ~75% of the defects; this concentration is the upper bound for prediction.
graphic created with Treemap (University of Maryland), see http://www.cs.umd.edu/hcil/treemap
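The 20%/75% concentration above can be computed mechanically: rank components by defect count and sum their defects until 20% of the code is covered. A minimal sketch with made-up component data (illustrative numbers, not SAP figures):

```python
# Check how many defects fall into the most defect-prone 20% of the code.
# Toy data, for illustration only.

def defect_concentration(components, code_share=0.20):
    """components: list of (loc, defects). Returns the fraction of all
    defects contained in the most defect-prone components that together
    make up `code_share` of the total lines of code."""
    total_loc = sum(loc for loc, _ in components)
    total_defects = sum(d for _, d in components)
    covered_loc = covered_defects = 0
    # Walk components from most to least defect-prone
    for loc, defects in sorted(components, key=lambda c: c[1], reverse=True):
        if covered_loc >= code_share * total_loc:
            break
        covered_loc += loc
        covered_defects += defects
    return covered_defects / total_defects

# Ten equally sized components with a skewed defect distribution
comps = [(1000, d) for d in [40, 35, 8, 5, 4, 3, 2, 2, 1, 0]]
print(defect_concentration(comps))  # the top 2 of 10 components hold 75 of 100 defects
```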
40. How to build Predictor Models?
Linear Regression: Y = Xβ + ε, fitted on the code metrics (McCabe, FanOut, LoC, Coupling).
Support Vector Machine: trained on the component dependencies.
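The linear-regression predictor Y = Xβ + ε can be sketched end to end: one row of metrics per component, fitted by ordinary least squares via the normal equations. Pure Python, with toy metric values; only the column names come from the slides:

```python
# Fit Y = X beta + eps by ordinary least squares (normal equations),
# mapping per-component code metrics to defect counts. Toy data; the
# metric names mirror the slides, the values are made up.

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def solve(a, b):
    """Gauss-Jordan elimination for A x = b (A square, b a column vector)."""
    n = len(a)
    m = [row[:] + [b[i][0]] for i, row in enumerate(a)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(m[r][i]))  # partial pivoting
        m[i], m[p] = m[p], m[i]
        for r in range(n):
            if r != i and m[r][i]:
                f = m[r][i] / m[i][i]
                m[r] = [x - f * y for x, y in zip(m[r], m[i])]
    return [m[i][n] / m[i][i] for i in range(n)]

# Columns: McCabe, FanOut, LoC (in thousands), Coupling; one row per component
X = [[10, 3, 0.5, 2],
     [25, 8, 1.2, 5],
     [5, 1, 0.2, 1],
     [40, 12, 3.0, 9],
     [15, 4, 0.8, 3]]
Y = [[6.5], [16.3], [2.8], [31.0], [9.7]]  # defect counts per component

Xt = transpose(X)
beta = solve(matmul(Xt, X), matmul(Xt, Y))  # solves (X'X) beta = X'Y
predicted = [sum(b * x for b, x in zip(beta, row)) for row in X]
```

In practice one would use a numerical library rather than hand-rolled elimination; the sketch only makes the Y = Xβ + ε structure from the slide concrete.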
41. Forward Prediction
Timeline t: static analysis and training bug data come from version V1; the bug data of the later version V2 is used only for testing.
43. Metric Correlations

Metric                    Package (Project 2)   Class (Project 4)
LoC                Sum          0.583                0.377
                   Max          0.587                n/a
McCabe             Sum          0.583                0.299
                   Max          0.588                0.261
Efferent Coupling               0.608                n/a
Design Rules       Sum          0.557                0.264
                   Max          0.578                n/a
Changes            Sum          0.308                0.403
                   Max          0.240                n/a
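The slides do not say which correlation coefficient underlies these figures; a common choice in defect-prediction studies is Spearman's rank correlation, sketched here for a single metric against defect counts (toy values, no tie handling):

```python
# Spearman rank correlation between a per-component metric and defect
# counts. Toy data; assumes no tied values to keep the sketch short.

def rankdata(values):
    """1-based ranks; assumes no tied values for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks

def spearman(xs, ys):
    n = len(xs)
    rx, ry = rankdata(xs), rankdata(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

loc     = [200, 500, 1200, 3000, 800]   # toy metric values per component
defects = [0, 1, 4, 9, 2]               # toy defect counts
print(spearman(loc, defects))  # metric and defects rise together here -> 1.0
```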
44. Metric Correlations
(same table as slide 43)
Takeaway: prediction is more precise at higher granularity levels.
45. Hit Rate

actual   predicted
  1          4    ← top 20%
  2          9    ← top 20%
  3          2    ← top 20%
  4         11    ← top 20%
  5          6
  6          1
  7          3
  8          5
  9         10
 10          8
 11          7

Hit rate = 50%
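The hit-rate example can be reproduced directly: taking the top group as the first four of the eleven components, two of the four actually most defect-prone components also appear in the predicted top group, which yields the 50% shown. A small sketch using the slide's own rank pairs:

```python
# Hit rate: overlap between the top-n components of the actual ranking
# and the top-n components of the predicted ranking. The rank pairs
# below (actual rank -> predicted rank) are taken from the slide.

def hit_rate(actual_to_predicted, n):
    actual_top = {a for a in actual_to_predicted if a <= n}
    predicted_top = {a for a, p in actual_to_predicted.items() if p <= n}
    return len(actual_top & predicted_top) / n

ranks = {1: 4, 2: 9, 3: 2, 4: 11, 5: 6, 6: 1, 7: 3, 8: 5, 9: 10, 10: 8, 11: 7}
print(hit_rate(ranks, 4))  # components 1 and 3 are in both top groups -> 0.5
```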
46. Predictions using Linear Regression (McCabe, FanOut, LoC, Coupling)

               Top 5%   Top 20%
All projects    46%      55%
Group 1         47%      63%
Project 1       21%      43%
Project 2       42%      64%
Project 3       41%      55%
47. Predicting from Dependencies (Support Vector Machine)

               Top 5%   Top 20%
Group 1         26%      43%
Project 1       38%      50%
Project 2       36%      46%
Project 3       46%      49%
48. Predicting from Dependencies (Support Vector Machine)
(same table as slide 47)
Takeaway: stable prediction results across projects.
49. Compare Results
[Bar chart: hit rate (0–80%) for Dependencies vs. Metrics, for Group 1 and Projects 1–3]
50. Compare Results
[same bar chart as slide 49]
Takeaway: complexity metrics have higher predictive power.
51. Lessons Learned

                             Nagappan et al.   Schröter et al.   our study
metrics-defect correlation         ✔                n/a              ✔
prediction possible                ✔                 ✔               ✔
forward prediction                 ✘                 ✘               ✔
universal predictor                ✘                 ✘               ✘
53.–55. Lessons Learned
Predictions based on static code features provide limited results and depend on the project context.
Software archives are a reliable and easily accessible source of defect data.
Defects have many sources, and code is just one of them.