Which components of a large software system are the
most defect-prone? In a study on a large SAP Java system,
we evaluated and compared a number of defect predictors,
based on code features such as complexity metrics, static
error detectors, change frequency, or component imports,
thus replicating a number of earlier case studies in an industrial
context. We found the overall predictive power to
be lower than expected; still, the resulting regression models
successfully predicted 50–60% of the 20% most defect-prone
components.
Predicting Defects in SAP Java Code: An Experience Report
1. Predicting Defects in SAP Java Code: An Experience Report
by Tilman Holschuh (SQS AG), Markus Päuser (SAP AG), Kim Herzig (Saarland University), Thomas Zimmermann (Microsoft Research), Rahul Premraj (Vrije Universiteit Amsterdam), and Andreas Zeller (Saarland University)
12.–16. Replicated 2 Studies
[Diagram, built up across slides 12–16]
Study 1: metrics extracted from the source code (McCabe, FanOut, LoC, Coupling), combined with defect data mined from the version archive and bug database, train a predictor of component quality.
Study 2: the same setup, but with component dependencies as the input features.
17. The Product
‣ SAP standard software
‣ Large-scale Java software system ( > 10M LoC )
‣ Separated into projects
‣ Service-pack release cycles
18.–21. Defect Distribution
20% of the code contains ~75% of the defects; this concentration is the upper bound for prediction.
graphic created with Treemap (University of Maryland), see http://www.cs.umd.edu/hcil/treemap
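The 20%/75% concentration above can be computed mechanically: rank components by defect count and sum their defects until 20% of the code is covered. A minimal sketch with made-up component data (illustrative numbers, not SAP figures):

```python
# Check how many defects fall into the most defect-prone 20% of the code.
# Toy data, for illustration only.

def defect_concentration(components, code_share=0.20):
    """components: list of (loc, defects). Returns the fraction of all
    defects contained in the most defect-prone components that together
    make up `code_share` of the total lines of code."""
    total_loc = sum(loc for loc, _ in components)
    total_defects = sum(d for _, d in components)
    covered_loc = covered_defects = 0
    # Walk components from most to least defect-prone
    for loc, defects in sorted(components, key=lambda c: c[1], reverse=True):
        if covered_loc >= code_share * total_loc:
            break
        covered_loc += loc
        covered_defects += defects
    return covered_defects / total_defects

# Ten equally sized components with a skewed defect distribution
comps = [(1000, d) for d in [40, 35, 8, 5, 4, 3, 2, 2, 1, 0]]
print(defect_concentration(comps))  # the top 2 of 10 components hold 75 of 100 defects
```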
40. How to build Predictor Models?
Linear Regression: Y = Xβ + ε, fitted on the code metrics (McCabe, FanOut, LoC, Coupling).
Support Vector Machine: trained on the component dependencies.
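The linear-regression predictor Y = Xβ + ε can be sketched end to end: one row of metrics per component, fitted by ordinary least squares via the normal equations. Pure Python, with toy metric values; only the column names come from the slides:

```python
# Fit Y = X beta + eps by ordinary least squares (normal equations),
# mapping per-component code metrics to defect counts. Toy data; the
# metric names mirror the slides, the values are made up.

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def solve(a, b):
    """Gauss-Jordan elimination for A x = b (A square, b a column vector)."""
    n = len(a)
    m = [row[:] + [b[i][0]] for i, row in enumerate(a)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(m[r][i]))  # partial pivoting
        m[i], m[p] = m[p], m[i]
        for r in range(n):
            if r != i and m[r][i]:
                f = m[r][i] / m[i][i]
                m[r] = [x - f * y for x, y in zip(m[r], m[i])]
    return [m[i][n] / m[i][i] for i in range(n)]

# Columns: McCabe, FanOut, LoC (in thousands), Coupling; one row per component
X = [[10, 3, 0.5, 2],
     [25, 8, 1.2, 5],
     [5, 1, 0.2, 1],
     [40, 12, 3.0, 9],
     [15, 4, 0.8, 3]]
Y = [[6.5], [16.3], [2.8], [31.0], [9.7]]  # defect counts per component

Xt = transpose(X)
beta = solve(matmul(Xt, X), matmul(Xt, Y))  # solves (X'X) beta = X'Y
predicted = [sum(b * x for b, x in zip(beta, row)) for row in X]
```

In practice one would use a numerical library rather than hand-rolled elimination; the sketch only makes the Y = Xβ + ε structure from the slide concrete.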
41. Forward Prediction
Timeline t: static analysis and training bug data come from version V1; the bug data of the later version V2 is used only for testing.
43. Metric Correlations

Metric                    Package (Project 2)   Class (Project 4)
LoC                Sum          0.583                0.377
                   Max          0.587                n/a
McCabe             Sum          0.583                0.299
                   Max          0.588                0.261
Efferent Coupling               0.608                n/a
Design Rules       Sum          0.557                0.264
                   Max          0.578                n/a
Changes            Sum          0.308                0.403
                   Max          0.240                n/a
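The slides do not say which correlation coefficient underlies these figures; a common choice in defect-prediction studies is Spearman's rank correlation, sketched here for a single metric against defect counts (toy values, no tie handling):

```python
# Spearman rank correlation between a per-component metric and defect
# counts. Toy data; assumes no tied values to keep the sketch short.

def rankdata(values):
    """1-based ranks; assumes no tied values for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks

def spearman(xs, ys):
    n = len(xs)
    rx, ry = rankdata(xs), rankdata(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

loc     = [200, 500, 1200, 3000, 800]   # toy metric values per component
defects = [0, 1, 4, 9, 2]               # toy defect counts
print(spearman(loc, defects))  # metric and defects rise together here -> 1.0
```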
44. Metric Correlations
(same table as slide 43)
Takeaway: prediction is more precise at higher granularity levels.
45. Hit Rate

actual   predicted
  1          4    ← top 20%
  2          9    ← top 20%
  3          2    ← top 20%
  4         11    ← top 20%
  5          6
  6          1
  7          3
  8          5
  9         10
 10          8
 11          7

Hit rate = 50%
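The hit-rate example can be reproduced directly: taking the top group as the first four of the eleven components, two of the four actually most defect-prone components also appear in the predicted top group, which yields the 50% shown. A small sketch using the slide's own rank pairs:

```python
# Hit rate: overlap between the top-n components of the actual ranking
# and the top-n components of the predicted ranking. The rank pairs
# below (actual rank -> predicted rank) are taken from the slide.

def hit_rate(actual_to_predicted, n):
    actual_top = {a for a in actual_to_predicted if a <= n}
    predicted_top = {a for a, p in actual_to_predicted.items() if p <= n}
    return len(actual_top & predicted_top) / n

ranks = {1: 4, 2: 9, 3: 2, 4: 11, 5: 6, 6: 1, 7: 3, 8: 5, 9: 10, 10: 8, 11: 7}
print(hit_rate(ranks, 4))  # components 1 and 3 are in both top groups -> 0.5
```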
46. Predictions using Linear Regression (McCabe, FanOut, LoC, Coupling)

               Top 5%   Top 20%
All projects    46%      55%
Group 1         47%      63%
Project 1       21%      43%
Project 2       42%      64%
Project 3       41%      55%
47. Predicting from Dependencies (Support Vector Machine)

               Top 5%   Top 20%
Group 1         26%      43%
Project 1       38%      50%
Project 2       36%      46%
Project 3       46%      49%
48. Predicting from Dependencies (Support Vector Machine)
(same table as slide 47)
Takeaway: stable prediction results across projects.
49. Compare Results
[Bar chart: hit rate (0–80%) for Dependencies vs. Metrics, for Group 1 and Projects 1–3]
50. Compare Results
[same bar chart as slide 49]
Takeaway: complexity metrics have higher predictive power.
51. Lessons Learned

                             Nagappan et al.   Schröter et al.   our study
metrics-defect correlation         ✔                n/a              ✔
prediction possible                ✔                 ✔               ✔
forward prediction                 ✘                 ✘               ✔
universal predictor                ✘                 ✘               ✘
53.–55. Lessons Learned
Predictions based on static code features provide limited results and depend on the project context.
Software archives are a reliable and easily accessible source of defect data.
Defects have many sources, and code is just one of them.