A Software Fault Localization Technique
Based on Program Mutations
Tao He
Joint work with Xinming Wang, Xiaocong Zhou, Wenjun Li, Zhenyu Zhang, S.C. Cheung
elfinhe@gmail.com
Software Engineering Laboratory
Department of Computer Science, Sun Yat-Sen University
The 6th Seminar of SELAB
November 2012
Sun Yat-Sen University, Guangzhou, China
1/23
Outline
 Background and Motivation
 Our Approach – Muffler
 Empirical Evaluation
 Conclusion
2/23
Background and Motivation
3/23
Background
 Coverage-Based Fault Localization (CBFL)
 Input
 Coverage
 Testing results (passed or failed)
 Output
 A ranking list of statements
 Ranking functions
 Most CBFL techniques are similar to one another, differing
mainly in the ranking function used to compute suspiciousness.
4/23
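To make the inputs and output concrete, here is a minimal sketch (not the authors' implementation) of a CBFL pipeline: a coverage matrix and pass/fail verdicts go in, and a suspiciousness-ranked list of statements comes out. The coverage data is made up, and the ranking function used is the Naish-style one defined on slide 15.

```c
/* A minimal CBFL sketch (not the authors' implementation): a coverage
 * matrix and test verdicts go in, a suspiciousness-ranked statement
 * list comes out. The data is made up; the ranking function is the
 * Naish-style one defined on slide 15. */
#include <stdio.h>
#include <stdlib.h>

#define N_STMT 4
#define N_TEST 5

/* cov[t][s] = 1 if test t executes statement s */
static const int cov[N_TEST][N_STMT] = {
    {1, 1, 0, 1},
    {1, 0, 1, 1},
    {1, 1, 1, 0},
    {1, 1, 0, 1},
    {1, 0, 1, 1},
};
static const int is_failed[N_TEST] = {0, 1, 0, 0, 1}; /* test verdicts */

typedef struct { int stmt; double susp; } Ranked;

static int by_susp_desc(const void *a, const void *b) {
    double d = ((const Ranked *)b)->susp - ((const Ranked *)a)->susp;
    return (d > 0) - (d < 0);
}

int main(void) {
    int total_passed = 0;
    for (int t = 0; t < N_TEST; t++)
        total_passed += !is_failed[t];

    Ranked rank[N_STMT];
    for (int s = 0; s < N_STMT; s++) {
        int failed_s = 0, passed_s = 0; /* runs covering s, by verdict */
        for (int t = 0; t < N_TEST; t++) {
            if (!cov[t][s]) continue;
            if (is_failed[t]) failed_s++; else passed_s++;
        }
        rank[s].stmt = s;
        rank[s].susp = failed_s * (total_passed + 1.0) - passed_s;
    }
    qsort(rank, N_STMT, sizeof rank[0], by_susp_desc);

    /* the output: statements in descending order of suspiciousness */
    for (int i = 0; i < N_STMT; i++)
        printf("S%d susp=%.1f\n", rank[i].stmt + 1, rank[i].susp);
    return 0;
}
```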
Motivation
 One fundamental assumption [YPW08] of CBFL
 The observed behaviors from passed runs can precisely
represent the correct behaviors of the program;
 and the observed behaviors from failed runs can represent the
incorrect behaviors.
 Therefore, differences in the observed behaviors of program
entities between passed runs and failed runs indicate the
fault's location.
 But this assumption does not always hold.
5/23
[YPW08] C. Yilmaz, A. Paradkar, and C. Williams. Time will tell: fault localization using time spectra. In Proceedings
of the 30th international conference on Software engineering (ICSE '08). ACM, New York, NY, USA, 81-90. 2008.
Motivation
 Coincidental Correctness (CC)
 “No failure is detected, even though a fault has been executed.” [RT93]
 i.e., the passed runs may cover the fault.
 CC weakens the first part of CBFL's assumption:
 The observed behaviors from passed runs can precisely represent
the correct behaviors of this program;
 Moreover, CC occurs frequently in practice. [MAE+09]
6/23
[RT93] D.J. Richardson and M.C. Thompson, An analysis of test data selection criteria using the RELAY model of
fault detection, IEEE Transactions on Software Engineering, vol. 19, no. 6, pp. 533-553, 1993.
[MAE+09] W. Masri, R. Abou-Assi, M. El-Ghali, and N. Al-Fatairi, An empirical study of the factors that reduce the
effectiveness of coverage-based fault localization, in Proceedings of the 2nd International Workshop on Defects in
Large Software Systems (DEFECTS 2009, co-located with ISSTA 2009), pp. 1-5, 2009.
Our goal is to address the CC issue via mutation analysis
Our Approach – Muffler
7/23
Why does our approach work?
- Key hypothesis
 Mutating the faulty statement tends to maintain the
results of passed test cases.
 By contrast, mutating a correct statement tends to
change the results of passed test cases (from passed to
failed).
8/23
Why does our approach work?
- Three comprehensive scenarios (1/3)
9/23
[Figure: a program with fault point F and mutant point M in different basic blocks, a set of test cases, and their
passed/failed results. Legend: M = mutant point, F = fault point.]
- If we mutate an M in a different basic block from F: 3 test results change from passed to failed.
Why does our approach work?
- Three comprehensive scenarios (1/3)
11/23
[Figure: the mutant is applied at the fault point itself (F + M).]
- If we mutate F: 0 test results change from passed to failed.
Why does our approach work?
- Three comprehensive scenarios (2/3)
12/23
[Figure: F and M in the same basic block; data flow and control flow from the statements to the output.]
- If we mutate an M in the same basic block as F: 3 test results change from passed to failed, because the
mutation affects the output through a different data flow.
Why does our approach work?
- Three comprehensive scenarios (2/3)
13/23
[Figure: the mutant is applied at the fault point itself (F + M); data flow and control flow to the output.]
- If we mutate F: 0 test results change from passed to failed.
Why does our approach work?
- Three comprehensive scenarios (3/3)
14/23
[Figure: the mutant is applied at the fault point itself (F + M).]
- When CC occurs frequently, the fault has only a weak ability to generate an infectious state or to propagate
the infectious state to the output.
- If we mutate F: 0 test results change from passed to failed, due to this weak ability to affect the output.
Our Approach – Muffler
15/23
 Naish, the best-performing existing ranking function [LRR11]
 $\mathit{Susp}_{\mathit{Naish}}(S_i) = \mathit{Failed}(P, S_i) \times (\mathit{TotalPassed}(P) + 1) - \mathit{Passed}(P, S_i)$
 Mutation impact, the average number of test results that change
from passed to failed over the $m$ mutants of $S_i$
 $\mathit{Impact}(S_i) = \frac{1}{m} \sum_{j=1}^{m} \mathit{Change}_{p \to f}(P, M_{S_i,j})$
 Muffler, a combination of Naish and mutation impact
 $\mathit{Susp}_{\mathit{Muffler}}(S_i) = \mathit{Susp}_{\mathit{Naish}}(S_i) - \mathit{Impact}(S_i)$
[LRR11] L. Naish, H. J. Lee, and K. Ramamohanarao, A model for spectra-based software diagnosis. ACM
Transaction on Software Engineering Methodology, 20(3):11, 2011.
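As a minimal sketch of how these formulas compose for one statement, assuming the per-mutant Change_{p→f} counts have already been gathered by re-running the test suite on each mutant: the numbers below are those of statement S2 in the "schedule" example (appendix slides 28-29), so the printed values can be checked against that table.

```c
/* A minimal sketch of Muffler's score for one statement, assuming the
 * Change_{p->f} counts per mutant are already collected. The numbers
 * are those of statement S2 in the "schedule" example (appendix
 * slides 28-29), so the output can be checked against that table. */
#include <stdio.h>

int main(void) {
    const double change_pf[] = {249, 1097, 1097, 249, 1382};
    const int m = sizeof change_pf / sizeof change_pf[0];

    const double failed_s2 = 210;     /* failed runs covering S2 */
    const double passed_s2 = 1382;    /* passed runs covering S2 */
    const double total_passed = 2440; /* all passed runs         */

    double impact = 0;
    for (int j = 0; j < m; j++)
        impact += change_pf[j];
    impact /= m;                               /* Impact(S2) = 814.8 */

    double susp_naish = failed_s2 * (total_passed + 1) - passed_s2;
    double susp_muffler = susp_naish - impact;

    printf("Impact=%.1f Naish=%.1f Muffler=%.1f\n",
           impact, susp_naish, susp_muffler); /* 814.8 511228.0 510413.2 */
    return 0;
}
```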
Empirical Evaluation
16/23
Empirical Evaluation
17/23
 Subject programs

| Program suite | Number of versions | Lines of executable code | Number of test cases | LOC |
|---|---|---|---|---|
| tcas | 41 | 63-67 | 1608 | 133-137 |
| tot_info | 23 | 122-123 | 1052 | 272-273 |
| schedule | 9 | 149-152 | 2650 | 290-294 |
| schedule2 | 10 | 127-129 | 2710 | 261-263 |
| print_tokens | 7 | 189-190 | 4130 | 341-343 |
| print_tokens2 | 10 | 199-200 | 4115 | 350-355 |
| replace | 32 | 240-245 | 5542 | 508-515 |
| space | 38 | 3633-3647 | 13585 | 5882-5904 |

 Evaluation metric
 $\dfrac{\text{number of faulty programs whose fault can be found by examining up to } k\% \text{ of the code}}{\text{total number of faulty programs}}$
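As a sketch, this metric can be computed as below; the effort values are placeholders standing in for the per-version code-examination efforts measured in the study.

```c
/* A sketch of the k%-effort metric above: the fraction of faulty
 * versions whose fault is reached within k% of the code examined.
 * The effort values are placeholders for per-version measurements. */
#include <stdio.h>

static double located_within(const double *effort, int n, double k) {
    int hit = 0;
    for (int i = 0; i < n; i++)
        if (effort[i] <= k) hit++;
    return (double)hit / n;
}

int main(void) {
    const double effort[] = {0.5, 3.2, 12.0, 25.0, 0.9}; /* % of code */
    const int n = sizeof effort / sizeof effort[0];
    printf("located within 10%% of code: %.2f%%\n",
           100.0 * located_within(effort, n, 10.0)); /* 60.00% */
    return 0;
}
```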
Empirical Evaluation
18/23
[Figure: Overall effectiveness comparison. Percentage of faults located (vertical axis) versus percentage of code
examined (horizontal axis) for the techniques Muffler, Naish, Ochiai, Tarantula, and Wong3.]
Empirical Evaluation
19/23
| % of code examined | Tarantula | Ochiai | χDebug | Naish | Muffler |
|---|---|---|---|---|---|
| 1% | 14 | 18 | 19 | 21 | 35 |
| 5% | 38 | 48 | 56 | 58 | 74 |
| 10% | 54 | 63 | 68 | 68 | 85 |
| 15% | 57 | 65 | 80 | 80 | 94 |
| 20% | 60 | 67 | 84 | 84 | 99 |
| 30% | 79 | 88 | 91 | 92 | 110 |
| 40% | 92 | 98 | 98 | 99 | 117 |
| 50% | 98 | 99 | 101 | 102 | 121 |
| 60% | 99 | 103 | 105 | 106 | 123 |
| 70% | 101 | 107 | 117 | 119 | 123 |
| 80% | 114 | 122 | 122 | 123 | 123 |
| 90% | 123 | 123 | 122 | 123 | 123 |
| 100% | 123 | 123 | 123 | 123 | 123 |

Table: Number of faults (out of 123) located at different levels of code examination effort.
 After examining 1% of the statements, Naish reaches the fault in
17.07% of faulty versions (21/123), while Muffler reaches the fault
in 28.46% (35/123).
Empirical Evaluation
20/23
| Statistic | Tarantula | Ochiai | χDebug | Naish | Muffler |
|---|---|---|---|---|---|
| Min | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Max | 87.89 | 84.25 | 93.85 | 78.46 | 55.38 |
| Median | 20.33 | 9.52 | 7.69 | 7.32 | 3.25 |
| Mean | 27.68 | 23.62 | 20.04 | 19.34 | 9.62 |
| Stdev | 28.29 | 26.36 | 24.61 | 23.86 | 13.22 |

Table: Statistics of code examination effort (% of code examined).

Among these five techniques, Muffler scores best, or ties for best, in every row: the minimum, maximum, median,
and mean code examination effort. In addition, Muffler has a much lower standard deviation, meaning its
performance varies less widely than the others' and is more stable in terms of effectiveness. The results also
show that Muffler reduces the average code examination effort relative to Naish by 50.26%
(= 100% - 9.62%/19.34%).
Conclusion and future work
 We propose Muffler, a technique using mutation to
help locate program faults.
 On 123 faulty versions of seven programs, we compare
effectiveness and efficiency against the Naish technique.
Results show that Muffler reduces the average code
examination effort per faulty version by 50.26%.
 For future work, we plan to generalize our approach to
locate faults in multi-fault programs.
21/23
Q & A
22/23
Thank you!
Contact me via elfinhe@gmail.com
23/23
# Background
 Mutation analysis, first proposed by Hamlet [Ham77] and
DeMillo et al. [DLS78], is a fault-based testing technique
used to measure the effectiveness of a test suite.
 In mutation analysis, one introduces syntactic code
changes, one at a time, into a program to generate
various faulty programs (called mutants).
 A mutation operator is a change-seeding rule to
generate a mutant from the original program.
24/23
[Ham77] R.G. Hamlet, Testing Programs with the Aid of a Compiler, IEEE Transactions on Software Engineering,
vol. SE-3, no. 4, pp. 279-290, 1977.
[DLS78] R.A. DeMillo, R.J. Lipton and F.G. Sayward, Hints on Test Data Selection: Help for the Practicing
Programmer, Computer, vol. 11, no. 4, pp. 34-41, 1978.
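For illustration, here are single-change mutants of one C statement. ROR (relational operator replacement) and AOR (arithmetic operator replacement) are standard mutation-operator names, though this is not necessarily the operator set Muffler uses. A test input whose output differs between mutant and original "kills" the mutant; a test suite that kills more mutants is considered more effective.

```c
/* Illustrative single-change mutants of one C statement; ROR and AOR
 * are standard operator names, not necessarily Muffler's operator set. */
#include <stdio.h>

static int original(int a, int b)   { return a <  b ? a + b : 0; }
static int mutant_ror(int a, int b) { return a <= b ? a + b : 0; } /* '<' -> '<=' */
static int mutant_aor(int a, int b) { return a <  b ? a - b : 0; } /* '+' -> '-'  */

int main(void) {
    /* With input (1, 3) the ROR mutant survives (same output as the
     * original) while the AOR mutant is killed (output differs);
     * input (3, 3) would kill the ROR mutant too. */
    printf("original=%d ror=%d aor=%d\n",
           original(1, 3), mutant_ror(1, 3), mutant_aor(1, 3));
    return 0;
}
```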
# Ranking functions
25/23
Table: Ranking functions

$\mathit{Susp}_{\mathit{Tarantula}}(S_i) = \dfrac{\mathit{Failed}(S_i)/\mathit{TotalFailed}}{\mathit{Failed}(S_i)/\mathit{TotalFailed} + \mathit{Passed}(S_i)/\mathit{TotalPassed}}$

$\mathit{Susp}_{\mathit{Ochiai}}(S_i) = \dfrac{\mathit{Failed}(S_i)}{\sqrt{\mathit{TotalFailed} \times (\mathit{Failed}(S_i) + \mathit{Passed}(S_i))}}$

$\mathit{Susp}_{\chi\mathit{Debug}}(S_i) = \mathit{Failed}(S_i) - h$, where
$h = \begin{cases} \mathit{Passed}(S_i), & \text{if } \mathit{Passed}(S_i) \le 2 \\ 2 + 0.1 \times (\mathit{Passed}(S_i) - 2), & \text{if } 2 < \mathit{Passed}(S_i) \le 10 \\ 2.8 + 0.001 \times (\mathit{Passed}(S_i) - 10), & \text{if } \mathit{Passed}(S_i) > 10 \end{cases}$

$\mathit{Susp}_{\mathit{Naish}}(S_i) = \mathit{Failed}(S_i) \times (\mathit{TotalPassed} + 1) - \mathit{Passed}(S_i)$
 Tarantula [JHS02], Ochiai [AZV07], χDebug [WQZ+07], and Naish [NLR11]
[JHS02] J.A. Jones, M. J. Harrold, and J. Stasko. Visualization of test information to assist fault localization. In Proceedings of the
24th International Conference on Software Engineering (ICSE '02), pp. 467-477, 2002.
[AZV07] R. Abreu, P. Zoeteweij and A.J.C. Van Gemund, On the accuracy of spectrum-based fault localization, in
Proceedings of the Testing: Academic and Industrial Conference Practice and Research Techniques (TAIC PART-Mutation 2007), pp. 89-98, 2007.
[WQZ+07] W.E. Wong, Yu Qi, Lei Zhao, and Kai-Yuan Cai. Effective Fault Localization using Code Coverage. In Proceedings of the
31st Annual International Computer Software and Applications Conference (COMPSAC '07), Vol. 1, pp. 449-456, 2007.
[NLR11] L. Naish, H. J. Lee, and K. Ramamohanarao, A model for spectra-based software diagnosis. ACM Transaction on Software
Engineering Methodology, 20(3):11, 2011.
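The same four functions in C, as a sketch: the zero-denominator guards are an added assumption (the formulas above leave those cases undefined), and main() reproduces the row for statement S1 of the "schedule" example (slide 28), so the output can be checked against that table.

```c
/* The four ranking functions from the table above, as a sketch.
 * Zero-denominator guards are an added assumption; main() reproduces
 * statement S1 of the "schedule" example (slide 28). Link with -lm. */
#include <math.h>
#include <stdio.h>

static double tarantula(double failed_s, double passed_s,
                        double total_failed, double total_passed) {
    double fr = failed_s / total_failed;
    double pr = passed_s / total_passed;
    return (fr + pr) == 0.0 ? 0.0 : fr / (fr + pr);
}

static double ochiai(double failed_s, double passed_s, double total_failed) {
    double denom = sqrt(total_failed * (failed_s + passed_s));
    return denom == 0.0 ? 0.0 : failed_s / denom;
}

static double chi_debug(double failed_s, double passed_s) {
    double h;
    if (passed_s <= 2)        h = passed_s;
    else if (passed_s <= 10)  h = 2 + 0.1 * (passed_s - 2);
    else                      h = 2.8 + 0.001 * (passed_s - 10);
    return failed_s - h;
}

static double naish(double failed_s, double passed_s, double total_passed) {
    return failed_s * (total_passed + 1) - passed_s;
}

int main(void) {
    /* S1: Failed = 210, Passed = 1798; TotalFailed = 210, TotalPassed = 2440 */
    printf("Tarantula=%.2f Ochiai=%.2f chiDebug=%.2f Naish=%.0f\n",
           tarantula(210, 1798, 210, 2440), /* 0.58   */
           ochiai(210, 1798, 210),          /* 0.32   */
           chi_debug(210, 1798),            /* 205.41 */
           naish(210, 1798, 2440));         /* 510812 */
    return 0;
}
```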
# Our Approach – Muffler
26/23
Inputs: Test Suite, Faulty Program

Process (each step's output feeds the next):
1. Instrument the program & execute it against the test suite → Coverage & testing results
2. Select statements to mutate → Candidate statements
3. Mutate the selected statements → Mutants
4. Run the mutants against the test suite → Changes of testing results
5. Calculate suspiciousness & sort statements → Ranking list of all statements (final output)

Figure: Dataflow diagram of Muffler.
# Our Approach – Muffler
$\mathit{Susp}_{\mathit{Muffler}}(S_i) = \mathit{Failed}(P, S_i) \times (\mathit{TotalPassed}(P) + 1) - \mathit{Passed}(P, S_i) - \mathit{Impact}(S_i)$
27/23
- Primary key, $\mathit{Failed}(P, S_i)$: imprecise when multiple faults occur.
- Secondary key, $\mathit{Passed}(P, S_i)$: invalid when the coincidental correctness rate is high.
- Additional key, $\mathit{Impact}(S_i)$: inclined to handle coincidental correctness.
# An Example
28/23
Part I: Statements

S1 if (block_queue){
S2 count = block_queue->mem_count + 1; /* fault: insert '+1' */
S3 n = (int) (count*ratio); /* fault: missing '+1' */
S4 proc = find_nth(block_queue, n);
S5 if (proc) {
S6 block_queue = del_ele(block_queue, proc);
S7 prio = proc->priority;
S8 prio_queue[prio] = append_ele(prio_queue[prio], proc);}}

Part II: Per-statement coverage counts and suspiciousness (susp) with rank (r) for each technique, where
TotalPassed = 2440 and TotalFailed = 210:

| Statement | Passed(s) | Failed(s) | Tarantula susp (r) | Ochiai susp (r) | χDebug susp (r) | Naish susp (r) |
|---|---|---|---|---|---|---|
| S1 | 1798 | 210 | 0.58 (8) | 0.32 (8) | 205.41 (8) | 510812 (8) |
| S2 | 1382 | 210 | 0.64 (7) | 0.36 (7) | 205.83 (7) | 511228 (7) |
| S3 | 1382 | 210 | 0.64 (7) | 0.36 (7) | 205.83 (7) | 511228 (7) |
| S4 | 1382 | 210 | 0.64 (7) | 0.36 (7) | 205.83 (7) | 511228 (7) |
| S5 | 1382 | 210 | 0.64 (7) | 0.36 (7) | 205.83 (7) | 511228 (7) |
| S6 | 1358 | 210 | 0.64 (3) | 0.37 (3) | 205.85 (3) | 511252 (3) |
| S7 | 1358 | 210 | 0.64 (3) | 0.37 (3) | 205.85 (3) | 511252 (3) |
| S8 | 1358 | 210 | 0.64 (3) | 0.37 (3) | 205.85 (3) | 511252 (3) |

Code examination effort to locate S2 and S3: 88% for each of the four techniques.
Figure: Faulty version v2 of program “schedule”.
# An Example
29/23
Part III: One example mutant per statement, with the Change_{p→f} counts for all five mutants of that statement:

| Statement | Example mutated statement | Change_{p→f} of the five mutants |
|---|---|---|
| S1 | if (!block_queue) { | 1644, 1798, 1101, 1101, 1644 |
| S2 | count = block_queue->mem_count != 1; | 249, 1097, 1097, 249, 1382 |
| S3 | n = (int) (count <= ratio); | 249, 1116, 1101, 494, 1101 |
| S4 | proc = find_nth(block_queue, ratio); | 1088, 638, 1136, 744, 1382 |
| S5 | if (!proc) { | 1136, 1358, 1101, 1382, 1101 |
| S6 | block_queue = del_ele(block_queue, proc-1); | 1123, 349, 1358, 814, 1358 |
| S7 | prio /= proc->priority; | 1358, 1358, 1101, 1101, 1358 |
| S8 | prio_queue[prio] = append_ele(prio_queue[__MININT__], proc); }} | 598, 598, 1138, 1358, 1101 |

Part IV: Muffler

| Statement | Impact | Muffler susp | r |
|---|---|---|---|
| S1 | 1457.6 | 509354.4 | 8 |
| S2 | 814.8 | 510413.2 | 2 |
| S3 | 812.2 | 510415.8 | 2 |
| S4 | 997.6 | 510230.4 | 5 |
| S5 | 1215.6 | 510012.4 | 6 |
| S6 | 1000.4 | 510251.6 | 4 |
| S7 | 1255.2 | 509996.8 | 7 |
| S8 | 958.6 | 510293.4 | 3 |

Code examination effort to locate S2 and S3: 25%
Figure: Faulty version v2 of program “schedule”.
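Reading Parts II and IV together for the faulty statement S2: under Naish it ranks 7th of the 8 statements, so 7/8 ≈ 88% of this code is examined before it is reached; Muffler subtracts the mutation impact, lifting S2 to rank 2, i.e. 2/8 = 25%. As a worked check of the numbers above:

$$\mathit{Susp}_{\mathit{Muffler}}(S_2) = 511228 - \frac{249 + 1097 + 1097 + 249 + 1382}{5} = 511228 - 814.8 = 510413.2$$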
# Empirical Evaluation
32/23
Table: Pair-wise comparison between Muffler and existing techniques.

| | vs. Tarantula | vs. Ochiai | vs. χDebug | vs. Naish |
|---|---|---|---|---|
| More effective | 102 | 96 | 93 | 89 |
| Same effectiveness | 19 | 23 | 23 | 25 |
| Less effective | 2 | 4 | 7 | 9 |

Muffler is more effective than Naish (examining fewer statements before encountering the faulty statement) for
89 of the 123 faulty versions; as effective (examining the same number of statements) for 25 of the 123; and
less effective (examining more statements) for only 9 of the 123.
# Empirical Evaluation
33/23
 Experience on real faults

Table: Results with real faults in space

| Faulty version | CC% | Naish effort | Muffler effort |
|---|---|---|---|
| v5 | 1% | 0% | 0% |
| v9 | 7% | 1% | 0% |
| v17 | 31% | 12% | 7% |
| v28 | 49% | 11% | 5% |
| v29 | 99% | 25% | 9% |

These five faulty versions were chosen to represent low, medium, and high occurrence of coincidental
correctness. The column "CC%" gives the percentage of coincidentally passed test cases among all passed test
cases; the effort columns give the percentage of code to be examined before the fault is encountered.
# Empirical Evaluation
34/23
 Efficiency analysis
Table: Time spent by each technique on subject programs.
We have shown experimentally that, by taking advantage of both coverage and mutation impact, Muffler
outperforms Naish regardless of the occurrence of coincidental correctness. Unfortunately, Muffler needs to
execute a large number of mutants to compute mutation impact. Executing the mutants against the test suite
increases the time cost of fault localization; the time mainly comprises instrumentation, execution, and
coverage collection. From this table, we observe that Muffler takes approximately 62.59 times the average
time cost of the Naish technique.

| Program suite | CBFL (seconds) | Muffler (seconds) |
|---|---|---|
| tcas | 18.00 | 868.68 |
| tot_info | 11.92 | 573.12 |
| schedule | 34.02 | 2703.01 |
| schedule2 | 27.76 | 1773.14 |
| print_tokens | 59.11 | 2530.17 |
| print_tokens2 | 62.07 | 5062.87 |
| replace | 69.13 | 4139.19 |
| Average | 40.29 | 2521.46 |
# Empirical Evaluation
35/23
 Efficiency analysis
Table: Information about mutants generated.
This table gives detailed data on the number of mutated and total executable statements, the number of
mutants generated, and the time cost of running each mutant. For example, for the program tcas, on average
40.15 of 65.10 executable statements are mutated by Muffler; 199.90 mutants are generated, and each takes
4.26 seconds to run. Note that there is no need to collect coverage from the mutants' executions, and running
a mutant without instrumentation and coverage collection takes about a quarter of the time.

| Program suite | Mutated statements | Total statements | Mutants | Time per mutant (seconds) |
|---|---|---|---|---|
| tcas | 40.15 | 65.10 | 199.90 | 4.26 |
| tot_info | 39.57 | 122.96 | 191.87 | 2.92 |
| schedule | 80.60 | 150.20 | 351.60 | 7.59 |
| schedule2 | 75.33 | 127.56 | 327.78 | 5.32 |
| print_tokens | 67.43 | 189.86 | 260.29 | 9.49 |
| print_tokens2 | 86.67 | 199.44 | 398.67 | 12.54 |
| replace | 71.14 | 242.86 | 305.93 | 13.30 |
| Average | 56.52 | 142.79 | 256.90 | 7.92 |
How about the coincidental
correctness issue?
36/23
Empirical Evaluation
- The impact of coincidental correctness
37/23
[Figure 5: Two scatter plots, percentage of code examined (vertical axis) versus percentage of coincidental
correctness |Tcc|/|Tp| (horizontal axis), one plot for Muffler and one for Naish.]

 Each point represents a faulty version; the horizontal axis gives the faulty version's percentage of
coincidental correctness (CC%) among its passed test cases, and the vertical axis gives the code examination
effort needed to find its fault. A second-order polynomial fitting curve shows the points' tendency.

Figure 5: Correlation between effectiveness and coincidental correctness.
Does this work in real programs?
38/23
Why does our approach work?
- A feasibility study
39/23
[Figure: one distribution per faulty version, for tcas v7, tot_info v17, schedule v4, schedule2 v1,
print_tokens v7, print_tokens2 v3, replace v24, and space v20.]

The vertical axis denotes the number of test results that change from passed to failed, and the horizontal
width denotes the probability density at the corresponding number of changes.

Figure: Distribution of statements' result changes and the faulty statement's result changes.
Why does our approach work?
- A feasibility study
40/23
The vertical axis denotes the number of test results that change from passed to failed, and the horizontal
width denotes the probability density at the corresponding number of changes.

[Figure panels, one distribution per faulty version: tcas v7/v12/v17, tot_info v7/v8/v17, schedule v2/v3/v4,
schedule2 v1/v4/v6, print_tokens v2/v3/v7, print_tokens2 v3/v6/v9, replace v15/v17/v24, space v8/v11/v20.]

Figure: Distribution of statements' result changes and the faulty statement's result changes.
Why does our approach work?
- Another feasibility study (When CC%≥95%)
41/23
[Figure: frequency of faulty versions (vertical axis) versus percentage of code examined (horizontal axis);
∎ result changes (avg. 16.33%), ∎ Naish (avg. 47.55%).]

 When CC% is greater than or equal to 95%, ranking by result changes reduces the code examination effort
relative to Naish by 65.66% (= 100% - 16.33%/47.55%).
 Only 6 faulty versions need less than 20% of statements examined under Naish, versus 22 versions when
using result changes.

Figure: Frequency distribution of effectiveness when CC% ≥ 95%.
Experience on real faults
42/23
Table 8: Results with real faults in space

| Faulty version | CC% | Lines of code examined (Naish) | Lines of code examined (Muffler) |
|---|---|---|---|
| v5 | 0.90% | 2 | 1 |
| v20 | 1.97% | 15 | 5 |
| v21 | 1.97% | 15 | 6 |
| v10 | 2.74% | 47 | 18 |
| v11 | 6.29% | 37 | 14 |
| v6 | 6.92% | 40 | 7 |
| v9 | 19.05% | 7 | 1 |
| v17 | 30.92% | 427 | 244 |
| v28 | 48.57% | 268 | 170 |
| v29 | 99.32% | 797 | 331 |

More Related Content

What's hot

Search-based testing of procedural programs:iterative single-target or multi-...
Search-based testing of procedural programs:iterative single-target or multi-...Search-based testing of procedural programs:iterative single-target or multi-...
Search-based testing of procedural programs:iterative single-target or multi-...Vrije Universiteit Brussel
 
Software Testing Foundations Part 6 - Intuitive and Experience-based testing
Software Testing Foundations Part 6 - Intuitive and Experience-based testingSoftware Testing Foundations Part 6 - Intuitive and Experience-based testing
Software Testing Foundations Part 6 - Intuitive and Experience-based testingNikita Knysh
 
AI-Driven Software Quality Assurance in the Age of DevOps
AI-Driven Software Quality Assurance in the Age of DevOpsAI-Driven Software Quality Assurance in the Age of DevOps
AI-Driven Software Quality Assurance in the Age of DevOpsChakkrit (Kla) Tantithamthavorn
 
Experiments on Design Pattern Discovery
Experiments on Design Pattern DiscoveryExperiments on Design Pattern Discovery
Experiments on Design Pattern DiscoveryTim Menzies
 
Promise 2011: "An Iterative Semi-supervised Approach to Software Fault Predic...
Promise 2011: "An Iterative Semi-supervised Approach to Software Fault Predic...Promise 2011: "An Iterative Semi-supervised Approach to Software Fault Predic...
Promise 2011: "An Iterative Semi-supervised Approach to Software Fault Predic...CS, NcState
 
Speeding-up Software Testing With Computational Intelligence
Speeding-up Software Testing With Computational IntelligenceSpeeding-up Software Testing With Computational Intelligence
Speeding-up Software Testing With Computational IntelligenceAnnibale Panichella
 
AUTOMATIC GENERATION AND OPTIMIZATION OF TEST DATA USING HARMONY SEARCH ALGOR...
AUTOMATIC GENERATION AND OPTIMIZATION OF TEST DATA USING HARMONY SEARCH ALGOR...AUTOMATIC GENERATION AND OPTIMIZATION OF TEST DATA USING HARMONY SEARCH ALGOR...
AUTOMATIC GENERATION AND OPTIMIZATION OF TEST DATA USING HARMONY SEARCH ALGOR...csandit
 
The Impact of Class Rebalancing Techniques on the Performance and Interpretat...
The Impact of Class Rebalancing Techniques on the Performance and Interpretat...The Impact of Class Rebalancing Techniques on the Performance and Interpretat...
The Impact of Class Rebalancing Techniques on the Performance and Interpretat...Chakkrit (Kla) Tantithamthavorn
 
On Parameter Tuning in Search-Based Software Engineering: A Replicated Empiri...
On Parameter Tuning in Search-Based Software Engineering: A Replicated Empiri...On Parameter Tuning in Search-Based Software Engineering: A Replicated Empiri...
On Parameter Tuning in Search-Based Software Engineering: A Replicated Empiri...Abdel Salam Sayyad
 
Software Analytics In Action: A Hands-on Tutorial on Mining, Analyzing, Model...
Software Analytics In Action: A Hands-on Tutorial on Mining, Analyzing, Model...Software Analytics In Action: A Hands-on Tutorial on Mining, Analyzing, Model...
Software Analytics In Action: A Hands-on Tutorial on Mining, Analyzing, Model...Chakkrit (Kla) Tantithamthavorn
 
THE APPLICATION OF CAUSE EFFECT GRAPH FOR THE COLLEGE PLACEMENT PROCESS
THE APPLICATION OF CAUSE EFFECT GRAPH FOR THE COLLEGE PLACEMENT PROCESSTHE APPLICATION OF CAUSE EFFECT GRAPH FOR THE COLLEGE PLACEMENT PROCESS
THE APPLICATION OF CAUSE EFFECT GRAPH FOR THE COLLEGE PLACEMENT PROCESSVESIT/University of Mumbai
 
Icsoc12 tooldemo.ppt
Icsoc12 tooldemo.pptIcsoc12 tooldemo.ppt
Icsoc12 tooldemo.pptPtidej Team
 
Explainable Artificial Intelligence (XAI) 
to Predict and Explain Future Soft...
Explainable Artificial Intelligence (XAI) 
to Predict and Explain Future Soft...Explainable Artificial Intelligence (XAI) 
to Predict and Explain Future Soft...
Explainable Artificial Intelligence (XAI) 
to Predict and Explain Future Soft...Chakkrit (Kla) Tantithamthavorn
 
Test design techniques: Structured and Experienced-based techniques
Test design techniques: Structured and Experienced-based techniquesTest design techniques: Structured and Experienced-based techniques
Test design techniques: Structured and Experienced-based techniquesKhuong Nguyen
 
Test design techniques
Test design techniquesTest design techniques
Test design techniquesOksana
 
Testing Fundamentals
Testing FundamentalsTesting Fundamentals
Testing FundamentalsKiran Kumar
 
Software testing strategy
Software testing strategySoftware testing strategy
Software testing strategyijseajournal
 
Introduction to specification based test design techniques
Introduction to specification based test design techniquesIntroduction to specification based test design techniques
Introduction to specification based test design techniquesYogindernath Gupta
 
Boundary value analysis and equivalence partitioning
Boundary value analysis and equivalence partitioningBoundary value analysis and equivalence partitioning
Boundary value analysis and equivalence partitioningSneha Singh
 

What's hot (20)

Search-based testing of procedural programs:iterative single-target or multi-...
Search-based testing of procedural programs:iterative single-target or multi-...Search-based testing of procedural programs:iterative single-target or multi-...
Search-based testing of procedural programs:iterative single-target or multi-...
 
Software Testing Foundations Part 6 - Intuitive and Experience-based testing
Software Testing Foundations Part 6 - Intuitive and Experience-based testingSoftware Testing Foundations Part 6 - Intuitive and Experience-based testing
Software Testing Foundations Part 6 - Intuitive and Experience-based testing
 
AI-Driven Software Quality Assurance in the Age of DevOps
AI-Driven Software Quality Assurance in the Age of DevOpsAI-Driven Software Quality Assurance in the Age of DevOps
AI-Driven Software Quality Assurance in the Age of DevOps
 
Experiments on Design Pattern Discovery
Experiments on Design Pattern DiscoveryExperiments on Design Pattern Discovery
Experiments on Design Pattern Discovery
 
Promise 2011: "An Iterative Semi-supervised Approach to Software Fault Predic...
Promise 2011: "An Iterative Semi-supervised Approach to Software Fault Predic...Promise 2011: "An Iterative Semi-supervised Approach to Software Fault Predic...
Promise 2011: "An Iterative Semi-supervised Approach to Software Fault Predic...
 
Speeding-up Software Testing With Computational Intelligence
Speeding-up Software Testing With Computational IntelligenceSpeeding-up Software Testing With Computational Intelligence
Speeding-up Software Testing With Computational Intelligence
 
AUTOMATIC GENERATION AND OPTIMIZATION OF TEST DATA USING HARMONY SEARCH ALGOR...
AUTOMATIC GENERATION AND OPTIMIZATION OF TEST DATA USING HARMONY SEARCH ALGOR...AUTOMATIC GENERATION AND OPTIMIZATION OF TEST DATA USING HARMONY SEARCH ALGOR...
AUTOMATIC GENERATION AND OPTIMIZATION OF TEST DATA USING HARMONY SEARCH ALGOR...
 
The Impact of Class Rebalancing Techniques on the Performance and Interpretat...
The Impact of Class Rebalancing Techniques on the Performance and Interpretat...The Impact of Class Rebalancing Techniques on the Performance and Interpretat...
The Impact of Class Rebalancing Techniques on the Performance and Interpretat...
 
On Parameter Tuning in Search-Based Software Engineering: A Replicated Empiri...
On Parameter Tuning in Search-Based Software Engineering: A Replicated Empiri...On Parameter Tuning in Search-Based Software Engineering: A Replicated Empiri...
On Parameter Tuning in Search-Based Software Engineering: A Replicated Empiri...
 
Software Analytics In Action: A Hands-on Tutorial on Mining, Analyzing, Model...
Software Analytics In Action: A Hands-on Tutorial on Mining, Analyzing, Model...Software Analytics In Action: A Hands-on Tutorial on Mining, Analyzing, Model...
Software Analytics In Action: A Hands-on Tutorial on Mining, Analyzing, Model...
 
THE APPLICATION OF CAUSE EFFECT GRAPH FOR THE COLLEGE PLACEMENT PROCESS
THE APPLICATION OF CAUSE EFFECT GRAPH FOR THE COLLEGE PLACEMENT PROCESSTHE APPLICATION OF CAUSE EFFECT GRAPH FOR THE COLLEGE PLACEMENT PROCESS
THE APPLICATION OF CAUSE EFFECT GRAPH FOR THE COLLEGE PLACEMENT PROCESS
 
Icsoc12 tooldemo.ppt
Icsoc12 tooldemo.pptIcsoc12 tooldemo.ppt
Icsoc12 tooldemo.ppt
 
Ijcatr04051005
Ijcatr04051005Ijcatr04051005
Ijcatr04051005
 
Explainable Artificial Intelligence (XAI) 
to Predict and Explain Future Soft...
Explainable Artificial Intelligence (XAI) 
to Predict and Explain Future Soft...Explainable Artificial Intelligence (XAI) 
to Predict and Explain Future Soft...
Explainable Artificial Intelligence (XAI) 
to Predict and Explain Future Soft...
 
Test design techniques: Structured and Experienced-based techniques
Test design techniques: Structured and Experienced-based techniquesTest design techniques: Structured and Experienced-based techniques
Test design techniques: Structured and Experienced-based techniques
 
Test design techniques
Test design techniquesTest design techniques
Test design techniques
 
Testing Fundamentals
Testing FundamentalsTesting Fundamentals
Testing Fundamentals
 
Software testing strategy
Software testing strategySoftware testing strategy
Software testing strategy
 
Introduction to specification based test design techniques
Introduction to specification based test design techniquesIntroduction to specification based test design techniques
Introduction to specification based test design techniques
 
Boundary value analysis and equivalence partitioning
Boundary value analysis and equivalence partitioningBoundary value analysis and equivalence partitioning
Boundary value analysis and equivalence partitioning
 

Similar to A software fault localization technique based on program mutations

DEFECT PREDICTION USING ORDER STATISTICS
DEFECT PREDICTION USING ORDER STATISTICSDEFECT PREDICTION USING ORDER STATISTICS
DEFECT PREDICTION USING ORDER STATISTICSIAEME Publication
 
H047054064
H047054064H047054064
H047054064inventy
 
Software testing defect prediction model a practical approach
Software testing defect prediction model   a practical approachSoftware testing defect prediction model   a practical approach
Software testing defect prediction model a practical approacheSAT Journals
 
ePoster_Saunak.Amitangshu
ePoster_Saunak.AmitangshuePoster_Saunak.Amitangshu
ePoster_Saunak.AmitangshuSaunak Saha
 
Software Testing Using Genetic Algorithms
Software Testing Using Genetic AlgorithmsSoftware Testing Using Genetic Algorithms
Software Testing Using Genetic AlgorithmsIJCSES Journal
 
A PARTICLE SWARM OPTIMIZATION TECHNIQUE FOR GENERATING PAIRWISE TEST CASES
A PARTICLE SWARM OPTIMIZATION TECHNIQUE FOR GENERATING PAIRWISE TEST CASESA PARTICLE SWARM OPTIMIZATION TECHNIQUE FOR GENERATING PAIRWISE TEST CASES
A PARTICLE SWARM OPTIMIZATION TECHNIQUE FOR GENERATING PAIRWISE TEST CASESKula Sekhar Reddy Yerraguntla
 
Configuration Navigation Analysis Model for Regression Test Case Prioritization
Configuration Navigation Analysis Model for Regression Test Case PrioritizationConfiguration Navigation Analysis Model for Regression Test Case Prioritization
Configuration Navigation Analysis Model for Regression Test Case Prioritizationijsrd.com
 
An Implementation on Effective Robot Mission under Critical Environemental Co...
An Implementation on Effective Robot Mission under Critical Environemental Co...An Implementation on Effective Robot Mission under Critical Environemental Co...
An Implementation on Effective Robot Mission under Critical Environemental Co...IJERA Editor
 
A Review on Software Fault Detection and Prevention Mechanism in Software Dev...
A Review on Software Fault Detection and Prevention Mechanism in Software Dev...A Review on Software Fault Detection and Prevention Mechanism in Software Dev...
A Review on Software Fault Detection and Prevention Mechanism in Software Dev...iosrjce
 
Enabling and Supporting the Debugging of Field Failures (Job Talk)
Enabling and Supporting the Debugging of Field Failures (Job Talk)Enabling and Supporting the Debugging of Field Failures (Job Talk)
Enabling and Supporting the Debugging of Field Failures (Job Talk)James Clause
 
Specification based or black box techniques
Specification based or black box techniquesSpecification based or black box techniques
Specification based or black box techniquesM HiDayat
 
An SPRT Procedure for an Ungrouped Data using MMLE Approach
An SPRT Procedure for an Ungrouped Data using MMLE ApproachAn SPRT Procedure for an Ungrouped Data using MMLE Approach
An SPRT Procedure for an Ungrouped Data using MMLE ApproachIOSR Journals
 
Comparative performance analysis
Comparative performance analysisComparative performance analysis
Comparative performance analysiscsandit
 
Comparative Performance Analysis of Machine Learning Techniques for Software ...
Comparative Performance Analysis of Machine Learning Techniques for Software ...Comparative Performance Analysis of Machine Learning Techniques for Software ...
Comparative Performance Analysis of Machine Learning Techniques for Software ...csandit
 
ISTQB Advanced Study Guide - 3
ISTQB Advanced Study Guide - 3ISTQB Advanced Study Guide - 3
ISTQB Advanced Study Guide - 3Yogindernath Gupta
 

Similar to A software fault localization technique based on program mutations (20)

DEFECT PREDICTION USING ORDER STATISTICS
DEFECT PREDICTION USING ORDER STATISTICSDEFECT PREDICTION USING ORDER STATISTICS
DEFECT PREDICTION USING ORDER STATISTICS
 
H047054064
H047054064H047054064
H047054064
 
My Academic project work
My Academic project workMy Academic project work
My Academic project work
 
Debug me
Debug meDebug me
Debug me
 
Software testing defect prediction model a practical approach
Software testing defect prediction model   a practical approachSoftware testing defect prediction model   a practical approach
Software testing defect prediction model a practical approach
 
ePoster_Saunak.Amitangshu
ePoster_Saunak.AmitangshuePoster_Saunak.Amitangshu
ePoster_Saunak.Amitangshu
 
Software Testing Using Genetic Algorithms
Software Testing Using Genetic AlgorithmsSoftware Testing Using Genetic Algorithms
Software Testing Using Genetic Algorithms
 
A PARTICLE SWARM OPTIMIZATION TECHNIQUE FOR GENERATING PAIRWISE TEST CASES
A PARTICLE SWARM OPTIMIZATION TECHNIQUE FOR GENERATING PAIRWISE TEST CASESA PARTICLE SWARM OPTIMIZATION TECHNIQUE FOR GENERATING PAIRWISE TEST CASES
A PARTICLE SWARM OPTIMIZATION TECHNIQUE FOR GENERATING PAIRWISE TEST CASES
 
Configuration Navigation Analysis Model for Regression Test Case Prioritization
Configuration Navigation Analysis Model for Regression Test Case PrioritizationConfiguration Navigation Analysis Model for Regression Test Case Prioritization
Configuration Navigation Analysis Model for Regression Test Case Prioritization
 
An Implementation on Effective Robot Mission under Critical Environemental Co...
An Implementation on Effective Robot Mission under Critical Environemental Co...An Implementation on Effective Robot Mission under Critical Environemental Co...
An Implementation on Effective Robot Mission under Critical Environemental Co...
 
A Review on Software Fault Detection and Prevention Mechanism in Software Dev...
A Review on Software Fault Detection and Prevention Mechanism in Software Dev...A Review on Software Fault Detection and Prevention Mechanism in Software Dev...
A Review on Software Fault Detection and Prevention Mechanism in Software Dev...
 
F017652530
F017652530F017652530
F017652530
 
Enabling and Supporting the Debugging of Field Failures (Job Talk)
Enabling and Supporting the Debugging of Field Failures (Job Talk)Enabling and Supporting the Debugging of Field Failures (Job Talk)
Enabling and Supporting the Debugging of Field Failures (Job Talk)
 
SAIConference_PAPER
SAIConference_PAPERSAIConference_PAPER
SAIConference_PAPER
 
50120140502017
5012014050201750120140502017
50120140502017
 
Specification based or black box techniques
Specification based or black box techniquesSpecification based or black box techniques
Specification based or black box techniques
 
An SPRT Procedure for an Ungrouped Data using MMLE Approach
An SPRT Procedure for an Ungrouped Data using MMLE ApproachAn SPRT Procedure for an Ungrouped Data using MMLE Approach
An SPRT Procedure for an Ungrouped Data using MMLE Approach
 
Comparative performance analysis
Comparative performance analysisComparative performance analysis
Comparative performance analysis
 
Comparative Performance Analysis of Machine Learning Techniques for Software ...
Comparative Performance Analysis of Machine Learning Techniques for Software ...Comparative Performance Analysis of Machine Learning Techniques for Software ...
Comparative Performance Analysis of Machine Learning Techniques for Software ...
 
ISTQB Advanced Study Guide - 3
ISTQB Advanced Study Guide - 3ISTQB Advanced Study Guide - 3
ISTQB Advanced Study Guide - 3
 

More from Tao He

Java 并发编程笔记:01. 并行与并发 —— 概念
Java 并发编程笔记:01. 并行与并发 —— 概念Java 并发编程笔记:01. 并行与并发 —— 概念
Java 并发编程笔记:01. 并行与并发 —— 概念Tao He
 
Introduction to llvm
Introduction to llvmIntroduction to llvm
Introduction to llvmTao He
 
Testing survey
Testing surveyTesting survey
Testing surveyTao He
 
Smart debugger
Smart debuggerSmart debugger
Smart debuggerTao He
 
Mutation testing
Mutation testingMutation testing
Mutation testingTao He
 
C语言benchmark覆盖信息收集总结4
C语言benchmark覆盖信息收集总结4C语言benchmark覆盖信息收集总结4
C语言benchmark覆盖信息收集总结4Tao He
 
Django
DjangoDjango
DjangoTao He
 
基于覆盖信息的软件错误定位技术综述
基于覆盖信息的软件错误定位技术综述基于覆盖信息的软件错误定位技术综述
基于覆盖信息的软件错误定位技术综述Tao He
 
Java覆盖信息收集工具比较
Java覆盖信息收集工具比较Java覆盖信息收集工具比较
Java覆盖信息收集工具比较Tao He
 
Testing group’s work on fault localization
Testing group’s work on fault localizationTesting group’s work on fault localization
Testing group’s work on fault localizationTao He
 
Muffler a tool using mutation to facilitate fault localization 2.0
Muffler a tool using mutation to facilitate fault localization 2.0Muffler a tool using mutation to facilitate fault localization 2.0
Muffler a tool using mutation to facilitate fault localization 2.0Tao He
 
Muffler a tool using mutation to facilitate fault localization 2.3
Muffler a tool using mutation to facilitate fault localization 2.3Muffler a tool using mutation to facilitate fault localization 2.3
Muffler a tool using mutation to facilitate fault localization 2.3Tao He
 
Semantic Parsing in Bayesian Anti Spam
Semantic Parsing in Bayesian Anti SpamSemantic Parsing in Bayesian Anti Spam
Semantic Parsing in Bayesian Anti SpamTao He
 
Problems
ProblemsProblems
ProblemsTao He
 
A survey of software testing
A survey of software testingA survey of software testing
A survey of software testingTao He
 
Cleansing test suites from coincidental correctness to enhance falut localiza...
Cleansing test suites from coincidental correctness to enhance falut localiza...Cleansing test suites from coincidental correctness to enhance falut localiza...
Cleansing test suites from coincidental correctness to enhance falut localiza...Tao He
 
Concrete meta research - how to collect, manage, and read papers?
Concrete meta research - how to collect, manage, and read papers?Concrete meta research - how to collect, manage, and read papers?
Concrete meta research - how to collect, manage, and read papers?Tao He
 

More from Tao He (17)

Java 并发编程笔记:01. 并行与并发 —— 概念
Java 并发编程笔记:01. 并行与并发 —— 概念Java 并发编程笔记:01. 并行与并发 —— 概念
Java 并发编程笔记:01. 并行与并发 —— 概念
 
Introduction to llvm
Introduction to llvmIntroduction to llvm
Introduction to llvm
 
Testing survey
Testing surveyTesting survey
Testing survey
 
Smart debugger
Smart debuggerSmart debugger
Smart debugger
 
Mutation testing
Mutation testingMutation testing
Mutation testing
 
C语言benchmark覆盖信息收集总结4
C语言benchmark覆盖信息收集总结4C语言benchmark覆盖信息收集总结4
C语言benchmark覆盖信息收集总结4
 
Django
DjangoDjango
Django
 
基于覆盖信息的软件错误定位技术综述
基于覆盖信息的软件错误定位技术综述基于覆盖信息的软件错误定位技术综述
基于覆盖信息的软件错误定位技术综述
 
Java覆盖信息收集工具比较
Java覆盖信息收集工具比较Java覆盖信息收集工具比较
Java覆盖信息收集工具比较
 
Testing group’s work on fault localization
Testing group’s work on fault localizationTesting group’s work on fault localization
Testing group’s work on fault localization
 
Muffler a tool using mutation to facilitate fault localization 2.0
Muffler a tool using mutation to facilitate fault localization 2.0Muffler a tool using mutation to facilitate fault localization 2.0
Muffler a tool using mutation to facilitate fault localization 2.0
 
Muffler a tool using mutation to facilitate fault localization 2.3
Muffler a tool using mutation to facilitate fault localization 2.3Muffler a tool using mutation to facilitate fault localization 2.3
Muffler a tool using mutation to facilitate fault localization 2.3
 
Semantic Parsing in Bayesian Anti Spam
Semantic Parsing in Bayesian Anti SpamSemantic Parsing in Bayesian Anti Spam
Semantic Parsing in Bayesian Anti Spam
 
Problems
ProblemsProblems
Problems
 
A survey of software testing
A survey of software testingA survey of software testing
A survey of software testing
 
Cleansing test suites from coincidental correctness to enhance falut localiza...
Cleansing test suites from coincidental correctness to enhance falut localiza...Cleansing test suites from coincidental correctness to enhance falut localiza...
Cleansing test suites from coincidental correctness to enhance falut localiza...
 
Concrete meta research - how to collect, manage, and read papers?
Concrete meta research - how to collect, manage, and read papers?Concrete meta research - how to collect, manage, and read papers?
Concrete meta research - how to collect, manage, and read papers?
 

Recently uploaded

PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentationvaddepallysandeep122
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Mater
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfInnovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfYashikaSharma391629
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsSafe Software
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfStefano Stabellini
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commercemanigoyal112
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfFerryKemperman
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsChristian Birchler
 
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...Akihiro Suda
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceBrainSell Technologies
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 

Recently uploaded (20)

PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentation
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfInnovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdf
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
 
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 

A software fault localization technique based on program mutations

  • 1. A Software Fault Localization Technique Based on Program Mutations Tao He Coauthor with Xinming Wang, Xiaocong Zhou, Wenjun Li, Zhenyu Zhang, S.C. Cheung elfinhe@gmail.com Software Engineering Laboratory Department of Computer Science, Sun Yat-Sen University The 6nd Seminar of SELAB November 2012 Sun Yat-Sen University, Guangzhou, China 1/23
  • 2. Outline  Background and Motivation  Our Approach – Muffler  Empirical Evaluation  Conclusion 2/23
  • 4. Background  Coverage-Based Fault Localization (CBFL)  Input  Coverage  Testing results (passed or failed)  Output  A ranking list of statements  Ranking functions  Most CBFL techniques are similar with each other except that different ranking functions are used to compute suspiciousness. 4/23
  • 5. Motivation  One fundamental assumption [YPW08] of CBFL  The observed behaviors from passed runs can precisely represent the correct behaviors of this program;  and the observed behaviors from failed runs can represent the infamous behaviors.  Therefore, the different observed behaviors of program entities between passed runs and failed runs will indicate the fault’s location.  But this does not always hold. 5/23 [YPW08] C. Yilmaz, A. Paradkar, and C. Williams. Time will tell: fault localization using time spectra. In Proceedings of the 30th international conference on Software engineering (ICSE '08). ACM, New York, NY, USA, 81-90. 2008.
  • 6. Motivation  Coincidental Correctness (CC)  “No failure is detected, even though a fault has been executed.” [RT93]  i.e., the passed runs may cover the fault.  Weaken the first part of CBFL’s assumption:  The observed behaviors from passed runs can precisely represent the correct behaviors of this program;  More, CC occurs frequently in practice.[MAE+09] 6/23 [RT93] D.J. Richardson and M.C. Thompson, An analysis of test data selection criteria using the RELAY model of fault detection, Software Engineering, IEEE Transactions on, vol. 19, (no. 6), pp. 533-553, 1993. [MAE+09] W. Masri, R. Abou-Assi, M. El-Ghali, and N. Al-Fatairi, An empirical study of the factors that reduce the effectiveness of coverage-based fault localization, in Proceedings of the 2nd International Workshop on Defects in Large Software Systems: Held in conjunction with the ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2009), pp. 1-5, 2009.
  • 7. Our goal is to address the CC issue via mutation analysis Our Approach – Muffler 7/23
  • 8. Why does our approach work? - Key hypothesis  Mutating the faulty statement tends to maintain the results of passed test cases.  By contrast, mutating a correct statement tends to change the results of passed test cases (from passed to failed). 8/23
  • 9. Why does our approach work? - Three comprehensive scenarios (1/3) 9/23 F M M: Mutant point F: Fault point Program Test cases Test results Passed Failed 3 test results change from passed to failed - If we mutate an M in different basic blocks with F
  • 10. Why does our approach work? - Three comprehensive scenarios (1/3) 10/23 F M M: Mutant point F: Fault point Program Test cases Test results Passed Failed 3 test results change from passed to failed - If we mutate an M in different basic blocks with F
  • 11. Why does our approach work? - Three comprehensive scenarios (1/3) 11/23 F +M M: Mutant point F: Fault point Program Test cases Test results Passed Failed 0 test result changes from passed to failed - If we mutate F
  • 12. Why does our approach work? - Three comprehensive scenarios (2/3) 12/23 F M: Mutant point F: Fault point Program Test cases Test results Passed Failed M Data Flow Control Flow 3 test results change from passed to failed - If we mutate an M in the same basic block with F Due to different data flow to affect output
  • 13. Why does our approach work? - Three comprehensive scenarios (2/3) 13/23 F M: Mutant point F: Fault point Program Test cases Test results Passed Failed Data Flow Control Flow 0 test result change from passed to failed +M - If we mutate F
  • 14. Why does our approach work? - Three comprehensive scenarios (3/3) 14/23 F +M M: Mutant point F: Fault point Program Test cases Test results Passed Failed 0 test result changes from passed to failed Weak ability to generate an infectious state or to propagate the infectious state to output - When CC occurs frequently - If we mutate F Due to weak ability to affect output
  • 15. Our Approach – Muffler 15/23  Naish, the best existing ranking function[LRR11]  𝑆𝑢𝑠𝑝 𝑁𝑎𝑖𝑠ℎ (𝑆𝑖) = 𝐹𝑎𝑖𝑙𝑒𝑑(𝑃, 𝑆𝑖) × (𝑇𝑜𝑡𝑎𝑙𝑃𝑎𝑠𝑠𝑒𝑑(𝑃) + 1)  Mutation impact, the average amount of testing results change from passed to failed  𝐼𝑚𝑝𝑎𝑐𝑡 𝑆𝑖 = 𝑗=1 𝑚 𝐶ℎ𝑎𝑛𝑔𝑒 𝑝→𝑓 𝑃, 𝑀 𝑆 𝑖,𝑗 𝑚  Muffler, a combination of Naish and mutation impact  𝑺𝒖𝒔𝒑 𝑴𝒖𝒇𝒇𝒍𝒆𝒓 𝑺𝒊 = 𝑺𝒖𝒔𝒑 𝑵𝒂𝒊𝒔𝒉 𝑺𝒊 – 𝑰𝒎𝒑𝒂𝒄𝒕(𝑺𝒊) [LRR11] L. Naish, H. J. Lee, and K. Ramamohanarao, A model for spectra-based software diagnosis. ACM Transaction on Software Engineering Methodology, 20(3):11, 2011.
  • 17. Empirical Evaluation Program suite Number of versions Lines of Executable Code Number of test cases LOC tcas 41 63-67 1608 133-137 tot_info 23 122-123 1052 272-273 schedule 9 149-152 2650 290-294 schedule2 10 127-129 2710 261-263 print_tokens 7 189-190 4130 341-343 print_tokens2 10 199-200 4115 350-355 replace 32 240-245 5542 508-515 space 38 3633-3647 13585 5882-5904 17/23  Evaluation metrics  𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑓𝑎𝑢𝑙𝑡𝑦 𝑝𝑟𝑜𝑔𝑟𝑎𝑚𝑠 𝑤ℎ𝑜𝑠𝑒 𝑓𝑎𝑢𝑙𝑡 𝑐𝑎𝑛 𝑏𝑒 𝑓𝑜𝑢𝑛𝑑 𝑏𝑦 𝑒𝑥𝑎𝑚𝑖𝑛𝑖𝑛𝑔 𝑢𝑝 𝑡𝑜 𝑘% 𝑜𝑓 𝑡ℎ𝑒 𝑐𝑜𝑑𝑒 𝑡𝑜𝑡𝑎𝑙 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑓𝑎𝑢𝑙𝑡𝑦 𝑝𝑟𝑜𝑔𝑟𝑎𝑚𝑠  Subject programs
  • 18. Empirical Evaluation 18/23 Figure: Overall effectiveness comparison. Vertical axis: percentage of faults located (0%-100%); horizontal axis: percentage of code examined (0%-100%). Techniques: Muffler, Naish, Ochiai, Tarantula, Wong3.
  • 19. Empirical Evaluation 19/23

% of code examined   Tarantula   Ochiai   χDebug   Naish   Muffler
1%                   14          18       19       21      35
5%                   38          48       56       58      74
10%                  54          63       68       68      85
15%                  57          65       80       80      94
20%                  60          67       84       84      99
30%                  79          88       91       92      110
40%                  92          98       98       99      117
50%                  98          99       101      102     121
60%                  99          103      105      106     123
70%                  101         107      117      119     123
80%                  114         122      122      123     123
90%                  123         123      122      123     123
100%                 123         123      123      123     123

Table: Number of faults located at different levels of code examination effort.  When 1% of the statements have been examined, Naish reaches the fault in 17.07% (21/123) of the faulty versions, while Muffler reaches it in 28.46% (35/123).
  • 20. Empirical Evaluation 20/23

         Tarantula   Ochiai   χDebug   Naish   Muffler
Min      0.00        0.00     0.00     0.00    0.00
Max      87.89       84.25    93.85    78.46   55.38
Median   20.33       9.52     7.69     7.32    3.25
Mean     27.68       23.62    20.04    19.34   9.62
Stdev    28.29       26.36    24.61    23.86   13.22

Table: Statistics of code examination effort (%). Among these five techniques, Muffler scores best (lowest) in the maximum, median, and mean rows, tying at 0.00 for the minimum. Muffler also has a much lower standard deviation, meaning its performance varies less widely than the others and is more stable in terms of effectiveness. The results also show that Muffler reduces the average code examination effort of Naish by 50.26% (= 100% − 9.62%/19.34%).
  • 21. Conclusion and future work  We propose Muffler, a technique that uses mutation to help locate program faults.  On 123 faulty versions of seven programs, we compare its effectiveness and efficiency with the Naish technique. Results show that Muffler reduces the average code examination effort per faulty version by 50.26%.  For future work, we plan to generalize our approach to locate faults in multi-fault programs. 21/23
  • 23. Thank you! Contact me via elfinhe@gmail.com 23/23
  • 24. # Background  Mutation analysis, first proposed by Hamlet [Ham77] and DeMillo et al. [DLS78], is a fault-based testing technique used to measure the effectiveness of a test suite.  In mutation analysis, one introduces syntactic code changes, one at a time, into a program to generate various faulty programs (called mutants).  A mutation operator is a change-seeding rule to generate a mutant from the original program. 24/23 [Ham77] R.G. Hamlet, Testing Programs with the Aid of a Compiler, IEEE Transactions on Software Engineering, vol. SE-3, no. 4, pp. 279-290, 1977. [DLS78] R.A. DeMillo, R.J. Lipton and F.G. Sayward, Hints on Test Data Selection: Help for the Practicing Programmer, Computer, vol. 11, no. 4, pp. 34-41, 1978.
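For illustration, one common mutation operator (relational operator replacement) might look as follows; this regex-based sketch works on raw source text, whereas real mutation tools transform the parsed program, so it is only a toy.

```python
import re

# Map each relational operator to a replacement operator.
REPLACEMENT = {"<=": ">", ">=": "<", "==": "!=", "!=": "==", "<": ">=", ">": "<="}

def mutate_relational(stmt):
    """Yield one mutant per relational-operator occurrence in stmt."""
    # Longer operators come first in the alternation so "<=" wins over "<".
    for m in re.finditer(r"<=|>=|==|!=|<|>", stmt):
        yield stmt[:m.start()] + REPLACEMENT[m.group()] + stmt[m.end():]

print(list(mutate_relational("if (count <= n && n != 0)")))
# ['if (count > n && n != 0)', 'if (count <= n && n == 0)']
```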
  • 25. # Ranking functions 25/23 Table: Ranking functions  Tarantula [JHS02], Ochiai [AZV07], χDebug [WQZ+07], and Naish [NLR11]:

$Susp_{Tarantula}(S_i) = \frac{Failed(S_i)/TotalFailed}{Failed(S_i)/TotalFailed + Passed(S_i)/TotalPassed}$

$Susp_{Ochiai}(S_i) = \frac{Failed(S_i)}{\sqrt{TotalFailed \times (Failed(S_i) + Passed(S_i))}}$

$Susp_{\chi Debug}(S_i) = Failed(S_i) - h$, where $h = \begin{cases} Passed(S_i), & \text{if } Passed(S_i) \le 2 \\ 2 + 0.1 \times (Passed(S_i) - 2), & \text{if } 2 < Passed(S_i) \le 10 \\ 2.8 + 0.001 \times (Passed(S_i) - 10), & \text{if } Passed(S_i) > 10 \end{cases}$

$Susp_{Naish}(S_i) = Failed(S_i) \times (TotalPassed + 1) - Passed(S_i)$

[JHS02] J.A. Jones, M.J. Harrold, and J. Stasko. Visualization of test information to assist fault localization. In Proceedings of the 24th International Conference on Software Engineering (ICSE '02), pp. 467-477, 2002. [AZV07] R. Abreu, P. Zoeteweij and A.J.C. van Gemund, On the accuracy of spectrum-based fault localization, in Proceedings of the Testing: Academic and Industrial Conference Practice and Research Techniques (TAIC PART-Mutation 2007), pp. 89-98, 2007. [WQZ+07] W.E. Wong, Yu Qi, Lei Zhao, and Kai-Yuan Cai. Effective Fault Localization using Code Coverage. In Proceedings of the 31st Annual International Computer Software and Applications Conference (COMPSAC '07), Vol. 1, pp. 449-456, 2007. [NLR11] L. Naish, H. J. Lee, and K. Ramamohanarao, A model for spectra-based software diagnosis. ACM Transactions on Software Engineering and Methodology, 20(3):11, 2011.
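These four functions translate directly into code; the following sketch uses our own naming and, as a check, reproduces statement S1’s values from the example slides.

```python
from math import sqrt

# failed/passed: counts of failed/passed test cases covering statement S_i;
# the totals are over the whole test suite.

def tarantula(failed, passed, total_failed, total_passed):
    fr, pr = failed / total_failed, passed / total_passed
    return fr / (fr + pr)

def ochiai(failed, passed, total_failed):
    return failed / sqrt(total_failed * (failed + passed))

def chi_debug(failed, passed):
    if passed <= 2:
        h = passed
    elif passed <= 10:
        h = 2 + 0.1 * (passed - 2)
    else:
        h = 2.8 + 0.001 * (passed - 10)
    return failed - h

def naish(failed, passed, total_passed):
    return failed * (total_passed + 1) - passed

# Statement S1 of the "schedule" example reproduces the slide's values:
print(round(tarantula(210, 1798, 210, 2440), 2),  # 0.58
      round(ochiai(210, 1798, 210), 2),           # 0.32
      round(chi_debug(210, 1798), 2),             # 205.41
      naish(210, 1798, 2440))                     # 510812
```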
  • 26. # Our Approach – Muffler 26/23 Figure: Dataflow diagram of Muffler. Steps (each process with its inputs/outputs): instrument the program and execute it against the test suite (coverage and testing results); select statements to mutate (candidate statements); mutate the selected statements (mutants); run the mutants against the test suite (changes of testing results); calculate suspiciousness and sort statements (ranking list of all statements).
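Putting the steps together, a high-level sketch of this pipeline could look as follows; every callable passed in is hypothetical glue standing in for the instrumentation and mutation tooling, not an API from the paper.

```python
def muffler_rank(statements, failed_cov, passed_cov, total_passed,
                 mutants_of, run_passed_tests):
    """Return statements sorted by Muffler suspiciousness (highest first).

    failed_cov / passed_cov map each statement to the number of failed /
    passed test cases covering it; mutants_of(s) yields the mutants of s;
    run_passed_tests(m) returns how many previously passed tests now fail.
    """
    scores = {}
    for s in statements:
        # Mutation impact: mean passed->failed change count over s's mutants.
        changes = [run_passed_tests(m) for m in mutants_of(s)]
        impact = sum(changes) / len(changes) if changes else 0.0
        # Naish base score, then subtract the impact (Muffler).
        naish = failed_cov[s] * (total_passed + 1) - passed_cov[s]
        scores[s] = naish - impact
    # Developers examine the ranked list top-down.
    return sorted(statements, key=scores.get, reverse=True)
```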
  • 27. # Our Approach – Muffler 27/23 $Susp_{Muffler}(S_i) = Failed(P, S_i) \times (TotalPassed(P) + 1) - Passed(P, S_i) - Impact(S_i)$ Primary key: failed coverage, $Failed(P, S_i)$ (imprecise when multiple faults occur). Secondary key: passed coverage, $Passed(P, S_i)$ (invalid when the coincidental correctness rate is high). Additional key: mutation impact, $Impact(S_i)$ (inclined to handle coincidental correctness).
  • 28. # An Example 28/23 Figure: Faulty version v2 of program “schedule”. Part I (statements):

S1  if (block_queue) {
S2      count = block_queue->mem_count + 1; /* fault: insert ‘+1’ */
S3      n = (int) (count*ratio); /* fault: missing ‘+1’ */
S4      proc = find_nth(block_queue, n);
S5      if (proc) {
S6          block_queue = del_ele(block_queue, proc);
S7          prio = proc->priority;
S8          prio_queue[prio] = append_ele(prio_queue[prio], proc); }}

Part II (susp: suspiciousness score; r: rank; TotalPassed = 2440, TotalFailed = 210):

Stmt   Passed(s)   Failed(s)   Tarantula susp/r   Ochiai susp/r   χDebug susp/r   Naish susp/r
S1     1798        210         0.58 / 8           0.32 / 8        205.41 / 8      510812 / 8
S2     1382        210         0.64 / 7           0.36 / 7        205.83 / 7      511228 / 7
S3     1382        210         0.64 / 7           0.36 / 7        205.83 / 7      511228 / 7
S4     1382        210         0.64 / 7           0.36 / 7        205.83 / 7      511228 / 7
S5     1382        210         0.64 / 7           0.36 / 7        205.83 / 7      511228 / 7
S6     1358        210         0.64 / 3           0.37 / 3        205.85 / 3      511252 / 3
S7     1358        210         0.64 / 3           0.37 / 3        205.85 / 3      511252 / 3
S8     1358        210         0.64 / 3           0.37 / 3        205.85 / 3      511252 / 3

Code examination effort to locate S2 and S3: 88% for all four techniques.
  • 29. # An Example 29/23 Figure: Faulty version v2 of program “schedule”. Part III (one example mutant per statement; five mutants per statement in total, with the passed→failed change count of each):

Stmt   Mutated statement for the example mutant                          Change_p→f of the five mutants
S1     if (!block_queue) {                                               1644, 1798, 1101, 1101, 1644
S2     count = block_queue->mem_count != 1;                              249, 1097, 1097, 249, 1382
S3     n = (int) (count <= ratio);                                       249, 1116, 1101, 494, 1101
S4     proc = find_nth(block_queue, ratio);                              1088, 638, 1136, 744, 1382
S5     if (!proc) {                                                      1136, 1358, 1101, 1382, 1101
S6     block_queue = del_ele(block_queue, proc-1);                       1123, 349, 1358, 814, 1358
S7     prio /= proc->priority;                                           1358, 1358, 1101, 1101, 1358
S8     prio_queue[prio] = append_ele(prio_queue[__MININT__], proc); }}   598, 598, 1138, 1358, 1101

Part IV (Muffler):

Stmt   Impact    susp        r
S1     1457.6    509354.4    8
S2     814.8     510413.2    2
S3     812.2     510415.8    2
S4     997.6     510230.4    5
S5     1215.6    510012.4    6
S6     1000.4    510251.6    4
S7     1255.2    509996.8    7
S8     958.6     510293.4    3

Code examination effort to locate S2 and S3: 25%.
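As a quick sanity check, the S2 row of Parts III and IV can be reproduced from the Part II counts (all constants are taken from the tables above):

```python
changes_s2 = [249, 1097, 1097, 249, 1382]      # passed->failed flips of S2's mutants
impact_s2 = sum(changes_s2) / len(changes_s2)  # 814.8
naish_s2 = 210 * (2440 + 1) - 1382             # 511228, from Part II
print(naish_s2 - impact_s2)                    # 510413.2 -> rank 2 of 8, i.e. 25%
```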
  • 32. # Empirical Evaluation 32/23

                     Versus Tarantula   Versus Ochiai   Versus χDebug   Versus Naish
More effective       102                96              93              89
Same effectiveness   19                 23              23              25
Less effective       2                  4               7               9

Table: Pair-wise comparison between Muffler and existing techniques. Muffler is more effective (examining fewer statements before encountering the faulty statement) than Naish for 89 out of 123 faulty versions; is as effective (examining the same number of statements) as Naish for 25 out of 123; and is less effective (examining more statements) than Naish for only 9 out of 123.
  • 33. # Empirical Evaluation 33/23  Experience on real faults

Faulty version   CC%   Code examination effort (Naish)   Code examination effort (Muffler)
v5               1%    0%                                 0%
v9               7%    1%                                 0%
v17              31%   12%                                7%
v28              49%   11%                                5%
v29              99%   25%                                9%

Table: Results with real faults in space. Five faulty versions are chosen to represent low, medium, and high occurrence of coincidental correctness. The column “CC%” gives the percentage of coincidentally passed test cases out of all passed test cases; the “Code examination effort” columns give the percentage of code to be examined before the fault is encountered.
  • 34. # Empirical Evaluation 34/23  Efficiency analysis

Program suite   CBFL (seconds)   Muffler (seconds)
tcas            18.00            868.68
tot_info        11.92            573.12
schedule        34.02            2703.01
schedule2       27.76            1773.14
print_tokens    59.11            2530.17
print_tokens2   62.07            5062.87
replace         69.13            4139.19
Average         40.29            2521.46

Table: Time spent by each technique on subject programs. We have shown experimentally that, by taking advantage of both coverage and mutation impact, Muffler outperforms Naish regardless of the occurrence of coincidental correctness. Unfortunately, Muffler needs to execute a large number of mutants to compute mutation impact, and executing the mutants against the test suite increases the time cost of fault localization. The time mainly comprises the cost of instrumentation, execution, and coverage collection. From this table, we observe that Muffler takes approximately 62.59 times the average time cost of the Naish technique.
  • 35. # Empirical Evaluation 35/23  Efficiency analysis

Program suite   Mutated statements   Total statements   Mutants   Time per mutant (seconds)
tcas            40.15                65.10              199.90    4.26
tot_info        39.57                122.96             191.87    2.92
schedule        80.60                150.20             351.60    7.59
schedule2       75.33                127.56             327.78    5.32
print_tokens    67.43                189.86             260.29    9.49
print_tokens2   86.67                199.44             398.67    12.54
replace         71.14                242.86             305.93    13.30
Average         56.52                142.79             256.90    7.92

Table: Information about the mutants generated: the number of mutated/total executable statements, the number of mutants generated, and the time cost of running each mutant. For example, for the program tcas there are, on average, 40.15 mutated statements out of 65.10 executable statements in total; 199.90 mutants are generated, and each takes 4.26 seconds to run, on average. Note that there is no need to collect coverage from the mutants’ executions, and running a mutant without instrumentation and coverage collection takes about a quarter of the time.
  • 36. How about the coincidental correctness issue? 36/23
  • 37. Empirical Evaluation - The impact of coincidental correctness 37/23 Figure 5: Correlation between effectiveness and coincidental correctness, plotted separately for Muffler and Naish.  Each point represents a faulty version: the horizontal axis gives the version’s percentage of coincidental correctness among passed test cases (CC% = |Tcc|/|Tp|), and the vertical axis gives the code examination effort to find the fault. A second-order polynomial fitting curve shows the points’ tendency.
  • 38. Does this work in real programs? 38/23
  • 39. Why does our approach work? - A feasibility study 39/23 Figure: Distribution of statements’ result changes versus the faulty statement’s testing result changes, for tcas v7, tot_info v17, schedule v4, schedule2 v1, print_tokens v7, print_tokens2 v3, replace v24, and space v20. The vertical axis denotes the number of testing result changes (from ‘passed’ to ‘failed’), and the horizontal width denotes the probability density at the corresponding number of changes.
  • 40. Why does our approach work? - A feasibility study 40/23 Figure: Distribution of statements’ result changes versus the faulty statement’s testing result changes, for three faulty versions of each subject program (tcas v7/v12/v17, tot_info v7/v8/v17, schedule v2/v3/v4, schedule2 v1/v4/v6, print_tokens v2/v3/v7, print_tokens2 v3/v6/v9, replace v15/v17/v24, space v8/v11/v20). The vertical axis denotes the number of testing result changes (from ‘passed’ to ‘failed’), and the horizontal width denotes the probability density at the corresponding number of changes.
  • 41. Why does our approach work? - Another feasibility study (when CC% ≥ 95%) 41/23  When CC% is greater than or equal to 95%, ranking by result changes (avg. 16.33% of code examined) reduces the code examination effort of Naish (avg. 47.55%) by 65.66% (= 100% − 16.33%/47.55%).  Only 6 faulty versions need less than 20% of statements examined with Naish, versus 22 versions when ranking by result changes. Figure: Frequency distribution of effectiveness when CC% ≥ 95%.
  • 42. Experience on real faults 42/23 Table 8: Results with real faults in space (lines of code examined before the fault is encountered):

Faulty version   CC%      Naish   Muffler
v5               0.90%    2       1
v20              1.97%    15      5
v21              1.97%    15      6
v10              2.74%    47      18
v11              6.29%    37      14
v6               6.92%    40      7
v9               19.05%   7       1
v17              30.92%   427     244
v28              48.57%   268     170
v29              99.32%   797     331

Editor's Notes

  1. I assume you already know most of these techniques, so I only give a quick review.
  9. Figure 3 shows the overall comparison of the effectiveness of Naish and Muffler. It depicts the percentage of versions (out of 123 faulty versions) whose fault can be located when a certain percentage of code in each version has been examined. The vertical axis shows the percentage of faulty versions, and the horizontal axis shows the percentage of code to be examined, following the descending order of the ranking list produced by each fault localization technique.
  13. It is worthwhile to mention that Muffler’s time cost can be greatly reduced with a simple test selection strategy: do not re-run a test case that does not cover the mutated statement. Furthermore, because the executions of mutants do not depend on each other, they can be parallelized with little effort. Nonetheless, we have to admit that Muffler needs more time to offer better effectiveness in fault localization.
  14. The vertical axis denotes the number of testing result changes (from ‘passed’ to ‘failed’), and the horizontal width denotes the probability density at the corresponding number of changes. We additionally draw a red line to indicate the faulty statement’s testing result changes. Each of these programs varies from the others with respect to size, functionality, fault type, etc., but we can observe similar results: the faulty statement’s testing result changes tend to fall at the low end of the distribution.
  16. Of the 32 faulty versions whose CC% is greater than 95%: 1) 19 faulty versions need more than 60% of all statements examined with Naish, whereas no version does with Muffler; 2) only 8 versions need less than 40% of statements examined with Naish, versus 25 versions with Muffler.
  17. Table 8 reports the results for faulty versions with different concentrations of coincidental correctness. The column “CC%” presents the percentage of coincidentally passed test cases out of all passed test cases. The columns under the heading “Lines of code examined” present the number of code lines to be examined before the fault is encountered using Naish and Muffler, respectively. For example, for version “v6”, with a coincidental correctness percentage of 6.92%, Naish needs to examine 40 code lines whereas Muffler needs only 7; the code examination effort reduction from Naish to Muffler is 82.5% (= 100% − 7/40).