Duplicate Bug Reports Considered Harmful… Really?
Nicolas Bettenburg • Rahul Premraj • Tom Zimmermann • Sunghun Kim
ICSME’2018 (Madrid) • September 28th, 2018
3.6 Order of Extraction
[Diagram: structural elements are pulled out of the raw report text in a fixed order: patches, stack traces, source code, enumerations.]
[Diagram: a master report (mock text) shown alongside the elements extracted from it.]
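The slides only name the element types. As a rough sketch of what an ordered extraction pass over the raw report text could look like (the regexes, the function name extract_elements, and the first-match-wins heuristic are illustrative assumptions, not the authors' actual tool):

    import re

    # Illustrative patterns for the four element types; a real tool would be
    # far more careful (multi-line awareness, language variants, etc.).
    PATCH_LINE = re.compile(r"^(Index: |RCS file:|retrieving revision|diff -u|--- |\+\+\+ |@@ )")
    TRACE_LINE = re.compile(r"^\s*at [\w$.]+\([\w$.]+:\d+\)|^\s*[\w.]*(Exception|Error)\b")
    CODE_LINE  = re.compile(r"[;{}]\s*$|^\s*(public|private|protected|class|return|void)\b")
    ENUM_LINE  = re.compile(r"^\s*(\d+[.)]|[-*])\s+\S")

    def extract_elements(text):
        """Split report text into structural elements and remaining prose.
        Elements are claimed in a fixed order (patches, stack traces,
        source code, enumerations); the first matching type wins."""
        order = [("patches", PATCH_LINE), ("stack traces", TRACE_LINE),
                 ("source code", CODE_LINE), ("enumerations", ENUM_LINE)]
        elements = {name: [] for name, _ in order}
        prose = []
        for line in text.splitlines():
            for name, pattern in order:
                if pattern.search(line):
                    elements[name].append(line)
                    break
            else:
                prose.append(line)
        return elements, "\n".join(prose)

In a sketch like this, checking for patches before stack traces matters because a patch hunk can itself contain code and trace-like lines that would otherwise be claimed by the later, less specific patterns.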
5.2 Results
Average per master report:
    Information item    Master   Extended   Change*
    Predefined fields
    – product           1.000    1.127      ...
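The table contrasts, for each information item, its average count per master report alone ("Master") with the average once the master is extended by the contents of its duplicates ("Extended"). A minimal sketch of that computation, assuming a simple dictionary-based report layout (the layout and the function name average_item_counts are mine, for illustration only):

    def average_item_counts(masters, item):
        """masters: list of dicts like
        {"items": {"stack traces": 1, ...}, "duplicates": [{"items": {...}}, ...]}"""
        n = len(masters)
        master_avg = sum(m["items"].get(item, 0) for m in masters) / n
        extended_avg = sum(
            m["items"].get(item, 0)
            + sum(d["items"].get(item, 0) for d in m["duplicates"])
            for m in masters
        ) / n
        change = extended_avg / master_avg if master_avg else float("inf")
        return master_avg, extended_avg, change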
Duplicate bug reports can provide useful additional information. For example, we can find up to three times the stack traces...
There is significant evidence of additional information in duplicate bug reports that is uniquely different from the information...
PART 1: Is there extra information in duplicate reports and if so, can we quantify how much?
PART 2: Is that extra information helpful for carrying out software engineering tasks?
The Triage Problem
[Diagram, built up over several slides: a bug Report goes to a Developer and ends up Fixed (✓); as more and more BUG reports pour in, a Triager is needed to decide which developer should receive each one.]
[Diagram: each report, the MASTER and each of its DUPLICATEs, is represented as a feature vector (A1, A2, ..., An) together with a class label; in the triage task the class is the developer the report was assigned to (e.g. MASTER: Class 3, DUPLICATE 1: Class 3, DUPLICATE n: Class 2).]
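One way to read the diagram: every report is turned into a feature vector over its text and labelled with a class, and the labelled vectors are fed to a learner; merging duplicates into their masters yields richer "extended" vectors. A minimal sketch using a bag-of-words model and a Naive Bayes learner (scikit-learn and the report layout here are arbitrary illustrative choices, not necessarily what the original study used):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    def train_triage_model(masters, use_duplicates=True):
        """masters: list of dicts like
        {"text": "...", "duplicates": ["...", ...], "developer": "alice"}"""
        texts, labels = [], []
        for m in masters:
            text = m["text"]
            if use_duplicates:                       # "extended" report
                text += " " + " ".join(m["duplicates"])
            texts.append(text)
            labels.append(m["developer"])
        vectorizer = CountVectorizer()               # features A1..An = word counts
        X = vectorizer.fit_transform(texts)
        model = MultinomialNB().fit(X, labels)       # one of several possible learners
        return vectorizer, model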
[Diagram: master reports, sorted chronologically, are split into eleven folds (Fold 1, Fold 2, Fold 3, ..., Fold 11); in each run the earlier folds are used for training and the following fold for testing.]
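A minimal sketch of such a chronological split, assuming each report carries a creation timestamp; the eleven folds follow the slide, everything else (names, layout) is illustrative:

    def chronological_folds(reports, n_folds=11, key=lambda r: r["created"]):
        """Yield (training, testing) pairs; each fold is tested against a
        model trained only on the folds that precede it in time."""
        reports = sorted(reports, key=key)
        fold_size = max(1, len(reports) // n_folds)
        folds = [reports[i * fold_size:(i + 1) * fold_size] for i in range(n_folds - 1)]
        folds.append(reports[(n_folds - 1) * fold_size:])   # last fold takes the rest
        for i in range(1, n_folds):
            training = [r for fold in folds[:i] for r in fold]
            yield training, folds[i]

Sorting by time before splitting keeps the evaluation realistic: the model is only ever asked to triage reports that arrived after everything it was trained on.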
6. Additional Information can Help Developers
Table 6.1: Percentages of reports correctly triaged to ECLIPSE developers...
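Accuracy in a table like 6.1 is simply the share of test-fold reports assigned to the developer who actually handled them. A small sketch, reusing the hypothetical model and vectorizer from the earlier triage sketch:

    def triage_accuracy(model, vectorizer, test_reports):
        """Fraction of test reports whose predicted developer matches the
        developer who actually resolved them."""
        texts = [r["text"] for r in test_reports]
        truth = [r["developer"] for r in test_reports]
        predicted = model.predict(vectorizer.transform(texts))
        return sum(p == t for p, t in zip(predicted, truth)) / len(truth)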
The information contained in duplicate reports improves the accuracy of machine learning algorithms when solving the bug triage problem.
Duplicate bug reports are widely considered harmful, adding extra burden on developers and holding up software development processes. Ten years ago we demonstrated that duplicate reports contain valuable additional information that helps developers get their jobs done faster and better. Duplicate reports should therefore not be thrown away, but instead merged with their original reports to make that helpful information available to practitioners. This talk is a ten-year retrospective.


10 Year Impact Award Presentation - Duplicate Bug Reports Considered Harmful ... Really?

  1. 1. Duplicate Bug Reports Considered Harmful… Really? Nicolas Bettenburg • Rahul Premraj • Tom Zimmermann • Sunghun Kim
 ICSME’2018 (Madrid) • September 28th, 2018
  2. 2. [Chart slide; vertical axis values 0, 75, 150, 225, 300.]
  3. 3. Automated Severity Assessment of Software Defect Reports Tim Menzies Lane Department of Computer Science, West Virginia University PO Box 6109, Morgantown, WV, 26506 304 293 0405 tim@menzies.us Andrian Marcus Department of Computer Science Wayne State University Detroit, MI 48202 313 577 5408 amarcus@wayne.edu Abstract In mission critical systems, such as those developed by NASA, it is very important that the test engineers properly recognize the severity of each issue they identify during testing. Proper severity assessment is essential for appropriate resource allocation and planning for fixing activities and additional testing. Severity assessment is strongly influenced by the experience of the test engineers and by the time they spend on each issue. The paper presents a new and automated method named SEVERIS (SEVERity ISsue assessment), which assists the test engineer in assigning severity levels to defect reports. SEVERIS is based on standard text mining and machine learning techniques applied to existing sets of defect reports. A case study on using SEVERIS with data from NASA’s Project and Issue Tracking System (PITS) is presented in the paper. The case study results indicate that SEVERIS is a good predictor for issue severity levels, while it is easy to use and efficient. 1. Introduction NASA’s software Independent Verification and Validation (IV&V) Program captures all of its findings in a database called the Project and Issue Tracking System (PITS). The data in PITS has been collected for more than 10 years and includes issues on robotic satellite missions and human-rated systems. Nowadays, similar defect tracking systems, such as Bugzilla1 , have become very popular, largely due to the spread of open source software development. These systems help to track bugs and changes in the code, to submit and review patches, to manage quality assurance, to support communication between developers, etc. As compared to newer systems, the problem with PITS is that there is a lack of consistency in how each 1 http://www.bugzilla.org/ of the projects collected issue data. In most instances, the specific configuration of the information captured about an issue was tailored by the IV&V project to meet its needs. This has created consistency problems when metrics data is pulled across projects. While there was a set of required data fields, the majorities of those fields do not provide information in regards to the quality of the issue and are not very suitable for comparing projects. A common issue among defect tracking systems is that they are useful for storing day-to-day information and generating small-scale tactical reports (e.g., “list the bugs we found last Tuesday”), but difficult to use for high-end business strategic analysis (e.g., “in the past, what methods have proved most cost effective in finding bugs?”). Another issue common to these systems is that most of the data is unstructured (i.e., free text). Specific to PITS is that the database fields in PITS keep changing, yet the nature of the unstructured text remains constant. In consequence, one logical choice in the analysis of defect reports is a combination of text mining and machine learning. In this paper we present a new approach for extracting general conclusions from PITS data based on text mining and machine learning methods, which are low cost, automatic, and rapid. 
We designed and built a tool named SEVERIS (SEVERity ISsue assessment) to automatically review issue reports and alert when a proposed severity is anomalous. The way SEVRIS is built provides the probabilities that the assessment is correct. These probabilities can be used to guide decision making in this process. Assigning the correct severity levels to issue reports is extremely important in the process employed at NASA, as it directly impacts resource allocation and planning of subsequent defect fixing activities. NASA uses a five-point scale to score issue severity. The scale ranges one to five, worst to dullest, respectively. A different scale is used for robotic and human-rated missions (see Table 1).
  4. 4. [The same Menzies/Marcus paper as on slide 3, now with related work overlaid:] Predicting which bugs get fixed. Guo et al.
  5. 5. [Adds:] Predicting Severity of a reported bug. Lamkanfi et al.
  6. 6. [Adds:] Characterizing re-opened bugs. Zimmermann et al.
  7. 7. [Adds:] What makes a good bug report. Bettenburg et al.
  8. 8. [Same as slide 7.]
  9. 9. [Adds:] Do clones matter? Juergens et al.
  10. 10. [Adds:] Frequency and Risks of changes to clones. Göde et al.
  11. 11. [Adds:] Do developers care about code smells? Yamashita et al.
  12. 12. [Adds:] Inconsistent Changes to Clones at Release Level. Bettenburg et al.
  13. 13. [Timeline slide: January, March, June, October, November 2007.]
  14. 14. [Same timeline as slide 13.]
  15. 15. Meet the Rebels!
  16. 16. Challenge conventional wisdom
  17. 17. There are many, varied stories behind the observed SE artifacts.
  18. 18. Ignoring available data could lead to missing fundamentally important insights
  19. 19. CHALLENGE THE ASSUMPTIONS
  20. 20. When the same bug is reported several times in Bugzilla, developers are slowed down https://fedoraproject.org/wiki/How_to_file_a_bug_report#Avoiding_Duplicate_Bug_Reports
  21. 21. When the same bug is reported several times in Bugzilla, developers are slowed down https://fedoraproject.org/wiki/How_to_file_a_bug_report#Avoiding_Duplicate_Bug_Reports A duplicate bug is a burden in the testing cycle. https://www.softwaretestinghelp.com/how-to-write-good-bug-report/
  22. 22. When the same bug is reported several times in Bugzilla, developers are slowed down https://fedoraproject.org/wiki/How_to_file_a_bug_report#Avoiding_Duplicate_Bug_Reports Several duplicate bug reports just cause an administration headache for developers http://wicket.apache.org/help/reportabug.html A duplicate bug is a burden in the testing cycle. https://www.softwaretestinghelp.com/how-to-write-good-bug-report/
  23. 23. When the same bug is reported several times in Bugzilla, developers are slowed down https://fedoraproject.org/wiki/How_to_file_a_bug_report#Avoiding_Duplicate_Bug_Reports Duplicate bug reports, […] consume time of bug triagers and software developers that might better be spent working on reports that describe unique requests. Lyndon Hiew , MSc. Thesis, 2006, UBC Several duplicate bug reports just cause an administration headache for developers http://wicket.apache.org/help/reportabug.html A duplicate bug is a burden in the testing cycle. https://www.softwaretestinghelp.com/how-to-write-good-bug-report/
  24. 24. DON’T BE THAT GUY who submitted a DUPLICATE
  25. 25. It doesn't even mean that the resolved bug report can now be ignored, since we have seen instances of late-identification of duplicates (e.g., BR-C in Figure 2) in which accumulated knowledge and dialogue may still be relevant to the resolution of the other bug reports in the BRN. Robert J. Sandusky, Les Gasser, and Gabriel Ripoche. Bug report networks: Varieties, strategies, and impacts in an OSS development community. In Proc. of ICSE Workshop on Mining Software Repositories, 2004.
  26. 26. “Duplicates are not really problems. They often add useful information. That this information were filed under a new report is not ideal though.” N. Bettenburg, S. Just, A. Schröter, C. Weiss, R. Premraj, and T. Zimmermann. What makes a good bug report? In Proceedings of the 16th International Symposium on Foundations of Software Engineering, November 2008.
  27. 27. What Helps the Most? What Harms the Most?
Table 1. Items from the first survey part, with the count of how often they helped (h), how often they helped the most (hm), and the probability P(hm | h) that an item helped the most given that it helped:
    item                  h    hm   P(hm | h)
    steps to reproduce    47   42   0.8936
    stack traces          45   35   0.7778
    screenshots           42   17   0.4048
    test cases            39   11   0.2821
    observed behavior     44   12   0.2727
    code examples         38    9   0.2368
    error reports         33    3   0.0909
    build information     34    3   0.0882
    summary               36    3   0.0833
    expected behavior     41    3   0.0732
    version               38    1   0.0236
    component             34    0   0.0000
    hardware              13    0   0.0000
    operating system      34    0   0.0000
    product               30    0   0.0000
    severity              26    0   0.0000
Table 2. Items from the second survey part, with the count of how often they harmed (a), how often they harmed the most (am), and the probability P(am | a) that an item harmed the most given that it harmed:
    item                           a    am   P(am | a)
    errors in steps to reproduce   34   29   0.8235
    incomplete information         44   35   0.7727
    wrong observed behavior        15   11   0.6667
    wrong version number           21    8   0.2857
    errors in test cases           14    4   0.2857
    unstructured text              19    7   0.2632
    wrong operating system          8    3   0.2500
    wrong expected behavior        18    7   0.2222
    non-technical language         14    3   0.2143
    too long text                  11    2   0.1818
    errors in code examples        11    2   0.1818
    bad grammar                    29    5   0.1724
    wrong component name           22    2   0.0909
    prose text                     12    2   0.0833
    duplicates                     31    2   0.0645
    no spellcheck                   8    0   0.0000
    wrong hardware                  5    0   0.0000
    spam                            1    0   0.0000
    wrong product name             11    0   0.0000
    errors in stack traces          2    0   0.0000
(The survey was filled out by 48 out of 365 developers in total.)
N. Bettenburg, S. Just, A. Schröter, C. Weiss, R. Premraj, and T. Zimmermann. What makes a good bug report? In Proceedings of the 16th International Symposium on Foundations of Software Engineering, November 2008.
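The probability columns are just the ratio of the two counts; for example, P(hm | h) for "steps to reproduce" is 42 / 47 ≈ 0.8936. A tiny sketch that reproduces the column (the data excerpt and layout are only for illustration):

    # (h, hm) counts for a few items from Table 1, excerpt only
    helped = {"steps to reproduce": (47, 42),
              "stack traces": (45, 35),
              "screenshots": (42, 17)}

    p_hm_given_h = {item: hm / h for item, (h, hm) in helped.items()}
    print(round(p_hm_given_h["steps to reproduce"], 4))   # 0.8936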
  28. 28. PART 1 Is there extra information in duplicate reports and if so, can we quantify how much? PART 2 Is that extra information helpful for carrying out software engineering tasks?
  29. 29. First, we need DATA … lots of DATA! [Plot: reports per month in ECLIPSE from Oct'01 to Oct'07, y-axis 0 to 5000, annotated with the release milestones 1.0 through 3.3; one curve for total reports submitted, one for duplicates submitted.] ~3,000 reports submitted per month, ~13% duplicate bug reports.
  30. 30. The Inverse Duplicate Problem: 27% (Mozilla), 31% (Eclipse). Figure 4.1: Graphical representation of the collected bug report data. MOZILLA: 269,222 bug reports without duplicates, 116,727 duplicate reports, 36,697 master reports. ECLIPSE: 167,494 bug reports without duplicates, 27,838 duplicate reports, 16,511 master reports. The MOZILLA database was mined using a tool that reads the XML representation of each report.
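The mining step has to link each duplicate report to its master. A hedged sketch of how that link can be read from Bugzilla's XML export; the ctype=xml endpoint and the dup_id field are standard Bugzilla, but this is only an illustration, not the actual mining tool used for the thesis:

import urllib.request
import xml.etree.ElementTree as ET

def master_of(bug_id, base="https://bugzilla.mozilla.org"):
    """Return the master report's id if this bug was resolved as a DUPLICATE, else None."""
    url = f"{base}/show_bug.cgi?ctype=xml&id={bug_id}"
    with urllib.request.urlopen(url) as response:
        bug = ET.parse(response).getroot().find("bug")
    if bug is not None and bug.findtext("resolution") == "DUPLICATE":
        dup_id = bug.findtext("dup_id")
        return int(dup_id) if dup_id else None
    return None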
  31. 31. Bug 137808 Summary: Exceptions from createFromString lock-up the editor Product: [Modeling] EMF Reporter: Patrick Sodre <psodre@gmail.com> Component: Core Assignee: Marcelo Paternostro <marcelop@ca.ibm.com> Status: VERIFIED FIXED QA Contact: Severity: normal Priority: P3 CC: merks@ca.ibm.com Version: 2.2 Target Milestone: --- Hardware: PC OS: Windows XP Whiteboard: Description: Opened: 2006-04-20 14:25 - 0400 As discussed on the newsgroup under the Thread with the same name I am opening this bug entry. Here is a history of the thread. -- From Ed Merks Patrick, The value is checked before it's applied and can't be applied until it's valid. But this BigDecimal cases behaves oddly because the exception thrown by new BigDecimal("badvalue") has a null message and the property editor relies on returning a non-null message string to indicate there is an error. Please open a bugzilla which I'll fix like this: ### Eclipse Workspace Patch 1.0 #P org.eclipse.emf.edit.ui Index: src/org/eclipse/emf/edit/ui/provider/PropertyDescriptor.java =================================================================== RCS file: /cvsroot/tools/org.eclipse.emf/plugins/org.eclipse.emf.edit.ui/src/org/eclipse/emf/edit/ui/provider/PropertyDescriptor.java,v retrieving revision 1.10 diff -u -r1.10 PropertyDescriptor.java --- src/org/eclipse/emf/edit/ui/provider/PropertyDescriptor.java 21 Mar 2006 16:42:30 -0000 1.10 +++ src/org/eclipse/emf/edit/ui/provider/PropertyDescriptor.java 20 Apr 2006 11:59:10 -0000 @@ -162,7 +162,8 @@ } catch (Exception exception) { - return exception.getMessage(); + String message = exception.getMessage(); + return message == null ? exception.toString() : message; } } Diagnostic diagnostic = Diagnostician.INSTANCE.validate(EDataTypeCellEditor.this.eDataType, value); Patrick Sodre wrote: Hi, It seems that if the user inputs an invalid parameter that gets created from "createFromString" the Editor locks-up until the user explicitly calls "restore Default Value". Is this the expected behavior or could something better be done? For instance if an exception is thrown restore the value back to what it was before after displaying a pop-up error message. I understand that for DataTypes defined by the user he/she should take care of catching the exceptions but for the default ones like BigInteger/BigDecimal I think the EMF runtime could do some of the grunt work... If you think this is something worth pursuing I could post an entry in Bugzilla. Regards, Patrick Sodre Below is the stack trace that I got from the Editor... java.lang.NumberFormatException at java.math.BigDecimal.<init>(BigDecimal.java:368) at java.math.BigDecimal.<init>(BigDecimal.java:647) at org.eclipse.emf.ecore.impl.EcoreFactoryImpl.createEBigDecimalFromString(EcoreFactoryImpl.java:559) at org.eclipse.emf.ecore.impl.EcoreFactoryImpl.createFromString(EcoreFactoryImpl.java:116) at org.eclipse.emf.edit.ui.provider.PropertyDescriptor$EDataTypeCellEditor.doGetValue(PropertyDescriptor.java:183) at org.eclipse.jface.viewers.CellEditor.getValue(CellEditor.java:449) at org.eclipse.ui.views.properties.PropertySheetEntry.applyEditorValue(PropertySheetEntry.java:135) at org.eclipse.ui.views.properties.PropertySheetViewer.applyEditorValue(PropertySheetViewer.java:249) at ------- Comment #1 From Ed Merks 2006-04-20 15:09:23 -0400 ------- The fix has been committed to CVS. Thanks for reporting this problem. 
------- Comment #2 From Marcelo Paternostro 2006-04-27 10:44:24 -0400 ------- Fixed in the I200604270000 built ------- Comment #3 From Nick Boldt 2008-01-28 16:46:51 -0400 ------- Move to verified as per bug 206558. Extracting Structural Information from Bug Reports (MSR 2008)
  32. 32. [Bug 137808 again, identical to the previous slide, now with the METADATA fields highlighted.] Extracting Structural Information from Bug Reports (MSR 2008)
  33. 33. [Bug 137808, highlighted: METADATA and SOURCE CODE.] Extracting Structural Information from Bug Reports (MSR 2008)
  34. 34. [Bug 137808, highlighted: METADATA, SOURCE CODE, and PATCHES.] Extracting Structural Information from Bug Reports (MSR 2008)
  35. 35. [Bug 137808, highlighted: METADATA, SOURCE CODE, PATCHES, and SCREENSHOTS.] Extracting Structural Information from Bug Reports (MSR 2008)
  36. 36. [Bug 137808, highlighted: METADATA, SOURCE CODE, PATCHES, SCREENSHOTS, and STACK TRACES.] Extracting Structural Information from Bug Reports (MSR 2008)
  37. 37. 3.6 Order of Extraction. Figure 3.10: We extract structural elements in a fixed sequence: the input report is filtered for PATCHES, then STACK TRACES, then SOURCE CODE, then ENUMERATIONS, and the output is the remaining natural-language text. The order in which the detection and extraction of elements is executed is of great importance, because several structural elements interfere. Patches vs. enumerations: enumerations, especially itemizations, interfere with the hunk lines in patches; both use the symbols "+" and "-".
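A minimal sketch of such a fixed extraction order, with simplified regular expressions standing in for the real filters; each filter removes what it matched, so later filters (in particular the enumeration filter) never see patch hunk lines:

import re

FILTERS = [
    # Unified-diff patches: an "Index:" header followed by diff/hunk lines.
    ("patches", re.compile(r"^Index: .+\n(?:(?:={3,}|RCS file:|--- |\+\+\+ |@@ |[ +\-]).*\n?)+", re.M)),
    # Stack traces: an exception line followed by one or more "at ..." frames.
    ("stack_traces", re.compile(r"^.*(?:Exception|Error)\b.*\n(?:\s*at .+\n?)+", re.M)),
    # Source code: consecutive lines ending in ';', '{' or '}'.
    ("source_code", re.compile(r"(?:^.*[;{}][ \t]*\n?)+", re.M)),
    # Enumerations: consecutive lines starting with '-', '+', '*' or '1.'-style markers.
    ("enumerations", re.compile(r"(?:^\s*(?:[-+*]|\d+[.)])\s+.+\n?)+", re.M)),
]

def extract_in_order(text):
    elements = {}
    for name, pattern in FILTERS:
        elements[name] = pattern.findall(text)
        text = pattern.sub("", text)   # later filters only see the residue
    return elements, text              # leftover text is natural-language prose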
  38. 38. [Figure 3.10 repeated from the previous slide.] The evaluation is split into two parts: first, we focus on correctly identifying the presence of enumerations, patches, stack traces, and source code in bug reports. Knowing the reliability of our approach, we can then assess how well the detected elements are extracted by our methods. Evaluation setup: we parsed 161,500 bug reports from the ECLIPSE project, submitted between October 2001 and December 2007. For each report, INFOZILLA verified the presence of each of the four structural element types and, for each element, classified the report into one of two bins: B1 (report has the element) and B2 (report does not have the element). Figure 3.11: For each element we classified the report into two bins.
  39. 39. [Diagram: the Elements extracted from a Master Report are compared against the Elements extracted from the corresponding Extended Report, i.e., the master merged with its duplicates.]
  40. 40. ADDITIONAL INFORMATION
Table 5.1: Average amount of information added by duplicates (ECLIPSE), per master report.
Information item | Master | Extended | Change
Predefined fields:
  product | 1.000 | 1.127 | +0.127
  component | 1.000 | 1.287 | +0.287
  operating system | 1.000 | 1.631 | +0.631
  reported platform | 1.000 | 1.241 | +0.241
  version | 0.927 | 1.413 | +0.486
  reporter | 1.000 | 2.412 | +1.412
  priority | 1.000 | 1.291 | +0.291
  target milestone | 0.654 | 0.794 | +0.140
Patches:
  total | 1.828 | 1.942 | +0.113
  unique: patched files | 1.061 | 1.124 | +0.062
Screenshots:
  total | 0.139 | 0.285 | +0.145
  unique: filename, filesize | 0.138 | 0.281 | +0.143
Stack traces:
  total | 0.504 | 1.422 | +0.918
  unique: exception | 0.195 | 0.314 | +0.118
  unique: exception, top frame | 0.223 | 0.431 | +0.207
  unique: exception, top 2 frames | 0.229 | 0.458 | +0.229
  unique: exception, top 3 frames | 0.234 | 0.483 | +0.248
  unique: exception, top 4 frames | 0.239 | 0.504 | +0.265
  unique: exception, top 5 frames | 0.244 | 0.525 | +0.281
For all information items the increase is significant at p < .001.
Table 5.2: Average amount of information added by duplicates (MOZILLA), per master report.
Information item | Master | Extended | Change
Predefined fields:
  product | 1.000 | 1.400 | +0.400
  component | 1.000 | 1.953 | +0.953
  operating system | 1.000 | 2.102 | +1.102
  reported platform | 1.000 | 1.544 | +0.544
  version | 0.814 | 0.979 | +0.165
  reporter | 1.000 | 3.705 | +2.705
  priority | 0.377 | 0.499 | +0.122
  target milestone | 0.433 | 0.558 | +0.125
Patches:
  total | 5.038 | 5.184 | +0.146
  unique: patched files | 2.003 | 2.067 | +0.064
Screenshots:
  total | 0.200 | 0.391 | +0.191
  unique: filename, filesize | 0.197 | 0.385 | +0.187
Stack traces:
  total | 0.100 | 0.185 | +0.085
  unique: exception | 0.033 | 0.047 | +0.014
  unique: exception, top frame | 0.069 | 0.130 | +0.061
  unique: exception, top 2 frames | 0.072 | 0.136 | +0.064
  unique: exception, top 3 frames | 0.073 | 0.139 | +0.066
  unique: exception, top 4 frames | 0.074 | 0.141 | +0.067
  unique: exception, top 5 frames | 0.075 | 0.143 | +0.068
For all information items the increase is significant at p < .001.
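The "unique" rows in these tables count signatures rather than raw occurrences. A sketch of how such a signature could be formed for stack traces, assuming each trace has already been parsed into an exception name and a list of frames (the field names are illustrative, not the thesis code):

def trace_signature(exception, frames, k):
    """A stack-trace signature: the exception type plus the top k frames."""
    return (exception, tuple(frames[:k]))

def unique_trace_counts(master, duplicates, k=1):
    """Distinct stack-trace signatures in the master alone vs. in the extended report."""
    in_master = {trace_signature(t["exception"], t["frames"], k) for t in master["stack_traces"]}
    extended = set(in_master)
    for dup in duplicates:
        extended |= {trace_signature(t["exception"], t["frames"], k) for t in dup["stack_traces"]}
    return len(in_master), len(extended)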
  41. 41. Duplicate bug reports can provide useful additional information. For example, extended reports contain up to three times as many stack traces, which are helpful in fixing bugs.
  42. 42. There is significant evidence of additional information in duplicate bug reports that is uniquely different from the information already reported.
  43. 43. PART 1 Is there extra information in duplicate reports and if so, can we quantify how much? PART 2 Is that extra information helpful for carrying out software engineering tasks?
  44. 44. The Triage Problem: a Developer.
  45. 45. The Triage Problem: a Report (BUG) goes to a Developer.
  46. 46. The Triage Problem: a Report (BUG) goes to a Developer, who produces a Fixed BUG ✓.
  47. 47. The Triage Problem: many Reports (BUG, BUG, BUG, …) arrive for the Developer.
  48. 48. The Triage Problem: a Triager sits between the incoming Reports and the Developer.
  49. 49. The Triage Problem: [same diagram as the previous slide].
  50. 50. [Diagram: every Master and Duplicate report is represented as a feature vector (A1, A2, …, An) with a class label; three ways to decide which developer counts as the correct assignee within a duplicate group:] “Whoever was assigned to the Master should have been assigned to any of the Duplicates.” “Only the person who was originally assigned to a report can fix it.” “Any person assigned to any of the reports in the duplicate group can provide a fix.”
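These three attribution rules imply three different definitions of a correct triage decision. A small sketch of how a predicted assignee could be scored under each rule (the rule names and the group layout are mine):

def is_correct(predicted, report, group, rule):
    """group = {'master_assignee': ..., 'duplicate_assignees': [...]}; report carries its own assignee."""
    if rule == "master":      # whoever was assigned to the master counts for any duplicate
        return predicted == group["master_assignee"]
    if rule == "original":    # only the developer originally assigned to this very report counts
        return predicted == report["assignee"]
    if rule == "group":       # anyone assigned within the duplicate group can provide a fix
        return predicted in {group["master_assignee"], *group["duplicate_assignees"]}
    raise ValueError(f"unknown rule: {rule}")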
  51. 51. [Evaluation setup: master reports, sorted chronologically, are split into 11 folds (Fold 1 … Fold 11); in Run 1 through Run 10 the classifier is trained on the earlier folds and tested on the following fold.]
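A sketch of this setup: master reports sorted by submission date, cut into 11 equal folds, with each run training on all earlier folds and testing on the next one. The growing training window is my reading of the figure, and the "opened" field name is an assumption:

def chronological_runs(master_reports, n_folds=11):
    reports = sorted(master_reports, key=lambda r: r["opened"])
    size = len(reports) // n_folds
    folds = [reports[i * size:(i + 1) * size] for i in range(n_folds)]
    for i in range(1, n_folds):                      # Run 1 .. Run 10
        training = [r for fold in folds[:i] for r in fold]
        yield training, folds[i]                     # (training set, testing fold)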
  52. 52. Triage results for ECLIPSE and MOZILLA.
Table 6.1: Percentages of reports correctly triaged to ECLIPSE developers (Runs 1 through 10, then All).
SVM, Top 1, Master: 15.45 19.28 19.03 19.80 25.80 26.44 22.09 27.08 27.71 29.12 | All 23.18
SVM, Top 1, Extended: 18.39* 20.95 22.22* 21.46 27.84 28.48 23.37 30.52* 30.78* 30.52 | All 25.45*
SVM, Top 3, Master: 32.44 37.42 40.87 39.72 46.10 46.36 38.95 44.70 48.53 47.25 | All 42.23
SVM, Top 3, Extended: 38.70* 42.78* 43.30 39.34 50.83* 49.55* 42.40* 50.32* 50.32 55.04* | All 46.25*
SVM, Top 5, Master: 41.89 46.87 47.38 47.64 54.66 56.96 47.51 52.36 56.58 56.45 | All 50.83
SVM, Top 5, Extended: 47.38* 52.11* 53.00* 51.85* 60.54* 59.90* 51.09* 58.11* 60.28* 65.26* | All 55.95*
Bayes, Top 1, Master: 14.81 16.60 17.75 17.75 22.73 21.20 20.56 23.50 27.71 28.22 | All 21.08
Bayes, Top 1, Extended: 15.45 17.11 20.56* 18.01 19.80* 19.80 22.99 27.08* 26.82 30.40* | All 21.80
Bayes, Top 3, Master: 29.12 32.31 35.12 34.99 40.36 38.06 35.76 43.55 45.59 46.87 | All 38.17
Bayes, Top 3, Extended: 36.53* 33.08 38.83* 35.50 39.08 39.08 39.97* 46.23 45.85 50.45* | All 40.46*
Bayes, Top 5, Master: 38.44 42.40 45.72 45.21 50.70 47.64 44.06 51.85 54.92 55.17 | All 47.61
Bayes, Top 5, Extended: 45.72* 44.70 48.02 43.55 48.91 50.45* 49.43* 55.30* 54.28 58.49* | All 49.88*
* Increase in accuracy is significant at p = .05.
Table 6.2: Percentages of reports correctly triaged to MOZILLA developers (Runs 1 through 10, then All).
SVM, Top 1, Master: 14.57 14.30 14.16 18.29 18.83 19.17 21.00 19.65 19.99 22.15 | All 18.21
SVM, Top 1, Extended: 15.31 14.43 17.95 19.44 19.78 19.51 21.82 23.10 18.29 19.31 | All 18.90
SVM, Top 3, Master: 28.59 28.46 31.84 37.53 36.52 39.30 41.26 44.58 42.82 43.09 | All 37.40
SVM, Top 3, Extended: 32.38 30.15 36.86 39.70 37.26 40.72 43.29 47.83 42.48 39.36 | All 39.00
SVM, Top 5, Master: 37.13 36.04 41.67 46.41 44.99 48.92 50.75 56.03 53.52 51.22 | All 46.67
SVM, Top 5, Extended: 42.48 39.77 46.07 49.80 49.05 54.27 53.32 60.57 54.74 49.66 | All 49.98
Bayes, Top 1, Master: 15.11 12.60 16.94 17.62 17.01 19.44 18.22 25.81 25.47 27.98 | All 19.62
Bayes, Top 1, Extended: 15.24 13.75 18.50 20.39 19.78 23.51 23.31 26.22 24.46 25.88 | All 21.10
Bayes, Top 3, Master: 27.71 29.67 34.42 37.94 35.70 40.18 40.04 44.58 45.33 43.90 | All 37.94
Bayes, Top 3, Extended: 32.11 29.40 36.72 39.50 39.70 44.24 44.24 48.85 45.87 44.17 | All 40.48
Bayes, Top 5, Master: 35.77 37.74 43.09 47.09 44.99 51.90 49.46 54.13 55.15 51.83 | All 47.11
Bayes, Top 5, Extended: 40.72 39.63 45.05 49.66 48.58 54.47 54.74 59.49 55.76 53.52 | All 50.16
Importantly, all but the Top 1 results using Naïve Bayes in the last column were significant, too. Thus, the results demonstrate that bug reports can be better triaged by considering a larger set of existing bug reports, that is, by including duplicate reports.
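A hedged sketch of one such triage experiment: bag-of-words features over the report text, a linear SVM or Naive Bayes model, and Top-k accuracy against the developer who handled the report. scikit-learn and TF-IDF are used here purely as stand-ins; the thesis' actual tooling and feature set may differ. The same chronological split would be run twice, once training on the Master texts only and once on the Extended texts, and the two accuracies compared.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

def top_k_accuracy(model, X, y_true, k):
    # Rank all candidate developers per report and check whether the true one is in the top k.
    scores = model.decision_function(X) if hasattr(model, "decision_function") else model.predict_proba(X)
    top_k = model.classes_[np.argsort(scores, axis=1)[:, -k:]]
    return float(np.mean([y in row for y, row in zip(y_true, top_k)]))

def run_triage(train_texts, train_devs, test_texts, test_devs, use_svm=True, k=5):
    vec = TfidfVectorizer()
    X_train, X_test = vec.fit_transform(train_texts), vec.transform(test_texts)
    model = (LinearSVC() if use_svm else MultinomialNB()).fit(X_train, train_devs)
    return top_k_accuracy(model, X_test, test_devs, k)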
  53. 53. The information contained in Duplicate reports improves the accuracy of Machine Learning algorithms when solving the Bug Triage problem.
Duplicate Bug Reports are widely considered harmful, adding an additional burden on developers and holding up software development processes. Ten years ago we demonstrated that duplicate reports contain valuable additional information that helps developers get their jobs done faster and better. Thus, duplicate reports should not be thrown away, but instead merged with their original reports to make that helpful information available to practitioners. This talk is a 10-year retrospective.
