2.
Stephen W. Thomas
Mining Software Repositories with Topic Models.
ICSE 2011
Stephen W. Thomas, Hadi Hemmati, Ahmed E. Hassan, and Dorothea Blostein
Static Test Case Prioritization Using Topic Models.
Empirical Software Engineering, 2012
Stephen W. Thomas, Nicolas Bettenburg, Ahmed E. Hassan, and Dorothea Blostein
Talk and Work: Recovering the Relationship between Mailing List Discussions and Development Activity.
Empirical Software Engineering, 2nd round
Stephen W. Thomas, Meiyappan Nagappan, Ahmed E. Hassan, and Dorothea Blostein
The Impact of Classifier Configuration and Classifier Combination on Bug Localization.
IEEE Transactions on Software Engineering, 2nd round
Stephen W. Thomas, Bram Adams, Ahmed E. Hassan, and Dorothea Blostein
Validating the Use of Topic Models for Software Evolution.
SCAM 2010
Stephen W. Thomas, Bram Adams, Ahmed E. Hassan, and Dorothea Blostein
Modeling the Evolution of Topics in Source Code Histories.
MSR 2011
Stephen W. Thomas, Bram Adams, Ahmed E. Hassan, and Dorothea Blostein
Studying Software Evolution Using Topic Models.
Science of Computer Programming, 2012
9.
The research and practice of using IR models to
mine software repositories can be improved by
(i) considering additional software engineering
tasks, such as prioritizing test cases;
(ii) using advanced IR techniques, such as
combining multiple IR models; and
(iii) better understanding the assumptions and
parameters of IR models.
10. Test Case Prioritization
Less similar → Higher priority
Similarity
identifiers
comments
string literals
Part 1
[EMSE 2012]
structural-based vs. IR-based
11. Source code ↔ Email Interaction
cleaning and
preprocessing
identifiers
comments
string literals
mail
code
XML
printing
installation
GUI
Code
Mail
Time
Activity
XML
Monitoring project status
Software explanation
Training and documentation
Part 1
[EMSE 20XX]
13. Combining Multiple IR Models
identifiers
comments
string literals
Bug report
Bug report
Similarity
title
description
Best individual
IR model
Random subset,
combined
Part 2
[TSE 20XX]
combined sets had improved performance (median improvement)
14. XML concept
Swing concept
Encryption concept
Time
Popularity
Concept Evolution Models
identifiers
comments
string literals
Part 2
[SCP 2012]
[SCAM 2010]
accuracy of topic evolutions
17. Preprocessing and Parameter Effects
Code representation
identifiers? comments?
past bug reports?
Bug report representation
title? description?
Preprocessing
split identifiers? remove stop words?
word stemming?
IR Model parameters
term weighting?
No. of topics? similarity measure?
No. of iterations?
Configuration matters!
worst: 1%
best: 55%
mean: 23%
Part 3
[TSE 20XX]
“configuration”
18. New!
Part 1
Part 2
Part 3
Proposed and evaluated a technique to prioritize test cases
Proposed and evaluated a technique to analyze the interaction of source code and mailing lists
Described and evaluated a technique to analyze code histories using topic evolution models
Proposed and evaluated a framework for combining the results of disparate IR models
Overcame the data duplication problem in large source code histories
Analyzed the sensitivity of IR models to data preprocessing and IR model parameters
Editor's Notes
This diagram describes the field of Mining Software Repositories. The overall goal is to take software repositories (which are readily-available datasets about a software project, such as [list a few]), apply data mining and machine learning techniques, and come out with some actionable knowledge that will help developers in some way. For example: bug prediction, traceability linking, feature location, …
In current research, the majority of the repositories that are mined are structured: call graphs, parse trees, execution logs;
However, there are also many repositories that are unstructured: [name them]
In fact, research has shown that about 80% of the content in software repositories is unstructured, meaning that we need to consider this data if we want to take full advantage of the software repositories.
However, unstructured data brings with it many challenges. Consider these two seemingly-innocent bug reports from one of my case studies.
Here we see many difficulties, such as undefined acronyms; spelling errors and typos; inconsistent usages; no labels, vague wording.
These problems exist because most unstructured data comes in the form of natural language text written by humans, which is notoriously difficult for a computer to deal with.
In an attempt to deal with unstructured software repositories, researchers have begun to use IR. IR models come from the NLP community, and are a good fit for our problem because they were designed to handle many of the problems of unstructured data. IR models help you search, organize, and provide structure for your unstructured data.
IR models use a simplifying assumption of the data, called the “bag of words” approach. This means that word order is not considered in IR models. By ignoring word order, analysis is simpler and faster, and the techniques can scale to large datasets. And we demonstrate that despite this simplifying assumption, IR models actually perform quite well in many scenarios.
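The bag-of-words idea can be sketched in a few lines of Python (a toy illustration with naive whitespace tokenization, not the preprocessing used in the actual studies):

```python
from collections import Counter

def bag_of_words(document: str) -> Counter:
    """Represent a document as unordered word counts: word order is discarded."""
    return Counter(document.lower().split())

# Two sentences with different word order map to the exact same bag,
# which is what makes the analysis simple, fast, and scalable.
a = bag_of_words("the parser reads the file")
b = bag_of_words("the file reads the parser")
assert a == b  # identical bags despite different word order
```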
Initial successes: concept location; document clustering; new code metrics; code search engines; traceability linking
To understand how IR models have been used in MSR, I did a thorough literature review of all papers that use IR models to mine unstructured data. In all, there are about 67 papers. I analyzed the trends and common usages, and found three shortcomings of the state-of-the-art, i.e., some areas where we could improve. My thesis is the proposal of solutions to each of these three shortcomings.
First shortcoming: most papers that use IR models only perform one of two software engineering tasks: concept location, and traceability linking. There’s nothing wrong with these applications, but I propose that we can go beyond these two tasks and use IR models to perform new SE tasks, and help software developers even further.
Second shortcoming: most papers use only the most basic IR models, such as the Vector Space Model (1975, 37 years ago). I propose that we use some of the more advanced IR techniques, which may bring better results and new capabilities to software developers.
Third shortcoming: most papers use IR models as off-the-shelf black boxes, without fully understanding how their parameters work, what input is required, and what the output means. I propose that we develop a better understanding of how IR models work, which will allow us to take full advantage of their potential, and improve results for software developers.
My thesis statement has a parallel structure: [read]
In TCP, the goal is to take an unordered set of test cases, and provide an ordering such that more bugs are detected earlier in the testing process. By doing so, if the test suite must be stopped early, then you can rest assured that you have detected as many bugs as possible.
Typically, TCP is tackled by using some sort of structural code coverage metric, that says: hey, how much code does this test case execute? If it executes a lot of code, then let’s give it a high priority. Otherwise, let’s give it a low priority. This is how it’s traditionally done.
However, I propose that we can use IR models to solve the same problem, only with the additional advantage of not having to run the test case to collect the execution information. Here’s how.
First, we extract the unstructured information from the source code: identifier names, comments, and string literals.
Then, we compute the IR similarity between each pair of test cases. This will tell us if the test cases are textually similar or not.
Then, if a test case is not very similar to other test cases, we give it a higher priority. The thought here is: if two test cases are exactly the same, then they will find the same bugs, so we don’t need to execute both. So we’re looking for test cases that are highly unlike any other test case, because it will detect unique bugs.
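The three steps above can be sketched in a few lines of Python. This is a minimal illustration with hypothetical test-case texts and plain term-frequency vectors; the actual approach extracts identifiers, comments, and string literals from each test case:

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def prioritize(test_texts: dict) -> list:
    """Order test cases so the least similar (most unique) come first."""
    vecs = {name: Counter(text.lower().split()) for name, text in test_texts.items()}
    def max_sim(name):
        # How close is this test case to its nearest textual neighbor?
        return max((cosine(vecs[name], vecs[o]) for o in vecs if o != name), default=0.0)
    # Lowest maximum similarity = most unique = highest priority.
    return sorted(vecs, key=max_sim)

tests = {
    "t1": "open file read buffer close file",
    "t2": "open file read buffer close handle",
    "t3": "encrypt key cipher block",
}
print(prioritize(tests))  # t3 shares no words with the others, so it comes first
```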
We did a case study on five real-world systems, and found that our IR-based approach was as good as or better than existing approaches for prioritizing test cases.
The first advanced technique I propose is that of combining multiple IR models.
Let me explain this in the context of bug localization. […]
A simple way to combine models is to just add the scores of each file from the various IR models. That way, if a file gets a high score in several models, it will shoot up to the top in the combined model. Another way is expert voting, where only the rank of each file is used, as opposed to the score. Either way, the end goal is to utilize the “expertise” of each model.
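Both combination strategies can be sketched quickly (the file names and scores here are hypothetical; each dict stands for one IR model's relevance scores for a bug report):

```python
def combine_by_score(model_scores):
    """Sum each file's scores across models; a file ranked high by several models rises."""
    combined = {}
    for scores in model_scores:
        for f, s in scores.items():
            combined[f] = combined.get(f, 0.0) + s
    return sorted(combined, key=combined.get, reverse=True)

def combine_by_rank(model_scores):
    """Expert voting: use each model's ranks only, ignoring its raw scores."""
    points = {}
    for scores in model_scores:
        ranking = sorted(scores, key=scores.get, reverse=True)
        for rank, f in enumerate(ranking):
            # Borda-style: first place gets the most points.
            points[f] = points.get(f, 0) + (len(ranking) - rank)
    return sorted(points, key=points.get, reverse=True)

vsm = {"Parser.java": 0.9, "Lexer.java": 0.4, "Cache.java": 0.1}
lda = {"Parser.java": 0.6, "Lexer.java": 0.7, "Cache.java": 0.2}
print(combine_by_score([vsm, lda]))  # Parser.java first: scored high by both models
```

Score addition keeps the models' confidence information; rank voting is more robust when the models' score scales are not comparable.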
If a manager or developer had a dashboard that magically told them what developers were working on, and when, at a high level, they would be very happy. This would keep them informed, allow them to perform retrospective analysis, and maybe even be part of a preemptive maintenance solution that automatically monitored the “health” of the source code over time.
To achieve this goal, we use an advanced IR model called a topic evolution model. It works by [explain]
We input these versions into an advanced IR model, called a topic evolution model, which gives us exactly what we’re looking for.
A case study found that a large majority of the discovered evolutions were in-sync with how developers described the project, and since this technique is automatic, it will be helpful to use in an automatic dashboard setting.
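The core output of a topic evolution model, a popularity curve per topic over time, can be illustrated with a heavily simplified sketch. Here the topics are given as fixed word sets, whereas a real topic evolution model discovers them automatically from the version history:

```python
def topic_popularity(versions, topics):
    """For each version, compute the share of words belonging to each topic."""
    history = {name: [] for name in topics}
    for words in versions:  # one bag of words per release
        total = len(words) or 1
        for name, topic_words in topics.items():
            hits = sum(1 for w in words if w in topic_words)
            history[name].append(hits / total)
    return history

# Hypothetical word sets and releases, only for illustration.
topics = {"xml": {"xml", "parse", "tag"}, "gui": {"button", "window", "click"}}
versions = [
    ["xml", "parse", "button"],            # v1: XML work dominates
    ["button", "window", "click", "xml"],  # v2: GUI work picks up
]
print(topic_popularity(versions, topics))  # xml declines, gui rises
```

A rising curve for a topic suggests developers are actively working on that concept in that period, which is exactly the signal a dashboard would plot.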
During my research, I came across an issue which I now call the “data duplication problem”.
When I tried to analyze the evolution of long-lived systems with many different versions, I found that the IR model was producing unusual and unexpected results. Things just didn’t make sense: the topics were weird, and something was off, but I didn’t know what.
Upon further analysis, I learned that the cause of this problem was that in source code, hardly any of the words change between versions. A new version typically contains some bug fixes and some new features, but these only affect at most 1% of the lines of code, meaning that 99% of the data is exactly the same. It’s identical. This was throwing the IR models out of balance, and causing the problems that we experienced.
The reason is, IR models weren’t originally designed for source code. They were designed for newspaper articles or books. So version 1 here might contain all the newspaper articles in January, and version 2 contains all the newspaper articles in February. Sure, there might be some overlap, but in general we do not expect that 99% of the articles in February are exact duplicates from January. I believe that someone would be fired from the newspaper if this happened.
So I proposed a model that better handled this data duplication inherent to source code. Basically what it does, is it only inputs the differences between versions into the IR model. This keeps everything in balance because it meets the implicit assumptions made by the IR model. Our case studies showed that results are better when the duplication is removed.
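The essence of the fix, feeding the model only what changed rather than full snapshots, can be sketched as follows (a toy version that diffs whole files; the actual model works at a finer granularity):

```python
def changed_documents(prev_version, next_version):
    """Return only the files that are new or modified between two versions,
    so the IR model never sees the ~99% of content that is duplicated."""
    changed = {}
    for path, text in next_version.items():
        if prev_version.get(path) != text:  # new or modified file
            changed[path] = text
    return changed

# Hypothetical two-file system across two releases.
v1 = {"a.c": "open file read", "b.c": "draw window"}
v2 = {"a.c": "open file read", "b.c": "draw window button"}  # only b.c changed
print(changed_documents(v1, v2))  # {'b.c': 'draw window button'}
```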
Another way to better understand IR models is to understand their parameters and configurations. IR models have a lot of dials, knobs, and switches that you can tweak. For example, …
Currently, researchers don’t focus on these parameters, and just seem to randomly choose settings without fully understanding the associated consequences.
To better understand the parameters, we ran a large, empirical case study. We had 8,000 bug reports, and we ran each of them through 3,168 IR model configurations. What we found was that there is a HUGE difference in performance between the various configurations. For example, the worst IR model could only achieve 1% accuracy; the best could get as high as 55%. And the mean was 23%. So the range was quite big, as was the variance.
In addition, in this study we were able to determine which configurations were best, so that researchers, tool vendors, and developers could use these when building their own IR-based solutions.
Let me conclude by summarizing the main contributions of this thesis.
First, I proposed new applications of IR models in SE: TCP, and measuring the interaction of email and source code.
I also proposed that we start using more advanced IR techniques in our work, such as topic evolution models and model combination.
Finally, I proposed that if we increase our understanding of IR models, we can further improve results. The two studies have shown that by looking into the details of IR models, instead of treating them as black boxes, we can improve our techniques and get better results.
My broader research vision is to provide better tools, techniques, and insights for software development teams, so that they can build better software at lower costs and have happier customers. In this thesis, I have taken a step towards that vision by proposing and evaluating ways to better utilize the unstructured elements of software repositories, which in turn provide new and better capabilities for software developers.