Think Locally, Act Globally
                       Improving Defect and Effort Prediction Models

                            Nicolas Bettenburg • Meiyappan Nagappan • Ahmed E. Hassan
                                                Queen’s University • Kingston, ON, Canada




                                                                      SOFTWARE ANALYSIS
                                                                       & INTELLIGENCE LAB
Saturday, 2 June, 12
Data Modelling in Empirical SE

    Observations: measured from project data
    Model: describe observations mathematically
    Prediction: guide decision making
    Understanding: guide process optimizations and future research

Model Building Today

    Whole Dataset -> Training Data -> Learned Model M
    Whole Dataset -> Testing Data
    Learned Model M applied to Testing Data -> Predictions Y
    Compare Predictions Y with Testing Data

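The "Model Building Today" workflow can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a single numeric predictor, uses a least-squares line as the learned model M, and compares predictions with observations via mean absolute error. The data and the helper names (`fit_line`, `predict`, `mean_abs_error`) are made up for the example.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x (the learned model M)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def predict(model, xs):
    intercept, slope = model
    return [intercept + slope * x for x in xs]

def mean_abs_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical whole dataset: (metric value, outcome) pairs, invented for illustration.
rng = random.Random(0)
whole_dataset = [(x, 3.0 + 2.0 * x + rng.uniform(-1.0, 1.0)) for x in range(40)]

# Split the whole dataset into training data and testing data.
rng.shuffle(whole_dataset)
cut = int(0.75 * len(whole_dataset))
training, testing = whole_dataset[:cut], whole_dataset[cut:]

# Learn model M on the training data only.
model = fit_line([x for x, _ in training], [y for _, y in training])

# Predict Y for the testing data and compare with the observed outcomes.
test_predictions = predict(model, [x for x, _ in testing])
holdout_error = mean_abs_error([y for _, y in testing], test_predictions)
print(f"holdout mean absolute error: {holdout_error:.3f}")
```

The key design point is that the testing data never influences the fit, so the comparison estimates how the model behaves on unseen observations.
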
Much Research Effort on new metrics and new models!

Maybe we need to look more at the data part

In the Field

    Tom Zimmermann: "We ran 622 cross-project predictions and found that only 3.4% actually worked."

    Tim Menzies: "Rather than focus on generalities, empirical SE should focus more on context-specific principles."

    Taking local properties of data into consideration leads to better models!

Using Locality in Statistical Models

    1  Does this principle work for statistical models?
    2  Does it work for Prediction?
    3  Can we do better?

Building Local Models

    Whole Dataset -> Training Data
    Cluster Data: split the Training Data into clusters
    Learn Multiple Models: Learned Models M1, M2, M3 (one per cluster)
    Predict Individually: each model yields its own Predictions Y for the Testing Data
    Compare Predictions with Testing Data

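The cluster, learn, predict-individually steps can be sketched as follows. This is an illustrative sketch under stated assumptions, not the procedure used in the talk: it assumes one numeric predictor, a tiny hand-written 1-D k-means for the clustering step, a least-squares line per cluster, and routing of each test point to the model of its nearest cluster centre. All data and helper names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def nearest(centres, x):
    """Index of the cluster centre closest to x."""
    return min(range(len(centres)), key=lambda i: abs(x - centres[i]))

def kmeans_1d(xs, k, iterations=20):
    """Tiny 1-D k-means (k >= 2); centres start evenly spread over the data range."""
    lo, hi = min(xs), max(xs)
    centres = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[nearest(centres, x)].append(x)
        # Keep the old centre if a cluster happens to be empty.
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return centres

def mean_abs_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical data with two regimes that a single line cannot describe.
training = ([(float(x), 1.0 + 2.0 * x) for x in range(5)]
            + [(float(x), 50.0 - 3.0 * x) for x in range(10, 15)])
testing = [(0.5, 2.0), (3.5, 8.0), (10.5, 18.5), (13.5, 9.5)]

# Cluster the training data, then learn one model per cluster.
# Assumes every cluster ends up with at least two distinct x values.
centres = kmeans_1d([x for x, _ in training], k=2)
local_models = []
for i in range(len(centres)):
    members = [(x, y) for x, y in training if nearest(centres, x) == i]
    local_models.append(fit_line([x for x, _ in members], [y for _, y in members]))

def predict_local(x):
    """Predict with the model that belongs to the nearest cluster centre."""
    intercept, slope = local_models[nearest(centres, x)]
    return intercept + slope * x

# Global baseline: one model learned on all training data.
g_intercept, g_slope = fit_line([x for x, _ in training], [y for _, y in training])

actual = [y for _, y in testing]
local_error = mean_abs_error(actual, [predict_local(x) for x, _ in testing])
global_error = mean_abs_error(actual, [g_intercept + g_slope * x for x, _ in testing])
print(f"local: {local_error:.3f}  global: {global_error:.3f}")
```

On this constructed data the local models win by design; on real project data the outcome depends on how well the clusters match the structure of the data.
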
Global Statistical Model

    [Figure: f(X) plotted against X, for X from 0 to 6, with a single global model. Caption of the underlying plot: "Figure 2.1: A linear spline function with knots at a = 1, b = 3, c = 5."]

    Model fit leaves much room for improvement!

Local Statistical Model

    [Figure: f(X) plotted against X, for X from 0 to 6, with two local models labelled Model 1 and Model 2. Caption of the underlying plot: "Figure 2.1: A linear spline function with knots at a = 1, b = 3, c = 5."]

    Improved Fit!

How can we use this approach to get an even better fit?

Be Even More Local!

    [Figure: f(X) plotted against X, for X from 0 to 6. Caption of the underlying plot: "Figure 2.1: A linear spline function with knots at a = 1, b = 3, c = 5."]

    Great Fit!
    BUT: Risk of Overfitting the Data!!

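The trade-off between a great fit and overfitting can be made concrete with a small experiment. The sketch below is an assumption-laden illustration, not something taken from the talk: a nearest-neighbour lookup stands in for the "most local" model imaginable, a single least-squares line stands in for the global model, and the noisy data are invented.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x (the global model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def nearest_neighbour_predict(train, x):
    """Extremely local model: reuse the outcome of the closest training point."""
    return min(train, key=lambda point: abs(point[0] - x))[1]

def mean_abs_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical training data: true trend y = 2x plus alternating noise of +1 / -1.
train = [(float(x), 2.0 * x + (1.0 if x % 2 == 0 else -1.0)) for x in range(10)]
# Hypothetical unseen data that follows the true trend without noise.
test = [(x + 0.4, 2.0 * (x + 0.4)) for x in range(9)]

intercept, slope = fit_line([x for x, _ in train], [y for _, y in train])

def global_predict(x):
    return intercept + slope * x

def errors(data):
    actual = [y for _, y in data]
    local = mean_abs_error(actual, [nearest_neighbour_predict(train, x) for x, _ in data])
    glob = mean_abs_error(actual, [global_predict(x) for x, _ in data])
    return local, glob

local_train_error, global_train_error = errors(train)
local_test_error, global_test_error = errors(test)
print(f"training data: local {local_train_error:.2f}, global {global_train_error:.2f}")
print(f"unseen data:   local {local_test_error:.2f}, global {global_test_error:.2f}")
```

The local lookup reproduces the training data exactly, noise included, and that memorised noise is what hurts it on the unseen points.
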
Clustering independent of Fit




CHAPTER 2. GENERAL ASPECTS OF FITTING REGRESSION MODELS
GENERAL ASPECTS OF FITTING REGRESSION MODELS                                                34




                                                                                                         f(X)
     f(X)




                                                                                                                        0          1         2          3         4          5            6
                    0          1         2          3         4          5            6
                                                                                                                                                        X
                                                    X
                                                                                                                Figure 2.1: A linear spline function with knots at a = 1, b = 3, c = 5.
            Figure 2.1: A linear spline function with knots at a = 1, b = 3, c = 5.




                                                                                                                   C(Y |X) = f (X) = X ,
               C(Y |X) = f (X) = X ,
                                          where X                                                               = 0 + 1 X1 + 2 X2 + 3 X3 + 4
X           = 0 + 1 X1 + 2 X2 + 3 X3 + 4 X4 ,
                                          and
                                                                                                            X1 = X X2 = (X                                                        a)+ 14
                        X1 = X X2 = (X                                        a)+
Saturday, 2 June, 12
                                                                                                      X3 = (X b)+ X4 = (X                                                         c)+.
Optimize Local Fit wrt. Minimizing Global Overfit


                                                                                          CHAPTER 2. GENERAL ASPECTS OF FITTING REGRESSION MODELS
GENERAL ASPECTS OF FITTING REGRESSION MODELS                                                34




                                                                                                         f(X)
     f(X)




                                                                                                                        0          1         2          3         4          5            6
                    0          1         2          3         4          5            6
                                                                                                                                                        X
                                                    X
                                                                                                                Figure 2.1: A linear spline function with knots at a = 1, b = 3, c = 5.
            Figure 2.1: A linear spline function with knots at a = 1, b = 3, c = 5.




                                                                                                                   C(Y |X) = f (X) = X ,
               C(Y |X) = f (X) = X ,
                                          where X                                                               = 0 + 1 X1 + 2 X2 + 3 X3 + 4
X           = 0 + 1 X1 + 2 X2 + 3 X3 + 4 X4 ,
                                          and
                                                                                                            X1 = X X2 = (X                                                        a)+ 14
                        X1 = X X2 = (X                                        a)+
Saturday, 2 June, 12
                                                                                                      X3 = (X b)+ X4 = (X                                                         c)+.
Optimize Local Fit wrt. Minimizing Global Overfit
 CHAPTER 2. GENERAL ASPECTS OF FITTING REGRESSION MODELS                                                                                                                                          34




                                                                                              CHAPTER 2. GENERAL ASPECTS OF FITTING REGRESSION MODELS
GENERAL ASPECTS OF FITTING REGRESSION MODELS                                                    34

                                             f(X)




                                                                                                             f(X)
     f(X)




                                                                                                                            0          1         2          3         4          5            6
                    0          1         2          3         4          5            6
                                                                                                                                                            X
                                                    X
                                                                                                                    Figure 2.1: A linear spline function with knots at a = 1, b = 3, c = 5.
            Figure 2.1: A linear spline function with knots at a = 1, b = 3, c = 5.
                                                                  0                   1   2          3              4                 5                6

                                                                                                     X
                                                                                                                        C(Y |X) = f (X) = X ,
             C(Y |X) = f (X) = X linear spline function with knots at a = 1, b = 3, c = 5.
                          Figure 2.1: A
                                        ,
                                                    where X = 0 + 1X1 + 2X2 + 3X3 + 4
X           = 0 + 1 X1 + 2 X2 + 3 X3 + 4 X4 ,
                                                    and
                                                                                                                X1 = X X2 = (X                                                        a)+ 14
                        X1 = X X2 = (X                                        a)+
Saturday, 2 June, 12                                     C(Y |X) = f (X) = X ,                            X3 = (X b)+ X4 = (X                                                         c)+.
Optimize Local Fit wrt. Minimizing Global Overfit
 CHAPTER 2. GENERAL ASPECTS OF FITTING REGRESSION MODELS                                                                                                                                          34




                                                                                              CHAPTER 2. GENERAL ASPECTS OF FITTING REGRESSION MODELS
GENERAL ASPECTS OF FITTING REGRESSION MODELS                                                    34

                                             f(X)




                                                                                                             f(X)
     f(X)




                                                                                                                            0          1         2          3         4          5            6
                    0          1         2          3         4          5            6
                                                                                                                                                            X
                                                    X
                                                                                                                    Figure 2.1: A linear spline function with knots at a = 1, b = 3, c = 5.
            Figure 2.1: A linear spline function with knots at a = 1, b = 3, c = 5.
                                                                  0                   1   2          3              4                 5                6

                                                                                                     X
                                                                                                                        C(Y |X) = f (X) = X ,
             C(Y |X) = f (X) = X linear spline function with knots at a = 1, b = 3, c = 5.
                          Figure 2.1: A
                                        ,
                                                    where X = 0 + 1X1 + 2X2 + 3X3 + 4
X           = 0 + 1 X1 + 2 X2 + 3 X3 + 4 X4 ,
                                                    and
                                                                                                                X1 = X X2 = (X                                                        a)+ 14
                        X1 = X X2 = (X                                        a)+
Saturday, 2 June, 12                                     C(Y |X) = f (X) = X ,                            X3 = (X b)+ X4 = (X                                                         c)+.
Optimize Local Fit wrt. Minimizing Global Overfit
 CHAPTER 2. GENERAL ASPECTS OF FITTING REGRESSION MODELS                                                                                                                                          34




                                                                                              CHAPTER 2. GENERAL ASPECTS OF FITTING REGRESSION MODELS
GENERAL ASPECTS OF FITTING REGRESSION MODELS                                                    34

                                             f(X)




                                                                                                             f(X)
     f(X)




                                                                                                                            0          1         2          3         4          5            6
                    0          1         2          3         4          5            6
                                                                                                                                                            X
                                                    X
                                                                                                                    Figure 2.1: A linear spline function with knots at a = 1, b = 3, c = 5.
            Figure 2.1: A linear spline function with knots at a = 1, b = 3, c = 5.
                                                                  0                   1   2          3              4                 5                6

                                                                                                     X
                                                                                                                        C(Y |X) = f (X) = X ,
             C(Y |X) = f (X) = X linear spline function with knots at a = 1, b = 3, c = 5.
                          Figure 2.1: A
                                        ,
                                                    where X = 0 + 1X1 + 2X2 + 3X3 + 4
X           = Multivariate2 Adaptive4X4,
              0 + 1X1 + 2X + 3X3 + Regression Splines (MARS)
                                                    and
                                                                                                                X1 = X X2 = (X                                                        a)+ 14
                        X1 = X X2 = (X                                        a)+
Saturday, 2 June, 12                                     C(Y |X) = f (X) = X ,                            X3 = (X b)+ X4 = (X                                                         c)+.
Optimize Local Fit wrt. Minimizing Global Overfit

[Figure 2.1: A linear spline function with knots at a = 1, b = 3, c = 5. Plot of f(X) against X for X from 0 to 6. Source page header: Chapter 2, General Aspects of Fitting Regression Models.]

\[ C(Y \mid X) = f(X) = X\beta, \]
where
\[ X\beta = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4, \]
and
\[ X_1 = X, \quad X_2 = (X - a)_+, \quad X_3 = (X - b)_+, \quad X_4 = (X - c)_+. \]

Multivariate Adaptive Regression Splines (MARS)
create local knowledge that optimizes process globally
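The linear spline on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names and the coefficient values are hypothetical, and only the knots a = 1, b = 3, c = 5 come from the figure caption. Here (u)_+ means max(0, u), so each term stays at zero until X passes its knot and then contributes an extra slope.

```python
def hinge(x, knot):
    """Positive part (x - knot)_+: zero left of the knot, linear to its right."""
    return max(0.0, x - knot)


def linear_spline(x, betas, knots=(1.0, 3.0, 5.0)):
    """Evaluate beta0 + beta1*X + sum_j beta_(j+1) * (X - knot_j)_+."""
    intercept, slope, *hinge_weights = betas
    if len(hinge_weights) != len(knots):
        raise ValueError("need exactly one coefficient per knot")
    value = intercept + slope * x
    for weight, knot in zip(hinge_weights, knots):
        value += weight * hinge(x, knot)
    return value


# Hypothetical coefficients, chosen only to show the slope changing at each knot.
betas = [1.0, 2.0, -1.0, 0.5, 3.0]
for x in (0, 2, 4, 6):
    print(x, linear_spline(x, betas))
```

Between two knots the function is an ordinary straight line, which is what lets a single global model carry different local trends in different regions of X.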
Case Study

Dataset                  Predicted outcome                    Metrics
Xalan 2.6, Lucene 2.4    Post-Release Defects per Class       20 CK Metrics
CHINA                    Total Development Effort in Hours    14 FP Metrics
NasaCoc                  Development Length in Months         24 COCOMO-II Metrics

Results: Goodness of Fit

[Figure 3: Number of clusters generated by MCLUST in each run of the 10-fold cross validation. Y axis: Number of Clusters (0 to 8); X axis: Fold01 to Fold10; one series per dataset (CHINA, Lucene 2.4, NasaCoc, Xalan 2.6).]

... term for each additional prediction variable entering the regression model [23].
For practical purposes, we use a publicly available implementation of BIC-based model selection, contained in the R package BMA. The input to the BMA implementation is the dataset itself, as well as a list of all dependent and independent variables that should be considered. In our case study, we always supply a list of all independent variables that were left after VIF analysis. The output of the BMA ...

... is too small to continue or until a maximum number of terms is reached. In our case study, the maximum number of terms is automatically determined by the implementation, and is based on the number of independent variables we give as input. For MARS models, we use all independent variables in a dataset after VIF analysis.
The first phase often builds a model that suffers from overfitting. As a result, the second phase, called the backward phase, prunes the model to increase the model's gen- ...

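The per-variable penalty that the excerpt mentions can be illustrated with a small sketch. It assumes the common Gaussian-regression form of BIC, n·ln(RSS/n) + k·ln(n), which drops an additive constant; the function name and the numbers are hypothetical, and this is not a reproduction of what the BMA package computes internally.

```python
import math


def bic(rss, n_obs, n_params):
    """BIC for a least-squares fit, up to an additive constant.

    The first term rewards a small residual sum of squares (rss); the
    second adds ln(n_obs) for every parameter in the model.
    """
    return n_obs * math.log(rss / n_obs) + n_params * math.log(n_obs)


# Hypothetical fits on 100 observations: a fourth variable lowers the
# residual sum of squares only from 50 to 49.
small_model = bic(rss=50.0, n_obs=100, n_params=3)
large_model = bic(rss=49.0, n_obs=100, n_params=4)

# Lower BIC is better; here the tiny gain in fit does not pay for the penalty.
print(small_model < large_model)
```

Under this assumed form, each additional variable has to reduce the residual error enough to offset a fixed cost of ln(n), which is how the criterion discourages overfitted models.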
Results: Goodness of Fit

Rank-Correlation (0 = worst fit, 1 = optimal fit)

Dataset       Global    Local (Clustered)    MARS
Xalan 2.6     0.33      0.52                 0.69
Lucene 2.4    0.32      0.60                 0.83
CHINA         0.83      0.89                 0.89
NasaCoc       0.93      0.97                 0.99

UP TO 2.5x BETTER FIT WHEN USING DATA LOCALITY!

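As a rough illustration of how a rank correlation between actual and predicted values can be computed, the sketch below implements Spearman's rho in plain Python. The slide does not say which rank correlation was used, so Spearman is an assumption here, and the function names and sample values are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return covariance / (spread_x * spread_y)


def spearman(actual, predicted):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(actual), average_ranks(predicted))


# Hypothetical defect counts and model predictions for five classes.
actual = [0, 1, 3, 2, 5]
predicted = [0.2, 0.9, 2.5, 2.6, 4.0]
print(round(spearman(actual, predicted), 2))  # prints 0.9
```

Because only the ordering matters, a model scores well on this measure when it ranks the most defect-prone or most effort-intensive items highest, even if its absolute predictions are off.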
Results: Prediction Error

Dataset       Global    Local     MARS
Xalan 2.6     0.64      0.52      0.4
Lucene 2.4    1.15      1.15      0.94
CHINA         765       552.85    234.43
NasaCoc       3.26      2.14      1.63

Up to 4x lower prediction error with Local Models!

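The size of the reduction can be checked from the bar labels with a few lines of arithmetic. The sketch below assumes that the three labels in each chart map to Global, Local and MARS in legend order; the dictionary and function names are hypothetical, and the values are the rounded labels from the slide rather than the underlying data.

```python
# Bar labels from the slide, assumed to follow the legend order
# Global, Local, MARS within each chart.
errors = {
    "Xalan 2.6": {"Global": 0.64, "Local": 0.52, "MARS": 0.4},
    "Lucene 2.4": {"Global": 1.15, "Local": 1.15, "MARS": 0.94},
    "CHINA": {"Global": 765.0, "Local": 552.85, "MARS": 234.43},
    "NasaCoc": {"Global": 3.26, "Local": 2.14, "MARS": 1.63},
}


def reduction_factors(row, baseline="Global"):
    """How many times smaller each model's error is than the baseline's."""
    return {
        model: row[baseline] / error
        for model, error in row.items()
        if model != baseline
    }


factors = {dataset: reduction_factors(row) for dataset, row in errors.items()}
for dataset, row in factors.items():
    print(dataset, {model: round(value, 2) for model, value in row.items()})
```

With these rounded labels the largest factor is about 3.26, for CHINA when comparing Global against MARS.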
Model Interpretation?

Model Interpretation

Traditional Global Model: General Trends

[Figure 6 (a): Part of a global Model learned on the Xalan 2.6 dataset. One panel per predictor: 1 avg_cc, 2 ca, 3 cam, 4 cbm, 5 ce, 6 dam, 7 dit, 8 ic, 9 lcom, 10 lcom3, 11 loc, 12 max_cc, 13 mfa, 14 moa, 15 noc, 16 npm. A second, cropped part is labelled "Fold 9, Cluster 1" with panels ic, npm, mfa.]

Figure 6: Global models report general trends, while global models with local c...
... describes the response (in this case bugs) while keeping all other prediction variab...
        0.49




                                                                                                         0.46




                                                                                                                                                                                                                            w




                                                                                                                                                                                                                           0.0
                                                      0.54




                                                                                                                                                                0.60
        .47




Saturday, 2 June, 12
Model Interpretation
Traditional Global Model: General Trends
[Figure 6(a), plots omitted: Part of a global Model learned on the Xalan 2.6 dataset. One panel per metric: 1 avg_cc, 2 ca, 3 cam, 4 cbm, 5 ce, 6 dam, 7 dit, 8 ic, 9 lcom, 10 lcom3, 11 loc, 12 max_cc, 13 mfa, 14 moa, 15 noc, 16 npm. Below them, "Fold 9, Cluster 1" with panels ic, npm, mfa.]
Figure 6: Global models report general trends, while global models with local c…
describes the response (in this case bugs) while keeping all other prediction variab…
Slide annotation: One Curve per metric, run corp on that curve
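The Figure 6 caption describes each curve as showing the response (bugs) for one metric while the other prediction variables are held fixed. A minimal sketch of that idea follows, assuming the other variables are fixed at their medians. The function name `effect_curve`, the toy rows, and `toy_model` are hypothetical and for illustration only; they are not the authors' model or data.

```python
from statistics import median


def effect_curve(predict, rows, metric, grid):
    """Predicted response as `metric` sweeps over `grid`, with every
    other predictor held at its median over `rows`."""
    # Baseline point: the median of each predictor across the dataset.
    baseline = {name: median(r[name] for r in rows) for name in rows[0]}
    curve = []
    for value in grid:
        point = dict(baseline)   # copy so the baseline is not mutated
        point[metric] = value    # vary only the metric of interest
        curve.append((value, predict(point)))
    return curve


# Hypothetical toy data: three classes described by three metrics.
rows = [
    {"loc": 100, "ic": 0, "npm": 5},
    {"loc": 400, "ic": 1, "npm": 12},
    {"loc": 900, "ic": 3, "npm": 30},
]


def toy_model(x):
    """Stand-in for a fitted defect model (made-up coefficients)."""
    return 0.001 * x["loc"] + 0.2 * x["ic"] + 0.01 * x["npm"]


# One curve for the metric `ic`; repeat per metric to get one panel each.
curve = effect_curve(toy_model, rows, "ic", [0, 1, 2, 3])
for value, bugs in curve:
    print(f"ic={value}: predicted bugs={bugs:.2f}")
```

Calling `effect_curve` once per metric yields one curve per metric, matching the panel layout of the figure; any fitted model with a compatible `predict` callable could be substituted for `toy_model`.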
Model Interpretation
[Figure 6, plots omitted. Panels visible on this page: 13 mfa, 14 moa, 15 noc, 16 npm.]
Figure 6: Global models report general trends, while global models with local considerations give insight…
describes the response (in this case bugs) while keeping all other prediction variables at their median value…
[Figure 7, plots omitted. Two rows of panels, "Fold 9, Cluster 1" and "Fold 9, Cluster 6", each showing ic, npm, mfa.]
Figure 7: Example of contradicting trends in local models (Xalan 2.6, Cluster 1 and Cluster 6 in Fold 9).

Left column of the paper page shown on the slide:
… model already partition the data into regions with individual properties. For example, we observe that an increase of ic (measuring the inheritance coupling through parent classes) is predicted to only have a negative effect on bug-proneness …

Right column (each line is cut off at the slide edge):
prediction models lead…
Our findings thus co…
who observed a simil…
WHICH machine-lear…
have practical implic…
using regression mod…
are more insightful th…
general trends across…
demonstrated that such…
particular parts of the…
in the Xalan 2.6 def…
sets of classes are infl…
as inheritance, cohes…
reinforce the recomm…
the use of a “one-size…
model, when trying to…
B. Act Globally
When the goal is carry…
understanding, local m…
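Figure 7 concerns local models whose trends for the same metric contradict each other. The sketch below illustrates how that can happen, using ordinary least-squares slopes on pre-assigned cluster labels as a simple stand-in. It is not the authors' modelling technique, and the cluster labels "A" and "B", the data points, and the trend directions are all hypothetical.

```python
from collections import defaultdict


def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def slopes_by_cluster(points):
    """points: iterable of (cluster_label, metric_value, bugs)."""
    groups = defaultdict(list)
    for label, x, y in points:
        groups[label].append((x, y))
    # zip(*pairs) splits the (x, y) pairs back into xs and ys.
    return {label: ols_slope(*zip(*pairs)) for label, pairs in groups.items()}


# Hypothetical data: (cluster, ic, bugs).
points = [
    ("A", 0, 3), ("A", 1, 2), ("A", 2, 1),  # cluster A: bugs fall as ic rises
    ("B", 0, 1), ("B", 1, 2), ("B", 2, 3),  # cluster B: bugs rise as ic rises
]

local = slopes_by_cluster(points)
global_slope = ols_slope([x for _, x, _ in points], [y for _, _, y in points])

print("local slopes:", local)
print("global slope:", global_slope)
```

In this toy data the two local slopes have opposite signs while the single global slope is flat, which is the kind of contrast between local and global trends that the figure caption refers to.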
Think Locally, Act Globally - Improving Defect and Effort Prediction Models

Think Locally, Act Globally - Improving Defect and Effort Prediction Models

  • 1. Think Locally, Act Globally: Improving Defect and Effort Prediction Models. Nicolas Bettenburg, Meiyappan Nagappan, Ahmed E. Hassan. Queen’s University, Kingston, ON, Canada. Software Analysis & Intelligence Lab. Saturday, 2 June, 12
  • 4. Data Modelling in Empirical SE. Observations: measured from project data. Model: describe observations mathematically. Prediction: guide decision making. Understanding: guide process optimizations and future research.
  • 9. Model Building Today. The Whole Dataset is split into Training Data and Testing Data. A Learned Model M is built from the Training Data and produces Predictions Y for the Testing Data. Compare.
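The workflow on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' setup: the data is synthetic, the model is a one-variable least-squares line, the comparison metric is mean absolute error, and every function name (`split`, `fit_line`, `predict`, `mean_absolute_error`) is hypothetical.

```python
import random


def split(rows, test_fraction=0.3, seed=0):
    """Shuffle the rows and split them into (training, testing) lists."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(rows):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx if sxx else 0.0
    return mean_y - slope * mean_x, slope


def predict(model, x):
    """Apply the learned model M to one observation."""
    intercept, slope = model
    return intercept + slope * x


def mean_absolute_error(model, rows):
    """Compare predictions against the actual values of the given rows."""
    return sum(abs(predict(model, x) - y) for x, y in rows) / len(rows)


# Synthetic (x, y) observations, purely illustrative: y = 2x + 1.
data = [(float(x), 2.0 * x + 1.0) for x in range(20)]
train, test = split(data)                 # Whole Dataset -> Training / Testing Data
model = fit_line(train)                   # Learned Model M
error = mean_absolute_error(model, test)  # Predictions Y, then Compare
```

Because the synthetic data is exactly linear, one global model fits it perfectly; the later slides are about what happens when that is not the case.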
  • 10. Much Research Effort on new metrics and new models!
  • 11. Maybe we need to look more at the data part
  • 17. In the Field. Tom Zimmermann: "We ran 622 cross-project predictions and found that only 3.4% actually worked." Tim Menzies: "Rather than focus on generalities, empirical SE should focus more on context-specific principles." Taking local properties of data into consideration leads to better models!
  • 21. Using Locality in Statistical Models. (1) Does this principle work for statistical models? (2) Does it work for Prediction? (3) Can we do better?
  • 26. Building Local Models. The Whole Dataset is split into Training Data and Testing Data. Cluster Data: the Training Data is clustered. Learn Multiple Models: Learned Models M1, M2, M3. Predict Individually: each model produces Predictions Y for the Testing Data. Compare.
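The three steps on this slide (cluster, learn multiple models, predict individually) can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: a tiny one-dimensional k-means stands in for whatever clustering method is actually used, each local model is a least-squares line, and all names and the synthetic two-regime data are hypothetical.

```python
def fit_line(rows):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx if sxx else 0.0
    return mean_y - slope * mean_x, slope


def nearest(centers, x):
    """Index of the cluster center closest to x."""
    return min(range(len(centers)), key=lambda i: abs(centers[i] - x))


def kmeans_1d(xs, k, iterations=20):
    """Tiny 1-D k-means (an assumed stand-in for the real clustering step)."""
    xs_sorted = sorted(xs)
    # Spread the initial centers evenly over the sorted values.
    centers = [xs_sorted[(2 * i + 1) * len(xs_sorted) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[nearest(centers, x)].append(x)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers


def fit_local_models(train, k):
    """Cluster the training data, then learn one model per cluster."""
    centers = kmeans_1d([x for x, _ in train], k)
    clusters = [[] for _ in range(k)]
    for x, y in train:
        clusters[nearest(centers, x)].append((x, y))
    # An empty cluster falls back to a model fit on all training data.
    models = [fit_line(c) if c else fit_line(train) for c in clusters]
    return centers, models


def predict_local(centers, models, x):
    """Predict with the model of the cluster that x falls into."""
    intercept, slope = models[nearest(centers, x)]
    return intercept + slope * x


def truth(x):
    """Synthetic data with two regimes, purely illustrative."""
    return x if x < 10 else 100.0 - 2.0 * x


xs = [float(x) for x in list(range(10)) + list(range(20, 30))]
train = [(x, truth(x)) for x in xs if x % 2 == 0]
test = [(x, truth(x)) for x in xs if x % 2 == 1]

g_intercept, g_slope = fit_line(train)
global_error = sum(abs(g_intercept + g_slope * x - y) for x, y in test) / len(test)

centers, models = fit_local_models(train, k=2)
local_error = sum(abs(predict_local(centers, models, x) - y) for x, y in test) / len(test)
```

On this toy data the two local models recover each regime exactly, while the single global line cannot; real datasets will rarely separate this cleanly.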
  • 30. Global Statistical Model. [Figure: f(X) plotted against X for X from 0 to 6, drawn over "Figure 2.1: A linear spline function with knots at a = 1, b = 3, c = 5." from "Chapter 2. General Aspects of Fitting Regression Models".] Model fit leaves much room for improvement!
  • 34. Local Statistical Model. [Figure: f(X) plotted against X for X from 0 to 6 ("Figure 2.1: A linear spline function with knots at a = 1, b = 3, c = 5."), with fits labelled Model 1 and Model 2.] Improved Fit!
  • 35. How can we use this approach to get an even better fit?
  • 40. Be Even More Local! [Figure: f(X) plotted against X for X from 0 to 6 ("Figure 2.1: A linear spline function with knots at a = 1, b = 3, c = 5.").] Great Fit! BUT: Risk of Overfitting the Data!!
  • 42. Clustering independent of Fit
  • 43. [Figure 2.1: A linear spline function with knots at a = 1, b = 3, c = 5; f(X) plotted against X for X from 0 to 6.] $C(Y \mid X) = f(X) = X\beta$, where $X\beta = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4$, and $X_1 = X$, $X_2 = (X - a)_+$, $X_3 = (X - b)_+$, $X_4 = (X - c)_+$.
  • 48. Optimize Local Fit wrt. Minimizing Global Overfit. Multivariate Adaptive Regression Splines (MARS): create local knowledge that optimizes process globally. [Figure: three copies of the linear spline plot and equation from the previous slide.]
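The linear spline in the figure can be evaluated directly from its hinge terms. The sketch below assumes the knots a = 1, b = 3, c = 5 from the figure caption; the coefficient values are hypothetical and chosen only to make the slope changes visible. It illustrates the basis functions only: MARS additionally selects knots and terms from the data, which this sketch does not attempt.

```python
def hinge(x, knot):
    """(x - knot)+ : zero to the left of the knot, linear to the right of it."""
    return max(0.0, x - knot)


def spline_basis(x, knots=(1.0, 3.0, 5.0)):
    """Return [X1, X2, X3, X4] = [x, (x - a)+, (x - b)+, (x - c)+]."""
    return [x] + [hinge(x, k) for k in knots]


def linear_spline(x, beta, knots=(1.0, 3.0, 5.0)):
    """Evaluate beta0 + beta1*X1 + beta2*X2 + beta3*X3 + beta4*X4."""
    terms = spline_basis(x, knots)
    return beta[0] + sum(b * t for b, t in zip(beta[1:], terms))


# Hypothetical coefficients: the slope is 1 before a, then changes by
# -2, +3 and -1 as x passes the knots a, b and c.
beta = [0.0, 1.0, -2.0, 3.0, -1.0]
values = [linear_spline(float(x), beta) for x in range(7)]
```

Each coefficient after the first two is the change in slope at its knot, so the function stays continuous while behaving like a different straight line in each interval.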
  • 52. Case Study. Xalan 2.6 and Lucene 2.4: Post-Release Defects per Class, 20 CK Metrics. CHINA: Total Development Effort in Hours, 14 FP Metrics. NasaCoc: Development Length in Months, 24 COCOMO-II Metrics.
  • 58. Results: Goodness of Fit. [Figure 3: Number of clusters generated by MCLUST in each run of the 10-fold cross validation. Y-axis: Number of Clusters (0 to 8); x-axis: Fold01 to Fold10; one series per dataset (CHINA, Lucene 2.4, NasaCoc, Xalan 2.6).] Paper excerpts shown on the slide: "… term for each additional prediction variable entering the regression model [23]. For practical purposes, we use a publicly available implementation of BIC-based model selection, contained in the R package: BMA. The input to the BMA implementation is the dataset itself, as well as a list of all dependent and independent variables that should be considered. In our case study, we always supply a list of all independent variables that were left after VIF analysis. The output of the BMA …" and "… is too small to continue or until a maximum number of terms is reached. In our case study, the maximum number of terms is automatically determined by the implementation, and is based on the amount of independent variables we give as input. For MARS models, we use all independent variables in a dataset after VIF analysis. The first phase often builds a model that suffers from overfitting. As a result, the second phase, called the backward phase, prunes the model, to increase the model’s gen-"
  • 59. Results: Goodness of Fit. Rank-Correlation (0 = worst fit, 1 = optimal fit):

        Dataset       Global   Local (Clustered)   MARS
        Xalan 2.6     0.33     0.52                0.69
        Lucene 2.4    0.32     0.60                0.83
        CHINA         0.83     0.89                0.89
        NasaCOC       0.93     0.97                0.99

      UP TO 2.5x BETTER FIT WHEN USING DATA LOCALITY!
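A rank correlation between actual and fitted values can be computed as below. The slides do not say which rank-correlation coefficient is used, so this sketch assumes Spearman's coefficient (the Pearson correlation of the ranks); the function names and sample values are hypothetical. Spearman's coefficient can also be negative, whereas the slide reports a scale from 0 to 1.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def rank_correlation(actual, fitted):
    """Spearman's rank correlation between actual and fitted values."""
    return pearson(average_ranks(actual), average_ranks(fitted))


# Illustrative values only: the fitted values are far off in magnitude
# but preserve the ordering of the actual values.
actual = [0.0, 1.0, 2.0, 5.0, 9.0]
fitted = [0.2, 0.9, 2.5, 4.0, 30.0]
same_order = rank_correlation(actual, fitted)
reverse_order = rank_correlation(actual, fitted[::-1])
```

A rank-based measure rewards a model for ordering the observations correctly even when the predicted magnitudes are off, which is why the last fitted value (30.0 against an actual 9.0) does not lower the score here.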
  • 60. Results: Prediction Error. Bar charts per dataset (legend: Global, Local, MARS). Bar labels as extracted: Xalan 2.6: 0.64, 0.52, 0.4; Lucene 2.4: 1.15, 1.15, 0.94; CHINA: 765, 552.85, 234.43; NasaCoC: 3.26, 2.14, 1.63.
  • 61. Results: Prediction Error. Bar charts per dataset (legend: Global, Local, MARS). Bar labels as extracted: Xalan 2.6: 0.64, 0.52, 0.4; Lucene 2.4: 1.15, 1.15, 0.94; CHINA: 765, 552.85, 234.43; NasaCoC: 3.26, 2.14, 1.63. Up to 4x lower prediction error with Local Models!
  • 62. Model Interpretation
  • 63. Model Interpretation. [Figure 6(a): part of a global model learned on the Xalan 2.6 dataset, one partial plot per metric: avg_cc, ca, cam, cbm, ce, dam, dit, ic, lcom, lcom3, loc, max_cc, mfa, moa, noc, npm. Caption: Global models report general trends, while global models with local considerations give insight [...]; [...] describes the response (in this case bugs) while keeping all other prediction variables at their median value.]
  • 64. Model Interpretation. [Figure 6(a): partial plots of a global model learned on the Xalan 2.6 dataset.] Traditional Global Model: General Trends.
  • 65. Model Interpretation. [Figure 6(a): partial plots of a global model learned on the Xalan 2.6 dataset.] Traditional Global Model: General Trends. One Curve per metric, run corp on that curve.
  • 66. Model Interpretation. [Figure 6 caption: Global models report general trends, while global models with local considerations give insight [...]; [...] describes the response (in this case bugs) while keeping all other prediction variables at their median value.] [Figure 7: Example of contradicting trends in local models (Xalan 2.6, Cluster 1 and Cluster 6 in Fold 9); panels for ic, npm and mfa in Fold 9, Cluster 1 and in Fold 9, Cluster 6.] Paper excerpt, left column: "[...] model already partition the data into regions with individual properties. For example, we observe that an increase of ic (measuring the inheritance coupling through parent classes) is predicted to only have a negative effect on bug-proneness [...]" Right column, cut off at the slide edge: "[...] prediction models lead [...] Our findings thus co[...] who observed a simil[...] WHICH machine-lear[...] have practical implic[...] using regression mod[...] are more insightful th[...] general trends across [...] demonstrated that such [...] particular parts of the [...] in the Xalan 2.6 def[...] sets of classes are infl[...] as inheritance, cohes[...] reinforce the recomm[...] the use of a "one-size[...] model, when trying to [...] B. Act Globally. When the goal is carry[...] understanding, local m[...]"
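The Figure 6 caption says each curve describes the response (bugs) for one metric while all other prediction variables are held at their median value. A minimal sketch of how such a curve can be computed for any fitted model follows. The `partial_curve` helper, the stand-in linear model, and its coefficients are hypothetical; only the metric names loc and ic come from the slides.

```python
import numpy as np


def partial_curve(predict, X, column, grid_size=20):
    """Vary one column over its observed range; hold all others at their median."""
    X = np.asarray(X, dtype=float)
    medians = np.median(X, axis=0)
    grid = np.linspace(X[:, column].min(), X[:, column].max(), grid_size)
    probe = np.tile(medians, (grid_size, 1))  # one row of medians per grid point
    probe[:, column] = grid
    return grid, predict(probe)


# Synthetic metrics (assumption): column 0 plays the role of loc, column 1 of ic.
rng = np.random.default_rng(1)
X = np.column_stack([
    rng.uniform(0, 4000, size=100),
    rng.uniform(0, 5, size=100),
])


def toy_model(data):
    """Stand-in for a fitted model; the coefficients are invented for illustration."""
    return 0.5 + 0.001 * data[:, 0] - 0.2 * data[:, 1]


loc_grid, loc_response = partial_curve(toy_model, X, column=0)
ic_grid, ic_response = partial_curve(toy_model, X, column=1)
print("predicted bugs rise with loc:", bool(loc_response[-1] > loc_response[0]))
print("predicted bugs fall with ic:", bool(ic_response[-1] < ic_response[0]))
```

Running the same helper on a model fitted to a single cluster rather than the whole dataset would give per-cluster curves of the kind Figure 7 compares.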