A Semi-naive Bayes
Classifier with Grouping
of Cases
J. Abellán, A. Cano, A. R. Masegosa, S. Moral
Department of Computer Science and A.I.
University of Granada
Spain

Outline
1. Introduction.
2. Semi-Naive Bayes Classifier with Grouping of Cases.
   • General Description
   • The Joining Criteria
   • The Grouping Criteria
3. Experimental Evaluation.
4. Conclusions and Future Work.

Introduction
Information from a database
Attribute variables: Calcium, Tumor, Coma, Migraine. Class variable: Cancer.

Calcium   Tumor    Coma      Migraine   Cancer
normal    a1       absent    absent     absent
high      a1       present   absent     present
normal    a1       absent    absent     absent
normal    a1       absent    absent     absent
high      ao       present   present    absent
......    ......   ......    ......     ......

Introduction
Naive Bayes (Duda & Hart, 1973)
• Attribute variables $\{X_i \mid i = 1, \dots, r\}$.
• Class variable $C = \{c_1, \dots, c_k\}$.
• New observation $z = (z_1, \dots, z_r)$, i.e. $(X_1 = z_1, \dots, X_r = z_r)$.
• Select the state of $C$: $\arg\max_{c_i} P(c_i \mid z)$.
• Assuming the attributes are independent given the class variable: $\arg\max_{c_i} P(c_i) \prod_{j=1}^{r} P(z_j \mid c_i)$.
Graphical Structure: $C$ is the only parent of each of $X_1, X_2, \dots, X_r$.
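The decision rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the Laplace correction is an assumption (the slide does not say how the probabilities are estimated), and the toy data are the five rows shown on the database slide.

```python
from collections import Counter, defaultdict
import math

def train_naive_bayes(rows, labels):
    """Collect the counts needed to estimate P(c) and P(x_j | c)."""
    class_counts = Counter(labels)
    # value_counts[j][c][v] = number of cases with X_j = v and C = c
    value_counts = defaultdict(lambda: defaultdict(Counter))
    domains = defaultdict(set)
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[j][c][v] += 1
            domains[j].add(v)
    return class_counts, value_counts, domains

def predict(model, z):
    """Return arg max over c of P(c) * prod_j P(z_j | c), computed in log space."""
    class_counts, value_counts, domains = model
    n = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / n)
        for j, v in enumerate(z):
            # Laplace correction (an assumption, not taken from the slides)
            score += math.log((value_counts[j][c][v] + 1) / (n_c + len(domains[j])))
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Attribute order: Calcium, Tumor, Coma, Migraine; class: Cancer.
rows = [
    ("normal", "a1", "absent", "absent"),
    ("high", "a1", "present", "absent"),
    ("normal", "a1", "absent", "absent"),
    ("normal", "a1", "absent", "absent"),
    ("high", "ao", "present", "present"),
]
labels = ["absent", "present", "absent", "absent", "absent"]
model = train_naive_bayes(rows, labels)
prediction = predict(model, ("normal", "a1", "absent", "absent"))
```

Working in log space avoids numerical underflow when the number of attributes r is large.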

Introduction
Naive Bayes Classifiers
• Naive Bayesian Classifiers:
NB’s performance is comparable with some state-of-the-art classifiers, even though its independence assumption usually does not hold in practice.
• Question:
“Can the performance be better when the conditional independence assumption of NB is relaxed?”

Introduction
Semi-Naive Bayes Classifiers
• Semi-Naive Bayesian Classifiers (SNB)
• A looser assumption than NB.
• Independence given the class variable C is assumed only between the joined variables (groups of attributes), not between the attributes inside a group.

Introduction
Semi-Naive Bayes Classifiers
• Main problems of the Semi-NB approach:
• When to join two variables? Joining Criterion
• Kononenko’s criterion is entropy based (class entropy reduction).
• Pazzani’s criterion is accuracy based.
• Wrapper estimation.
• Very high complexity with a high number of variables.

A SNB with Grouping of Cases
Joining Method
• Three new proposals for joining criteria:
  • BDe: Bayesian Dirichlet equivalent.
  • L1O: the expected log-likelihood under leave-one-out.
  • LRT: log-likelihood ratio test.

A SNB with Grouping of Cases
Grouping Method
• Joining variables increases the number of parameters to estimate:
  Independent: $P(X_i \mid C)\,P(X_j \mid C)$, number of parameters: $\#(C)\,(\#(X_i) + \#(X_j))$.
  Dependent: $P(X_i, X_j \mid C)$, number of parameters: $\#(C)\,\#(X_i)\,\#(X_j)$.
• Solution: “Grouping cases of the new variable” that carry similar information.
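A small worked example of the two parameter counts given on this slide may help. The function names and the sizes used (2 classes, 5 and 6 cases) are hypothetical and chosen only for illustration.

```python
def independent_parameters(n_classes, n_xi, n_xj):
    """#(C) * (#(Xi) + #(Xj)): parameters of P(Xi | C) P(Xj | C)."""
    return n_classes * (n_xi + n_xj)

def joined_parameters(n_classes, n_xi, n_xj):
    """#(C) * #(Xi) * #(Xj): parameters of P(Xi, Xj | C)."""
    return n_classes * n_xi * n_xj

# Hypothetical sizes: 2 classes, Xi with 5 cases, Xj with 6 cases.
print(independent_parameters(2, 5, 6))  # 22
print(joined_parameters(2, 5, 6))       # 60
```

The joined variable needs almost three times as many parameters here, which is the increase that grouping cases is meant to offset.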
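
A SNB with Grouping of Cases
Example
[Figure: the naive Bayes structure, with C as parent of X1, X2, ..., Xr.]
Joining Phase: each pair of variables is evaluated using a joining criterion (JC). [Figure: C as parent of X5 x X9, X1, ..., Xr.]
Grouping Phase: each pair of cases is evaluated using a grouping criterion (GC), and cases with similar information are grouped. [Figure: C as parent of X5 x X9, X1, ..., Xr.]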
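The two phases can be pictured as simple transformations of the data table. The sketch below shows only the data manipulation, under the assumption that the criteria have already selected what to join and what to group. `join_variables` and `group_cases` are hypothetical helper names, not part of Elvira or Weka.

```python
def join_variables(rows, i, j):
    """Replace columns i and j by one compound column holding the pair (x_i, x_j).

    The compound column is placed first and the remaining columns keep their order.
    """
    joined = []
    for row in rows:
        rest = [v for k, v in enumerate(row) if k not in (i, j)]
        joined.append(((row[i], row[j]), *rest))
    return joined

def group_cases(rows, col, case_a, case_b):
    """Merge two cases of column `col` into a single case."""
    merged = frozenset({case_a, case_b})  # label of the new, grouped case
    return [
        tuple(merged if (k == col and v in (case_a, case_b)) else v
              for k, v in enumerate(row))
        for row in rows
    ]

rows = [
    ("high", "a1", "present"),
    ("normal", "a1", "absent"),
    ("high", "ao", "present"),
]
joined = join_variables(rows, 0, 2)  # join the first and third attributes
grouped = group_cases(joined, 0, ("high", "present"), ("normal", "absent"))
```

After grouping, the compound variable has fewer cases, so fewer conditional probabilities have to be estimated.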

Joining Criteria
BDe criterion
• Bayesian Dirichlet equivalent metric (BDe)
“Bayesian scores measure the quality of a model, M, as the posterior probability of the model given the learning data D”
JC(BDe) = Score(M1 : D) − Score(M2 : D)
[Figure: the two candidate models, M1 and M2: C as parent of X and Y separately, and C as parent of the joined variable X x Y.]
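A score difference of this kind can be sketched as follows. The slides do not give the scoring formula, so this example assumes the BDeu variant (uniform Dirichlet priors with an equivalent sample size `ess`), and all function names are hypothetical. It returns the score of the joined model minus the score of the model with separate variables. The class term is the same in both models and cancels.

```python
from collections import Counter
from math import lgamma

def bdeu_family_score(child, parent, n_child_cases, ess=1.0):
    """Log marginal likelihood of `child` given `parent` under a BDeu prior."""
    q = len(set(parent))                  # number of parent (class) cases
    a_parent = ess / q                    # prior mass per parent case
    a_cell = ess / (q * n_child_cases)    # prior mass per (parent, child) cell
    score = 0.0
    for n in Counter(parent).values():
        score += lgamma(a_parent) - lgamma(a_parent + n)
    # Cells with zero count contribute lgamma(a) - lgamma(a) = 0 and are skipped.
    for n in Counter(zip(parent, child)).values():
        score += lgamma(a_cell + n) - lgamma(a_cell)
    return score

def bde_joining_score(xs, ys, cs, ess=1.0):
    """Score(joined model) - Score(separate model); positive values favour joining."""
    rx, ry = len(set(xs)), len(set(ys))
    joined = bdeu_family_score(list(zip(xs, ys)), cs, rx * ry, ess)
    separate = (bdeu_family_score(xs, cs, rx, ess)
                + bdeu_family_score(ys, cs, ry, ess))
    return joined - separate
```

With two attributes that always take the same value within a class, the difference is positive. With attributes that are balanced and independent within a class, the extra parameters of the joined model are penalised and the difference is negative.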

Joining Criteria
L1O criterion
• Expected Log-Likelihood Under Leave-One-Out (L1O).
“The estimation of the log-likelihood of the class is carried out with a leave-one-out scheme computed with a closed equation”
[Equation labels: leave-one-out estimation; Laplace estimation.]
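The idea of a leave-one-out estimate that needs no retraining can be illustrated as follows. This is a sketch under assumptions, not the slide's closed equation, which is not recoverable from the text: it uses a single (possibly joined) attribute, Laplace-corrected frequencies, and a hypothetical function name. Leaving a case out reduces to subtracting one from the counts that case contributed to.

```python
from collections import Counter
from math import log

def loo_class_log_likelihood(xs, cs):
    """Average log P(c | x) where each case is scored with its own counts removed."""
    n = len(xs)
    classes = sorted(set(cs))
    n_values = len(set(xs))
    n_c = Counter(cs)
    n_xc = Counter(zip(xs, cs))
    total = 0.0
    for x, c in zip(xs, cs):
        joint = {}
        for k in classes:
            held_out = 1 if k == c else 0      # the left-out case only affects its own class
            nk = n_c[k] - held_out
            nxk = n_xc[(x, k)] - held_out
            p_class = (nk + 1) / (n - 1 + len(classes))   # Laplace-corrected P(k)
            p_value = (nxk + 1) / (nk + n_values)         # Laplace-corrected P(x | k)
            joint[k] = p_class * p_value
        total += log(joint[c] / sum(joint.values()))
    return total / n
```

A criterion built on this quantity would compare its value for the joined variable against the value obtained with the two variables kept separate. How the slides combine the two is not shown here.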

Joining Criteria
LRT criterion
• Log-likelihood Ratio Test (LRT):
“Comparison of two nested models: M1, in which the variables are merged, and M2, in which the variables are independent”
• Correction factor: the total number of comparisons over the n active variables.
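For two nested models of this kind, the likelihood-ratio statistic with maximum-likelihood estimates takes the familiar form of a conditional G statistic. The sketch below computes that statistic and its degrees of freedom. It is an illustration with a hypothetical function name, and it does not reproduce the correction factor mentioned on the slide, whose exact form is not given in the text.

```python
from collections import Counter
from math import log

def lrt_statistic(xs, ys, cs):
    """Return (G, dof) for 'X and Y joined given C' vs 'X and Y independent given C'."""
    n_xyc = Counter(zip(xs, ys, cs))
    n_xc = Counter(zip(xs, cs))
    n_yc = Counter(zip(ys, cs))
    n_c = Counter(cs)
    g = 0.0
    for (x, y, c), n in n_xyc.items():
        # log of the ML estimate ratio P(x, y | c) / (P(x | c) P(y | c))
        g += n * log(n * n_c[c] / (n_xc[(x, c)] * n_yc[(y, c)]))
    # Extra free parameters of the joined model: (#X - 1)(#Y - 1) per class.
    dof = len(n_c) * (len(set(xs)) - 1) * (len(set(ys)) - 1)
    return 2.0 * g, dof
```

Under the independent model, G is asymptotically chi-square distributed with `dof` degrees of freedom, so a large G relative to that distribution favours joining the two variables.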

Grouping Method
Hypotheses
• Hypotheses: a model selection problem.
  • The sample data D is restricted to the cases where X = xi or X = xj.
  • xi and xj are considered the only possible cases of X.
  • Grouping xi and xj implies that X has only one case.
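One plausible reading of these hypotheses is that two cases can be grouped when, on the restricted sample, the class distribution does not differ between them. The sketch below tests this with a G statistic of independence between the restricted X and C. This interpretation and the function name are assumptions and should not be taken as the authors' grouping criterion.

```python
from collections import Counter
from math import log

def grouping_statistic(xs, cs, case_a, case_b):
    """G statistic of independence between X and C, using only X in {case_a, case_b}."""
    pairs = [(x, c) for x, c in zip(xs, cs) if x in (case_a, case_b)]
    n = len(pairs)
    n_xc = Counter(pairs)
    n_x = Counter(x for x, _ in pairs)
    n_c = Counter(c for _, c in pairs)
    g = sum(k * log(k * n / (n_x[x] * n_c[c])) for (x, c), k in n_xc.items())
    dof = (len(n_x) - 1) * (len(n_c) - 1)
    return 2.0 * g, dof

xs = ["a"] * 10 + ["b"] * 10 + ["c"] * 10
cs = (["pos"] * 5 + ["neg"] * 5) * 2 + ["pos"] * 10
same_info = grouping_statistic(xs, cs, "a", "b")   # identical class profiles
diff_info = grouping_statistic(xs, cs, "a", "c")   # different class profiles
```

A statistic near zero suggests the two cases carry similar information about the class and are candidates for grouping.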

Grouping Method
Criteria
• BDe score:
• L1O score:
• LRT score:

Experimental Evaluation
Details
• SNG was implemented in Elvira.
• Integrated in Weka for evaluation.
• Tested on 13 databases without missing values from the UCI repository.
• 10-fold cross-validation repeated 10 times.
• Comparison with a corrected paired t-test at the 5% significance level.
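In this evaluation setting, "corrected paired t-test" commonly refers to the corrected resampled t-test of Nadeau and Bengio, which inflates the variance to account for the overlap between training sets. The sketch below assumes that this is the test meant here. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def corrected_paired_t(diffs, test_train_ratio):
    """Corrected resampled t statistic for paired per-fold differences.

    diffs: performance differences between two classifiers, one per fold and run.
    test_train_ratio: n_test / n_train, e.g. 1/9 for 10-fold cross-validation.
    """
    k = len(diffs)
    return mean(diffs) / sqrt((1.0 / k + test_train_ratio) * variance(diffs))

# 10-fold cross-validation repeated 10 times gives k = 100 differences.
diffs = [0.01] * 50 + [0.03] * 50  # hypothetical accuracy differences
t = corrected_paired_t(diffs, 1 / 9)
```

The statistic is compared with a Student t distribution with k − 1 degrees of freedom. It is always smaller in magnitude than the uncorrected paired t statistic, which makes the test more conservative.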

Evaluating Joining Criteria
Naive Bayes Comparison
• The trade-off between accuracy and log-likelihood is better for LRT.
• L1O works badly as a joining criterion.

Evaluating Joining Criteria
Pazzani’s semi-NB comparison
• LRT works slightly better than BDe.
• Similar performance with a lower time complexity.
• LRT is the best joining criterion.

Evaluating Grouping Criteria
Naive Bayes Comparison
• LRT Joining + Grouping Method
• No strong differences among the criteria.
• L1O slightly better.
• L1O is the best grouping criterion.

Pazzani’s Semi-NB Comparison
SNB-G = LRT Joining + L1O Grouping
• Similar performance.
• Dramatic reduction in building time.

State-of-the-art Classifiers
AODE, TAN and LBR comparison
• Three wins against NB.
• 1 W vs 1 D against AODE.
• No differences against TAN and LBR.
• One win against Pazzani’s Semi-NB.

Conclusions and Future Work
• A preprocessing step for Naive Bayes:
  • Method for joining variables.
  • Combined method for grouping cases.
• Very efficient, with performance similar to Pazzani’s Semi-NB classifier.
• Application to high-dimensionality data sets.
• Generalization of the methodology to other models: decision trees and the TAN model.
