                                                      Naïve Bayes Classifiers

                                                      Andrew W. Moore
                                                      Professor
                                                      School of Computer Science
                                                      Carnegie Mellon University

                                                      www.cs.cmu.edu/~awm
                                                      awm@cs.cmu.edu
                                                      412-268-7599

Note to other teachers and users of these slides. Andrew would be delighted if you found this
source material useful in giving your own lectures. Feel free to use these slides verbatim, or to
modify them to fit your own needs. PowerPoint originals are available. If you make use of a
significant portion of these slides in your own lecture, please include this message, or the
following link to the source repository of Andrew’s tutorials: http://www.cs.cmu.edu/~awm/tutorials .
Comments and corrections gratefully received.

For a more in-depth introduction to Naïve Bayes Classifiers and the theory surrounding them,
please see Andrew’s lecture on Probability for Data Miners.

These notes assume you have already met Bayesian Networks.

               Copyright © 2004, Andrew W. Moore
A simple Bayes Net
                                    [Bayes Net diagram: J is the parent of C, Z, and R]

                      J       Person is a Junior
                      C       Brought Coat to Classroom
                      Z       Live in zipcode 15213
                      R       Saw “Return of the King” more than once
Copyright © 2004, Andrew W. Moore                                       Slide 2
A simple Bayes Net
                                    [Bayes Net diagram: J is the parent of C, Z, and R]

     What parameters are stored in the CPTs of this Bayes Net?


                      J       Person is a Junior
                      C       Brought Coat to Classroom
                      Z       Live in zipcode 15213
                      R       Saw “Return of the King” more than once
Copyright © 2004, Andrew W. Moore                                          Slide 3
A simple Bayes Net
     P(J) =
                                                          J    Person is a Junior
                                                          C    Brought Coat to Classroom
                                                          Z    Live in zipcode 15213
                                                          R    Saw “Return of the King” more than once

                                    [Bayes Net diagram: J is the parent of C, Z, and R]

     P(C|J) =        P(Z|J) =        P(R|J) =
     P(C|~J)=        P(Z|~J)=        P(R|~J)=

     Suppose we have a database from 20 people who attended a lecture. How could
     we use that to estimate the values in this CPT?
Copyright © 2004, Andrew W. Moore                                                      Slide 4
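
Aside on why the table above is so small: because C, Z and R are conditionally independent
given J in this net, the full joint over the four variables is determined by just the seven
numbers listed (P(J) plus the six conditionals). Written out in the net’s own notation
(standard Bayes-net semantics, nothing beyond what the slide assumes):

     P(J ^ C ^ Z ^ R) = P(J) P(C | J) P(Z | J) P(R | J)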
A simple Bayes Net
     P(J) =
                                                          J    Person is a Junior
                                                          C    Brought Coat to Classroom
                                                          Z    Live in zipcode 15213
                                                          R    Saw “Return of the King” more than once

                                    [Bayes Net diagram: J is the parent of C, Z, and R]

     P(J)    =  (# people who walked to school) / (# people in database)

     P(C|J)  =  (# people who walked to school and brought a coat) /
                (# people who walked to school)

     P(C|~J) =  (# coat-bringers who didn’t walk to school) /
                (# people who didn’t walk to school)

     P(Z|J) =        P(R|J) =
     P(Z|~J)=        P(R|~J)=

     Suppose we have a database from 20 people who attended a lecture. How could
     we use that to estimate the values in this CPT?
Copyright © 2004, Andrew W. Moore                                                 Slide 5
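
Each entry in the CPT above is just a ratio of counts over the database. A minimal sketch of
that counting in Python; the four-record table and its attribute names (J, C, Z, R) are made-up
placeholders for illustration, not Andrew’s 20-person class data:

    records = [
        {"J": True,  "C": True,  "Z": True,  "R": True},
        {"J": True,  "C": False, "Z": True,  "R": False},
        {"J": False, "C": True,  "Z": False, "R": True},
        {"J": False, "C": False, "Z": False, "R": False},
    ]

    def frac(rows, pred):
        # Fraction of rows satisfying pred (0.0 if rows is empty).
        return sum(pred(r) for r in rows) / len(rows) if rows else 0.0

    p_J = frac(records, lambda r: r["J"])                    # P(J)

    juniors     = [r for r in records if r["J"]]
    non_juniors = [r for r in records if not r["J"]]
    p_C_given_J    = frac(juniors,     lambda r: r["C"])     # P(C|J)
    p_C_given_notJ = frac(non_juniors, lambda r: r["C"])     # P(C|~J)

    print(p_J, p_C_given_J, p_C_given_notJ)                  # 0.5 0.5 0.5

The remaining entries (P(Z|J), P(R|J), and their ~J counterparts) follow the same pattern.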
A Naïve Bayes Classifier
     P(J) =
                                    Output Attribute:
                                                        J    Walked to School
                                    Input Attributes:
                                                        C    Brought Coat to Classroom
                                                        Z    Live in zipcode 15213
                                                        R    Saw “Return of the King” more than once

                                    [Bayes Net diagram: J is the parent of C, Z, and R]

     P(C|J) =        P(Z|J) =        P(R|J) =
     P(C|~J)=        P(Z|~J)=        P(R|~J)=
    A new person shows up at class wearing an “I live right above the
    Manor Theater where I saw all the Lord of The Rings Movies every
    night” overcoat.
    What is the probability that they are a Junior?
Copyright © 2004, Andrew W. Moore                                                    Slide 6
Naïve Bayes Classifier Inference
                                    [Bayes Net diagram: J is the parent of C, Z, and R]

     P(J | C ^ ¬Z ^ R)

               P(J ^ C ^ ¬Z ^ R)
        =  -----------------------
                P(C ^ ¬Z ^ R)

                            P(J ^ C ^ ¬Z ^ R)
        =  --------------------------------------------------
            P(J ^ C ^ ¬Z ^ R)  +  P(¬J ^ C ^ ¬Z ^ R)

                            P(C | J) P(¬Z | J) P(R | J) P(J)
        =  -------------------------------------------------------------------------------
            P(C | J) P(¬Z | J) P(R | J) P(J)  +  P(C | ¬J) P(¬Z | ¬J) P(R | ¬J) P(¬J)
Copyright © 2004, Andrew W. Moore                                       Slide 7
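
Numerically, the chain on this slide is a handful of multiplications and one normalisation.
A minimal sketch, with made-up CPT values (the 0.4, 0.7, … numbers are placeholders, not
estimates from any real class list):

    p_J = 0.4                                  # P(J); P(~J) = 1 - P(J)
    p_C = {True: 0.7, False: 0.3}              # P(C | J), P(C | ~J)
    p_Z = {True: 0.5, False: 0.1}              # P(Z | J), P(Z | ~J)
    p_R = {True: 0.6, False: 0.2}              # P(R | J), P(R | ~J)

    def joint(j, c, z, r):
        # P(J=j ^ C=c ^ Z=z ^ R=r) under the Naive Bayes factorisation.
        prior = p_J if j else 1.0 - p_J
        pc = p_C[j] if c else 1.0 - p_C[j]
        pz = p_Z[j] if z else 1.0 - p_Z[j]
        pr = p_R[j] if r else 1.0 - p_R[j]
        return prior * pc * pz * pr

    # Evidence from the slide: C is true, Z is false, R is true.
    num   = joint(True,  True, False, True)
    denom = num + joint(False, True, False, True)
    print("P(J | C ^ ~Z ^ R) =", num / denom)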
The General Case
                                    [Bayes Net diagram: Y is the parent of X1, X2, …, Xm]


 1. Estimate P(Y=v) as the fraction of records with Y=v.
 2. Estimate P(Xi=u | Y=v) as the fraction of “Y=v” records that also
    have Xi=u.
 3. To predict the Y value given observations of all the Xi values,
    compute

              Y^predict = argmax_v  P(Y = v | X1 = u1 … Xm = um)
Copyright © 2004, Andrew W. Moore                                     Slide 8
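
A minimal sketch of steps 1 and 2 in Python, counting over a list of (inputs, label) records;
the data layout (a list of (x, y) pairs with x a tuple of discrete values) is an assumption of
this sketch, not something the slide specifies:

    from collections import Counter, defaultdict

    def fit_naive_bayes(data):
        # data: list of (x, y) pairs, where x is a tuple of discrete
        # attribute values (u1, ..., um) and y is the class value v.
        n = len(data)
        y_counts = Counter(y for _, y in data)        # counts of records with Y = v
        xy_counts = defaultdict(int)                  # counts of records with Xi = u and Y = v
        for x, y in data:
            for i, u in enumerate(x):
                xy_counts[(i, u, y)] += 1

        p_y = {v: c / n for v, c in y_counts.items()}                      # step 1
        p_x_given_y = {(i, u, v): c / y_counts[v]
                       for (i, u, v), c in xy_counts.items()}              # step 2
        return p_y, p_x_given_y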
Naïve Bayes Classifier
     Y^predict = argmax_v  P(Y = v | X1 = u1 … Xm = um)

                            P(Y = v ^ X1 = u1 … Xm = um)
               = argmax_v  ------------------------------
                               P(X1 = u1 … Xm = um)

                            P(X1 = u1 … Xm = um | Y = v) P(Y = v)
               = argmax_v  ----------------------------------------
                                  P(X1 = u1 … Xm = um)

     Y^predict = argmax_v  P(X1 = u1 … Xm = um | Y = v) P(Y = v)

     Because of the structure of the Bayes Net:

     Y^predict = argmax_v  P(Y = v) ∏_{j=1..m} P(Xj = uj | Y = v)

Copyright © 2004, Andrew W. Moore                                             Slide 9
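
A minimal prediction sketch for the last line of this slide, reusing the p_y / p_x_given_y
tables from the training sketch after the previous slide. Summing logs rather than multiplying
probabilities is a standard stability trick added by this sketch, not something the slide
prescribes:

    import math

    def predict(x, p_y, p_x_given_y):
        # Return argmax over v of  P(Y=v) * prod_j P(Xj = uj | Y = v).
        best_v, best_score = None, -math.inf
        for v, prior in p_y.items():
            score = math.log(prior)
            for i, u in enumerate(x):
                p = p_x_given_y.get((i, u, v), 0.0)
                if p == 0.0:
                    # An unseen (attribute value, class) pair zeroes the whole product;
                    # this is the "zero probabilities" problem flagged on the next slide.
                    score = -math.inf
                    break
                score += math.log(p)
            if score > best_score:
                best_v, best_score = v, score
        return best_v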
More Facts About Naïve Bayes
              Classifiers
• Naïve Bayes Classifiers can be built with real-valued
  inputs*
• Rather Technical Complaint: Bayes Classifiers don’t try to
  be maximally discriminative---they merely try to honestly
  model what’s going on*
• Zero probabilities are painful for Joint and Naïve. A hack
  (justifiable with the magic words “Dirichlet Prior”) can
  help*. (A smoothing sketch follows this slide.)
• Naïve Bayes is wonderfully cheap. And survives 10,000
  attributes cheerfully!
                                           *See future Andrew Lectures




Copyright © 2004, Andrew W. Moore                               Slide 10
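
The zero-probability bullet above is what motivates the sketch below: add-one (Laplace)
smoothing, one simple instance of the Dirichlet-prior idea mentioned on the slide. The
pseudocount of 1 and the function shape are assumptions of this sketch:

    def smoothed_conditional(count_xu_and_yv, count_yv, n_values, pseudocount=1.0):
        # Estimate P(Xi = u | Y = v) with a pseudocount spread over all
        # n_values possible values of Xi, so no estimate is ever exactly zero.
        return (count_xu_and_yv + pseudocount) / (count_yv + pseudocount * n_values)

    # A value never seen with class v still gets a small nonzero probability:
    print(smoothed_conditional(0, 10, n_values=2))   # 1/12 instead of 0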
What you should know
• How to build a Bayes Classifier
• How to predict with a BC




Copyright © 2004, Andrew W. Moore       Slide 11
