A Short Intro to Naive Bayesian Classifiers
- 1. Naïve Bayes Classifiers
Andrew W. Moore
Professor
School of Computer Science
Carnegie Mellon University
www.cs.cmu.edu/~awm
awm@cs.cmu.edu
412-268-7599
Note to other teachers and users of these slides. Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew’s tutorials: http://www.cs.cmu.edu/~awm/tutorials . Comments and corrections gratefully received.
For an in-depth introduction to the theory surrounding Naïve Bayes Classifiers, and more on Probability for Data Miners, please see Andrew’s other lectures.
These notes assume you have already met Bayesian Networks.
Copyright © 2004, Andrew W. Moore
- 2. A simple Bayes Net
J
C Z R
J Person is a Junior
C Brought Coat to Classroom
Z Live in zipcode 15213
R Saw “Return of the King” more than once
- 3. A simple Bayes Net
J What parameters are
stored in the CPTs of
this Bayes Net?
C Z R
J Person is a Junior
C Brought Coat to Classroom
Z Live in zipcode 15213
R Saw “Return of the King” more than once
- 4. A simple Bayes Net
J Person is a Junior
C Brought Coat to Classroom
Z Live in zipcode 15213
R Saw “Return of the King” more than once
J
C Z R
P(J) =
P(C|J) =    P(Z|J) =    P(R|J) =
P(C|~J) =   P(Z|~J) =   P(R|~J) =
Suppose we have a database from 20 people who attended a lecture. How could we use that to estimate the values in this CPT?
- 5. A simple Bayes Net
J Person is a Junior
C Brought Coat to Classroom
Z Live in zipcode 15213
R Saw “Return of the King” more than once
J
C Z R
P(J) = (# people who are Juniors) / (# people in database)
P(C|J) = (# people who are Juniors and brought a coat) / (# people who are Juniors)
P(C|~J) = (# coat-bringers who are not Juniors) / (# people who are not Juniors)
P(Z|J), P(R|J), P(Z|~J) and P(R|~J) are estimated the same way.
Suppose we have a database from 20 people who attended a lecture. How could we use that to estimate the values in this CPT?
- 6. A Naïve Bayes Classifier
Output Attribute: J
Input Attributes: C, Z, R
J Person is a Junior
C Brought Coat to Classroom
Z Live in zipcode 15213
R Saw “Return of the King” more than once
J
C Z R
P(J) =
P(C|J) =    P(Z|J) =    P(R|J) =
P(C|~J) =   P(Z|~J) =   P(R|~J) =
A new person shows up at class wearing an “I live right above the Manor Theater where I saw all the Lord of The Rings Movies every night” overcoat.
What is the probability that they are a Junior?
- 7. Naïve Bayes Classifier Inference
J
C Z R
P( J | C ^ ¬Z ^ R )

  = P( J ^ C ^ ¬Z ^ R ) / P( C ^ ¬Z ^ R )

  = P( J ^ C ^ ¬Z ^ R ) / ( P( J ^ C ^ ¬Z ^ R ) + P( ¬J ^ C ^ ¬Z ^ R ) )

  = P(C|J) P(¬Z|J) P(R|J) P(J)
    ---------------------------------------------------------------
    P(C|J) P(¬Z|J) P(R|J) P(J) + P(C|¬J) P(¬Z|¬J) P(R|¬J) P(¬J)
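The chain of equalities above can be checked numerically. A minimal sketch in Python, using made-up CPT values (the slides do not give numbers for this example):

```python
# Hypothetical CPT values -- illustrative only, not from the slides.
p_j = 0.3                                 # P(J)
p_c_j, p_z_j, p_r_j = 0.6, 0.5, 0.4       # P(C|J),  P(Z|J),  P(R|J)
p_c_nj, p_z_nj, p_r_nj = 0.2, 0.1, 0.3    # P(C|~J), P(Z|~J), P(R|~J)

# Numerator: P(C|J) P(~Z|J) P(R|J) P(J)
num = p_c_j * (1 - p_z_j) * p_r_j * p_j
# Competing term for the ~J hypothesis: P(C|~J) P(~Z|~J) P(R|~J) P(~J)
alt = p_c_nj * (1 - p_z_nj) * p_r_nj * (1 - p_j)

# Posterior P(J | C ^ ~Z ^ R) by normalizing over the two hypotheses.
posterior = num / (num + alt)
print(posterior)
```

Note that only the two joint terms are needed; the denominator P(C ^ ¬Z ^ R) is obtained by summing them, exactly as in the derivation.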
- 8. The General Case
Y
...
X1 X2 Xm
1. Estimate P(Y=v) as the fraction of records with Y=v.
2. Estimate P(Xi=u | Y=v) as the fraction of “Y=v” records that also have Xi=u.
3. To predict the Y value given observations of all the Xi values, compute

   Y_predict = argmax_v P(Y=v | X1=u1 … Xm=um)
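Steps 1 and 2 above are just counting. A minimal sketch, using a tiny hypothetical record set (the deck's 20-person database is not given) with boolean attributes:

```python
from collections import Counter, defaultdict

# Toy records (hypothetical): each is (Y, (X1, X2, X3)).
records = [
    (True,  (True,  False, True)),
    (True,  (True,  True,  True)),
    (False, (False, False, True)),
    (False, (True,  False, False)),
]

# Step 1: P(Y=v) as the fraction of records with Y=v.
y_counts = Counter(y for y, _ in records)
p_y = {v: n / len(records) for v, n in y_counts.items()}

# Step 2: P(Xi=u | Y=v) as the fraction of "Y=v" records with Xi=u.
cond_counts = defaultdict(Counter)
for y, xs in records:
    for i, u in enumerate(xs):
        cond_counts[(i, y)][u] += 1

def p_x_given_y(i, u, v):
    """Estimated P(X_i = u | Y = v)."""
    return cond_counts[(i, v)][u] / y_counts[v]
```

These estimated tables are exactly the CPT entries from the earlier slides, with Y playing the role of J and the Xi playing the roles of C, Z, R.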
- 9. Naïve Bayes Classifier
Y_predict = argmax_v P(Y=v | X1=u1 … Xm=um)

          = argmax_v P(Y=v ^ X1=u1 … Xm=um) / P(X1=u1 … Xm=um)

          = argmax_v P(X1=u1 … Xm=um | Y=v) P(Y=v) / P(X1=u1 … Xm=um)

Y_predict = argmax_v P(X1=u1 … Xm=um | Y=v) P(Y=v)

Because of the structure of the Bayes Net:

Y_predict = argmax_v P(Y=v) ∏_{j=1}^{m} P(Xj=uj | Y=v)
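The final factored argmax is a one-liner once the tables are estimated. A sketch with hypothetical tables (the probabilities below are illustrative, not from the slides); in practice the product is computed as a sum of logs, which is why step 3 survives many attributes without underflowing:

```python
import math

# Hypothetical learned tables for a 3-attribute boolean problem.
p_y = {True: 0.5, False: 0.5}
# p_x[(i, v)][u] = P(X_i = u | Y = v); values are illustrative.
p_x = {
    (0, True): {True: 0.9, False: 0.1}, (0, False): {True: 0.2, False: 0.8},
    (1, True): {True: 0.6, False: 0.4}, (1, False): {True: 0.5, False: 0.5},
    (2, True): {True: 0.7, False: 0.3}, (2, False): {True: 0.3, False: 0.7},
}

def predict(xs):
    """argmax_v P(Y=v) * prod_j P(X_j = x_j | Y = v), in log space."""
    def score(v):
        return math.log(p_y[v]) + sum(
            math.log(p_x[(i, v)][u]) for i, u in enumerate(xs))
    return max(p_y, key=score)
```

Taking logs does not change the argmax because log is monotonic, and it avoids multiplying thousands of numbers smaller than one.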
- 10. More Facts About Naïve Bayes
Classifiers
• Naïve Bayes Classifiers can be built with real-valued
inputs*
• Rather Technical Complaint: Bayes Classifiers don’t try to
be maximally discriminative---they merely try to honestly
model what’s going on*
• Zero probabilities are painful for Joint and Naïve. A hack
(justifiable with the magic words “Dirichlet Prior”) can
help*.
• Naïve Bayes is wonderfully cheap. And survives 10,000
attributes cheerfully!
*See future Andrew Lectures
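The zero-probability hack mentioned above can be realized with Laplace (add-one) smoothing, one standard instance of the Dirichlet-prior idea. A sketch (the counts and parameter names are illustrative):

```python
def smoothed_estimate(count_xu_and_yv, count_yv, n_values, alpha=1.0):
    """P(X=u | Y=v) under a symmetric Dirichlet(alpha) prior.

    count_xu_and_yv: # records with X=u and Y=v
    count_yv:        # records with Y=v
    n_values:        # distinct values X can take
    alpha=1.0 gives classic add-one smoothing; alpha=0 recovers
    the raw fraction, which can be exactly zero.
    """
    return (count_xu_and_yv + alpha) / (count_yv + alpha * n_values)
```

With alpha > 0 an attribute value never seen with class v gets a small positive probability instead of zeroing out the whole product.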
- 11. What you should know
• How to build a Bayes Classifier
• How to predict with a BC