
Oct. 01, 2012


- 1. Building a Naive Bayes Classifier. Eric Wilson, Search Engineer, Manta Media
- 2. The problem: Undesirable Content
  Recommended by 3 people:
  Bob Perkins: It is a pleasure to work with Kim! Her work is beautiful and she is professional, communicative, and friendly.
  Fred: She lied and stole my money, STAY AWAY!!!!!
  Jane Robinson: Very Quick Turn Around as asked - Synced up Perfectly Great Help!
- 3. Possible solutions
  ● First approach: manually remove undesired content.
  ● Attempt to filter based on lists of banned words.
  ● Use a machine learning algorithm to identify undesirable content based on a small set of manually classified examples.
- 4. Using Naive Bayes isn't too hard!
  ● We'll need a bit of probability, including the concept of conditional probability.
  ● A few natural language processing ideas will be necessary.
  ● Facility with any modern programming language.
  ● Persistence with many details.
- 5. Probability 101
  Suppose we choose a number from the set U = {1,2,3,4,5,6,7,8,9,10}.
  Let A be the event that the number is even, and B be the event that the number is prime.
  Compute P(A), P(B), P(A|B), and P(B|A), where P(A|B) is the probability of A given B.
- 6. Just count!
  A = {2,4,6,8,10}, B = {2,3,5,7}, so A ∩ B = {2}.
  P(A) = 5/10 = 1/2
  P(B) = 4/10 = 2/5
  P(A|B) = 1/4
  P(B|A) = 1/5
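This counting can be checked with a short Python sketch (my own illustration; the deck itself contains no code), using exact fractions so nothing is lost to rounding:

```python
from fractions import Fraction

U = set(range(1, 11))
A = {n for n in U if n % 2 == 0}   # even numbers: {2, 4, 6, 8, 10}
B = {2, 3, 5, 7}                   # primes in U

def prob(event, given=None):
    """P(event), or the conditional P(event|given) when a conditioning event is supplied."""
    space = U if given is None else given
    return Fraction(len(event & space), len(space))

print(prob(A), prob(B), prob(A, B), prob(B, A))  # 1/2 2/5 1/4 1/5
```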
- 7. Bayes' Theorem
  P(A|B) = P(AB)/P(B)
  P(B)P(A|B) = P(AB)
  P(B)P(A|B) = P(A)P(B|A)
  P(A|B) = P(A)P(B|A)/P(B)
- 8. A simplistic language model
  Consider each document to be a set of words, along with frequencies.
  For example, "The premium quality for the discount price" is viewed as:
  {the:2, premium:1, quality:1, for:1, discount:1, price:1}
  This is the same as "The discount quality for the premium price," since we don't care about order.
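A minimal sketch of this bag-of-words view in Python (the function name is mine, not from the deck):

```python
from collections import Counter

def bag_of_words(text):
    # Lowercase and split on whitespace; word order is discarded.
    return Counter(text.lower().split())

a = bag_of_words("The premium quality for the discount price")
b = bag_of_words("The discount quality for the premium price")
assert a == b   # same bag, despite different word order
print(a["the"])  # 2
```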
- 9. That seems … foolish
  ● English is so complicated that we won't have any real hope of understanding semantics.
  ● In many real-life scenarios, text that you want to classify is not exactly subtle.
  ● If necessary, we can improve our language model later.
- 10. An example:
  Type      Text                        Class
  Training  Good happy good             Positive
  Training  Good good service           Positive
  Training  Good friendly               Positive
  Training  Lousy good cheat            Negative
  Test      Good good good cheat lousy  ??
  In order to be able to perform all calculations, we will use an example with extremely small documents.
- 11. What was the question?
  We are trying to determine whether the last recommendation was positive or negative.
  We want to compute:
  P(Pos|good good good lousy cheat)
  By Bayes' Theorem, this is equal to:
  P(Pos)P(good good good lousy cheat|Pos) / P(good good good lousy cheat)
- 12. What do we know?
  P(Pos) = 3/4
  P(good|Pos), P(cheat|Pos), and P(lousy|Pos) are all easily computed by counting using the training set.
  Which is almost what we want ...
- 13. Wouldn't it be nice ...
  Maybe we have all we need? Isn't
  P(good good good lousy cheat|Pos) = P(good|Pos)³P(lousy|Pos)P(cheat|Pos)?
  Well, yes, if these are independent events, which almost certainly doesn't hold.
  The "naive" assumption is that we can consider these events independent.
- 14. The Naive Bayes Algorithm
  If C1,C2,...,Cn are classes, and an instance has features F1,F2,...,Fm, then the most likely class for this instance is the one that maximizes:
  P(Ci)P(F1|Ci)P(F2|Ci)...P(Fm|Ci)
- 15. Wasn't there a denominator?
  If our goal was to compute the probability of the most likely class, we should divide by:
  P(F1)P(F2)...P(Fm)
  We can ignore this term because we only care about which class has the highest probability, and it is the same for each class.
- 16. Interesting theory, but …
  Won't this break as soon as we encounter a word that isn't in our training set?
  For example, if "goood" is not in our training set and occurs in our test set, then P(goood|Pos) = 0, so our product is zero for all classes.
  We need nonzero probabilities for all words, even words that don't exist.
- 17. Plus-one smoothing
  Just count every word one time more than it actually occurs.
  Since we are only concerned with relative probabilities, this inaccuracy should be of no concern.
  P(word|C) = (count(word,C) + 1) / (count(C) + V)
  (V is the size of the total vocabulary, so that our probabilities sum to 1.)
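A sketch of this formula in Python (function and variable names are mine; `vocab_size` is the V of the slide):

```python
def smoothed_prob(word_counts, word, vocab_size):
    """P(word|C) with plus-one (Laplace) smoothing.
    word_counts maps word -> count within class C."""
    total = sum(word_counts.values())
    return (word_counts.get(word, 0) + 1) / (total + vocab_size)

# Word counts for the Positive class of the running example (V = 6).
pos_counts = {"good": 5, "happy": 1, "service": 1, "friendly": 1}
print(smoothed_prob(pos_counts, "good", 6))   # 6/14
print(smoothed_prob(pos_counts, "cheat", 6))  # 1/14: unseen, but nonzero
```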
- 18. Let's try it out:
  P(Pos) = 3/4, P(Neg) = 1/4
  Type      Text                        Class
  Training  Good happy good             Positive
  Training  Good good service           Positive
  Training  Good friendly               Positive
  Training  Lousy good cheat            Negative
  Test      Good good good cheat lousy  ??
  P(good|Pos)  = (5+1)/(8+6) = 3/7
  P(cheat|Pos) = (0+1)/(8+6) = 1/14
  P(lousy|Pos) = (0+1)/(8+6) = 1/14
  P(good|Neg)  = (1+1)/(3+6) = 2/9
  P(cheat|Neg) = (1+1)/(3+6) = 2/9
  P(lousy|Neg) = (1+1)/(3+6) = 2/9
  P(Pos|D5) ~ 3/4 × (3/7)³ × (1/14) × (1/14) ≈ 0.0003
  P(Neg|D5) ~ 1/4 × (2/9)³ × (2/9) × (2/9) ≈ 0.0001
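The arithmetic on this slide can be verified exactly with a short sketch of mine (not part of the deck), using rational arithmetic:

```python
from fractions import Fraction as F

p_pos, p_neg = F(3, 4), F(1, 4)

# Smoothed likelihoods taken from the slide.
good_pos, cheat_pos, lousy_pos = F(3, 7), F(1, 14), F(1, 14)
good_neg = cheat_neg = lousy_neg = F(2, 9)

score_pos = p_pos * good_pos**3 * cheat_pos * lousy_pos
score_neg = p_neg * good_neg**3 * cheat_neg * lousy_neg
print(float(score_pos))  # about 0.0003
print(float(score_neg))  # about 0.0001
assert score_pos > score_neg  # the test document is classified Positive
```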
- 19. Training the classifier
  ● Count instances of classes, store counts in a map.
  ● Store counts of all words in a nested map:
    {pos: {good: 5, friendly: 1, service: 1, happy: 1},
     neg: {cheat: 1, lousy: 1, good: 1}}
  ● Should be easy to compute probabilities.
  ● Should be efficient (training time and memory).
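A minimal training sketch producing exactly these maps, assuming whitespace tokenization (my own code; names like `train` are not from the deck):

```python
from collections import Counter, defaultdict

def train(examples):
    """examples: iterable of (text, label) pairs.
    Returns (class counts, nested per-class word counts)."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return class_counts, word_counts

classes, words = train([
    ("Good happy good", "pos"),
    ("Good good service", "pos"),
    ("Good friendly", "pos"),
    ("Lousy good cheat", "neg"),
])
print(dict(words["pos"]))  # {'good': 5, 'happy': 1, 'service': 1, 'friendly': 1}
```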
- 20. Some practical problems
  ● Tokenization
  ● Arithmetic
  ● How to evaluate results?
- 21. Tokenization
  ● Use whitespace? Then "food", "food.", "food," and "food!" are all different.
  ● Use whitespace and punctuation? Then "won't" is tokenized to "won" and "t".
  ● What about emails? URLs? Phone numbers? What about the things we haven't thought about yet?
  ● Use a library. Lucene is a good choice.
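The slide recommends a library (Lucene is Java); purely as a rough illustration of the punctuation problem, here is a crude regex tokenizer of my own in Python:

```python
import re

# Keep runs of letters/digits plus an internal apostrophe group,
# so "won't" survives as one token. A production system would use
# a real analyzer, as the slide suggests.
TOKEN = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")

def tokenize(text):
    return TOKEN.findall(text.lower())

print(tokenize('"food", "food." "food!"'))  # ['food', 'food', 'food']
print(tokenize("won't"))                    # ["won't"]
```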
- 22. Arithmetic
  What happens when you multiply a large number of small numbers?
  To prevent underflow, use sums of logs instead of products of true probabilities.
  Key properties of log:
  ● log(AB) = log(A) + log(B)
  ● x > y implies log(x) > log(y)
  ● Turns very small numbers into manageable negative numbers.
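A sketch of classification in log space, reusing the numbers from slide 18 (the dictionaries and names are my own illustration):

```python
import math

# Toy model: prior and smoothed word probabilities per class.
priors = {"pos": 0.75, "neg": 0.25}
likelihood = {
    "pos": {"good": 3/7, "cheat": 1/14, "lousy": 1/14},
    "neg": {"good": 2/9, "cheat": 2/9,  "lousy": 2/9},
}

def log_score(cls, words):
    # Sum of logs rather than a product of probabilities: a long
    # document would otherwise underflow to 0.0 in floating point.
    return math.log(priors[cls]) + sum(math.log(likelihood[cls][w]) for w in words)

doc = ["good", "good", "good", "cheat", "lousy"]
best = max(priors, key=lambda c: log_score(c, doc))
print(best)  # pos
```

Since log is monotone increasing, comparing log scores picks the same class as comparing the original products.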
- 23. Evaluating a classifier
  ● Precision and recall
  ● Confusion matrix
  ● Divide the training set into ten "folds", train the classifier on nine folds, and check the accuracy of classifying the tenth fold.
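A minimal sketch of that fold splitting (my own code, assuming a list of labeled examples; a library routine would normally shuffle first):

```python
def folds(data, k=10):
    """Yield (train, test) splits: each k-th slice is held out in turn."""
    for i in range(k):
        test = data[i::k]
        train = [x for j, x in enumerate(data) if j % k != i]
        yield train, test

data = list(range(20))
for train, test in folds(data, k=10):
    assert len(test) == 2 and len(train) == 18
    assert sorted(train + test) == data  # every item used exactly once
print("10-fold split OK")
```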
- 24. Experiment
  ● Tokenization strategies
    – Stop words
    – Capitalization
    – Stemming
  ● Language model
    – Ignore multiplicities
    – Smoothing
- 25. Contact me
  ● wilson.eric.n@gmail.com
  ● ewilson@manta.com
  ● @wilsonericn
  ● http://wilsonericn.wordpress.com
