22. Rocchio Text Categorization Algorithm (Training)
Assume the set of categories is {c_1, c_2, …, c_n}
For i from 1 to n: let p_i = <0, 0, …, 0>   (init. prototype vectors)
For each training example <x, c(x)> ∈ D:
    Let d be the frequency-normalized TF/IDF term vector for doc x
    Let i = j such that c_j = c(x)
    Let p_i = p_i + d   (sum all the document vectors in c_i to get p_i)
23. Rocchio Text Categorization Algorithm (Test)
Given test document x
Let d be the TF/IDF weighted term vector for x
Let m = -2   (init. maximum cosSim)
For i from 1 to n:   (compute similarity to each prototype vector)
    Let s = cosSim(d, p_i)
    If s > m:
        Let m = s
        Let r = c_i   (update most similar class prototype)
Return class r
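A minimal Python sketch of slides 22-23, assuming the TF/IDF document vectors are already computed as NumPy arrays; the function names and the (vector, label) input format are illustrative choices, not part of the original slides.

import numpy as np

def rocchio_train(docs, categories):
    """docs: list of (tfidf_vector, category) pairs; categories: set of labels."""
    dim = len(docs[0][0])
    prototypes = {c: np.zeros(dim) for c in categories}  # init. prototype vectors
    for d, c in docs:
        prototypes[c] += d              # sum the document vectors in each class
    return prototypes

def cos_sim(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return a.dot(b) / denom if denom else 0.0

def rocchio_classify(d, prototypes):
    m, r = -2.0, None                   # init. maximum cosSim below the [-1, 1] range
    for c, p in prototypes.items():
        s = cos_sim(d, p)
        if s > m:
            m, r = s, c                 # update most similar class prototype
    return r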
30. K Nearest Neighbor for Text
Training:
For each training example <x, c(x)> ∈ D:
    Compute the corresponding TF-IDF vector, d_x, for document x
Test instance y:
    Compute TF-IDF vector d for document y
    For each <x, c(x)> ∈ D:
        Let s_x = cosSim(d, d_x)
    Sort examples x in D by decreasing value of s_x
    Let N be the first k examples in D   (get most similar neighbors)
    Return the majority class of examples in N
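A minimal Python sketch of the test procedure above, again assuming precomputed TF-IDF vectors; `train` is a list of (tfidf_vector, category) pairs, and ties in the majority vote fall to the most common class among the neighbors.

from collections import Counter
import numpy as np

def knn_classify(d, train, k=5):
    def cos_sim(a, b):
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return a.dot(b) / denom if denom else 0.0
    # Sort training examples by decreasing similarity to the test vector d
    ranked = sorted(train, key=lambda xc: cos_sim(d, xc[0]), reverse=True)
    neighbors = ranked[:k]                      # k most similar neighbors
    votes = Counter(c for _, c in neighbors)
    return votes.most_common(1)[0][0]           # majority class of N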
48. Text Naïve Bayes Algorithm (Train)
Let V be the vocabulary of all words in the documents in D
For each category c_i ∈ C:
    Let D_i be the subset of documents in D in category c_i
    P(c_i) = |D_i| / |D|
    Let T_i be the concatenation of all the documents in D_i
    Let n_i be the total number of word occurrences in T_i
    For each word w_j ∈ V:
        Let n_ij be the number of occurrences of w_j in T_i
        Let P(w_j | c_i) = (n_ij + 1) / (n_i + |V|)
49. Text Naïve Bayes Algorithm (Test)
Given a test document X
Let n be the number of word occurrences in X
Return the category:
    argmax_{c_i ∈ C} P(c_i) ∏_{j=1..n} P(a_j | c_i)
where a_j is the word occurring in the j-th position in X
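A minimal Python sketch of slides 48-49, using raw token lists in place of documents and summing log probabilities at test time for numerical stability (equivalent to the product above); the function names and input format are mine.

import math
from collections import Counter

def train_nb(docs):
    """docs: list of (token_list, category) pairs."""
    vocab = {w for tokens, _ in docs for w in tokens}           # V
    cats = {c for _, c in docs}
    prior, cond = {}, {}
    for c in cats:
        subset = [tokens for tokens, cc in docs if cc == c]     # D_i
        prior[c] = len(subset) / len(docs)                      # P(c_i)
        counts = Counter(w for tokens in subset for w in tokens)
        n_i = sum(counts.values())                              # word occurrences in T_i
        cond[c] = {w: (counts[w] + 1) / (n_i + len(vocab))      # Laplace smoothing
                   for w in vocab}
    return prior, cond, vocab

def classify_nb(tokens, prior, cond, vocab):
    def score(c):
        return math.log(prior[c]) + sum(
            math.log(cond[c][w]) for w in tokens if w in vocab)
    return max(prior, key=score)                                # argmax over categories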
65. HAC Algorithm
Start with all instances in their own cluster.
Until there is only one cluster:
    Among the current clusters, determine the two clusters, c_i and c_j, that are most similar.
    Replace c_i and c_j with a single cluster c_i ∪ c_j
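The slide leaves the cluster similarity measure unspecified; the Python sketch below uses average-link similarity over a user-supplied instance similarity function `sim` (one common choice, not the slide's), and records the sequence of merges.

def hac(instances, sim):
    clusters = [[x] for x in instances]         # each instance in its own cluster
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the highest average pairwise similarity
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = (sum(sim(a, b) for a in clusters[i] for b in clusters[j])
                     / (len(clusters[i]) * len(clusters[j])))
                if best is None or s > best[0]:
                    best = (s, i, j)
        _, i, j = best
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]      # replace c_i and c_j with c_i ∪ c_j
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges                               # the merge sequence (dendrogram)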
78. K-Means Algorithm
Let d be the distance measure between instances.
Select k random instances {s_1, s_2, …, s_k} as seeds.
Until clustering converges or another stopping criterion is met:
    For each instance x_i:
        Assign x_i to the cluster c_j such that d(x_i, s_j) is minimal.
    (Update the seeds to the centroid of each cluster)
    For each cluster c_j:
        s_j = centroid(c_j)
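A minimal Python sketch of the loop above, assuming Euclidean distance and instances given as NumPy vectors; it stops when the cluster assignments no longer change or after `max_iters` passes, both my choices of stopping criterion.

import random
import numpy as np

def k_means(instances, k, max_iters=100):
    seeds = random.sample(instances, k)         # select k random instances as seeds
    assignment = None
    for _ in range(max_iters):
        # Assign each instance to the cluster with the nearest seed
        new_assignment = [min(range(k),
                              key=lambda j: np.linalg.norm(x - seeds[j]))
                          for x in instances]
        if new_assignment == assignment:        # assignments unchanged: converged
            break
        assignment = new_assignment
        # Update each seed to the centroid of its cluster
        for j in range(k):
            members = [x for x, a in zip(instances, assignment) if a == j]
            if members:                         # keep the old seed if a cluster empties
                seeds[j] = np.mean(members, axis=0)
    return assignment, seeds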
79. K-Means Example (K=2)
(Figure: animation of one run — pick seeds; reassign clusters; compute centroids; reassign clusters; compute centroids; reassign clusters; converged.)
88. Naïve Bayes EM
Randomly assign examples probabilistic category labels.
Use standard naïve-Bayes training to learn a probabilistic model with parameters θ from the labeled data.
Until convergence or until maximum number of iterations reached:
    E-Step: Use the naïve Bayes model θ to compute P(c_i | E) for each category and example, and re-label each example using these probability values as soft category labels.
    M-Step: Use standard naïve-Bayes training to re-estimate the parameters θ using these new probabilistic category labels.
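A minimal Python sketch of the loop above, reusing the Laplace-smoothed estimates from slide 48 but with soft (fractional) counts; `docs` is a list of token lists, `cats` the category set, and all helper names and the fixed iteration count are mine.

import math, random
from collections import Counter

def nb_em(docs, cats, iters=20):
    vocab = {w for d in docs for w in d}
    # Randomly assign probabilistic category labels (normalized per example)
    labels = [{c: random.random() for c in cats} for _ in docs]
    labels = [{c: v / sum(l.values()) for c, v in l.items()} for l in labels]
    for _ in range(iters):
        # M-step: re-estimate the parameters from the soft category labels
        prior, cond = {}, {}
        for c in cats:
            prior[c] = sum(l[c] for l in labels) / len(docs)
            counts = Counter()
            for d, l in zip(docs, labels):
                for w in d:
                    counts[w] += l[c]           # fractional word counts
            n_c = sum(counts.values())
            cond[c] = {w: (counts[w] + 1) / (n_c + len(vocab)) for w in vocab}
        # E-step: compute P(c_i | E) and re-label each example with soft labels
        new_labels = []
        for d in docs:
            log_p = {c: math.log(prior[c]) +
                        sum(math.log(cond[c][w]) for w in d) for c in cats}
            mx = max(log_p.values())            # shift before exp for stability
            p = {c: math.exp(v - mx) for c, v in log_p.items()}
            z = sum(p.values())
            new_labels.append({c: v / z for c, v in p.items()})
        labels = new_labels
    return prior, cond, labels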