Lecture 15 - Advanced Topics on Association Rules, Part II
1. Introduction to Machine Learning
Lecture 15
Advanced Topics in Association Rules Mining
Albert Orriols i Puig
http://www.albertorriols.net
aorriols@salle.url.edu
Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle
Universitat Ramon Llull
2. Recap of Lectures 13-14
Ideas come from market basket analysis (MBA)
Let's go shopping!
Customer 1: Milk, eggs, sugar, bread
Customer 2: Milk, eggs, cereal, bread
Customer 3: Eggs, sugar
What do my customers buy? Which products are bought together?
Aim: Find associations and correlations between the different items that customers place in their shopping basket
Slide 2
Artificial Intelligence Machine Learning
3. Recap of Lectures 13-14
Apriori
Will find all the association rules with minimum support and confidence
However:
Scans the database multiple times
Most often, there is a high number of candidates
Support counting for candidates can be time-expensive
FP-growth
Will obtain the same rules as Apriori
Avoids candidate generation by building an FP-tree
Counts the support of candidates more efficiently
4. Today's Agenda
Continuing our journey through some advanced topics in ARM
Mining frequent patterns without candidate generation
Multiple-level AR
Sequential pattern mining
Quantitative association rules
Mining class association rules
Beyond support & confidence
Applications
5. Acknowledgments
Part of this lecture is based on the work by
6. Why Multiple-Level AR?
Aim: Find associations between items
But wait!
There are many different diapers: Dodot, Huggies, …
There are many different beers: Heineken, Desperados, Kingfisher … in bottle/can …
Which rule do you prefer?
diapers ⇒ beer
Dodot diapers M ⇒ Dam beer in can
Which will have greater support?
7. Concept Hierarchy
Create is-a hierarchies:
Clothes: Outerwear (Jackets, Ski Pants), Shirts
Footwear: Shoes, Hiking Boots
Assume we found the rule: Outerwear ⇒ Hiking Boots
Then:
Jackets ⇒ Hiking Boots may not have minimum support
Clothes ⇒ Hiking Boots may not have minimum confidence
8. Concept Hierarchy
This means that:
Rules at lower levels may not have enough support to be part of any frequent itemset
However, an overspecific rule at a lower level of the hierarchy may still denote a strong association, e.g., Jackets ⇒ Hiking Boots
So, which rules do you want?
Users are interested in generating rules that span different levels of the taxonomy
Rules at lower levels may not have minimum support
The taxonomy can be used to prune uninteresting or redundant rules
Multiple taxonomies may be present, for example: category, price (cheap, expensive), "items-on-sale", etc.
Multiple taxonomies may be modeled as a forest or a DAG
9. Notation
[Taxonomy diagram: edges denote is_a relationships; a node z's parent and, transitively, its ancestors (marked with ^) lie above it; its children c1, c2 and further descendants lie below.]
10. Notation
Formalizing the problem:
I = {i1, i2, …, im} - the set of items
T - a transaction, a set of items, T ⊆ I
D - the set of transactions
T supports item x if x is in T or x is an ancestor of some item in T
T supports X ⊆ I if it supports every item in X
Generalized association rule: X ⇒ Y, where X ⊂ I, Y ⊂ I, X ∩ Y = ∅, and no item in Y is an ancestor of any item in X
(that is, we exclude rules like jacket ⇒ clothes, which are trivially true)
The rule X ⇒ Y has confidence c in D if c% of the transactions in D that support X also support Y
The rule X ⇒ Y has support s in D if s% of the transactions in D support X ∪ Y
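The support definition above can be sketched in Python. The taxonomy encoding (a child-to-parent dict) and the item names are illustrative assumptions, not part of the slides:

```python
def ancestors(item, parent):
    """Collect all ancestors of an item by walking up the is-a hierarchy."""
    result = set()
    while item in parent:
        item = parent[item]
        result.add(item)
    return result

def supports(transaction, itemset, parent):
    """T supports X iff every item of X is in T or is an ancestor of some item in T."""
    extended = set(transaction)
    for item in transaction:
        extended |= ancestors(item, parent)
    return set(itemset) <= extended

# Hypothetical taxonomy: child -> parent
parent = {"jackets": "outerwear", "ski pants": "outerwear",
          "outerwear": "clothes", "shirts": "clothes",
          "shoes": "footwear", "hiking boots": "footwear"}

print(supports({"jackets", "shoes"}, {"outerwear", "footwear"}, parent))  # True
print(supports({"shirts"}, {"outerwear"}, parent))                        # False
```

A transaction of leaves therefore supports any itemset drawn from the levels above it, which is exactly what the generalized support definition requires.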
11. So, Let's Re-state the Problem
New aim: find all generalized association rules whose support and confidence are greater than the user-specified minimum support (called minsup) and minimum confidence (called minconf), respectively
Clothes: Outerwear (Jackets, Ski Pants), Shirts
Footwear: Shoes, Hiking Boots
The antecedent and consequent may contain items from any level of the hierarchy
Do you see any potential problem?
I can find many redundant rules!
13. Mining the Example
Observation 1
If the set {x, y} has minimum support, so do {x^, y}, {x, y^}, and {x^, y^}
E.g.: if {Jacket, Shoes} has minsup, then {Outerwear, Shoes}, {Jacket, Footwear}, and {Outerwear, Footwear} also have minimum support
14. Mining the Example
Observation 2
If the rule x ⇒ y has minimum support and confidence, then x ⇒ y^ is guaranteed to have both minsup and minconf.
E.g.:
The rule Outerwear ⇒ Hiking Boots has minsup and minconf.
Therefore, the rule Outerwear ⇒ Footwear also has both minsup and minconf.
However, the rules x^ ⇒ y and x^ ⇒ y^ will have minsup, though they may not have minconf.
E.g.:
Clothes ⇒ Hiking Boots
Clothes ⇒ Footwear
have minsup, but may not have minconf
15. Interesting Rules
So, in which rules are we interested?
Up to now, we measured how much the support of a rule exceeded the expected support computed from the supports of the antecedent and the consequent
But this does not consider the taxonomy:
I get poor pruning… and now I need to prune a lot!
Srikant and Agrawal proposed a different approach
Consider the hierarchy Milk: 2% Milk, Skim Milk, and the rules
Milk ⇒ cereal [s=0.08, c=0.70]
Skim milk ⇒ cereal [s=0.02, c=0.70]
So, do you think that the second rule is important?
Maybe not!
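The pre-taxonomy criterion mentioned above is essentially the lift measure: compare a rule's support with the support expected if antecedent and consequent were independent. A minimal sketch, with made-up supports:

```python
def lift(support_xy, support_x, support_y):
    """Observed support of X ∪ Y divided by the support expected under
    independence of X and Y; lift > 1 means positive correlation."""
    return support_xy / (support_x * support_y)

# Illustrative supports, not taken from the slides
print(lift(0.08, 0.20, 0.30) > 1)  # True: the rule beats independence
```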
16. Interesting Rules
A rule is X ⇒ Y is R-interesting w.r.t
g
an ancestor X^ ⇒ Y^ if:
real s ( X ⇒ Y ) > R · expected s( X ⇒ Y ) b d on ( X ^ ⇒ Y ^ )
l td( based
or
real c ( X ⇒ Y ) > R · expected s( X ⇒ Y ) b d on ( X ^ ⇒ Y ^ )
l d( based
Aim: Interesting rules will be those whose support is more than
R times the expected value or whose confidence is more than
R times the expected value for some user specified constant R
value, user-specified
17. Interesting Rules
What's the expected value?
A method defined to compute the expected value:
E_Z^[Pr(Z)] = Pr(z1)/Pr(z1^) × … × Pr(zj)/Pr(zj^) × Pr(Z^)
where Z^ is an ancestor of Z, and each zi^ is the ancestor in Z^ of the item zi in Z
Go to the papers for the details
Now, we aim at:
finding all generalized R-interesting association rules (R is a user-specified minimum interest called min-interest) that have support and confidence greater than minsup and minconf, respectively
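The expected-value formula and the R-interest test can be sketched together. The milk probabilities below are assumptions chosen to be consistent with the earlier milk/cereal example (skim milk assumed to be a quarter of milk sales):

```python
def expected_support(ancestor_rule_support, item_probs, ancestor_probs):
    """E_Z^[Pr(Z)] = Pr(z1)/Pr(z1^) × ... × Pr(zj)/Pr(zj^) × Pr(Z^):
    scale the ancestor rule's support by how likely each specialized
    item is relative to its ancestor."""
    exp = ancestor_rule_support
    for p, p_hat in zip(item_probs, ancestor_probs):
        exp *= p / p_hat
    return exp

def r_interesting(real_support, expected, R):
    """A rule is R-interesting if its real support exceeds R times the expected one."""
    return real_support > R * expected

# Assumed probabilities: Pr(skim milk)=0.05, Pr(milk)=0.20
exp = expected_support(0.08, [0.05], [0.20])   # ≈ 0.02
print(r_interesting(0.02, exp, R=1.1))         # False: the specialized rule adds nothing
```

This reproduces the intuition of the example: skim milk ⇒ cereal has exactly the support we would predict from milk ⇒ cereal, so it is redundant for any R > 1.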
18. Algorithms to Mine Generalized AR
Follow three steps:
1. Find all itemsets whose support is greater than minsup. These itemsets are called frequent itemsets.
2. Use the frequent itemsets to generate the desired rules:
   if ABCD and AB are frequent, then conf(AB ⇒ CD) = support(ABCD)/support(AB)
3. Prune all uninteresting rules from this set
Different algorithms for this purpose:
Basic
Cumulate
EstMerge
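Step 2 above can be sketched as follows; the itemsets and support values are hypothetical:

```python
from itertools import combinations

def generate_rules(frequent, minconf):
    """For each frequent itemset S, form X => (S - X) for every non-empty
    proper subset X, keeping rules with confidence >= minconf."""
    rules = []
    for itemset, supp in frequent.items():
        items = set(itemset)
        for r in range(1, len(items)):
            for antecedent in combinations(sorted(items), r):
                ante = frozenset(antecedent)
                if ante in frequent:
                    conf = supp / frequent[ante]   # conf(X => S-X) = s(S)/s(X)
                    if conf >= minconf:
                        rules.append((set(ante), items - ante, conf))
    return rules

# Hypothetical supports of frequent itemsets
frequent = {frozenset({"A", "B"}): 0.25,
            frozenset({"A"}): 0.50,
            frozenset({"B"}): 0.30,
            frozenset({"A", "B", "C"}): 0.20,
            frozenset({"C"}): 0.40}

for ante, cons, conf in generate_rules(frequent, minconf=0.6):
    print(ante, "=>", cons, round(conf, 2))
```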
19. Basic Algorithm
Follow the steps:
Is itemset X frequent?
Does transaction T support X?
(X contains items from different levels of the taxonomy; T contains only leaves)
T' = T + ancestors(T)
Answer: T supports X ⇔ X ⊆ T'
20. Details of the Basic Algorithm
Count item occurrences
Generate new k-itemsets
k itemsets
candidates
Add all ancestors of each item
in t to t, removing any
duplication
Find the support of all the
candidates
Take only those with
support over minsup
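The counting steps above can be sketched as one pass of the Basic algorithm (candidate generation itself is omitted). Taxonomy, transactions, and candidates are illustrative assumptions:

```python
def basic_pass(transactions, candidates, parent, minsup_count):
    """One counting pass of the Basic algorithm: extend each transaction
    with all ancestors of its items, then count every candidate that is
    contained in the extended transaction."""
    counts = {c: 0 for c in candidates}
    for t in transactions:
        extended = set(t)
        for item in t:
            while item in parent:        # add all ancestors, duplicates are absorbed by the set
                item = parent[item]
                extended.add(item)
        for c in candidates:
            if c <= extended:
                counts[c] += 1
    return {c: n for c, n in counts.items() if n >= minsup_count}

parent = {"jackets": "outerwear", "outerwear": "clothes",
          "hiking boots": "footwear"}
transactions = [{"jackets", "hiking boots"}, {"jackets"}, {"hiking boots"}]
candidates = [frozenset({"outerwear", "footwear"}), frozenset({"clothes"})]
print(basic_pass(transactions, candidates, parent, minsup_count=2))
```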
21. Can You Optimize It?
Optimization 1: Filtering the ancestors added to
p g
transactions
We only need to add to transaction t the ancestors that are in
one of the candidates.
If the original item is not in any itemsets it can be dropped from
itemsets,
the transaction.
Clothes
Outwear Shirts
Jackets Ski Pants
Example:
Candidates: {clothes, shoes}.
Transaction t: {Jacket, …} can be replaced with {clothes …}
{Jacket } {clothes, }
22. Can You Optimize It?
Optimization 2: Pre-computing ancestors
p p g
Rather than finding ancestors for each item by traversing the
taxonomy g ap , we ca p e co pu e the a ces o s for eac
a o o y graph, e can pre-compute e ancestors o each
item
We ca d op a ces o s that a e not co a ed in a y o the
e can drop ancestors a are o contained any of e
candidates in the same time
Clothes
Outwear Shirts
Jackets Ski Pants
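Optimization 2 amounts to building an item-to-ancestors lookup table once, optionally filtered against the candidate items. A sketch under the same assumed taxonomy encoding as before:

```python
def precompute_ancestors(parent, candidate_items=None):
    """Build item -> set-of-ancestors once, instead of re-traversing the
    taxonomy per transaction; optionally keep only ancestors that appear
    in some candidate (the filtering of Optimization 2)."""
    table = {}
    for item in set(parent) | set(parent.values()):
        anc = set()
        node = item
        while node in parent:
            node = parent[node]
            anc.add(node)
        if candidate_items is not None:
            anc &= candidate_items
        table[item] = anc
    return table

parent = {"jackets": "outerwear", "ski pants": "outerwear", "outerwear": "clothes"}
table = precompute_ancestors(parent, candidate_items={"clothes"})
print(table["jackets"])  # {'clothes'}: 'outerwear' dropped, no candidate uses it
```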
23. Can You Optimize It?
Optimization 3: Prune itemsets containing an item and
p g
its ancestor
If we have {Jacket} and {Outwear} we will have candidate
{Outwear},
{Jacket, Outwear} which is not interesting.
s({Jacket}) = s ({Jacket, Outwear})
({Jacket
Delete ({Jacket, Outwear}) in k=2 will ensure it will not erase in
k>2.
k>2 (because of the prune step of candidate generation
method)
Therefore,
Therefore we can prune the rules containing an item an its
ancestor only for k=2, and in the next steps all candidates will
not include item + ancestor
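The k=2 pruning step can be sketched as a simple filter over the candidate pairs, given a precomputed ancestor table (names are illustrative):

```python
def prune_item_ancestor(candidates_k2, ancestors):
    """Drop 2-itemsets pairing an item with one of its ancestors:
    s({x}) == s({x, x^}), so such candidates carry no information."""
    pruned = []
    for a, b in candidates_k2:
        if b in ancestors.get(a, set()) or a in ancestors.get(b, set()):
            continue
        pruned.append((a, b))
    return pruned

# Assumed ancestor table for the example hierarchy
ancestors = {"jackets": {"outerwear", "clothes"}}
cands = [("jackets", "outerwear"), ("jackets", "shoes")]
print(prune_item_ancestor(cands, ancestors))  # [('jackets', 'shoes')]
```

Because Apriori's prune step discards any k-itemset with an infrequent (here: deleted) subset, filtering only at k=2 is enough to keep item+ancestor pairs out of all later passes.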
24. Summary
Importance of hierarchy in real-world applications
p y pp
How?
Build
B ild a DAG
Redefine the problem of ARM
Get association rules
Don t
Don’t take these ideas in isolation!
Applicable to all the advances we will see in the next classes
Real-world problems usually require the mixing of many ideas
25. Next Class
Advanced topics in association rule mining