[CARS2012@RecSys] Optimal Feature Selection for Context-Aware Recommendation using Differential Relaxation
1. Optimal Feature Selection for Context-aware Recommendation using Differential Relaxation
Yong Zheng
Robin Burke
Bamshad Mobasher
Proceedings of the 4th International Workshop on Context-Aware
Recommender Systems, RecSys 2012, Dublin, Ireland; 09/09/2012
2. CONTEXT-AWARE RECOMMENDER SYSTEM (CARS)
R: Users × Items × Contexts → Ratings
Assumptions:
1. Contexts characterize the situation or conditions under which users consume and rate items;
2. Even the same user may have different preferences for the same item under different contexts.
3. RESEARCH IN CARS
Detecting useful and relevant features
-- Q1. Which features should be used: contexts only, or other features as well?
Identifying which contextual variables are influential
-- Q2. Which contextual variables should be used? Feature selection!
Incorporating contextual information into the recommendation process
-- Q3. How should contexts be used?
Our proposed approach: differential context relaxation (DCR)
First proposed in EC-WEB 2012:
“Differential Context Relaxation for Context-aware Travel Recommendation”
4. DCR: “RELAXATION”
Introducing contexts into recommendation brings a sparsity problem!
User-based collaborative filtering: Predict (user, item, contexts)
Neighbor selection: select neighbors who rated the item under the same “contexts”.
Using the exact full contexts may leave very few or even no matches.
Take seeing a movie as an example:
Contexts = [Cinema, Weekend, Girlfriend]
[Venn diagram over the dimensions “At Cinema”, “Weekend”, “With Girlfriend”; the black areas mark the matched users.]
Solution: a set of relaxed dimensions, such as [Cinema, Girlfriend]
Optimal feature selection: balance between accuracy & coverage
5. DCR: “DIFFERENTIAL”
User-based collaborative filtering: Predict (user, item, contexts)
Differential aspect: decompose the algorithm into functional components
and apply a different, appropriate aspect of the contexts to each component!
Goal: to maximize the functional contribution of each component in
the prediction function
[Figure: the prediction function decomposed into three components: neighbor selection, neighbor contribution, and user baseline.]
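To make the decomposition concrete, the three components can be read off the standard (uncontextualized) user-based kNN prediction formula; this is only a sketch of the usual weighted-deviation form, not the paper's exact notation:

$$ P_{a,i} \;=\; \bar{r}_a \;+\; \frac{\sum_{u \in N(a,i)} \mathrm{sim}(a,u)\,\big(r_{u,i} - \bar{r}_u\big)}{\sum_{u \in N(a,i)} \lvert \mathrm{sim}(a,u) \rvert} $$

Here N(a,i) is the neighbor selection, (r_{u,i} - \bar{r}_u) is the neighbor contribution, and \bar{r}_a is the user baseline. In DCR, each of these components is computed only over ratings whose contexts satisfy that component's own relaxed contextual constraint.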
6. DCR MODEL – A GENERAL MODEL
Apply it to user-based collaborative filtering: Predict (user, item, contexts)
Choose appropriate relaxations for each algorithm component (feature
selection) as contextual constraints, and then perform regular
recommendation.
C = Full contextual situations
C1, C2, C3 = relaxed context dimensions
Ci can be modeled as a binary selection vector.
<1, 0, 1> denotes that we select the 1st and 3rd contextual dimensions for Ci.
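As a minimal sketch (the helper name and example dimensions are illustrative, not code from the paper), this is how a binary selection vector such as <1, 0, 1> can act as a contextual constraint: a rating only counts for a component if its context agrees with the target context on the selected dimensions.

```python
# Minimal sketch: apply a binary relaxation vector when matching contexts.
def context_match(target_ctx, rating_ctx, selection):
    """target_ctx / rating_ctx: tuples of context values;
    selection: binary vector, e.g. (1, 0, 1) keeps the 1st and 3rd dimensions."""
    return all(t == r
               for t, r, s in zip(target_ctx, rating_ctx, selection)
               if s == 1)

# Relaxing the 'Weekend' dimension: this rating still matches.
print(context_match(('Cinema', 'Weekend', 'Girlfriend'),
                    ('Cinema', 'Weekday', 'Girlfriend'),
                    (1, 0, 1)))   # True
```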
7. DCR MODEL
Q2. Which contextual variables should be used?
– Optimal feature selection in the form of context relaxations
Q3. How to use contexts?
– Apply optimal constraints to each component, differentially
Remaining Question:
Q1.Which variables are relevant/useful/should be used?
8. Q1. WHICH VARIABLES ARE RELEVANT?
Context-linked features: influential features linked to contexts
[Figure: which kinds of users, under which contexts, prefer which kinds of items; e.g. Jim and Nadia, with the context “Alone”, and action, comedy, and romantic movies.]
User’s preferences on “Genre” are linked to the context “Companion”
9. DCR MODEL — OPTIMIZATION
How to find optimal feature selection for each algorithm component?
Recall that the selection is modeled by binary vectors.
Search space reduction over [Contexts + Context-linked Features]:
Neighbor selection: no item features
Neighbor contribution: no user profiles
User baseline: no user profiles
10. DCR MODEL — OPTIMIZATION
Two approaches to find the optimal context relaxations:
1. Exhaustive Search
Try all combinations of binary vectors
Assume there are two dimensions; then there are 4 possibilities for each component: <0, 0>; <0, 1>; <1, 0>; <1, 1>.
Not efficient, because it increases computational costs significantly!
A more practical and efficient optimization is required for:
1) larger datasets;
2) several more contextual dimensions.
Other optimization techniques, such as hill climbing and gradient descent, may not work well.
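A minimal sketch of the exhaustive-search baseline, assuming a hypothetical evaluate_rmse callable standing in for the evaluation pipeline: enumerate every combination of binary relaxation vectors for the components and keep the lowest-RMSE one. (Since 2^13 = 8,192, thirteen binary dimensions in total would account for the iteration count reported on the results slide.)

```python
# Minimal sketch of the exhaustive-search baseline over binary relaxation
# vectors; `evaluate_rmse` is a hypothetical stand-in for the evaluation step.
from itertools import product

def exhaustive_search(dims_per_component, evaluate_rmse):
    """dims_per_component: number of candidate binary dimensions per component;
    evaluate_rmse: maps a tuple of binary vectors to an RMSE value."""
    spaces = [list(product((0, 1), repeat=d)) for d in dims_per_component]
    best_rmse, best_vectors = float("inf"), None
    for vectors in product(*spaces):  # 2^(sum of dims) combinations in total
        rmse = evaluate_rmse(vectors)
        if rmse < best_rmse:
            best_rmse, best_vectors = rmse, vectors
    return best_vectors, best_rmse
```

The cost doubles with every extra dimension, which is why the following slides turn to Binary PSO.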
11. DCR MODEL — OPTIMIZATION
2. Binary Particle Swarm Optimization (Binary PSO)
PSO is derived from swarm intelligence.
Binary PSO is a discrete version of PSO. Let's see how PSO works.
[Images: swarms in nature (fish, birds, bees).]
12. DCR MODEL — OPTIMIZATION
2. Binary Particle Swarm Optimization (Binary PSO)
Example: Birds are looking for the pizza
Swarm = a group of birds
Particle = each bird
Goal = the location of pizza
So, how does the swarm find the goal?
1. Each bird is looking for the pizza; a machine can tell each bird its distance to the pizza.
2. Each iteration is an attempt or move.
3. Cognitive learning from the particle itself: “Am I closer to the pizza compared with my best locations so far?”
4. Social learning from the swarm: “Hey, my distance is 1 mile. It is the closest ever! Follow me!!”
The moving direction is a hybrid function of cognitive and social learning!
13. DCR MODEL — OPTIMIZATION
2. Binary Particle Swarm Optimization (Binary PSO)
| | Birds example | DCR model |
| Swarm | a group of birds | a group of objects or agents |
| Particle | each bird | each object or agent |
| Goal | location of the pizza | minimal prediction error (RMSE) |
| Location | bird's position vector | the binary selection vector |
| Learning | adjust each bit of the position vector | adjust each bit of the binary vector |
Binary PSO is a discrete version in which each bit of the position vector is a binary value instead of a real number, switching between 0 and 1.
Disadvantages: 1) slow convergence; 2) getting stuck in local optima.
There are several improvements on PSO, but few on Binary PSO.
We use an improved Binary PSO introduced by Mojtaba et al., which has been demonstrated to converge quickly.
For more details, please refer to our paper.
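For orientation, here is a minimal sketch of the standard binary PSO update (the sigmoid-of-velocity rule of Kennedy and Eberhart); the improved variant used in the paper is not reproduced here, and all names and parameter values are illustrative only.

```python
# Minimal sketch of standard binary PSO (not the improved variant from the paper).
# `fitness` maps a bit list (the concatenated relaxation vectors) to a value to
# minimize, e.g. the RMSE of the resulting context-constrained predictions.
import math
import random

def binary_pso(num_bits, fitness, num_particles=5, iterations=50,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0):
    X = [[random.randint(0, 1) for _ in range(num_bits)] for _ in range(num_particles)]
    V = [[0.0] * num_bits for _ in range(num_particles)]
    pbest = [x[:] for x in X]                      # each particle's own best (cognitive)
    pbest_fit = [fitness(x) for x in X]
    g = min(range(num_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]   # the swarm's best (social)
    for _ in range(iterations):
        for i in range(num_particles):
            for d in range(num_bits):
                v = (w * V[i][d]
                     + c1 * random.random() * (pbest[i][d] - X[i][d])
                     + c2 * random.random() * (gbest[d] - X[i][d]))
                V[i][d] = max(-v_max, min(v_max, v))   # clamp the velocity
                # sigmoid of the velocity = probability that the bit becomes 1
                X[i][d] = 1 if random.random() < 1.0 / (1.0 + math.exp(-V[i][d])) else 0
            f = fitness(X[i])
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = X[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = X[i][:], f
    return gbest, gbest_fit
```

In the DCR setting, num_bits would be the total number of binary dimensions across the three components, and the returned bit vector would be split back into the per-component relaxations C1, C2, C3.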
14. EXPERIMENTS
Dataset: AIST Context-aware Food Preference Data (thanks to Hideki Asoh!)
Contextual dimensions:
1).Contexts: real hunger, virtual hunger (hungry/normal/full)
2).Possible Context-linked features
User Profile: gender
Item feature:
food genre (Chinese/Japanese/Western)
food stuff (vegetable, pork, beef, fish, etc)
food style = the style of food preparation
This is a dataset with dense context information:
212 users, 6,360 ratings;
Each user rated 5 out of 20 items;
Whenever two users rated the same item, they rated it under the same 6 contextual situations!
We run exhaustive search to get a performance baseline;
then we run the improved BPSO to see whether it can find the optimum!
17. EXPERIMENTAL RESULTS BY EXHAUSTIVE SEARCH
1.Best relaxation
2.Effects of contexts
3.Effects of context-linked features
18. EXPERIMENTAL RESULTS BY BINARY PSO
Exhaustive search requires 8,192 iterations;
1-BPSO found the optimum at the 18th iteration; 5-BPSO found it at the 12th iteration.
1. More particles are more efficient (fewer iterations), but a balance is required.
2. A larger data set may be more complicated, so more particles may be required.
19. LIMITATION AND FUTURE RESEARCH
Limitation of the DCR model: sparse contexts!
1. A 4th component: introduce contexts into the user-user similarity?
2. Optimal model selection: a multi-objective function (RMSE, coverage, etc.)
3. Optimal feature weighting rather than feature selection
4. Contextual dimensions that do NOT match exactly may still share similarities
5. Integrate the DCR model with latent factor models, such as MF, etc.
6. Expand DCR to more recommendation algorithms
Items #3, #4, and #5 may help alleviate the sparsity problem.
20. THANKS!
Proceedings of the 4th International Workshop on
Context-Aware Recommender Systems, RecSys 2012,
Dublin, Ireland; 09/09/2012