
[CIKM 2014] Deviation-Based Contextual SLIM Recommenders

Context-aware recommender systems (CARS) help improve the effectiveness of recommendations by adapting to users' preferences in different contextual situations. One approach to CARS that has been shown to be particularly effective is Context-Aware Matrix Factorization (CAMF). CAMF incorporates contextual dependencies into the standard matrix factorization (MF) process, where users and items are represented as collections of weights over various latent factors. In this paper, we introduce another CARS approach based on an extension of matrix factorization, namely, the Sparse Linear Method (SLIM). We develop a family of deviation-based contextual SLIM (CSLIM) recommendation algorithms by learning rating deviations in different contextual conditions. Our CSLIM approach is better at explaining the underlying reasons behind contextual recommendations, and our experimental evaluations over five context-aware data sets demonstrate that these CSLIM algorithms outperform the state-of-the-art CARS algorithms in the top-N recommendation task. We also discuss the criteria for selecting the appropriate CSLIM algorithm in advance based on the underlying characteristics of the data.


  1. Deviation-Based Contextual SLIM Recommenders. Yong Zheng, Bamshad Mobasher, Robin Burke. DePaul University, Chicago, IL, USA. @CIKM 2014, Shanghai, China, Nov 4, 2014
  2. Outline of the Talk • Context-aware Recommender Systems (CARS) • Collaborative Filtering and SLIM Recommenders • CSLIM: Contextualizing SLIM Recommenders • Experimental Evaluations • Conclusions and Future Work
  3. Outline of the Talk • Context-aware Recommender Systems (CARS) • Collaborative Filtering and SLIM Recommenders • CSLIM: Contextualizing SLIM Recommenders • Experimental Evaluations • Conclusions and Future Work
  4. Traditional Recommender Systems (RS). Example: a user-item 2D rating matrix over users U1-U5 and items T1-T5, with most cells empty (sparse). A traditional recommender maps Users × Items → Ratings.
  5. Context-aware RS (CARS). Motivation: recommendation cannot ignore contexts, because users' preferences change from one contextual situation to another (e.g., watching a movie alone versus with a companion).
  6. Context-aware RS (CARS). Example: a user-item contextual rating matrix. In CARS: Users × Items × Contexts → Ratings.
  7. Context-aware RS (CARS). Example: a user-item contextual rating matrix. Terminology: a context dimension is a contextual variable such as time, location, or companion; a context condition is a value in a specific dimension, e.g., weekend and weekday are two conditions in the context dimension "Time".
  8. Context-aware RS (CARS). Representational CARS (R-CARS): assuming known influential contextual variables are available (e.g., location, time, mood), the task is to build CARS algorithms that adapt to users' preferences in different contextual situations.
  9. Context-aware RS (CARS). Most research in R-CARS focuses on the development of context-aware collaborative filtering (CACF): CF + Contexts → CACF.
  10. Outline of the Talk • Context-aware Recommender Systems (CARS) • Collaborative Filtering and SLIM Recommenders • CSLIM: Contextualizing SLIM Recommenders • Experimental Evaluations • Conclusions and Future Work
  11. Collaborative Filtering (CF). CF is one of the most popular families of recommendation algorithms. 1) Memory-based CF, such as user-based and item-based CF. Pros: good for explanation. Cons: sparsity problems. 2) Model-based CF, such as matrix factorization. Pros: good performance. Cons: cold start, hard to explain. 3) Hybrid CF recommendation algorithms, such as content-based hybrid CF. Pros: further improvement. Cons: running costs.
  12. Item-based CF (ItemKNN; Sarwar et al., 2001). Rating prediction: $P_{u,i} = \frac{\sum_{j \in N(i)} R_{u,j} \cdot \mathrm{sim}(i,j)}{\sum_{j \in N(i)} \mathrm{sim}(i,j)}$, where $N(i)$ is the neighborhood of item $i$. Cons: item-item similarity calculations and neighborhood selection rely on co-ratings. What if the number of co-ratings is limited?
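The ItemKNN prediction above can be sketched in a few lines (a minimal illustration; the matrix values and neighborhood choice are invented for this example, not the paper's data):

```python
import numpy as np

def cosine_item_sim(R):
    """Item-item cosine similarity from a rating matrix (0 = unrated)."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0          # avoid division by zero for empty columns
    Rn = R / norms
    return Rn.T @ Rn

def predict_itemknn(R, sim, u, i, neighbors):
    """P_{u,i}: similarity-weighted average of user u's ratings on
    item i's neighbors, as in the formula above."""
    num = den = 0.0
    for j in neighbors:
        if R[u, j] > 0:              # only items the user actually rated
            num += R[u, j] * sim[i, j]
            den += sim[i, j]
    return num / den if den > 0 else 0.0

# Toy 5x5 rating matrix (illustrative values only, 0 = unrated)
R = np.array([[3, 0, 2, 0, 0],
              [3, 3, 0, 0, 4],
              [2, 4, 0, 1, 0],
              [2, 0, 0, 5, 5],
              [3, 2, 4, 0, 2]], dtype=float)
sim = cosine_item_sim(R)
# Predict U2's missing rating on T3, using all other items as neighbors
p = predict_itemknn(R, sim, u=1, i=2, neighbors=[0, 1, 3, 4])
```

Because the prediction is a weighted average of the user's existing ratings (here 3, 3, and 4), it always falls inside their range, which is exactly the limitation the slide points out: the weights are only as trustworthy as the co-ratings behind them.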
  13. SLIM (Ning et al., 2011). The Sparse Linear Method (SLIM) can be viewed as another form of collaborative filtering. Ranking score prediction: $S_{i,j} = R_{i,:} \cdot W_{:,j} = \sum_{h=1, h \neq j}^{N} R_{i,h} W_{h,j}$, where matrix $R$ is the user-item rating matrix and matrix $W$ is an item-item coefficient matrix, which plays the role of a similarity matrix. We refer to this approach as SLIM-I, since $W$ represents item-item coefficients.
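The SLIM-I scoring step is just a matrix product with a zeroed diagonal (a sketch of scoring only; learning the sparse W via the regularized least-squares problem of Ning et al. is omitted, and the values below are invented):

```python
import numpy as np

def slim_scores(R, W):
    """SLIM-I ranking scores S = R @ W, with diag(W) forced to zero
    (the h != j constraint in the sum) so an item never scores itself."""
    W = W.copy()
    np.fill_diagonal(W, 0.0)
    return R @ W

# Toy example: 2 users x 3 items (illustrative values only)
R = np.array([[3.0, 0.0, 2.0],
              [0.0, 4.0, 1.0]])
W = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.0],
              [0.2, 0.0, 1.0]])
S = slim_scores(R, W)
# S[0, 0] = R[0, 1]*W[1, 0] + R[0, 2]*W[2, 0] = 0*0.5 + 2*0.2 = 0.4
# S[0, 1] = R[0, 0]*W[0, 1] + R[0, 2]*W[2, 1] = 3*0.5 + 2*0.0 = 1.5
```

Items are then ranked for each user by their row of S; no co-rating-based similarity computation is needed, since W is learned directly.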
  14. Comparison between ItemKNN and SLIM-I. Rating prediction in ItemKNN: $P_{u,i} = \frac{\sum_{j \in N(i)} R_{u,j} \cdot \mathrm{sim}(i,j)}{\sum_{j \in N(i)} \mathrm{sim}(i,j)}$. Ranking score prediction in SLIM-I: $S_{i,j} = R_{i,:} \cdot W_{:,j} = \sum_{h=1, h \neq j}^{N} R_{i,h} W_{h,j}$. Pros of SLIM-I: matrix $W$ is learned by directly minimizing prediction/ranking error; in other words, item-item coefficients are no longer computed from co-ratings, which makes them more reliable and allows optimization directly toward ranking. SLIM-I has been demonstrated to outperform UserKNN, ItemKNN, matrix factorization, and other traditional RS algorithms.
  15. SLIM-I and SLIM-U. SLIM-I is the SLIM counterpart of ItemKNN: $W$ is an item-item coefficient matrix. SLIM-U is the SLIM counterpart of UserKNN: $W$ is a user-user coefficient matrix.
  16. Outline of the Talk • Context-aware Recommender Systems (CARS) • Collaborative Filtering and SLIM Recommenders • CSLIM: Contextualizing SLIM Recommenders • Experimental Evaluations • Conclusions and Future Work
  17. CSLIM: Contextual SLIM Recommenders. We use SLIM-I as an example to show how to build CSLIM-I; contexts can be incorporated into SLIM-U to formulate CSLIM-U models analogously. Ranking prediction in SLIM-I: $S_{i,j} = \sum_{h=1, h \neq j}^{N} R_{i,h} W_{h,j}$. Incorporating contexts, CSLIM has a uniform ranking prediction: $S_{i,j,c} = \sum_{h=1, h \neq j}^{N} R_{i,h,c} W_{h,j}$. CSLIM aggregates contextual ratings with item-item coefficients. There are two key points: 1) the ratings being aggregated should all be placed under the same context $c$; 2) accordingly, $W$ indicates coefficients under the same context.
  18. CSLIM: Contextual SLIM Recommenders. The challenge is how to estimate $R_{i,h,c}$, since contextual ratings are usually sparse: it is not guaranteed that the same user has rated other items in the same context $c$. Ranking prediction in CSLIM-I: $S_{i,j,c} = \sum_{h=1, h \neq j}^{N} \hat{R}_{i,h,c} W_{h,j}$. We use a deviation-based approach to estimate it: matrix $R$ is the user-item 2D rating matrix (non-contextual ratings); matrix $W$ is the item-item coefficient matrix; matrix $D$ estimates rating deviations in contexts. Here, $D$ is a CI matrix (rows are items, columns are contextual conditions), so this approach is named CSLIM-I-CI.
  19. CSLIM: Contextual SLIM Recommenders. The deviation-based estimate in CSLIM-I-CI: $\hat{R}_{i,j,c} = R_{i,j} + \sum_{l=1}^{L} D_{j,l}\, c_l$, where $R$ is the non-contextual rating matrix, $D$ is the contextual rating deviation matrix, $W$ is the item-item coefficient matrix, and $c$ is a binary context vector, e.g., (Weekend = 1, Weekday = 0, At Home = 0, At Park = 1). We use this estimate even when a real contextual rating in situation $c$ is known, since we want to learn as many cells of $D$ as possible.
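The deviation-based estimate and the CSLIM ranking sum combine as follows (a minimal sketch; all matrix values are invented for illustration):

```python
import numpy as np

def estimate_contextual_rating(R, D, u, j, c):
    """Deviation-based estimate: R_hat[u, j, c] = R[u, j] + sum_l D[j, l] * c[l]."""
    return R[u, j] + D[j] @ c

def cslim_i_ci_score(R, D, W, u, j, c):
    """CSLIM-I-CI ranking score: aggregate estimated contextual ratings
    on the other items with the item-item coefficients W."""
    s = 0.0
    for h in range(R.shape[1]):
        if h != j and R[u, h] > 0:   # skip item j itself and unrated items
            s += estimate_contextual_rating(R, D, u, h, c) * W[h, j]
    return s

# Toy data: 1 user, 3 items, 2 contextual conditions
R = np.array([[3.0, 0.0, 2.0]])      # non-contextual ratings (0 = unrated)
D = np.array([[0.5, -0.5],           # deviation of each item
              [0.0,  0.0],           # under each contextual condition
              [1.0,  0.0]])
W = np.array([[0.0, 0.2, 0.0],       # item-item coefficients
              [0.2, 0.0, 0.0],
              [0.1, 0.1, 0.0]])
c = np.array([1.0, 0.0])             # first condition active
score = cslim_i_ci_score(R, D, W, u=0, j=1, c=c)
# (3 + 0.5) * 0.2 + (2 + 1.0) * 0.1 = 0.7 + 0.3 = 1.0
```

In training, D and W are learned jointly against the ranking objective; the sketch above only shows the forward scoring pass.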
  20. CSLIM: Contextual SLIM Recommenders. There are three ways to model contextual rating deviations (CRD) in $D$: 1) $D$ is a CI matrix, assuming there is a CRD for each <item, context> pair; 2) $D$ is a CU matrix, assuming there is a CRD for each <user, context> pair; 3) $D$ is a vector, assuming the CRD depends only on the context. Incorporating contexts into SLIM-I yields CSLIM-I-CI, CSLIM-I-CU, and CSLIM-I-C; incorporating them into SLIM-U yields CSLIM-U-CI, CSLIM-U-CU, and CSLIM-U-C. Altogether, we have built six deviation-based CSLIM models.
  21. Further Step: General CSLIM Approaches. Cons: CSLIM requires users' non-contextual ratings on items; when such ratings are missing, we proposed using the average of the user's contextual ratings on the item as a substitute, which our experiments showed to be feasible. However, we would like to build more general CSLIM (GCSLIM) models that do not require non-contextual ratings at all. Simply, we model matrix $D$ as a CC matrix, where each cell represents the CRD between two contextual conditions. GCSLIM-I-CC can then estimate the rating deviation from one contextual rating to another (same item, but different contexts).
  22. Further Step: General CSLIM Approaches. For example, suppose we want to estimate R<u1, t1, {Weekday, At home}> and we already know the rating R<u1, t1, {Weekend, At cinema}>. Matrix $D$ lets us learn and estimate CRD(Weekday, Weekend) and CRD(At home, At cinema). Therefore: R<u1, t1, {Weekday, At home}> = R<u1, t1, {Weekend, At cinema}> + CRD(Weekday, Weekend) + CRD(At home, At cinema). Similarly, matrix $D$ can be paired with users or items; e.g., we may assume the CRD between contexts differs from user to user.
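In code, this GCSLIM-I-CC estimation is a simple lookup-and-add (the deviation values below are made up for illustration, not learned values from the paper):

```python
# Hypothetical learned condition-to-condition deviations (cells of the CC
# matrix D), stored as a dict for readability
crd = {("Weekday", "Weekend"): -0.4,
       ("At home", "At cinema"): -0.7}

# Known contextual rating R<u1, t1, {Weekend, At cinema}>
known = 4.5

# Estimated rating R<u1, t1, {Weekday, At home}>: shift the known rating
# by the deviation between the two conditions in each context dimension
estimated = known + crd[("Weekday", "Weekend")] + crd[("At home", "At cinema")]
# 4.5 - 0.4 - 0.7 = 3.4
```

The negative deviations here would mean this user tends to rate lower on weekdays at home than on weekends at the cinema; the sign and magnitude are learned per condition pair during training.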
  23. Further Step: General CSLIM Approaches. Two challenges in GCSLIM approaches: 1) For each <user, item> pair, there may be several ratings in different contexts. Which contextual rating should be applied? Using all of them increases computational cost; selecting just one can be done in three ways: MostSimilar, LeastSimilar, and Random. Our experiments showed that randomly picking one works well. See our papers for more details.
  24. Further Step: General CSLIM Approaches. Two challenges in GCSLIM approaches: 2) How to couple matrix $D$ with the user or item dimension? Assigning a separate $D$ to each user/item increases computational cost. Solution: cluster users/items into small groups and let the users/items in a group share the same matrix $D$. We will explore this in future work.
  25. Outline of the Talk • Context-aware Recommender Systems (CARS) • Collaborative Filtering and SLIM Recommenders • CSLIM: Contextualizing SLIM Recommenders • Experimental Evaluations • Conclusions and Future Work
  26. Data Sets. The current situation in the CARS research domain: 1) the number of data sets is limited; 2) the data are either small or sparse; 3) larger data sets either do not exist or are not publicly accessible. Most data were collected from surveys. All the data sets used can be found at http://tiny.cc/contextdata. Due to limited time, we only present results on the restaurant and music data in these slides; see more results in our CIKM paper.
  27. Baseline Approaches. We choose state-of-the-art CACF algorithms as baselines: 1) Differential Context Modeling (DCM): DCM incorporates contexts into UserKNN/ItemKNN, but it suffers from sparsity and performs worst in terms of precision, recall, and MAP. 2) Context-aware Splitting Approaches (CASA): CASA is a contextual transformation approach in which contextual data are converted into a 2D user-item rating matrix, after which a traditional approach (MF in this case) is applied to the transformed data. 3) Context-Aware Matrix Factorization (CAMF): CAMF incorporates contexts into MF, modeling CRD in a similar way to CSLIM. 4) Tensor Factorization (TF): TF is an independent context-aware algorithm, since contexts are assumed to be independent of the user and item dimensions. TF's computational cost grows as the number of contexts increases.
  28. Evaluation Protocols. 1) 5-fold cross-validation: all algorithms were run on the same 5 folds of the data. 2) Top-N recommendation evaluation. Metrics: precision, recall, and MAP (mean average precision). Precision and recall measure accuracy; MAP measures position in the rankings. Research questions: 1) Does CSLIM outperform the state-of-the-art CARS algorithms? 2) How about GCSLIM? Is it better than CSLIM? 3) With so many CSLIM algorithms, are there guidelines for pre-selecting the appropriate one?
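The three metrics can be sketched for a single user as follows (hypothetical helper functions, shown only to pin down the standard definitions; MAP is the mean of the per-user average precision):

```python
def precision_recall_at_n(ranked, relevant, n):
    """Precision@N and Recall@N for one user's top-N recommendation list."""
    hits = sum(1 for item in ranked[:n] if item in relevant)
    return hits / n, hits / len(relevant)

def average_precision_at_n(ranked, relevant, n):
    """Average precision for one user; averaging this over users gives MAP."""
    hits, total = 0, 0.0
    for k, item in enumerate(ranked[:n], start=1):
        if item in relevant:
            hits += 1
            total += hits / k        # precision at each hit position
    return total / min(len(relevant), n) if relevant else 0.0

# Toy example: a ranked list of 5 items, 2 of which the user liked
ranked = ["T1", "T2", "T3", "T4", "T5"]
relevant = {"T1", "T3"}
p, r = precision_recall_at_n(ranked, relevant, n=5)   # 0.4, 1.0
ap = average_precision_at_n(ranked, relevant, n=5)    # (1/1 + 2/3) / 2
```

Note how the average precision rewards placing hits near the top of the list, which is why MAP is used to assess ranking quality rather than accuracy alone.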
  29. Evaluation Results. Research questions: 1) Does CSLIM outperform the state-of-the-art CARS algorithms? 2) How about GCSLIM? Is it better than CSLIM? 3) With so many CSLIM algorithms, are there guidelines for pre-selecting the appropriate one?
  30. Evaluation Results. Research questions: 1) Does CSLIM outperform the state-of-the-art CARS algorithms? 2) How about GCSLIM? Is it better than CSLIM? 3) With so many CSLIM algorithms, are there guidelines for pre-selecting the appropriate one?
  31. Evaluation Results. Research questions: 1) Does CSLIM outperform the state-of-the-art CARS algorithms? 2) How about GCSLIM? Is it better than CSLIM? 3) With so many CSLIM algorithms, are there guidelines for pre-selecting the appropriate one? There are two pieces in a CSLIM algorithm's name; for example, CSLIM-I-CI: 1) CSLIM-I indicates we build on an ItemKNN-style CF approach; 2) -CI indicates we model CRD as a CI matrix. Questions: 1) Should CSLIM-I/ItemKNN or CSLIM-U/UserKNN be used? Answer: it depends on the average number of ratings per item versus the average number of ratings per user. 2) Should -CI, -CU, or -C be applied? Answer: it depends on whether contexts are more dependent on users or on items. For more details, see our CIKM paper.
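The first selection criterion compares two simple data statistics, which can be computed as below (a hypothetical helper, not the paper's code; how the two averages are compared to pick a variant is data-dependent, as the slide notes):

```python
import numpy as np

def rating_density_stats(R):
    """Average number of ratings per user and per item, from a
    user-item rating matrix where 0 means unrated."""
    rated = R > 0
    avg_per_user = rated.sum(axis=1).mean()
    avg_per_item = rated.sum(axis=0).mean()
    return avg_per_user, avg_per_item

# Toy matrix: 2 users x 3 items, 4 ratings in total
R = np.array([[3, 0, 2],
              [0, 4, 1]])
per_user, per_item = rating_density_stats(R)   # 2.0 per user, 4/3 per item
```

Intuitively, whichever dimension carries more ratings on average gives the corresponding KNN-style model (item-based or user-based) more signal to learn its coefficient matrix from.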
  32. Evaluation Results. How about runtime efficiency? In CSLIM and GCSLIM, the matrices $D$ and $W$ must be learned during training, and different challenges arise: 1) Large numbers of users/items/ratings: the non-contextual rating matrix $R$ (or the rating space $P$) becomes very large, as does the matrix $W$. Solution: adopt a KNN strategy; instead of all ratings, use only the top-N neighbors (items or users). 2) Large numbers of contexts: what if there are very many contextual conditions? Usually, in the CARS domain, the number of contextual dimensions is under 10 and the number of contextual conditions is at most around 100. Solution: there are many ways to pre-select influential contexts, which helps reduce the number of contexts.
  33. Outline of the Talk • Context-aware Recommender Systems (CARS) • Collaborative Filtering and SLIM Recommenders • CSLIM: Contextualizing SLIM Recommenders • Experimental Evaluations • Conclusions and Future Work
  34. Conclusions. 1) CSLIM has been demonstrated to outperform the state-of-the-art CARS algorithms. 2) GCSLIM sometimes yields further improvements, but it is not guaranteed to beat the CSLIM algorithms; it depends on how sparse the contextual ratings are. 3) We identify influential factors and rules for selecting the appropriate CSLIM algorithm in advance. Future work: 1) examine CSLIM and GCSLIM on larger data sets; 2) compare against more models, e.g., factorization machines; 3) couple the CC matrix with users/items in the GCSLIM approach; 4) incorporate contexts into the matrix $W$ instead of adding the matrix $D$.
  35. Deviation-Based Contextual SLIM Recommenders. Yong Zheng, Bamshad Mobasher, Robin Burke. DePaul University, Chicago, IL, USA. @CIKM 2014, Shanghai, China, Nov 4, 2014. Thanks! Questions?
