7.
Feature                       Document 1      Document 2
Number of keywords in title   2               2
Number of keywords in text    2               0
Domain                        careerjet.sk    vienna-rb.at
Category                      Job search      Programming
Language                      Slovak          English
8.
Document feature              How much I care about it
                              (the higher, the more I care)
# keywords in title            2.1
# keywords in text             1
Domain is careerjet.sk        -2
Domain is vienna-rb.at         3.5
Category is Job search        -1
Category is Programming        4.2
Language is Slovak             0.9
Language is English            1.5
9.
Document feature              How much I       Document 1   Document 2
                              care about it
# keywords in title            2.1             2            2
# keywords in text             1               2            0
Domain is careerjet.sk        -2               1            0
Domain is vienna-rb.at         3.5             0            1
Category is Job search        -1               1            0
Category is Programming        4.2             0            1
Language is Slovak             0.9             1            0
Language is English            1.5             0            1
rank = d · u                                   = 4.1        = 13.4
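The rank on this slide is a plain dot product, and it can be recomputed in a few lines. This is a minimal sketch: the feature order follows the table, and the names `u`, `doc_1`, `doc_2` and `rank` are only illustrative.

```python
# Weights: the "how much I care about it" column, in table order.
u = [2.1, 1.0, -2.0, 3.5, -1.0, 4.2, 0.9, 1.5]

# Feature vectors for the two documents, in the same order.
doc_1 = [2, 2, 1, 0, 1, 0, 1, 0]  # the Slovak job-search page
doc_2 = [2, 0, 0, 1, 0, 1, 0, 1]  # the English programming page


def rank(d, u):
    """Dot product d . u: multiply each feature by its weight and add up."""
    return sum(d_i * u_i for d_i, u_i in zip(d, u))


print(round(rank(doc_1, u), 1))
print(round(rank(doc_2, u), 1))
```

With these numbers the products add up to 4.1 for the first document and 13.4 for the second.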
15. Except...
● You don't know the explicit ratings
● User preferences change over time
● Those equations probably don't have a solution
16. Approximation function
$h(d): d \to \mathrm{rank}$
$h(d) = d_1 u_1 + \dots + d_n u_n = \mathrm{estimated\_rank}$
If the function is good, it should make minimal errors:
$\mathrm{error} = (\mathrm{estimated\_rank} - \mathrm{real\_rank})^2$
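A small worked example of the approximation function and its error. The feature values, the preference guess and the real rank below are made up for illustration only.

```python
def h(d, u):
    """Estimated rank: d_1*u_1 + ... + d_n*u_n."""
    return sum(d_i * u_i for d_i, u_i in zip(d, u))


def squared_error(estimated_rank, real_rank):
    """Squared difference between the estimate and the real rank."""
    return (estimated_rank - real_rank) ** 2


d = [2, 0, 1]            # hypothetical document features
u = [0.5, 1.0, -1.0]     # hypothetical current guess at the preferences
real_rank = 3.0          # hypothetical rating given by the user

estimated_rank = h(d, u)                         # 2*0.5 + 0*1.0 + 1*(-1.0) = 0.0
print(squared_error(estimated_rank, real_rank))  # (0.0 - 3.0)^2 = 9.0
```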
17. Gradient descent
1. Set user preferences (u) to arbitrary values
2. Calculate the estimated rank h(d) for each document
3. Calculate the mean square error
4. Adjust preferences u in a way that minimizes the error
5. Repeat until the error converges
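The five steps above can be sketched as a short loop. This is one possible reading, not a reference implementation: the parameter names (`alpha`, `tolerance`, `max_steps`), the stopping rule and the toy data are all assumptions.

```python
def h(d, u):
    return sum(d_i * u_i for d_i, u_i in zip(d, u))


def mean_square_error(docs, ranks, u):
    return sum((h(d, u) - r) ** 2 for d, r in zip(docs, ranks)) / len(docs)


def gradient_descent(docs, ranks, alpha=0.1, tolerance=1e-12, max_steps=10_000):
    u = [0.0] * len(docs[0])                       # 1. arbitrary starting values
    previous = mean_square_error(docs, ranks, u)
    for _ in range(max_steps):
        # 2. estimated rank per document, kept as residual h(d) - rank
        residuals = [h(d, u) - r for d, r in zip(docs, ranks)]
        # 4. step against the gradient of the mean square error
        gradient = [
            2 * sum(res * d[i] for res, d in zip(residuals, docs)) / len(docs)
            for i in range(len(u))
        ]
        u = [u_i - alpha * g_i for u_i, g_i in zip(u, gradient)]
        current = mean_square_error(docs, ranks, u)  # 3. mean square error
        if abs(previous - current) < tolerance:      # 5. stop once it settles
            break
        previous = current
    return u


# Toy data: ranks generated from known (hypothetical) preferences.
docs = [[1, 0], [0, 1], [1, 1], [2, 1]]
true_u = [2.0, -1.0]
ranks = [h(d, true_u) for d in docs]

learned = gradient_descent(docs, ranks)
print([round(x, 3) for x in learned])
```

Because the toy ranks were generated from `true_u`, the learned preferences should end up close to those values.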
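19. [Plot: the cost function, with mean square error on the vertical axis and the preference u for "# of keywords in title" on the horizontal axis]
Calculate the derivative of the cost function at this point. Its sign tells you which way the error grows, so move in the opposite direction.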
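20. Preference update
$u_i := u_i - \alpha \, \frac{\partial\, \mathrm{error}}{\partial u_i}$
α: learning rate
$\frac{\partial\, \mathrm{error}}{\partial u_i}$: partial derivative of the cost function with respect to $u_i$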
21. Preference update
α (learning rate): how fast you move. Too low: slow progress. Too high: you will overshoot.
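The effect of the learning rate is easy to see on a one-parameter toy problem. The sketch below minimizes the hypothetical cost (u - 3)^2 with three different values of α; the function name `run` and all numbers are assumptions chosen for illustration.

```python
def run(alpha, steps=20, start=0.0, target=3.0):
    """Run `steps` gradient-descent updates on the cost (u - target)^2."""
    u = start
    for _ in range(steps):
        gradient = 2 * (u - target)   # derivative of (u - target)^2
        u = u - alpha * gradient
    return u


for alpha in (0.01, 0.4, 1.1):
    distance = abs(run(alpha) - 3.0)
    print(f"alpha={alpha}: distance from the minimum after 20 steps = {distance:.3g}")
```

With α = 0.01 the value has barely moved after 20 steps, with α = 0.4 it sits practically on the minimum, and with α = 1.1 each step jumps over the minimum and lands further away than before.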
22. Preference update
Partial derivative of the cost function with respect to $u_i$: nothing scary. You can find these online for standard cost functions.
For mean square error (up to a constant factor): $(h(d) - \mathrm{rank}(d)) \, d_i$
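For the squared error of a single document, the derivative with respect to u_i works out to 2 (h(d) - rank(d)) d_i. The sketch below checks that against a finite-difference estimate and then applies one update step. The data, `alpha` and the helper names are hypothetical.

```python
def h(d, u):
    return sum(d_i * u_i for d_i, u_i in zip(d, u))


def error(d, u, real_rank):
    return (h(d, u) - real_rank) ** 2


def analytic_gradient(d, u, real_rank):
    """d(error)/d(u_i) = 2 * (h(d) - rank) * d_i."""
    residual = h(d, u) - real_rank
    return [2 * residual * d_i for d_i in d]


def numeric_gradient(d, u, real_rank, eps=1e-6):
    """Central finite differences, used only as a sanity check."""
    grads = []
    for i in range(len(u)):
        up = u[:i] + [u[i] + eps] + u[i + 1:]
        down = u[:i] + [u[i] - eps] + u[i + 1:]
        grads.append((error(d, up, real_rank) - error(d, down, real_rank)) / (2 * eps))
    return grads


d = [2.0, 0.0, 1.0]       # hypothetical document features
u = [0.5, 1.0, -1.0]      # hypothetical current preferences
real_rank = 3.0
alpha = 0.1

gradient = analytic_gradient(d, u, real_rank)   # residual is -3, so -12, 0, -6
u_new = [u_i - alpha * g_i for u_i, g_i in zip(u, gradient)]

print(gradient, numeric_gradient(d, u, real_rank))
print(error(d, u, real_rank), error(d, u_new, real_rank))
```

The error drops after the update, which is the whole point of stepping against the derivative.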
26. What's changed?
We no longer have ratings, just document
comparisons.
Cost function: something that considers ordering, e.g., Kendall's τ (based on the number of concordant and discordant pairs).
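Counting concordant and discordant pairs takes only a few lines. The sketch below computes the simplest variant (often called tau-a), where tied pairs count as neither concordant nor discordant. The function name and the sample scores are illustrative.

```python
from itertools import combinations


def kendall_tau(predicted, actual):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(predicted)), 2):
        # Positive when both orderings agree on this pair, negative when they disagree.
        agreement = (predicted[i] - predicted[j]) * (actual[i] - actual[j])
        if agreement > 0:
            concordant += 1
        elif agreement < 0:
            discordant += 1
    pairs = len(predicted) * (len(predicted) - 1) / 2
    return (concordant - discordant) / pairs


print(kendall_tau([1, 2, 3, 4], [1, 2, 3, 4]))  # identical ordering
print(kendall_tau([1, 2, 3, 4], [4, 3, 2, 1]))  # reversed ordering
print(kendall_tau([1, 2, 3, 4], [1, 2, 4, 3]))  # one swapped pair
```

An identical ordering gives 1, a fully reversed one gives -1, and a single swapped pair out of six gives (5 - 1) / 6.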
h is now a function of 2 parameters: $h(d_1, d_2)$. But you can just compute $d_2 - d_1$ and learn on that.
$d_4 > d_3$
$d_4 > d_2$
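One way to "learn on the difference" is a perceptron-style update: whenever the preferred document does not score higher, nudge u toward the difference vector. The slides do not say which learner is used, so this choice is an assumption, and so are the feature vectors and names below.

```python
def score(d, u):
    return sum(d_i * u_i for d_i, u_i in zip(d, u))


def learn_from_comparisons(comparisons, n_features, alpha=0.1, epochs=100):
    """comparisons: list of (better, worse) feature-vector pairs."""
    u = [0.0] * n_features
    for _ in range(epochs):
        mistakes = 0
        for better, worse in comparisons:
            diff = [b - w for b, w in zip(better, worse)]
            # The pair is ordered correctly when u . (better - worse) > 0.
            if score(diff, u) <= 0:
                u = [u_i + alpha * x_i for u_i, x_i in zip(u, diff)]
                mistakes += 1
        if mistakes == 0:   # every comparison is satisfied
            break
    return u


# Hypothetical feature vectors for three documents.
d2 = [2, 0, 1]
d3 = [1, 1, 0]
d4 = [2, 2, 0]

# The user preferred d4 over d3 and d4 over d2.
comparisons = [(d4, d3), (d4, d2)]
u = learn_from_comparisons(comparisons, n_features=3)

print(score(d4, u) > score(d3, u), score(d4, u) > score(d2, u))
```

This only converges when some u satisfies every comparison. With noisy or contradictory comparisons a different cost function would be needed.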