The document summarizes a presentation on people recommender systems and social networks. It discusses key concepts in social recommenders like reciprocity and multiple objectives. It provides examples of recommender systems at LinkedIn including People You May Know, talent matching, and endorsements. It also covers special topics like intent understanding using techniques like survival analysis, and evaluation challenges for social recommenders.
2. Disclaimer
Material presented references publicly shared
research work done at LinkedIn
Opinions expressed however are mine and do
not represent the official position of LinkedIn
3. Outline
Introduction
The basics of Social Recommenders
People recommender systems
Reciprocity & its quirks
o Cornerstones
o Special Topics in People Recommenders
o Motivating Examples
o Intent Understanding
o Reciprocity & Multi-Objective Optimization
o Evaluation
o Some Novel approaches & Applications
o Social Lens & Referrals
o Virtual Profiles
o Endorsements
o Conclusions
4. Cornerstones
o Accuracy & Precision are key
Revenue at stake
o Reciprocity throws a wrench
o Three actors at play: recommendee, recommendation, recommender system
Multiple (possibly competing) objectives to optimize
o Evaluations are delayed
Conversion of a lead takes days/weeks/months
o Not deployed in isolation
Usually co-located with other recommenders, search and social streams
11. Motivating Examples
o Endorsements
[Diagram: A endorses B; B is notified (email, news feed, notification); B “accepts” the endorsement; the recommender system then offers B endorsement recommendations (B endorses C, B endorses D)]
No reciprocity; utility to B
15. Recruiting Intent
o Look-alike Models
Well-researched technique in computational advertising
Finding/ranking behavioral look-alikes
Performance at a certain reach
Reference : http://www.theguardian.com/media-network/media-networkblog/2013/sep/06/lookalike-modelling-advertising-demystified
16. Recruiting Intent
o Target Definition is crucial
How do we define targets/labels to predict?
It is a waste of time to develop features and learning algorithms without
carefully defining the right target
T: Profile-based recruiters
CT: Recruiters showing recruiting activity
VT: Recruiters not showing recruiting activity
U: Non-recruiters
CU: Non-recruiters showing recruiting activity
VU: Non-recruiters not showing recruiting activity
[Figure legend: positives, negatives]
17. Recruiting Intent
o As always, magic is in the features
Who are you?
- industry, title, seniority, function, skill, groups …
What are you doing ?
- page views, searches, invitations, news reads, group memberships
Temporal behavioral features
[Timeline: t0, tn, tn+1; labels: feature window, target window, target time]
18. Recruiting Intent
o L2 Regularized Logistic Regression
o Model derives a response score for each user from his static profile
and past online activities
o Score indicates the likelihood that this user will respond to the ad
campaign (clicks or conversion)
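As a minimal sketch of the scoring idea above, the snippet below trains an L2-regularized logistic regression with plain batch gradient descent and returns a response score per user. The two features, the toy labels, and the hyperparameters (`l2`, `lr`, `epochs`) are hypothetical and chosen only for illustration; a production system would use a dedicated solver.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_l2_logreg(X, y, l2=0.1, lr=0.5, epochs=2000):
    """Minimize mean log-loss + (l2 / 2) * ||w||^2 by batch gradient descent.

    The bias term is left unpenalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def score(w, b, x):
    # Response score in (0, 1): the modeled likelihood of a positive label.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical features: [has_recruiter_title, searches_per_week / 10]
X = [[1, 0.9], [1, 0.7], [0, 0.8], [0, 0.1], [0, 0.0], [1, 0.1]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_l2_logreg(X, y)
scores = [score(w, b, x) for x in X]
```

Users can then be ranked by `scores`; the L2 penalty keeps the weights small when features are sparse or correlated.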
19. Job Seeking Intent
[Timeline: 2008.02, 2010.05, 2013.09]
Given 1) the member started job a at time ta
2) the member hasn’t changed from job a till now
3) various information (x) we have about the member
Predict the probability of the user changing to job b at time y
20. Job Seeking - Survival Analysis
Review of Survival Analysis
$y$ is the time of death/event/purchase, i.e. the survival time
Probability density of the event: $p(y)$
Survival function: $S(y) = 1 - \int_0^y p(t)\,dt$
Hazard function: $h(y) = p(y) / S(y)$
21. Job Seeking – Survival Analysis
Cox Proportional Hazards Model for Survival Analysis
How to incorporate covariates/additional information?
– Covariates are multiplicatively related to the hazard (Cox Proportional Hazards)
– Another way to do this is to have covariates multiplicatively related to survival time (Accelerated Failure Time)
What can be included in x?
– Time independent variables
Titles of jobs, companies at play, long-term user preferences
– Time dependent internal variables
Mean time to switch between jobs in an area, industry
– Time dependent external variables
Seasonal softness
– Time independent external variables
Economic conditions
22. Job Seeking Intent
Weibull Distribution
Basic Weibull distribution
Proportional hazards model with Weibull distribution
Scale of the curve
Reference : http://data.princeton.edu/pop509/ParametricSurvival.pdf
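A small sketch of a Weibull proportional hazards model follows. It assumes the common shape/scale parametrization, S(t) = exp(-(t/scale)^shape), which may differ from the parametrization used on the slide; the covariate weights `beta` and the example values are hypothetical.

```python
import math


def weibull_hazard(t, shape, scale):
    # Baseline hazard h0(t) = (shape / scale) * (t / scale)^(shape - 1), for t > 0.
    return (shape / scale) * (t / scale) ** (shape - 1)


def weibull_survival(t, shape, scale):
    # Baseline survival S0(t) = exp(-(t / scale)^shape).
    return math.exp(-((t / scale) ** shape))


def relative_risk(beta, x):
    # Proportional hazards: covariates multiply the baseline hazard by exp(beta . x).
    return math.exp(sum(b * xi for b, xi in zip(beta, x)))


def ph_hazard(t, shape, scale, beta, x):
    return weibull_hazard(t, shape, scale) * relative_risk(beta, x)


def ph_survival(t, shape, scale, beta, x):
    # Multiplying the hazard by r raises the baseline survival to the power r.
    return weibull_survival(t, shape, scale) ** relative_risk(beta, x)


# Hypothetical example: tenure in months, one binary covariate.
shape, scale = 1.5, 36.0
beta, x = [0.4], [1]
hazard_ratio = ph_hazard(24.0, shape, scale, beta, x) / weibull_hazard(24.0, shape, scale)
```

With a shape above 1 the hazard rises with tenure, and `hazard_ratio` stays at exp(0.4) for every `t`, which is the proportional hazards property.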
23. Job Seeking – Feature Engineering
[Chart: probability of switch vs. months since graduation]
What should you transition to, and when?
24. Job Seeking – Feature Engineering
Open to relocation ?
Region similarity based on profiles or network
Region transition probability
Model an individual’s propensity to migrate and the most likely migration target
25. Job Seeking
Bayesian Proportional Hazards Model
A hazards model for each transition pair m: {ja -> jb}
Hierarchical Bayesian models: handle transitions without much training data
Reference : Jian Wang, Yi Zhang, Christian Posse, Anmol Bhasin. Is it time for a career switch? Proceedings of the 22nd International Conference on World Wide Web (WWW ’13), 2013
26. Job Seeking Intent
What Can the Model Tell Us?
Tenure-based Decision Probability
– The probability that the user makes a job transition from $j_a$ to $j_b$ at a time between $t$ and $t + \Delta t$ (in the near future), given that the user hasn’t changed job from $t_a$ till now
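The conditional probability described above can be sketched from any survival function: the chance of an event in the next `horizon` months, given survival up to the current tenure, is 1 - S(tenure + horizon) / S(tenure). The example below uses an exponential survival curve purely as a stand-in and ignores the destination job, so it illustrates the conditioning step only, not the hierarchical model from the talk.

```python
import math


def prob_switch_within(survival, tenure, horizon):
    """P(tenure < T <= tenure + horizon | T > tenure) for a survival function S."""
    s_now = survival(tenure)
    if s_now <= 0.0:
        raise ValueError("survival at the current tenure must be positive")
    return 1.0 - survival(tenure + horizon) / s_now


def exp_survival(t, rate=1.0 / 40.0):
    # Stand-in survival curve (hypothetical rate: one switch per 40 months).
    return math.exp(-rate * t)


p_after_2_years = prob_switch_within(exp_survival, tenure=24.0, horizon=6.0)
p_after_5_years = prob_switch_within(exp_survival, tenure=60.0, horizon=6.0)
```

The exponential curve is memoryless, so both probabilities are equal; a Weibull curve with shape above 1 would make the probability grow with tenure.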
27. Job Seeking Intent
H-one
• Single set of parameters
H-Source
• Multiple sets of parameters, one per transition source
H-SourceDest
• Multiple sets of parameters, one per source-destination transition
H-SourceDestCov
• Further incorporates covariates
30. Multi-Objective Optimization
Serving Content on Y! Front Page : Click Shaping
What do we want to optimize?
Maximize clicks (maximize downstream supply from FP)
But consider the following
Article 1: CTR=5%, utility per click = 5
Article 2: CTR=4.9%, utility per click=10
By promoting 2, we lose about 1 click per 1,000 visits (0.1 per 100) but gain 5 utils on every click
If we do this for a large number of visits --- lose some clicks but
obtain significant gains in utility?
E.g. lose 5% relative CTR, gain 40% in utility (revenue, engagement, etc.)
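The trade-off between the two articles can be checked with a few lines of arithmetic, using only the CTRs and per-click utilities stated above. The choice of 1,000 visits as the base is an assumption made to keep the numbers round.

```python
def expected_per_visits(ctr, utility_per_click, visits):
    # Expected clicks and expected total utility over a number of visits.
    clicks = ctr * visits
    return clicks, clicks * utility_per_click


visits = 1000
clicks_1, utility_1 = expected_per_visits(0.05, 5, visits)    # Article 1
clicks_2, utility_2 = expected_per_visits(0.049, 10, visits)  # Article 2

clicks_lost = clicks_1 - clicks_2            # clicks given up by promoting Article 2
utility_gained = utility_2 - utility_1       # utility gained in exchange
relative_ctr_loss = clicks_lost / clicks_1
relative_utility_gain = utility_gained / utility_1
```

For this pair, promoting Article 2 gives up a small fraction of clicks in exchange for a much larger relative gain in utility, which is the intuition behind click shaping.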
31. Multi-Objective Optimization
Why call it Click Shaping?
[Figure: supply distribution across properties (autos, buzz, finance, gmy.news, health, hotjobs, movies, new.music, news, omg, realestate, rivals, shine, shopping, sports, tech, travel, tv, video, videogames, other), BEFORE vs. AFTER shaping, with a bar chart of supply distribution changes on a scale from -10% to +10%]
SHAPING can happen with respect to any downstream metrics (like engagement)
32. Multi-Objective Optimization
n articles, K properties, m user segments
[Diagram: user segments S1 ... Sm are served articles A1 ... An, which lead to properties (news, finance, ..., omg)]
CTR of user segment i on article j: $p_{ij}$
Time duration of i on j: $d_{ij}$
33. Multi-Objective Optimization
Scalarization
Goal Programming
Simplex constraints on $x_{ij}$ are always applied
Constraints are linear
Every 10 mins, solve for x
Use this x as the serving scheme in the next 10 mins
Reference : Deepak Agarwal, Bee-Chung Chen, Pradheep Elango, Xuanhui Wang. Click shaping to optimize multiple objectives. Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’11)
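A toy sketch of scalarization: the two objectives (clicks and time spent) are combined into one weighted sum and the serving plan x is chosen to maximize it. This sketch assumes only the simplex constraint on each segment's row of x, in which case the optimum puts all of a segment's traffic on its best article; the formulation in the cited paper adds further linear constraints and needs a linear programming solver. The function name, weight, and numbers are hypothetical.

```python
def scalarized_plan(p, d, weight_clicks):
    """Serving plan x[i][j] maximizing a weighted sum of clicks and time spent.

    p[i][j]: CTR of segment i on article j
    d[i][j]: time duration of segment i on article j (per click)
    Each row of the plan lies on the simplex (non-negative, sums to 1).
    """
    plan = []
    for p_i, d_i in zip(p, d):
        values = [
            weight_clicks * pij + (1.0 - weight_clicks) * pij * dij
            for pij, dij in zip(p_i, d_i)
        ]
        best = max(range(len(values)), key=values.__getitem__)
        plan.append([1.0 if j == best else 0.0 for j in range(len(values))])
    return plan


# One segment, two articles (hypothetical numbers).
p = [[0.05, 0.049]]
d = [[5.0, 10.0]]
clicks_only = scalarized_plan(p, d, weight_clicks=1.0)
balanced = scalarized_plan(p, d, weight_clicks=0.5)
```

Re-running this with fresh estimates of `p` and `d` every 10 minutes mirrors the serving loop described above; sweeping `weight_clicks` traces out different trade-offs between the objectives.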
39. Multi-Objective Optimization
Loss Function
Objective and divergence depend on a sort/rank, so gradient-based optimization is not directly applicable
44. Evaluation quirks
Days to act on Recommendation
Weeks to reciprocate
Does not work in isolation
- Success only if reciprocation comes from a first impression from the recommender
- First impression: the user did not see that result on any channel in the “K” days before seeing it on the recommender
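The first-impression rule can be sketched as a small attribution check. The data layout (a list of `(channel, timestamp)` impressions from other channels for the same viewer and candidate) and the function names are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def is_first_impression(other_impressions, recommender_time, k_days):
    """True if no other channel showed the result in the K days before the recommender did."""
    window_start = recommender_time - timedelta(days=k_days)
    return not any(
        window_start <= ts < recommender_time for _channel, ts in other_impressions
    )


def credit_recommender(other_impressions, recommender_time, reciprocated_time, k_days):
    """Count a success only if reciprocation follows a first impression from the recommender."""
    if reciprocated_time is None or reciprocated_time < recommender_time:
        return False
    return is_first_impression(other_impressions, recommender_time, k_days)


shown = datetime(2013, 9, 10)
accepted = datetime(2013, 9, 24)  # reciprocation can take weeks
seen_in_search = [("search", datetime(2013, 9, 7))]

credited_k2 = credit_recommender(seen_in_search, shown, accepted, k_days=2)
credited_k7 = credit_recommender(seen_in_search, shown, accepted, k_days=7)
```

With K = 2 the earlier search impression falls outside the window and the recommender gets credit; with K = 7 it does not.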
47. Social Referral
Formulation
When user $u_i$ interacts with group $g_j$
Define $C = \emptyset$, the candidate neighbor set
For each $u_k \in \mathrm{neighbor}(u_i)$
- Generate $G_{u_k} = \{g_0, g_1, \ldots, g_k\}$, the top-k group recommendations
- If $g_j \in G_{u_k}$ then $C \leftarrow C \cup \{(u_k, g_j)\}$
Rank order C using
Connection strength between $u_i$ & $u_k$
Probability of $u_k$ joining $g_j$
Combined score using the above two factors
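The formulation above can be sketched as follows. The lookup functions, the toy data, and the choice of a simple product as the combined score are assumptions for illustration; the slide does not specify how the two factors are combined.

```python
def referral_candidates(user, group, neighbors, top_k_groups, strength, join_prob):
    """Return (neighbor, group, score) triples, best first.

    A neighbor is a candidate only if `group` is among that neighbor's own
    top-k group recommendations.
    """
    candidates = []
    for nb in neighbors[user]:
        if group in top_k_groups(nb):
            # Combined score: connection strength times join probability (an assumption).
            score = strength(user, nb) * join_prob(nb, group)
            candidates.append((nb, group, score))
    return sorted(candidates, key=lambda c: c[2], reverse=True)


# Hypothetical toy data.
neighbors = {"u1": ["u2", "u3", "u4"]}
recs = {"u2": ["text-analytics", "ml"], "u3": ["sales"], "u4": ["text-analytics"]}
strengths = {("u1", "u2"): 0.9, ("u1", "u3"): 0.8, ("u1", "u4"): 0.3}
join_probs = {("u2", "text-analytics"): 0.2, ("u4", "text-analytics"): 0.5}

ranked = referral_candidates(
    "u1",
    "text-analytics",
    neighbors,
    top_k_groups=lambda u: recs.get(u, []),
    strength=lambda a, b: strengths[(a, b)],
    join_prob=lambda u, g: join_probs[(u, g)],
)
```

Here `u3` is dropped because the group is not among its recommendations, and `u2` ranks above `u4` on the combined score.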
48. Social Referral
LinkedIn Group: Text Analytics
From: Deepak Agarwal – Engineering Director, LinkedIn
I found this group interesting, and I think you will too
Deepak
[Figure: the social referral for “LinkedIn Group: Text Analytics” yields 2X conversion]
Reference : Mohammad Amin, Baoshi Yan, Sripad Sriram, Anmol Bhasin, Christian Posse. Social Referral: Using network connections to deliver recommendations. Proceedings of the Sixth ACM Conference on Recommender Systems (RecSys ’12)
49. Social Referral
Quirks and Cautionary points
Controlled number of referral nudges to the source user
- If nudged too many times, it may degrade the experience
Controlled number of referrals to the target user
- Presumably degrades the experience of the target user as well
Only useful to use social referrals to individuals not engaged with the product
- If the target already interacts with many items, the referral has marginal utility
Referred items of high quality
- If the item referred is of poor quality, the entire exercise is futile
51. Virtual Profiles
[Member profiles]
Title : Eng Dir; Company : LinkedIn; Location : CA, USA; Skills : ML, RecSys
Title : Sr. Manager; Company : Netflix; Location : CA, USA; Skills : Machine Learning, Data Mining
Title : Eng Mgr; Company : LinkedIn; Location : PA, USA; Skills : Machine Learning, Statistics, Data Mining
[Virtual profile: feature counts]
Title : Sr. Mgr<1>, Eng Dir<1>, Eng Mgr<1>
Company : LinkedIn<1>, Netflix<1>, Google<1>
Location : CA, USA<2>, PA, USA<1>
Skills : ML<2>, RecSys<1>, Stats<1>, DM<1>
52. Virtual Profiles
Point-wise Mutual Information
Pick the top K overrepresented features (f) from the Group Join distribution vs the overall user population feature distribution
A representative projection of the item (Group) in the user feature space
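A minimal sketch of the selection step: each feature is scored by the pointwise mutual information log(P(f | joined group) / P(f)) and the top K are kept. The `min_count` filter, the feature naming, and the toy population are assumptions added for illustration; the sketch also assumes the group's members are part of the overall population.

```python
import math
from collections import Counter


def virtual_profile(group_members, all_members, top_k, min_count=1):
    """Top-k features most overrepresented among group joiners, by PMI.

    Each member is a set of feature strings. Features seen fewer than
    `min_count` times in the group are skipped, since PMI favors rare features.
    """
    group_counts = Counter(f for m in group_members for f in set(m))
    overall_counts = Counter(f for m in all_members for f in set(m))
    scores = {}
    for f, c in group_counts.items():
        if c < min_count:
            continue
        p_f_given_group = c / len(group_members)
        p_f = overall_counts[f] / len(all_members)
        scores[f] = math.log(p_f_given_group / p_f)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Hypothetical population; the first three members joined the group.
all_members = [
    {"skill:ML", "loc:CA"},
    {"skill:ML", "loc:CA"},
    {"skill:ML", "loc:PA"},
    {"skill:Sales", "loc:CA"},
    {"skill:Sales", "loc:CA"},
    {"skill:Design", "loc:NY"},
]
profile = virtual_profile(all_members[:3], all_members, top_k=2, min_count=2)
```

In this toy population `skill:ML` is twice as common among joiners as overall, so it leads the virtual profile, while `loc:CA` is no more common among joiners than in the population and scores zero.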
53. Virtual Profiles – Group join propensity
[Diagram: member features, group features (including the group virtual profile), and social information are fed to a ranker that outputs Pjoin]
Match feature pair includes
Group Virtual Profile features, Group popularity features
Member Profile features
Contextual features (device, location)
Interaction features
L2 regularized Logistic Regression (Liblinear, VW, Mahout, ADMM) for Ranking
Reference : Haishan Liu, Mohammad Amin, Baoshi Yan, Anmol Bhasin. Generating Supplemental Content Information using Virtual Profiles. To appear at ACM RecSys ’13
55. Endorsements
Rank Ordered Candidates with LR with L2 penalty
Features
– Company overlap
– School overlap
– Group overlap
– Industry and functional area similarity
– Title similarity
– Site interactions
– Co-interactions
Open Questions
– Do they share the same skill ?
– Validity of the endorsement ?
[Diagram: candidate generation -> feature vectors (company, title, groups, industry, …) -> classifier -> suggested endorsements (ranked by likelihood)]
58. Big Challenge (Shameless plug)
Detect when we don’t have “ANY” good items to show to a particular user
Top K, WTF
Personalized thresholds for users – Cost of Consumption
Marginal utility of showing a particular item to a particular user is negative
How to use crowdsourcing to rate WTF for a particular user
When not to show..
60. Conclusions
o Accuracy & Precision are key
Revenue at stake
o Reciprocity throws a wrench
o Three actors at play: recommendee, recommendation, recommender system
Multiple (possibly competing) objectives to optimize
o Evaluations are delayed
Conversion of a lead takes days/weeks/months
o Not deployed in isolation
Usually co-located with other recommenders, search and social streams
61. It takes a village!
LinkedIn Engineering : Abhishek Gupta, Adam Smyczek,
Adil Aijaz, Alan Li, Baoshi Yan, Bee-Chung Chen, Deepak
Agarwal, Ethan Zhang, Haishan Liu, Igor Perisic, Jonathan
Traupman, Liang Zhang, Lokesh Bajaj, Mario Rodriguez,
Mitul Tiwari, Mohammad Amin, Parul Jain, Paul Ogilvie,
Sam Shah, Sanjay Dubey, Tarun Kumar, Trevor Walker,
Utku Irmak
LinkedIn Product : Andrew Hill, Christian Posse, Gyanda
Sachdeva, Parker Barrile, Sachit Kamat
External Partners : Christian Posse, Mike Grishaver,
Monica Rogati, Luiz Augusto Pizzato, Yi Zhang
Alphabetically sorted
Sam: Suggestions drive more people through this loop, faster. This is the KEY to virality
Here we consider failure as a user making a decision to transit to a new job. Let p(y) denote the probability density function of such an event.
This probability density function represents the basic proportional hazards model that models the tenure before a transition with associated covariates.
Unreasonable effectiveness of Big Data. This chart shows the probability of holding a title across all titles, plotted against the number of months after graduation. Notice the spikes: they are about 12 months apart and almost perfectly aligned. Remember the itch that you had when you finished 2 years at your company?
Understand this graph better
Talent Match, job posting flow: when recruiters post jobs, we suggest in real time the top candidates fit for the job
So, intuitively, it makes sense to suggest users who are job seekers in TalentMatch. To confirm our intuition, we ran the numbers and saw that users with a high job-seeking intent (actives and passives) have a much higher rate of reply to career-related emails than non-job-seekers (16 times the reply rate). And this is exactly the facet of the utility function of TalentMatch that we are interested in improving. So, what we want to do is incorporate the job seeker intent into the TalentMatch model, and we want to do so without negatively affecting the booking rate and the email rate.
So, what we want is a controlled perturbation of the ranking output by the TalentMatch model, and this is how we are gonna do it: given the TalentMatch ranking, we run a perturbation function on it that generates another ranking, the perturbed ranking, which optimizes for a metric we’re interested in (in the case of TalentMatch, it’s the number of users with high job-seeking intent in the top-12 recommendations). Given the 2 rankings and their distributions of match scores, we can compute the distance between them using a variety of metrics, for example KL divergence or Euclidean distance. This divergence score is what will help us make sure we are not negatively affecting the quality of the recommendations. Notice how, in the perturbed ranking, item Z was bumped from its original third position, below the cutoff line, to the second position, and so whereas before we had 2 non-seekers above the cutoff, meaning they would be recommended, now we have a non-seeker and an active. Also notice that the perturbation is minimal. We should feel comfortable bumping item Z to the second position, but not to the first position.
There are then 3 functions that we need to define: the perturbation function, the divergence function, and the objective function. The parameters of the perturbation function are what we will be estimating based on the performance established by the divergence and objective measures: we want high scores on the objective and low scores on the divergence.
Here is the instantiation of those functions for the TalentMatch case. The perturbation function simply applies a small boost to the match score, denoted by the letter “y”, and we allow that boost to be different for active and passive job seekers (as denoted by the alpha and the beta parameters). The divergence function is simply the Euclidean distance between the distribution of scores in the TalentMatch ranking and the distribution of scores in the perturbed ranking. This is simply a measure of how match quality was affected (a divergence score of 0 means that the quality of the matches remained unaffected). The objective is the average number of actives and passives in the top-12.
To find a good perturbation function, we can construct a typical loss function, where the effect of the divergence is governed by a regularization parameter lambda, and then optimize this loss function to find the parameters of the perturbation function, alpha and beta, which correspond respectively to the boost of active and passive job seekers. However, there is a complicating factor: both the divergence and objective functions depend on a ranking, which depends on a sorting operation, and therefore, traditional gradient based approaches are not readily applicable. Also, what should we set lambda to? We don’t just want to use the lambda that generates the lowest loss, we are actually more interested in what our options are regarding what our tradeoff is going to be between the objective and the divergence function.
We will discuss computational strategies for optimizing the perturbation function in a moment, but before that, we need to discuss the kind of optimization we are actually interested in. What we really want is Pareto optimization, where there is not one optimal solution; instead, there are some solutions which are better in one objective, while other solutions are better in others. In this plot, we have the objective, the average number of actives and passives in the top-12 results, on the y-axis, and the divergence on the x-axis. The original ranking has on average 4 actives and passives in the top-12, as shown in the table in the top left corner. Also, by definition, the divergence of this original ranking is 0. Each point (or bubble) in the plot represents a specific assignment to the parameters of the perturbation function: alpha and beta. We see on the plot that the only way to increase the objective on the y-axis is to also allow an increase in the divergence on the x-axis. We also see that for a given divergence, say 50, there are many assignments of alpha and beta with that divergence, with varying scores on the objective. We want the maximum objective for each divergence, and those are the points on the Pareto frontier, which are the red points in the plot. So, no matter what divergence you allow, you should pick a point on the Pareto frontier. Back to the table of sample plans, we see that if we set alpha and beta to 1.15, we can double the number of actives and passives in the top-12 (from 4 to 8) while paying the cost of having a divergence of 64, and that this is a point on the Pareto frontier.
Here we can get a better idea of what the divergence scores actually mean, the top left has the distribution of the original, unperturbed model, and as we move across the quadrant, we see how the divergence increases (0, 27, 54, and 100). In the top left histogram, we see the bump around the 0.9’s, and with each histogram, the bump is gradually attenuated, until there is no more bump in the bottom right. So, we would probably be willing to accept a divergence in the 50-60’s range (as shown in the bottom left), but not in the 100’s, which is what’s shown in the bottom right.
Given that we only had 2 parameters in our perturbation function, grid search was a satisfactory approach and so that’s what we used. When you have a set of Pareto-optimal values, typically what’s done is that you look for the proverbial knee of the curve, a point after which you have to pay too much in one objective to get increases in another, and our curve actually displays this characteristic: the Pareto tradeoff is constant up to a divergence of about 60, which, as we saw earlier in the histogram slide, was not too bad. Still, we did not know exactly what a given divergence would do to the booking and email rate, so we picked a couple of values to A/B test. We picked the maximum value on that line, the one at the knee, and a point in the middle, which corresponded to a boost of 1.15 and 1.07 respectively.
So, what did we expect from the tests? Since we knew the rate of reply to career-related emails of users with high job-seeking intent, as well as the expected proportion of those users in the top-12 recommendations, it was easy to get a ballpark figure of how much of an increase in reply rate we would obtain: we expected a 50% increase over control for the 1.07 treatment and a 100% increase over control for the 1.15 treatment. Regarding the other 2 facets of the utility function, the booking rate and the email rate of job posters to candidates, what we hoped was that they would remain unchanged or only be minimally affected.
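The grid search over the two boost parameters, together with the Pareto frontier, can be sketched on a toy ranking as follows. Several details are assumptions made for illustration: the boost is applied multiplicatively, the divergence is the Euclidean distance between the sorted original match scores of the two top-k lists, and the items, intent labels, and grid values are invented.

```python
import math
from itertools import product


def perturb(items, alpha, beta):
    # Re-rank after boosting match scores of active (alpha) and passive (beta) seekers.
    boost = {"active": alpha, "passive": beta, "non": 1.0}
    return sorted(items, key=lambda it: it["score"] * boost[it["intent"]], reverse=True)


def objective(ranking, k):
    # Number of active or passive job seekers in the top k.
    return sum(it["intent"] != "non" for it in ranking[:k])


def divergence(original, perturbed, k):
    # Euclidean distance between the sorted original match scores of the two top-k lists.
    a = sorted(it["score"] for it in original[:k])
    b = sorted(it["score"] for it in perturbed[:k])
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pareto_frontier(points):
    """Keep (divergence, objective, params) points that no other point dominates."""
    frontier, best = [], float("-inf")
    for pt in sorted(points, key=lambda p: (p[0], -p[1])):
        if pt[1] > best:  # strictly better objective than anything with lower divergence
            frontier.append(pt)
            best = pt[1]
    return frontier


items = [
    {"id": "X", "score": 0.95, "intent": "non"},
    {"id": "Y", "score": 0.90, "intent": "non"},
    {"id": "Z", "score": 0.88, "intent": "active"},
    {"id": "W", "score": 0.70, "intent": "passive"},
    {"id": "V", "score": 0.60, "intent": "non"},
]
k = 2
original = perturb(items, 1.0, 1.0)
grid = [1.0, 1.05, 1.10, 1.15]
points = []
for alpha, beta in product(grid, grid):
    ranked = perturb(items, alpha, beta)
    points.append((divergence(original, ranked, k), objective(ranked, k), (alpha, beta)))
frontier = pareto_frontier(points)
```

In this toy case a small boost lifts item Z above the cutoff at a small divergence cost, giving a two-point frontier; with real data the frontier is what you would inspect for a knee before choosing values to A/B test.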
So, how did we do? Let’s see how each facet of the utility was affected. The booking rate remained mostly unchanged, with possibly a very slight dip of 0.4% on the 1.15 treatment. The email rate, to our surprise, actually increased in both treatments. This tells us that somehow, the profiles of users with high job-seeking intent were more appealing to job posters than those who weren’t. Specifically what about their profile was more appealing is something we have yet to look into. This also tells us that maybe the snippets that we show job posters were not a great representation of the value for them, and that perhaps better snippets would lead to higher booking rates. Finally, we see that we were able to increase the reply rate, which is what we had originally set out to do, and that the increase for the 1.15 treatment was double that of the 1.07 treatment: 42% and 22%, which was in line with our expectations. Now, these numbers are pretty good, but why weren’t they as high as we had expected? Well, we had thought that job posters contacted all the recommendations, since it did not cost them more to contact all than to contact one, but as we observed in the email rate, which we were able to improve, job posters do not, in fact, contact all of the recommendations.
A brand new Recommendation Delivery paradigm, tested on LinkedIn Groups to generate a 2X Group Join rate. Applicable to advertising as well. The idea is simple: reverse the Social Proof idea. Ask the actor to recommend that their connections interact with this item.
- The message comes from the individual, not LinkedIn
- Inherently socially endorsed
- Timely and contextual
- Can be applied to other recommendation paradigms as well
Using social recommendations to drive engagement on products on a network/website
Incredibly powerful, vetted paradigm that we are excited to try to rope into our Ads offerings
Solve the impedance mismatch by creating the Group representation in the user space. This concept is used extensively at LinkedIn for all kinds of user recommendations, not just groups.
Sam: Now we can prompt your connections to validate your skills and expertise through an endorsement. This moves more people through the loop faster.
Sam: How would you think about this problem? How do you decide what people and skills to show?
Now we have all the pieces…
To reinforce how this works so well:
- limited adoption by asking for manual entry;
- accelerate by asking them to confirm, but no validation;
- social tagging, viral loops, and crowdsourcing -> provides the biggest win
You have a skills section -> people may enter their own skills, though not validated
You recommend skills to add -> more people add skills, still not validated
You provide a viral endorsement system -> don’t have the catalyst to get adoption
You need recommendations as a core piece of this ecosystem
So we have the data, what are the applications? Why is this important?
Pete: TO ADD: “Reid endorsed you for Venture Capital.” It’s not just the number of endorsements, it’s the nature.