The document summarizes the practical challenges faced and lessons learned from building a personalized job recommendation system at LinkedIn. It discusses three key challenges: candidate selection using decision trees to generate queries, training personalized relevance models at scale using generalized linear mixed models, and realizing an ideal jobs marketplace through early intervention to redistribute job applications.
4. Why Candidate Selection?
•Need to meet latency requirements of an online recommendation system
•Only subset of jobs is relevant to a user based on their domain and expertise
•Enables scoring with more complex and computationally expensive models
6. Decision Tree Based Approach
[Grover et al., CIKM 2017]
• Train on top-k ranked documents as positives and the tail end as negatives.
• Extract combinations of clauses from the decision tree by traversing root-to-leaf paths.
• Take a weighted combination of the clauses.
7. Decision Tree Based Query Generation
• Trees are a natural way to learn combinations of clauses
• Clause weights for a WAND query can be learned from the purity of the tree's nodes
[Figure: example decision tree with Title Match at the root and Seniority Match / Function Match as internal nodes; traversing Yes/No branches from root to leaf yields Positive or Negative labels]
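The clause-extraction step can be sketched in a few lines of Python. The tree below hand-codes an example in the spirit of the figure (Title / Seniority / Function Match); the pos/neg counts at the leaves are invented purely for illustration:

```python
# Pure-Python sketch of clause extraction from a trained decision tree
# (after Grover et al., CIKM 2017). The tree topology mirrors the slide's
# example; the pos/neg counts at each leaf are made up for illustration.
tree = {
    "feature": "title_match",
    "no":  {"feature": "seniority_match",
            "no":  {"pos": 5,  "neg": 95},    # leaf: mostly negatives
            "yes": {"pos": 60, "neg": 40}},
    "yes": {"feature": "function_match",
            "no":  {"pos": 70, "neg": 30},
            "yes": {"pos": 98, "neg": 2}},    # leaf: almost all positives
}

def extract_clauses(node, path=()):
    """Traverse root-to-leaf paths; each path is a conjunction of clauses,
    weighted by the leaf's positive-class purity (for use in a WAND query)."""
    if "feature" not in node:                       # reached a leaf
        purity = node["pos"] / (node["pos"] + node["neg"])
        return [(" AND ".join(path), purity)]
    f = node["feature"]
    return (extract_clauses(node["no"],  path + (f"NOT {f}",)) +
            extract_clauses(node["yes"], path + (f,)))

for clause, weight in extract_clauses(tree):
    print(f"{clause:45s} weight={weight:.2f}")
```

Each emitted clause combination becomes one weighted clause of the retrieval query, so high-purity paths (e.g. title and function both match) dominate the score.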
10. Generalized Linear Mixed Models (GLMM)
• A mixture of linear models combined into an additive model
• Fixed effect: population-average model
• Random effects: entity-specific models
Response prediction (logistic regression) is the sum of:
• Per-user random effect models (User 1, User 2, …) — personalization
• Per-job random effect models (Job 1, Job 2, …) — collaboration
• A global fixed effect model — content-based similarity
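A minimal, illustrative sketch of the additive GLMix-style score (Zhang et al., KDD 2016): the log-odds of an apply is the sum of a global fixed-effect model and per-user / per-job random-effect models. All weights, feature names, and entity ids below are made up, not learned:

```python
import math

# Sketch of GLMix-style scoring: log-odds of apply = global fixed effect
# + per-user random effect + per-job random effect. Weights are illustrative.
global_w = {"title_sim": 2.0, "bias": -3.0}           # fixed effect (content similarity)
user_w   = {"user2": {"job=software_engineer": 0.8}}  # per-user random effects (personalization)
job_w    = {"job1":  {"user=cs_student": 0.5}}        # per-job random effects (collaboration)

def dot(w, x):
    return sum(w.get(f, 0.0) * v for f, v in x.items())

def p_apply(user_id, job_id, features):
    """P(apply | u, j) under the additive logistic model."""
    logit = (dot(global_w, features)
             + dot(user_w.get(user_id, {}), features)   # cold-start falls back
             + dot(job_w.get(job_id, {}), features))    # to the global model
    return 1.0 / (1.0 + math.exp(-logit))

x = {"bias": 1.0, "title_sim": 0.9, "job=software_engineer": 1.0, "user=cs_student": 1.0}
print(round(p_apply("user2", "job1", x), 3))  # -> 0.525
```

Note how an unseen user or job simply contributes nothing, so scoring degrades gracefully to the population-average model.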
11. Features
• Dense vector bag-of-words similarity features in the global model, for generalization
  • e.g., similarity in title text is a good predictor of response
• Sparse cross features in the global, user, and job models, for memorization
  • e.g., memorize that computer science students transition to entry-level engineering roles
• Vector BoW similarity feature: Sim(User Title BoW, Job Title BoW)
• Global model cross feature: AND(user = Comp Sci. Student, job = Software Engineer)
• User model cross feature: AND(user = User 2, job = Software Engineer)
• Job model cross feature: AND(user = Comp Sci. Student, job = Job 1)
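A sketch of how the three kinds of sparse cross features might be generated for one (user, job) pair. The attribute names and key formats are illustrative assumptions, not the production feature schema:

```python
# Illustrative sparse cross-feature construction for memorization.
# Each cross is a binary indicator AND(user attribute, job attribute),
# keyed differently depending on which model it feeds.
def cross_features(user, job):
    feats = {}
    # Global model: generalize across all CS students x engineer jobs.
    feats[f"AND(user={user['field']},job={job['title']})"] = 1.0
    # User model: this specific user's affinity for this job title.
    feats[f"AND(user={user['id']},job={job['title']})"] = 1.0
    # Job model: this specific job's affinity for CS students.
    feats[f"AND(user={user['field']},job={job['id']})"] = 1.0
    return feats

u = {"id": "user2", "field": "comp_sci_student"}
j = {"id": "job1", "title": "software_engineer"}
print(cross_features(u, j))
```

The same raw attributes yield different keys per model, which is what lets the global model generalize while the per-entity models memorize.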
12. Training a GLMM at Scale
• Millions of random effect models × thousands of features per model = billions of parameters
• Infeasible to run traditional fitting methods on such a large linear model with industry-scale datasets
• Key idea: for each entity's random effect, only the labeled data associated with that entity is needed to fit its model
14. Training a GLMM at Scale
Global Fixed Effect Model
All labeled data is first used to train the fixed effect model
15. Training a GLMM at Scale
Global Fixed Effect Model
Nadia's, Ben's, Ganesh's, and Liang's Random Effect Models
Labeled data is partitioned by entity to train the random effect models in parallel; repeat for each random effect
16. Training a GLMM at Scale
Global Fixed Effect Model
Nadia's, Ben's, Ganesh's, and Liang's Random Effect Models
After training the random effects, cycle back and train the fixed effect model again if the convergence criterion is not met
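The training loop on these slides can be sketched as block coordinate descent. For brevity this toy version uses a Gaussian response with intercept-only models (the real system fits logistic models with full feature vectors), and the entity names and values are invented:

```python
from collections import defaultdict

# Block-coordinate-descent sketch of GLMM training: fit the global fixed
# effect on all labeled data, then partition residuals by entity and fit
# each random effect on only its own data (this is what makes the per-entity
# fits embarrassingly parallel), and cycle until convergence.
data = [("nadia", 3.0), ("nadia", 3.4), ("ben", 1.0), ("ben", 1.2),
        ("liang", 2.0), ("ganesh", 2.2)]

global_b = 0.0
user_b = defaultdict(float)

for _ in range(20):  # outer cycles; a real system checks a convergence criterion
    # Step 1: fit the fixed effect on the residuals of all labeled data.
    global_b = sum(y - user_b[u] for u, y in data) / len(data)
    # Step 2: partition by entity; each random effect needs only its own data.
    per_user = defaultdict(list)
    for u, y in data:
        per_user[u].append(y - global_b)
    for u, resid in per_user.items():   # independent fits -> parallelizable
        user_b[u] = sum(resid) / len(resid)

print(round(global_b, 2), {u: round(b, 2) for u, b in sorted(user_b.items())})
```

The per-entity step touches only that entity's rows, so the billions of random-effect parameters never have to be fit jointly.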
19. The Ideal Jobs Marketplace
• Maximize the number of confirmed hires while minimizing the number of job applications
• This maximizes utility for both job seekers and job posters
• Ranking by P(User u applies to Job j | u, j) only optimizes for user engagement
1. Recommend highly relevant jobs to users
2. Ensure each job posting:
  • Receives a sufficient number of applications from qualified candidates to guarantee a hire
  • But does not overwhelm the job poster with too many applications
21. Potential Solution?
• Rank by the likelihood that the user will apply for the job, pass the interview, and accept the job offer?
  • Data on whether a candidate passed an interview is confidential
  • Data about the offer made to the candidate is confidential too
  • More importantly, modeling this requires careful understanding of the potential bias and unfairness of a model due to societal bias in the data
• Practically, we solve the job application redistribution problem instead
  • Ensure a job does not receive too many or too few applications
23. Our High-level Idea: Early Intervention
[Borisyuk et al., KDD 2017]
• Say a job expires at time T
• At any time t < T:
  • Predict the number of applications it would receive by time T, given data from time 0 to t
  • If too few ⇒ boost the ranking score P(User u applies to Job j | u, j)
  • If too many ⇒ penalize the ranking score P(User u applies to Job j | u, j)
  • Otherwise ⇒ no intervention
• Key: a forecasting model of #applications per job, using signals from:
  • #applies / #impressions the job has received so far
  • Other features (x_jt), e.g.:
    • Seasonality (time of day, day of week)
    • Job attributes: title, company, industry, qualifications, …
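A sketch of the intervention rule above. The forecaster here is a deliberately naive linear extrapolation of the apply rate so far; the real LiJAR forecaster uses seasonality and job-attribute features. The thresholds and boost factor are illustrative:

```python
# Sketch of LiJAR-style early intervention (Borisyuk et al., KDD 2017):
# forecast total applications at expiry T, then boost or penalize the
# ranking score accordingly. Thresholds/factor are illustrative.
def forecast_applies(applies_so_far, t, T):
    """Naive linear extrapolation of applies in [0, t] out to expiry T."""
    return applies_so_far * T / max(t, 1e-9)

def adjusted_score(p_apply, applies_so_far, t, T, lo=8, hi=100, factor=2.0):
    pred = forecast_applies(applies_so_far, t, T)
    if pred < lo:        # too few predicted -> boost the ranking score
        return p_apply * factor
    if pred > hi:        # too many predicted -> penalize it
        return p_apply / factor
    return p_apply       # on track -> no intervention

print(adjusted_score(0.10, applies_so_far=2,  t=5.0, T=10.0))  # predicts 4   -> 0.2
print(adjusted_score(0.10, applies_so_far=80, t=5.0, T=10.0))  # predicts 160 -> 0.05
```

Because the adjustment multiplies the engagement score rather than replacing it, relevance ordering within a job's candidate pool is preserved.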
24. Online A/B Testing
• Control model for ranking: optimizes for user engagement only
• Split the jobs into 3 buckets every day:
  • Bucket 1: received <8 applications so far
  • Bucket 2: received 8–100 applications so far
  • Bucket 3: received >100 applications so far
• Re-distribute applications across the buckets
25. Summary
• Model candidate selection query generation using decision trees
• Personalization at scale through GLMMs
• Realizing the ideal jobs marketplace through application redistribution
• But a lot of research work is still needed to:
  • Reformulate the problem to optimize directly for a healthy marketplace
  • Understand and quantify bias and fairness in those potential new models
26. References
• [Borisyuk et al., 2016] CaSMoS: A Framework for Learning Candidate Selection Models over Structured Queries and Documents, KDD 2016
• [Borisyuk et al., 2017] LiJAR: A System for Job Application Redistribution towards Efficient Career Marketplace, KDD 2017
• [Grover et al., 2017] Latency Reduction via Decision Tree Based Query Construction, CIKM 2017
• [Zhang et al., 2016] GLMix: Generalized Linear Mixed Models for Large-Scale Response Prediction, KDD 2016
29. Problem Formulation
• Rank jobs by P(User u applies to Job j | u, j)
• Model response given:
  • User: career history, skills, education, connections
  • Job: title, description, location, company
30. JYMBII Infrastructure
[Architecture diagram: an offline system, in which user interaction logs feed an offline modeling workflow that produces user/item derived features and a ranking model store; and an online system, in which a user request triggers query construction (using the user feature store), search-based candidate selection & retrieval over a search index of items, recommendation ranking, and additional re-ranking/filtering steps]
32. Understanding WAND Query
Query: "Quality Assurance Engineer"
WAND: "(Quality[5] AND Assurance[5] AND Engineer[1]) [10]"
Each term carries a weight, and a document matches if the summed weights of its matched terms reach the threshold (10 here).
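A minimal sketch of that WAND (Weak AND) matching semantics, using the weights and threshold from the slide's example:

```python
# WAND (Weak AND) matching sketch: each query clause carries a weight;
# a document matches when the summed weights of the clauses it satisfies
# reach the threshold. Weights/threshold are from the slide's example.
def wand_match(doc_terms, weighted_terms, threshold):
    score = sum(w for term, w in weighted_terms if term in doc_terms)
    return score >= threshold

query = [("quality", 5), ("assurance", 5), ("engineer", 1)]

print(wand_match({"quality", "assurance", "manager"}, query, 10))  # True: 5+5 >= 10
print(wand_match({"quality", "engineer"}, query, 10))              # False: 5+1 < 10
```

With these weights, "Quality Assurance" is effectively required while "Engineer" only breaks ties, which is exactly the kind of soft-AND retrieval the decision-tree clauses feed into.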
33. Offline Evaluation
• Utilize offline query replay to validate the new query against the current baseline
• Replay the baseline and new query to compute metrics from the retention of actions, plus operational metrics:
  • Applied jobs retained
  • Hits retrieved
  • Kendall's tau
• Mimic production ranking through replay to get a more reliable estimate of online metrics
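One of the replay metrics, Kendall's tau between the baseline ranking and the new query's ranking of the shared documents, can be sketched directly (an O(n²) pairwise version for clarity; the doc ids are illustrative):

```python
# Kendall's tau over two rankings of the same retained documents:
# +1 means identical order, -1 means fully reversed.
def kendall_tau(rank_a, rank_b):
    """Both arguments map doc id -> rank position; shared docs only."""
    docs = [d for d in rank_a if d in rank_b]
    concordant = discordant = 0
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            a = rank_a[docs[i]] - rank_a[docs[j]]
            b = rank_b[docs[i]] - rank_b[docs[j]]
            if a * b > 0:
                concordant += 1
            elif a * b < 0:
                discordant += 1
    n = len(docs)
    return (concordant - discordant) / (n * (n - 1) / 2)

baseline = {"j1": 1, "j2": 2, "j3": 3, "j4": 4}
new      = {"j1": 1, "j3": 2, "j2": 3, "j4": 4}   # j2/j3 swapped
print(kendall_tau(baseline, new))  # 4/6 concordant-minus-discordant -> ~0.667
```

A tau close to 1 on replayed traffic suggests the cheaper generated query preserves the production ordering.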
- Passive job seekers: allow them to discover what is available in the marketplace.
- Not a lot of data for passives; show them jobs that make the most sense given their current career history and experience
- Active job seekers: reinforce their job-seeking experience. Show them jobs similar to ones they applied to that they may have missed. Make sure they don't miss opportunities
- Powers a lot of modules including jobs home, feed, email, ads
Cite any examples
- Content-based recommendations, personalization, and collaboration are incorporated through a GLMM model
- Generalized: link function
- Mixed: an ensemble of models combined additively
- Fixed effect model plus random effects for variation; all linear models
- Examples of features in each model; dense features generalize by sharing learning across examples in a linear model
- Sparse features for memorization; need to choose good features to memorize
- Random-effect sparse features model personal affinity
- Goal: get people hired (confirmed hires)
- Time lag on that signal, so use a proxy: total job applies
- Metric: total job applies
- Optimize probability of apply, not view: showing users popular/attractive jobs is not as important as showing them actual good matches
- User, job, activity