Deep natural language processing in search systems
1. AI Tech Meetup #
Deep Natural Language Processing in Search Systems
Weiwei Guo, Xiaowei Liu, Huiji Gao
LinkedIn AI
05/09/2019
2. AI Tech Meetup 2
Natural Language Data in Search Systems
Job Posts,
Member Profiles,
Articles,
Courses,
etc.
Query
Member
Profile
Natural Language Data
Ranking
Personalization
3. AI Tech Meetup 3
Natural Language Processing in Search Ecosystem
Query
Tagging
Retrieval & Ranking
Raw Query
Query
Understanding
Query
Intention
Document
Understanding
Auto
Completion
Spell
Check
Query
Suggestion
Natural Language Generation
for Search Assistant
Natural Language Understanding
Indexing Matching
Text
Processing
Language
Detection
Tokenization
4. AI Tech Meetup 4
NLP in Search Systems: Challenges
â Data Ambiguity
â Short query text
â âabcâ
â No strict syntax
â âbing search engineerâ
â Strong correlation to the searcher
â âlooking for new jobsâ
â Deep Semantics
â Representations for query & document w.r.t.
search intent, entities, topics
â âEngineering Openingsâ -> job posts
5. AI Tech Meetup 5
Deep NLP in Search: Challenges
â Complicated Search Ecosystem
â Query suggestion affects both recall and precision in downstream retrieval
and ranking.
â Query tagging needs to be compatible with indexing and align with ranking
features.
â Product Oriented Model Design
â Design deep NLP algorithms for specific search components
â Consider business rules, post filters, results blender, user experience, etc
â Online Latency
â Serving deep NLP models with product latency restriction
6. AI Tech Meetup 6
Applying Deep NLP in Search
â Feature Driven
â Representation Learning
using features generated from deep learning models
e.g., word embedding
â Model Driven
â Power product features directly with deep learning models
â CNN/LSTM/Seq2seq/GAN/BERT based deep NLP models
7. AI Tech Meetup 7
Sequential Modeling
Query Intention
(Classification)
Document Ranking
(Neural Ranker)
to label
to document
Query Suggestion
(seq2seq) to next
sequence
Auto-completion
(Language Modeling)to completed
sequence
Deep Learning for Natural Language Processing
Query Tagger
(Sequential
Tagging)
to word labels
8. AI Tech Meetup 8
Sequential Modeling
Query Intention
(Classification)
Document Ranking
(Neural Ranker)
to label
to document
Query Suggestion
(seq2seq) to next
sequence
Auto-completion
(Language Modeling)to completed
sequence
Deep Learning for Natural Language Processing
Query Tagger
(Sequential
Tagging)
to word labels
9. AI Tech Meetup 9
Natural Language Generation: Auto-Completion
âȘ Given a prefix, predict the completed
query, rather than the completed word
10. AI Tech Meetup 10
Auto-completion Challenges
âȘ High requirement of latency
â Have to adopt simple and fast models
âȘ Traditional methods cannot encode powerful textual features
â Usually the helpful feature is the query frequency
11. AI Tech Meetup 11
A Two-step Approach: Generation and Ranking
âȘ Candidate Generation
â Collect query frequency from search log
â For any prefix, FST returns the most frequent completed queries
â When a prefix is never seen, remove one word from beginning
âȘ e.g., âmetamind sofâ â âsofâ
âȘ Candidate Ranking
â Neural Language Model serves as a scoring function
12. AI Tech Meetup 12
Auto-Completion: Neural Language Model as Scoring/Ranking
âȘ Achieved 6x speedup with optimization
13. AI Tech Meetup 13
Auto-Completion Offline Experiments
Generation Ranking dev test
MPC 18.097 17.032
FST-backoff 33.274 (+83.8%) 32.331 (+89.8%)
FST-backoff neural 34.877 (+92.7%) 33.784 (+98.3%)
âȘ Public Data Set: AOL
â Baseline: Most Popular Candidate (MPC), FST-backoff
â Evaluation metrics: MRR, 4th place â 0.25
14. AI Tech Meetup 14
Sequential Modeling
Query Intention
(Classification)
Document Ranking
(Neural Ranker)
to label
to document
Query Suggestion
(seq2seq) to next
sequence
Auto-completion
(Language Modeling)to completed
sequence
Deep Learning for Natural Language Processing
Query Tagger
(Sequential
Tagging)
to word labels
15. AI Tech Meetup 15
Natural Language Generation: Query Suggestion
âȘ Traditional frequency based methods
â Collect <q1, q2> pairs from search log
â Save the frequent pairs in a key-value store
âȘ Lack of generalization
â Purely string matching
â Cannot handle unseen queries, rare words
âȘ Seq2seq: capture query reformulation
16. AI Tech Meetup 16
encoder
Query Suggestion: Reformulate to Related Queries
â Training: the 2nd query is given
â Maximize
linkedin engineering <s> linkedin software
linkedin software development </s>
development
decoder
17. AI Tech Meetup 17
decoderencoder
Query Suggestion: Reformulate to Related Queries
linkedin engineering <s>
linkedin software development </s>
â Inference: the 2nd query is unknown
â Beam search instead of greedy search
18. AI Tech Meetup 18
Query Suggestion: How to Handle Online Latency
âȘ Latency too long for one query
âȘ Make it parallel with search ranking
19. AI Tech Meetup 19
Online Performance
âȘ English Market
â Coverage: +80% Impressions, +26% CTR
â +1% Total job application
âȘ I18n Market
â +3.2% Successful Searches
â +28.6% First Order Actions
20. AI Tech Meetup 20
Sequential Modeling
Query Intention
(Classification)
Document Ranking
(Neural Ranker)
to label
to document
Query Suggestion
(seq2seq) to next
sequence
Auto-completion
(Language Modeling)to completed
sequence
Deep Learning for Natural Language Processing
Query Tagger
(Sequential
Tagging)
to word labels
21. AI Tech Meetup 21
Natural Language Understanding: Query Intent
Motivation:
âȘ To understand the task
user wants to finish on
LinkedIn
Challenges:
âȘ Complicated Semantics
âȘ Personalization
Steve Jobs
22. Personalization & Search AI Foundation 22AI Algorithms FoundationAI Algorithms Foundation
CNN Based Query Intention Model
CNN for Semantic Feature
Extraction
âȘ Word/query representations
âȘ Generalization power
âȘ Word n-gram patterns
Personalization
âȘ Member-level Features
23. AI Tech Meetup 23
Query Intent
âȘ Offline results
Overall Accuracy F1 on PEOPLE F1 on JOB
LR (Baseline) - - -
LR + BOW +1.3% +4.1% +0.8%
CNN +2.9% +11.9% +1.7%
âȘ Online results
â + 0.65% JOB Ctr At 1 Serp
â + 0.90% Overall Cluster Ctr, + 4.03% Cluster Ctr Via Entity Click
24. AI Tech Meetup 24
Sequential Modeling
Query Intention
(Classification)
Document Ranking
(Neural Ranker)
to label
to document
Query Suggestion
(seq2seq) to next
sequence
Auto-completion
(Language Modeling)to completed
sequence
Deep Learning for Natural Language Processing
Query Tagger
(Sequential
Tagging)
to word labels
25. AI Tech Meetup 25
Natural Language Understanding: Query Tagger
LinkedIn software engineer data scientist jobsâ
CNâ Tâ Tâ Tâ Tâ Oâ
B-CNâ B-Tâ I-Tâ B-Tâ I-Tâ Oâ
B-CN: beginning of a company nameâ
I-CN: Inside of a company nameâ
B-T: beginning of a job title
I-T: Inside of a job title
O: Not an entityâ
B-PN: beginning of person name
âŠ
26. AI Tech Meetup 26
Query Tagger: Logistic Regression
LinkedIn software engineer data scientist jobsâ
B-CNâ B-Tâ I-Tâ B-Tâ I-Tâ Oâ
N way classification
sparse ftr vec
ftr 0: whether the current word is âlinkedinâ
ftr 1: whether the current word is âfacebookâ
âŠ
ftr n: whether the next word is âsoftwareâ
ftr n+1: whether the next word is âlinkedinâ
âŠ.
28. AI Tech Meetup 28
Query Tagger: CRF + LSTM
LinkedIn software engineer data scientist jobsâ
B-CNâ B-Tâ I-Tâ B-Tâ I-Tâ Oâ
CRF Layer
LSTM hidden state
Word embedding
29. AI Tech Meetup 29
Sequential Modeling
Query Intention
(Classification)
Document Ranking
(Neural Ranker)
to label
to document
Query Suggestion
(seq2seq) to next
sequence
Auto-completion
(Language Modeling)to completed
sequence
Deep Learning for Natural Language Processing
Query Tagger
(Sequential
Tagging)
to word labels
30. AI Tech Meetup 30
Natural Language Understanding: Document Ranking
31. AI Tech Meetup 31
Neural Ranking: Scoring a Query/Document Pair
query/doc score
hidden layer
wide ftrs
document
CNN/
BERT
query
CNN/
BERT
doc_title
CNN/
BERT
doc_text
deep ftrs
text
embedding
User profile
CNN/
BERT
user title
CNN/
BERT
user company
raw text
CNN/
BERT
behavior seqhistorical
queries
33. AI Tech Meetup 33
Offline Experiments
People Search Ranking
(NDCG@10)
Wide Features
CNN +1.32%
BERT (google
pretrained)
+1.52%
BERT (linkedin
pretrained)
+1.96%
34. AI Tech Meetup 34
Online Experiments
âȘ Help center ranking (BERT vs CNN)
â +9.5% search ctr
â +7% search clicks
35. AI Tech Meetup 35
Conclusions & Future Work
âȘ Deep NLP technologies can power various search components
directly with proper handling on challenges such as latency
âȘ Design deep NLP technologies needs to consider the specific product
properties as well as user experience
âȘ Multilingual learning can be exploited for developing advanced deep
NLP technologies