SlideShare a Scribd company logo
1 of 36
AI Tech Meetup #
Deep Natural Language Processing in Search Systems
Weiwei Guo, Xiaowei Liu, Huiji Gao
LinkedIn AI
05/09/2019
AI Tech Meetup 2
Natural Language Data in Search Systems
Job Posts,
Member Profiles,
Articles,
Courses,
etc.
Query
Member
Profile
Natural Language Data
Ranking
Personalization
AI Tech Meetup 3
Natural Language Processing in Search Ecosystem
Query
Tagging
Retrieval & Ranking
Raw Query
Query
Understanding
Query
Intention
Document
Understanding
Auto
Completion
Spell
Check
Query
Suggestion
Natural Language Generation
for Search Assistant
Natural Language Understanding
Indexing Matching
Text
Processing
Language
Detection
Tokenization
AI Tech Meetup 4
NLP in Search Systems: Challenges
● Data Ambiguity
○ Short query text
■ “abc”
○ No strict syntax
■ “bing search engineer”
○ Strong correlation to the searcher
■ “looking for new jobs”
● Deep Semantics
○ Representations for query & document w.r.t.
search intent, entities, topics
■ “Engineering Openings” -> job posts
AI Tech Meetup 5
Deep NLP in Search: Challenges
● Complicated Search Ecosystem
○ Query suggestion affects both recall and precision in downstream retrieval
and ranking.
○ Query tagging needs to be compatible with indexing and align with ranking
features.
● Product Oriented Model Design
○ Design deep NLP algorithms for specific search components
○ Consider business rules, post filters, results blender, user experience, etc
● Online Latency
○ Serving deep NLP models with product latency restriction
AI Tech Meetup 6
Applying Deep NLP in Search
● Feature Driven
○ Representation Learning
using features generated from deep learning models
e.g., word embedding
● Model Driven
○ Power product features directly with deep learning models
■ CNN/LSTM/Seq2seq/GAN/BERT based deep NLP models
AI Tech Meetup 7
Sequential Modeling
Query Intention
(Classification)
Document Ranking
(Neural Ranker)
to label
to document
Query Suggestion
(seq2seq) to next
sequence
Auto-completion
(Language Modeling)to completed
sequence
Deep Learning for Natural Language Processing
Query Tagger
(Sequential
Tagging)
to word labels
AI Tech Meetup 8
Sequential Modeling
Query Intention
(Classification)
Document Ranking
(Neural Ranker)
to label
to document
Query Suggestion
(seq2seq) to next
sequence
Auto-completion
(Language Modeling)to completed
sequence
Deep Learning for Natural Language Processing
Query Tagger
(Sequential
Tagging)
to word labels
AI Tech Meetup 9
Natural Language Generation: Auto-Completion
â–Ș Given a prefix, predict the completed
query, rather than the completed word
AI Tech Meetup 10
Auto-completion Challenges
â–Ș High requirement of latency
– Have to adopt simple and fast models
â–Ș Traditional methods cannot encode powerful textual features
– Usually the helpful feature is the query frequency
AI Tech Meetup 11
A Two-step Approach: Generation and Ranking
â–Ș Candidate Generation
– Collect query frequency from search log
– For any prefix, FST returns the most frequent completed queries
– When a prefix is never seen, remove one word from beginning
â–Ș e.g., “metamind sof” → “sof”
â–Ș Candidate Ranking
– Neural Language Model serves as a scoring function
AI Tech Meetup 12
Auto-Completion: Neural Language Model as Scoring/Ranking
â–Ș Achieved 6x speedup with optimization
AI Tech Meetup 13
Auto-Completion Offline Experiments
Generation Ranking dev test
MPC 18.097 17.032
FST-backoff 33.274 (+83.8%) 32.331 (+89.8%)
FST-backoff neural 34.877 (+92.7%) 33.784 (+98.3%)
â–Ș Public Data Set: AOL
– Baseline: Most Popular Candidate (MPC), FST-backoff
– Evaluation metrics: MRR, 4th place → 0.25
AI Tech Meetup 14
Sequential Modeling
Query Intention
(Classification)
Document Ranking
(Neural Ranker)
to label
to document
Query Suggestion
(seq2seq) to next
sequence
Auto-completion
(Language Modeling)to completed
sequence
Deep Learning for Natural Language Processing
Query Tagger
(Sequential
Tagging)
to word labels
AI Tech Meetup 15
Natural Language Generation: Query Suggestion
â–Ș Traditional frequency based methods
– Collect <q1, q2> pairs from search log
– Save the frequent pairs in a key-value store
â–Ș Lack of generalization
– Purely string matching
– Cannot handle unseen queries, rare words
â–Ș Seq2seq: capture query reformulation
AI Tech Meetup 16
encoder
Query Suggestion: Reformulate to Related Queries
■ Training: the 2nd query is given
■ Maximize
linkedin engineering <s> linkedin software
linkedin software development </s>
development
decoder
AI Tech Meetup 17
decoderencoder
Query Suggestion: Reformulate to Related Queries
linkedin engineering <s>
linkedin software development </s>
■ Inference: the 2nd query is unknown
■ Beam search instead of greedy search
AI Tech Meetup 18
Query Suggestion: How to Handle Online Latency
â–Ș Latency too long for one query
â–Ș Make it parallel with search ranking
AI Tech Meetup 19
Online Performance
â–Ș English Market
– Coverage: +80% Impressions, +26% CTR
– +1% Total job application
â–Ș I18n Market
– +3.2% Successful Searches
– +28.6% First Order Actions
AI Tech Meetup 20
Sequential Modeling
Query Intention
(Classification)
Document Ranking
(Neural Ranker)
to label
to document
Query Suggestion
(seq2seq) to next
sequence
Auto-completion
(Language Modeling)to completed
sequence
Deep Learning for Natural Language Processing
Query Tagger
(Sequential
Tagging)
to word labels
AI Tech Meetup 21
Natural Language Understanding: Query Intent
Motivation:
â–Ș To understand the task
user wants to finish on
LinkedIn
Challenges:
â–Ș Complicated Semantics
â–Ș Personalization
Steve Jobs
Personalization & Search AI Foundation 22AI Algorithms FoundationAI Algorithms Foundation
CNN Based Query Intention Model
CNN for Semantic Feature
Extraction
â–Ș Word/query representations
â–Ș Generalization power
â–Ș Word n-gram patterns
Personalization
â–Ș Member-level Features
AI Tech Meetup 23
Query Intent
â–Ș Offline results
Overall Accuracy F1 on PEOPLE F1 on JOB
LR (Baseline) - - -
LR + BOW +1.3% +4.1% +0.8%
CNN +2.9% +11.9% +1.7%
â–Ș Online results
– + 0.65% JOB Ctr At 1 Serp
– + 0.90% Overall Cluster Ctr, + 4.03% Cluster Ctr Via Entity Click
AI Tech Meetup 24
Sequential Modeling
Query Intention
(Classification)
Document Ranking
(Neural Ranker)
to label
to document
Query Suggestion
(seq2seq) to next
sequence
Auto-completion
(Language Modeling)to completed
sequence
Deep Learning for Natural Language Processing
Query Tagger
(Sequential
Tagging)
to word labels
AI Tech Meetup 25
Natural Language Understanding: Query Tagger
LinkedIn software engineer data scientist jobs​
CN​ T​ T​ T​ T​ O​
B-CN​ B-T​ I-T​ B-T​ I-T​ O​
B-CN: beginning of a company name​
I-CN: Inside of a company name​
B-T: beginning of a job title
I-T: Inside of a job title
O: Not an entity​
B-PN: beginning of person name


AI Tech Meetup 26
Query Tagger: Logistic Regression
LinkedIn software engineer data scientist jobs​
B-CN​ B-T​ I-T​ B-T​ I-T​ O​
N way classification
sparse ftr vec
ftr 0: whether the current word is “linkedin”
ftr 1: whether the current word is “facebook”


ftr n: whether the next word is “software”
ftr n+1: whether the next word is “linkedin”

.
AI Tech Meetup 27
Query Tagger: CRF
LinkedIn software engineer data scientist jobs​
B-CN​ B-T​ I-T​ B-T​ I-T​ O​
CRF Layer
sparse ftr vec
AI Tech Meetup 28
Query Tagger: CRF + LSTM
LinkedIn software engineer data scientist jobs​
B-CN​ B-T​ I-T​ B-T​ I-T​ O​
CRF Layer
LSTM hidden state
Word embedding
AI Tech Meetup 29
Sequential Modeling
Query Intention
(Classification)
Document Ranking
(Neural Ranker)
to label
to document
Query Suggestion
(seq2seq) to next
sequence
Auto-completion
(Language Modeling)to completed
sequence
Deep Learning for Natural Language Processing
Query Tagger
(Sequential
Tagging)
to word labels
AI Tech Meetup 30
Natural Language Understanding: Document Ranking
AI Tech Meetup 31
Neural Ranking: Scoring a Query/Document Pair
query/doc score
hidden layer
wide ftrs
document
CNN/
BERT
query
CNN/
BERT
doc_title
CNN/
BERT
doc_text
deep ftrs
text
embedding
User profile
CNN/
BERT
user title
CNN/
BERT
user company
raw text
CNN/
BERT
behavior seqhistorical
queries
AI Tech Meetup 32
Neural Ranking: Training
AI Tech Meetup 33
Offline Experiments
People Search Ranking
(NDCG@10)
Wide Features
CNN +1.32%
BERT (google
pretrained)
+1.52%
BERT (linkedin
pretrained)
+1.96%
AI Tech Meetup 34
Online Experiments
â–Ș Help center ranking (BERT vs CNN)
– +9.5% search ctr
– +7% search clicks
AI Tech Meetup 35
Conclusions & Future Work
â–Ș Deep NLP technologies can power various search components
directly with proper handling on challenges such as latency
â–Ș Design deep NLP technologies needs to consider the specific product
properties as well as user experience
â–Ș Multilingual learning can be exploited for developing advanced deep
NLP technologies
AI Tech Meetup #

More Related Content

What's hot

Arabic question answering ‫‏
Arabic question answering ‫‏Arabic question answering ‫‏
Arabic question answering ‫‏
Arabic_NLP_ImamU2013
 
Aspect Based Sentiment Analysis
Aspect Based Sentiment AnalysisAspect Based Sentiment Analysis
Aspect Based Sentiment Analysis
Gaurav kumar
 
Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineer...
Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineer...Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineer...
Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineer...
Preetha Chatterjee
 
Answer Selection and Validation for Arabic Questions
Answer Selection and Validation for Arabic QuestionsAnswer Selection and Validation for Arabic Questions
Answer Selection and Validation for Arabic Questions
Ahmed Magdy Ezzeldin, MSc.
 
Grosof haley-talk-semtech2013-ver6-10-13
Grosof haley-talk-semtech2013-ver6-10-13Grosof haley-talk-semtech2013-ver6-10-13
Grosof haley-talk-semtech2013-ver6-10-13
Brian Ulicny
 

What's hot (20)

Transformers to Learn Hierarchical Contexts in Multiparty Dialogue
Transformers to Learn Hierarchical Contexts in Multiparty DialogueTransformers to Learn Hierarchical Contexts in Multiparty Dialogue
Transformers to Learn Hierarchical Contexts in Multiparty Dialogue
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...
 
Code Mixing computationally bahut challenging hai
Code Mixing computationally bahut challenging haiCode Mixing computationally bahut challenging hai
Code Mixing computationally bahut challenging hai
 
Arabic question answering ‫‏
Arabic question answering ‫‏Arabic question answering ‫‏
Arabic question answering ‫‏
 
AINL 2016: Kravchenko
AINL 2016: KravchenkoAINL 2016: Kravchenko
AINL 2016: Kravchenko
 
Implications of GPT-3
Implications of GPT-3Implications of GPT-3
Implications of GPT-3
 
The NLP Muppets revolution!
The NLP Muppets revolution!The NLP Muppets revolution!
The NLP Muppets revolution!
 
OpenAI’s GPT 3 Language Model - guest Steve Omohundro
OpenAI’s GPT 3 Language Model - guest Steve OmohundroOpenAI’s GPT 3 Language Model - guest Steve Omohundro
OpenAI’s GPT 3 Language Model - guest Steve Omohundro
 
Icpc11c.ppt
Icpc11c.pptIcpc11c.ppt
Icpc11c.ppt
 
AINL 2016: Bastrakova, Ledesma, Millan, Zighed
AINL 2016: Bastrakova, Ledesma, Millan, ZighedAINL 2016: Bastrakova, Ledesma, Millan, Zighed
AINL 2016: Bastrakova, Ledesma, Millan, Zighed
 
A Hierarchical Model of Reviews for Aspect-based Sentiment Analysis
A Hierarchical Model of Reviews for Aspect-based Sentiment AnalysisA Hierarchical Model of Reviews for Aspect-based Sentiment Analysis
A Hierarchical Model of Reviews for Aspect-based Sentiment Analysis
 
Natural Language Processing with Graph Databases and Neo4j
Natural Language Processing with Graph Databases and Neo4jNatural Language Processing with Graph Databases and Neo4j
Natural Language Processing with Graph Databases and Neo4j
 
Aspect Based Sentiment Analysis
Aspect Based Sentiment AnalysisAspect Based Sentiment Analysis
Aspect Based Sentiment Analysis
 
A brief primer on OpenAI's GPT-3
A brief primer on OpenAI's GPT-3A brief primer on OpenAI's GPT-3
A brief primer on OpenAI's GPT-3
 
Extracting Archival-Quality Information from Software-Related Chats
Extracting Archival-Quality Information from Software-Related ChatsExtracting Archival-Quality Information from Software-Related Chats
Extracting Archival-Quality Information from Software-Related Chats
 
Arabic Question Answering: Challenges, Tasks, Approaches, Test-sets, Tools, A...
Arabic Question Answering: Challenges, Tasks, Approaches, Test-sets, Tools, A...Arabic Question Answering: Challenges, Tasks, Approaches, Test-sets, Tools, A...
Arabic Question Answering: Challenges, Tasks, Approaches, Test-sets, Tools, A...
 
Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineer...
Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineer...Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineer...
Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineer...
 
Improving classification accuracy for customer contact transcriptions
Improving classification accuracy for customer contact transcriptionsImproving classification accuracy for customer contact transcriptions
Improving classification accuracy for customer contact transcriptions
 
Answer Selection and Validation for Arabic Questions
Answer Selection and Validation for Arabic QuestionsAnswer Selection and Validation for Arabic Questions
Answer Selection and Validation for Arabic Questions
 
Grosof haley-talk-semtech2013-ver6-10-13
Grosof haley-talk-semtech2013-ver6-10-13Grosof haley-talk-semtech2013-ver6-10-13
Grosof haley-talk-semtech2013-ver6-10-13
 

Similar to Deep natural language processing in search systems

WISS QA Do it yourself Question answering over Linked Data
WISS QA Do it yourself Question answering over Linked DataWISS QA Do it yourself Question answering over Linked Data
WISS QA Do it yourself Question answering over Linked Data
Andre Freitas
 
Salesforce Architect Group, Frederick, United States July 2023 - Generative A...
Salesforce Architect Group, Frederick, United States July 2023 - Generative A...Salesforce Architect Group, Frederick, United States July 2023 - Generative A...
Salesforce Architect Group, Frederick, United States July 2023 - Generative A...
NadinaLisbon1
 

Similar to Deep natural language processing in search systems (20)

AI, Search, and the Disruption of Knowledge Management
AI, Search, and the Disruption of Knowledge ManagementAI, Search, and the Disruption of Knowledge Management
AI, Search, and the Disruption of Knowledge Management
 
Structure, Personalization, Scale: A Deep Dive into LinkedIn Search
Structure, Personalization, Scale: A Deep Dive into LinkedIn SearchStructure, Personalization, Scale: A Deep Dive into LinkedIn Search
Structure, Personalization, Scale: A Deep Dive into LinkedIn Search
 
Programming Languages: Trends for 2021
Programming Languages: Trends for 2021Programming Languages: Trends for 2021
Programming Languages: Trends for 2021
 
Machine learning 101 Talk at Freshworks
Machine learning 101 Talk at FreshworksMachine learning 101 Talk at Freshworks
Machine learning 101 Talk at Freshworks
 
WISS QA Do it yourself Question answering over Linked Data
WISS QA Do it yourself Question answering over Linked DataWISS QA Do it yourself Question answering over Linked Data
WISS QA Do it yourself Question answering over Linked Data
 
Data Workflows for Machine Learning - SF Bay Area ML
Data Workflows for Machine Learning - SF Bay Area MLData Workflows for Machine Learning - SF Bay Area ML
Data Workflows for Machine Learning - SF Bay Area ML
 
Visualising the world of competitive programming with Python (Codeforces)
Visualising the world of competitive programming with Python (Codeforces)Visualising the world of competitive programming with Python (Codeforces)
Visualising the world of competitive programming with Python (Codeforces)
 
Salesforce Architect Group, Frederick, United States July 2023 - Generative A...
Salesforce Architect Group, Frederick, United States July 2023 - Generative A...Salesforce Architect Group, Frederick, United States July 2023 - Generative A...
Salesforce Architect Group, Frederick, United States July 2023 - Generative A...
 
Fast, Lenient, and Accurate – Building Personalized Instant Search Experience...
Fast, Lenient, and Accurate – Building Personalized Instant Search Experience...Fast, Lenient, and Accurate – Building Personalized Instant Search Experience...
Fast, Lenient, and Accurate – Building Personalized Instant Search Experience...
 
Navigating Semantic Search
Navigating Semantic SearchNavigating Semantic Search
Navigating Semantic Search
 
Reflected intelligence evolving self-learning data systems
Reflected intelligence  evolving self-learning data systemsReflected intelligence  evolving self-learning data systems
Reflected intelligence evolving self-learning data systems
 
Data Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLData Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAML
 
How to Write Amazing Functional Analysis Documents for your SharePoint Projects
How to Write Amazing Functional Analysis Documents for your SharePoint Projects How to Write Amazing Functional Analysis Documents for your SharePoint Projects
How to Write Amazing Functional Analysis Documents for your SharePoint Projects
 
Understanding Queries through Entities
Understanding Queries through EntitiesUnderstanding Queries through Entities
Understanding Queries through Entities
 
Breaking Through The Challenges of Scalable Deep Learning for Video Analytics
Breaking Through The Challenges of Scalable Deep Learning for Video AnalyticsBreaking Through The Challenges of Scalable Deep Learning for Video Analytics
Breaking Through The Challenges of Scalable Deep Learning for Video Analytics
 
[PythonPH] Transforming the call center with Text mining and Deep learning (C...
[PythonPH] Transforming the call center with Text mining and Deep learning (C...[PythonPH] Transforming the call center with Text mining and Deep learning (C...
[PythonPH] Transforming the call center with Text mining and Deep learning (C...
 
How will development change with LLMs
How will development change with LLMsHow will development change with LLMs
How will development change with LLMs
 
Semantic Search on the Rise
Semantic Search on the RiseSemantic Search on the Rise
Semantic Search on the Rise
 
The Relevance of the Apache Solr Semantic Knowledge Graph
The Relevance of the Apache Solr Semantic Knowledge GraphThe Relevance of the Apache Solr Semantic Knowledge Graph
The Relevance of the Apache Solr Semantic Knowledge Graph
 
Building a Meta-search Engine
Building a Meta-search EngineBuilding a Meta-search Engine
Building a Meta-search Engine
 

More from Bill Liu

More from Bill Liu (20)

Walk Through a Real World ML Production Project
Walk Through a Real World ML Production ProjectWalk Through a Real World ML Production Project
Walk Through a Real World ML Production Project
 
Redefining MLOps with Model Deployment, Management and Observability in Produ...
Redefining MLOps with Model Deployment, Management and Observability in Produ...Redefining MLOps with Model Deployment, Management and Observability in Produ...
Redefining MLOps with Model Deployment, Management and Observability in Produ...
 
Productizing Machine Learning at the Edge
Productizing Machine Learning at the EdgeProductizing Machine Learning at the Edge
Productizing Machine Learning at the Edge
 
Transformers in Vision: From Zero to Hero
Transformers in Vision: From Zero to HeroTransformers in Vision: From Zero to Hero
Transformers in Vision: From Zero to Hero
 
Deep AutoViML For Tensorflow Models and MLOps Workflows
Deep AutoViML For Tensorflow Models and MLOps WorkflowsDeep AutoViML For Tensorflow Models and MLOps Workflows
Deep AutoViML For Tensorflow Models and MLOps Workflows
 
Metaflow: The ML Infrastructure at Netflix
Metaflow: The ML Infrastructure at NetflixMetaflow: The ML Infrastructure at Netflix
Metaflow: The ML Infrastructure at Netflix
 
Practical Crowdsourcing for ML at Scale
Practical Crowdsourcing for ML at ScalePractical Crowdsourcing for ML at Scale
Practical Crowdsourcing for ML at Scale
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudi
 
Deep Reinforcement Learning and Its Applications
Deep Reinforcement Learning and Its ApplicationsDeep Reinforcement Learning and Its Applications
Deep Reinforcement Learning and Its Applications
 
Big Data and AI in Fighting Against COVID-19
Big Data and AI in Fighting Against COVID-19Big Data and AI in Fighting Against COVID-19
Big Data and AI in Fighting Against COVID-19
 
Highly-scalable Reinforcement Learning RLlib for Real-world Applications
Highly-scalable Reinforcement Learning RLlib for Real-world ApplicationsHighly-scalable Reinforcement Learning RLlib for Real-world Applications
Highly-scalable Reinforcement Learning RLlib for Real-world Applications
 
Build computer vision models to perform object detection and classification w...
Build computer vision models to perform object detection and classification w...Build computer vision models to perform object detection and classification w...
Build computer vision models to perform object detection and classification w...
 
Causal Inference in Data Science and Machine Learning
Causal Inference in Data Science and Machine LearningCausal Inference in Data Science and Machine Learning
Causal Inference in Data Science and Machine Learning
 
Weekly #106: Deep Learning on Mobile
Weekly #106: Deep Learning on MobileWeekly #106: Deep Learning on Mobile
Weekly #106: Deep Learning on Mobile
 
Weekly #105: AutoViz and Auto_ViML Visualization and Machine Learning
Weekly #105: AutoViz and Auto_ViML Visualization and Machine LearningWeekly #105: AutoViz and Auto_ViML Visualization and Machine Learning
Weekly #105: AutoViz and Auto_ViML Visualization and Machine Learning
 
AISF19 - On Blending Machine Learning with Microeconomics
AISF19 - On Blending Machine Learning with MicroeconomicsAISF19 - On Blending Machine Learning with Microeconomics
AISF19 - On Blending Machine Learning with Microeconomics
 
AISF19 - Travel in the AI-First World
AISF19 - Travel in the AI-First WorldAISF19 - Travel in the AI-First World
AISF19 - Travel in the AI-First World
 
AISF19 - Unleash Computer Vision at the Edge
AISF19 - Unleash Computer Vision at the EdgeAISF19 - Unleash Computer Vision at the Edge
AISF19 - Unleash Computer Vision at the Edge
 
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
 
Toronto meetup 20190917
Toronto meetup 20190917Toronto meetup 20190917
Toronto meetup 20190917
 

Recently uploaded

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Recently uploaded (20)

Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

Deep natural language processing in search systems

  • 1. AI Tech Meetup # Deep Natural Language Processing in Search Systems Weiwei Guo, Xiaowei Liu, Huiji Gao LinkedIn AI 05/09/2019
  • 2. AI Tech Meetup 2 Natural Language Data in Search Systems Job Posts, Member Profiles, Articles, Courses, etc. Query Member Profile Natural Language Data Ranking Personalization
  • 3. AI Tech Meetup 3 Natural Language Processing in Search Ecosystem Query Tagging Retrieval & Ranking Raw Query Query Understanding Query Intention Document Understanding Auto Completion Spell Check Query Suggestion Natural Language Generation for Search Assistant Natural Language Understanding Indexing Matching Text Processing Language Detection Tokenization
  • 4. AI Tech Meetup 4 NLP in Search Systems: Challenges ● Data Ambiguity ○ Short query text ■ “abc” ○ No strict syntax ■ “bing search engineer” ○ Strong correlation to the searcher ■ “looking for new jobs” ● Deep Semantics ○ Representations for query & document w.r.t. search intent, entities, topics ■ “Engineering Openings” -> job posts
  • 5. AI Tech Meetup 5 Deep NLP in Search: Challenges ● Complicated Search Ecosystem ○ Query suggestion affects both recall and precision in downstream retrieval and ranking. ○ Query tagging needs to be compatible with indexing and align with ranking features. ● Product Oriented Model Design ○ Design deep NLP algorithms for specific search components ○ Consider business rules, post filters, results blender, user experience, etc ● Online Latency ○ Serving deep NLP models with product latency restriction
  • 6. AI Tech Meetup 6 Applying Deep NLP in Search ● Feature Driven ○ Representation Learning using features generated from deep learning models e.g., word embedding ● Model Driven ○ Power product features directly with deep learning models ■ CNN/LSTM/Seq2seq/GAN/BERT based deep NLP models
  • 7. AI Tech Meetup 7 Sequential Modeling Query Intention (Classification) Document Ranking (Neural Ranker) to label to document Query Suggestion (seq2seq) to next sequence Auto-completion (Language Modeling)to completed sequence Deep Learning for Natural Language Processing Query Tagger (Sequential Tagging) to word labels
  • 8. AI Tech Meetup 8 Sequential Modeling Query Intention (Classification) Document Ranking (Neural Ranker) to label to document Query Suggestion (seq2seq) to next sequence Auto-completion (Language Modeling)to completed sequence Deep Learning for Natural Language Processing Query Tagger (Sequential Tagging) to word labels
  • 9. AI Tech Meetup 9 Natural Language Generation: Auto-Completion â–Ș Given a prefix, predict the completed query, rather than the completed word
  • 10. AI Tech Meetup 10 Auto-completion Challenges â–Ș High requirement of latency – Have to adopt simple and fast models â–Ș Traditional methods cannot encode powerful textual features – Usually the helpful feature is the query frequency
  • 11. AI Tech Meetup 11 A Two-step Approach: Generation and Ranking â–Ș Candidate Generation – Collect query frequency from search log – For any prefix, FST returns the most frequent completed queries – When a prefix is never seen, remove one word from beginning â–Ș e.g., “metamind sof” → “sof” â–Ș Candidate Ranking – Neural Language Model serves as a scoring function
  • 12. AI Tech Meetup 12 Auto-Completion: Neural Language Model as Scoring/Ranking â–Ș Achieved 6x speedup with optimization
  • 13. AI Tech Meetup 13 Auto-Completion Offline Experiments Generation Ranking dev test MPC 18.097 17.032 FST-backoff 33.274 (+83.8%) 32.331 (+89.8%) FST-backoff neural 34.877 (+92.7%) 33.784 (+98.3%) â–Ș Public Data Set: AOL – Baseline: Most Popular Candidate (MPC), FST-backoff – Evaluation metrics: MRR, 4th place → 0.25
  • 14. AI Tech Meetup 14 Sequential Modeling Query Intention (Classification) Document Ranking (Neural Ranker) to label to document Query Suggestion (seq2seq) to next sequence Auto-completion (Language Modeling)to completed sequence Deep Learning for Natural Language Processing Query Tagger (Sequential Tagging) to word labels
  • 15. AI Tech Meetup 15 Natural Language Generation: Query Suggestion â–Ș Traditional frequency based methods – Collect <q1, q2> pairs from search log – Save the frequent pairs in a key-value store â–Ș Lack of generalization – Purely string matching – Cannot handle unseen queries, rare words â–Ș Seq2seq: capture query reformulation
  • 16. AI Tech Meetup 16 encoder Query Suggestion: Reformulate to Related Queries ■ Training: the 2nd query is given ■ Maximize linkedin engineering <s> linkedin software linkedin software development </s> development decoder
  • 17. AI Tech Meetup 17 decoderencoder Query Suggestion: Reformulate to Related Queries linkedin engineering <s> linkedin software development </s> ■ Inference: the 2nd query is unknown ■ Beam search instead of greedy search
  • 18. AI Tech Meetup 18 Query Suggestion: How to Handle Online Latency â–Ș Latency too long for one query â–Ș Make it parallel with search ranking
  • 19. AI Tech Meetup 19 Online Performance â–Ș English Market – Coverage: +80% Impressions, +26% CTR – +1% Total job application â–Ș I18n Market – +3.2% Successful Searches – +28.6% First Order Actions
  • 20. AI Tech Meetup 20 Sequential Modeling Query Intention (Classification) Document Ranking (Neural Ranker) to label to document Query Suggestion (seq2seq) to next sequence Auto-completion (Language Modeling)to completed sequence Deep Learning for Natural Language Processing Query Tagger (Sequential Tagging) to word labels
  • 21. AI Tech Meetup 21 Natural Language Understanding: Query Intent Motivation: â–Ș To understand the task user wants to finish on LinkedIn Challenges: â–Ș Complicated Semantics â–Ș Personalization Steve Jobs
  • 22. Personalization & Search AI Foundation 22AI Algorithms FoundationAI Algorithms Foundation CNN Based Query Intention Model CNN for Semantic Feature Extraction â–Ș Word/query representations â–Ș Generalization power â–Ș Word n-gram patterns Personalization â–Ș Member-level Features
  • 23. AI Tech Meetup 23 Query Intent â–Ș Offline results Overall Accuracy F1 on PEOPLE F1 on JOB LR (Baseline) - - - LR + BOW +1.3% +4.1% +0.8% CNN +2.9% +11.9% +1.7% â–Ș Online results – + 0.65% JOB Ctr At 1 Serp – + 0.90% Overall Cluster Ctr, + 4.03% Cluster Ctr Via Entity Click
  • 24. AI Tech Meetup 24 Sequential Modeling Query Intention (Classification) Document Ranking (Neural Ranker) to label to document Query Suggestion (seq2seq) to next sequence Auto-completion (Language Modeling)to completed sequence Deep Learning for Natural Language Processing Query Tagger (Sequential Tagging) to word labels
  • 25. AI Tech Meetup 25 Natural Language Understanding: Query Tagger LinkedIn software engineer data scientist jobs​ CN​ T​ T​ T​ T​ O​ B-CN​ B-T​ I-T​ B-T​ I-T​ O​ B-CN: beginning of a company name​ I-CN: Inside of a company name​ B-T: beginning of a job title I-T: Inside of a job title O: Not an entity​ B-PN: beginning of person name 

  • 26. AI Tech Meetup 26 Query Tagger: Logistic Regression LinkedIn software engineer data scientist jobs​ B-CN​ B-T​ I-T​ B-T​ I-T​ O​ N way classification sparse ftr vec ftr 0: whether the current word is “linkedin” ftr 1: whether the current word is “facebook” 
 ftr n: whether the next word is “software” ftr n+1: whether the next word is “linkedin” 
.
  • 27. AI Tech Meetup 27 Query Tagger: CRF LinkedIn software engineer data scientist jobs​ B-CN​ B-T​ I-T​ B-T​ I-T​ O​ CRF Layer sparse ftr vec
  • 28. AI Tech Meetup 28 Query Tagger: CRF + LSTM LinkedIn software engineer data scientist jobs​ B-CN​ B-T​ I-T​ B-T​ I-T​ O​ CRF Layer LSTM hidden state Word embedding
  • 29. AI Tech Meetup 29 Sequential Modeling Query Intention (Classification) Document Ranking (Neural Ranker) to label to document Query Suggestion (seq2seq) to next sequence Auto-completion (Language Modeling)to completed sequence Deep Learning for Natural Language Processing Query Tagger (Sequential Tagging) to word labels
  • 30. AI Tech Meetup 30 Natural Language Understanding: Document Ranking
  • 31. AI Tech Meetup 31 Neural Ranking: Scoring a Query/Document Pair query/doc score hidden layer wide ftrs document CNN/ BERT query CNN/ BERT doc_title CNN/ BERT doc_text deep ftrs text embedding User profile CNN/ BERT user title CNN/ BERT user company raw text CNN/ BERT behavior seqhistorical queries
  • 32. AI Tech Meetup 32 Neural Ranking: Training
  • 33. AI Tech Meetup 33 Offline Experiments People Search Ranking (NDCG@10) Wide Features CNN +1.32% BERT (google pretrained) +1.52% BERT (linkedin pretrained) +1.96%
  • 34. AI Tech Meetup 34 Online Experiments â–Ș Help center ranking (BERT vs CNN) – +9.5% search ctr – +7% search clicks
  • 35. AI Tech Meetup 35 Conclusions & Future Work â–Ș Deep NLP technologies can power various search components directly with proper handling on challenges such as latency â–Ș Design deep NLP technologies needs to consider the specific product properties as well as user experience â–Ș Multilingual learning can be exploited for developing advanced deep NLP technologies