SlideShare a Scribd company logo
1 of 23
Download to read offline
Query Expansion with Locally-
Trained Word Embeddings
Fernando Bhaskar Mitra Nick Craswell
Microsoft
p(d)
d
p(d)
d
q
p(d|q)
cut
global local*
cutting tax
squeeze deficit
reduce vote
slash budget
reduction reduction
spend house
lower bill
halve plan
soften spend
freeze billion
global: trained using full
corpus
local: trained using topically-
*gas
global local
t-SNE projection: top words by ˜p(d|q) (blue: query; red: top words by p(d|q))
• local term clustering [Lesk 1968, Attar and Fraenkel
1977]
• local latent semantic analysis [Hull 1995, Hull, 1994;
Schutze et al., 1995; Singhal et al., 1997]
• local document clustering [Tombros and van
Rijsbergen, 2001; Tombros et al., 2002; Willett, 1985]
• one sense per discourse [Gale et al., 1992]
target
corpus
query
results
q = [gas:1.0 tax:1.0 petroleum:0.0 tariff:0.0 …]
query = gas tax
q = [gas:1.0 tax:1.0 petroleum:0.0 tariff:0.0 …]
query = gas tax
d = [gas:0.0 tax:0.0 petroleum:0.7 tariff:0.5 …]
q = [gas:1.0 tax:1.0 petroleum:0.0 tariff:0.0 …]
query = gas tax
…
gas petroleum:0.9 indigestion:0.6 …
tax tariff:0.7 strain:0.4 …
…[ ]W=
q = [gas:1.0 tax:1.0 petroleum:0.8 tariff:0.6 …]
query = gas tax
d = [gas:0.0 tax:0.0 petroleum:0.7 tariff:0.5 …]
W = UUT
U m ⇥ k embedding matrix
p(d)
d
q
p(d|q)
p(d)
d
q
˜p(d|q)
target
corpus
query
results
external
corpus
query
results
U =
8
>>><
>>>:
uniform p(d) on the target corpus
uniform p(d) on an external corpus
p(d|q) on the target corpus
p(d|q) on an external corpus
docs words queries
trec12 469,949 438,338 150
robust 528,155 665,128 250
web 50,220,423 90,411,624 200
global local
target target
wikipedia+gigaword* gigaword†
google news* wikipedia†
*publicly available embedding; †publicly available external corpus
target
corpus
query
results
external
corpus
query
results
target
corpus
query
results
target
corpus
query
results
external
corpus
query
results
trec12 robust web
local vs global
NDCG@10
0.0
0.1
0.2
0.3
0.4
0.5
expansion
none
global
local
trec12 robust web
local embedding
NDCG@10
0.0
0.1
0.2
0.3
0.4
0.5
corpus
target
gigaword
wikipedia
• local embedding provides a stronger representation than
global embedding
• potential impact for other topic-specific natural language
processing tasks
• future work
• effectiveness improvements
• efficiency improvements

More Related Content

Viewers also liked

Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)
Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)
Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)Bhaskar Mitra
 
Using Text Embeddings for Information Retrieval
Using Text Embeddings for Information RetrievalUsing Text Embeddings for Information Retrieval
Using Text Embeddings for Information RetrievalBhaskar Mitra
 
Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)Bhaskar Mitra
 
A Proposal for Evaluating Answer Distillation from Web Data
A Proposal for Evaluating Answer Distillation from Web DataA Proposal for Evaluating Answer Distillation from Web Data
A Proposal for Evaluating Answer Distillation from Web DataBhaskar Mitra
 
Neu-IR 2016: Lessons from the Trenches
Neu-IR 2016: Lessons from the TrenchesNeu-IR 2016: Lessons from the Trenches
Neu-IR 2016: Lessons from the TrenchesBhaskar Mitra
 
Neu-ir 2016: Opening note
Neu-ir 2016: Opening noteNeu-ir 2016: Opening note
Neu-ir 2016: Opening noteBhaskar Mitra
 
Recurrent networks and beyond by Tomas Mikolov
Recurrent networks and beyond by Tomas MikolovRecurrent networks and beyond by Tomas Mikolov
Recurrent networks and beyond by Tomas MikolovBhaskar Mitra
 
A Simple Introduction to Word Embeddings
A Simple Introduction to Word EmbeddingsA Simple Introduction to Word Embeddings
A Simple Introduction to Word EmbeddingsBhaskar Mitra
 
Vectorland: Brief Notes from Using Text Embeddings for Search
Vectorland: Brief Notes from Using Text Embeddings for SearchVectorland: Brief Notes from Using Text Embeddings for Search
Vectorland: Brief Notes from Using Text Embeddings for SearchBhaskar Mitra
 
Natural Language Processing with Graph Databases and Neo4j
Natural Language Processing with Graph Databases and Neo4jNatural Language Processing with Graph Databases and Neo4j
Natural Language Processing with Graph Databases and Neo4jWilliam Lyon
 

Viewers also liked (10)

Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)
Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)
Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)
 
Using Text Embeddings for Information Retrieval
Using Text Embeddings for Information RetrievalUsing Text Embeddings for Information Retrieval
Using Text Embeddings for Information Retrieval
 
Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)
 
A Proposal for Evaluating Answer Distillation from Web Data
A Proposal for Evaluating Answer Distillation from Web DataA Proposal for Evaluating Answer Distillation from Web Data
A Proposal for Evaluating Answer Distillation from Web Data
 
Neu-IR 2016: Lessons from the Trenches
Neu-IR 2016: Lessons from the TrenchesNeu-IR 2016: Lessons from the Trenches
Neu-IR 2016: Lessons from the Trenches
 
Neu-ir 2016: Opening note
Neu-ir 2016: Opening noteNeu-ir 2016: Opening note
Neu-ir 2016: Opening note
 
Recurrent networks and beyond by Tomas Mikolov
Recurrent networks and beyond by Tomas MikolovRecurrent networks and beyond by Tomas Mikolov
Recurrent networks and beyond by Tomas Mikolov
 
A Simple Introduction to Word Embeddings
A Simple Introduction to Word EmbeddingsA Simple Introduction to Word Embeddings
A Simple Introduction to Word Embeddings
 
Vectorland: Brief Notes from Using Text Embeddings for Search
Vectorland: Brief Notes from Using Text Embeddings for SearchVectorland: Brief Notes from Using Text Embeddings for Search
Vectorland: Brief Notes from Using Text Embeddings for Search
 
Natural Language Processing with Graph Databases and Neo4j
Natural Language Processing with Graph Databases and Neo4jNatural Language Processing with Graph Databases and Neo4j
Natural Language Processing with Graph Databases and Neo4j
 

More from Bhaskar Mitra

Joint Multisided Exposure Fairness for Search and Recommendation
Joint Multisided Exposure Fairness for Search and RecommendationJoint Multisided Exposure Fairness for Search and Recommendation
Joint Multisided Exposure Fairness for Search and RecommendationBhaskar Mitra
 
What’s next for deep learning for Search?
What’s next for deep learning for Search?What’s next for deep learning for Search?
What’s next for deep learning for Search?Bhaskar Mitra
 
So, You Want to Release a Dataset? Reflections on Benchmark Development, Comm...
So, You Want to Release a Dataset? Reflections on Benchmark Development, Comm...So, You Want to Release a Dataset? Reflections on Benchmark Development, Comm...
So, You Want to Release a Dataset? Reflections on Benchmark Development, Comm...Bhaskar Mitra
 
Efficient Machine Learning and Machine Learning for Efficiency in Information...
Efficient Machine Learning and Machine Learning for Efficiency in Information...Efficient Machine Learning and Machine Learning for Efficiency in Information...
Efficient Machine Learning and Machine Learning for Efficiency in Information...Bhaskar Mitra
 
Multisided Exposure Fairness for Search and Recommendation
Multisided Exposure Fairness for Search and RecommendationMultisided Exposure Fairness for Search and Recommendation
Multisided Exposure Fairness for Search and RecommendationBhaskar Mitra
 
Neural Learning to Rank
Neural Learning to RankNeural Learning to Rank
Neural Learning to RankBhaskar Mitra
 
Neural Information Retrieval: In search of meaningful progress
Neural Information Retrieval: In search of meaningful progressNeural Information Retrieval: In search of meaningful progress
Neural Information Retrieval: In search of meaningful progressBhaskar Mitra
 
Conformer-Kernel with Query Term Independence @ TREC 2020 Deep Learning Track
Conformer-Kernel with Query Term Independence @ TREC 2020 Deep Learning TrackConformer-Kernel with Query Term Independence @ TREC 2020 Deep Learning Track
Conformer-Kernel with Query Term Independence @ TREC 2020 Deep Learning TrackBhaskar Mitra
 
Neural Learning to Rank
Neural Learning to RankNeural Learning to Rank
Neural Learning to RankBhaskar Mitra
 
Duet @ TREC 2019 Deep Learning Track
Duet @ TREC 2019 Deep Learning TrackDuet @ TREC 2019 Deep Learning Track
Duet @ TREC 2019 Deep Learning TrackBhaskar Mitra
 
Benchmarking for Neural Information Retrieval: MS MARCO, TREC, and Beyond
Benchmarking for Neural Information Retrieval: MS MARCO, TREC, and BeyondBenchmarking for Neural Information Retrieval: MS MARCO, TREC, and Beyond
Benchmarking for Neural Information Retrieval: MS MARCO, TREC, and BeyondBhaskar Mitra
 
Deep Neural Methods for Retrieval
Deep Neural Methods for RetrievalDeep Neural Methods for Retrieval
Deep Neural Methods for RetrievalBhaskar Mitra
 
Neural Learning to Rank
Neural Learning to RankNeural Learning to Rank
Neural Learning to RankBhaskar Mitra
 
Learning to Rank with Neural Networks
Learning to Rank with Neural NetworksLearning to Rank with Neural Networks
Learning to Rank with Neural NetworksBhaskar Mitra
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for SearchBhaskar Mitra
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for SearchBhaskar Mitra
 
Neural Learning to Rank
Neural Learning to RankNeural Learning to Rank
Neural Learning to RankBhaskar Mitra
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for SearchBhaskar Mitra
 
Dual Embedding Space Model (DESM)
Dual Embedding Space Model (DESM)Dual Embedding Space Model (DESM)
Dual Embedding Space Model (DESM)Bhaskar Mitra
 
Adversarial and reinforcement learning-based approaches to information retrieval
Adversarial and reinforcement learning-based approaches to information retrievalAdversarial and reinforcement learning-based approaches to information retrieval
Adversarial and reinforcement learning-based approaches to information retrievalBhaskar Mitra
 

More from Bhaskar Mitra (20)

Joint Multisided Exposure Fairness for Search and Recommendation
Joint Multisided Exposure Fairness for Search and RecommendationJoint Multisided Exposure Fairness for Search and Recommendation
Joint Multisided Exposure Fairness for Search and Recommendation
 
What’s next for deep learning for Search?
What’s next for deep learning for Search?What’s next for deep learning for Search?
What’s next for deep learning for Search?
 
So, You Want to Release a Dataset? Reflections on Benchmark Development, Comm...
So, You Want to Release a Dataset? Reflections on Benchmark Development, Comm...So, You Want to Release a Dataset? Reflections on Benchmark Development, Comm...
So, You Want to Release a Dataset? Reflections on Benchmark Development, Comm...
 
Efficient Machine Learning and Machine Learning for Efficiency in Information...
Efficient Machine Learning and Machine Learning for Efficiency in Information...Efficient Machine Learning and Machine Learning for Efficiency in Information...
Efficient Machine Learning and Machine Learning for Efficiency in Information...
 
Multisided Exposure Fairness for Search and Recommendation
Multisided Exposure Fairness for Search and RecommendationMultisided Exposure Fairness for Search and Recommendation
Multisided Exposure Fairness for Search and Recommendation
 
Neural Learning to Rank
Neural Learning to RankNeural Learning to Rank
Neural Learning to Rank
 
Neural Information Retrieval: In search of meaningful progress
Neural Information Retrieval: In search of meaningful progressNeural Information Retrieval: In search of meaningful progress
Neural Information Retrieval: In search of meaningful progress
 
Conformer-Kernel with Query Term Independence @ TREC 2020 Deep Learning Track
Conformer-Kernel with Query Term Independence @ TREC 2020 Deep Learning TrackConformer-Kernel with Query Term Independence @ TREC 2020 Deep Learning Track
Conformer-Kernel with Query Term Independence @ TREC 2020 Deep Learning Track
 
Neural Learning to Rank
Neural Learning to RankNeural Learning to Rank
Neural Learning to Rank
 
Duet @ TREC 2019 Deep Learning Track
Duet @ TREC 2019 Deep Learning TrackDuet @ TREC 2019 Deep Learning Track
Duet @ TREC 2019 Deep Learning Track
 
Benchmarking for Neural Information Retrieval: MS MARCO, TREC, and Beyond
Benchmarking for Neural Information Retrieval: MS MARCO, TREC, and BeyondBenchmarking for Neural Information Retrieval: MS MARCO, TREC, and Beyond
Benchmarking for Neural Information Retrieval: MS MARCO, TREC, and Beyond
 
Deep Neural Methods for Retrieval
Deep Neural Methods for RetrievalDeep Neural Methods for Retrieval
Deep Neural Methods for Retrieval
 
Neural Learning to Rank
Neural Learning to RankNeural Learning to Rank
Neural Learning to Rank
 
Learning to Rank with Neural Networks
Learning to Rank with Neural NetworksLearning to Rank with Neural Networks
Learning to Rank with Neural Networks
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for Search
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for Search
 
Neural Learning to Rank
Neural Learning to RankNeural Learning to Rank
Neural Learning to Rank
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for Search
 
Dual Embedding Space Model (DESM)
Dual Embedding Space Model (DESM)Dual Embedding Space Model (DESM)
Dual Embedding Space Model (DESM)
 
Adversarial and reinforcement learning-based approaches to information retrieval
Adversarial and reinforcement learning-based approaches to information retrievalAdversarial and reinforcement learning-based approaches to information retrieval
Adversarial and reinforcement learning-based approaches to information retrieval
 

Recently uploaded

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 

Recently uploaded (20)

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 

Query Expansion with Locally-Trained Word Embeddings (ACL 2016)

  • 1. Query Expansion with Locally- Trained Word Embeddings Fernando Bhaskar Mitra Nick Craswell Microsoft
  • 4. cut global local* cutting tax squeeze deficit reduce vote slash budget reduction reduction spend house lower bill halve plan soften spend freeze billion global: trained using full corpus local: trained using topically- *gas
  • 5. global local t-SNE projection: top words by ˜p(d|q) (blue: query; red: top words by p(d|q))
  • 6.
  • 7.
  • 8. • local term clustering [Lesk 1968, Attar and Fraenkel 1977] • local latent semantic analysis [Hull 1995, Hull, 1994; Schutze et al., 1995; Singhal et al., 1997] • local document clustering [Tombros and van Rijsbergen, 2001; Tombros et al., 2002; Willett, 1985] • one sense per discourse [Gale et al., 1992]
  • 10. q = [gas:1.0 tax:1.0 petroleum:0.0 tariff:0.0 …] query = gas tax
  • 11. q = [gas:1.0 tax:1.0 petroleum:0.0 tariff:0.0 …] query = gas tax d = [gas:0.0 tax:0.0 petroleum:0.7 tariff:0.5 …]
  • 12. q = [gas:1.0 tax:1.0 petroleum:0.0 tariff:0.0 …] query = gas tax … gas petroleum:0.9 indigestion:0.6 … tax tariff:0.7 strain:0.4 … …[ ]W=
  • 13. q = [gas:1.0 tax:1.0 petroleum:0.8 tariff:0.6 …] query = gas tax d = [gas:0.0 tax:0.0 petroleum:0.7 tariff:0.5 …]
  • 14. W = UUT U m ⇥ k embedding matrix
  • 18. U = 8 >>>< >>>: uniform p(d) on the target corpus uniform p(d) on an external corpus p(d|q) on the target corpus p(d|q) on an external corpus
  • 19. docs words queries trec12 469,949 438,338 150 robust 528,155 665,128 250 web 50,220,423 90,411,624 200
  • 20. global local target target wikipedia+gigaword* gigaword† google news* wikipedia† *publicly available embedding; †publicly available external corpus target corpus query results external corpus query results target corpus query results target corpus query results external corpus query results
  • 21. trec12 robust web local vs global NDCG@10 0.0 0.1 0.2 0.3 0.4 0.5 expansion none global local
  • 22. trec12 robust web local embedding NDCG@10 0.0 0.1 0.2 0.3 0.4 0.5 corpus target gigaword wikipedia
  • 23. • local embedding provides a stronger representation than global embedding • potential impact for other topic-specific natural language processing tasks • future work • effectiveness improvements • efficiency improvements