Efficient Methods for Incorporating
Knowledge into Topic Models
[Yang, Downey and Boyd-Graber 2015]
2015/10/24
EMNLP 2015 Reading
@shuyo
Large-scale Topic Model
• In academic papers
– Up to 10^3 topics
• Industrial applications
– 10^5~10^6 topics!
– Search engines, online ads, and so on
– To capture infrequent topics
• This paper handles up to 500 topics...
really?
(Standard) LDA
[Blei+ 2003, Griffiths+ 2004]
• "Conventional" Gibbs sampling
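$$P(z = t \mid \boldsymbol{z}^{-}, w) \propto q_t := (n_{d,t} + \alpha)\,\frac{n_{w,t} + \beta}{n_t + V\beta}$$
– $T$ : Topic size
– For $U \sim \mathcal{U}\left(0, \sum_{z=1}^{T} q_z\right)$, find $t$ s.t. $\sum_{z=1}^{t-1} q_z < U < \sum_{z=1}^{t} q_z$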
• For large $T$, it is computationally intensive
– $n_{w,t}$ is sparse
– When $T$ is very large, $n_{d,t}$ is sparse too, e.g. $T = 10^6 > n_d$
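The conventional sampling step above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names and the list-based count layout (`n_dt`, `n_wt`, `n_t` indexed by topic, 0-based) are assumptions made for the example.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def lda_topic_weights(n_dt, n_wt, n_t, alpha, beta, V):
    """Unnormalized q_t = (n_dt + alpha) * (n_wt + beta) / (n_t + V * beta) for every topic."""
    return [(n_dt[t] + alpha) * (n_wt[t] + beta) / (n_t[t] + V * beta)
            for t in range(len(n_t))]

def sample_topic(q, u=None, rng=random):
    """Draw topic t with probability q[t] / sum(q) by inverting the cumulative sum."""
    cumulative = list(accumulate(q))
    if u is None:
        u = rng.uniform(0.0, cumulative[-1])
    # First index whose cumulative weight exceeds u; the clamp guards against u == total.
    return min(bisect_right(cumulative, u), len(q) - 1)
```

Every call touches all $T$ topics, which is the cost the slide refers to when $T$ is large.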
SparseLDA [Yao+ 2009]
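$$P(z = t \mid \boldsymbol{z}^{-}, w) \propto \underbrace{\frac{\alpha\beta}{n_t + V\beta}}_{s_t} + \underbrace{\frac{n_{d,t}\,\beta}{n_t + V\beta}}_{r_t} + \underbrace{\frac{(n_{d,t} + \alpha)\,n_{w,t}}{n_t + V\beta}}_{q_t}$$
– $s_t$ : independent of $w$, $d$
– $r_t$ : dependent on $d$ only
• $s = \sum_t s_t$, $r = \sum_t r_t$, $q = \sum_t q_t$
• For $U \sim \mathcal{U}(0, s + r + q)$,
– If $0 < U < s$, find $t$ s.t. $\sum_{z=1}^{t-1} s_z < U < \sum_{z=1}^{t} s_z$
– If $s < U < s + r$, find $t$ s.t. $n_{d,t} > 0$, $\sum_{z=1}^{t-1} r_z < U - s < \sum_{z=1}^{t} r_z$
– If $s + r < U < s + r + q$, find $t$ s.t. $n_{w,t} > 0$, $\sum_{z=1}^{t-1} q_z < U - s - r < \sum_{z=1}^{t} q_z$
• Faster because $n_{w,t}$ and $n_{d,t}$ are sparse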
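A rough Python sketch of the three-bucket sampling described above. The dictionary layout for the sparse counts and the function names are assumptions for illustration. A real SparseLDA implementation would cache $s$ and $r$ and update them incrementally instead of recomputing them for every token, as is done here for readability.

```python
import random

def sparse_lda_buckets(n_d, n_w, n_t, alpha, beta, V):
    """Return the per-topic terms (s_t, r_t, q_t) as dicts keyed by topic.

    n_d, n_w: dicts {topic: count} holding only nonzero counts (hypothetical layout).
    n_t: list of total token counts per topic.
    """
    denom = [n + V * beta for n in n_t]
    s_terms = {t: alpha * beta / d for t, d in enumerate(denom)}
    r_terms = {t: c * beta / denom[t] for t, c in sorted(n_d.items()) if c > 0}
    q_terms = {t: (n_d.get(t, 0) + alpha) * c / denom[t]
               for t, c in sorted(n_w.items()) if c > 0}
    return s_terms, r_terms, q_terms

def sparse_lda_sample(n_d, n_w, n_t, alpha, beta, V, u=None, rng=random):
    """Pick the bucket that U falls into, then walk only that bucket's entries."""
    s_terms, r_terms, q_terms = sparse_lda_buckets(n_d, n_w, n_t, alpha, beta, V)
    s, r, q = (sum(b.values()) for b in (s_terms, r_terms, q_terms))
    if u is None:
        u = rng.uniform(0.0, s + r + q)
    if u < s:
        bucket = s_terms                  # all T topics, but rarely visited
    elif u < s + r:
        bucket, u = r_terms, u - s        # only topics with n_{d,t} > 0
    else:
        bucket, u = q_terms, u - s - r    # only topics with n_{w,t} > 0
    topic = len(n_t) - 1  # fallback if rounding pushes u past the last entry
    for topic, weight in bucket.items():
        u -= weight
        if u < 0:
            break
    return topic
```

The three terms add up to the conventional $q_t$ of standard LDA, so the sampled distribution is unchanged; only the search order differs.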
Leveraging Prior Knowledge
• The objective function of topic models
does not correlate with human
judgements
Word correlation prior
knowledge
• Must-link
– “quarterback” and “fumble” are both
related to American football
• Cannot-link
– “fumble” and “bank” imply two different
topics
SC-LDA [Yang+ 2015]
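• $m \in M$ : Prior knowledge
• $f_m(z, w, d)$ : Potential function of prior knowledge $m$ about word $w$ with topic $z$ in document $d$
• $\psi(\boldsymbol{z}, M) = \prod_{z \in \boldsymbol{z}} \exp f_m(z, w, d)$
– Note: the product is maybe over $m \in M$, all $w$ with $z$ in all $d$
• $P(\boldsymbol{w}, \boldsymbol{z} \mid \alpha, \beta, M) = P(\boldsymbol{w} \mid \boldsymbol{z}, \beta)\, P(\boldsymbol{z} \mid \alpha)\, \psi(\boldsymbol{z}, M)$
– Note: "=" is maybe "∝"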
Sparse Constrained
Inference for SC-LDA
Word correlation prior
knowledge for SC-LDA
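• $f_m(z, w, d) = \sum_{u \in M_w^m} \log \max(\lambda, n_{u,z}) + \sum_{v \in M_w^c} \log \frac{1}{\max(\lambda, n_{v,z})}$
– where $M_w^m$ : Must-link of $w$, $M_w^c$ : Cannot-link of $w$
• $P(z = t \mid \boldsymbol{z}^{-}, w, M) \propto \left( \frac{\alpha\beta}{n_t + V\beta} + \frac{n_{d,t}\,\beta}{n_t + V\beta} + \frac{(n_{d,t} + \alpha)\,n_{w,t}}{n_t + V\beta} \right) \prod_{u \in M_w^m} \max(\lambda, n_{u,z}) \prod_{v \in M_w^c} \frac{1}{\max(\lambda, n_{v,z})}$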
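To make the effect of the must-link and cannot-link factors concrete, here is a small Python sketch of the conditional above, evaluated at $z = t$. The count layout (`n_wt` as a dict keyed by `(word, topic)`), the function names and the default `lam=1.0` are assumptions for illustration only. The dense loop over topics ignores the sparse bucketing for simplicity.

```python
def knowledge_factor(t, must_links, cannot_links, n_wt, lam):
    """prod over must-links of max(lam, n_{u,t}) times prod over cannot-links of 1 / max(lam, n_{v,t})."""
    factor = 1.0
    for u in must_links:
        factor *= max(lam, n_wt.get((u, t), 0))
    for v in cannot_links:
        factor /= max(lam, n_wt.get((v, t), 0))
    return factor

def sc_lda_weights(w, n_d, n_wt, n_t, alpha, beta, V, must, cannot, lam=1.0):
    """Unnormalized P(z = t | ...) for word w: the LDA weight scaled by the knowledge factor."""
    weights = []
    for t in range(len(n_t)):
        lda = (n_d.get(t, 0) + alpha) * (n_wt.get((w, t), 0) + beta) / (n_t[t] + V * beta)
        weights.append(lda * knowledge_factor(t, must.get(w, ()), cannot.get(w, ()), n_wt, lam))
    return weights
```

With the slide's example words, a topic that already contains many "quarterback" tokens is boosted for "fumble", while a topic dominated by "bank" is penalized.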
Factor Graph
• The authors say that prior knowledge is incorporated “by adding a factor graph to encode prior knowledge,” but the factor graph is not drawn.
• The potential function $f_m(z, w, d)$ contains $n_{w,z}$, and $\varphi_{w,z} \propto n_{w,z} + \beta$.
• So the model above seems to look like Fig. b:
[Figure: Fig. a and Fig. b]
[Ramage+ 2009] Labeled LDA
• Supervised LDA for labeled documents
– It is equivalent to SC-LDA with the following potential function
$$f_m(z, w, d) = \begin{cases} 1, & \text{if } z \in m_d \\ -\infty, & \text{else} \end{cases}$$
where $m_d$ specifies a label set of $d$
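A short Python sketch of how this potential acts on the sampling weights, assuming the factor enters as $\exp f_m$ (as in $\psi$). The function names are hypothetical. Topics outside the document's label set get weight zero, and allowed topics are all scaled by the same constant $e$, which cancels after normalization.

```python
import math

def labeled_lda_potential(z, doc_labels):
    """f_m(z, w, d): 1 if topic z is in the document's label set, else -infinity."""
    return 1.0 if z in doc_labels else -math.inf

def mask_weights(weights, doc_labels):
    """Multiply each topic weight by exp(f_m); exp(-inf) == 0 removes disallowed topics."""
    return [w * math.exp(labeled_lda_potential(t, doc_labels))
            for t, w in enumerate(weights)]
```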
Experiments
• Baselines
– Dirichlet Forest-LDA [Andrzejewski+ 2009]
– Logic-LDA [Andrzejewski+ 2011]
– MRF-LDA [Xie+ 2015]
• Encodes word correlations in LDA as MRF
– SparseLDA
DATASET    DOCS        TYPES     TOKENS (APPROX.)   EXPERIMENT
NIPS       1,500       12,419    1,900,000          Word correlation
NYT-NEWS   3,000,000   102,660   100,000,000        Word correlation
20NG       18,828      21,514    1,946,000          Labeled docs
Generate Word Correlation
• Must-link
– Obtain synsets from WordNet 3.0
– Keep a pair only if the similarity between the word and its synset, measured on word2vec word embeddings, is higher than the threshold 0.2
• Cannot-link
– Nothing?
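The must-link filtering step can be sketched as below. This is only an illustration under stated assumptions: the candidate synonyms and the embedding vectors are passed in as plain dicts (standing in for WordNet and word2vec lookups), and cosine similarity is assumed as the similarity measure, which the slide does not specify.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def must_links(synonyms, embeddings, threshold=0.2):
    """Keep (word, synonym) pairs whose embedding similarity is above the threshold."""
    links = set()
    for word, candidates in synonyms.items():
        for other in candidates:
            if word == other or word not in embeddings or other not in embeddings:
                continue  # skip self-pairs and words without an embedding
            if cosine(embeddings[word], embeddings[other]) > threshold:
                links.add(tuple(sorted((word, other))))
    return links
```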
Convergence Speed
The average running time per iteration
over 100 iterations, averaged over 5
seeds, on 20NG dataset.
Coherence [Mimno+ 2011]
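• $C(t; V^{(t)}) = \sum_{m=2}^{M} \sum_{l=1}^{m-1} \log \frac{F(v_m^{(t)}, v_l^{(t)}) + \epsilon}{F(v_l^{(t)})}$
– $F(v)$ : document frequency of word type $v$
– $F(v, v')$ : co-document frequency of word types $v$, $v'$
– Note on the slide: It means “include”?
– $\epsilon$ is very small, like $10^{-12}$ [Röder+ 2015]
– Values shown on the slide: -39.1, -36.6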
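The coherence score can be computed directly from document and co-document frequencies. The following Python sketch assumes documents are given as lists of word types and that every top word occurs in at least one document (otherwise the denominator is zero). The function name and the default `eps` are illustrative choices.

```python
import math

def topic_coherence(top_words, documents, eps=1e-12):
    """Sum over ordered pairs (m > l) of log((F(v_m, v_l) + eps) / F(v_l))."""
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain all of the given word types.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            co = doc_freq(top_words[m], top_words[l])
            score += math.log((co + eps) / doc_freq(top_words[l]))
    return score
```

With a very small `eps`, a single pair of top words that never co-occur contributes a large negative term, so the choice of `eps` strongly affects the reported values.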
References
• [Yang+ 2015] Efficient Methods for Incorporating Knowledge into Topic Models
• [Blei+ 2003] Latent Dirichlet allocation.
• [Griffiths+ 2004] Finding scientific topics.
• [Yao+ 2009] Efficient methods for topic model inference on streaming document
collections.
• [Ramage+ 2009] Labeled LDA: A supervised topic model for credit attribution in
multilabeled corpora.
• [Andrzejewski+ 2009] Incorporating domain knowledge into topic modeling via
Dirichlet forest priors.
• [Andrzejewski+ 2011] A framework for incorporating general domain knowledge
into latent Dirichlet allocation using first-order logic.
• [Xie+ 2015] Incorporating word correlation knowledge into topic modeling.
• [Mimno+ 2011] Optimizing semantic coherence in topic models.
• [Röder+ 2015] Exploring the space of topic coherence measures.

Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 

[Yang, Downey and Boyd-Graber 2015] Efficient Methods for Incorporating Knowledge into Topic Models

  • 1. Efficient Methods for Incorporating Knowledge into Topic Models [Yang, Downey and Boyd-Graber 2015] 2015/10/24 EMNLP 2015 Reading @shuyo
  • 2. Large-scale Topic Model
      • In academic papers
          – Up to 10^3 topics
      • Industrial applications
          – 10^5~10^6 topics!
          – Search engines, online ads, and so on
          – To capture infrequent topics
      • This paper handles up to 500 topics... really?
  • 3. (Standard) LDA [Blei+ 2003, Griffiths+ 2004]
      • "Conventional" Gibbs sampling:
        $P(z = t \mid \boldsymbol{z}_{-}, w) \propto q_t := (n_{d,t} + \alpha)\,\dfrac{n_{w,t} + \beta}{n_t + V\beta}$
          – $T$: number of topics
          – For $U \sim \mathcal{U}\left(0, \sum_{z}^{T} q_z\right)$, find $t$ s.t. $\sum_{z}^{t-1} q_z < U < \sum_{z}^{t} q_z$
      • For large $T$, this is computationally intensive
          – $n_{w,t}$ is sparse
          – When $T$ is very large, $n_{d,t}$ is sparse too, e.g. $T = 10^6 > n_d$
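The draw described on slide 3 can be sketched in Python as follows. This is only a minimal illustration, not code from the paper or the slides: the function name `sample_topic` and the list-based count arguments are assumptions made for the example. It builds all $T$ weights $q_t$ and walks their cumulative sum, which is the O(T) cost per token that the slide calls computationally intensive.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def sample_topic(n_dt, n_wt, n_t, alpha, beta, V, rng=random):
    """Draw one topic index from the unnormalized weights q_t.

    n_dt[t]: tokens in the current document assigned to topic t
    n_wt[t]: occurrences of the current word assigned to topic t
    n_t[t]:  total tokens assigned to topic t
    (all counts exclude the token being resampled)
    """
    # q_t = (n_dt + alpha) * (n_wt + beta) / (n_t + V * beta), for every topic
    q = [(n_dt[t] + alpha) * (n_wt[t] + beta) / (n_t[t] + V * beta)
         for t in range(len(n_t))]
    cum = list(accumulate(q))
    u = rng.uniform(0.0, cum[-1])
    # First index whose cumulative mass exceeds u; clamp in case u hits the top.
    return min(bisect_right(cum, u), len(cum) - 1)
```

Because every topic is visited, the cost does not benefit from the sparsity of $n_{w,t}$ or $n_{d,t}$.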
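  • 4. SparseLDA [Yao+ 2009]
      $\sum_t P(z = t \mid \boldsymbol{z}_{-}, w) \propto \sum_t \underbrace{\dfrac{\alpha\beta}{n_t + V\beta}}_{s_t} + \sum_t \underbrace{\dfrac{n_{d,t}\,\beta}{n_t + V\beta}}_{r_t} + \sum_t \underbrace{\dfrac{(n_{d,t} + \alpha)\, n_{w,t}}{n_t + V\beta}}_{q_t}$
          – $s_t$ is independent of $w$ and $d$; $r_t$ depends on $d$ only
      • $s = \sum_t s_t$, $r = \sum_t r_t$, $q = \sum_t q_t$
      • For $U \sim \mathcal{U}(0, s + r + q)$,
          – If $0 < U < s$, find $t$ s.t. $\sum_{z}^{t-1} s_z < U < \sum_{z}^{t} s_z$
          – If $s < U < s + r$, find $t$ s.t. $n_{d,t} > 0$, $\sum_{z}^{t-1} r_z < U - s < \sum_{z}^{t} r_z$
          – If $s + r < U < s + r + q$, find $t$ s.t. $n_{w,t} > 0$, $\sum_{z}^{t-1} q_z < U - s - r < \sum_{z}^{t} q_z$
      • Faster because $n_{w,t}$ and $n_{d,t}$ are sparse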
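A rough Python sketch of the three-bucket draw on slide 4 is given below. The names `bucket_terms`, `sparse_lda_sample` and the dict-based sparse counts are assumptions made for illustration, not the authors' implementation. For brevity it recomputes the smoothing bucket $s$ on every call; a real sampler would cache $s$ and update it incrementally, which is where the speed-up comes from.

```python
import random

def bucket_terms(doc_counts, word_counts, n_t, alpha, beta, V):
    """Return the per-topic terms s_t, r_t, q_t as dicts.

    doc_counts / word_counts map topic -> count and hold nonzero entries only.
    """
    denom = [n + V * beta for n in n_t]
    s_terms = {t: alpha * beta / denom[t] for t in range(len(n_t))}
    r_terms = {t: c * beta / denom[t] for t, c in doc_counts.items()}
    q_terms = {t: (doc_counts.get(t, 0) + alpha) * c / denom[t]
               for t, c in word_counts.items()}
    return s_terms, r_terms, q_terms

def _walk(terms, u):
    """Return the topic whose cumulative mass first exceeds u."""
    last = None
    for t, weight in terms.items():
        last = t
        u -= weight
        if u < 0:
            return t
    return last  # guards against floating-point round-off

def sparse_lda_sample(doc_counts, word_counts, n_t, alpha, beta, V, rng=random):
    s_terms, r_terms, q_terms = bucket_terms(
        doc_counts, word_counts, n_t, alpha, beta, V)
    s = sum(s_terms.values())
    r = sum(r_terms.values())
    q = sum(q_terms.values())
    u = rng.uniform(0.0, s + r + q)
    # Try the buckets in order s, r, q, shifting u by each mass that is skipped.
    for terms, mass in ((s_terms, s), (r_terms, r), (q_terms, q)):
        if u < mass:
            return _walk(terms, u)
        u -= mass
    return _walk(s_terms, 0.0)  # u fell exactly on the upper bound (rare)
```

Only the $r$ and $q$ buckets are walked over nonzero counts, so their cost scales with the number of topics present in the document or assigned to the word rather than with $T$.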
  • 5. Leveraging Prior Knowledge
      • The objective function of topic models does not correlate with human judgements
  • 6. Word correlation prior knowledge
      • Must-link
          – “quarterback” and “fumble” are both related to American football
      • Cannot-link
          – “fumble” and “bank” imply two different topics
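  • 7. SC-LDA [Yang+ 2015]
      • $m \in M$: prior knowledge
      • $f_m(z, w, d)$: potential function of prior knowledge $m$ about word $w$ with topic $z$ in document $d$
      • $\psi(\boldsymbol{z}, M) = \prod_{z \in \boldsymbol{z}} \exp f_m(z, w, d)$
          – Note: the product maybe ranges over $m \in M$ and all $w$ with $z$ in all $d$
      • $P(\boldsymbol{w}, \boldsymbol{z} \mid \alpha, \beta, M) = P(\boldsymbol{w} \mid \boldsymbol{z}, \beta)\, P(\boldsymbol{z} \mid \alpha)\, \psi(\boldsymbol{z}, M)$
          – Note: maybe $\propto$ rather than $=$
  • 8. Sparse Constrained Inference for SC-LDA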
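  • 9. Word correlation prior knowledge for SC-LDA
      • $f_m(z, w, d) = \sum_{u \in M_w^m} \log \max(\lambda, n_{u,z}) + \sum_{v \in M_w^c} \log \dfrac{1}{\max(\lambda, n_{v,z})}$
          – where $M_w^m$: must-links of $w$, $M_w^c$: cannot-links of $w$
      • $P(z = t \mid \boldsymbol{z}_{-}, w, M) \propto \left( \dfrac{\alpha\beta}{n_t + V\beta} + \dfrac{n_{d,t}\,\beta}{n_t + V\beta} + \dfrac{(n_{d,t} + \alpha)\, n_{w,t}}{n_t + V\beta} \right) \prod_{u \in M_w^m} \max(\lambda, n_{u,z}) \prod_{v \in M_w^c} \dfrac{1}{\max(\lambda, n_{v,z})}$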
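The conditional on slide 9 can be illustrated with a small, unoptimized Python sketch. The function name `sc_lda_weights` and the `n_topic_word` lookup are hypothetical names introduced for the example; the sketch evaluates the weight for every topic and does not reproduce the sparse bucketed inference of the paper. The three SparseLDA terms are folded back into their sum $(n_{d,t} + \alpha)(n_{w,t} + \beta)/(n_t + V\beta)$.

```python
def sc_lda_weights(n_dt, n_wt, n_t, n_topic_word, must, cannot,
                   alpha, beta, V, lam):
    """Unnormalized P(z = t | ...) for one token of word w, for every topic t.

    n_topic_word[u][t]: count of word u in topic t (hypothetical lookup)
    must / cannot:      ids of the words must-linked / cannot-linked to w
    lam:                the floor lambda inside max(lambda, n)
    """
    weights = []
    for t in range(len(n_t)):
        # Sum of the s_t, r_t and q_t terms of SparseLDA.
        base = (n_dt[t] + alpha) * (n_wt[t] + beta) / (n_t[t] + V * beta)
        factor = 1.0
        for u in must:      # must-linked words frequent in t raise the weight
            factor *= max(lam, n_topic_word[u][t])
        for v in cannot:    # cannot-linked words frequent in t lower it
            factor /= max(lam, n_topic_word[v][t])
        weights.append(base * factor)
    return weights
```

With no links the factor is 1 and the weights reduce to those of standard LDA.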
  • 10. Factor Graph
      • The authors say that prior knowledge is incorporated “by adding a factor graph to encode prior knowledge,” but the graph is not drawn in the paper.
      • The potential function $f_m(z, w, d)$ contains $n_{w,z}$, and $\varphi_{w,z} \propto n_{w,z} + \beta$.
      • So the above model seems to look like Fig. b.
      [Figures: Fig. a, Fig. b]
  • 11. [Ramage+ 2009] Labeled LDA
      • Supervised LDA for labeled documents
          – It is equivalent to SC-LDA with the following potential function:
            $f_m(z, w, d) = \begin{cases} 1, & \text{if } z \in m_d \\ -\infty, & \text{else} \end{cases}$
            where $m_d$ specifies the label set of $d$
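To see why this potential restricts a document to its labels, here is a tiny Python sketch; the helper names `labeled_potential` and `apply_label_constraint` are assumptions for illustration only. Since $\exp(-\infty) = 0$, every topic outside $m_d$ gets zero weight, while the allowed topics are all scaled by the same constant $e$.

```python
import math

def labeled_potential(t, doc_labels):
    """f_m(z, w, d) from the slide: 1 if topic t is a label of d, else -inf."""
    return 1.0 if t in doc_labels else -math.inf

def apply_label_constraint(weights, doc_labels):
    """Multiply each unnormalized topic weight by exp(f_m)."""
    return [w * math.exp(labeled_potential(t, doc_labels))
            for t, w in enumerate(weights)]
```

After normalization the common factor $e$ cancels, so only the mask over topics remains.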
  • 12. Experiments
      • Baselines
          – Dirichlet Forest-LDA [Andrzejewski+ 2009]
          – Logic-LDA [Andrzejewski+ 2011]
          – MRF-LDA [Xie+ 2015]
              • Encodes word correlations in LDA as an MRF
          – SparseLDA

        DATASET    DOCS       TYPE     TOKEN (APPROX)  EXPERIMENTS
        NIPS       1,500      12,419   1,900,000       Word correlation
        NYT-NEWS   3,000,000  102,660  100,000,000     Word correlation
        20NG       18,828     21,514   1,946,000       Labeled docs
  • 13. Generate Word Correlation
      • Must-link
          – Obtain synsets from WordNet 3.0
          – Keep a link if the word2vec embedding similarity between the word and its synset is higher than the threshold 0.2
      • Cannot-link
          – Nothing?
  • 14. Convergence Speed
      [Figure: the average running time per iteration over 100 iterations, averaged over 5 seeds, on the 20NG dataset.]
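  • 15. Coherence [Mimno+ 2011]
      • $C(t; V^{(t)}) = \sum_{m=2}^{M} \sum_{l=1}^{m-1} \log \dfrac{F(v_m^{(t)}, v_l^{(t)}) + \epsilon}{F(v_l^{(t)})}$
          – $F(v)$: document frequency of word type $v$
          – $F(v, v')$: co-document frequency of word types $v, v'$
          – Note: does it mean “include”?
          – $\epsilon$ is very small, like $10^{-12}$ [Röder+ 2015]
      • Values shown on the slide: -39.1, -36.6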
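The coherence score above can be computed directly from document and co-document frequencies. The following Python sketch is an illustration under stated assumptions: the function name `coherence` and the representation of the corpus as lists of tokens are choices made here, and every top word is assumed to occur in at least one document so that $F(v_l^{(t)}) > 0$.

```python
import math

def coherence(top_words, docs, eps=1e-12):
    """Coherence of one topic from its top words, most probable first.

    docs: iterable of token lists; frequencies count documents, not tokens.
    """
    doc_sets = [set(d) for d in docs]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(all(w in d for w in words) for d in doc_sets)

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            co = doc_freq(top_words[m], top_words[l])
            score += math.log((co + eps) / doc_freq(top_words[l]))
    return score
```

With a very small $\epsilon$, a single pair of top words that never co-occur contributes a large negative term, so such pairs dominate the score.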
  • 16. References
      • [Yang+ 2015] Efficient Methods for Incorporating Knowledge into Topic Models.
      • [Blei+ 2003] Latent Dirichlet allocation.
      • [Griffiths+ 2004] Finding scientific topics.
      • [Yao+ 2009] Efficient methods for topic model inference on streaming document collections.
      • [Ramage+ 2009] Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora.
      • [Andrzejewski+ 2009] Incorporating domain knowledge into topic modeling via Dirichlet forest priors.
      • [Andrzejewski+ 2011] A framework for incorporating general domain knowledge into latent Dirichlet allocation using first-order logic.
      • [Xie+ 2015] Incorporating word correlation knowledge into topic modeling.
      • [Mimno+ 2011] Optimizing semantic coherence in topic models.
      • [Röder+ 2015] Exploring the space of topic coherence measures.