Topic Models
Claudia Wagner, Graz, 16.9.2010
Semantic Representation of Text (Griffiths, 2007)
Topic Models
Topic Models (source: http://www.cs.umass.edu/~wallach/talks/priors.pdf)
Topic Models: 3 latent variables: the word distribution per topic (word-topic matrix), the topic distribution per document (topic-doc matrix), and the topic assignment of each word (Steyvers, 2006)
Summary
Topic Models
pLSA (Hofmann, 1999): plate model over the number of documents and the number of words; θ is the topic distribution of a document, topics are drawn from P(z | θ) and words from P(w | z)
Latent Dirichlet Allocation (LDA) (Blei, 2003): plate model over the number of documents and the number of words; words are drawn from P(w | z, φ(z)) and the per-topic word distributions from P(φ(z) | β)
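To make the generative story concrete, here is a minimal sketch of LDA's generative process; the corpus sizes, hyperparameter values, and variable names are illustrative, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

T, W, D = 2, 5, 3          # topics, vocabulary size, documents (toy values)
alpha, beta = 0.5, 0.1     # symmetric Dirichlet hyperparameters

# One word distribution phi^(z) per topic, drawn from Dirichlet(beta)
phi = rng.dirichlet(np.full(W, beta), size=T)      # shape (T, W)

docs = []
for d in range(D):
    theta = rng.dirichlet(np.full(T, alpha))       # topic distribution of doc d
    words = []
    for _ in range(8):                             # 8 tokens per document
        z = rng.choice(T, p=theta)                 # sample a topic z ~ theta
        w = rng.choice(W, p=phi[z])                # sample a word  w ~ phi^(z)
        words.append(w)
    docs.append(words)

print(docs)
```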
Dirichlet Prior α: High α: each doc's topic distribution θ is a smooth mix of all topics (e.g. topic distribution of Doc1 = (1/3, 1/3, 1/3)). Low α: each doc's topic distribution θ must favor few topics (e.g. topic distribution of Doc2 = (1, 0, 0)).
Dirichlet Prior β: High β: each topic's word distribution is a smooth mix over many words. Low β: each topic's word distribution must favor few words (e.g. word distribution of Topic2 = (1, 0, 0)).
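A quick way to see the effect of the concentration parameter is to draw samples from a Dirichlet with high and low values; a small sketch, with the parameter values chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(1)

# High concentration: samples are smooth mixes of all three components,
# i.e. rows close to (1/3, 1/3, 1/3)
print(rng.dirichlet([10.0, 10.0, 10.0], size=3))

# Low concentration: samples put most of their mass on few components,
# i.e. rows close to (1, 0, 0)
print(rng.dirichlet([0.1, 0.1, 0.1], size=3))
```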
Matrix Representation of LDA: the word-document matrix is observed; the matrices θ(d) (topic distributions per document) and φ(z) (word distributions per topic) are latent.
Statistical Inference and Parameter Estimation (Blei, 2003): infer the latent variables (θ, φ, topic assignments z) from the observed variables (the words) and the priors (α, β).
Statistical Inference and Parameter Estimation
Markov Chain Example (source: http://en.wikipedia.org/wiki/Examples_of_Markov_chains)
Markov Chain Example
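Assuming the familiar two-state weather chain from the cited Wikipedia page, iterating the transition matrix shows how the state distribution converges to a stationary distribution regardless of the start state, which is the property Gibbs sampling relies on; a sketch:

```python
import numpy as np

# Two-state weather chain (sunny, rainy); transition probabilities assumed
# from the classic Wikipedia example.
P = np.array([[0.9, 0.1],    # sunny -> sunny, sunny -> rainy
              [0.5, 0.5]])   # rainy -> sunny, rainy -> rainy

pi = np.array([0.0, 1.0])    # start state: definitely rainy
for _ in range(50):
    pi = pi @ P              # one step of the chain
print(pi)                    # -> approx. [0.833, 0.167], the stationary distribution
```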
Gibbs Sampling
Gibbs Sampling for LDA
Run Gibbs Sampling, Example (1): random start, every word token in the three documents is assigned to topic 1 or topic 2. Counting these assignments yields the word-topic count matrix (per word: topic2, topic1 counts: stream 1, 2; river 2, 2; loan 1, 2; bank 6, 3; money 2, 3) and the topic-document count matrix (doc1, doc2, doc3: 4 tokens in each topic).
Gibbs Sampling for LDA: probability that topic j is chosen for word w_i, conditioned on all other assigned topics z_-i and all observed variables: P(z_i = j | z_-i, w) ∝ (C^WT_{w_i,j} + β) / (Σ_w C^WT_{w,j} + W·β) · (C^DT_{d_i,j} + α) / (Σ_t C^DT_{d_i,t} + T·α). C^WT_{w_i,j} counts how often word token w_i was assigned to topic j across all docs; C^DT_{d_i,j} counts how often topic j was already assigned to some word token in doc d_i. The expression is unnormalized: divide the probability of assigning topic j to word w_i by the sum over all T topics.
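A sketch of this conditional distribution as code; `CWT` and `CDT` stand for the count matrices C^WT and C^DT with the current token's assignment already removed (the function and variable names are mine, not from the slides):

```python
import numpy as np

def topic_conditional(w_i, d_i, CWT, CDT, alpha, beta):
    """P(z_i = j | z_-i, w) for all topics j.

    CWT: (W, T) word-topic counts, CDT: (D, T) doc-topic counts,
    both with the current token's assignment subtracted out.
    """
    W, T = CWT.shape
    left = (CWT[w_i, :] + beta) / (CWT.sum(axis=0) + W * beta)        # word-topic factor
    right = (CDT[d_i, :] + alpha) / (CDT[d_i, :].sum() + T * alpha)   # doc-topic factor
    p = left * right
    return p / p.sum()    # normalize over all T topics
```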
Run Gibbs Sampling
Run Gibbs Sampling, Example (2): to resample a word token, its current assignment is first removed from the word-topic and topic-document count matrices; the conditional probability of each topic is then computed from the remaining counts.
Run Gibbs Sampling, Example (2), continued: after resampling, the token's new assignment is added back to the counts; here a 'money' token in doc1 moves from topic 1 to topic 2 (money: topic1 3 → 2, topic2 2 → 3; doc1: topic1 4 → 3, topic2 4 → 5).
Run Gibbs Sampling, Example (3): "bank" is assigned to topic 2; the document factor compares how often topic j was used in doc d_i with how often all other topics were used in doc d_i.
Summary: Run Gibbs Sampling
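The usual procedure (random initialization, then repeatedly removing a token's assignment, resampling its topic, and updating the counts) as a compact, self-contained sketch; the toy corpus and all names are illustrative, and the hyperparameter heuristic α = 50/T, β = 0.01 follows (Steyvers, 2006):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy corpus: documents as lists of word ids
# (vocabulary of 5 words, e.g. money, bank, loan, river, stream)
docs = [[0, 1, 2, 1], [1, 2, 0, 1], [3, 4, 1, 3]]
W, T = 5, 2
alpha, beta = 50.0 / T, 0.01

# Random start: assign every word token to a random topic, build count matrices
CWT = np.zeros((W, T), dtype=int)           # word-topic counts
CDT = np.zeros((len(docs), T), dtype=int)   # doc-topic counts
z = [[rng.integers(T) for _ in doc] for doc in docs]
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        CWT[w, z[d][i]] += 1
        CDT[d, z[d][i]] += 1

for it in range(200):                       # Gibbs iterations
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            CWT[w, t] -= 1                  # remove current assignment
            CDT[d, t] -= 1
            # doc factor's denominator is constant across topics
            # and cancels when normalizing, so it is dropped here
            p = (CWT[w, :] + beta) / (CWT.sum(axis=0) + W * beta) \
              * (CDT[d, :] + alpha)
            t = rng.choice(T, p=p / p.sum())   # resample topic
            z[d][i] = t
            CWT[w, t] += 1                  # add new assignment back
            CDT[d, t] += 1

print(CWT)   # after burn-in, the counts reflect the inferred topics
```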
Gibbs Sampling Convergence (figure: each token shown as black = topic 1, white = topic 2)
Gibbs Sampling Convergence
Gibbs Sampling Parameter Estimation: a single Gibbs sample yields point estimates of φ and θ: φ'_{i,j} = (C^WT_{i,j} + β) / (Σ_k C^WT_{k,j} + W·β), the predictive distribution of sampling a new token of word i from topic j (how often word w_i was related with topic j, relative to how often all words were related with topic j); θ'_{d,j} = (C^DT_{d,j} + α) / (Σ_k C^DT_{d,k} + T·α), the predictive distribution of sampling a new token in document d from topic j.
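The same estimates as a short helper (a sketch; matrix names as in the earlier snippets):

```python
import numpy as np

def estimate_phi_theta(CWT, CDT, alpha, beta):
    """Point estimates of phi (word-topic) and theta (doc-topic)
    from the count matrices of a single Gibbs sample."""
    W, T = CWT.shape
    phi = (CWT + beta) / (CWT.sum(axis=0, keepdims=True) + W * beta)      # (W, T)
    theta = (CDT + alpha) / (CDT.sum(axis=1, keepdims=True) + T * alpha)  # (D, T)
    return phi, theta
```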
Author-Topic (AT) Model (Rosen-Zvi, 2004)
AT-Model Algorithm: words are drawn from P(w | z, φ(z)); topics are drawn from P(z | x, θ(x)), where x is the author chosen for the word.
AT Model Latent Variables: (1) the author-topic assignment for each word; (2) the topic distribution of each author, which determines which topics are used by which authors (count matrix C^AT); (3) the word distribution of each topic (count matrix C^WT).
Matrix Representation of the Author-Topic Model (source: http://www.ics.uci.edu/~smyth/kddpapers/UCI_KD-D_author_topic_preprint.pdf): the words and the document authors a_d are observed; θ(x) and φ(z) are latent.
Example (1): random start, every word token is assigned an (author, topic) pair. Counting yields the word-topic matrix as before and the author-topic count matrix (per author: topic2, topic1 counts: author1 0, 4; author2 8, 8; author3 4, 0).
Gibbs Sampling for the Author-Topic Model: jointly sample a topic j and an author k (from the document's authors a_d) for each word: P(z_i = j, x_i = k | w_i, z_-i, x_-i, a_d) ∝ (C^WT_{w_i,j} + β) / (Σ_w C^WT_{w,j} + W·β) · (C^AT_{k,j} + α) / (Σ_t C^AT_{k,t} + T·α). C^WT_{w_i,j} counts how often word token w_i was assigned to topic j across all docs; C^AT_{k,j} counts how often author k was already assigned to topic j.
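A sketch of this joint sampling step; `CAT` stands for the author-topic count matrix C^AT and `authors_d` for the observed author list a_d of the document (names are mine, counts assumed to exclude the current token):

```python
import numpy as np

def sample_author_topic(w_i, authors_d, CWT, CAT, alpha, beta, rng):
    """Jointly sample an (author, topic) pair for one word token.

    CWT: (W, T) word-topic counts, CAT: (A, T) author-topic counts,
    both with the current token's assignment removed.
    """
    W, T = CWT.shape
    word_factor = (CWT[w_i, :] + beta) / (CWT.sum(axis=0) + W * beta)   # (T,)
    author_factor = (CAT[authors_d, :] + alpha) \
        / (CAT[authors_d, :].sum(axis=1, keepdims=True) + T * alpha)    # (|a_d|, T)
    p = author_factor * word_factor       # joint distribution over (author, topic)
    p = p.ravel() / p.sum()
    idx = rng.choice(p.size, p=p)
    k, j = divmod(idx, T)                 # recover author index and topic
    return authors_d[k], j
```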
Problems of the AT Model
AT Model with Fictitious Authors
Predictive Power of Different Models (Rosen-Zvi, 2005). Experiment: training data: 1,557 papers; test data: 183 papers (102 of them single-authored). Test documents were chosen such that each author of a test document also appears as an author in the training set.
Author-Recipient-Topic (ART) Model (McCallum, 2004): for each word of a message, a recipient x is chosen from the message's recipients R, and the topic is drawn from the topic distribution of the (author, recipient) pair, P(z | x, a_d, θ(A,R)); words are drawn from P(w | z, φ(z)).
Gibbs Sampling for the ART Model: random start: sample an author-recipient pair and a topic for each word. Then resample each word w_i from a conditional built from: the number of recipients of the message to which w_i belongs (the recipient is drawn uniformly among them); the number of times topic t was assigned to the author-recipient pair, relative to the number of times all topics were assigned to that pair; and the number of times the current word token was assigned to topic t, relative to the number of times all words were assigned to topic t (smoothed by the number of words times β).
Labeled LDA (Ramage, 2009)
Group-Topic Model (Wang, 2005)
Group-Topic Model (Wang, 2005): plate model over the number of events (interactions between entities) and the number of entities.
CART Model (Pathak, 2008): Gibbs sampling alternates between updating the latent community c conditioned on all other variables, and updating the recipient-topic tuple (r, z) for each word conditioned on all other variables.
Copycat Model (Dietz, 2007)
Copycat Model (Dietz, 2007) (figure: citing document d2 cites cited document d1 via citation c)
Copycat Model (Dietz, 2007)
Copycat Model (Dietz, 2007)
Citation Influence Model (Dietz, 2007): ψ is the innovation topic mixture of a citing publication; γ is its distribution of citation influences; λ is the parameter of the coin flip that chooses whether a word's topic is drawn from a cited document's topic mixture θ or from the innovation mixture ψ.
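A toy sketch of this token-level coin flip, using the naming above (λ, γ, ψ); it simplifies the full model to a single citing document with two citations, and all values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

T = 4                                            # number of topics
lam = 0.7                                        # coin-flip parameter lambda
psi = rng.dirichlet(np.ones(T))                  # innovation topic mixture psi
gamma = rng.dirichlet(np.ones(2))                # citation influences over 2 cited docs
theta_cited = rng.dirichlet(np.ones(T), size=2)  # topic mixtures theta of cited docs

def sample_topic():
    # Flip a coin with parameter lambda: inherit a topic from a cited
    # document (weighted by gamma) or draw from the innovation mixture psi.
    if rng.random() < lam:
        c = rng.choice(2, p=gamma)               # pick an influencing citation
        return rng.choice(T, p=theta_cited[c])
    return rng.choice(T, p=psi)

print([sample_topic() for _ in range(10)])
```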
References
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3, 993-1022.
Dietz, L., Bickel, S., & Scheffer, T. (2007). Unsupervised Prediction of Citation Influences. Proceedings of ICML 2007.
Griffiths, T. L., Steyvers, M., & Tenenbaum, J. B. (2007). Topics in Semantic Representation. Psychological Review, 114(2), 211-244.
Hofmann, T. (1999). Probabilistic Latent Semantic Analysis. Proceedings of UAI 1999.
McCallum, A., Corrada-Emmanuel, A., & Wang, X. (2004). The Author-Recipient-Topic Model for Topic and Role Discovery in Social Networks. Technical Report, University of Massachusetts Amherst.
Pathak, N., DeLong, C., Banerjee, A., & Erickson, K. (2008). Social Topic Models for Community Extraction. Proceedings of the 2nd SNA-KDD Workshop.
Ramage, D., Hall, D., Nallapati, R., & Manning, C. D. (2009). Labeled LDA: A Supervised Topic Model for Credit Attribution in Multi-Labeled Corpora. Proceedings of EMNLP 2009.
Rosen-Zvi, M., Griffiths, T., Steyvers, M., & Smyth, P. (2004). The Author-Topic Model for Authors and Documents. Proceedings of UAI 2004.
Rosen-Zvi, M., Griffiths, T., Steyvers, M., & Smyth, P. (2005). Learning Author-Topic Models from Text Corpora. Technical Report.
Steyvers, M., & Griffiths, T. (2006). Probabilistic Topic Models. In Handbook of Latent Semantic Analysis.
Wang, X., Mohanty, N., & McCallum, A. (2005). Group and Topic Discovery from Relations and Text. Proceedings of LinkKDD 2005.
