Tutorial on Topic Modelling
by Ayush Jain
Prepared as an assignment for CS410: Text Information Systems in Spring
Topic Models
•  Discover hidden themes that pervade the collection
•  Tag the documents on the basis of these themes
•  Organize, summarize and search the documents on the basis of these themes
Takeaways from this tutorial
•  What are probabilistic topic models?
•  What kind of things can they do?
•  How do we train/infer a topic model?
•  How do we evaluate a topic model?
Tools
•  Topic models are a special application of
probability theory. In particular, they touch
– Probabilistic graphical models
– Conjugate and non-conjugate priors
– Approximate posterior inference
– Exploratory data analysis
The Key Steps in every Topic Model
Make assumptions
Collect Data
Infer posterior
Evaluate
Predict
Outline
•  Latent Dirichlet Allocation – Application of key steps
– Graphical Model encoding the assumptions
– Inference Algorithms – Gibbs Sampling
•  Topic Models for more complex tasks
– Rating prediction
•  A completely novel topic model incorporating sentiments (that we’ll develop!)
Latent Dirichlet Allocation
•  Already covered in course
•  Application of the key steps
– Make assumptions
•  Each topic is a distribution over words
•  Each document is a mixture of topics
•  Each word is drawn from a topic
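To make these three assumptions concrete, the short sketch below samples a toy corpus from LDA's generative process. It is only an illustration: the vocabulary, the hyperparameters alpha and eta, and all the sizes are invented for the example rather than taken from the slides.

    import numpy as np

    rng = np.random.default_rng(0)
    vocab = ["pizza", "pasta", "goal", "referee", "election", "senate"]
    n_topics, n_docs, doc_len = 3, 5, 20
    alpha = np.full(n_topics, 0.5)   # Dirichlet prior on per-document topic proportions
    eta = np.full(len(vocab), 0.1)   # Dirichlet prior on per-topic word distributions

    # Each topic is a distribution over words
    topics = rng.dirichlet(eta, size=n_topics)

    for d in range(n_docs):
        theta = rng.dirichlet(alpha)                 # each document is a mixture of topics
        words = []
        for _ in range(doc_len):
            z = rng.choice(n_topics, p=theta)        # each word is drawn from a topic...
            w = rng.choice(len(vocab), p=topics[z])  # ...via that topic's word distribution
            words.append(vocab[w])
        print(f"doc {d}: {' '.join(words)}")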
Latent Dirichlet Allocation
•  Graphical Model
•  Encodes assumptions
•  Allows us to break down the joint probability into a product of conditionals
Latent Dirichlet Allocation
•  Graphical Model
Latent Dirichlet Allocation
•  Application of the key steps
– Make assumptions (II)
•  Choose probability distributions
–  Choosing conjugate distributions makes life easier!
»  E.g., Multinomial and Dirichlet are conjugate distributions
Aside: Conjugate Distributions
•  Dirichlet Distribution: p(θ | α) ∝ ∏i θi^(αi − 1)
–  θ: probability of seeing the different sides of a die
•  Multinomial Distribution: p(W | θ) ∝ ∏i θi^(xi)
–  The number of occurrences of the different sides of the die (W) is distributed in a multinomial manner
–  xi: the number of times side i was observed
•  Posterior distribution: p(θ | W, α) ∝ ∏i θi^(xi + αi − 1), i.e. again a Dirichlet, with parameters α + x
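A small numerical illustration of this conjugacy, assuming a six-sided die and 100 rolls (both numbers are made up): because the Dirichlet and the multinomial are conjugate, the posterior is a Dirichlet whose parameters are just the prior pseudo-counts plus the observed counts.

    import numpy as np

    rng = np.random.default_rng(1)
    alpha = np.ones(6)                                      # Dirichlet prior over the 6 sides of the die
    theta_true = np.array([0.1, 0.1, 0.1, 0.1, 0.1, 0.5])   # a loaded die, unknown to the model

    x = rng.multinomial(100, theta_true)   # W: counts x_i of each side over 100 rolls

    alpha_post = alpha + x                 # posterior is Dirichlet(alpha + x)
    print("observed counts:", x)
    print("posterior mean :", np.round(alpha_post / alpha_post.sum(), 3))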
Latent Dirichlet Allocation
•  Application of the key steps
– Make assumptions (II)
•  Choose probability distributions
–  Choosing conjugate distributions makes life easier!
»  E.g., Multinomial and Dirichlet are conjugate distributions
– Collect Data
•  Corpus on which you want to detect themes
Latent Dirichlet Allocation
•  Application of the key steps
– Infer Posterior
•  Probabilistic graphical models provide algorithms
–  Mean field variational methods
–  Expectation Propagation (similar to EM)
–  Gibbs Sampling (most popular)
–  Variational Inference
Aside: Gibbs Sampling
– Used when samples need to be drawn from a joint distribution that is difficult to sample from directly
– Sample X = (x1, …, xn) from the joint pdf p(x1, …, xn)
– The conditional distributions are relatively straightforward to sample from
– Procedure:
•  Begin with some initial X(i)
•  Sample xj(i+1) from p(xj | x1(i+1), …, xj−1(i+1), xj+1(i), …, xn(i))
•  Repeat
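A minimal sketch of this procedure on a toy target where the conditionals are known in closed form: a standard bivariate normal with correlation rho. The value of rho, the number of iterations and the burn-in length are arbitrary choices for the example.

    import numpy as np

    rng = np.random.default_rng(2)
    rho, n_iter = 0.8, 5000      # target: standard bivariate normal with correlation rho

    x1, x2 = 0.0, 0.0            # some initial X
    samples = np.empty((n_iter, 2))
    for i in range(n_iter):
        # Sample each coordinate from its conditional given the current value of the other:
        # x1 | x2 ~ N(rho * x2, 1 - rho^2), and symmetrically for x2 | x1.
        x1 = rng.normal(rho * x2, np.sqrt(1 - rho ** 2))
        x2 = rng.normal(rho * x1, np.sqrt(1 - rho ** 2))
        samples[i] = (x1, x2)

    print("empirical correlation:", np.corrcoef(samples[1000:].T)[0, 1])  # discard burn-in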
Latent Dirichlet Allocation
•  Application of the key steps
– Infer Posterior (Gibbs Sampling)
•  Here, X is all parameters to be inferred
–  Per-word topic assignments zd,n
–  Per-document topic proportions θd
–  Per-corpus topic-word distributions βk
•  Extremely high dimensional!
•  Solution:
–  Integrate out θ and β
–  Conjugate distributions make the integration straightforward!
Latent Dirichlet Allocation
•  Application of the key steps
– Infer Posterior (Gibbs Sampling)
•  After all computation:

P(Zd,n = k | Z−(d,n), W; α, β) ∝ (nd,:^(k,−(d,n)) + αk) · (n:,v^(k,−(d,n)) + βv) / Σr=1..V (n:,r^(k,−(d,n)) + βr)

•  nd,:^(k,−(d,n)): the number of words in document d that belong to topic k, excluding the n-th word
•  v: index of the n-th word of the d-th document in the vocabulary
•  Linear time in the number of tokens!
Latent Dirichlet Allocation
•  Application of the key steps
– Infer Posterior (Gibbs Sampling)
•  After all computation:

P(Zd,n = k | Z−(d,n), W; α, β) ∝ (nd,:^(k,−(d,n)) + αk) · (n:,v^(k,−(d,n)) + βv) / Σr=1..V (n:,r^(k,−(d,n)) + βr)

•  Linear time in the number of tokens!
•  Further improvements exploit the sparsity of the problem when the corpus and the number of topics are large
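The update above translates almost line for line into a collapsed Gibbs sampler. The sketch below is a bare-bones illustration on a tiny hand-made corpus with symmetric priors; the function name, the hyperparameter values and the example documents are assumptions made for this example, not anything prescribed by the slides.

    import numpy as np

    def lda_collapsed_gibbs(docs, V, K, alpha=0.1, beta=0.01, n_iter=200, seed=0):
        """docs: list of lists of word ids in [0, V). Returns the topic-word count matrix."""
        rng = np.random.default_rng(seed)
        n_dk = np.zeros((len(docs), K))   # words in doc d assigned to topic k
        n_kv = np.zeros((K, V))           # times word v assigned to topic k
        n_k = np.zeros(K)                 # total words assigned to topic k
        z = []
        for d, doc in enumerate(docs):    # random initialisation of topic assignments
            z_d = rng.integers(K, size=len(doc))
            z.append(z_d)
            for v, k in zip(doc, z_d):
                n_dk[d, k] += 1; n_kv[k, v] += 1; n_k[k] += 1

        for _ in range(n_iter):
            for d, doc in enumerate(docs):
                for n, v in enumerate(doc):
                    k_old = z[d][n]       # remove the token's current assignment from the counts
                    n_dk[d, k_old] -= 1; n_kv[k_old, v] -= 1; n_k[k_old] -= 1
                    # P(z = k | rest) ∝ (n_dk + alpha) * (n_kv + beta) / (n_k + V * beta)
                    p = (n_dk[d] + alpha) * (n_kv[:, v] + beta) / (n_k + V * beta)
                    k_new = rng.choice(K, p=p / p.sum())
                    z[d][n] = k_new       # add it back under the newly sampled topic
                    n_dk[d, k_new] += 1; n_kv[k_new, v] += 1; n_k[k_new] += 1
        return n_kv

    # Toy corpus over a 4-word vocabulary: two documents about words {0, 1}, two about {2, 3}
    docs = [[0, 1, 0, 1, 0], [1, 0, 1, 1], [2, 3, 2, 3, 3], [3, 2, 2, 3]]
    print(lda_collapsed_gibbs(docs, V=4, K=2))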
Topic Models: Evaluation
•  Underlying topics are subjective
– Makes the evaluation difficult
– Workaround: evaluate on a downstream application
•  Document classification
•  Information Retrieval
•  Rating Prediction
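As an example of the first option, the sketch below uses the inferred per-document topic proportions as features for a document classifier. Everything here is an illustrative choice rather than something the slides specify: the 20 Newsgroups subset, scikit-learn's LDA implementation, the 20-topic setting, and logistic regression as the classifier.

    from sklearn.datasets import fetch_20newsgroups
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    data = fetch_20newsgroups(subset="train", categories=["sci.space", "rec.autos"],
                              remove=("headers", "footers", "quotes"))
    counts = CountVectorizer(max_features=5000, stop_words="english").fit_transform(data.data)

    lda = LatentDirichletAllocation(n_components=20, random_state=0)
    doc_topics = lda.fit_transform(counts)   # per-document topic proportions used as features

    clf = LogisticRegression(max_iter=1000)
    print("classification accuracy:", cross_val_score(clf, doc_topics, data.target, cv=5).mean())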
Topic Models: Evaluation
•  Use the trained model to predict the probability of unseen documents
– A better model assigns higher probability to the unseen documents
•  Even better (document completion):
– Predict the probability of the second half of each document, using the first halves as part of the training corpus
– Does not require whole documents to be held out
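A sketch of held-out evaluation along these lines, using scikit-learn's LDA on a toy corpus. Note that sklearn's perplexity() is computed from a variational bound rather than the exact document probability, and the corpus and the topic count here are invented purely for illustration.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    train_docs = ["the cat sat on the mat", "dogs and cats are pets",
                  "the stock market fell today", "investors sold shares and stocks"]
    heldout_docs = ["my cat chased the dog", "the market rallied as investors bought shares"]

    vec = CountVectorizer()
    X_train = vec.fit_transform(train_docs)
    X_heldout = vec.transform(heldout_docs)   # unseen documents, mapped to the same vocabulary

    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X_train)

    # Lower held-out perplexity corresponds to assigning higher probability to unseen documents.
    print("held-out perplexity:", lda.perplexity(X_heldout))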
Beyond LDA: Rating Prediction
•  Predict ratings associated with text
•  Additional assumption:
•  The rating is conditional on the topic assignments of the words
•  Graphical Model:
Beyond LDA: Rating Prediction
•  Topics
–  Least, problem, unfortunately, supposed, worse, flat, dull
–  Bad, guys, watchable, not, one, movie
–  Both, motion, simple, perfect, fascinating, power
–  Cinematography, screenplay, performances, pictures, effective, sound
•  Notice how the assumption affects the extracted topics
–  Because the overall rating depends on the number of words assigned to each topic, topics become collections of words that appear in similarly rated documents
–  Topics express sentiment but lose their original meaning!
Beyond LDA: Rating Prediction
•  Latent Aspect Rating Prediction
–  Joint Topic and Sentiment Modelling
Generative Model
1.  Choose aspects and words Wdij for each aspect
Beyond LDA: Rating Prediction
•  Latent Aspect Rating Prediction
–  Joint Topic and Sentiment Modelling
Generative Model
1.  Choose aspects and words Wdij for each aspect
2.  Calculate aspect ratings based on the aspect words: sdi = Σj=1..n βij Wdij
Beyond LDA: Rating Prediction
•  Latent Aspect Rating Prediction
–  Joint Topic and Sentiment Modelling
Generative Model
1.  Choose aspects and words Wdij for each aspect
2.  Calculate aspect ratings based on the aspect words: sdi = Σj=1..n βij Wdij
3.  The overall rating is a weighted sum of the aspect ratings: rd ~ N( Σi=1..k αdi Σj=1..n βij Wdij , δ² )
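A small sketch of how a rating is generated under the three steps above: aspect ratings are weighted sums of aspect word counts, and the overall rating is drawn from a normal distribution around their weighted combination. All shapes and values below are made up for the example, and δ is used as the standard deviation of the N(·, δ²) noise.

    import numpy as np

    rng = np.random.default_rng(3)
    k, n = 3, 5                               # k aspects, n words in each aspect's vocabulary
    W_d = rng.integers(0, 4, size=(k, n))     # Wdij: count of aspect word j under aspect i in review d
    beta = rng.normal(0, 1, size=(k, n))      # βij: sentiment weight of word j for aspect i
    alpha_d = rng.dirichlet(np.ones(k))       # αdi: emphasis the reviewer places on aspect i
    delta = 0.3                               # standard deviation of the rating noise

    s_d = (beta * W_d).sum(axis=1)                   # aspect ratings: sdi = Σj βij Wdij
    r_d = rng.normal((alpha_d * s_d).sum(), delta)   # overall rating rd ~ N(Σi αdi sdi, δ²)

    print("aspect ratings:", np.round(s_d, 2))
    print("overall rating:", round(r_d, 2))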
Beyond LDA: Rating Prediction
•  Latent Aspect Rating Prediction
–  Joint Topic and Sentiment Modelling
Generative Model
E-Step: Infer the aspect ratings sd and aspect weights αd
M-Step: Update (µ, Σ, β, δ)
Beyond LDA: Rating Prediction
•  Latent Aspect Rating Prediction
–  Results
•  Detects sentiments without supervision
Beyond LDA: Rating Prediction
•  Latent Aspect Rating Prediction
–  Results
•  Requires keyword supervision – Any way to remove? (Think LDA!)
Beyond LDA: Rating Prediction
•  Latent Aspect Rating Prediction without Aspect Keyword Supervision
–  Aspect Modelling Module from LDA included
Beyond LDA: Topic Phrase Mining
•  Motivation:
–  “machine learning” is a phrase and should be assigned to one topic
•  Assigning machine to “Industry” and learning to “Education” is incorrect
•  Approach:
–  Extract high-frequency phrases (see the counting sketch below)
•  If a phrase is infrequent, so is any super-phrase
•  If a document does not contain a frequent phrase of length n, it also does not contain any of length > n
•  Use hierarchical clustering to find frequent phrases
–  Apply LDA on phrase tokens
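The downward-closure properties above suggest an Apriori-style counting pass: only count a length-n candidate if its length-(n−1) sub-phrases are already frequent. The sketch below illustrates just that pruning step (it does not implement the hierarchical clustering), and the corpus, threshold and maximum length are arbitrary choices for the example.

    from collections import Counter

    def frequent_phrases(docs, min_count=2, max_len=4):
        """docs: list of token lists. Returns {phrase: count} for frequent contiguous n-grams."""
        frequent, prev = {}, None           # prev holds the frequent phrases of length n - 1
        for n in range(1, max_len + 1):
            counts = Counter()
            for doc in docs:
                for i in range(len(doc) - n + 1):
                    cand = tuple(doc[i:i + n])
                    # Downward closure: skip candidates whose length-(n-1) sub-phrases are not frequent.
                    if n > 1 and (cand[:-1] not in prev or cand[1:] not in prev):
                        continue
                    counts[cand] += 1
            prev = {p: c for p, c in counts.items() if c >= min_count}
            if not prev:
                break
            frequent.update(prev)
        return frequent

    docs = [["machine", "learning", "for", "text"],
            ["deep", "machine", "learning", "models"],
            ["text", "mining", "and", "machine", "learning"]]
    for phrase, count in frequent_phrases(docs).items():
        print(" ".join(phrase), count)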
Sentiment Analysis
•  Let’s build our own simple model using the
key steps!
•  Use case:
Sentiment Analysis
•  Make Assumptions
–  Each (topic, sentiment) pair has its own vocabulary (word distribution)
•  ‘quick delivery’ has a higher probability under (service, +) than under (service, −) or (food quality, +)
–  Each (topic, rating) pair has a sentiment distribution
•  Positive sentiments about food quality are more likely to appear in highly rated reviews
•  A 4-star rated restaurant is likely to have good food quality even if it does not provide wireless
–  Each review has
•  An overall rating
•  A topic distribution: different users might talk about different aspects in their reviews
Sentiment Analysis
•  Graphical Model
Generative Process
1.  Choose a word distribution for every (topic, sentiment) pair
2.  Choose a sentiment distribution for every (topic, rating) pair
3.  For each review
•  Choose a rating
•  Choose a topic distribution
•  For each word in the review:
•  Choose a topic
•  Choose a sentiment
•  Choose a word
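The generative process above can be written out directly. The sketch below samples a tiny synthetic corpus from it; the symmetric Dirichlet priors, the uniform choice of rating, and every size and hyperparameter value are assumptions made for the illustration, since the slides do not fix them.

    import numpy as np

    rng = np.random.default_rng(4)
    K, S, R, V = 4, 2, 5, 50                 # topics, sentiments (+/−), rating levels, vocabulary size
    eta, gamma, alpha = 0.01, 0.1, 0.5       # assumed symmetric Dirichlet hyperparameters

    phi = rng.dirichlet(np.full(V, eta), size=(K, S))    # 1. word distribution per (topic, sentiment)
    pi = rng.dirichlet(np.full(S, gamma), size=(K, R))   # 2. sentiment distribution per (topic, rating)

    reviews = []
    for _ in range(3):                                   # 3. for each review
        rating = rng.integers(R)                         #    choose a rating (uniformly, for simplicity)
        theta = rng.dirichlet(np.full(K, alpha))         #    choose a topic distribution
        words = []
        for _ in range(30):                              #    for each word in the review
            z = rng.choice(K, p=theta)                   #       choose a topic
            s = rng.choice(S, p=pi[z, rating])           #       choose a sentiment given (topic, rating)
            w = rng.choice(V, p=phi[z, s])               #       choose a word given (topic, sentiment)
            words.append((z, s, w))
        reviews.append({"rating": int(rating), "words": words})

    print(reviews[0]["rating"], reviews[0]["words"][:5])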
Sentiment Analysis
•  Inference
Parameters to be inferred
1.  Per-document topic distribution
2.  Rating distribution
3.  Sentiment distribution
4.  Word distributions
Use Collapsed Gibbs Sampling!
Integrate out φ and π
Sentiment Analysis
•  Evaluation – Yelp
–  Sandwich: sandwich, slaw, primanti, coleslaw, cole, market, pastrami, reuben, bro, mayo, famous, cheesesteak, rye, zucchini, swiss, sammy, peppi, burgh, messi
–  Vietnamese: pho, noodl, bowl, soup, broth, sprout, vermicelli, peanut, lemongrass, leaf
–  Payment options: server, check, custom, card, return, state, credit, coupon, accept, tip, treat, gift, refill
–  Location: locat, park, street, drive, hill, window, south, car, downtown, number, corner, distance
–  Ambience: crowd, fun, group, rock, play, loud, music, young, sing, club, ticket, meet, entertain, dance, band, song
Sentiment Analysis
•  Evaluation – Yelp
– Rating prediction
Sentiment Analysis
•  Evaluation – Yelp
– Opinion Summarization
•  For all reviews of this restaurant
–  15% of words assigned to topic “Vegetarian”
–  5% to “Breakfast” (Eggs) with sentiment 0.78
–  3% to “Staff Attitude” with sentiment 0.82
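Once the sampler has produced per-word topic and sentiment assignments, a summary like the one above is just counting. The sketch below shows that aggregation over a hypothetical list of (topic, sentiment) assignments; the topic names and values are placeholders, not the output of a real run.

    from collections import defaultdict

    # Hypothetical per-word assignments for all reviews of one restaurant:
    # (topic name, sentiment), with sentiment 1 for positive and 0 for negative.
    assignments = [("Vegetarian", 1), ("Vegetarian", 0), ("Breakfast", 1),
                   ("Breakfast", 1), ("Staff Attitude", 1), ("Vegetarian", 1)]

    counts, positives = defaultdict(int), defaultdict(int)
    for topic, sentiment in assignments:
        counts[topic] += 1
        positives[topic] += sentiment

    total = sum(counts.values())
    for topic in counts:
        share = counts[topic] / total                     # fraction of words assigned to this topic
        avg_sentiment = positives[topic] / counts[topic]  # average sentiment of those words
        print(f"{topic}: {share:.0%} of words, sentiment {avg_sentiment:.2f}")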
Topic Modelling: Future Work
•  Missing Links
– Model selection: which model to pick for which application?
– Incorporating linguistic structure/NLP:
•  How can our knowledge of language help?
– Bag of words:
•  Most models are based on the unigram bag-of-words model
•  Context is lost – words like “good” or “nice” are often associated with particular words in context, e.g. ‘good standard of living’, ‘nice view from the hotel’
Topic Modelling
Questions?
Topic Modelling
Thank You!