The Author-Topic Model
Presented by:
Darshan Ramakant Bhat
Freeze Francis
Mohammed Haroon D
by M Rosen-Zvi, et al. (UAI 2004)
Overview
● Motivation
● Model Formulation
○ Topic Model (i.e., LDA)
○ Author Model
○ Author-Topic Model
● Parameter Estimation
○ Gibbs Sampling Algorithms
● Results and Evaluation
● Applications
Motivation
● Joint modeling of authors and topics had received little attention.
● Author modeling had tended to focus on the problem of predicting the author of a given document.
● Modeling the interests of authors is a fundamental problem raised by a large corpus (e.g., the NIPS dataset).
● This paper introduced a generative model that simultaneously models the content of documents and the interests of authors.
Notations
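● Vocabulary size: $V$
● Total number of unique authors: $A$
● Document $d$ is identified as $(\mathbf{w}_d, \mathbf{a}_d)$
● $N_d$: number of words in document $d$
● $\mathbf{w}_d$: vector of the $N_d$ words of document $d$ (each drawn from the vocabulary), $\mathbf{w}_d = [w_{1d}, w_{2d}, \dots, w_{N_d d}]$
● $w_{id}$: $i$-th word in document $d$
● $A_d$: number of authors of document $d$
● $\mathbf{a}_d$: vector of the $A_d$ authors of document $d$ (a subset of all authors), $\mathbf{a}_d = [a_{1d}, a_{2d}, \dots, a_{A_d d}]$
● Corpus: collection of $D$ documents $= \{(\mathbf{w}_1, \mathbf{a}_1), (\mathbf{w}_2, \mathbf{a}_2), \dots, (\mathbf{w}_D, \mathbf{a}_D)\}$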
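As a rough illustration of this notation, the corpus can be held in a small data structure. The class name `Document`, the toy corpus, and the integer encoding of words and authors are assumptions made for this sketch, not part of the slides.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Document:
    words: List[int]    # w_d: word indices into the vocabulary, length N_d
    authors: List[int]  # a_d: author indices, length A_d

# Hypothetical toy corpus with V = 5 vocabulary entries and A = 3 authors.
V = 5
A = 3
corpus = [
    Document(words=[0, 3, 3, 1], authors=[0, 2]),
    Document(words=[2, 2, 4], authors=[1]),
]

D = len(corpus)                                  # number of documents
N = [len(doc.words) for doc in corpus]           # N_d for each document
A_d = [len(doc.authors) for doc in corpus]       # A_d for each document
```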
Plate Notation for Graphical models
● Nodes are random variables
● Edges denote possible dependence
● Observed variables are shaded
● Plates denote replicated structure
Dirichlet distribution
● Generalization of the beta distribution to multiple dimensions
● Domain: $X = (x_1, x_2, \dots, x_k)$
○ $x_i \in (0, 1)$
○ $\sum_{i=1}^{k} x_i = 1$
○ i.e., the parameter vectors of $k$-dimensional multinomial distributions
● "Distribution over multinomial distributions"
● Parameter: $\alpha = (\alpha_1, \alpha_2, \dots, \alpha_k)$
○ $\alpha_i > 0, \forall i$
○ determines how the probability mass is distributed
○ Symmetric: $\alpha_1 = \alpha_2 = \dots = \alpha_k$
● Each sample from a Dirichlet is the parameter vector of a multinomial distribution.
○ $\alpha$ ↑ : denser samples
○ $\alpha$ ↓ : sparser samples
Fig: Domain of the Dirichlet for k = 3, a simplex with vertices Topic 1 (1,0,0), Topic 2 (0,1,0), Topic 3 (0,0,1) and center (⅓,⅓,⅓)
Fig: Symmetric Dirichlet distributions for k = 3 (pdf and samples)
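The effect of the concentration parameter can be tried out with a minimal sketch. It draws from a symmetric Dirichlet by normalizing independent gamma variates, a standard construction; the function name and the two example values of α are assumptions chosen for illustration.

```python
import random

def sample_symmetric_dirichlet(k, alpha, rng):
    """Draw one point on the simplex from Dirichlet(alpha, ..., alpha)."""
    # Normalizing k independent Gamma(alpha, 1) draws gives a Dirichlet sample.
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [g / total for g in draws]

rng = random.Random(0)
sparse = sample_symmetric_dirichlet(3, 0.1, rng)   # small alpha: mass tends to sit on few components
dense = sample_symmetric_dirichlet(3, 10.0, rng)   # large alpha: mass tends to spread evenly
print(sparse, dense)
```

Each returned vector is non-negative and sums to 1, so it can be used directly as the parameter of a multinomial over three topics.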
Model Formulation
● Topic Model (LDA)
○ Models topics as distributions over words
○ Models documents as distributions over topics
● Author Model
○ Models authors as distributions over words
● Author-Topic Model
○ Probabilistic model for both authors and topics
○ Models topics as distributions over words
○ Models authors as distributions over topics
Topic Model : LDA
● Generative model
● Bag-of-words assumption
● Mixed membership
● Each word in a document has a topic assignment.
● Each document is associated with a distribution over topics.
● Each topic is associated with a distribution over words.
● α : Dirichlet prior parameter on the per-document topic distributions.
● β : Dirichlet prior parameter on the per-topic word distributions.
● w : observed word in the document
● z : topic assigned to w
Fig : LDA model
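The generative story in the bullets above can be sketched as follows. This is a minimal illustration, assuming the topic-word distributions `phi` are already given (in full LDA they would themselves be drawn from a Dirichlet with parameter β); all names and numbers are hypothetical.

```python
import random

def sample_dirichlet(k, alpha, rng):
    """Symmetric Dirichlet draw via normalized gamma variates."""
    g = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    s = sum(g)
    return [x / s for x in g]

def generate_lda_document(n_words, phi, alpha, rng):
    """phi[t][v] is the probability of word v under topic t."""
    n_topics = len(phi)
    theta = sample_dirichlet(n_topics, alpha, rng)  # per-document topic distribution
    topics, words = [], []
    for _ in range(n_words):
        z = rng.choices(range(n_topics), weights=theta)[0]       # topic assignment for this word
        w = rng.choices(range(len(phi[z])), weights=phi[z])[0]   # word drawn from that topic
        topics.append(z)
        words.append(w)
    return theta, topics, words

rng = random.Random(0)
phi = [[0.7, 0.2, 0.1, 0.0],   # hypothetical topic 0 over a 4-word vocabulary
       [0.0, 0.1, 0.2, 0.7]]   # hypothetical topic 1
theta, topics, words = generate_lda_document(6, phi, alpha=0.5, rng=rng)
```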
Author Model
● Simple generative model
● Each author is associated with a distribution over words.
● Each word has an author assignment.
● β : Dirichlet prior parameter on the per-author word distributions.
● w : observed word
● x : author chosen for word w
Fig : Author Model
Author-Topic Model
● Each topic is associated with a distribution over words.
● Each author is associated with a distribution over topics.
● Each word in a document has an author assignment and a topic assignment.
● The topic distribution of a document is a mixture of its authors' topic distributions.
● α : Dirichlet prior parameter on the per-author topic distributions.
● x : author chosen for word w
● z : topic assigned to w, drawn from the topic distribution of author x
● w : observed word in the document
Fig: Author-Topic Model in plate notation
Fig: Author-Topic model
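A minimal sketch of the author-topic generative process for one document: for each word, an author is picked uniformly from the document's authors (as in the Rosen-Zvi et al. paper), a topic is drawn from that author's topic distribution, and a word is drawn from that topic. The distributions `theta` and `phi` are assumed to be given, and all values are hypothetical.

```python
import random

def generate_atm_document(n_words, authors, theta, phi, rng):
    """theta[a][t]: topic distribution of author a; phi[t][v]: word distribution of topic t."""
    xs, zs, ws = [], [], []
    for _ in range(n_words):
        x = rng.choice(authors)                                      # author chosen uniformly from a_d
        z = rng.choices(range(len(theta[x])), weights=theta[x])[0]   # topic from author x
        w = rng.choices(range(len(phi[z])), weights=phi[z])[0]       # word from topic z
        xs.append(x)
        zs.append(z)
        ws.append(w)
    return xs, zs, ws

rng = random.Random(1)
theta = {0: [0.9, 0.1], 1: [0.2, 0.8]}   # hypothetical per-author topic distributions
phi = [[0.6, 0.3, 0.1],                  # hypothetical per-topic word distributions
       [0.1, 0.2, 0.7]]
xs, zs, ws = generate_atm_document(5, authors=[0, 1], theta=theta, phi=phi, rng=rng)
```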
Special Cases of ATM
Topic Model (LDA): each document is written by a unique author
● The document determines the author.
● Author assignment becomes trivial.
● The author's topic distribution can be viewed as the document's topic distribution.
Author Model: each author writes about a unique topic
● The author determines the topic.
● Topic assignment becomes trivial.
● The topic's word distribution can be viewed as the author's word distribution.
Variables to be inferred
Collapsed Gibbs Sampling - LDA
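● The topic assignment variables contain the information needed to estimate the topic-word distributions $\Phi$ and the document-topic distributions $\Theta$.
● Use this information to sample each topic assignment directly, conditioned on all the other assignments, without sampling $\Phi$ and $\Theta$ explicitly. This is collapsed Gibbs sampling.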
Collapsed Gibbs Sampling - Author Model
● Need to infer the author-word distributions $\Phi_A$
● With collapsed Gibbs sampling, directly infer the author assignment of each word
Author Topic Model - Collapsed Gibbs Sampling
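● Need to infer $\Phi_T$ (topic-word distributions) and $\Theta_A$ (author-topic distributions)
● Instead, collapsed Gibbs sampling directly infers the topic and author assignments of each word
● This is done by sampling from the joint distribution of a word's author and topic, conditioned on all other variables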
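For reference, the conditional distribution used in this sampling step, as given in the Rosen-Zvi et al. paper, can be written as below. Here $C^{WT}_{mj}$ counts how often word $m$ is assigned to topic $j$ and $C^{AT}_{kj}$ counts how often author $k$ is assigned to topic $j$, both excluding the current word; $K$ denotes the number of topics (the paper writes $T$), and $\alpha$, $\beta$ are the symmetric Dirichlet parameters.

```latex
P(z_i = j, x_i = k \mid w_i = m, \mathbf{z}_{-i}, \mathbf{x}_{-i}, \mathbf{w}_{-i}, \mathbf{a}_d)
\;\propto\;
\frac{C^{WT}_{mj} + \beta}{\sum_{m'} C^{WT}_{m'j} + V\beta}
\cdot
\frac{C^{AT}_{kj} + \alpha}{\sum_{j'} C^{AT}_{kj'} + K\alpha}
```

The first factor measures how strongly topic $j$ favors word $m$; the second measures how strongly author $k$ favors topic $j$.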
Example
Current author and topic assignments for the words of one document:

WORD      epilepsy  dynamic  bayesian  EEG  model
AUTHOR    1         2        2         1    1
TOPIC     3         2        1         3    1

Word-topic counts:

            TOPIC-1  TOPIC-2  TOPIC-3
epilepsy    1        0        35
dynamic     18       20       1
bayesian    42       15       0
EEG         0        5        20
model       10       8        1
...

Author-topic counts:

            TOPIC-1  TOPIC-2  TOPIC-3
AUTHOR-1    20       14       2
AUTHOR-2    20       21       5
AUTHOR-3    11       8        15
AUTHOR-4    5        15       14
...
Example
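The assignment of the word "bayesian" is removed before resampling. Counts in parentheses are the values after removing its old assignment (author 2, topic 1).

WORD      epilepsy  dynamic  bayesian  EEG  model
AUTHOR    1         2        ?         1    1
TOPIC     3         2        ?         3    1

Word-topic counts:

            TOPIC-1  TOPIC-2  TOPIC-3
epilepsy    1        0        35
dynamic     18       20       1
bayesian    42 (41)  15       0
EEG         0        5        20
model       10       8        1
...

Author-topic counts:

            TOPIC-1  TOPIC-2  TOPIC-3
AUTHOR-1    20       14       2
AUTHOR-2    20 (19)  21       5
AUTHOR-3    11       8        15
AUTHOR-4    5        15       14
...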
Author-Topic   How much topic j likes the word "bayesian"   How much author k likes topic j
A1T1           41/97                                        20/36
A2T1           41/97                                        19/45
A1T2           15/105                                       14/36
A2T2           ...                                          ...
A1T3           ...                                          ...
A2T3           ...                                          ...
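The unnormalized weight of each author-topic pair is the product of the two columns. The sketch below multiplies the three fully specified rows using exact fractions. The denominators 97 and 105 are taken from the slide as given (they include words elided by "..."), and, like the slide, the sketch leaves out the smoothing terms α and β.

```python
from fractions import Fraction

# How much each topic likes the word "bayesian" (from the table).
word_given_topic = {1: Fraction(41, 97), 2: Fraction(15, 105)}

# How much each author likes each topic, keyed by (author, topic).
topic_given_author = {
    (1, 1): Fraction(20, 36),
    (2, 1): Fraction(19, 45),
    (1, 2): Fraction(14, 36),
}

# Unnormalized weight of assigning "bayesian" to each (author, topic) pair.
weights = {
    (a, t): word_given_topic[t] * p
    for (a, t), p in topic_given_author.items()
}

for pair, wgt in weights.items():
    print(pair, wgt, float(wgt))
```

Among these three rows, author 1 with topic 1 gets the largest weight (205/873, about 0.235), followed by author 2 with topic 1 (779/4365, about 0.178) and author 1 with topic 2 (1/18, about 0.056).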
● To draw a new author-topic assignment (equivalently):
○ Roll a $K \times A_d$-sided die with these probabilities ($K$ topics, $A_d$ authors of the document)
○ Assign the resulting author-topic pair to the word

WORD      epilepsy  dynamic  bayesian  EEG  model
AUTHOR    1         2        1         1    1
TOPIC     3         2        3         3    1
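The die roll amounts to one weighted draw over all (author, topic) pairs. A minimal sketch, assuming the unnormalized weights have already been computed; the function name and the example weights are hypothetical.

```python
import random

def draw_author_topic(weights, rng):
    """Pick one (author, topic) pair with probability proportional to its weight."""
    pairs = list(weights)
    # random.choices normalizes the weights internally, so they need not sum to 1.
    return rng.choices(pairs, weights=[weights[p] for p in pairs])[0]

rng = random.Random(0)
# Hypothetical weights for K = 3 topics and A_d = 2 authors (6 faces of the die).
weights = {
    (1, 1): 0.23, (1, 2): 0.06, (1, 3): 0.01,
    (2, 1): 0.18, (2, 2): 0.07, (2, 3): 0.02,
}
author, topic = draw_author_topic(weights, rng)
```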
Example (cont.)
● Update the count tables (from which $\Phi_T$ and $\Theta_A$ are estimated) with the new values
● This increases the popularity of the topic for the chosen author and of the word for the chosen topic
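The bookkeeping for one resampled word can be sketched with the counts from the example: the old assignment of "bayesian" (author 2, topic 1) is removed and the newly drawn one (author 1, topic 3) is added. The dictionary layout and the function name are assumptions for illustration; a full sampler would typically also track per-topic and per-author totals.

```python
# Counts for the word "bayesian" and for authors 1 and 2, keyed by topic number.
word_topic = {"bayesian": {1: 42, 2: 15, 3: 0}}
author_topic = {
    1: {1: 20, 2: 14, 3: 2},
    2: {1: 20, 2: 21, 3: 5},
}

def update_counts(word, author, topic, delta):
    """Add delta (+1 or -1) to both count tables for one word token."""
    word_topic[word][topic] += delta
    author_topic[author][topic] += delta

update_counts("bayesian", author=2, topic=1, delta=-1)  # remove the old assignment
update_counts("bayesian", author=1, topic=3, delta=+1)  # record the new assignment
```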
Sample Output (CORA dataset)
Fig: Topic probabilities for an author, with topics shown by their 10 most probable words
Experimental Results
● Experiments were run on two datasets of papers, drawn from NIPS and the CiteSeer database.
● Extremely common words were removed from the corpus, leaving V = 13,649 unique words in NIPS and V = 30,799 in CiteSeer, with 2,037 authors in NIPS and 85,465 in CiteSeer.
● NIPS contains many documents closely related to learning theory and machine learning.
● CiteSeer covers a variety of topics, from user interfaces to solar astrophysics.
Examples of topic-author distributions
Fig: Topic probability of a random author
Evaluating the predictive power
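● The predictive power of the model is measured quantitatively using perplexity.
● Perplexity measures the ability to predict the words of new, unseen documents (lower is better).
● The perplexity of the $N_d$ words in a test document $(\mathbf{w}_d, \mathbf{a}_d)$ is $\exp\left[-\frac{\ln p(\mathbf{w}_d \mid \mathbf{a}_d)}{N_d}\right]$
● When $p(\mathbf{w}_d \mid \mathbf{a}_d)$ is high (close to 1), perplexity ≈ 1; otherwise it is a large positive quantity.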
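A minimal sketch of the perplexity computation, using the standard definition exp(-ln p(w_d | a_d) / N_d). The function name and the toy document are assumptions for illustration.

```python
import math

def perplexity(log_prob, n_words):
    """exp(-ln p(w_d | a_d) / N_d) for a document with n_words words."""
    return math.exp(-log_prob / n_words)

# Hypothetical document: 4 words, each predicted with probability 0.25.
log_prob = 4 * math.log(0.25)
value = perplexity(log_prob, 4)
print(value)
```

Here the result is approximately 4: the model is as uncertain as a uniform choice among four words. A document predicted with probability 1 (log-probability 0) has perplexity 1.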
Continued….
● Approximate equation to calculate the joint probability
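The slide's equation is not recoverable from this transcript. As a rough sketch only, one common way to evaluate p(w_d | a_d) under the author-topic model is to plug in point estimates of the author-topic distributions `theta` and topic-word distributions `phi`, with each word generated by a uniformly chosen author of the document. The names and toy values below are assumptions; the paper's approximation may additionally average over several Gibbs samples.

```python
import math

def log_prob_document(words, authors, theta, phi):
    """ln p(w_d | a_d) with point estimates theta[a][t] and phi[t][v]."""
    total = 0.0
    for w in words:
        p = 0.0
        for a in authors:
            for t, theta_at in enumerate(theta[a]):
                p += theta_at * phi[t][w]   # author a picks topic t, topic t emits word w
        total += math.log(p / len(authors))  # authors are chosen uniformly
    return total

# Hypothetical estimates: 2 authors, 2 topics, 2 vocabulary words.
theta = {0: [1.0, 0.0], 1: [0.0, 1.0]}
phi = [[0.5, 0.5], [0.5, 0.5]]
lp = log_prob_document(words=[0, 1], authors=[0, 1], theta=theta, phi=phi)
```

In this toy setting every word has probability 0.5, so the log-probability of the two-word document is 2 ln 0.5.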
Continued….
Applications
● The model can be used for automatic reviewer recommendation in paper review.
● Given the abstract of a paper and a list of its authors plus their known past collaborators, generate a list of other highly likely authors for this abstract, who might serve as good reviewers.
References
● M. Rosen-Zvi, et al. (2004). “The Author-Topic Model for Authors and Documents”. Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence. AUAI Press.
● D. M. Blei, A. Y. Ng, and M. I. Jordan (2003). “Latent Dirichlet Allocation”. Journal of Machine Learning Research, 3 : 993–1022.
● T. L. Griffiths and M. Steyvers (2004). “Finding Scientific Topics”. Proceedings of the National Academy of Sciences, 101 (suppl. 1), 5228–5235.
● https://github.com/arongdari/python-topic-model
● https://www.coursera.org/learn/ml-clustering-and-retrieval/home/welcome
Questions?
