Topic models such as Latent Dirichlet Allocation (LDA) have been extensively used to characterize text collections according to the topics discussed in their documents. Organizing documents by topic supports information access tasks such as document clustering, content-based recommendation, and summarization. Spoken documents such as podcasts typically involve more than one speaker (e.g., meetings, interviews, chat shows, or news with reporters). This paper presents work in progress on a variation of LDA that incorporates into the model the different speakers participating in conversational audio transcripts. Intuitively, each speaker has her own background knowledge, which generates different topic and word distributions. We believe that informing a topic model with speaker segmentation (e.g., obtained with existing speaker diarization techniques) may enhance the discovery of topics in multi-speaker audio content.
4. Example: Topic Discovery for Recommendation
[Figure: two example topics, {dance, Terpsichore, Latin, America} and {dessert, whip, white, egg}, each driving a 'More Like This' recommendation: more content about dance, more content about desserts]
5. Topic Discovery in Multi-Speaker Audio Contents: Applications
• Multi-Speaker Audio Contents:
• Podcasts (news, shows, interviews, etc.)
• Meetings
• TV programs
• Applications:
• Content-based Recommendation: ‘more like this’
• Clustering
• Group search results according to topics
• E.g., Search Result Presentation
6. Research Question
What is the impact, in terms of effectiveness, of adding speaker information to a topic model, compared to traditional approaches (i.e., LDA)?
7. Topic Discovery
[Image from Blei, D., Probabilistic Topic Models, Communications of the ACM, 2012]
• Each topic is a distribution over words
• Each document is a distribution over topics
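As a toy illustration of LDA's two distributions, the sketch below plays out the model's generative story with made-up values: a per-document distribution over topics (θ_d) and a per-topic distribution over words (φ). All names and numbers here are illustrative, not estimates from any corpus.

```python
import random

# Made-up distributions for one document (theta_d) and two topics (phi);
# a fitted LDA model would estimate these from a corpus.
theta_d = {"dance": 0.7, "dessert": 0.3}  # topic distribution for one document
phi = {
    "dance":   {"dance": 0.5, "latin": 0.3, "america": 0.2},
    "dessert": {"dessert": 0.5, "egg": 0.3, "whip": 0.2},
}

def generate_word(rng):
    """LDA's generative story for one token: draw a topic from theta_d,
    then draw a word from that topic's word distribution."""
    topic = rng.choices(list(theta_d), weights=list(theta_d.values()))[0]
    words = phi[topic]
    return rng.choices(list(words), weights=list(words.values()))[0]

rng = random.Random(0)
doc = [generate_word(rng) for _ in range(8)]
print(doc)
```

Repeating the two draws per token yields a document whose word usage reflects its topic mixture, which is exactly what LDA inverts at inference time.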
8. Topic Discovery vs. Topic Segmentation
• Topic Discovery:
  • Characterizes documents according to topics
  • 1 document ~ distribution of topics (e.g., weights over t1, t2, t3)
• Topic Segmentation:
  • Characterizes how a conversation evolves over time in terms of topics
  • 1 document ~ sequence of topics (e.g., t1 t3 t2 t3 t2 t1 along the timeline)
9. Topic Discovery vs. Topic Segmentation
• Not using speaker information:
  • Topic Discovery: Latent Dirichlet Allocation (LDA) [Blei et al., 2003]
  • Topic Segmentation: TextTiling [Hearst, 1997]; [Purver et al., 2006]
• Using speaker information:
  • Topic Discovery: ?
  • Topic Segmentation: SITS [Nguyen et al., 2012]
10. Topic Discovery vs. Topic Segmentation
• Not using speaker information:
  • Topic Discovery: Latent Dirichlet Allocation (LDA) [Blei et al., 2003]
  • Topic Segmentation: TextTiling [Hearst, 1997]; [Purver et al., 2006]
• Using speaker information:
  • Topic Discovery: SpeakerLDA (RQ)
  • Topic Segmentation: SITS [Nguyen et al., 2012] (RQ')
11. Proposed Approach: SpeakerLDA
• Split documents (D) according to speakers (S)
• Run LDA
• Combine the topic distributions θ_ds obtained for each speaker's pseudo-document d_s
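A minimal sketch of the three steps above, with the LDA step stubbed out by made-up per-speaker topic distributions. The combination shown is a length-weighted average of the θ_ds, which is one plausible choice; the slides do not fix the combination function, and all names here are hypothetical.

```python
from collections import defaultdict

def split_by_speaker(turns):
    """turns: list of (speaker, text) pairs from a diarized transcript.
    Builds one pseudo-document d_s per speaker."""
    parts = defaultdict(list)
    for speaker, text in turns:
        parts[speaker].append(text)
    return {s: " ".join(p) for s, p in parts.items()}

def combine_distributions(theta_by_speaker, lengths):
    """Merge per-speaker topic distributions theta_ds into one
    document-level distribution, weighting by pseudo-document length."""
    num_topics = len(next(iter(theta_by_speaker.values())))
    total = sum(lengths.values())
    theta_doc = [0.0] * num_topics
    for s, theta in theta_by_speaker.items():
        weight = lengths[s] / total
        for k, p in enumerate(theta):
            theta_doc[k] += weight * p
    return theta_doc

turns = [("A", "dance latin america"), ("B", "dessert egg whip"),
         ("A", "dance terpsichore")]
pseudo_docs = split_by_speaker(turns)
# In the real pipeline, LDA would be run over the pseudo-documents;
# these per-speaker topic distributions are made up for illustration.
theta_by_speaker = {"A": [0.9, 0.1], "B": [0.2, 0.8]}
lengths = {s: len(d.split()) for s, d in pseudo_docs.items()}
theta_doc = combine_distributions(theta_by_speaker, lengths)
print(theta_doc)
```

Weighting by pseudo-document length keeps a speaker who talks more from being underrepresented in the merged distribution; other schemes (e.g., a uniform average over speakers) would be equally consistent with the slide.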
13. Evaluation Framework
• Topic models are typically evaluated by
  (i) computing intrinsic metrics (e.g., perplexity) of the model on an unseen set of documents, or
  (ii) applying the model to an external information access task (e.g., topic detection as a clustering task)
    • Needs a manually annotated ground truth
• One possible measure: Precision/Recall of clustering relationships
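The "Precision/Recall of clustering relationships" measure could be computed as BCubed precision and recall, which is the formulation the later results slide labels Reliability/Sensitivity. A minimal sketch, assuming (hypothetically) that both clusterings are given as item-to-label dicts:

```python
def bcubed(system, gold):
    """BCubed precision/recall: for each item, the fraction of items
    sharing its system cluster (resp. gold cluster) that also share
    its gold cluster (resp. system cluster), averaged over items."""
    items = list(system)
    precision = recall = 0.0
    for e in items:
        same_sys = {o for o in items if system[o] == system[e]}
        same_gold = {o for o in items if gold[o] == gold[e]}
        correct = len(same_sys & same_gold)
        precision += correct / len(same_sys)
        recall += correct / len(same_gold)
    n = len(items)
    return precision / n, recall / n

# Toy example: the system merges a and b; the gold standard merges b and c.
system = {"a": 1, "b": 1, "c": 2}
gold = {"a": 1, "b": 2, "c": 2}
p, r = bcubed(system, gold)
print(p, r)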
14. Evaluation Framework II
• Is there any test collection suitable for measuring differences between our approach and existing topic models?
• It must satisfy the following conditions:
  A. Each topic is discussed in two or more documents
  B. It includes spoken documents with two or more speakers
The AMI Corpus satisfies both conditions!
15. The AMI Corpus
• Augmented Multi-Party Interaction (AMI) Corpus
• 100 hours of recorded audio
• More than 100 meetings with multiple speakers (generally 4)
• Real and elicited scenario-driven meetings
• Speakers play different roles:
• Interface designer, project manager, industrial designer, marketing
• Manual transcriptions, including speaker segmentation
• Transcripts segmented according to topics and subtopics
18. Work in Progress
• Compare the effectiveness of SpeakerLDA vs. LDA (and vs. topic
segmentation approaches)
• Extrinsic Evaluation: compare system outputs to clustering gold
standard
19. [Figure: Reliability (BCubed Precision, 0.0 to 0.8) vs. Sensitivity (BCubed Recall, 0.25 to 1.00) for the two systems, LDA and SpeakerLDA]
• AMI Corpus
• Topic Segmentation annotations as clustering gold standard
• Varying initial number of topics
• Considering the n most frequent topics in the topic-document distribution for topic assignment
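The last bullet's assignment rule might be sketched as follows: each document is assigned the n most probable topics from its topic-document distribution, and these assignments define the clustering relationships being evaluated. Names are illustrative, not from the slides.

```python
def top_n_topics(theta_d, n):
    """theta_d: {topic: probability} for one document.
    Returns the n most probable topics, used as the document's
    cluster labels when building clustering relationships."""
    return set(sorted(theta_d, key=theta_d.get, reverse=True)[:n])

theta_d = {"t1": 0.6, "t2": 0.3, "t3": 0.1}
labels = top_n_topics(theta_d, 2)
print(labels)
```

With n > 1 this yields a soft (overlapping) clustering, which is one reason BCubed-style measures, which score item-pair relationships rather than hard partitions, are a natural fit here.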
20. Work in Progress
• Compare the effectiveness of SpeakerLDA vs. LDA (and vs. topic
segmentation approaches)
• Extrinsic Evaluation: compare system outputs to clustering gold
standard
• Challenge: How to define a valid clustering gold standard from topic
segmentation annotations?
• Opportunity: Compare system output to topic distribution gold standard.
• Generate distributions from annotated segments
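One way to realize the opportunity above is to derive a gold topic distribution by weighting each annotated topic by the total length of its segments. This is a sketch under that assumption; the slides leave the exact weighting scheme open, and the function name is hypothetical.

```python
def segments_to_distribution(segments):
    """segments: list of (topic_label, length) pairs in document order,
    e.g. derived from AMI topic-segmentation annotations. Returns a
    gold topic distribution proportional to total segment length."""
    totals = {}
    for topic, length in segments:
        totals[topic] = totals.get(topic, 0) + length
    grand_total = sum(totals.values())
    return {t: n / grand_total for t, n in totals.items()}

# Toy annotated document: topic t3 appears in two separate segments.
segments = [("t1", 40), ("t3", 10), ("t2", 30), ("t3", 20)]
gold_dist = segments_to_distribution(segments)
print(gold_dist)
```

The resulting distribution can then be compared directly to a model's topic-document distribution, sidestepping the challenge of defining a clustering gold standard.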
22. Conclusions
• We propose SpeakerLDA, a topic model that takes into account
speaker information to discover what a set of audio documents (such
as podcasts) is about
• It can be used for clustering search results or content-based recommendation ('more like this')
• We are currently investigating how to generate a clustering gold
standard from topic segmentation annotations in the AMI Corpus
• Evaluate topic models by comparing against a topic distribution gold
standard?
24. SpeakerLDA: Discovering Topics in Transcribed Multi-Speaker Audio Contents
Damiano Spina, Johanne R. Trippas, Lawrence Cavedon, Mark Sanderson
@damiano10
damiano.spina@rmit.edu.au