Topic models such as Latent Dirichlet Allocation (LDA) have been extensively used to characterize text collections according to the topics discussed in their documents. Organizing documents by topic supports information access tasks such as document clustering, content-based recommendation, and summarization. Spoken documents such as podcasts typically involve more than one speaker (e.g., meetings, interviews, chat shows, or news with reporters). This paper presents work in progress on a variation of LDA that incorporates into the model the different speakers participating in conversational audio transcripts. Intuitively, each speaker has her own background knowledge, which generates different topic and word distributions. We believe that informing a topic model with speaker segmentation (e.g., obtained with existing speaker diarization techniques) may enhance the discovery of topics in multi-speaker audio content.
4. Example: Topic Discovery for Recommendation
[Figure: two example topics, {dance, Terpsichore, Latin, America} and {dessert, whip, white, egg}, each driving a 'More Like This' recommendation: more content about dance, more content about desserts]
5. Topic Discovery in Multi-Speaker Audio Contents: Applications
• Multi-Speaker Audio Contents:
• Podcasts (news, shows, interviews, etc.)
• Meetings
• TV programs
• Applications:
• Content-based Recommendation: ‘more like this’
• Clustering
• Group search results according to topics
• E.g., Search Result Presentation
6. Research Question
What is the impact, in terms of effectiveness, of adding speaker information to a topic model, compared to traditional approaches (i.e., LDA)?
7. Topic Discovery
[Image from Blei, D., Probabilistic Topic Models, Communications of the ACM, 2012]
• Each topic is a distribution over words
• Each document is a distribution over topics
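As a toy illustration of LDA's two distributions, the sketch below plays out the model's generative story with made-up values: a per-document distribution over topics (θ_d) and a per-topic distribution over words (φ). All names and numbers here are illustrative, not estimates from any corpus.

```python
import random

# Made-up distributions for one document (theta_d) and two topics (phi);
# a fitted LDA model would estimate these from a corpus.
theta_d = {"dance": 0.7, "dessert": 0.3}  # topic distribution for one document
phi = {
    "dance":   {"dance": 0.5, "latin": 0.3, "america": 0.2},
    "dessert": {"dessert": 0.5, "egg": 0.3, "whip": 0.2},
}

def generate_word(rng):
    """LDA's generative story for one token: draw a topic from theta_d,
    then draw a word from that topic's word distribution."""
    topic = rng.choices(list(theta_d), weights=list(theta_d.values()))[0]
    words = phi[topic]
    return rng.choices(list(words), weights=list(words.values()))[0]

rng = random.Random(0)
doc = [generate_word(rng) for _ in range(8)]
print(doc)
```

Repeating the two draws per token yields a document whose word usage reflects its topic mixture, which is exactly what LDA inverts at inference time.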
8. Topic Discovery vs. Topic Segmentation
• Topic Discovery:
  • Characterizes documents according to topics
  • 1 document ~ distribution of topics (e.g., weights over t1, t2, t3)
• Topic Segmentation:
  • Characterizes how a conversation evolves over time in terms of topics
  • 1 document ~ sequence of topics (e.g., t1 t3 t2 t3 t2 t1 along the timeline)
9. Topic Discovery vs. Topic Segmentation
• Not using speaker information:
  • Topic Discovery: Latent Dirichlet Allocation (LDA) [Blei et al., 2003]
  • Topic Segmentation: TextTiling [Hearst, 1997]; [Purver et al., 2006]
• Using speaker information:
  • Topic Discovery: ?
  • Topic Segmentation: SITS [Nguyen et al., 2012]
10. Topic Discovery vs. Topic Segmentation
• Not using speaker information:
  • Topic Discovery: Latent Dirichlet Allocation (LDA) [Blei et al., 2003]
  • Topic Segmentation: TextTiling [Hearst, 1997]; [Purver et al., 2006]
• Using speaker information:
  • Topic Discovery: SpeakerLDA (RQ)
  • Topic Segmentation: SITS [Nguyen et al., 2012] (RQ')
11. Proposed Approach: SpeakerLDA
• Split documents (D) according to speakers (S)
• Run LDA
• Combine the topic distributions θ_ds obtained for each speaker's pseudo-document d_s
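A minimal sketch of the three steps above, with the LDA step stubbed out by made-up per-speaker topic distributions. The combination shown is a length-weighted average of the θ_ds, which is one plausible choice; the slides do not fix the combination function, and all names here are hypothetical.

```python
from collections import defaultdict

def split_by_speaker(turns):
    """turns: list of (speaker, text) pairs from a diarized transcript.
    Builds one pseudo-document d_s per speaker."""
    parts = defaultdict(list)
    for speaker, text in turns:
        parts[speaker].append(text)
    return {s: " ".join(p) for s, p in parts.items()}

def combine_distributions(theta_by_speaker, lengths):
    """Merge per-speaker topic distributions theta_ds into one
    document-level distribution, weighting by pseudo-document length."""
    num_topics = len(next(iter(theta_by_speaker.values())))
    total = sum(lengths.values())
    theta_doc = [0.0] * num_topics
    for s, theta in theta_by_speaker.items():
        weight = lengths[s] / total
        for k, p in enumerate(theta):
            theta_doc[k] += weight * p
    return theta_doc

turns = [("A", "dance latin america"), ("B", "dessert egg whip"),
         ("A", "dance terpsichore")]
pseudo_docs = split_by_speaker(turns)
# In the real pipeline, LDA would be run over the pseudo-documents;
# these per-speaker topic distributions are made up for illustration.
theta_by_speaker = {"A": [0.9, 0.1], "B": [0.2, 0.8]}
lengths = {s: len(d.split()) for s, d in pseudo_docs.items()}
theta_doc = combine_distributions(theta_by_speaker, lengths)
print(theta_doc)
```

Weighting by pseudo-document length keeps a speaker who talks more from being underrepresented in the merged distribution; other schemes (e.g., a uniform average over speakers) would be equally consistent with the slide.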
13. Evaluation Framework
• Topic models are typically evaluated by
  (i) computing intrinsic metrics (e.g., perplexity) of the model on an unseen set of documents, or
  (ii) applying the model to an external information access task (e.g., topic detection as a clustering task)
    • Needs a manually annotated ground truth
• One possible measure: Precision/Recall of clustering relationships
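The "Precision/Recall of clustering relationships" measure could be computed as BCubed precision and recall, which is the formulation the later results slide labels Reliability/Sensitivity. A minimal sketch, assuming (hypothetically) that both clusterings are given as item-to-label dicts:

```python
def bcubed(system, gold):
    """BCubed precision/recall: for each item, the fraction of items
    sharing its system cluster (resp. gold cluster) that also share
    its gold cluster (resp. system cluster), averaged over items."""
    items = list(system)
    precision = recall = 0.0
    for e in items:
        same_sys = {o for o in items if system[o] == system[e]}
        same_gold = {o for o in items if gold[o] == gold[e]}
        correct = len(same_sys & same_gold)
        precision += correct / len(same_sys)
        recall += correct / len(same_gold)
    n = len(items)
    return precision / n, recall / n

# Toy example: the system merges a and b; the gold standard merges b and c.
system = {"a": 1, "b": 1, "c": 2}
gold = {"a": 1, "b": 2, "c": 2}
p, r = bcubed(system, gold)
print(p, r)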
14. Evaluation Framework II
• Is there any test collection suitable for measuring differences between our approach and existing topic models?
• It must satisfy the following conditions:
  A. Each topic is discussed in two or more documents
  B. It includes spoken documents with two or more speakers
The AMI Corpus satisfies both conditions!
15. The AMI Corpus
• Augmented Multi-Party Interaction (AMI) Corpus
• 100 hours of recorded audio
• More than 100 meetings with multiple speakers (generally 4)
• Real and elicited scenario-driven meetings
• Speakers play different roles:
• Interface designer, project manager, industrial designer, marketing
• Manual transcriptions, including speaker segmentation
• Transcripts segmented according to topics and subtopics
18. Work in Progress
• Compare the effectiveness of SpeakerLDA vs. LDA (and vs. topic
segmentation approaches)
• Extrinsic Evaluation: compare system outputs to clustering gold
standard
19. [Figure: Reliability (BCubed Precision, 0.0 to 0.8) vs. Sensitivity (BCubed Recall, 0.25 to 1.00) for the two systems, LDA and SpeakerLDA]
• AMI Corpus
• Topic Segmentation annotations as clustering gold standard
• Varying initial number of topics
• Considering the n most frequent topics in the topic-document distribution for topic assignment
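The last bullet's assignment rule might be sketched as follows: each document is assigned the n most probable topics from its topic-document distribution, and these assignments define the clustering relationships being evaluated. Names are illustrative, not from the slides.

```python
def top_n_topics(theta_d, n):
    """theta_d: {topic: probability} for one document.
    Returns the n most probable topics, used as the document's
    cluster labels when building clustering relationships."""
    return set(sorted(theta_d, key=theta_d.get, reverse=True)[:n])

theta_d = {"t1": 0.6, "t2": 0.3, "t3": 0.1}
labels = top_n_topics(theta_d, 2)
print(labels)
```

With n > 1 this yields a soft (overlapping) clustering, which is one reason BCubed-style measures, which score item-pair relationships rather than hard partitions, are a natural fit here.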
20. Work in Progress
• Compare the effectiveness of SpeakerLDA vs. LDA (and vs. topic
segmentation approaches)
• Extrinsic Evaluation: compare system outputs to clustering gold
standard
• Challenge: How to define a valid clustering gold standard from topic
segmentation annotations?
• Opportunity: Compare system output to topic distribution gold standard.
• Generate distributions from annotated segments
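One way to realize the opportunity above is to derive a gold topic distribution by weighting each annotated topic by the total length of its segments. This is a sketch under that assumption; the slides leave the exact weighting scheme open, and the function name is hypothetical.

```python
def segments_to_distribution(segments):
    """segments: list of (topic_label, length) pairs in document order,
    e.g. derived from AMI topic-segmentation annotations. Returns a
    gold topic distribution proportional to total segment length."""
    totals = {}
    for topic, length in segments:
        totals[topic] = totals.get(topic, 0) + length
    grand_total = sum(totals.values())
    return {t: n / grand_total for t, n in totals.items()}

# Toy annotated document: topic t3 appears in two separate segments.
segments = [("t1", 40), ("t3", 10), ("t2", 30), ("t3", 20)]
gold_dist = segments_to_distribution(segments)
print(gold_dist)
```

The resulting distribution can then be compared directly to a model's topic-document distribution, sidestepping the challenge of defining a clustering gold standard.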
22. Conclusions
• We propose SpeakerLDA, a topic model that takes into account
speaker information to discover what a set of audio documents (such
as podcasts) is about
• It can be used for clustering search results or content-based recommendation ('more like this')
• We are currently investigating how to generate a clustering gold
standard from topic segmentation annotations in the AMI Corpus
• Evaluate topic models by comparing against a topic distribution gold
standard?
24. SpeakerLDA: Discovering Topics in Transcribed Multi-Speaker Audio Contents
Damiano Spina, Johanne R. Trippas, Lawrence Cavedon, Mark Sanderson
@damiano10
damiano.spina@rmit.edu.au