An emerging paradigm for the processing of data streams involves human and machine computation working together, allowing human intelligence to be applied to large-scale data. We apply this approach to the classification of crisis-related messages in microblog streams. We begin by describing the platform AIDR (Artificial Intelligence for Disaster Response), which collects human annotations over time to create and maintain automatic supervised classifiers for social media messages. Next, we study two significant challenges in its design: (1) identifying which elements must be labeled by humans, and (2) determining when to ask for such annotations to be done. The first challenge is to select the items to be labeled by crowdsourcing workers so as to maximize the productivity of their work. The second is to schedule the work so that high classification accuracy is reliably maintained over time. We provide and validate answers to both challenges through extensive experimentation on real-world datasets.
Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises
1. Muhammad Imran, Carlos Castillo, Ji Lucas, Patrick Meier, Jakob Rogstadius
Qatar Computing Research Institute (QCRI), Doha, Qatar
Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises
2. USEFUL INFORMATION ON TWITTER
Chart: % of informative tweets by category.
• Caution and advice: a siren heard, tornado warning issued/lifted, tornado sighting/touchdown
• Information source: photos as info. source (44%), webpages as info. source (20%), videos as info. source (16%)
• Donations: other donations (38%), money (8%), equipment/shelter/volunteers/blood (54%)
• Casualties & damage: people injured, people dead, damage
Ref: "Extracting Information Nuggets from Disaster-Related Messages in Social Media". Imran et al., ISCRAM 2013, Baden-Baden, Germany.
3. SOCIAL MEDIA INFORMATION PROCESSING: OFFLINE APPROACH
Diagram: four sequential stages along the disaster timeline: (1) data collection, (2) human annotations on sample data, (3) machine training, (4) classification. Data collection runs along the disaster timeline before the later stages begin.
4. IMPACT AND RESPONSE TIMELINE
Source: Department of Community Safety, Queensland Govt. 2011 & UNOCHA.
Chart: disaster response (today) vs. disaster response (target). The target disaster response requires real-time processing.
5. REAL-TIME SOCIAL MEDIA ANALYSIS
Key requirements:
• Real-time data collection, capable of incorporating new data collection strategies
• Obtain human labels in real time
• Perform de-duplication
• Perform almost-online machine learning: continuous learning, i.e., learn as new labels arrive (a minimal sketch follows below)
• Perform real-time classification
• Scale with big disasters (Sandy: 15k posts/min)
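To make the "almost online" learning requirement concrete, the sketch below updates a text classifier incrementally as new labels arrive. It is a minimal sketch assuming scikit-learn is available; the tag names and feature settings are illustrative assumptions, not AIDR's actual implementation.

```python
# Minimal sketch of almost-online learning, assuming scikit-learn.
# Tag names and feature settings are illustrative, not AIDR's internals.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

vectorizer = HashingVectorizer(n_features=2**18)  # stateless, so no fitting step
model = SGDClassifier(loss="log_loss")            # supports incremental updates

def learn_from_new_labels(texts, tags):
    """Continuous learning: update the model as new human labels arrive."""
    X = vectorizer.transform(texts)
    model.partial_fit(X, tags, classes=["informative", "not_informative"])

def classify(texts):
    """Real-time classification of incoming tweets."""
    return model.predict(vectorizer.transform(texts))
```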
6. SOCIAL MEDIA INFORMATION PROCESSING: ONLINE APPROACH (REAL-TIME)
The same four stages, (1) data collection, (2) human annotations, (3) machine training, and (4) classification, now run concurrently. Diagram: during the first few hours, rounds of human annotation (1, 2, 3, …, n) interleave with learning rounds (Learning-1, Learning-2, Learning-3, …, Learning-n), so classification starts early and improves continuously as new labels arrive.
7. http://aidr.qcri.org/
AIDR (Artificial Intelligence for Disaster Response) is a free, open-source, and easy-to-use platform to automatically filter and classify relevant tweets posted during humanitarian crises. Three steps: (1) Collect, (2) Curate, (3) Classify.
8. AIDR: FROM THE END-USER'S PERSPECTIVE
A two-step approach (http://aidr.qcri.org/), with a sketch of step 1 below:
1. Collection: a collection is a set of filters
   • Keywords, hashtags
   • Geographical bounding box
   • Languages
   • Follow a specific set of users
2. Classifier(s): a classifier is a set of tags
   • Donation requests & offers
   • Damage & casualties
   • Eyewitness accounts
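To make the "collection is a set of filters" idea concrete, here is a hypothetical sketch; the field names and the matching semantics (language as a restriction, the other filters as alternatives) are assumptions, not AIDR's actual configuration schema.

```python
# Hypothetical collection definition; all field names are assumptions.
collection = {
    "keywords": ["#oklahoma", "tornado"],         # keywords and hashtags
    "bounding_box": (-103.0, 33.6, -94.4, 37.0),  # (west, south, east, north)
    "languages": {"en"},
    "follow_users": {"123456", "789012"},         # placeholder user ids
}

def matches(tweet: dict, coll: dict) -> bool:
    """True if the tweet passes the collection's filters."""
    # Language acts as a restriction on top of the other filters.
    if coll["languages"] and tweet.get("lang") not in coll["languages"]:
        return False
    if any(k in tweet.get("text", "").lower() for k in coll["keywords"]):
        return True
    if tweet.get("coordinates"):
        lon, lat = tweet["coordinates"]
        west, south, east, north = coll["bounding_box"]
        if west <= lon <= east and south <= lat <= north:
            return True
    return tweet.get("user_id") in coll["follow_users"]
```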
9. REAL-TIME CLASSIFICATION IN AIDR (http://aidr.qcri.org/)
Diagram: a collection feeds one or more classifiers (Classifier-1, Classifier-2, …) at rates of up to 30k tweets/min. Each classifier pairs a learner with a model: the model assigns tags to incoming tweets in real time, while the learner consumes completed labeling tasks to keep the model up to date.
10. HUMAN ANNOTATION: CHALLENGES (http://aidr.qcri.org/)
Crisis-specific labels are necessary:
• Contrasting vocabulary across crises
• Differences in public concerns and affected infrastructure
• New labels should be collected for each new crisis
Crowdsourcing is a big research topic. We address two challenges here [Imran et al. 2013b]:
1. Labeling task selection
   • Which tasks to pick?
   • No duplicate tasks should be labeled
   • Prioritize tasks that are likely to increase accuracy (see the sketch after this slide)
2. Labeling task scheduling
   • All-at-once labeling
   • Gradual labeling
   • Independent labeling
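A hedged sketch of the task-selection logic described above: skip duplicate items and prioritize those the current classifier is least certain about (uncertainty sampling, one common form of active learning). The functions and the normalization rule are illustrative, assuming the probabilistic model from the earlier sketch.

```python
import re

def normalize(text: str) -> str:
    """Crude near-duplicate key: lowercase, drop RT markers, mentions, URLs."""
    text = re.sub(r"(?i)\brt\b|@\w+|https?://\S+", " ", text.lower())
    return " ".join(text.split())

def select_labeling_tasks(tweets, model, vectorizer, already_labeled, k=50):
    """Pick the k tweets the classifier is least sure about, skipping duplicates."""
    seen = {normalize(t) for t in already_labeled}
    candidates = []
    for tweet in tweets:
        key = normalize(tweet)
        if key in seen:          # no duplicate tasks should be labeled
            continue
        seen.add(key)
        proba = model.predict_proba(vectorizer.transform([tweet]))[0]
        candidates.append((1.0 - proba.max(), tweet))  # higher = more uncertain
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [tweet for _, tweet in candidates[:k]]
```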
11. DATASETS (http://aidr.qcri.org/)
1. Joplin-2011: 206,764 tweets collected using #joplin
2. Sandy-2012: 4,906,521 tweets collected using #sandy, "hurricane sandy", …
3. Oklahoma-2013: 2,742,588 tweets collected using "Oklahoma", "tornado", …
12. DISASTER PHASES & # OF TWEETS (http://aidr.qcri.org/)
• Pre: preparedness phase
• Impact: the period in which the main effects are felt
• Post: response and recovery phase
Figure: number of tweets per day in all datasets; Joplin (left), Sandy (center), and Oklahoma (right).
13. LABELING TASK SELECTION (http://aidr.qcri.org/)
Experiment: Are crisis-specific labels necessary? Manual labeling (using CrowdFlower).

Labels obtained per phase:
Dataset    Phase-S1  Phase-S2  Phase-S3  Phase-S4
Joplin     2,000     1,000     1,000     1,000
Sandy      2,000     1,000     1,000     1,000
Oklahoma   2,000     1,000     1,000     N/A

Classification accuracy in various transfer scenarios (* AUC 0.5 represents a random classifier):
Train   Test      AUC
Joplin  Sandy     0.52
Joplin  Oklahoma  0.56
Sandy   Oklahoma  0.53

An outline of this transfer test in code follows below.
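A sketch of the cross-crisis transfer test, assuming labeled (text, label) pairs per crisis are available as Python lists. The variable names are placeholders, and the vectorizer/model choices are illustrative rather than the paper's exact setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def transfer_auc(train_texts, train_labels, test_texts, test_labels):
    """Train on one crisis, test on another. AUC 0.5 = random classifier."""
    vec = TfidfVectorizer()
    clf = LogisticRegression(max_iter=1000)
    clf.fit(vec.fit_transform(train_texts), train_labels)
    scores = clf.predict_proba(vec.transform(test_texts))[:, 1]
    return roc_auc_score(test_labels, scores)

# e.g., transfer_auc(joplin_texts, joplin_labels, sandy_texts, sandy_labels)
```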
14. LABELING TASK SELECTION (http://aidr.qcri.org/)
Experiment: Is de-duplication necessary?

Phase (train)  Train  Phase (test)  Test  AUC (without de-dup)  AUC (with de-dup)
S1 (pre)       1,500  S1 (pre)      500   0.78                  0.74
S1 (pre)       500    S1 (pre)      500   0.73                  0.72
S2 (impact)    500    S2 (impact)   500   0.80                  0.72
S3 (post)      500    S3 (post)     500   0.79                  0.73
S4 (post')     500    S4 (post')    500   0.70                  0.64

• 29-74% of tweets are re-tweets & 60-75% are near-duplicates
• Duplication causes an artificial increase in accuracy
• De-duplication is necessary to reduce classifier bias; otherwise the classifier learns on fewer concepts
• It is also necessary to improve workers' experience [Rogstadius et al. 2011]
A near-duplicate check is sketched below.
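As one possible de-duplication test, the sketch below uses Jaccard similarity over word bigrams; the 0.8 threshold is an illustrative choice, not the value used in the paper.

```python
def shingles(text: str, n: int = 2) -> set:
    """Word n-grams used as the unit of comparison."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def is_near_duplicate(a: str, b: str, threshold: float = 0.8) -> bool:
    """Jaccard similarity of shingle sets; retweets score close to 1.0."""
    sa, sb = shingles(a), shingles(b)
    if not sa or not sb:
        return a.lower() == b.lower()
    return len(sa & sb) / len(sa | sb) >= threshold
```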
15. LABELING TASK SELECTION (http://aidr.qcri.org/)
Experiment: Which approach, passive vs. active learning?
Figure: classification accuracy of passive vs. active learning across phases S1-S4 for Joplin (left), Sandy (center), and Oklahoma (right).
16. LABELING TASK SELECTION (http://aidr.qcri.org/)
• Are crisis-specific labels necessary? [YES]
• Is de-duplication necessary? [YES]
• Which approach to follow, passive vs. active learning? [Active learning]
Now we know WHICH tasks to select, but we still don't know WHEN to label them.
17. LABELING TASK SCHEDULING (http://aidr.qcri.org/)
• All-at-once labeling: obtain 1,500 labels on S1 and use all of them for training
• Cumulative labeling: obtain 500 labels in each of S1, S2, and S3, and train on all labels available up to each phase
• Independent labeling: obtain 500 labels in each of S1, S2, and S3, and use only the most recent labels for training, discarding old ones
The three strategies are sketched as training-set builders below.
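The three scheduling strategies amount to different rules for building the training set at each phase. A sketch under the assumption that labels are grouped per phase in a dict such as {"S1": [...], "S2": [...]}; the function is hypothetical, not AIDR's scheduler.

```python
PHASES = ["S1", "S2", "S3", "S4"]

def training_set(strategy: str, labels_by_phase: dict, current_phase: str) -> list:
    """Build the training set for the current phase under a given strategy."""
    idx = PHASES.index(current_phase)
    if strategy == "all-at-once":
        return list(labels_by_phase["S1"])       # all labels obtained up front on S1
    if strategy == "cumulative":
        out = []
        for phase in PHASES[:idx + 1]:           # every label available so far
            out.extend(labels_by_phase.get(phase, []))
        return out
    if strategy == "independent":
        return list(labels_by_phase.get(current_phase, []))  # most recent only
    raise ValueError(f"unknown strategy: {strategy}")
```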
18. LABELING TASK SCHEDULING
Experiment: Which labeling strategy to follow?
Figure: accuracy of the three strategies on Joplin (left), Sandy (center), and Oklahoma (right), for the Informative, Informative (50%), and Donations classification tasks.
19. CONCLUSION & FUTURE WORK (http://aidr.qcri.org/)
Task selection:
• De-duplication is necessary
• An active learning approach must be employed
Task scheduling:
• All-at-once labeling for small-scale crises
• Incremental labeling for medium-scale crises (needs tests)
Future work:
• Adaptive collection
• Post-processing/filtering
• More features and learning schemes
20. http://aidr.qcri.org/
Thank you!