Muhammad Imran, Carlos Castillo, Ji Lucas, Patrick Meier, Jakob Rogstadius
Qatar Computing Research Institute (QCRI)
Doha, Qatar

Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises
  
USEFUL INFORMATION ON TWITTER

[Figure: % of informative tweets, by category and subtype. Percentages are listed per group in the order they appear in the chart.]

Caution and advice (42%, 50%, 30%, 12%, 18%)
•  A siren heard
•  Tornado warning issued/lifted
•  Tornado sighting/touchdown

Information source (44%, 20%, 16%)
•  Photos as info. source
•  Webpages info. source
•  Videos as info. source

Donations (38%, 8%, 54%)
•  Other donations
•  Money
•  Equipment, shelter, volunteers, blood

Casualties & damage (44%, 44%, 2%, 16%, 10%)
•  People injured
•  People dead
•  Damage
  
Ref: “Extracting Information Nuggets from Disaster-Related Messages in Social Media”. Imran et al. ISCRAM-2013, Baden-Baden, Germany.
  
SOCIAL MEDIA INFORMATION PROCESSING: OFFLINE APPROACH

1. Data collection
2. Human annotations on sample data
3. Machine training
4. Classification
  
Disaster Timeline:
DATA COLLECTION

[Figure: impact and response timeline, with panels "Disaster response (today)" and "Disaster response (target)". Source: Department of Community Safety, Queensland Govt. 2011 & UNOCHA]

Target disaster response requires real-time processing.
  
REAL-TIME SOCIAL MEDIA ANALYSIS

Key requirements:
•  Real-time data collection
•  Capable of incorporating new data collection strategies
•  Obtain human labels in real time
•  Perform de-duplication
•  Perform almost-online machine learning
•  Continuous learning
•  Learn as new labels arrive
•  Perform real-time classification
•  Scale with big disasters (Sandy: 15k posts/min)
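The "continuous learning" and "learn as new labels arrive" requirements can be illustrated with a minimal sketch. The slides do not say which learner AIDR uses; the perceptron below is an assumption chosen only because it updates from one labeled example at a time, and the class name, features, and example tweets are hypothetical.

```python
from collections import defaultdict


class OnlinePerceptron:
    """Binary bag-of-words classifier updated one labeled example at a time."""

    def __init__(self):
        self.weights = defaultdict(float)
        self.bias = 0.0

    @staticmethod
    def features(text):
        # Lowercased whitespace tokens; a real system would use richer features.
        return text.lower().split()

    def score(self, text):
        return self.bias + sum(self.weights[tok] for tok in self.features(text))

    def predict(self, text):
        return 1 if self.score(text) > 0 else 0

    def learn(self, text, label):
        """Update the model only when the current prediction is wrong."""
        error = label - self.predict(text)  # -1, 0, or +1
        if error:
            for tok in self.features(text):
                self.weights[tok] += error
            self.bias += error


# Hypothetical stream of human labels (1 = informative, 0 = not informative).
model = OnlinePerceptron()
stream = [
    ("tornado warning issued for joplin", 1),
    ("watching a movie tonight", 0),
    ("tornado touchdown near the hospital", 1),
    ("great movie and popcorn", 0),
]
for text, label in stream:
    model.learn(text, label)  # the model is usable after every single label

print(model.predict("tornado warning lifted"))
```

Because each label triggers at most one cheap update, classification never has to pause for a full retraining pass, which is the property the requirements list asks for.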
  
SOCIAL MEDIA INFORMATION PROCESSING: ONLINE APPROACH (REAL-TIME)

1. Data collection
2. Human annotations
3. Machine training
4. Classification

ONLINE APPROACH

[Diagram labels: DATA COLLECTION; CLASSIFICATION; Human annotation 1, 2, 3, …, n; Learning-1, Learning-2, Learning-3, …, Learning-n; First few hours]
  
http://aidr.qcri.org/

AIDR (Artificial Intelligence for Disaster Response) is a free, open-source, and easy-to-use platform to automatically filter and classify relevant tweets posted during humanitarian crises.

1. Collect
2. Curate
3. Classify
  
AIDR: FROM END-USERS' PERSPECTIVE

2 step approach

1. Collection
A collection is a set of filters
•  Keywords, hashtags
•  Geographical bounding box
•  Languages
•  Follow specific set of users

2. Classifier(s)
A classifier is a set of tags
•  Donations requests & offers
•  Damage & casualties
•  Eyewitness accounts
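The two definitions above ("a collection is a set of filters", "a classifier is a set of tags") map naturally onto two small data structures. The sketch below is not AIDR's API: the class names, field names, tweet dictionary layout, and the rule for combining filters (language restricts, the other filters are OR-ed) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple


@dataclass
class Collection:
    """A set of filters that decides which tweets are collected."""
    keywords: Set[str] = field(default_factory=set)
    # (min_lon, min_lat, max_lon, max_lat)
    bounding_box: Optional[Tuple[float, float, float, float]] = None
    languages: Set[str] = field(default_factory=set)
    follow_users: Set[str] = field(default_factory=set)

    def matches(self, tweet: dict) -> bool:
        # Assumption: the language filter restricts; the other filters are OR-ed.
        if self.languages and tweet.get("lang") not in self.languages:
            return False
        text = tweet.get("text", "").lower()
        if any(k.lower() in text for k in self.keywords):
            return True
        if tweet.get("user") in self.follow_users:
            return True
        coords = tweet.get("coords")  # (lon, lat) or None
        if self.bounding_box and coords:
            min_lon, min_lat, max_lon, max_lat = self.bounding_box
            lon, lat = coords
            return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
        return False


@dataclass
class Classifier:
    """A named set of tags that human annotators and the model can assign."""
    name: str
    tags: Set[str]


joplin = Collection(keywords={"#joplin", "tornado"}, languages={"en"})
crisis_tags = Classifier(
    "crisis information",
    {"donations requests & offers", "damage & casualties", "eyewitness accounts"},
)

tweet = {"text": "Shelter needed after the #Joplin tornado", "lang": "en", "user": "alice"}
print(joplin.matches(tweet))
```

Keeping the collection separate from the classifier mirrors the two-step approach: one collection can feed several classifiers, each with its own tag set.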
  
REAL-TIME CLASSIFICATION IN AIDR

[Diagram labels: Collection; Classifier(s); Tag (repeated); Learner; Classifier-1; Classifier-2; Labeling task; Model; 30k/min]
  
HUMAN ANNOTATION: CHALLENGES

•  Crisis-specific labels are necessary
•  Contrasting vocabulary
•  Differences in public concerns, affected infrastructure
•  New labels should be collected for each new crisis

Crowdsourcing is a big research topic. We address two challenges here:

1- Labeling task selection
•  Which tasks to pick?
•  No duplicate tasks should be labeled
•  Prioritize tasks that are likely to increase accuracy

2- Labeling task scheduling
•  All-at-once labeling
•  Gradual labeling
•  Independent labeling

[ Imran et al. 2013b ]
  
DATASETS

1.  Joplin-2011
•  Consists of 206,764 tweets collected using (#joplin)
2.  Sandy-2012
•  Consists of 4,906,521 tweets collected using (#sandy, hurricane sandy, …)
3.  Oklahoma-2013
•  Consists of 2,742,588 tweets collected using (Oklahoma, tornado, …)
  	
  
DISASTER PHASES & # OF TWEETS

Pre: corresponds to the preparedness phase
Impact: corresponds to the period in which the main effects are felt
Post: corresponds to the response and recovery phase

[Figure: number of tweets per day in all datasets. Joplin (left), Sandy (center), and Oklahoma (right).]
  
LABELING TASK SELECTION

Experiment: Are crisis-specific labels necessary?

Manual labeling (using Crowdflower)

Dataset     Phase-S1   Phase-S2   Phase-S3   Phase-S4
Joplin      2,000      1,000      1,000      1,000
Sandy       2,000      1,000      1,000      1,000
Oklahoma    2,000      1,000      1,000      N/A

Classification accuracy in various transfer scenarios

Train     Test        AUC
Joplin    Sandy       0.52
Joplin    Oklahoma    0.56
Sandy     Oklahoma    0.53

* AUC 0.5 represents a random classifier
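To see why an AUC near 0.5 means "no better than random", recall that AUC is the probability that a randomly chosen positive example is scored above a randomly chosen negative one. The sketch below computes it directly from that definition; the function name and the toy scores are illustrative and not taken from the experiments.

```python
def auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


labels = [1, 1, 0, 0]
print(auc(labels, [0.9, 0.8, 0.3, 0.1]))  # perfect ranking -> 1.0
print(auc(labels, [0.5, 0.5, 0.5, 0.5]))  # constant scores, no ranking information -> 0.5
```

Read this way, the transfer results of 0.52 to 0.56 say that a model trained on one crisis ranks tweets from another crisis only marginally better than chance.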
  	
  
LABELING TASK SELECTION

Experiment: Is de-duplication necessary?

Train phase    Train size   Test phase     Test size   AUC (without de-duplication)   AUC (with de-duplication)
S1 (pre)       1,500        S1 (pre)       500         0.78                           0.74
S1 (pre)       500          S1 (pre)       500         0.73                           0.72
S2 (impact)    500          S2 (impact)    500         0.80                           0.72
S3 (post)      500          S3 (post)      500         0.79                           0.73
S4 (post’)     500          S4 (post’)     500         0.70                           0.64

•  29-74% of tweets are re-tweets & 60-75% are near duplicates
•  Duplication causes an artificial increase in accuracy
•  De-duplication is necessary to reduce classifier bias; otherwise the classifier learns fewer concepts
•  De-duplication is necessary to improve the workers' experience

[ Rogstadius et al. 2011 ]
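One simple way to drop re-tweets and near duplicates before sending tasks to annotators is to compare token sets. The slides do not describe the de-duplication method actually used, so the following is a sketch under assumptions: Jaccard similarity on normalized tokens, a hypothetical threshold of 0.8, and made-up example tweets.

```python
import re


def normalize(text):
    """Lowercase, drop URLs, @mentions, and the retweet marker, then tokenize."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    text = re.sub(r"\brt\b", " ", text)
    return set(re.findall(r"[#\w]+", text))


def jaccard(a, b):
    """Size of the intersection over size of the union of two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(tweets, threshold=0.8):
    """Keep a tweet only if it is not too similar to any tweet kept so far."""
    kept, kept_tokens = [], []
    for tweet in tweets:
        tokens = normalize(tweet)
        if all(jaccard(tokens, seen) < threshold for seen in kept_tokens):
            kept.append(tweet)
            kept_tokens.append(tokens)
    return kept


tweets = [
    "Tornado warning issued for #Joplin",
    "RT @bob: Tornado warning issued for #Joplin http://example.com/x",
    "Donate blood at the Red Cross shelter",
]
print(deduplicate(tweets))
```

This pairwise comparison is quadratic in the number of kept tweets, so at the volumes quoted earlier a real-time system would need an approximate scheme such as hashing or clustering instead.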
  
LABELING TASK SELECTION

Experiment: Which approach, passive or active learning?

[Figure: panels JOPLIN, SANDY, OKLAHOMA; phases S1, S2, S3, S4]
  
LABELING TASK SELECTION

•  Are crisis-specific labels necessary? [YES]
•  Is de-duplication necessary? [YES]
•  Which approach to follow, passive or active learning? [Active learning]

Now we know WHICH tasks to select.
But we still don’t know WHEN to label them.
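Active learning means the system, not chance, chooses which tweets humans label next. The slides do not say which selection criterion was used; uncertainty sampling, shown below, is one common choice and is used here purely as an assumption. The function name and the stand-in model scores are hypothetical.

```python
def select_for_labeling(candidates, predict_proba, budget):
    """Pick the `budget` items whose predicted probability is closest to 0.5."""
    ranked = sorted(candidates, key=lambda item: abs(predict_proba(item) - 0.5))
    return ranked[:budget]


# Hypothetical model scores standing in for a trained classifier's output.
scores = {
    "tornado touchdown reported downtown": 0.97,
    "is this the siren test or real?": 0.52,
    "lunch was great today": 0.03,
    "roads near the school look flooded": 0.61,
}
picked = select_for_labeling(list(scores), scores.get, budget=2)
print(picked)
```

Passive learning would instead draw the two tasks at random, spending annotator effort on tweets the model already classifies confidently.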
  
LABELING TASK SCHEDULING

•  All-at-once labeling
   •  Obtain 1,500 labels on S1 and use all for training
•  Cumulative labeling
   •  Obtain 500 labels in each of S1, S2, and S3 and train on labels available up to each phase
•  Independent labeling
   •  Obtain 500 labels in each of S1, S2, and S3 and use the most recent labels for training, discarding old.
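The three scheduling strategies differ only in which labels are available to the model at each phase. The sketch below makes that explicit; the function name, the dictionary layout, and the placeholder labels are assumptions for illustration, while the label counts (1,500 on S1, or 500 per phase) come from the slide.

```python
def training_labels(strategy, phase, labels_by_phase):
    """Return the labels a model for `phase` would be trained on.

    `labels_by_phase` maps phase names, in chronological order, to lists of labels.
    """
    phases = list(labels_by_phase)
    up_to_now = phases[: phases.index(phase) + 1]
    if strategy == "all-at-once":
        # Everything was labeled in the first phase and is reused unchanged.
        return list(labels_by_phase[phases[0]])
    if strategy == "cumulative":
        # Train on every label obtained up to and including this phase.
        return [label for p in up_to_now for label in labels_by_phase[p]]
    if strategy == "independent":
        # Train only on the most recent labels, discarding older ones.
        return list(labels_by_phase[phase])
    raise ValueError(f"unknown strategy: {strategy}")


# Placeholder labels: (phase, index) pairs instead of real annotated tweets.
all_at_once = {"S1": [("S1", i) for i in range(1500)], "S2": [], "S3": []}
gradual = {p: [(p, i) for i in range(500)] for p in ("S1", "S2", "S3")}

for phase in ("S1", "S2", "S3"):
    print(
        phase,
        len(training_labels("all-at-once", phase, all_at_once)),
        len(training_labels("cumulative", phase, gradual)),
        len(training_labels("independent", phase, gradual)),
    )
```

All three strategies spend the same total budget of 1,500 labels by S3; what changes is how early the labels arrive and how much old data the model keeps.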
  
	
  
LABELING TASK SCHEDULING

Experiment: Which labeling strategy to follow?

[Figure: panels JOPLIN, SANDY, OKLAHOMA; columns Informative, Informative (50%), Donations]
  
CONCLUSION & FUTURE WORK

•  Task selection
   •  De-duplication is necessary
   •  Active learning approach must be employed
•  Task scheduling
   •  All-at-once for small-scale crises
   •  Incremental for medium-scale crises (needs tests)

Future work:
•  Adaptive collection
•  Post-processing/filtering
•  More features and learning schemes

Thank you!
  

More Related Content

Similar to Coordinating Human and Machine Intelligence to Classify Microblog Communica0ons in Crises

Collecting and Coding Twitter Data in DiscoverText
Collecting and Coding Twitter Data in DiscoverTextCollecting and Coding Twitter Data in DiscoverText
Collecting and Coding Twitter Data in DiscoverTextJill Hopke
 
Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural ...
Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural ...Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural ...
Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural ...Denis Parra Santander
 
Processing and understanding geo-social media content
Processing and understanding geo-social media contentProcessing and understanding geo-social media content
Processing and understanding geo-social media contentfoostermann
 
ETech2008 DisasterTech Robbins Maron 20080305a
ETech2008 DisasterTech Robbins Maron 20080305aETech2008 DisasterTech Robbins Maron 20080305a
ETech2008 DisasterTech Robbins Maron 20080305aJesse Robbins
 
Crisis Mapping, Citizen Sensing and Social Media Analytics: Leveraging Citize...
Crisis Mapping, Citizen Sensing and Social Media Analytics: Leveraging Citize...Crisis Mapping, Citizen Sensing and Social Media Analytics: Leveraging Citize...
Crisis Mapping, Citizen Sensing and Social Media Analytics: Leveraging Citize...Artificial Intelligence Institute at UofSC
 
MediaEval 2018: Multimedia Satellite Task: Emergency Response for Flooding Ev...
MediaEval 2018: Multimedia Satellite Task: Emergency Response for Flooding Ev...MediaEval 2018: Multimedia Satellite Task: Emergency Response for Flooding Ev...
MediaEval 2018: Multimedia Satellite Task: Emergency Response for Flooding Ev...multimediaeval
 
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...Alex Pinto
 
Working at the Speed of Night - Vicky Hoyt, Flatiron West, Inc.
Working at the Speed of Night - Vicky Hoyt, Flatiron West, Inc.Working at the Speed of Night - Vicky Hoyt, Flatiron West, Inc.
Working at the Speed of Night - Vicky Hoyt, Flatiron West, Inc.AGC of California
 
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...Paolo Missier
 
Benevolent machine learning sgs
Benevolent machine learning sgsBenevolent machine learning sgs
Benevolent machine learning sgsScott Turner
 
What Can I Do Now? (web 2.0 pedagogy) v4
What Can I Do Now? (web 2.0 pedagogy) v4What Can I Do Now? (web 2.0 pedagogy) v4
What Can I Do Now? (web 2.0 pedagogy) v4Darren Kuropatwa
 
A Real-time System for Detecting Landslide Reports on Social Media using Arti...
A Real-time System for Detecting Landslide Reports on Social Media using Arti...A Real-time System for Detecting Landslide Reports on Social Media using Arti...
A Real-time System for Detecting Landslide Reports on Social Media using Arti...ferda ofli
 
IEEE p1589 'ARLEM' virtual meeting, September 9, 2015
IEEE p1589 'ARLEM' virtual meeting, September 9, 2015IEEE p1589 'ARLEM' virtual meeting, September 9, 2015
IEEE p1589 'ARLEM' virtual meeting, September 9, 2015fridolin.wild
 
Interpretability and Reproducibility in Production Machine Learning Applicat...
 Interpretability and Reproducibility in Production Machine Learning Applicat... Interpretability and Reproducibility in Production Machine Learning Applicat...
Interpretability and Reproducibility in Production Machine Learning Applicat...Swaminathan Sundararaman
 
Conquering Tough Challenges for More Effective Emergency Notification
Conquering Tough Challenges for More Effective Emergency NotificationConquering Tough Challenges for More Effective Emergency Notification
Conquering Tough Challenges for More Effective Emergency NotificationEverbridge, Inc.
 
Social-aware Opportunistic Routing Protocol based on User's Interactions and ...
Social-aware Opportunistic Routing Protocol based on User's Interactions and ...Social-aware Opportunistic Routing Protocol based on User's Interactions and ...
Social-aware Opportunistic Routing Protocol based on User's Interactions and ...Waldir Moreira
 

Similar to Coordinating Human and Machine Intelligence to Classify Microblog Communica0ons in Crises (20)

Collecting and Coding Twitter Data in DiscoverText
Collecting and Coding Twitter Data in DiscoverTextCollecting and Coding Twitter Data in DiscoverText
Collecting and Coding Twitter Data in DiscoverText
 
Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural ...
Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural ...Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural ...
Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural ...
 
Processing and understanding geo-social media content
Processing and understanding geo-social media contentProcessing and understanding geo-social media content
Processing and understanding geo-social media content
 
ETech2008 DisasterTech Robbins Maron 20080305a
ETech2008 DisasterTech Robbins Maron 20080305aETech2008 DisasterTech Robbins Maron 20080305a
ETech2008 DisasterTech Robbins Maron 20080305a
 
Crisis Mapping, Citizen Sensing and Social Media Analytics: Leveraging Citize...
Crisis Mapping, Citizen Sensing and Social Media Analytics: Leveraging Citize...Crisis Mapping, Citizen Sensing and Social Media Analytics: Leveraging Citize...
Crisis Mapping, Citizen Sensing and Social Media Analytics: Leveraging Citize...
 
Scc codesign-cmc
Scc codesign-cmcScc codesign-cmc
Scc codesign-cmc
 
Caravan Studios Update for TechSoup Global Staff
Caravan Studios Update for TechSoup Global StaffCaravan Studios Update for TechSoup Global Staff
Caravan Studios Update for TechSoup Global Staff
 
MediaEval 2018: Multimedia Satellite Task: Emergency Response for Flooding Ev...
MediaEval 2018: Multimedia Satellite Task: Emergency Response for Flooding Ev...MediaEval 2018: Multimedia Satellite Task: Emergency Response for Flooding Ev...
MediaEval 2018: Multimedia Satellite Task: Emergency Response for Flooding Ev...
 
Final Report: 2006 StrongAngel III - integrated disaster response demonstrati...
Final Report: 2006 StrongAngel III - integrated disaster response demonstrati...Final Report: 2006 StrongAngel III - integrated disaster response demonstrati...
Final Report: 2006 StrongAngel III - integrated disaster response demonstrati...
 
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
 
Working at the Speed of Night - Vicky Hoyt, Flatiron West, Inc.
Working at the Speed of Night - Vicky Hoyt, Flatiron West, Inc.Working at the Speed of Night - Vicky Hoyt, Flatiron West, Inc.
Working at the Speed of Night - Vicky Hoyt, Flatiron West, Inc.
 
Decision Support System for people evacuation: mobility demand and transporta...
Decision Support System for people evacuation: mobility demand and transporta...Decision Support System for people evacuation: mobility demand and transporta...
Decision Support System for people evacuation: mobility demand and transporta...
 
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...
 
Benevolent machine learning sgs
Benevolent machine learning sgsBenevolent machine learning sgs
Benevolent machine learning sgs
 
What Can I Do Now? (web 2.0 pedagogy) v4
What Can I Do Now? (web 2.0 pedagogy) v4What Can I Do Now? (web 2.0 pedagogy) v4
What Can I Do Now? (web 2.0 pedagogy) v4
 
A Real-time System for Detecting Landslide Reports on Social Media using Arti...
A Real-time System for Detecting Landslide Reports on Social Media using Arti...A Real-time System for Detecting Landslide Reports on Social Media using Arti...
A Real-time System for Detecting Landslide Reports on Social Media using Arti...
 
IEEE p1589 'ARLEM' virtual meeting, September 9, 2015
IEEE p1589 'ARLEM' virtual meeting, September 9, 2015IEEE p1589 'ARLEM' virtual meeting, September 9, 2015
IEEE p1589 'ARLEM' virtual meeting, September 9, 2015
 
Interpretability and Reproducibility in Production Machine Learning Applicat...
 Interpretability and Reproducibility in Production Machine Learning Applicat... Interpretability and Reproducibility in Production Machine Learning Applicat...
Interpretability and Reproducibility in Production Machine Learning Applicat...
 
Conquering Tough Challenges for More Effective Emergency Notification
Conquering Tough Challenges for More Effective Emergency NotificationConquering Tough Challenges for More Effective Emergency Notification
Conquering Tough Challenges for More Effective Emergency Notification
 
Social-aware Opportunistic Routing Protocol based on User's Interactions and ...
Social-aware Opportunistic Routing Protocol based on User's Interactions and ...Social-aware Opportunistic Routing Protocol based on User's Interactions and ...
Social-aware Opportunistic Routing Protocol based on User's Interactions and ...
 

More from Muhammad Imran

Processing Social Media Messages in Mass Emergency: A Survey
Processing Social Media Messages in Mass Emergency: A SurveyProcessing Social Media Messages in Mass Emergency: A Survey
Processing Social Media Messages in Mass Emergency: A SurveyMuhammad Imran
 
Damage Assessment from Social Media Imagery Data During Disasters
Damage Assessment from Social Media Imagery Data During DisastersDamage Assessment from Social Media Imagery Data During Disasters
Damage Assessment from Social Media Imagery Data During DisastersMuhammad Imran
 
Image4Act: Online Social Media Image Processing for Disaster Response
Image4Act: Online Social Media Image Processing for Disaster ResponseImage4Act: Online Social Media Image Processing for Disaster Response
Image4Act: Online Social Media Image Processing for Disaster ResponseMuhammad Imran
 
Real-Time Processing of Social Media Content for Social Good
Real-Time Processing of Social Media Content for Social GoodReal-Time Processing of Social Media Content for Social Good
Real-Time Processing of Social Media Content for Social GoodMuhammad Imran
 
AIDR Tutorial (Artificial Intelligence for Disaster Response)
AIDR Tutorial (Artificial Intelligence for Disaster Response)AIDR Tutorial (Artificial Intelligence for Disaster Response)
AIDR Tutorial (Artificial Intelligence for Disaster Response)Muhammad Imran
 
A Robust Framework for Classifying Evolving Document Streams in an Expert-Mac...
A Robust Framework for Classifying Evolving Document Streams in an Expert-Mac...A Robust Framework for Classifying Evolving Document Streams in an Expert-Mac...
A Robust Framework for Classifying Evolving Document Streams in an Expert-Mac...Muhammad Imran
 
Summarizing Situational Tweets in Crisis Scenario
Summarizing Situational Tweets in Crisis ScenarioSummarizing Situational Tweets in Crisis Scenario
Summarizing Situational Tweets in Crisis ScenarioMuhammad Imran
 
Artificial Intelligence for Disaster Response
Artificial Intelligence for Disaster ResponseArtificial Intelligence for Disaster Response
Artificial Intelligence for Disaster ResponseMuhammad Imran
 
A Real-time Heuristic-based Unsupervised Method for Name Disambiguation in Di...
A Real-time Heuristic-based Unsupervised Method for Name Disambiguation in Di...A Real-time Heuristic-based Unsupervised Method for Name Disambiguation in Di...
A Real-time Heuristic-based Unsupervised Method for Name Disambiguation in Di...Muhammad Imran
 
Domain Specific Mashups
Domain Specific MashupsDomain Specific Mashups
Domain Specific MashupsMuhammad Imran
 
Reseval Mashup Platform Talk at SECO
Reseval Mashup Platform Talk at SECOReseval Mashup Platform Talk at SECO
Reseval Mashup Platform Talk at SECOMuhammad Imran
 
ResEval: Resource-oriented Research Impact Evaluation platform
ResEval: Resource-oriented Research Impact Evaluation platformResEval: Resource-oriented Research Impact Evaluation platform
ResEval: Resource-oriented Research Impact Evaluation platformMuhammad Imran
 

More from Muhammad Imran (12)

Processing Social Media Messages in Mass Emergency: A Survey
Processing Social Media Messages in Mass Emergency: A SurveyProcessing Social Media Messages in Mass Emergency: A Survey
Processing Social Media Messages in Mass Emergency: A Survey
 
Damage Assessment from Social Media Imagery Data During Disasters
Damage Assessment from Social Media Imagery Data During DisastersDamage Assessment from Social Media Imagery Data During Disasters
Damage Assessment from Social Media Imagery Data During Disasters
 
Image4Act: Online Social Media Image Processing for Disaster Response
Image4Act: Online Social Media Image Processing for Disaster ResponseImage4Act: Online Social Media Image Processing for Disaster Response
Image4Act: Online Social Media Image Processing for Disaster Response
 
Real-Time Processing of Social Media Content for Social Good
Real-Time Processing of Social Media Content for Social GoodReal-Time Processing of Social Media Content for Social Good
Real-Time Processing of Social Media Content for Social Good
 
AIDR Tutorial (Artificial Intelligence for Disaster Response)
AIDR Tutorial (Artificial Intelligence for Disaster Response)AIDR Tutorial (Artificial Intelligence for Disaster Response)
AIDR Tutorial (Artificial Intelligence for Disaster Response)
 
A Robust Framework for Classifying Evolving Document Streams in an Expert-Mac...
A Robust Framework for Classifying Evolving Document Streams in an Expert-Mac...A Robust Framework for Classifying Evolving Document Streams in an Expert-Mac...
A Robust Framework for Classifying Evolving Document Streams in an Expert-Mac...
 
Summarizing Situational Tweets in Crisis Scenario
Summarizing Situational Tweets in Crisis ScenarioSummarizing Situational Tweets in Crisis Scenario
Summarizing Situational Tweets in Crisis Scenario
 
Artificial Intelligence for Disaster Response
Artificial Intelligence for Disaster ResponseArtificial Intelligence for Disaster Response
Artificial Intelligence for Disaster Response
 
A Real-time Heuristic-based Unsupervised Method for Name Disambiguation in Di...
A Real-time Heuristic-based Unsupervised Method for Name Disambiguation in Di...A Real-time Heuristic-based Unsupervised Method for Name Disambiguation in Di...
A Real-time Heuristic-based Unsupervised Method for Name Disambiguation in Di...
 
Domain Specific Mashups
Domain Specific MashupsDomain Specific Mashups
Domain Specific Mashups
 
Reseval Mashup Platform Talk at SECO
Reseval Mashup Platform Talk at SECOReseval Mashup Platform Talk at SECO
Reseval Mashup Platform Talk at SECO
 
ResEval: Resource-oriented Research Impact Evaluation platform
ResEval: Resource-oriented Research Impact Evaluation platformResEval: Resource-oriented Research Impact Evaluation platform
ResEval: Resource-oriented Research Impact Evaluation platform
 

Recently uploaded

Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 

Recently uploaded (20)

Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...

Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

  • 1. Muhammad Imran, Carlos Castillo, Ji Lucas, Patrick Meier, Jakob Rogstadius. Qatar Computing Research Institute (QCRI), Doha, Qatar. Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises
  • 2. USEFUL INFORMATION ON TWITTER. Categories: Caution and advice; Information source; Donations; Casualties & damage. Caution and advice: A siren heard; Tornado warning issued/lifted; Tornado sighting/touchdown. Information source: Photos as info. source; Webpages as info. source; Videos as info. source. Donations: Other donations; Money; Equipment, shelter, volunteers, blood. Casualties & damage: People injured; People dead; Damage. Chart values (% of informative tweets), in extraction order: 42%, 50%, 30%, 12%, 18%; 44%, 20%, 16%; 38%, 8%, 54%; 44%, 44%, 2%, 16%, 10%. Ref: “Extracting Information Nuggets from Disaster-Related Messages in Social Media”. Imran et al. ISCRAM-2013, Baden-Baden, Germany.
  • 3. SOCIAL MEDIA INFORMATION PROCESSING: OFFLINE APPROACH. Steps: (1) Data collection; (2) Human annotations on sample data; (3) Machine training; (4) Classification. Disaster timeline: DATA COLLECTION
  • 4. IMPACT AND RESPONSE TIMELINE. Disaster response (today); Disaster response (target). Target disaster response requires real-time processing. Source: Department of Community Safety, Queensland Govt. 2011 & UNOCHA
  • 5. REAL-TIME SOCIAL MEDIA ANALYSIS. Key requirements: • Real-time data collection • Able to incorporate new data collection strategies • Obtain human labels in real time • Perform de-duplication • Perform almost-online machine learning • Continuous learning • Learn as new labels arrive • Perform real-time classification • Scale with big disasters (Sandy: 15k posts/min)
  • 6. SOCIAL MEDIA INFORMATION PROCESSING: ONLINE APPROACH (REAL-TIME). Steps: (1) Data collection; (2) Human annotations; (3) Machine training; (4) Classification. Timeline labels (first few hours): DATA COLLECTION; Human annotation 1, 2, 3, …, n; Learning 1, 2, 3, …, n; CLASSIFICATION
  • 7. http://aidr.qcri.org/ AIDR (Artificial Intelligence for Disaster Response) is a free, open-source, and easy-to-use platform to automatically filter and classify relevant tweets posted during humanitarian crises. (1) Collect; (2) Curate; (3) Classify
  • 8. AIDR: FROM THE END USER'S PERSPECTIVE. http://aidr.qcri.org/ Two-step approach. (1) Collection: a collection is a set of filters • Keywords, hashtags • Geographical bounding box • Languages • Follow a specific set of users. (2) Classifier(s): a classifier is a set of tags • Donation requests & offers • Damage & casualties • Eyewitness accounts
  • 9. REAL-TIME CLASSIFICATION IN AIDR. http://aidr.qcri.org/ Diagram labels: Collection; Classifier(s): Classifier-1 and Classifier-2, each with a set of tags; 30k/min; Labeling task; Learner; Model
  • 10. HUMAN ANNOTATION: CHALLENGES. http://aidr.qcri.org/ • Crisis-specific labels are necessary • Contrasting vocabulary • Differences in public concerns, affected infrastructure • New labels should be collected for each new crisis. Crowdsourcing is a big research topic. We address two challenges here: 1- Labeling task selection • Which tasks to pick? • No duplicate tasks should be labeled • Prioritize tasks that are likely to increase accuracy. 2- Labeling task scheduling • All-at-once labeling • Gradual labeling • Independent labeling. [Imran et al. 2013b]
  • 11. DATASETS. http://aidr.qcri.org/ 1. Joplin-2011 • Consists of 206,764 tweets collected using (#joplin). 2. Sandy-2012 • Consists of 4,906,521 tweets collected using (#sandy, hurricane sandy, …). 3. Oklahoma-2013 • Consists of 2,742,588 tweets collected using (Oklahoma, tornado, …)
  • 12. DISASTER PHASES & # OF TWEETS. http://aidr.qcri.org/ Pre: the preparedness phase. Impact: the period in which the main effects are felt. Post: the response and recovery phase. Figure: number of tweets per day in all datasets: Joplin (left), Sandy (center), and Oklahoma (right).
  • 13. LABELING TASK SELECTION. http://aidr.qcri.org/ Experiment: Are crisis-specific labels necessary? Manual labeling (using CrowdFlower), labels per dataset and phase (S1, S2, S3, S4): Joplin: 2,000, 1,000, 1,000, 1,000; Sandy: 2,000, 1,000, 1,000, 1,000; Oklahoma: 2,000, 1,000, 1,000, N/A. Classification accuracy in various transfer scenarios (train → test: AUC): Joplin → Sandy: 0.52; Joplin → Oklahoma: 0.56; Sandy → Oklahoma: 0.53. * AUC 0.5 represents a random classifier
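The transfer results above are reported as AUC, with 0.5 marking a random classifier. As a reminder of what that number measures, here is a minimal sketch that computes AUC as the fraction of (positive, negative) pairs the classifier ranks correctly. The function name and the toy labels and scores are illustrative assumptions, not data from the slides.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    Returns the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative example
    (ties count as half a win).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores: one positive is ranked below one negative.
print(auc([1, 1, 0, 0], [0.9, 0.2, 0.8, 0.1]))
```

A classifier that scores every tweet identically gets 0.5 under this definition, which is why cross-crisis values of 0.52 to 0.56 are read as close to chance.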
  • 14. LABELING TASK SELECTION. http://aidr.qcri.org/ Experiment: Is de-duplication necessary? Results (train phase, train size / test phase, test size: AUC without de-duplication, AUC with de-duplication): S1 (pre), 1,500 / S1 (pre), 500: 0.78, 0.74; S1 (pre), 500 / S1 (pre), 500: 0.73, 0.72; S2 (impact), 500 / S2 (impact), 500: 0.80, 0.72; S3 (post), 500 / S3 (post), 500: 0.79, 0.73; S4 (post’), 500 / S4 (post’), 500: 0.70, 0.64. • 29-74% of tweets are re-tweets & 60-75% are near duplicates • Duplication causes an artificial increase in accuracy • De-duplication is necessary to reduce classifier bias; otherwise the classifier learns fewer concepts • De-duplication is necessary to improve the workers' experience. [Rogstadius et al. 2011]
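The slides do not say how near duplicates are detected. One common approach, shown here purely as an assumption, is to normalize each tweet (drop retweet markers, URLs, and mentions) and discard any tweet whose token set has a high Jaccard similarity with one already kept. The function names and the 0.8 threshold are hypothetical.

```python
import re


def normalize(tweet):
    """Lowercase, strip URLs, @mentions and 'RT' markers, return the token set."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove links first
    text = re.sub(r"@\w+", " ", text)          # remove user mentions
    text = re.sub(r"\brt\b", " ", text)        # remove retweet marker
    return frozenset(re.findall(r"[a-z0-9#]+", text))


def jaccard(a, b):
    """Jaccard similarity of two token sets (two empty sets count as identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(tweets, threshold=0.8):
    """Keep the first tweet of every group of near duplicates."""
    kept, kept_tokens = [], []
    for tweet in tweets:
        tokens = normalize(tweet)
        if any(jaccard(tokens, seen) >= threshold for seen in kept_tokens):
            continue  # near duplicate of something already kept
        kept.append(tweet)
        kept_tokens.append(tokens)
    return kept


sample = [
    "Tornado warning issued for Joplin #joplin",
    "RT @user: Tornado warning issued for Joplin #joplin http://t.co/abc",
    "Red Cross needs blood donors in Joplin",
]
print(deduplicate(sample))
```

This pairwise comparison is quadratic in the number of kept tweets, so a stream at the rates quoted earlier would need an indexed or hashed variant; the sketch only illustrates the filtering idea.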
  • 15. LABELING TASK SELECTION. http://aidr.qcri.org/ Experiment: Which approach, passive or active learning? Figure panels: JOPLIN, SANDY, OKLAHOMA; phases S1, S2, S3, S4
  • 16. LABELING TASK SELECTION. http://aidr.qcri.org/ • Are crisis-specific labels necessary? [YES] • Is de-duplication necessary? [YES] • Which approach to follow, passive or active learning? [Active learning] Now we know WHICH tasks to select. But we still don’t know WHEN to label them.
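The slides conclude in favor of active learning but do not name the selection criterion. A minimal sketch, assuming uncertainty sampling (one standard active-learning strategy), picks the unlabeled tweets whose predicted probability is closest to 0.5 and sends those to human annotators. The function name, the `predict_proba` callable, and the toy scores are hypothetical.

```python
def select_for_labeling(candidates, predict_proba, batch_size=3):
    """Uncertainty sampling: return the items the model is least sure about.

    candidates    -- unlabeled items
    predict_proba -- callable mapping an item to P(item is positive)
    batch_size    -- number of labeling tasks to create
    """
    ranked = sorted(candidates, key=lambda item: abs(predict_proba(item) - 0.5))
    return ranked[:batch_size]


# Hypothetical classifier confidences for four unlabeled tweets.
scores = {"a": 0.95, "b": 0.52, "c": 0.10, "d": 0.45}
print(select_for_labeling(list(scores), scores.get, batch_size=2))
```

Passive learning, by contrast, would draw the batch at random from `candidates` regardless of the model's current confidence.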
  • 17. LABELING TASK SCHEDULING. http://aidr.qcri.org/ • All-at-once labeling: obtain 1,500 labels on S1 and use all for training • Cumulative labeling: obtain 500 labels in each of S1, S2, and S3 and train on labels available up to each phase • Independent labeling: obtain 500 labels in each of S1, S2, and S3 and use the most recent labels for training, discarding old ones
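The three scheduling strategies differ only in which labels feed the training set at a given phase. The following sketch makes that explicit; the function name, the strategy strings, and the tiny label lists are illustrative assumptions, and for the all-at-once case the first phase's list is assumed to hold the whole labeling budget.

```python
def training_set(strategy, labels_by_phase, current_phase):
    """Return the labels used for training at `current_phase` (0-based).

    labels_by_phase -- list of label lists in chronological order (S1, S2, ...)
    """
    if strategy == "all_at_once":
        # Entire budget was spent on the first phase; later phases add nothing.
        return list(labels_by_phase[0])
    if strategy == "cumulative":
        # Everything labeled up to and including the current phase.
        return [x for phase in labels_by_phase[: current_phase + 1] for x in phase]
    if strategy == "independent":
        # Only the most recent phase's labels; older labels are discarded.
        return list(labels_by_phase[current_phase])
    raise ValueError(f"unknown strategy: {strategy}")


phases = [["s1a", "s1b"], ["s2a"], ["s3a"]]
print(training_set("cumulative", phases, 1))
print(training_set("independent", phases, 2))
```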
  • 18. LABELING TASK SCHEDULING. Experiment: Which labeling strategy to follow? Figure panels: JOPLIN, SANDY, OKLAHOMA; Informative, Informative (50%), Donations
  • 19. CONCLUSION & FUTURE WORK. http://aidr.qcri.org/ • Task selection • De-duplication is necessary • Active learning approach must be employed • Task scheduling • All-at-once for small-scale crises • Incremental for medium-scale crises (needs tests). Future work: • Adaptive collection • Post-processing/filtering • More features and learning schemes
  • 20. http://aidr.qcri.org/ AIDR (Artificial Intelligence for Disaster Response) is a free, open-source, and easy-to-use platform to automatically filter and classify relevant tweets posted during humanitarian crises. Thank you!