This document discusses extracting useful information from social media messages during disasters. It outlines filtering disaster-related tweets, classifying them by type (e.g. caution/advice, casualties), and extracting key information within tweets (e.g. locations, needs). The approach is demonstrated on datasets from the 2011 Joplin tornado and 2012 Hurricane Sandy. Automatic classification achieves over 80% accuracy for some classes. Information extraction obtains up to 90% precision. Ongoing work includes providing these tools as a machine learning service to help during crises.
Extracting Information Nuggets from Disaster-Related Messages in Social Media
1. Extracting Information Nuggets from
Disaster-Related Messages in Social Media
Muhammad Imran, Shady Elbassuoni, Carlos Castillo, Fernando Diaz, Patrick Meier
2. Outline
• Social media response to disaster
• Finding tactical and actionable information
• Disaster ontologies
• Filtering, classification and extraction
• Ongoing work
• Discussion
3. Disaster and Social Media
2.3 million tweets containing the words “Haiti”
or “Red Cross” from Jan 12 to Jan 14, 2010
http://www.sysomos.com
5. Why Social Media?
• Virtual Collaboration, Information Sharing
• Highly valuable information
• Contribute to situational awareness
• Highly useful if analyzed promptly and
effectively
6. Sandy Tweets
@NYGovCuomo orders closing of NYC bridges. Only Staten Island
bridges unaffected at this time. Bridges must close by 7pm. #Sandy
#NYC.
rt @911buff: public help needed: 2 boys 2 & 4 missing nearly 24 hours
after they got separated from their mom when car submerged in si.
#sandy #911buff
freaking out. home alone. will just watch tv #Sandy #NYC.
400 Volunteers are needed for areas that #Sandy destroyed.
8. Sandy Tweets
@NYGovCuomo orders closing of NYC bridges. Only Staten Island
bridges unaffected at this time. Bridges must close by 7pm. #Sandy
#NYC.
rt @911buff: public help needed: 2 boys 2 & 4 missing nearly 24 hours
after they got separated from their mom when car submerged in si.
#sandy #911buff
freaking out. home alone. will just watch tv #Sandy #NYC.
400 Volunteers are needed for areas that #Sandy destroyed.
Personal
Informative
Caution and Advice
Casualties and Damage
Donations
10. Finding Tactical & Actionable Information
Personal
Informative
(Direct & Indirect)
Other
Caution and advice
Casualties and damage
Donations
People missing, found, or seen
Information source
Siren heard, warning issued/lifted etc.
People dead, injured, damage etc.
Money, shelter, blood, goods, or services
Webpages, photos, videos information sources
…
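The category tree on this slide can be written down as a small data structure. The sketch below is only an illustration: the names `ONTOLOGY`, `informative_types`, and `is_valid_label` are hypothetical, the pairing of each description with a category is an assumption based on the order in which they appear on the slide, and the list is not exhaustive (the slide ends with "…").

```python
# Hypothetical encoding of the categories listed on the slide.
# The pairing of descriptions to categories is assumed from slide order.
ONTOLOGY = {
    "Personal": {},
    "Informative": {
        "Caution and advice": "Siren heard, warning issued/lifted etc.",
        "Casualties and damage": "People dead, injured, damage etc.",
        "Donations": "Money, shelter, blood, goods, or services",
        "People missing, found, or seen": None,  # no description on the slide
        "Information source": "Webpages, photos, videos information sources",
    },
    "Other": {},
}


def informative_types():
    """Return the informative sub-categories in slide order."""
    return list(ONTOLOGY["Informative"])


def is_valid_label(top_level, subtype=None):
    """Check that a (top-level, sub-category) pair exists in the tree."""
    if top_level not in ONTOLOGY:
        return False
    if subtype is None:
        return True
    return subtype in ONTOLOGY[top_level]


print(informative_types())
print(is_valid_label("Informative", "Donations"))
```

Keeping the tree in one place lets annotation instructions and classifier label sets be checked against the same definition.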
12. Our Datasets
Joplin Dataset
• 206,764 tweets collected during the tornado
that hit Joplin, Missouri on May 22, 2011
• Collected by researchers at the University of
Colorado at Boulder
• Collected through Twitter API by monitoring the
tweets with hashtags #joplin or #tornado
13. Our Datasets
Sandy Dataset:
• 140,000 tweets collected during Hurricane Sandy,
which hit the northeastern USA on Oct 29, 2012
• Collected through Twitter API by monitoring the
tweets with hashtags #sandy or #nyc
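Hashtag-based collection amounts to keeping only the messages that carry one of the monitored tags. The following is a minimal offline sketch of that matching step, not the collection code used for these datasets: it makes no Twitter API calls, the helper names `hashtags` and `matches_any` are hypothetical, and the third sample message is made up for contrast.

```python
import re

HASHTAG_RE = re.compile(r"#(\w+)")


def hashtags(text):
    """Return the set of lower-cased hashtags found in a message."""
    return {tag.lower() for tag in HASHTAG_RE.findall(text)}


def matches_any(text, tracked):
    """True if the message carries at least one tracked hashtag."""
    wanted = {tag.lower().lstrip("#") for tag in tracked}
    return bool(hashtags(text) & wanted)


tweets = [
    "400 Volunteers are needed for areas that #Sandy destroyed.",
    "freaking out. home alone. will just watch tv #Sandy #NYC.",
    "Lovely weather today",  # made-up message with no tracked hashtag
]
sandy = [t for t in tweets if matches_any(t, ["#sandy", "#nyc"])]
print(len(sandy))
```

Matching is case-insensitive because users write the same tag as #Sandy, #sandy, or #SANDY.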
15. 1. Filtering: Training Data
• 4406 tweets sampled uniformly from the
Joplin dataset
• Annotated using CrowdFlower
[Pie chart of the label distribution. Legend: Personal, Informative, Other. Slice sizes: 32%, 60%, 8%]
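Drawing a uniform sample for annotation and summarizing the resulting labels can be sketched as follows. This is an assumption-laden illustration rather than the procedure actually used: `sample_uniform` and `label_shares` are hypothetical helper names, and the corpus and labels are placeholder data.

```python
import random
from collections import Counter


def sample_uniform(tweets, k, seed=0):
    """Draw k distinct tweets, each equally likely to be picked."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(tweets, k)


def label_shares(labels):
    """Return the fraction of annotated tweets that received each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


corpus = [f"tweet {i}" for i in range(1000)]  # placeholder corpus
subset = sample_uniform(corpus, 50)
shares = label_shares(["Informative", "Personal", "Informative", "Other"])
print(shares)
```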
20. Labels for Extraction: Training Data
• Type-dependent instruction
• Ask evaluators to copy-paste a word/phrase
from each tweet
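A copied word or phrase has to be turned into per-token labels before a sequence model can learn from it. The sketch below shows one way this could be done, under stated assumptions: whitespace splitting stands in for a real Twitter tokenizer, and the function name `label_tokens` and the tag names `DONATION` and `O` are hypothetical.

```python
def tokenize(text):
    """Whitespace tokenizer; a stand-in for a Twitter-aware tokenizer."""
    return text.split()


def label_tokens(tweet, phrase, tag):
    """Tag the tokens covered by the annotator's phrase; others get 'O'."""
    tokens = tokenize(tweet)
    target = [t.lower() for t in tokenize(phrase)]
    labels = ["O"] * len(tokens)
    n = len(target)
    if n == 0:
        return list(zip(tokens, labels))
    # Find the first position where the phrase matches the tweet tokens.
    for start in range(len(tokens) - n + 1):
        window = [t.lower() for t in tokens[start:start + n]]
        if window == target:
            labels[start:start + n] = [tag] * n
            break
    return list(zip(tokens, labels))


pairs = label_tokens(
    "400 Volunteers are needed for areas that #Sandy destroyed.",
    "400 Volunteers",
    "DONATION",
)
print(pairs[:3])
```

If the phrase cannot be found in the tweet (for example because the annotator retyped it with changes), every token keeps the `O` label, so such cases are easy to detect and review.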
21. Tool
• CMU ARK Twitter NLP
– Tokenization
– Feature extraction
– CRF learning
• Very easy to use: simply replace the
part-of-speech tags in the training set with
any other label set, and retrain
22. Extraction Evaluation
Setting                                     Recall   Precision
Train 2/3 Joplin, Test 1/3 Joplin           78%      90%
Train 2/3 Sandy, Test 1/3 Sandy             41%      79%
Train Joplin, Test Sandy                    11%      78%
Train Joplin + 10% Sandy, Test 90% Sandy    21%      81%
• Precision counts an extraction as correct if it has
one or more words in common with what humans
extracted (Imran et al., 2013)
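The lenient matching rule (one or more shared words) can be sketched as below. Only the match criterion comes from the slide. The recall definition here (hits divided by the number of tweets where humans extracted something) is an assumption, the function names are hypothetical, and the example pairs are illustrative rather than real system output.

```python
def words(phrase):
    """Lower-cased set of whitespace-separated words in a phrase."""
    return set(phrase.lower().split())


def overlaps(predicted, gold):
    """True if the two phrases share at least one word."""
    return bool(words(predicted) & words(gold))


def lenient_scores(pairs):
    """pairs: (predicted, gold) per tweet; None means nothing extracted."""
    predicted_n = sum(1 for p, _ in pairs if p)
    gold_n = sum(1 for _, g in pairs if g)
    hits = sum(1 for p, g in pairs if p and g and overlaps(p, g))
    precision = hits / predicted_n if predicted_n else 0.0
    recall = hits / gold_n if gold_n else 0.0
    return precision, recall


pairs = [
    ("NYC bridges", "closing of NYC bridges"),  # shares words: hit
    ("7pm", "Staten Island"),                   # no shared word: miss
    (None, "400 Volunteers"),                   # system extracted nothing
    ("2 boys", "2 boys 2 & 4 missing"),         # shares words: hit
]
precision, recall = lenient_scores(pairs)
print(round(precision, 2), recall)
```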
24. Self-service for crisis-related classification
• Machine learning software can be provided as
a service
– e.g. Google Prediction API
• Can we provide crisis-related tweet
classification as a service?
– Automatic collection of tweets
– Re-usable ontologies / default training sets
– Active learning
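One common form of active learning is uncertainty sampling: ask humans to label the tweets the current classifier is least sure about. The sketch below assumes that strategy (the slide does not say which one is intended) and assumes a binary classifier that outputs a probability for the "Informative" class. All names and scores are hypothetical.

```python
def uncertainty(prob_informative):
    """1.0 when the classifier is undecided (p = 0.5), 0.0 when certain."""
    return 1.0 - abs(prob_informative - 0.5) * 2


def select_for_labeling(scored, budget):
    """Pick the `budget` tweets with the most uncertain predictions."""
    ranked = sorted(scored, key=lambda item: uncertainty(item[1]), reverse=True)
    return [tweet for tweet, _ in ranked[:budget]]


# Placeholder (tweet, probability) pairs from a hypothetical classifier.
scored = [
    ("tweet a", 0.97),
    ("tweet b", 0.52),
    ("tweet c", 0.10),
    ("tweet d", 0.41),
]
to_label = select_for_labeling(scored, budget=2)
print(to_label)
```

The selected tweets would then be sent to annotators, added to the training set, and the classifier retrained before the next round.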
25. Request Labeled / Unlabeled Datasets
Contact us at: mimran@qf.org.qa
26. References
• K. Starbird, L. Palen, A. Hughes, and S. Vieweg (2010) Chatter on the red: what hazards
threat reveals about the social life of microblogged information. In Proceedings of the 2010
ACM conference on Computer supported cooperative work, pages 241–250. ACM.
• M. Latonero and I. Shklovski (2010) “Respectfully Yours in Safety and Service”: Emergency
Management & Social Media Evangelism. In Proceedings of the 7th International ISCRAM
Conference, Seattle, Vol. 1.
• M. Imran, S. Elbassuoni, C. Castillo, F. Diaz, and P. Meier (2013) Practical Extraction of
Disaster-Relevant Information from Social Media. WWW-2013 SWDM, May 2013.
Social media empowers individuals, providing them a platform from which to share opinions, experiences and information from anywhere at any time. Ultimately, the shared information can be highly useful, provided it is analyzed promptly and effectively. That is what I am going to present in this session.
Finding tactical and actionable information among the millions of messages that people post on social media is a complex and challenging task. For this purpose, specifically for disasters, we came up with a sensible ontology that has three main stages. Every stage refines a piece of information so that it can contribute more to disaster management. To get to the actionable information, we first need to assign each incoming message to a predefined, disaster-specific category.
This step identifies the named entities, the caution or advice given, the temporal information, and other details in a message.
The inter-annotator agreement value shows the level of agreement among workers on an assessable unit (in our case, a tweet). High agreement indicates that different workers frequently gave the same response for the same tweet message.
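A simple way to picture agreement on one tweet is the share of workers who chose the majority label. The sketch below uses that unweighted definition as an assumption. The crowdsourcing platform's own score may be computed differently (for example, by weighting workers), and the function name `majority_agreement` is hypothetical.

```python
from collections import Counter


def majority_agreement(responses):
    """Return the majority label and the fraction of workers who chose it."""
    counts = Counter(responses)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(responses)


# Three hypothetical worker responses for a single tweet.
label, agreement = majority_agreement(["Informative", "Informative", "Personal"])
print(label, round(agreement, 2))
```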