Classifying Microblogs for Disasters 
Sarvnaz Karimi Jessie Yin Cecile Paris
Social media plays an important role during 
disasters 
• Real-time, popular, free 
• Accessible 
• Available 
CSIRO: positive impact | Classifying Microblogs 2 | for Disasters | Sarvnaz Karimi
During disasters people share useful information 
• lyttelton tunnel had reopened last night #eqnz 
Or ask for help or information 
• Kindercare in Fendalton, Christchurch - all okay? We are trying to 
get through with no luck. #eqnz 
• Need help. Any donors of medicines for diarrhea cases in Baganga, 
Davao Oriental pls? #reliefPH #PabloPH pls tweet @KarloPuerto 
Or even offer help 
• I hv final yr medstudents in parade rd addington! They cn help. 
Bruce n boys #eqnz 
And sometimes not so useful 
• Someone just wondered aloud if the #eqnz was just another sign 
from God that he doesn't want The Hobbit to get made. #maybe? 
Challenges of Working with Twitter Data 
• In fact, tweets are very often useless babble 
• Tweets are very short (at most 140 characters) 
• People often use informal language 
• Even in serious messages, words are often abbreviated to fit the 
length limit 
I hv final yr medstudents in parade rd addington! They cn help. Bruce n boys #eqnz 
Finding useful content can become looking for a needle 
in a haystack! 
How can we filter the massive volume of Twitter 
messages to identify high-value 
tweets related to natural or man-made 
disasters, or even to specific types of disaster? 
Keyword search to find disaster-related tweets 
• Many false positives, because keywords such as “fire”, or even 
“earthquake”, are ambiguous or have multiple senses 
She’s a natural disaster: a tsunami in her 
eyes an earthquake in her chest a hurricane 
flooding her mind she’s a traveling 
catastrophe 
In a pool of over 5,700 tweets retrieved using keyword 
search, more than 50% were false positives. 
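The keyword-search baseline described above can be sketched as follows. This is a minimal illustration, not the authors' retrieval code: the keyword list, the whole-word matching rule, and the function name `matches_keyword` are assumptions made for the example. It shows how a figurative tweet matches a keyword while a genuine report can be missed.

```python
import re

# Hypothetical keyword list, chosen for illustration only.
KEYWORDS = {"fire", "flooding", "storm", "tornado",
            "hurricane", "cyclone", "earthquake", "accident"}

def matches_keyword(tweet: str) -> bool:
    """Return True if any keyword occurs as a whole word (case-insensitive)."""
    tokens = set(re.findall(r"[a-z]+", tweet.lower()))
    return bool(tokens & KEYWORDS)

# A figurative tweet: retrieved by keyword search, yet not about a disaster.
figurative = "She's a natural disaster: a tsunami in her eyes an earthquake in her chest"
# A genuine report: "bushfire" is one token, so the keyword "fire" does not match.
report = "A massive cloud of smoke can be seen from the Wyee bushfire"

print(matches_keyword(figurative))
print(matches_keyword(report))
```

Plain matching has no notion of word sense, which is why a classifier is applied on top of the retrieved tweets.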
Our work: Classify Twitter Stream for Disasters 
• Classify tweets as Disaster or Non-disaster 
Binary Classification 
• Classify tweets into disaster types: 
– Earthquake 
– Storm (hurricane, tornado, cyclone) 
– Fire 
– Flooding 
– Other (e.g., civil disorder, traffic accident) 
Multi-class classification problem 
Related Studies 
• Tweet classification: 
o Papers that used classifiers for categories such as news and junk, or opinion, 
and private messages. 
o Papers that heavily used hashtags. 
o Adding context to short tweets by aggregating those that share the same 
hashtags, or by adding URL contents. 
• Twitter during disasters: 
o Qualitative analysis on tweets published during a specific event to study 
microblogger behaviour. 
o One of the most cited works is Sakaki et al. (2010), who built an earthquake 
classifier to alert people. Their classifier was based on tweet length, the position 
of the query term (earthquake or shaking) in the tweet, n-grams, and the context of 
the query terms. 
We do not focus on specific incidents, and do not assume the hashtags are known. 
We study different types of disasters, not just one. 
Twitter Data 
• Sampled a total of 6,500 tweets published over two years, 
from December 2010 to November 2012 
• Data was gathered using keyword search (fire, flooding, 
storm, tornado, hurricane, cyclone, earthquake, and 
accident). 
• No retweets 
• A number of disasters were included, among others: the 
Christchurch earthquake (New Zealand, 2011), Cyclone 
Yasi (QLD, 2011), the QLD floods (2010-2011), bushfires in 
VIC (2011), and Hurricane Sandy (US, 2012). 
Annotations 
• Two-stage annotation 
• The annotations were crowd-sourced using Crowdflower. 
• Annotators were asked: 
1. Is this tweet talking about a disaster? (Yes or No); 
2. What type of disaster is it talking about? (multiple choice) 
• Each tweet was annotated by three annotators 
• 5,747 tweets had full agreement 
• 2,850 tweets were identified as disaster-related and 
2,897 as non-disaster 
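Keeping only tweets on which all three annotators agree can be sketched as below. This is an illustrative reading of "full agreement", not the authors' pipeline; the tweet ids, labels, and the helper `unanimous_label` are hypothetical.

```python
def unanimous_label(labels):
    """Return the shared label if every annotator agrees, otherwise None."""
    if labels and len(set(labels)) == 1:
        return labels[0]
    return None

# Hypothetical crowd annotations: tweet id -> labels from three annotators.
annotations = {
    "t1": ["yes", "yes", "yes"],
    "t2": ["yes", "no", "yes"],
    "t3": ["no", "no", "no"],
}

# Keep a tweet only when the three labels are identical.
agreed = {}
for tweet_id, labels in annotations.items():
    label = unanimous_label(labels)
    if label is not None:
        agreed[tweet_id] = label

print(agreed)
```

Tweets with any disagreement (here `t2`) are dropped rather than resolved by majority vote.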
Classifiers 
• SVM Classifier 
• Multinomial Naive Bayes Classifier 
• We report only the SVM results; Naive Bayes consistently 
performed worse in all the experiments. 
C. Chang and C. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 
Classification Features 
Specific Features: 
• N-grams 
• Hashtags 
• Mentions 
Generic Features: 
• Mention count 
• Hashtag count 
• Links 
• Tweet length 
What is the effect of using 
incident-specific rather than 
generic features on 
classification accuracy? What 
are the best features to use for 
disaster classifiers?
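The generic features listed above can be computed with a few regular expressions. This is a minimal sketch under stated assumptions: the regex patterns, the feature names, and treating "Links" as a presence flag rather than a count are choices made for the example, not details given in the slides.

```python
import re

def generic_features(tweet: str) -> dict:
    """Compute event-independent features for one tweet."""
    return {
        "mention_count": len(re.findall(r"@\w+", tweet)),
        "hashtag_count": len(re.findall(r"#\w+", tweet)),
        # Assumption: record only whether a link is present.
        "has_link": int(bool(re.search(r"https?://\S+", tweet))),
        "length": len(tweet),
    }

tweet = ("Need help. Any donors of medicines for diarrhea cases in Baganga, "
         "Davao Oriental pls? #reliefPH #PabloPH pls tweet @KarloPuerto")
features = generic_features(tweet)
print(features)
```

Unlike the specific features, none of these values depends on the actual hashtag or mention text, so they remain usable for events whose hashtags are not yet known.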
Evaluation: Cross-validation vs. Time-Split 
• K-fold cross-validation (e.g., 10-fold) is used in most similar 
studies (Sriram et al., 2010; Takemura and Tajima, 2012; Vosecky et al., 2012) 
Problem: 
• It overlooks the time dependency in microblog data, and 
uses future evidence, such as hashtags and disaster names 
Alternative: 
• Time-split evaluation: sort the data by time, take the latest 
chunk for testing and the rest for training. 
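A time-split evaluation can be sketched as below. The slides do not give the size of the test chunk, so `test_fraction=0.2` is an assumption, and the records are hypothetical placeholders rather than data from the study.

```python
from datetime import datetime

def time_split(records, test_fraction=0.2):
    """Sort (timestamp, text, label) records by time; the latest chunk is the test set."""
    ordered = sorted(records, key=lambda r: r[0])
    n_test = max(1, round(len(ordered) * test_fraction))
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]

# Hypothetical records, for illustration only.
records = [
    (datetime(2012, 10, 30), "hurricane tweet", "disaster"),
    (datetime(2011, 2, 22), "earthquake tweet", "disaster"),
    (datetime(2011, 2, 3), "cyclone tweet", "disaster"),
    (datetime(2010, 12, 28), "flood tweet", "disaster"),
    (datetime(2012, 1, 5), "unrelated tweet", "non-disaster"),
]

train, test = time_split(records, test_fraction=0.2)
print(len(train), len(test))
```

Because every training tweet is older than every test tweet, hashtags or disaster names that first appear in the test period cannot leak into training, which is the problem with shuffled k-fold splits.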
Disaster or Non-Disaster 
Disaster-Type Classification 
What features worked 
• When training data is small, counts were better features. 
– Disaster-related tweets had 1.2 hashtags on average, versus 0.4 for non-disaster 
tweets 
• When our knowledge of an event is limited, hashtags or mentions 
are not so useful. 
• In our experiments, classification accuracy using bigram features 
was worse than with unigram features. 
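For reference, unigram and bigram features can be extracted as in the sketch below. Whitespace tokenisation and lower-casing are assumptions made for the example; the slides do not describe the tokeniser that was used.

```python
def ngrams(tweet: str, n: int) -> list:
    """Return the word n-grams of a tweet, using simple whitespace tokenisation."""
    tokens = tweet.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tweet = "lyttelton tunnel had reopened last night #eqnz"
unigrams = ngrams(tweet, 1)
bigrams = ngrams(tweet, 2)

print(unigrams)
print(bigrams)
```

A tweet this short yields only six bigrams, and each bigram recurs across tweets less often than its component words. Such sparsity may be one factor behind the weaker bigram results, although the slides do not state a cause.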
Generic Features vs. Event-specific Features 
• We need to learn the patterns that imply a type of natural or man-made 
disaster: 
A massive cloud of smoke can be seen in south-west Lake 
Macquarie from the Wyee bushfire #nswfires #wyeefire 
@NewcastleHerald 
Same location, no disaster: 
Lake Macquarie is big & beautiful http://lockerz.com/s/257143427 
Can we cross-train for disaster types? 
Training Testing 
Application: 
- Compromise for disaster types with little training data. 
- Reduce ambiguity 
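One plausible reading of cross-training is to train on some disaster types and test on a type withheld from training. The sketch below shows only that partitioning step; the examples and the helper `leave_one_type_out` are hypothetical, and how non-disaster tweets were divided between training and testing is not described in the slides, so they are left out here.

```python
def leave_one_type_out(examples, held_out):
    """Split (text, disaster_type) pairs: train on all other types, test on held_out."""
    train = [ex for ex in examples if ex[1] != held_out]
    test = [ex for ex in examples if ex[1] == held_out]
    return train, test

# Hypothetical labelled tweets (text, disaster type), for illustration only.
examples = [
    ("smoke over the hills", "fire"),
    ("river has burst its banks", "flooding"),
    ("strong shaking felt downtown", "earthquake"),
    ("homes lost to the blaze", "fire"),
]

train, test = leave_one_type_out(examples, held_out="fire")
print(len(train), len(test))
```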
Cross-Disaster Classification 
How well can our classifiers be generalised to identify 
previously unseen disaster types? 
Specific Feature Generic feature 
• We used under-sampling to create the training and testing data
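Random under-sampling can be sketched as follows. This is a generic version of the technique, assuming that every class is reduced to the size of the smallest one; the slides do not say which ratio or sampling scheme was used, and the examples are hypothetical.

```python
import random

def undersample(examples, seed=0):
    """Randomly drop examples from larger classes until all classes match the smallest."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in examples:
        by_label.setdefault(label, []).append((text, label))
    if not by_label:
        return []
    smallest = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        # Sample without replacement so no example is duplicated.
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)
    return balanced

# Hypothetical imbalanced data: five "fire" tweets, two "non-disaster" tweets.
examples = [(f"fire tweet {i}", "fire") for i in range(5)]
examples += [(f"other tweet {i}", "non-disaster") for i in range(2)]

balanced = undersample(examples)
print(len(balanced))
```

Balancing the classes keeps accuracy figures comparable across disaster types that have very different numbers of tweets.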
Can we cross-train for disaster types? 
• Yes! Our results showed promise, especially for fire. 
• “Language of disaster” 
• Using generic features was more effective. 
What’s Next 
Events are often associated with a location 
1. Better classifiers: We can use the presence of location information 
as a feature to strengthen our classifiers 
2. Help act on the information: Once we know a tweet 
is talking about a disaster, we can then extract information on 
locations. This could help emergency responders with resource 
allocation. 
• We have already established that traditional Named Entity 
Recognisers are able to identify locations in tweets with high 
accuracy*. Now we need to pinpoint them on the map! 
* J. Lingad, S. Karimi, J. Yin, Location Extraction From Disaster-Related Microblogs, Proceedings of the 22nd International Conference on World Wide Web Companion, 2013 


Editor's Notes

  1. Social media shows what is happening in real time (official news comes later), how to get help, and how we can all help, and it lets others know you are fine. In this talk we focus on Twitter, but the general findings may apply to other social media.
  2. People share useful information during an incident, e.g. that a tunnel has reopened.
  3. Many useless tweets add noise and drown out the ones with useful information. In terms of language processing, we have to deal with short text that has no obvious context, is often informal, and contains arbitrarily shortened words. Because we are dealing with an unsupervised medium, we have to accept that a large volume of the data is useless for many applications.
  4. If one were to monitor social media for events such as fire or earthquake, plain keyword search would be pretty much useless.
  5. We decided to go with a classification approach.
  6. Tweet classification is not new.
  7. Another avenue is adding context to the classifiers.