Rapid access to situation-sensitive data through social media networks creates new opportunities to address a number of real-world problems. Damage assessment during disasters is a core situational awareness task for many humanitarian organizations that traditionally takes weeks and months. In this work, we analyze images posted on social media platforms during natural disasters to determine the level of damage caused by the disasters. We employ state-of-the-art machine learning techniques to perform an extensive experimentation of damage assessment using images from four major natural disasters. We show that domain-specific fine-tuning of deep Convolutional Neural Networks (CNN) outperforms other state-of-the-art techniques such as Bag-of-Visual-Words (BoVW). High classification accuracy under both event-specific and cross-event test settings demonstrates that the proposed approach can effectively adapt deep-CNN features to identify the severity of destruction from social media images taken after a disaster strikes.
Damage Assessment from Social Media Imagery Data During Disasters
1. Damage Assessment from Social Media Imagery Data During Disasters
Dat T. Nguyen, Ferda Ofli, Muhammad Imran, Prasenjit Mitra
Qatar Computing Research Institute, Qatar
The Pennsylvania State University, University Park, PA, USA
Partners & Clients: New York (Suffolk) Emergency Management Dept.
2. Types of Information on Twitter
- Twitter data from 13 recent crises
- Over 100,000 tweets
- Information types
- Types of sources
Source: Qatar Computing Research Institute - Published in World Humanitarian Data and Trends 2014 (UN OCHA)
3. The Value of Timely Information During Disasters
Based on a FEMA large-scale survey among emergency management professionals across the US.
[Chart: information value declines over time; information that arrives too late has little value]
5. 2013 Pakistan Earthquake (September 28 at 07:34 UTC)
2010 Haiti Earthquake (January 12 at 21:53 UTC)
Social Media Data and Opportunities
Social Media Platforms
Availability of Immense Data: around 16,000 tweets per minute were posted during Hurricane Sandy in the US.
Opportunities:
- Early warning and event detection
- Situational awareness
- Actionable information
- Rapid crisis response
- Post-disaster analysis
- Disease outbreaks
6. “A picture is worth a thousand words.”
Images from 3 Different Disasters
7. Time-Critical Events and Information Gaps
[Diagram] A disaster event (earthquake, flood) causes destruction and damage.
Humanitarian organizations, government organizations, and local administrations need information to help and launch a response.
Information gathering, especially in real time, is the most challenging part.
Relief operations & reconstruction follow.
9. Damage Severity Assessment from Images
Task: our task is to classify each incoming image into one of three classes (severe, mild, little-to-no damage).
10. Challenges
• Task complexity: lack of labeled data, ill-defined objects
• Poor signal-to-noise ratio: social media data is extremely noisy, e.g., duplicates and irrelevant content
• Task subjectivity: confusion between the damage severity classes “severe” and “mild”
• Cold-start issue: the first few hours of a disaster are critical, but training ML classifiers requires labeled data
11. Image Datasets: Twitter + Google
Twitter messages collected using the Twitter streaming API
Queries we used:
- Damaged building
- Damaged road
- Damaged bridge
12. Human Annotations
We used AIDR (volunteers) and Crowdflower (paid workers)
Instructions: The purpose of this task is to assess the severity of damage shown in an image…
Three classes:
1. Severe damage: substantial destruction; a non-livable or non-usable building, a non-crossable bridge, a non-drivable road
2. Mild damage: damage generally exceeding minor (e.g., 50% of a building is damaged); partial loss of amenity/roof; part of a bridge is unusable or needs repairs
3. Little-to-no damage: images that show damage-free infrastructure, or small cracks and wear and tear due to age
13. Human Annotations
We used AIDR (volunteers) and Crowdflower (paid workers)
Crowdflower annotations
AIDR was used during the actual event.
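The slide does not state how multiple crowd judgments per image were combined; a common choice, sketched below under that assumption, is simple majority voting over the three damage classes (record layout and image IDs here are hypothetical):

```python
from collections import Counter

# Hypothetical annotation records: one (image_id, label) pair per crowd judgment.
# Label names follow the three classes defined on the annotation slide.
annotations = [
    ("img_001", "severe"), ("img_001", "severe"), ("img_001", "mild"),
    ("img_002", "none"),   ("img_002", "none"),   ("img_002", "none"),
]

def majority_vote(annotations):
    """Aggregate per-image crowd judgments into a single label."""
    votes = {}
    for image_id, label in annotations:
        votes.setdefault(image_id, Counter())[label] += 1
    return {image_id: counts.most_common(1)[0][0]
            for image_id, counts in votes.items()}

labels = majority_vote(annotations)
# labels == {"img_001": "severe", "img_002": "none"}
```

In practice one would also track inter-annotator agreement and discard images where workers disagree too strongly, given the task-subjectivity challenge noted earlier.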
14. Learning Schemes
1. Baseline (PHOW + SVM):
Pyramid Histogram of Visual Words (PHOW) features with a linear SVM
2. Pre-trained CNN as feature extractor:
We used the VGG-16 network trained on the ImageNet dataset (1.2M images, 1,000 classes). We took the fc7 layer, i.e., removed the last layer, to get a 4,096-dimensional vector for every image.
3. Fine-tuning a pre-trained CNN:
We used the existing weights of a pre-trained CNN as initialization for our dataset, with the last layer replaced to represent our task (3 classes)
15. Learning Settings
1. Event-specific setting:
Training, development, and test sets are from the same event
Train: 60%, Dev: 20%, Test: 20%
2. Cross-event setting:
Scenario: no labeled data for the target event, but labeled data from past events is abundant.
Cross-event: train on past events (source) and test on the current event (target)
For example:
Train: Nepal earthquake + Ecuador earthquake
Test: Typhoon Ruby
We use Google data assuming no past-event data is available
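The two settings amount to two different ways of splitting the labeled data. A minimal sketch, with hypothetical event names and labels standing in for the real datasets:

```python
import random

# Hypothetical labeled pools: one list of (image_id, label) pairs per event.
events = {
    "nepal_eq":     [(f"nepal_{i}", "severe") for i in range(10)],
    "ecuador_eq":   [(f"ecuador_{i}", "mild") for i in range(10)],
    "ruby_typhoon": [(f"ruby_{i}", "none") for i in range(10)],
}

def event_specific_split(samples, seed=42):
    """Event-specific setting: 60/20/20 train/dev/test from the SAME event."""
    samples = samples[:]
    random.Random(seed).shuffle(samples)
    n = len(samples)
    return (samples[: int(0.6 * n)],
            samples[int(0.6 * n): int(0.8 * n)],
            samples[int(0.8 * n):])

def cross_event_split(events, target):
    """Cross-event setting: train on all other (source) events, test on target."""
    train = [s for name, samples in events.items()
             if name != target for s in samples]
    return train, events[target]

train, dev, test = event_specific_split(events["nepal_eq"])
x_train, x_test = cross_event_split(events, target="ruby_typhoon")
```

The cross-event split is the harder, more realistic scenario: at the onset of a new disaster, only past-event (or Google-collected) labels are available.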
17. Cross-Event Using Ecuador and Matthew as Test Sets
Ecuador earthquake (20%) as fixed test set and all sources with 60%
Hurricane Matthew (20%) as fixed test set and all sources with 60%
20. Conclusions
• We presented results for the task of damage
assessment from social media images
• We used real world datasets
• Compared non-deep learning, deep learning and
transfer learning approaches
• In the event-specific case, the transfer learning approach performs better
• In the cross-event case, we observed that the more data, the better; same-event data always helps
Affected people’s use of social media during a crisis has become a common practice in recent years. Twitter, with its one-to-many
format, is the platform of choice for many Internet users during a crisis. The infographic below presents a sample
of 13 recent crises caused by natural hazards that generated over 100,000 Twitter messages or “tweets”. The information
provided in the tweets, and the type of sources who tweet the most, vary widely between crises. For example, Government
sources produced far more tweets during the Alberta floods (2013) in Canada than during Super Typhoon Haiyan (2013)
in the Philippines. Overall, social media data is still an experimental field for humanitarian practitioners. But with a few
frameworks of reference—including hashtag standardization in emergencies—the humanitarian community only stands to
benefit from these technological opportunities.
FEMA (Federal Emergency Management Agency) conducted a large-scale survey in which they interviewed emergency professionals and organizations in the US. This graph shows the value of useful information for crisis response and management as perceived by those professionals. We can see that as time passes, the value of information decreases. For example, one such piece of critical information is building damage, whose value drops by 10% after 24 hours, 30% after 48 hours, and so on.
According to these emergency professionals, information collected during the first 48 hours is considered tactical. After that point, the information is useful to headquarters for high-level decision making.
Social media played a major role during disasters such as 2005 Hurricane Katrina, the 2011 Japanese earthquake and tsunami, and more recently Typhoon Haiyan, followed by the Nepal tragedy. Consequently, more and more emergency managers are turning to social media as a vital tool in disaster management. Twitter, the most used tool for updates, response, and relief, enabled greater connectivity and information sharing.
During situations like mass emergencies, disasters, and epidemics, social media platforms like Twitter provide unique opportunities for both affected people and emergency responders. People share situational awareness messages and ask for help, donations, food, water, shelter, etc. On the other hand, responders want to help.
I know it is a bit cliché, but a picture is worth a thousand words.
For example, these are some real images collected during different disasters. These can be used in understanding:
- Building damage
- Road or bridge damage, whether they are completely destroyed or can still be used
- Shelter and aid needs
- Extent of overall destruction
At the onset of a disaster situation, urgent needs emerge from affected people, such as food, water, shelter, and medical assistance.
On the other hand, humanitarian organizations like UN OCHA, UNICEF, and WHO, or local administrations, want to launch relief operations to help victims of the disaster.
However, in order to plan relief operations, they need information from the disaster zone. Traditional approaches to getting this information include sending experts into the disaster zone or waiting until information is publicly available, for example through mainstream media.
This could potentially take days or weeks.
After a disaster event happens, urgent needs of affected people emerge. Humanitarian organizations like OCHA and UNICEF need information about the victims to launch relief operations.
Here is an overview of the proposed image processing pipeline. Let us say we receive tweets using the Twitter streaming API.
We extract image URLs, if there is any, and download these images from the web.
Then, the downloaded images go through a series of operations. Specifically, we have a module that filters out irrelevant images… followed by de-duplication filtering…
And finally we have a relatively clean version of the incoming data… in this particular scenario, we have a damage assessment module that assesses the overall level of damage depicted in an image.
I am not going into the implementation details of the system. For the sake of this talk, I will focus on the last three components of the system.
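The front of the pipeline (extract image URLs from tweets, then de-duplicate) can be sketched as follows. The tweet payload shape mirrors the Twitter streaming API's `entities.media[].media_url` field; the downloads are faked with placeholder bytes, and the exact-hash de-duplication is a stand-in for whatever near-duplicate filtering the real system uses:

```python
import hashlib

# Hypothetical tweet payloads in the shape of the Twitter streaming API:
# image URLs live under entities -> media -> media_url.
tweets = [
    {"id": 1, "entities": {"media": [{"media_url": "http://example.com/a.jpg"}]}},
    {"id": 2, "entities": {}},  # no image attached -> filtered out
    {"id": 3, "entities": {"media": [{"media_url": "http://example.com/a.jpg"}]}},
]

def extract_image_urls(tweet):
    """Pull image URLs out of a tweet, if any."""
    return [m["media_url"] for m in tweet.get("entities", {}).get("media", [])]

def deduplicate(images):
    """Drop exact duplicates by hashing image bytes; near-duplicate detection
    (e.g., perceptual hashing) would go here in a real system."""
    seen, unique = set(), []
    for name, data in images:
        digest = hashlib.md5(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append((name, data))
    return unique

urls = [u for t in tweets for u in extract_image_urls(t)]
# In the real pipeline the URLs are downloaded; here we fake the bytes.
images = [(u, u.encode()) for u in urls]
unique_images = deduplicate(images)   # duplicate URL collapses to one image
```

Each surviving image would then pass through the relevance filter and finally the damage assessment classifier described above.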