ADCS 2014 Presentation for the paper: http://dl.acm.org/citation.cfm?id=2682868
"Extracting the geographical location that a tweet is about is crucial for many important applications ranging from disaster management to recommendation systems. We address the problem of finding the locational focus of tweets that is geographically identifiable on a map. Because of the short, noisy nature of tweets and inherent ambiguity of locations, tweet text alone cannot provide sufficient information for disambiguating the location mentions and inferring the actual location focus being referred to in a tweet. Therefore, we present a novel algorithm that identifies all location mentions from three information sources---tweet text, hashtags, and user profile---and then uses a gazetteer database to infer the most probable locational focus of a tweet. Our novel algorithm has the ability to infer a locational focus that may not be explicitly mentioned in the tweet and determine its most appropriate granularity, e.g., city or country."
Pinpointing Location Focus in Microblogs
1. Pinpointing Locational Focus in Microblogs
Jie Yin, Sarvnaz Karimi, John Lingad
November 2014
DIGITAL PRODUCTIVITY FLAGSHIP
2. Where is it happening?
For those monitoring social media to
• send help in an emergency
• avoid certain areas (e.g., for traffic)
• recommend services (ads)
3. Find it on the map!
CSIRO: positive impact | Pinpointing Locational Focus in Tweets | Sarvnaz Karimi
4. Locational focus
Locational focus: Macquarie Centre, North Ryde, New South Wales, Australia
Location mentions: Sydney, Macquarie Centre
Author Location: Brisbane, Australia
5. Some tweets mention multiple locations: Not easy to identify the focus
Mary river, Queensland, Australia
Mary river, Queensland, Australia
6. Some tweets have no locational focus
There is an unknown location (Ambiguity)
No specific focus (World Level?)
7. To find locational focus, we have two tasks:
1. Find mentions of locations
2. Aggregate these to infer the main focus
8. Finding location mentions
1. Where to look for the location mentions
• Location mentions can be in the tweet text and/or in hashtags
• Some hashtags are concatenated words or abbreviations, e.g., #QLDflood = QLD + flood
• Tweet text may mention a geographical location, such as Sydney, or a Point-of-Interest (POI), such as an organisation's name or a shop
• Authors' locations in their profiles (not exactly a location mention)
2. How to find these mentions?
• Hashtag segmentation
• Named Entity Recognition
9. Location mention extraction
• Related work
• NER tools for formal text, such as Stanford NER and OpenNLP, are highly accurate (considered a solved problem).
• NER specific to Twitter: TwiNER [Wang et al., 2012], TwitterNLP [Ritter et al., 2011]
• Retrained NER tools for Twitter [Lingad et al., 2013] – Location and Organisation entities only
• In this work:
• Segmented the hashtags using a simple greedy maximal matching heuristic, with an English dictionary augmented with place name abbreviations
• Used the retrained Stanford NER, keeping LOC and ORG entities
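A minimal sketch of greedy maximal matching for hashtag segmentation, as described above. The function name `segment_hashtag`, the tiny `VOCAB` set, and the fallback of emitting single characters when nothing matches are assumptions for illustration; the slides do not give the actual dictionary or the handling of unmatched text.

```python
def segment_hashtag(tag, vocabulary):
    """Split a hashtag into dictionary words, always taking the longest match first."""
    text = tag.lstrip("#").lower()
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate substring starting at position i first.
        for j in range(len(text), i, -1):
            if text[i:j] in vocabulary:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Assumed fallback: no dictionary word starts here, so emit one character.
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical dictionary: English words plus place name abbreviations.
VOCAB = {"qld", "nsw", "vic", "flood", "fire", "bush"}

segments = segment_hashtag("#QLDflood", VOCAB)
```

For `#QLDflood` this yields the two tokens `qld` and `flood`, of which `qld` can then be treated as a location mention.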
10. Inferring locational focus
Given a list of location mentions, determine what the focus is.
For example:
If mentions are VIC,NSW,QLD,WA then focus is Australia
If mentions are Swanston St, RMIT then focus is RMIT University, Melbourne, VIC, Australia
• Requires knowledge of the geographical locations as well as POIs and their relationships/hierarchy.
• Gazetteer Australia 2010, GeoNames New Zealand, OpenStreetMap
11. Gazetteer as a tree
[Figure: levels of the gazetteer tree, from most to least specific]
• Specific POI
• City/Suburb/Town/Non-Specific POI (e.g., river, highway)
• State/Territory/Region
• Country
12. Inference algorithm: Where on the map?
• Step 1: Query the gazetteer with the location mentions, and return each matching result (partial or exact match) as a full path in the gazetteer tree
• Step 2: Create an inference tree using the returned paths
• Step 3: Propagate the scores in the tree
• Step 4: Find a maximum scoring path
Goal: Find the finest (lowest-level) granularity possible
Assumption: The more matches found within a geographical region, the more likely that region is the focus
13. Querying the gazetteer tree
• Location mentions: Sydney, Macquarie Centre
• Author Location: Brisbane, Australia
• Gazetteer querying returns:
- brisbane, queensland, australia
- south brisbane, queensland, australia
- macquarie centre, north ryde, new south wales, australia
- macquarie university, macquarie park, new south wales, australia
- ...
Each returned result gets a matching score based on the Jaccard similarity of the query and the matched node.
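A minimal sketch of the matching score, assuming the Jaccard similarity is taken over lowercased word sets of the query and the matched node name. The slides do not say how the strings are tokenised, so the whitespace split and the function name `jaccard` are assumptions.

```python
def jaccard(query, node_name):
    """Jaccard similarity |Q ∩ N| / |Q ∪ N| over lowercased word sets."""
    q = set(query.lower().split())
    n = set(node_name.lower().split())
    if not q and not n:
        return 0.0  # Two empty strings: treat as no match.
    return len(q & n) / len(q | n)


exact = jaccard("Macquarie Centre", "macquarie centre")   # exact match
partial = jaccard("Brisbane", "south brisbane")           # partial match
```

Under this assumption an exact match scores 1.0, while the partial match of "Brisbane" against "south brisbane" scores 0.5, so exact matches are preferred over partial ones.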
14. Building the inference tree
[Figure: inference tree built from the returned paths]
Returned paths:
- brisbane, queensland, australia
- macquarie centre, north ryde, new south wales, australia
Tree branches:
- earth → australia → queensland → brisbane (leaf score)
- earth → australia → new south wales → north ryde → macquarie centre (leaf score)
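One way to build such a tree from the returned paths is sketched below, using nested dictionaries. The representation and the name `build_inference_tree` are assumptions; the only details taken from the slides are the comma-separated paths (most specific place first) and the shared `earth` root.

```python
def build_inference_tree(paths):
    """Merge gazetteer paths ('poi, suburb, state, country') into one tree rooted at 'earth'."""
    tree = {"earth": {}}
    for path in paths:
        # Paths list the most specific place first, so walk them in reverse.
        parts = [p.strip() for p in path.split(",")][::-1]
        node = tree["earth"]
        for part in parts:
            # Reuse an existing branch if an earlier path already created it.
            node = node.setdefault(part, {})
    return tree


tree = build_inference_tree([
    "brisbane, queensland, australia",
    "macquarie centre, north ryde, new south wales, australia",
])
```

Both paths share the `australia` node, which then has the two children `queensland` and `new south wales`.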
15. Propagating scores to the parents and finding the maximal path
• More branches within a sub-tree increase the chance that its parent is on the maximal path
• Bottom-up scoring of parents, from the leaves to the root
• Parent score = current score + 0.5 * score of the highest-scoring child
• Top-down selection of the maximal path, with entropy as the termination condition: if the entropy of the children's scores is higher than a pre-defined threshold, the algorithm stops at that level.
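The two passes can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the `Node` class, the normalisation of children's scores into a probability distribution before computing Shannon entropy (base 2), and the example leaf scores and thresholds are all hypothetical. Only the parent update rule and the "stop when entropy exceeds a threshold" condition come from the slide.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    score: float = 0.0                      # leaf score, or 0 for unmatched inner nodes
    children: list = field(default_factory=list)


def propagate(node):
    """Bottom-up pass: parent score = current score + 0.5 * best child score."""
    if node.children:
        best = max(propagate(child) for child in node.children)
        node.score += 0.5 * best
    return node.score


def entropy(scores):
    """Shannon entropy (bits) of scores normalised to sum to 1 (assumed normalisation)."""
    total = sum(scores)
    if total <= 0:
        return 0.0
    return -sum((s / total) * math.log2(s / total) for s in scores if s > 0)


def select_path(root, threshold):
    """Top-down pass: follow the best child until the children are too evenly scored."""
    path, node = [root.name], root
    while node.children:
        if entropy([c.score for c in node.children]) > threshold:
            break  # Children are too ambiguous: stop at this level.
        node = max(node.children, key=lambda c: c.score)
        path.append(node.name)
    return path


# Hypothetical leaf scores for the running example.
root = Node("earth", children=[Node("australia", children=[
    Node("queensland", children=[Node("brisbane", 1.0)]),
    Node("new south wales", children=[
        Node("north ryde", children=[Node("macquarie centre", 8.0)]),
        Node("sydney", 2.0),
    ]),
])])
propagate(root)
fine_path = select_path(root, threshold=0.95)
coarse_path = select_path(root, threshold=0.8)
```

With the looser threshold the path descends all the way to `macquarie centre`; with the stricter one it stops at `new south wales`, because `north ryde` and `sydney` are scored too similarly to choose between them.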
16. [Figure: inference tree with propagated scores]
Returned paths:
- brisbane, queensland, australia
- macquarie centre, north ryde, new south wales, australia
- sydney, new south wales, australia
Tree nodes: earth; australia; queensland; brisbane; new south wales; north ryde; macquarie centre; macquarie university; sydney
Score labels in the figure: A, B, C, D (leaf scores); 0.5*A, 0.5*Max(B,C), 0.5*Max(0.5*B,D) (propagated parent scores)
Leaf score = w * 2^level * Jaccard similarity
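The leaf score formula can be written as a small function. The slide does not define w or level, so this sketch assumes that w is the weight of the information source the mention came from (using the 0.6 / 0.3 / 0.1 weights for text, hashtag, and user location quoted on the accuracy slide) and that level is the gazetteer depth, with 1 for country and 4 for POI. Both readings, and the names `SOURCE_WEIGHTS` and `leaf_score`, are assumptions.

```python
# Assumed mapping from information source to the weight w.
SOURCE_WEIGHTS = {"text": 0.6, "hashtag": 0.3, "user_location": 0.1}


def leaf_score(source, level, jaccard_similarity):
    """Leaf score = w * 2^level * Jaccard similarity.

    level is assumed to be the gazetteer depth (1 = country ... 4 = POI),
    so deeper, more specific matches receive exponentially larger scores.
    """
    return SOURCE_WEIGHTS[source] * (2 ** level) * jaccard_similarity


poi_from_text = leaf_score("text", 4, 1.0)
city_from_profile = leaf_score("user_location", 3, 0.5)
```

Under these assumptions an exact POI match in the tweet text far outweighs a partial city match from the author's profile, which is consistent with the stated goal of favouring finer granularities.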
17. Dataset & annotation
• Queried Twitter with keywords such as fire, earthquake, storm, hurricane
• Randomly sampled 7,000 tweets
• Two annotation steps:
1. Identify location mentions
2. Identify locational focus (based on tweet and author location)
• Three annotators per tweet; only tweets with majority agreement (2 out of 3) were kept in the final set.
• Tweets whose focus was not within Australia or New Zealand were removed.
• A small set of tweets marked as impossible to determine the focus of was also removed.
• Final set: 1,398 tweets (80 kept for parameter tuning)
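The majority-agreement filter described above can be sketched as below. The function name `majority_label` and the representation of each annotation as a plain string are assumptions; the slides do not describe the annotation tooling.

```python
from collections import Counter


def majority_label(labels, min_votes=2):
    """Return the label chosen by at least min_votes annotators, or None if there is none."""
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes >= min_votes else None


kept = majority_label(["Sydney, NSW", "Sydney, NSW", "Brisbane, QLD"])
dropped = majority_label(["Sydney, NSW", "Brisbane, QLD", "Perth, WA"])
```

A tweet is kept when two of its three annotators agree, and discarded (here signalled by `None`) when all three disagree.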
18. Baseline: Yahoo! PlaceFinder*
• A service that accepted queries and returned a list of matching places in the form of country, state, city, poi
• A query to the service was similar to a database query:
SELECT * FROM geo.placefinder WHERE text = query text
And we chose the query text to be
(a) tweet (text & hashtag) and user location from their profile
(b) the list of location mentions from one tweet (human annotated)
* As it was called in Jan 2014
19. Accuracy with manual location mentions (without NER)
Accuracy (%)            All    Text   Hashtag   User Location
Level 1 - Country       89.9   35.3   45.2      71.6
Level 2 - State         73.5   29.3   37.4      36.3
Level 3 - City/Suburb   51.0   24.5   12.4       4.9
Level 4 - POI           29.7   11.7    8.1       1.8
No focus                58.5   95.8   96.4      63.2
User location was most useful at the country level, but did not contribute much at other levels of granularity.
POI was the hardest level: only ~30% were correctly identified.
All = 0.6 text + 0.3 hashtag + 0.1 user location
20. Accuracy with location mentions extracted using NER
Accuracy (%)         Level 1   Level 2   Level 3   Level 4   No Focus
PlaceFinder (a)      87.9      58.6      22.9      21.0       0.3
PlaceFinder (b)      87.8      59.1      23.5      18.8      25.5
Our Alg. No NER      89.9      73.5      51.0      29.7      58.5
Our Alg. With NER    91.3      65.7      47.0      24.9      53.4
(a) The whole tweet was queried (b) Location mentions were queried (no NER)
Country-level focus was the easiest, with all settings performing similarly.
PlaceFinder was consistently worse at the other levels, but that could also be an effect of our gazetteer hierarchy.
21. The sources of errors in our algorithm
• Annotation mistakes: human annotators missed some of the mentions.
• Some streets and POIs were missing from the gazetteer.
• Heavily misspelled place names were not corrected in our pre-processing step.
• We favoured lower granularities in our scoring, which introduced wrong POIs that were not needed.
• Gazetteer bias: if one mention had many matches in a region, the path could wrongly get stronger.
22. What we learnt and what’s next?
• Finding locational focus is difficult, even for humans (low agreement in annotation)
• Our method was accurate (90%) at the country level, but accuracy dropped through the state and city levels down to the POI level (29%).
• All three information sources (text, hashtag, and author location) contribute to finding the focus, but at different levels.
• How to make it better?
• Incorporate some context, e.g., tweets that share hashtags, replies, or temporally close tweets
• Learn the weights of the different information sources
23. Related Studies
• TwitterStand: geotagging the content of tweets. Used the GeoNames gazetteer and heuristic rules to find and disambiguate the location focus.
• Kinsella et al. [2011]: learning language models of locations
Editor's Notes
Given a Twitter message, determine where on the map the content refers to.
Not interested in the location of the author.
No location mention
From the content we cannot identify where the author is referring to (unless other background information, such as other tweets, helps)
Multiple location mentions, all of similar importance