Pinpointing Locational Focus in
Microblogs
Jie Yin, Sarvnaz Karimi, John Lingad
November 2014
DIGITAL PRODUCTIVITY FLAGSHIP
Where is it happening?
For those monitoring social media to
• send help in an emergency
• avoid certain areas (e.g., for traffic)
• recommend services (ads)
CSIRO: positive impact | Pinpointing Locational Focus in Tweets | Sarvnaz Karimi
Find it on the map!
Locational focus
Locational focus: Macquarie Centre, North Ryde, New South Wales, Australia
Location mentions: Sydney, Macquarie Centre
Author Location: Brisbane, Australia
Some tweets mention multiple locations: Not
easy to identify the focus
Mary River, Queensland, Australia
Some tweets have no locational focus
There is an unknown location
(Ambiguity)
No specific focus
(World Level?)
To find locational focus, we have two tasks:
1. Find mentions of locations
2. Aggregate these to infer the main focus
Finding location mentions
1. Where to look for the location mentions
• Location mentions can be in the tweet text and/or in hashtags
• Some hashtags are concatenated words or abbreviations, e.g., #QLDflood =
QLD + flood
• Tweet texts may mention a geographical location, such as Sydney, or a
Point-of-Interest (POI), such as an organisation's name or a shop
• Authors’ locations in their profile (not exactly a location mention)
2. How to find these mentions?
• Hashtag segmentation
• Named Entity Recognition
Location mention extraction
• Related work
• NER tools for formal text, such as Stanford NER and OpenNLP, are highly
accurate (considered a solved problem).
• NER specific to Twitter: TwiNER [Wang et al., 2012], TwitterNLP [Ritter et al.,
2011]
• Retrained NER tools for Twitter [Lingad et al., 2013] – Location and
Organisation entities only
• In this work:
• Segmented the hashtags using a simple greedy maximal-matching heuristic,
with an English dictionary augmented with place-name abbreviations
• Used the retrained Stanford NER, keeping LOC and ORG entities
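The greedy maximal-matching idea for hashtags can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name `segment_hashtag`, the toy vocabulary `VOCAB`, and the single-character fallback for unknown text are all hypothetical.

```python
def segment_hashtag(hashtag, vocabulary):
    """Greedy maximal matching: repeatedly take the longest known prefix."""
    text = hashtag.lstrip("#").lower()
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a known word is found.
        for j in range(len(text), i, -1):
            if text[i:j] in vocabulary:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Assumed fallback: no known word starts here, so emit one character.
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical toy vocabulary: English words plus place-name abbreviations.
VOCAB = {"qld", "nsw", "vic", "flood", "fire", "bush", "bushfire"}

print(segment_hashtag("#QLDflood", VOCAB))  # ['qld', 'flood']
```

Because the longest match is tried first, `#nswbushfire` yields `bushfire` as one token rather than `bush` followed by `fire`.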
Inferring locational focus
Given a list of location mentions, determine what the focus is.
For example:
If the mentions are VIC, NSW, QLD, WA, then the focus is Australia
If the mentions are Swanston St, RMIT, then the focus is RMIT University,
Melbourne, VIC, Australia
• Requires knowledge of the geographical locations as well as POIs
and their relationships/hierarchy.
• Gazetteer Australia 2010, GeoNames New Zealand, OpenStreetMap
Gazetteer as a tree
Levels of the gazetteer tree, from most to least specific:
• Specific POI
• City/Suburb/Town/Non-Specific POI (e.g., river, highway)
• State/Territory/Region
• Country
Inference algorithm: Where on the map?
• Step 1: Query the gazetteer with the location mentions, and return the
matching results (partial or exact matches) as full paths in the
gazetteer tree
• Step 2: Create an inference tree using the returned paths
• Step 3: Propagate the scores in the tree
• Step 4: Find a maximum scoring path
Goal: Find the finest level of granularity possible (the most specific location)
Assumption: The more possible matches found within a geographical
region, the more likely that region on the map is the focus
Querying the gazetteer tree
• Location mentions: Sydney, Macquarie Centre
• Author Location: Brisbane, Australia
• Gazetteer querying returns:
- brisbane, queensland, australia
- south brisbane, queensland, australia
- macquarie centre, north ryde, new south wales, australia
- macquarie university, macquarie park, new south wales, australia
- ...
Each of these returned results gets a matching score based on the
Jaccard similarity of the query and the matched node.
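As a rough illustration of the matching score, Jaccard similarity is the size of the intersection of two sets divided by the size of their union. The sketch below assumes the sets are lower-cased, whitespace-separated word tokens; the slides do not say how the query and node names were tokenised, so that choice and the function name `jaccard` are assumptions.

```python
def jaccard(query, node_name):
    """Jaccard similarity |A & B| / |A | B| over word tokens (assumed tokenisation)."""
    a = set(query.lower().split())
    b = set(node_name.lower().split())
    if not a and not b:
        return 0.0  # assumed convention for two empty strings
    return len(a & b) / len(a | b)


print(jaccard("Brisbane", "brisbane"))        # 1.0 (exact match)
print(jaccard("Brisbane", "south brisbane"))  # 0.5 (partial match)
```

Under this assumption, an exact match such as "brisbane" outranks a partial match such as "south brisbane" for the same query.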
Building the inference tree
Returned paths:
- brisbane, queensland, australia
- macquarie centre, north ryde, new south wales, australia
Inference tree (each leaf carries a leaf score):
earth
  australia
    queensland
      brisbane (leaf score)
    new south wales
      north ryde
        macquarie centre (leaf score)
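One way to picture the tree-building step is to merge the returned paths under a common root. The sketch below is a minimal illustration, assuming paths are comma-separated strings written most-specific-first; the nested-dict representation, the helper names `build_inference_tree` and `show`, and the placeholder leaf scores are all hypothetical.

```python
def build_inference_tree(scored_paths):
    """Merge gazetteer paths (most specific name first) into one tree under 'earth'."""
    root = {"name": "earth", "score": 0.0, "children": {}}
    for path, leaf_score in scored_paths:
        names = [part.strip() for part in path.split(",")]
        node = root
        # Paths are written leaf-first, so walk them in reverse (country first).
        for name in reversed(names):
            node = node["children"].setdefault(
                name, {"name": name, "score": 0.0, "children": {}}
            )
        # The last node reached is the matched leaf; keep its best score.
        node["score"] = max(node["score"], leaf_score)
    return root


def show(node, depth=0):
    """Return the tree as a list of indented lines."""
    lines = ["  " * depth + node["name"]]
    for child in node["children"].values():
        lines.extend(show(child, depth + 1))
    return lines


# Leaf scores here are made-up placeholders.
tree = build_inference_tree([
    ("brisbane, queensland, australia", 1.0),
    ("macquarie centre, north ryde, new south wales, australia", 1.0),
])
print("\n".join(show(tree)))
```

Shared prefixes such as "australia" are stored once, so every additional match inside a region adds a branch to that region's sub-tree.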
Propagating scores to the parents and finding the
maximal path
• More branches within a sub-tree increase the chance of their
parent being on the maximal path
• Bottom-up scoring of parents, from the leaves to the root
• Parent score = current score + 0.5 * score of the highest-scoring
child
• Top-down selection of the maximal path, with entropy as the
termination condition: if the entropy of the children's scores is higher
than a pre-defined threshold, the algorithm stops at that level.
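The two passes described above can be sketched as follows. This is an illustration under assumptions rather than the authors' code: the `Node` class, the Shannon entropy over normalised child scores (base 2), the example leaf scores, and the thresholds 0.95 and 0.9 are all hypothetical choices.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    score: float = 0.0
    children: list = field(default_factory=list)


def propagate(node):
    """Bottom-up pass: parent score = own score + 0.5 * best child score."""
    if node.children:
        best = max(propagate(child) for child in node.children)
        node.score += 0.5 * best
    return node.score


def entropy(scores):
    """Shannon entropy (bits) of scores normalised to sum to 1 (assumed definition)."""
    total = sum(scores)
    if total <= 0:
        return 0.0
    return -sum((s / total) * math.log2(s / total) for s in scores if s > 0)


def select_focus(root, threshold):
    """Top-down pass: follow the best child until the children are too evenly scored."""
    path, node = [], root
    while node.children:
        if entropy([c.score for c in node.children]) > threshold:
            break  # no clear winner at this level, so stop here
        node = max(node.children, key=lambda c: c.score)
        path.append(node.name)
    return path


def example_tree():
    # Made-up leaf scores for the running example.
    qld = Node("queensland", children=[Node("brisbane", 1.0)])
    ryde = Node("north ryde", children=[Node("macquarie centre", 4.0)])
    nsw = Node("new south wales", children=[ryde, Node("sydney", 1.0)])
    return Node("earth", children=[Node("australia", children=[qld, nsw])])


root = example_tree()
propagate(root)
print(select_focus(root, threshold=0.95))
```

With these placeholder scores, the entropy at both the "australia" and "new south wales" levels is about 0.92 bits, so a threshold of 0.95 lets the selection descend to "macquarie centre", while a threshold of 0.9 stops it at "australia".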
[Figure: inference tree for the example, annotated with propagated scores]
Returned paths:
- brisbane, queensland, australia
- macquarie centre, north ryde, new south wales, australia
- sydney, new south wales, australia
Tree nodes: earth; australia; queensland; brisbane; new south wales; north ryde; macquarie centre; macquarie university; sydney
Score labels in the figure: A, B, C, D; 0.5*A; 0.5*Max(B,C); 0.5*Max(0.5*B,D)
Leaf score = w * 2^level * Jaccard similarity
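The leaf-score formula can be made concrete with a small sketch. The slides do not define `w` or `level` at this point, so two assumptions are made here and should be read as such: `level` is taken to be the depth in the gazetteer hierarchy (1 = country, 2 = state, 3 = city/suburb, 4 = POI), and `w` is taken to be the weight of the information source, using the 0.6/0.3/0.1 weights reported with the results. The names `SOURCE_WEIGHTS` and `leaf_score` are hypothetical.

```python
import math

# Assumed source weights (taken from "All = 0.6 text + 0.3 hashtag + 0.1 user location").
SOURCE_WEIGHTS = {"text": 0.6, "hashtag": 0.3, "user_location": 0.1}


def leaf_score(source, level, jaccard_similarity):
    """Leaf score = w * 2^level * Jaccard similarity (w and level as assumed above)."""
    return SOURCE_WEIGHTS[source] * (2 ** level) * jaccard_similarity


# An exact POI match from the tweet text versus an exact city match from the profile.
poi_from_text = leaf_score("text", level=4, jaccard_similarity=1.0)
city_from_profile = leaf_score("user_location", level=3, jaccard_similarity=1.0)
print(poi_from_text > city_from_profile)  # True
```

The `2^level` factor doubles the score with each step down the hierarchy, which is consistent with the stated preference for finer-grained locations.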
Dataset & annotation
• Queried Twitter with keywords such as fire, earthquake, storm,
hurricane
• Randomly sampled 7,000 tweets
• Two annotation steps:
1. Identify location mentions
2. Identify locational focus (based on tweet and author location)
• Three annotators per tweet, only tweets with majority agreement
(2 out of 3) were kept in the final set.
• Tweets whose focus was not within Australia or New Zealand
were removed.
• A small set of tweets whose focus was marked as impossible to
detect was also removed.
• Final set: 1398 tweets (80 kept for parameter tuning)
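The majority-agreement filter (2 out of 3 annotators) could look like the sketch below. It assumes each annotation is a comparable label such as a location path string; the function name `majority_label` and the example labels are hypothetical.

```python
from collections import Counter


def majority_label(labels, min_agree=2):
    """Return the label chosen by at least `min_agree` annotators, else None."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= min_agree else None


# Hypothetical annotations from three annotators for two tweets.
print(majority_label(["north ryde", "north ryde", "sydney"]))  # north ryde
print(majority_label(["sydney", "brisbane", "melbourne"]))     # None
```

Tweets for which this returns `None` would be the ones dropped from the final set.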
Baseline: Yahoo! PlaceFinder*
• A service that accepted queries and returned a list of matching
places in the form of country, state, city, POI
• A query to the service resembled a database query:
SELECT * FROM geo.placefinder WHERE text = query text
We chose the query text to be either
(a) the tweet (text and hashtags) plus the user location from the profile, or
(b) the list of location mentions from one tweet (human
annotated)
* As it was called in Jan 2014
Accuracy (%) with manual location mentions (without NER)

                          All     Text    Hashtag   User Location
Level 1 - Country         89.9    35.3    45.2      71.6
Level 2 - State           73.5    29.3    37.4      36.3
Level 3 - City/Suburb     51.0    24.5    12.4      4.9
Level 4 - POI             29.7    11.7    8.1       1.8
No focus                  58.5    95.8    96.4      63.2
User location was most useful at the country level, but did not contribute much at other
levels of granularity.
POI was the hardest level, with only ~30% correctly identified.
All = 0.6 text + 0.3 hashtag + 0.1 user location
Accuracy (%) with location mentions extracted using NER

                      Level 1   Level 2   Level 3   Level 4   No Focus
PlaceFinder (a)       87.9      58.6      22.9      21.0      0.3
PlaceFinder (b)       87.8      59.1      23.5      18.8      25.5
Our Alg. No NER       89.9      73.5      51.0      29.7      58.5
Our Alg. With NER     91.3      65.7      47.0      24.9      53.4
(a) The whole tweet was queried; (b) the location mentions were queried (no NER).
Country-level focus was the easiest, with all settings performing similarly.
PlaceFinder was consistently worse at the other levels, but that could also be an effect of
our gazetteer hierarchy.
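The slides do not spell out how per-level accuracy was computed, so the following is only one plausible reading, stated as an assumption: a prediction counts as correct at level k if its first k components (country, state, city, POI) match the gold focus, evaluated over tweets whose gold focus is at least k levels deep. The function name and the example data are hypothetical.

```python
def accuracy_at_level(gold, predicted, level):
    """Fraction of tweets (gold depth >= level) whose first `level` components match."""
    relevant = [(g, p) for g, p in zip(gold, predicted) if len(g) >= level]
    if not relevant:
        return 0.0
    correct = sum(1 for g, p in relevant if tuple(p[:level]) == tuple(g[:level]))
    return correct / len(relevant)


# Made-up gold and predicted focus paths, written country-first.
gold = [
    ("australia", "new south wales", "north ryde", "macquarie centre"),
    ("australia", "queensland"),
    ("australia",),
]
pred = [
    ("australia", "new south wales", "sydney"),
    ("australia", "queensland"),
    ("new zealand",),
]
for level in range(1, 5):
    print(level, accuracy_at_level(gold, pred, level))
```

Under this reading a prediction can be right at the state level yet wrong at the city level, which matches the pattern of accuracy falling as granularity gets finer.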
The sources of errors in our algorithm
• Annotation mistakes: human annotators missed some of
the mentions.
• Some streets and POIs were missing from the gazetteer.
• Heavily misspelled place names that were not corrected
in our pre-processing step.
• We favoured lower granularities in our scoring, which
introduced wrong POIs that were not needed.
• Gazetteer bias: if one mention had many matches in a
region, the path could wrongly get stronger.
What we learnt and what’s next?
• Finding locational focus is difficult, even for humans (low
agreement in annotation)
• Our method was accurate (90%) at the country level, but accuracy
dropped at the state and city levels and was lowest at the POI level (about 30%).
• All three information sources (text, hashtag, and author location)
contribute to finding the focus, but at different levels.
• How to make it better?
• Incorporate some context, e.g., tweets that share hashtags, are replies,
or are temporally close
• Learning the weights of different information sources
Related Studies
• Twitterstand: geotagging content of tweets. Used GeoNames
gazetteer and heuristic rules to find and disambiguate the location
focus.
• Kinsella et al. [2011]: learning language models of locations
Pinpointing Location Focus in Microblogs

  • 1. Pinpointing Locational Focus in Microblogs Jie Yin, Sarvnaz Karimi, John Lingad November 2014 DIGITAL PRODUCTIVITY FLAGSHIP
  • 2. Where is it happening? For those monitoring social media to • send help in emergency • avoid certain area(e.g., for traffic) • recommend services (ads) CSIRO: positive impact | Presentation title | Presenter name2 |
  • 3. CSIRO: positive impact | Pinpointing Locational Focus in Tweets | Sarvnaz Karimi3 | Find it on the map!
  • 4. Locational focus CSIRO: positive impact | Pinpointing Locational Focus in Tweets | Sarvnaz Karimi4 | Locational focus: Macquarie Centre, North Ryde, New South Wales, Australia Location mentions: Sydney, Macquarie Centre Author Location: Brisbane, Australia
  • 5. Some tweets mention multiple locations: Not easy to identify the focus CSIRO: positive impact | Pinpointing Locational Focus in Tweets | Sarvnaz Karimi5 | Mary river, Queensland, Australia Mary river, Queensland, Australia
  • 6. Some tweets have no locational focus CSIRO: positive impact | Pinpointing Locational Focus in Tweets | Sarvnaz Karimi6 | There is an unknown location (Ambiguity) No specific focus (World Level?)
  • 7. To find locational focus, we have two tasks:
      1. Find mentions of locations
      2. Aggregate these to infer the main focus
  • 8. Finding location mentions
      1. Where to look for location mentions
          • Location mentions can be in the tweet text and/or in hashtags
          • Some hashtags are concatenated words or abbreviations, e.g., #QLDflood = QLD + flood
          • Tweet text may mention a geographical location, such as Sydney, or a Point-of-Interest (POI), such as an organisation's name or a shop
          • Authors' locations in their profiles (not exactly a location mention)
      2. How to find these mentions?
          • Hashtag segmentation
          • Named Entity Recognition
  • 9. Location mention extraction
      • Related work
          • NER tools for formal text, such as Stanford NER and OpenNLP, are highly accurate (solved problem)
          • NER specific to Twitter: TwiNER [Wang et al., 2012], TwitterNLP [Ritter et al., 2011]
          • Retrained NER tools for Twitter [Lingad et al., 2013]: Location and Organisation entities only
      • In this work
          • Segmented the hashtags using a simple greedy maximal matching heuristic, with an English dictionary augmented with place name abbreviations
          • Used the retrained Stanford NER, keeping LOC and ORG entities
  • 10. Inferring locational focus
      Given a list of location mentions, determine what the focus is. For example:
          • If the mentions are VIC, NSW, QLD, WA, then the focus is Australia
          • If the mentions are Swanston St, RMIT, then the focus is RMIT University, Melbourne, VIC, Australia
      • Requires knowledge of the geographical locations as well as POIs and their relationships/hierarchy
      • Gazetteer Australia 2010, GeoNames New Zealand, OpenStreetMap
  • 11. Gazetteer as a tree (levels, from most to least specific)
      • Specific POI
      • City/Suburb/Town/Non-Specific POI (e.g., river, highway)
      • State/Territory/Region
      • Country
  • 12. Inference algorithm: Where on the map?
      • Step 1: Query the gazetteer with the location mentions, and return the matching results (partial or exact match) as full paths in the gazetteer tree
      • Step 2: Create an inference tree using the returned paths
      • Step 3: Propagate the scores in the tree
      • Step 4: Find a maximum scoring path
      Goal: Find the lowest (most specific) granularity possible
      Assumption: More possible matches found within a geographical region indicate that this region on the map is more likely to be the focus
  • 13. Querying the gazetteer tree
      • Location mentions: Sydney, Macquarie Centre
      • Author location: Brisbane, Australia
      • Gazetteer querying returns:
          - brisbane, queensland, australia
          - south brisbane, queensland, australia
          - macquarie centre, north ryde, new south wales, australia
          - macquarie university, macquarie park, new south wales, australia
          - ...
      Each returned result gets a matching score based on the Jaccard similarity of the query and the matched node.
  • 14. Building the inference tree
      [Figure: inference tree built from the returned paths "brisbane, queensland, australia" and "macquarie centre, north ryde, new south wales, australia". Root: earth, then australia, which branches into queensland → brisbane (leaf score) and new south wales → north ryde → macquarie centre (leaf score).]
  • 15. Propagating scores to the parents and finding the maximal path
      • More branches within a sub-tree increase the chance of their parent being in the maximal path
      • Bottom-up scoring of parents from the leaves to the root
      • Parent score = current score + 0.5 * score of the highest-scoring child
      • Top-down selection of the maximal path, with entropy as the termination condition: if the entropy of the children's scores is higher than a pre-defined threshold, the algorithm stops at that level
  • 16. [Figure: score propagation example on the inference tree for the paths "brisbane, queensland, australia", "macquarie centre, north ryde, new south wales, australia" and "sydney, new south wales, australia". Nodes shown: earth, australia, queensland, brisbane, new south wales, north ryde, macquarie centre, macquarie university, sydney. Leaf scores are labelled A, B, C and D; the parent annotations shown are 0.5*A, 0.5*Max(B,C) and 0.5*Max(0.5*B,D).]
      Leaf score = w * 2^level * Jaccard similarity
  • 17. Dataset & annotation
      • Queried Twitter with keywords such as fire, earthquake, storm, hurricane
      • Randomly sampled 7,000 tweets
      • Two annotation steps:
          1. Identify location mentions
          2. Identify locational focus (based on tweet and author location)
      • Three annotators per tweet; only tweets with majority agreement (2 out of 3) were kept in the final set
      • Tweets whose focus was not within Australia or New Zealand were removed
      • A small set of tweets for which annotators marked the focus as impossible to detect was also removed
      • Final set: 1,398 tweets (80 kept for parameter tuning)
  • 18. Baseline: Yahoo! PlaceFinder*
      • A service that accepted queries and returned a list of matching places in the form of country, state, city, POI
      • A query to the service was similar to a database query:
          SELECT * FROM geo.placefinder WHERE text = query text
      We chose the query text to be:
          (a) the tweet (text & hashtags) and the user location from their profile
          (b) the list of location mentions from one tweet (human annotated)
      * As it was called in Jan 2014
  • 19. Accuracy (%) with manual location mentions (without NER)

                              All    Text   Hashtag   User location
      Level 1 - Country       89.9   35.3   45.2      71.6
      Level 2 - State         73.5   29.3   37.4      36.3
      Level 3 - City/Suburb   51.0   24.5   12.4       4.9
      Level 4 - POI           29.7   11.7    8.1       1.8
      No focus                58.5   95.8   96.4      63.2

      All = 0.6 text + 0.3 hashtag + 0.1 user location
      User location was most useful at the country level, but did not contribute much at the other levels of granularity. POI was the hardest level, with only ~30% correctly identified.
  • 20. Accuracy (%) with location mentions extracted using NER

                            Level 1   Level 2   Level 3   Level 4   No focus
      PlaceFinder (a)       87.9      58.6      22.9      21.0       0.3
      PlaceFinder (b)       87.8      59.1      23.5      18.8      25.5
      Our alg., no NER      89.9      73.5      51.0      29.7      58.5
      Our alg., with NER    91.3      65.7      47.0      24.9      53.4

      (a) The whole tweet was queried; (b) the location mentions were queried (no NER)
      Country-level focus was the easiest, with all settings performing similarly. PlaceFinder was consistently worse at the other levels, but that could also be an effect of our gazetteer hierarchy.
  • 21. The sources of errors in our algorithm
      • Annotation mistakes: human annotators missed some of the mentions
      • Some streets and POIs were missing from the gazetteer
      • Heavily misspelled place names were not corrected in our pre-processing step
      • We favoured lower granularities in our scoring, which introduced wrong POIs that were not needed
      • Gazetteer bias: if one mention had many matches in a region, the path could wrongly get stronger
  • 22. What we learnt and what's next?
      • Finding locational focus is difficult, even for humans (low agreement in annotation)
      • Our method was accurate (90%) at the country level, but accuracy dropped at the state, city, and POI levels (to 29% for POIs)
      • All three information sources (text, hashtag, and author location) contribute to finding the focus, but at different levels
      • How to make it better?
          • Incorporate some context, e.g., tweets that share hashtags, are replies, or are temporally close
          • Learn the weights of the different information sources
  • 23. Related studies
      • Twitterstand: geotagging the content of tweets. Used the GeoNames gazetteer and heuristic rules to find and disambiguate the location focus
      • Kinsella et al. [2011]: learning language models of locations
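
Slide 9 mentions a simple greedy maximal matching heuristic for splitting hashtags against an English dictionary augmented with place name abbreviations. The slides do not give the implementation, so the following is only a minimal sketch of one common form of that heuristic (longest match first, left to right). The function name `segment_hashtag`, the tiny `VOCAB` set, and the single-character fallback for unknown text are all assumptions made for illustration.

```python
def segment_hashtag(hashtag, vocabulary):
    """Greedy longest-match-first segmentation of a hashtag (illustrative sketch)."""
    text = hashtag.lstrip("#").lower()
    max_len = max((len(word) for word in vocabulary), default=0)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in vocabulary:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Assumption: when nothing matches, emit one character and move on.
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical vocabulary: a few English words plus place name abbreviations.
VOCAB = {"qld", "nsw", "vic", "flood", "fire", "bush", "sydney"}

if __name__ == "__main__":
    print(segment_hashtag("#QLDflood", VOCAB))  # ['qld', 'flood']
```

A greedy matcher like this never backtracks, so an early long match can block a better split later in the hashtag; that is the usual trade-off for its simplicity.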

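Slides 12 to 16 describe the inference step: leaves are scored as w * 2^level * Jaccard similarity, parents receive their current score plus 0.5 times the score of their best child, and a top-down walk stops once the entropy of the children's scores exceeds a threshold. The sketch below is one possible reading of those slides, not the authors' code. Several details are assumptions: Jaccard similarity is computed over whitespace tokens, `level` counts from 1 at the country (matching the level numbering on slide 19), entropy is taken over normalised child scores in bits, and the thresholds are arbitrary. The weights 0.6, 0.3 and 0.1 come from slide 19; treating "Sydney" as a hashtag mention is hypothetical.

```python
import math


def jaccard(a, b):
    """Token-level Jaccard similarity between two place-name strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


class Node:
    def __init__(self, name):
        self.name = name
        self.score = 0.0
        self.children = {}


def add_path(root, path, mention, w):
    """Insert a gazetteer path (most specific part first) and score its leaf."""
    names = [part.strip() for part in path.split(",")][::-1]  # country first
    node = root
    for name in names:
        node = node.children.setdefault(name, Node(name))
    level = len(names)  # assumption: country = 1, deeper nodes get higher levels
    node.score += w * 2 ** level * jaccard(mention, node.name)


def propagate(node):
    """Bottom-up pass: parent score = current score + 0.5 * best child score."""
    if node.children:
        node.score += 0.5 * max(propagate(c) for c in node.children.values())
    return node.score


def entropy(scores):
    """Shannon entropy (bits) of the normalised scores."""
    total = sum(scores)
    if total <= 0:
        return 0.0
    return -sum(s / total * math.log2(s / total) for s in scores if s > 0)


def best_path(root, threshold):
    """Top-down pass: follow the best child until the children's entropy is too high."""
    path, node = [], root
    while node.children:
        kids = list(node.children.values())
        if entropy([k.score for k in kids]) > threshold:
            break
        node = max(kids, key=lambda k: k.score)
        path.append(node.name)
    return path


root = Node("earth")
add_path(root, "macquarie centre, north ryde, new south wales, australia",
         "Macquarie Centre", w=0.6)  # text mention
add_path(root, "sydney, new south wales, australia", "Sydney", w=0.3)  # hypothetical hashtag
add_path(root, "brisbane, queensland, australia", "Brisbane", w=0.1)  # author location
propagate(root)

if __name__ == "__main__":
    print(best_path(root, threshold=0.95))
    # ['australia', 'new south wales', 'north ryde', 'macquarie centre']
```

With a stricter threshold of 0.9 in this toy example, the walk stops at new south wales, because the scores of north ryde and sydney are too close to separate. That is the behaviour slide 15 describes for tweets whose mentions do not single out one place.
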
Editor's Notes

  1. Given a Twitter message, determine where on the map the content refers to. We are not interested in the location of the author.
  2. No location mention: from the content we cannot identify where the author is referring to (unless other background information, such as other tweets, helps). Multiple location mentions, all having similar importance.