Exploiting Text and Network Context for Geolocation of Social Media Users
Afshin Rahimi,♥ Duy Vu,♠ Trevor Cohn,♥ and Timothy Baldwin♥
♥ Department of Computing and Information Systems, ♠ Department of Mathematics and Statistics, The University of Melbourne
OVERVIEW
Task: Find the location of Twitter users based on text and network information.
Shortcoming of previous work: No comparison of text-based and
network-based models, and no model that uses both.
Datasets: 3 Twitter geolocation datasets:
GeoText, Twitter-US, Twitter-World.
Sample Format: userid, text, mention-list, latitude/longitude
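A record in this format could be modelled as below. This is a minimal sketch: the class name Sample, the field names, the use of None for users without a known location, and the helper mention_edges are all assumptions made for illustration, not the datasets' actual schema.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Sample:
    """One user record: userid, text, mention-list, latitude/longitude."""
    userid: str
    text: str
    mentions: List[str]
    # (latitude, longitude); None is an assumed marker for "location unknown".
    location: Optional[Tuple[float, float]]


def mention_edges(samples):
    """Return the undirected @-mention edges as a sorted list of pairs."""
    edges = {
        tuple(sorted((s.userid, m)))
        for s in samples
        for m in s.mentions
        if m != s.userid  # ignore self-mentions
    }
    return sorted(edges)


samples = [
    Sample("alice", "hi @bob", ["bob"], (40.7, -74.0)),
    Sample("bob", "yo @alice @carol", ["alice", "carol"], None),
]
edges = mention_edges(samples)
```

Treating mentions as undirected edges is a design choice here; a directed or weighted network would need a different edge representation.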
YOU ARE WHERE YOUR WORDS SAY YOU ARE
Usage of mountain in U.S.
TEXT-BASED MODEL (LR)
Logistic regression with l1 regularisation
over k-d tree discretisation of latitude/longitude.
[Figure: plot with axes Longitude (130 to 60) and Latitude (25 to 50)]
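The discretisation step can be sketched as follows. This is an illustration under assumptions, not the authors' code: the bucket size max_size, the alternating latitude/longitude split order, and the use of the per-cell median as a cell centre are all hypothetical choices. The resulting cell indices would serve as class labels for the logistic regression, which is not shown.

```python
from statistics import median


def kd_cells(points, max_size, depth=0):
    """Split (lat, lon) points into k-d tree leaf cells of at most max_size points."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    if len(points) <= max_size:
        return [list(points)]
    axis = depth % 2  # 0 splits on latitude, 1 on longitude
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2  # median split keeps the two halves balanced
    return (kd_cells(ordered[:mid], max_size, depth + 1)
            + kd_cells(ordered[mid:], max_size, depth + 1))


def cell_centre(cell):
    """Represent a cell by the coordinate-wise median of its points (an assumption)."""
    return (median(p[0] for p in cell), median(p[1] for p in cell))


points = [(40.7, -74.0), (40.8, -73.9), (34.0, -118.2),
          (34.1, -118.3), (39.7, -105.0), (39.8, -104.9)]
cells = kd_cells(points, max_size=2)
# Class label of each training point = index of the cell that contains it.
labels = {p: i for i, cell in enumerate(cells) for p in cell}
```

Because each split is at the median, densely populated areas end up with small cells and sparse areas with large ones, so every class has a comparable number of training users.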
YOU ARE WHERE YOUR FRIENDS ARE
Most of our online interactions are local.
Twitter mention
NETWORK-BASED MODEL (LP)
Label Propagation in @-mention Network:
• Build an @-mention network.
• Initialise the location of training nodes with their
known location.
• Iteratively update non-training nodes’ location
to the median of their neighbours.
• Converges after 10 iterations.
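The steps above can be sketched in a few lines. This is a minimal sketch, assuming an undirected network, synchronous updates, and a coordinate-wise median over latitude and longitude; the function name propagate and its parameters are hypothetical.

```python
from statistics import median


def propagate(edges, known, iterations=10):
    """Label propagation: unlabeled nodes move to the median of their neighbours."""
    neighbours = {}
    for a, b in edges:  # build the undirected @-mention network
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    loc = dict(known)  # training nodes start at their known location
    for _ in range(iterations):
        updated = dict(loc)
        for node, nbrs in neighbours.items():
            if node in known:
                continue  # training nodes are never updated
            located = [loc[n] for n in nbrs if n in loc]
            if located:
                updated[node] = (median(p[0] for p in located),
                                 median(p[1] for p in located))
        loc = updated
    return loc


edges = [("a", "u"), ("b", "u"), ("c", "u"), ("u", "v")]
known = {"a": (40.0, -75.0), "b": (41.0, -74.0), "c": (42.0, -73.0)}
result = propagate(edges, known)
```

Here u is located in the first iteration from its three training neighbours, and v, which only touches u, is located one iteration later. A node that appears in no edge never receives a location under this scheme.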
NETWORK VERSUS TEXT
• For connected users, Network-based models
are more accurate.
• For disconnected users (about 20% of the
nodes), Text-based models are more accurate.
• Solution: Utilise both text and network information together!
LABEL PROPAGATION OVER TEXT PREDICTIONS
• Initialise training nodes with their known location
and test nodes with their text-based prediction.
• Iteratively update the location of non-training
nodes to the median of their neighbours.
• Converges after 10 iterations.
• Isolated test nodes will keep their text-based
prediction.
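A sketch of this hybrid variant follows, under the same assumptions as any simple median-based propagation (undirected edges, synchronous updates, coordinate-wise median). The names propagate_with_priors, train_loc and text_pred are hypothetical, and the text-based predictions are taken as given rather than computed.

```python
from statistics import median


def propagate_with_priors(edges, train_loc, text_pred, iterations=10):
    """Label propagation where test nodes start from their text-based prediction."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    # Known training locations take precedence over text-based predictions.
    loc = {**text_pred, **train_loc}
    for _ in range(iterations):
        updated = dict(loc)
        for node in text_pred:
            if node in train_loc:
                continue
            located = [loc[n] for n in neighbours.get(node, []) if n in loc]
            if located:  # isolated nodes have no neighbours and are left alone
                updated[node] = (median(p[0] for p in located),
                                 median(p[1] for p in located))
        loc = updated
    return loc


train_loc = {"a": (39.7, -105.0), "b": (39.8, -104.9), "c": (39.9, -104.8)}
text_pred = {"u": (34.0, -118.2), "w": (47.6, -122.3)}
edges = [("a", "u"), ("b", "u"), ("c", "u")]
result = propagate_with_priors(edges, train_loc, text_pred)
```

In this toy run, u is pulled from its text-based guess to the median of its three training neighbours, while the isolated node w keeps its text-based prediction.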
DENVER’S TOP FEATURES
RESULTS
State-of-the-art results over all three datasets!
[Figure: bar chart of median error in km (axis 0 to 600) on GeoText, Twitter-US and Twitter-World, comparing Text-based Method (LR), Network-based Method (LP), Hybrid Method (LP-LR), Wing and Baldridge (2014), and Ahmed et al. (2013)]
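The median error in km can be computed along the following lines. This is a sketch under assumptions: the poster does not say how distance is measured, so great-circle (haversine) distance on a sphere of radius 6371 km is assumed here, and the function names are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def median_error_km(predicted, gold):
    """Median distance between predicted and gold locations over all gold users."""
    return median(haversine_km(predicted[u], gold[u]) for u in gold)


gold = {"u1": (0.0, 0.0), "u2": (0.0, 0.0), "u3": (0.0, 0.0)}
predicted = {"u1": (0.0, 0.0), "u2": (1.0, 0.0), "u3": (2.0, 0.0)}
error = median_error_km(predicted, gold)  # one degree of latitude, about 111 km
```

The median is less sensitive than the mean to a few users who are predicted on the wrong continent, which is one reason it suits this task.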
