An exploratory visual analytics approach was used to identify temporal distributions, spatial clusters and popular routes of tourists in Amsterdam, using geotagged photos from the social media platform Flickr. The presented methods combine the analytical strength of humans with the data processing power of computers, using geovisualisations and charts to explore the data, find patterns and draw conclusions. For this research, the metadata of 2,849,261 geotagged photos was harvested from Flickr and stored in a spatial database; 393,828 of these photos were located in the municipality of Amsterdam. A semi-automatic classification method classified 39.1% of the users as tourists with very high precision and recall. The temporal distributions of tourists and locals are compared at different temporal granularities, and a method is presented to assess the accuracy of photo timestamps by using photos that show a real clock. An existing grid-based clustering method was implemented and improved to explore the spatial distribution of tourists in Amsterdam in Google Earth, and the major tourist hotspots were detected with the density-based clustering algorithm DBSCAN. Finally, the most probable routes of tourists between subsequent photo locations were estimated and aggregated into a route density map. The study outcomes were validated qualitatively by interviewing eight tourism experts of the municipality of Amsterdam. Their knowledge of the city closely matches the detected spatial clusters and route density map of tourists. Despite several imperfections of geosocial data, we conclude that the methods provide meaningful insight into the spatial and temporal patterns of tourists in urban spaces and are a valuable addition to traditional tourism surveys.
3. GROWING CONCERNS ABOUT TOURISM
A SELECTION OF RECENT NEWS ARTICLES
• They are puking and peeing on the Zeedijk (NOS, December 5, 2014)
• Is Amsterdam becoming a second Venice? (De Morgen, March 27, 2015)
• The center of Amsterdam should not become too popular (Volkskrant, October 25, 2014)
• Amsterdam taken over by tourists (RTL, April 3, 2015)
• Amsterdam will welcome twice as many tourists in 2030 (Het Parool, December 9, 2014)
4. INITIAL RESEARCH TOPIC
WAGENINGEN UNIVERSITY AND AMS
Explore the possibilities of using (geo)tweets to detect spatial and temporal patterns of tourists in Amsterdam
But why Twitter? How about Flickr?
Criterion | Twitter | Flickr
Number of users | ++ | +/-
Amount of data | ++ | +
Connection of data to real location | +/- | ++
Use by tourists | +/- | ++
Interval between subsequent posts | +/- | ++
5. RESEARCH PROJECT
OBJECTIVE
The objective of this exploratory research project is to develop, implement and test methods that reveal spatial and temporal patterns of tourists from a large dataset of geotagged Flickr photos.
RESEARCH QUESTIONS
RQ-01: What methods are available to detect spatial and temporal patterns from geosocial data?
RQ-02: What methods need to be implemented to identify temporal distributions, spatial clusters and popular routes of tourists from the metadata of Flickr photos?
RQ-03: How well do the identified temporal distributions, spatial clusters and popular routes resemble the spatial and temporal behaviour of tourists?
7. FLICKR DATA COLLECTION
OVERVIEW OF STEPS & TECHNIQUES
Java application → request → Flickr database (API) → XML file with metadata → Java application → local database (PostgreSQL)
Restriction: 1 request per second
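The slide notes a limit of one request per second. The following is a minimal sketch of a throttled harvesting loop, assuming each request is wrapped in a generic `fetch` callable; `RateLimiter` and `harvest` are hypothetical names, and this is not the study's Java application.

```python
import time
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class RateLimiter:
    """Blocks so that consecutive wait() calls return at least `interval` seconds apart."""

    def __init__(self, interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = interval
        self.clock = clock  # injectable so the throttle can be tested without real waiting
        self.sleep = sleep
        self._last: Optional[float] = None  # moment the previous request was released

    def wait(self) -> None:
        now = self.clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now


def harvest(items: Iterable[T], fetch: Callable[[T], R],
            limiter: RateLimiter) -> List[R]:
    """Call `fetch` once per item (e.g. per photo ID), throttled by `limiter`."""
    results: List[R] = []
    for item in items:
        limiter.wait()
        results.append(fetch(item))
    return results
```

At one request per second, 2,849,261 metadata requests alone take about 33 days (2,849,261 / 86,400 ≈ 33), which is consistent with the roughly five weeks of harvesting reported for Step 2.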
8. FLICKR DATA COLLECTION
STEP 1: HARVESTING PHOTO IDS WITHIN BOUNDING BOXES (1,550)
Search parameters:
• Xmin, Xmax, Ymin, Ymax
• Min date: January 1, 2005
• Max date: December 31, 2014
Search result:
• Photo ID
• User ID
• Photo title
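Step 1 can be sketched as follows, assuming the study area is split into a regular grid of boxes and each box is queried with Flickr's `flickr.photos.search` method and its `bbox` parameter (min_lon,min_lat,max_lon,max_lat). The slide does not say whether the dates refer to the taken or the upload date; the sketch assumes `min_taken_date`/`max_taken_date` and an inclusive end of December 31. `tile_bbox` and `search_params` are hypothetical helpers; paging logic, the `api_key` and the HTTP call are omitted.

```python
from typing import Dict, List, Tuple

BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def tile_bbox(extent: BBox, cols: int, rows: int) -> List[BBox]:
    """Split an extent into cols x rows equally sized boxes, row by row from the south."""
    min_lon, min_lat, max_lon, max_lat = extent
    dx = (max_lon - min_lon) / cols
    dy = (max_lat - min_lat) / rows
    boxes = []
    for r in range(rows):
        for c in range(cols):
            boxes.append((min_lon + c * dx, min_lat + r * dy,
                          min_lon + (c + 1) * dx, min_lat + (r + 1) * dy))
    return boxes


def search_params(box: BBox, page: int = 1) -> Dict[str, str]:
    """Query parameters for one flickr.photos.search request (api_key omitted)."""
    return {
        "method": "flickr.photos.search",
        "bbox": ",".join(f"{v:.6f}" for v in box),
        "min_taken_date": "2005-01-01 00:00:00",  # assumption: filter on date taken
        "max_taken_date": "2014-12-31 23:59:59",
        "page": str(page),
    }
```

Each search result then yields the photo ID, user ID and title listed above; the grid size that produced 1,550 boxes is not reconstructed here.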
9. FLICKR DATA COLLECTION
STEP 2: REQUESTING ADDITIONAL METADATA
Search parameters:
• Photo ID
Search result:
• Latitude, longitude
• Date and time
• User name
• User home location
• Tags
• Photo URL
• Location accuracy
2,849,261 photos
About 5 weeks of harvesting
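Step 2 turns one XML response per photo into a database record. A minimal sketch, assuming a response laid out like Flickr's `flickr.photos.getInfo` output (an `owner` element with `username` and `location`, `dates` with `taken`, `tags/tag`, `location` with `latitude`, `longitude` and `accuracy`, and `urls/url`); the sample values are made up, and `PhotoMetadata` and `parse_photo_info` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List

# Illustrative response; element names assumed from the getInfo format, values invented.
SAMPLE_RESPONSE = """
<rsp stat="ok">
  <photo id="1234567890">
    <owner nsid="00000000@N00" username="example_user" location="Tokyo, Japan" />
    <dates taken="2013-07-14 15:02:11" />
    <tags><tag raw="Amsterdam">amsterdam</tag><tag raw="canal">canal</tag></tags>
    <location latitude="52.373" longitude="4.892" accuracy="16" />
    <urls><url type="photopage">https://www.flickr.com/photos/example_user/1234567890/</url></urls>
  </photo>
</rsp>
"""


@dataclass
class PhotoMetadata:
    photo_id: str
    latitude: float
    longitude: float
    accuracy: int
    taken: str
    username: str
    home_location: str
    tags: List[str]
    url: str


def parse_photo_info(xml_text: str) -> PhotoMetadata:
    """Extract the Step 2 fields from one getInfo-style response."""
    photo = ET.fromstring(xml_text.strip()).find("photo")
    if photo is None or photo.find("location") is None:
        raise ValueError("response has no geotagged <photo> element")
    owner = photo.find("owner")
    loc = photo.find("location")
    url = photo.find("urls/url[@type='photopage']")
    return PhotoMetadata(
        photo_id=photo.get("id", ""),
        latitude=float(loc.get("latitude", "nan")),
        longitude=float(loc.get("longitude", "nan")),
        accuracy=int(loc.get("accuracy", "0")),
        taken=photo.find("dates").get("taken", ""),
        username=owner.get("username", ""),
        home_location=owner.get("location", ""),  # input for the tourist classification
        tags=[t.text or "" for t in photo.findall("tags/tag")],
        url=url.text if url is not None and url.text else "",
    )
```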
10. FLICKR DATA COLLECTION
STEP 2: REQUESTING ADDITIONAL METADATA
484,346 photos
14. TOURIST CLASSIFICATION
1. Classification of user location by SQL
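-- Tag users whose profile location names Japan; \y is a word boundary in PostgreSQL regular expressions
UPDATE users
SET countryname = 'Japan', istourist = 'True', classification = 'SQL'
WHERE geoname = '' AND userid IN
(SELECT userid FROM users WHERE userlocation ~* '\y(japan|nippon|日本)\y');
(8,628 users, 54%)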
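The same whole-word rule can be sketched in Python, where `\b` plays the role of PostgreSQL's `\y`. Only the Japan keywords come from the query above; the Germany entry and the names `COUNTRY_PATTERNS` and `classify_by_keyword` are hypothetical. The test cases show why the boundaries matter: without them, "Japanese" would be read as a Japanese home location.

```python
import re
from typing import Dict, Optional

# Keyword rules per country; the Germany rule is a hypothetical extension.
COUNTRY_PATTERNS: Dict[str, re.Pattern] = {
    "Japan": re.compile(r"\b(japan|nippon|日本)\b", re.IGNORECASE),
    "Germany": re.compile(r"\b(germany|deutschland)\b", re.IGNORECASE),
}


def classify_by_keyword(user_location: str) -> Optional[str]:
    """Return the first country whose keyword occurs as a whole word, else None."""
    for country, pattern in COUNTRY_PATTERNS.items():
        if pattern.search(user_location):
            return country
    return None
```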
SQL AND ONLINE GEOCODING
2. Classification of user location by online geocoding
Java application → GeoNames API (external database) → PostgreSQL (local database)
Example: user location "Tokyo" → Tokyo lies in Japan → country = Japan
(450 users, 3%)
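The two steps form a cascade: keyword rules first, online geocoding only for locations the rules cannot resolve. A minimal sketch, assuming both steps are passed in as lookup functions; `geocode` stands in for a GeoNames request, and `classify_user` and the label "Geocoding" are hypothetical.

```python
from typing import Callable, Optional, Tuple

# A lookup maps a free-text place name to a country name, or None when unknown.
Lookup = Callable[[str], Optional[str]]


def classify_user(user_location: str, keyword_rule: Lookup,
                  geocode: Lookup) -> Tuple[Optional[str], Optional[str]]:
    """Return (country, method): keyword rules first, online geocoding as fallback."""
    location = user_location.strip()
    if not location:
        return None, None  # no home location in the profile: stays unclassified
    country = keyword_rule(location)
    if country is not None:
        return country, "SQL"
    country = geocode(location)  # e.g. a cached call to an online gazetteer
    if country is not None:
        return country, "Geocoding"
    return None, None
```

Running the cheap keyword rules first keeps the number of external geocoding requests small, which matters when the external service is rate limited.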
16. TOURIST CLASSIFICATION
NUMBER OF UNIQUE PHOTOS
Figure: number of unique photos of locals, tourists and unclassified users (bar values 132,213, 107,016 and 154,599; shares 39.3%, 27.2% and 33.6%).
Overall accuracy = 99%
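Overall accuracy, precision and recall all follow from a confusion matrix built on a manually checked sample of users. The sketch below uses hypothetical counts; the study's own confusion matrix is not given on the slides, and `ConfusionCounts` and `classification_metrics` are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ConfusionCounts:
    tp: int  # tourists correctly classified as tourist
    fp: int  # locals wrongly classified as tourist
    fn: int  # tourists wrongly classified as local
    tn: int  # locals correctly classified as local


def _ratio(num: int, den: int) -> float:
    return num / den if den else 0.0


def classification_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Overall accuracy, plus precision and recall for the 'tourist' class."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "precision": _ratio(c.tp, c.tp + c.fp),
        "recall": _ratio(c.tp, c.tp + c.fn),
    }


# Hypothetical sample of 100 checked users: one local mislabelled as tourist.
example = classification_metrics(ConfusionCounts(tp=45, fp=1, fn=0, tn=54))
```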
17. CLASSIFICATION RESULTS AMSTERDAM
RELATIVE AMOUNT OF TOURISTS PER NATIONALITY (2013)
Figure: share of tourists per nationality (United States, United Kingdom, Germany, Italy, Spain, France; axis 0 to 20%), comparing Flickr nationalities (2013) with CBS hotel guest nationalities (2013).
19. TEMPORAL DISTRIBUTIONS
RELATIVE NUMBER OF TOURISTS AND PHOTOS PER HOUR (2005-2014)
Figure: relative number of tourists and tourist photos per hour of the day (1:00 to 0:00; axis 0 to 10%). Annotation: many daytime photos.
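Photo counts and photographer counts can differ because one prolific photographer produces many photos within a single hour. A sketch that computes both hourly distributions from (user ID, timestamp) records, assuming a tourist is counted once per hour of the day over the whole period; the study's exact counting rule may differ, and `hourly_shares` is a hypothetical name.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Dict, Iterable, Set, Tuple


def hourly_shares(photos: Iterable[Tuple[str, datetime]]
                  ) -> Tuple[Dict[int, float], Dict[int, float]]:
    """Return (share of photos, share of photographers) per hour of day, each summing to 1."""
    photo_counts: Counter = Counter()
    users_per_hour: Dict[int, Set[str]] = defaultdict(set)
    for user_id, taken in photos:
        photo_counts[taken.hour] += 1
        users_per_hour[taken.hour].add(user_id)  # each user counted once per hour
    total_photos = sum(photo_counts.values())
    total_users = sum(len(users) for users in users_per_hour.values())
    if total_photos == 0:
        return {}, {}
    photo_share = {h: photo_counts[h] / total_photos for h in range(24)}
    user_share = {h: len(users_per_hour[h]) / total_users for h in range(24)}
    return photo_share, user_share
```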
20. TEMPORAL DISTRIBUTIONS
RELATIVE NUMBER OF TOURISTS AND LOCALS PER HOUR (2005-2014)
Figure: relative number of tourists and locals per hour of the day (1:00 to 0:00; axis 0 to 10%). Annotations: maximums shifted; relatively more tourist photos at night; more local photos in the evening.
21. TIMESTAMP VALIDATION
TIME DIFFERENCE BETWEEN PHOTO TIMESTAMP AND REAL TIME
Figure: example clock photos, one whose timestamp is an exact match and one whose timestamp is 2 hours off.
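The offset between a photo timestamp and the clock in the photo can be computed as below, assuming an analogue 12-hour clock face: the difference is then only known modulo 12 hours, so it is wrapped into the range of -6 to +6 hours. The clock reading itself is done by eye; `clock_offset_minutes` is a hypothetical helper and seconds are ignored.

```python
from datetime import datetime


def clock_offset_minutes(timestamp: datetime, clock_hour: int, clock_minute: int) -> int:
    """Minutes the photo timestamp runs ahead (+) or behind (-) the pictured clock.

    Assumes a 12-hour clock face, so the result is wrapped into [-360, 360).
    """
    stamp = (timestamp.hour % 12) * 60 + timestamp.minute
    shown = (clock_hour % 12) * 60 + clock_minute
    diff = (stamp - shown) % 720  # 0 .. 719 minutes
    return diff - 720 if diff >= 360 else diff
```

For example, a timestamp of 16:10 on a photo of a clock showing 2:10 gives +120 minutes, the "2 hours off" case, while 14:05 against 2:05 is an exact match.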
22. TIMESTAMP VALIDATION
TIME DIFFERENCE BETWEEN PHOTO TIMESTAMP AND REAL TIME
Selection
• all photos tagged with ‘clock’
• all photos near Central Station
Candidates: 1,032 photos of locals and 1,134 photos of tourists
Result
• 70 suitable photos of tourists
• 50 suitable photos of locals
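The candidate selection can be sketched as a union of the two criteria. Assumptions: the criteria are combined with OR, "near" means within a hypothetical 150 m radius, and the Central Station coordinates are approximate; `Photo`, `haversine_m` and `clock_candidates` are illustrative names.

```python
import math
from typing import Iterable, List, NamedTuple, Tuple

# Approximate WGS84 position of Amsterdam Centraal (assumption).
CENTRAL_STATION: Tuple[float, float] = (52.3789, 4.9003)


class Photo(NamedTuple):
    photo_id: str
    lat: float
    lon: float
    tags: Tuple[str, ...]


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres on a sphere with radius 6,371 km."""
    r = 6_371_000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def clock_candidates(photos: Iterable[Photo], radius_m: float = 150.0) -> List[Photo]:
    """Photos tagged 'clock' (any case) or taken within radius_m of Central Station."""
    lat0, lon0 = CENTRAL_STATION
    return [
        p for p in photos
        if any(t.lower() == "clock" for t in p.tags)
        or haversine_m(p.lat, p.lon, lat0, lon0) <= radius_m
    ]
```

The candidates still need a manual check, which is what reduces the selection to the suitable photos counted above.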
24. TEMPORAL DISTRIBUTIONS
PHOTOGRAPHERS PER DAY OF THE WEEK (2005-2014)
Figure: relative number of tourist and local photographers per day of the week (Monday to Sunday; axis 0 to 20%).
25. TEMPORAL DISTRIBUTIONS
PHOTOGRAPHERS PER MONTH (2005-2014)
Figure: relative number of tourist and local photographers per month (January to December; axis 0 to 12%).
26. TEMPORAL DISTRIBUTIONS
TOURISTS AND FOREIGN HOTEL GUESTS PER MONTH (2012+2013)
Figure: relative number of tourists (Flickr, 2012 + 2013) and foreign hotel guests (CBS, 2012 + 2013) per month (January to December; axis 0 to 12%).
32. SPATIAL DISTRIBUTION
DENSITY-BASED CLUSTERING
DBSCAN: Density-Based Spatial Clustering of Applications with Noise
• Detects clusters of different shapes and sizes
• Robust to noise, which makes it well suited to geosocial data
• Eps: radius of the neighbourhood search area
• MinPts: minimum number of points in the neighbourhood
Figure: example with MinPts = 4, showing the Eps radius and noise points.
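A compact, textbook version of DBSCAN shows how Eps and MinPts interact. It is a sketch, not the implementation used in the study: it assumes planar coordinates (e.g. photo locations projected to metres, so Eps is a distance in metres), counts a point as its own neighbour, and uses an O(n²) neighbour search.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
UNVISITED = -2
NOISE = -1


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of point i (including i itself)."""
    xi, yi = points[i]
    return [j for j, (x, y) in enumerate(points) if math.hypot(x - xi, y - yi) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later become a border point of some cluster
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        seeds = list(neighbours)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
                continue
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                seeds.extend(j_neighbours)  # j is also a core point: keep growing
    return labels
```

For large photo sets, scikit-learn's `sklearn.cluster.DBSCAN` (parameters `eps` and `min_samples`, where `min_samples` also counts the point itself) offers the same algorithm with spatial indexing.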
45. TOURISTIC ROUTES
STEP 2: REDUCE TRAVEL COST PER ROAD SEGMENT BASED ON PHOTO DENSITY
Figure: example road network with a travel cost value per segment.
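The slides do not give the weighting formula. A hypothetical weighting that captures the idea: a segment's cost is its length divided by 1 + alpha times its photo density relative to the densest segment, so segments with many photos become cheaper and the router prefers them. `Segment`, `weighted_costs` and `alpha` are illustrative, and photo counts per segment are assumed to be precomputed (e.g. photos within a buffer around each segment).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    segment_id: str
    length_m: float   # assumed > 0
    photo_count: int  # photos near the segment, assumed precomputed


def weighted_costs(segments: List[Segment], alpha: float = 1.0) -> Dict[str, float]:
    """Cost = length / (1 + alpha * density / max density); never below length / (1 + alpha)."""
    densities = {s.segment_id: s.photo_count / s.length_m for s in segments}
    max_density = max(densities.values(), default=0.0)
    costs: Dict[str, float] = {}
    for s in segments:
        relative = densities[s.segment_id] / max_density if max_density > 0 else 0.0
        costs[s.segment_id] = s.length_m / (1.0 + alpha * relative)
    return costs
```

Normalising by the maximum density keeps every cost positive and bounded, which shortest-path algorithms such as Dijkstra's require.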
46. TOURISTIC ROUTES
STEP 3: CREATE PHOTO PAIRS FOR ROUTING
1. Create pairs of time-ordered photo locations per user (point A → point B, point B → point C, …)
2. Calculate distance, time interval and speed per photo pair
3. Select all photo pairs within thresholds:
• Distance > 50 m and < 750 m
• Time interval > 0 s and < 600 s
• Speed > 1 km/h and < 5 km/h
4. Calculate the closest network node for the start and end of every pair
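Steps 1 to 3 can be sketched as follows, with the slide's thresholds as defaults (all bounds exclusive) and haversine distances on a spherical Earth. Snapping to the closest network node (step 4) is left out; `Photo`, `haversine_m` and `routing_pairs` are hypothetical names.

```python
import math
from datetime import datetime
from itertools import groupby
from typing import Iterable, List, NamedTuple, Tuple


class Photo(NamedTuple):
    user_id: str
    taken: datetime
    lat: float
    lon: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres on a sphere with radius 6,371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000.0 * math.asin(math.sqrt(a))


def routing_pairs(photos: Iterable[Photo],
                  dist_m: Tuple[float, float] = (50.0, 750.0),
                  interval_s: Tuple[float, float] = (0.0, 600.0),
                  speed_kmh: Tuple[float, float] = (1.0, 5.0)) -> List[Tuple[Photo, Photo]]:
    """Time-ordered (A, B), (B, C), ... pairs per user that pass all three thresholds."""
    ordered = sorted(photos, key=lambda p: (p.user_id, p.taken))
    pairs: List[Tuple[Photo, Photo]] = []
    for _, group in groupby(ordered, key=lambda p: p.user_id):
        seq = list(group)
        for a, b in zip(seq, seq[1:]):
            t = (b.taken - a.taken).total_seconds()
            if not interval_s[0] < t < interval_s[1]:
                continue  # also excludes t == 0, so the speed below is well defined
            d = haversine_m(a.lat, a.lon, b.lat, b.lon)
            v = d / t * 3.6  # m/s to km/h
            if dist_m[0] < d < dist_m[1] and speed_kmh[0] < v < speed_kmh[1]:
                pairs.append((a, b))
    return pairs
```

The thresholds work together: 750 m in just under 600 s is about 4.5 km/h, so the speed bound mainly removes pairs that are too slow (the user lingered) or too fast (the user used transport).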
47. TOURISTIC ROUTES
STEP 4: CALCULATE ROUTES AND AGGREGATE INTO ROUTE DENSITY MAP
1. Calculate route for 6,477 photo pairs with pgRouting
2. Aggregate and count overlapping route segments
3. Visualize touristic route densities
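pgRouting offers several shortest-path functions, such as `pgr_dijkstra`; the slides do not state which one was used. The in-memory sketch below assumes Dijkstra's algorithm on a small undirected graph whose edge costs could come from the density weighting in Step 2, and counts how many routes use each segment. `shortest_path`, `route_density` and the example graph are hypothetical.

```python
import heapq
import math
from collections import Counter
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # node -> [(neighbour, cost), ...]


def shortest_path(graph: Graph, source: str, target: str) -> Optional[List[str]]:
    """Dijkstra's algorithm; returns the node sequence, or None if unreachable."""
    dist = {source: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for nxt, cost in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def route_density(graph: Graph, pairs: List[Tuple[str, str]]) -> Counter:
    """Count how many routes traverse each undirected segment."""
    counts: Counter = Counter()
    for a, b in pairs:
        path = shortest_path(graph, a, b)
        if path is None:
            continue
        for u, v in zip(path, path[1:]):
            counts[tuple(sorted((u, v)))] += 1
    return counts


_edges = [("A", "B", 1.0), ("B", "C", 1.0), ("A", "C", 3.0), ("C", "D", 1.0)]
example_graph: Graph = {}
for u, v, c in _edges:
    example_graph.setdefault(u, []).append((v, c))
    example_graph.setdefault(v, []).append((u, c))
```

The resulting counts per segment are what a route density map colours.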
48. TOURISTIC CLUSTERS AND ROUTES
VALIDATION OF RESULTS
Problem: No comparable quantitative data available
Solution: Expert judgement through a questionnaire
Participants: 8 tourism experts from different departments of the municipality of Amsterdam
49. TOURISTIC ROUTES
VALIDATION OF RESULTS BY 8 TOURISM EXPERTS
Figure: agreement between the detected touristic routes and the experts' knowledge in 12 map panels, with matches of 75%, 38%, 75%, 100%, 100%, 63%, 100%, 67%, 67%, 100%, 100% and 100%. Legend: with high confidence (5/5).
50. VALIDATION OF RESULTS
TOURISTIC CLUSTERS AND ROUTES
Expert | Profession | Validity of results* [1-5] | Usefulness of results** [1-5]
1 | Policy Advisor Traffic & Public Space | 4 | 5
2 | Data Analyst, Information and Statistics | 4 | 4
3 | Senior Advisor Traffic Management | 4 | 4
4 | Researcher, Information and Statistics | 3 | 4
5 | Senior Advisor Traffic Research | 5 | 4
6 | Urban Planner | 5 | 5
7 | Urban Planner | 4 | 5
8 | Urban Designer | 4 | 5
Average | | 4.1 | 4.5
* How well do the study outcomes resemble the real world?
** Are the study outcomes useful for you or for your organization?
51. SUGGESTIONS FOR FUTURE WORK
AND POTENTIAL THESIS TOPICS
• Calibrate thresholds with quantitative data
• Extensive validation of results in cooperation with tourism experts
• Cooperate with the municipality to define objectives; some suggestions:
  - Additional data sources: Instagram, Twitter, Sina Weibo
  - Split the spatial distributions by temporal interval
  - Compare the spatial distributions of locals and tourists
  - Split the spatial distributions by nationality
  - Use the presented patterns as input for an agent-based model
  - Discover typical tourism problems with other geosocial data types
52. THANK YOU FOR YOUR ATTENTION!
ANY QUESTIONS OR REMARKS?