The digital reflection of our cities is sharpening, and it tracks their evolution with a decreasing delay. This is happening thanks to the pervasive deployment of sensors, the wide adoption of smartphones, the use of (location-based) social networks and the availability of datasets about the urban environment. While data becomes more abundant every day, decision makers face the challenge of increasing their capability to create value from the analysis of this data. This keynote presents how advanced visual analytics, ontology-based data access and information flow processing methods can help make sense of Social Media Streams and Call Data Records from Mobile Network Operators during city-scale events. Real-world deployments demonstrate that these methods advance our ability to feel the pulse of our cities in order to deliver innovative services.
Listening to the pulse of our cities fusing Social Media Streams and Call Data Records
1. Listening to the pulse of our cities
fusing Social Media Streams and
Call Data Records
Emanuele Della Valle
emanuele.dellavalle@polimi.it
http://emanueledellavalle.org
18th International Conference on
Business Information Systems
24-26 June 2015, Poznań, Poland
2. http://emanueledellavalle.org - Emanuele Della Valle
Me
Assistant Professor at DEIB
Politecnico di Milano
Expert in semantic technologies
and stream computing
Inventor of stream reasoning:
an approach to master the
velocity and variety dimensions
of Big Data
15 years of experience in research
and innovation projects
Start-up founder: fluxedo.com
3. http://emanueledellavalle.org - Emanuele Della Valle
Acknowledgements
Politecnico di Milano
• DEIB
– What
- Scientific direction
- Semantic technologies
- Stream Processing
- Data science
– Who
- Emanuele Della Valle
- Marco Balduini
• Density Design Lab
– What
- Visual analytics
– Who
- Paolo Ciuccarelli
- Matteo Azzi
Telecom Italia
• SKIL Lab
– What
- Big Data technology
- Data Science
– Who
- Fabrizio Antonelli
- Roberto Larcher
Funding agency
5. http://emanueledellavalle.org - Emanuele Della Valle
The digital reflection of our cities is sharpening
[photo: http://hoglundassociates.com/Images/Cloud_Gate.jpg]
because the urban environment
is captured in open datasets
and streams of information flow
through our cities thanks to
• the pervasive deployment of sensors
• the wide adoption of smartphones
• the usage of (location-based) social networks
12. http://emanueledellavalle.org - Emanuele Della Valle
and it is tracking changes with a decreasing delay
Data source         By when         Frequency       Delay
Census data         100s of years   years           months
Newspaper           100s of years   days            1 day
Weather sensors     10s of years    hours/minutes   hours/minutes
TV news             10s of years    hours           minutes
Traffic sensors     years           15 minutes      minutes
Call Data Records   years           15 minutes      hours
Social media        years           seconds         seconds
IoT                 recently        milliseconds    milliseconds
14. http://emanueledellavalle.org - Emanuele Della Valle
But smarter Big Data can …
…advance our ability to feel the pulse of our cities
fusing all those
data sources
making sense of the
fused information
mayor
Definitely E!
to improve decision making and deliver innovative services
15. http://emanueledellavalle.org - Emanuele Della Valle
Let's focus on a concrete research question
Can we collect, analyse and repurpose
• social media and
• Call Data Records
to allow
• perceiving emerging patterns and
• observing their dynamics?
[photo: https://www.flickr.com/photos/debord/4932655275]
16. http://emanueledellavalle.org - Emanuele Della Valle
More precisely, the research question is
Can we collect, analyse and repurpose
• social media captured at places and events and
• privacy-preserving aggregates of Call Data Records
to allow visually
• perceiving emerging patterns and
• observing their dynamics?
[photo: https://www.flickr.com/photos/debord/4932655275]
17. http://emanueledellavalle.org - Emanuele Della Valle
How to set up an experiment?
[photo: https://www.flickr.com/photos/myfuturedotcom/6053042920]
Question                 Answer
Which city?              Milan
Comparing what?          Milan Design Week vs. Milan in general
Experimental subjects?   Event managers & casual audience
18. http://emanueledellavalle.org - Emanuele Della Valle
What's Milan Design Week?
[map: http://www.fuorisalone.it]
The Milan Design Week (MDW) is a city-scale event
• held yearly in Milan,
• featuring around 1,200 events
• in 500+ places spread across the city and
• attracting about half a million people from all over the world.
19. http://emanueledellavalle.org - Emanuele Della Valle
Ingredients of the proposed solution
Big Data technologies
- Address the "velocity" of data streams in memory
- Address the "volume" of data that does not fit in memory
Semantic technologies
- Address "variety" using Ontology-Based Data Access
- Named Entity Recognition and Linking
Data science
- Statistical modelling
- Anomaly detection
Visual analytics
- Allow non-expert access to data
- Tell stories out of data
20. http://emanueledellavalle.org - Emanuele Della Valle 21
CitySensing - a solution for event managers (2013)
F. Antonelli, M. Azzi, M. Balduini, P. Ciuccarelli, E. Della Valle, R. Larcher:
City sensing: visualising mobile and social data about a city scale event.
AVI 2014: 337-338
http://jol.telecomitalia.com/jolskil/citysensing/
21. http://emanueledellavalle.org - Emanuele Della Valle 22
CitySensing - a solution for casual audience (2014)
M.Balduini, E.Della Valle, M.Azzi, R.Larcher, F.Antonelli, and P.Ciuccarelli:
CitySensing: Fusing City Data for Visual Storytelling. IEEE MultiMedia. TO APPEAR
http://jol.telecomitalia.com/jolskil/citysensing/
http://citysensing.fuorisalone.it/
22. http://emanueledellavalle.org - Emanuele Della Valle 23
How CitySensing works – step 0
Set up a conceptual model (FraPPE) to master the variety in the data sources
M.Balduini, E. Della Valle: FraPPE: a vocabulary to represent heterogeneous
spatio-temporal data to support visual analytics. ISWC 2015 TO APPEAR
23. http://emanueledellavalle.org - Emanuele Della Valle
How CitySensing works – step 0
FraPPE
• Goal: a vocabulary to represent heterogeneous spatio-temporal
data to support visual analytics
FraPPE offers a homogeneous view to the
visual analytics interface built on heterogeneous
data
24. http://emanueledellavalle.org - Emanuele Della Valle
How CitySensing works – step 1
For every pixel compute the volume of Call Data Records
(using privacy-preserving aggregation)
Real data recorded on 13 April 2013 between 13:00 and 00:00
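Step 1 can be illustrated with a minimal Python sketch. It assumes that a pixel is a (cell, 15-minute frame) pair, as defined later in the deck, and that each record has already been reduced to a cell identifier and a timestamp. The record layout, the cell numbers and the function names are hypothetical, and the sketch only counts records; it does not implement the privacy-preserving aggregation, which the slides do not detail.

```python
from collections import Counter
from datetime import datetime

FRAME_MINUTES = 15  # frame duration stated in the deck


def frame_start(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its 15-minute frame."""
    return ts.replace(minute=ts.minute - ts.minute % FRAME_MINUTES,
                      second=0, microsecond=0)


def pixel_volumes(records):
    """Count records per pixel, where a pixel is a (cell_id, frame_start) pair.

    `records` is an iterable of (cell_id, timestamp) tuples (hypothetical layout).
    """
    volumes = Counter()
    for cell_id, ts in records:
        volumes[(cell_id, frame_start(ts))] += 1
    return volumes


# Made-up records for two neighbouring cells on 13 April 2013.
records = [
    (4210, datetime(2013, 4, 13, 13, 2)),
    (4210, datetime(2013, 4, 13, 13, 14)),
    (4210, datetime(2013, 4, 13, 13, 16)),
    (4211, datetime(2013, 4, 13, 13, 5)),
]
volumes = pixel_volumes(records)
for pixel, count in sorted(volumes.items()):
    print(pixel, count)
```

Keying the counter on the pair keeps space and time together, so the same structure can feed both the map view and the per-frame models.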
25. http://emanueledellavalle.org - Emanuele Della Valle
How CitySensing works – step 2
Find the anomalous pixels comparing the current
volumes with a model of the volumes in this time period
Real data recorded on 13 April 2013 between 13:00 and 00:00
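One plausible reading of step 2, sketched in Python: if the model of a pixel is summarised by a mean and a standard deviation, the anomaly index can be taken as the distance of the observed volume from the mean, measured in standard deviations. This z-score form is an assumption, not a formula given in the slides; the default threshold of 2 follows the ±2σ rule the deck states later, and the numbers are made up.

```python
def anomaly_index(volume: float, mean: float, std: float) -> float:
    """Signed distance of the observed volume from the model mean, in standard deviations."""
    if std <= 0:
        raise ValueError("std must be positive")
    return (volume - mean) / std


def is_anomalous(volume: float, mean: float, std: float, threshold: float = 2.0) -> bool:
    """Flag a pixel whose volume is more than `threshold` standard deviations from the mean."""
    return abs(anomaly_index(volume, mean, std)) > threshold


# Made-up model for one pixel: mean 120 records, standard deviation 15.
print(anomaly_index(165, 120, 15), is_anomalous(165, 120, 15))
print(anomaly_index(130, 120, 15), is_anomalous(130, 120, 15))
```

Keeping the sign of the index lets the visualisation distinguish unusually busy pixels from unusually quiet ones.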
26. http://emanueledellavalle.org - Emanuele Della Valle
How CitySensing works – step 3
Map anomalies to the districts of Milano Design Week
Brera
Tortona
What's
this?
Real data recorded on 13 April 2013 between 13:00 and 00:00
27. http://emanueledellavalle.org - Emanuele Della Valle
How CitySensing works – step 4
For every anomalous pixel capture the hashtags and semantic
entities named in the social media streams
Brera
Tortona
What's
this?
Real data recorded on 13 April 2013 between 13:00 and 00:00
28. http://emanueledellavalle.org - Emanuele Della Valle
How CitySensing works – step 5
Take away the hashtags and semantic entities that are
systematically used
Brera
Tortona
Real data recorded on 13 April 2013 between 13:00 and 00:00
29. http://emanueledellavalle.org - Emanuele Della Valle 30
Logical architecture of CitySensing – setup time
Analyse Data Stream
Build Models
Capture Data Stream Capture Static Data
MDW
30. http://emanueledellavalle.org - Emanuele Della Valle 31
Logical architecture of CitySensing – run time
Analyse Data Stream
Build Models
Detect Anomalies
Capture Data Stream
Visualize Analysis
Store Analysis
Capture Static Data
MDW
31. http://emanueledellavalle.org - Emanuele Della Valle
Capturing static data via FraPPE
The frame duration was fixed to
15 minutes
The Milan area was covered with
• 1 grid (100x100)
• 10,000 cells
• each cell measuring 250x250 meters
(the size of the mobile
network cells in the centre
of Milan)
During the Milano Design Week
a total of 5.76 million pixels were
captured
More than 1,000 events in more than 600 places
were collected using the
crowd-sourced databases of fuorisalone.it, breradesigndistrict.it and
tortonaroundesign.com thanks to a partnership with studiolabo
Cells in which there are places
hosting Milan Design Week 2013
events
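The grid figures can be checked with a short Python sketch. The grid size, cell size and frame duration come from the slide; the row-major cell numbering, the south-west origin and the use of coordinates already expressed in metres are assumptions made for illustration. The arithmetic suggests that 5.76 million pixels correspond to six full days of 15-minute frames over 10,000 cells.

```python
GRID_SIDE = 100                     # cells per side (from the slide)
CELL_SIZE_M = 250                   # metres per cell side (from the slide)
FRAMES_PER_DAY = 24 * 60 // 15      # 96 frames of 15 minutes per day


def cell_id(x_m: float, y_m: float) -> int:
    """Row-major cell index for a point given in metres from the grid's south-west corner.

    The numbering scheme is an assumption; the slides do not specify one.
    """
    col = int(x_m // CELL_SIZE_M)
    row = int(y_m // CELL_SIZE_M)
    if not (0 <= col < GRID_SIDE and 0 <= row < GRID_SIDE):
        raise ValueError("point outside the grid")
    return row * GRID_SIDE + col


cells = GRID_SIDE * GRID_SIDE                   # number of cells in the grid
pixels_per_day = cells * FRAMES_PER_DAY         # one pixel per cell and frame
days_covered = 5_760_000 // pixels_per_day      # days implied by the 5.76 million figure

print(cells, pixels_per_day, days_covered)
print(cell_id(260, 10))
```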
32. http://emanueledellavalle.org - Emanuele Della Valle
Processing Telecom Italia Call Data Records
1.92 million Gaussian models were built
• one for each pixel (i.e., for each frame and cell)
• grouping the frames by working and week-end days
• using two months of Call Data Records, and
• verifying with an Anderson-Darling test (significance level 0.05)
that the volume of CDR follows a Gaussian distribution
Built on Pig, R and Cascalog
The processing on 7 m1.large EC2 machines took 24 hours
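The model-building step can be approximated in plain Python. The sketch below fits a mean and a standard deviation to a sample of volumes and runs an Anderson-Darling normality check with estimated parameters. It is an illustration, not the Pig, R and Cascalog pipeline from the slide: the small-sample adjustment and the 5% critical value of 0.752 are the commonly tabulated ones for this case and are assumptions here, and both samples are synthetic. The first line also shows that 10,000 cells, 96 daily frames and 2 day types give the 1.92 million models.

```python
import math
from statistics import NormalDist, mean, stdev

models = 10_000 * 96 * 2   # cells x 15-minute frames per day x (working, week-end)

CRITICAL_5_PERCENT = 0.752  # assumed tabulated value for mean and variance estimated from data


def anderson_darling_normal(sample):
    """Return (mu, sigma, adjusted A^2) for a normality check with estimated parameters."""
    n = len(sample)
    if n < 8:
        raise ValueError("need at least 8 observations")
    mu, sigma = mean(sample), stdev(sample)
    if sigma == 0:
        raise ValueError("sample has zero variance")
    std_normal = NormalDist()
    z = sorted((x - mu) / sigma for x in sample)
    # Clamp the CDF away from 0 and 1 so the logarithms stay finite.
    cdf = [min(max(std_normal.cdf(v), 1e-12), 1 - 1e-12) for v in z]
    s = sum((2 * i + 1) * (math.log(cdf[i]) + math.log(1 - cdf[n - 1 - i]))
            for i in range(n))
    a2 = -n - s / n
    return mu, sigma, a2 * (1 + 0.75 / n + 2.25 / n ** 2)


def looks_gaussian(sample) -> bool:
    """True when normality is not rejected at the 5% level."""
    return anderson_darling_normal(sample)[2] < CRITICAL_5_PERCENT


# Synthetic samples: normal quantiles (bell-shaped) and exponential quantiles (skewed).
bell = [NormalDist(100, 10).inv_cdf((k + 0.5) / 60) for k in range(60)]
skewed = [-10 * math.log(1 - (k + 0.5) / 60) for k in range(60)]
print(models, looks_gaussian(bell), looks_gaussian(skewed))
```

Only pixels that pass the check would keep a Gaussian model in this reading; how the original pipeline treated pixels that failed it is not stated in the slides.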
[figure: histogram and Q-Q plot for a bad case and for a good case]
33. http://emanueledellavalle.org - Emanuele Della Valle
Processing Telecom Italia Call Data Records
Volume of CDR captured in Milan during the Design Week
Calls, SMS and Internet access
were aggregated
(with privacy-preserving
methods) and an
anomaly index was
computed for each of
the 5.76 million pixels
Processing 1 day of data on 7 m1.large EC2 machines took 20 minutes
What                     2013          2014
Calls                    16,743,875    19,719,629
SMSs                     19,454,497    20,240,485
Internet data accesses   137,381,761   197,767,245
[image: https://cerijayne.files.wordpress.com/2011/10/outliersss.png]
34. http://emanueledellavalle.org - Emanuele Della Valle
Do CDR-anomalous pixels relate to events?
CDR-anomalous pixels = pixels in which the anomaly
index is high in absolute value (> +2σ or < -2σ)
To test if the anomalous pixels were related to the events
of the Milan Design Week
• We used three ground truths
– the pixels of Milan
– the pixels of Brera district
– the pixels of Tortona district
where there was at least one event of Milan Design Week 2013
• We computed
– Precision
– Recall
of the anomalous pixels in finding pixels in those three ground
truths
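Precision and recall over sets of pixels can be sketched in Python as follows. The pixel identifiers are made up, and returning 0.0 for an empty set is a convention chosen for this sketch.

```python
def precision_recall(detected: set, ground_truth: set):
    """Precision and recall of detected pixels against pixels known to host events."""
    true_positives = len(detected & ground_truth)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Made-up example: 4 anomalous pixels, 6 pixels hosting events, 2 in common.
anomalous = {1, 2, 3, 4}
event_pixels = {3, 4, 5, 6, 7, 8}
precision, recall = precision_recall(anomalous, event_pixels)
print(precision, recall)
```

Precision answers "how many anomalous pixels host an event?", while recall answers "how many event pixels were flagged as anomalous?".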
38. http://emanueledellavalle.org - Emanuele Della Valle
Processing Social Streams
The machinery: the Streaming Linked Data framework
M.Balduini, E.Della Valle, D.Dell'Aglio, M.Tsytsarau, T.Palpanas, and C.Confalonieri:
Social Listening of City Scale Events Using the Streaming Linked Data Framework.
International Semantic Web Conference (2) 2013: 1-16
[diagram: Data Source, Streaming Linked Data Server and HTML5 Browser connected over HTTP; components shown: Stream Bus, Adapter, Decorator, Analyser, Publisher, Visualizer]
39. http://emanueledellavalle.org - Emanuele Della Valle
Processing Social Streams
Decoration at work
Happily into a bottle of Heineken
bear #heinekendesignweek
@ the Heineken Magazzini
City-Scale Event: Milano Design Week
Event: Heineken Design Week
Location: The Magazzini
hosts
takesPlaceIn
M.Balduini, A.Bozzon, E.Della Valle, Y.Huang, G-J Houben: Recommending Venues Using
Continuous Predictive Social Media Analytics. IEEE Internet Computing 18(5): 28-35
(2014)
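A toy Python sketch of the decoration idea: a micro-post is enriched with the event and location it refers to. The two lookup tables are hypothetical stand-ins for the named entity recognition and linking step, which is far richer than a hashtag lookup, and the `hosts` and `takesPlaceIn` relations from the slide are not modelled.

```python
import re

# Hypothetical lookup tables standing in for entity recognition and linking.
EVENT_BY_HASHTAG = {"heinekendesignweek": "Heineken Design Week"}
LOCATION_BY_EVENT = {"Heineken Design Week": "The Magazzini"}
CITY_SCALE_EVENT = "Milano Design Week"


def decorate(text: str) -> dict:
    """Attach the events and locations that the hashtags of a micro-post point to."""
    hashtags = [h.lower() for h in re.findall(r"#(\w+)", text)]
    events = [EVENT_BY_HASHTAG[h] for h in hashtags if h in EVENT_BY_HASHTAG]
    locations = [LOCATION_BY_EVENT[e] for e in events if e in LOCATION_BY_EVENT]
    return {
        "text": text,
        "hashtags": hashtags,
        "events": events,
        "locations": locations,
        "city_scale_event": CITY_SCALE_EVENT if events else None,
    }


post = decorate("Happily into a bottle of Heineken bear "
                "#heinekendesignweek @ the Heineken Magazzini")
print(post["events"], post["locations"], post["city_scale_event"])
```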
40. http://emanueledellavalle.org - Emanuele Della Valle
Processing Social Streams
Predictive models were built
• For hashtags and semantic entities systematically present
• Using the Holt-Winters method
• Grouping the frames by
– working and week-end days and
– early morning, morning, afternoon, evening, and late night
• Analysing 300,000 geo-located micro-posts collected over
6 months in the Milan area (November 2013 to April 2014)
• It takes a few seconds per hashtag/semantic entity on a
60€/month VM in an IaaS
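For readers unfamiliar with the Holt-Winters method, here is a compact additive variant in plain Python. The smoothing parameters, the initialisation, the season of five daily time slots and the counts are all assumptions for illustration; the slides do not say which variant or parameters were used. The last lines show how a forecast could be subtracted from an observed count to isolate unexpected usage of a hashtag.

```python
def holt_winters_additive(series, season_len, alpha=0.3, beta=0.1, gamma=0.2, horizon=1):
    """Additive Holt-Winters: smooth level, trend and seasonal terms, then forecast."""
    if len(series) < 2 * season_len:
        raise ValueError("need at least two full seasons")
    first = series[:season_len]
    second = series[season_len:2 * season_len]
    level = sum(first) / season_len
    trend = (sum(second) - sum(first)) / season_len ** 2
    seasonal = [x - level for x in first]
    for t, x in enumerate(series):
        s = seasonal[t % season_len]
        prev_level = level
        level = alpha * (x - s) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % season_len] = gamma * (x - level) + (1 - gamma) * s
    n = len(series)
    return [level + (h + 1) * trend + seasonal[(n + h) % season_len]
            for h in range(horizon)]


# Made-up counts of a hashtag in five daily time slots, repeated over three days.
daily_pattern = [10, 30, 20, 40, 25]
history = daily_pattern * 3
forecast = holt_winters_additive(history, season_len=5, horizon=5)

observed_next_slot = 55                   # made-up observation for the next slot
excess = observed_next_slot - forecast[0]  # usage not explained by the systematic pattern
print([round(f, 2) for f in forecast], round(excess, 2))
```

On a perfectly periodic history the forecast reproduces the daily pattern, so any excess is attributable to something other than routine usage.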
[plot legend: data, fitted, forecast, lower 2.5% and upper 97.5% bounds]
41. http://emanueledellavalle.org - Emanuele Della Valle
Processing Social Streams
Usage of #milan in the weeks around Milan Design Week
Subtracting the predicted usage of #milan
[charts: usage per time slot (2:00 to 7:00, 7:00 to 11:00, 11:00 to 14:00, 14:00 to 19:00, 19:00 to 2:00) over alternating working days (WD) and week-ends (WE), with Milan Design Week marked]
42. http://emanueledellavalle.org - Emanuele Della Valle
Processing Social Streams
The difference between the observed and the predicted
usage of #milan perfectly fits the usage of #mdw (the official
hashtag of Milan Design Week)
[charts: anomalous usage of #milan compared with usage of #mdw, per time slot (2:00 to 7:00, 7:00 to 11:00, 11:00 to 14:00, 14:00 to 19:00, 19:00 to 2:00) over alternating working days (WD) and week-ends (WE), with Milan Design Week marked]
43. http://emanueledellavalle.org - Emanuele Della Valle
Processing Social Streams
Geo-referenced micro-posts were captured, semantically annotated,
cleansed using the predictive models and analysed in the Milan area
For each pixel with at least 1 micro-post we computed
• the volume related to Milano Design Week
• the top-10 hashtags
• the top-3 locations/events
Real-time processing was possible with our in-memory
C-SPARQL engine and the Streaming Linked Data framework on
a 20€/month VM in an IaaS
What                                  2013     2014
Geo-located micro-posts               57,154   21,782
Linked to Milano Design Week          3,569    3,499
Linked to a specific location/event   761      547
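The per-pixel ranking of hashtags can be sketched in Python with a counter per pixel. The pixel identifiers and posts are made up, and this batch version only illustrates the aggregation; the deployment described in the slide ran it continuously with C-SPARQL and the Streaming Linked Data framework.

```python
from collections import Counter, defaultdict


def top_hashtags_per_pixel(posts, k=10):
    """Return, for each pixel, its k most frequent hashtags with their counts.

    `posts` is an iterable of (pixel, hashtags) pairs (hypothetical layout).
    """
    counts = defaultdict(Counter)
    for pixel, hashtags in posts:
        counts[pixel].update(h.lower() for h in hashtags)
    return {pixel: counter.most_common(k) for pixel, counter in counts.items()}


# Made-up micro-posts already assigned to pixels.
posts = [
    ("brera-07", ["mdw", "Brera"]),
    ("brera-07", ["MDW"]),
    ("tortona-03", ["tortona"]),
]
ranking = top_hashtags_per_pixel(posts)
print(ranking["brera-07"])
```

The same function with `k=3` and location or event names in place of hashtags would give the top-3 locations/events.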
44. http://emanueledellavalle.org - Emanuele Della Valle
Do socially active pixels relate to events?
Socially active pixels = pixels in which we captured social
media that talk about Milan
Design Week
We computed
• precision
• recall
of the socially active pixels in finding pixels in the
three ground truths about Milan, Brera district and
Tortona district
49. http://emanueledellavalle.org - Emanuele Della Valle
Are CDR-anomalous and socially active pixels similar?
Which of the following four scenarios?
[figure columns: Anomalous, Socially active, Intersection, Similar?]
50. http://emanueledellavalle.org - Emanuele Della Valle
Are CDR-anomalous and socially active pixels similar?
More formally
• Jaccard index: J(A,B) = |A ∩ B| / |A ∪ B|
• E.g., J(A,B) = 8/11 for two sets A and B that largely overlap, and J(A,B) = 3/11 for two sets that overlap less
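The Jaccard index is straightforward to compute on sets of pixel identifiers. In the Python sketch below, the sets are made up so that they reproduce the two ratios on the slide (8/11 and 3/11); the actual sets behind the figure are not recoverable from the transcript, and returning 1.0 for two empty sets is a convention chosen here.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index: size of the intersection divided by size of the union."""
    union = a | b
    if not union:
        return 1.0  # convention for two empty sets (an assumption of this sketch)
    return len(a & b) / len(union)


# Made-up pixel sets matching the ratios shown on the slide.
high_overlap = jaccard(set(range(0, 10)), set(range(2, 11)))   # 8 shared, 11 in total
low_overlap = jaccard(set(range(0, 7)), set(range(4, 11)))     # 3 shared, 11 in total
print(high_overlap, low_overlap)
```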
54. http://emanueledellavalle.org - Emanuele Della Valle
Evaluation methodology for the casual audience
Guessability study
• Can you guess what I mean without any explanation?
E.g.
Dinosaur extinction
"The Shining" by Stephen King
56. http://emanueledellavalle.org - Emanuele Della Valle
The patterns you should have got
The CDR anomaly and the social activity are
Correlated / Partially correlated / Not correlated
57. http://emanueledellavalle.org - Emanuele Della Valle
Evaluation of interface guessability
Q: In Brera District the volume of social media signal is
partially correlated with the value of mobile anomaly signal
A: [chart, scale 0 to 1]
58. http://emanueledellavalle.org - Emanuele Della Valle
Evaluation of interface guessability
Q: In Porta Romana the volume of social media signal is
strongly correlated with the value of mobile anomaly signal
A: [chart, scale 0 to 1]
59. http://emanueledellavalle.org - Emanuele Della Valle
Evaluation of interface guessability
Q: In Tortona District the volume of social media signal is
strongly correlated with the value of mobile anomaly signal
A: [chart, scale 0 to 1]
60. http://emanueledellavalle.org - Emanuele Della Valle
Back to the research question
[photo: https://www.flickr.com/photos/debord/4932655275]
Can we collect, analyse and repurpose
• social media captured at places and events and
• privacy-preserving aggregates of Call Data Records
to allow visually
• perceiving emerging patterns and
• observing their dynamics?
Yes!
at least, in Milano Design Week 2013 and 2014
[photo: https://flic.kr/p/beuDaX ]
62. http://emanueledellavalle.org - Emanuele Della Valle 63
Take home message … guess it :-)
Emanuele Della Valle
emanuele.dellavalle@polimi.it
http://emanueledellavalle.org