This presentation describes a configurable, real-time system for automatically recording, analyzing and visualizing information from user interactions on Twitter. The system is designed to give public bodies (government agencies) a tool to rapidly and easily understand citizen behavior trends and citizens' opinions about city services, events, etc.; it may also be used as a primary alert system to improve the efficiency of emergency systems. The citizen is viewed here as a proactive city sensor capable of generating huge amounts of rich, high-level, valuable data through social media platforms; once properly processed, summarized and annotated, this data allows city officers to better understand citizen needs. The architecture and component blocks are described, and key details of the design, implementation and application scenarios are discussed. Textalytics APIs are used for the semantic analysis of relevant tweets.
Presentation by DAEDALUS, UPM and UC3M at PEGOV 2014, 2nd International Workshop on Personalization in eGovernment Services and Applications, Aalborg, Denmark, in conjunction with the 22nd Conference on User Modeling, Adaptation and Personalization - UMAP 2014.
Tweet alert - semantic analysis in social networks for citizen opinion mining
1. PeGOV 2014 – 2nd Workshop on Personalization in eGovernment Services and Applications
11 July 2014, Aalborg, Denmark
TweetAlert:
Semantic Analytics in Social Networks
for Citizen Opinion Mining
in the City of the Future
Julio Villena-Román1,2,
Adrián Luna-Cobos1,3, José Carlos González-Cristóbal3,1
1 DAEDALUS - Data, Decisions and Language, S.A.
2 Universidad Carlos III de Madrid
3 Universidad Politécnica de Madrid
jvillena@daedalus.es, aluna@daedalus.es, josecarlos.gonzalez@upm.es
2. Agenda
• Framework
• Citizen Sensor
• System
• Business cases
• Future work
3. Framework
• Ciudad 2020 aims to achieve significant improvements in the areas of
energy efficiency, Internet of the Future, Internet of Things, human
behaviour, environmental sustainability, and mobility and transport, in
order to design the City of the Future: sustainable, efficient, smart.
• Spanish R&D project, INNPRONTA Programme, Center for Industrial
Technological Development (CDTI), Ministry of Economy and
Competitiveness
• 2011-2014
• 16.3 M€ budget
• 5 multinational corporations, 4 SMEs, 8 PRIs
• Daedalus focuses on the automatic extraction of meaning from all types
of multimedia content, using NLP technologies and data/text analytics to
help our customers solve any challenge in these areas.
4. Citizen Sensor
Diagram labels: leisure and free time; surveys; mobility; professional activities; opinions in social media; relationship with public administration; collaborative sensing; relationship with other people
Citizen 2020 = another city sensor
5. Citizen Sensor
• Innovative way to capture very descriptive, high-level,
heterogeneous information, bringing high added value,
especially when considering aggregations
• More complex and richer information than other sensors
  • “smells awful”, “there is a fire”, “I’m going to the sales”…
• Individual actions may show citizen trends
  • validate a bus ticket → route density
• Opinion/sentiments of the citizen about the city
  • follow social networks to assess the impact of new policies
• Collaborative sensing
  • using smartphones to get data (pollution, energy consumption) with low
  cost and new possibilities
6. Our approach
What: Build a system able to capture, store and analyze user
messages
Where: In Twitter
For whom: City administrators
What for: To help them rapidly and easily understand citizen
behaviour trends and know their opinion about city
services, events, etc.
Why: To enable them to better understand citizen needs and
generate hypotheses about urban behaviour models, in
order to improve municipal management policies,
bringing them closer to the reality of citizens
How: Using NLP technologies
When: In real-time
8. Information Repository
• Stores the high volume of data and provides advanced search
functionality to exploit the information
• Based on Elasticsearch
  • open source, distributed, real-time search and analytics engine
  • complex search capabilities
  • scalable high-performance solution
http://www.elasticsearch.org
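As a rough illustration of the kind of search such a repository supports, the sketch below builds an Elasticsearch query body as a plain Python dictionary, combining a full-text match with an exact-value and a date-range constraint. The slides do not describe the actual index mapping, so the field names (`text`, `sentiment`, `created_at`) and the helper `build_search_body` are assumptions made for this example.

```python
import json


def build_search_body(keyword, sentiment, since):
    """Build a query-DSL body: full-text match plus exact and range constraints.

    Field names are hypothetical; adapt them to the real index mapping.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"text": keyword}},               # full-text search
                    {"term": {"sentiment": sentiment}},         # exact polarity label
                    {"range": {"created_at": {"gte": since}}},  # recent tweets only
                ]
            }
        },
        "size": 20,  # number of hits to return
    }


body = build_search_body("atasco", "N", "2014-07-01")
print(json.dumps(body, indent=2))
```

The resulting dictionary would be sent as the body of a search request; how it is sent (client library, HTTP call) is left out on purpose.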
9. Gatherer
• Set of concurrent processes that query the Twitter APIs to collect
tweets
  • Search or Streaming API
• Filter by a list of user identifiers, a list of keywords to track (terms,
hashtags) and/or a set of geographical bounding boxes
• Returns tweet text, author, location, embedded media
https://dev.twitter.com/docs/api/1.1
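The three filter criteria above can be sketched as a local predicate. This is a simplified illustration, not the Gatherer's code: it makes no network calls, the `Tweet` class and `matches` function are hypothetical, keyword matching is a plain substring test, and the Madrid bounding box is only approximate. In the Streaming API v1.1 the same criteria correspond to the `follow`, `track` and `locations` parameters of `statuses/filter`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# (west_lon, south_lat, east_lon, north_lat), longitude first
BoundingBox = Tuple[float, float, float, float]


@dataclass
class Tweet:
    text: str
    author_id: int
    coordinates: Optional[Tuple[float, float]] = None  # (lon, lat), if geotagged


def in_box(point, box):
    """Check whether a (lon, lat) point lies inside a bounding box."""
    lon, lat = point
    west, south, east, north = box
    return west <= lon <= east and south <= lat <= north


def matches(tweet, user_ids=(), keywords=(), boxes=()):
    """True if the tweet satisfies at least one configured filter (and/or semantics)."""
    if tweet.author_id in user_ids:
        return True
    text = tweet.text.lower()
    if any(k.lower() in text for k in keywords):
        return True
    if tweet.coordinates is not None:
        return any(in_box(tweet.coordinates, b) for b in boxes)
    return False


MADRID = (-3.89, 40.31, -3.52, 40.64)  # approximate box, for illustration only
t1 = Tweet("Atasco en la Gran Vía", author_id=1)
t2 = Tweet("Buenos días", author_id=2, coordinates=(-3.70, 40.42))
t3 = Tweet("Hello", author_id=3, coordinates=(2.17, 41.39))
selected = [t for t in (t1, t2, t3) if matches(t, keywords=["atasco"], boxes=[MADRID])]
print([t.author_id for t in selected])
```

Here the first tweet is kept because of the keyword, the second because of its location, and the third is dropped.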
10. Inquirer
• Set of concurrent processes that annotate tweets using our
Textalytics Core APIs (http://textalytics.com)
• Topic Extraction API
  • Entities
  • Concepts
  • Hashtags
• Text Classification API
  • Thematic area of the message (transport, economy, daily life…)
  • Citizen Sensor model
    • Alert situations (road accidents, fires, street violence…)
    • Specific location of the user (building, means of transport...)
    • Events to which the text refers (cultural events, sports...)
• Sentiment Analysis API
  • Sentiment polarity: P+, P, NEU, N, N+, NONE
  • Irony and subjectivity
• User Demographics API
  • User demographics: gender, age, type of tweet author
11. Entities, concepts, hashtags
Advanced NLP to obtain POS, syntactic tree and semantic analyses of the
text and use it to identify different types of significant elements
12. Text classification
State-of-the-art hybrid text classification model using a statistical
classification combined with a rule-based filtering
Social Media
Citizen Sensor
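The hybrid idea on this slide, statistical classification followed by rule-based filtering, can be sketched with a toy model. Everything here is an assumption for illustration: the training texts, the word-frequency scoring, the `FORBIDDEN` rule table and the function names do not come from the Textalytics implementation, which the slides do not detail.

```python
from collections import Counter

# Toy training texts per category (hypothetical data).
TRAINING = {
    "transport": ["bus delayed again", "traffic jam on the ring road", "metro line closed"],
    "weather": ["strong wind tonight", "heavy rain and wind", "sunny morning"],
}

# Rule layer: a category is discarded when one of its forbidden terms appears.
FORBIDDEN = {"transport": {"recipe", "strawberry"}}


def train(examples):
    """Statistical layer: word counts per category."""
    return {
        cat: Counter(word for text in texts for word in text.lower().split())
        for cat, texts in examples.items()
    }


def statistical_scores(model, text):
    """Score each category by the relative frequency of the text's words."""
    tokens = text.lower().split()
    return {
        cat: sum(counts[w] for w in tokens) / sum(counts.values())
        for cat, counts in model.items()
    }


def classify(model, rules, text):
    """Keep categories with a positive score that no rule vetoes; return the best."""
    tokens = set(text.lower().split())
    scores = statistical_scores(model, text)
    kept = {
        cat: score
        for cat, score in scores.items()
        if score > 0 and not (rules.get(cat, set()) & tokens)
    }
    return max(kept, key=kept.get) if kept else None


model = train(TRAINING)
print(classify(model, FORBIDDEN, "traffic jam downtown"))
print(classify(model, FORBIDDEN, "strawberry jam recipe"))
```

The second call shows why the rule layer matters: the statistical score alone would assign "jam" to transport, and the rule vetoes it.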
16. Sentiment analysis
State-of-the-art lexicon-based model for sentiment analysis, using POS
and syntactic tree for detecting negation and controlling the scope of
modifiers + subjectivity classification + irony detection
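A minimal sketch of a lexicon-based polarity scorer with negation and modifier handling follows. It is a simplification under stated assumptions: the tiny English lexicon, the thresholds and the function `polarity` are invented for this example, and a fixed token window stands in for the POS and syntactic-tree analysis that the real model uses to delimit scope. The output labels are the ones listed for the Sentiment Analysis API, read here as strong positive (P+), positive (P), neutral (NEU), negative (N), strong negative (N+) and no sentiment (NONE).

```python
# Toy lexicon and modifier tables (hypothetical values).
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "awful": -2.0}
NEGATORS = {"not", "no", "never"}
INTENSIFIERS = {"very": 1.5, "really": 1.5}


def polarity(text, window=2):
    """Average the scores of sentiment words, adjusted by nearby modifiers."""
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    scores = []
    for i, token in enumerate(tokens):
        if token not in LEXICON:
            continue
        score = LEXICON[token]
        context = tokens[max(0, i - window):i]  # words just before the sentiment word
        for prev in context:
            if prev in INTENSIFIERS:
                score *= INTENSIFIERS[prev]
        if any(prev in NEGATORS for prev in context):
            score = -score  # negation flips the polarity
        scores.append(score)
    if not scores:
        return "NONE"  # no sentiment-bearing word found
    mean = sum(scores) / len(scores)
    if mean >= 1.5:
        return "P+"
    if mean > 0.25:
        return "P"
    if mean <= -1.5:
        return "N+"
    if mean < -0.25:
        return "N"
    return "NEU"


for sentence in ["the service is great", "the bus was not good",
                 "it smells very awful", "there is a fire"]:
    print(sentence, "->", polarity(sentence))
```

Note that "there is a fire" yields NONE in this toy version because it contains no lexicon word; detecting it as an alert is the job of the classification step, not of the polarity scorer.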
17. User Demographics
Text classification based on n-grams model to guess user type, gender and
age from his/her login, name and profile description
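To make the n-gram approach concrete, here is a toy classifier that guesses the user type from the login, name and profile description. It is only a sketch: the slide does not say whether the n-grams are over characters or words, so character trigrams are an assumption, as are the sample profiles, the `ORGANIZATION` label and the function names.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Split a lowercased, space-padded string into overlapping character n-grams."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(samples):
    """Build one n-gram count profile per label."""
    profiles = {}
    for label, text in samples:
        profiles.setdefault(label, Counter()).update(char_ngrams(text))
    return profiles


def predict(profiles, text):
    """Return the label whose profile best covers the n-grams of the text."""
    grams = char_ngrams(text)

    def score(label):
        counts = profiles[label]
        return sum(counts[g] for g in grams) / sum(counts.values())

    return max(profiles, key=score)


# Each sample concatenates login, name and description (hypothetical profiles).
SAMPLES = [
    ("PERSON", "maria_84 Maria Lopez mum, runner, coffee lover"),
    ("PERSON", "juan.perez Juan Perez student and football fan"),
    ("ORGANIZATION", "cityhall_news City Hall official news and services"),
    ("ORGANIZATION", "metro_info Metro official service updates"),
]
profiles = train(SAMPLES)
print(predict(profiles, "bus_official Official bus service information"))
print(predict(profiles, "ana_lopez Ana Lopez student"))
```

Gender and age could be handled the same way with their own labelled samples; a production model would need far more training data and smoothing.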
18. Example
{
"text":"el viento ha roto una rama y hay un atascazo increible en toda la gran vía...",
"tag_list":[
{"type":"sensor", "value":"011002 Ubicación - Exteriores - Vías públicas"},
{"type":"sensor", "value":"070700 Alertas meteorológicas - Viento"},
{"type":"sensor", "value":"080100 Incidencia - Congestión de tráfico"},
{"type":"topic", "value":"06 medio ambiente, meteorología y energía"},
{"type":"entity", "value":"Gran Vía"},
{"type":"concept", "value":"viento"},
{"type":"sentiment", "value":"N"},
{"type":"subjectivity", "value":"OBJ"},
{"type":"irony", "value":"NONIRONIC"},
{"type":"user_type", "value":"PERSON"},
{"type":"user_gender", "value":"FEMALE"},
{"type":"user_age", "value":"25-35"}
]
}
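The example tweet reads, in English, roughly "the wind has broken a branch and there is an incredible traffic jam all along Gran Vía...". A consumer of this output might group the annotations by type, as in the sketch below. The function names are hypothetical, only a subset of the tags is reproduced, and treating the leading digits of each sensor value as a category code is an assumption based on the values shown.

```python
import json
from collections import defaultdict

# Subset of the annotated tweet shown on the slide.
ANNOTATED = """
{
  "text": "el viento ha roto una rama y hay un atascazo increible en toda la gran vía...",
  "tag_list": [
    {"type": "sensor", "value": "011002 Ubicación - Exteriores - Vías públicas"},
    {"type": "sensor", "value": "070700 Alertas meteorológicas - Viento"},
    {"type": "sensor", "value": "080100 Incidencia - Congestión de tráfico"},
    {"type": "entity", "value": "Gran Vía"},
    {"type": "sentiment", "value": "N"},
    {"type": "user_type", "value": "PERSON"}
  ]
}
"""


def group_tags(doc):
    """Map each tag type to the list of its values, preserving order."""
    grouped = defaultdict(list)
    for tag in doc["tag_list"]:
        grouped[tag["type"]].append(tag["value"])
    return dict(grouped)


def sensor_codes(doc):
    """Extract the leading code of every Citizen Sensor tag (assumed format)."""
    return [value.split(" ", 1)[0] for value in group_tags(doc).get("sensor", [])]


doc = json.loads(ANNOTATED)
tags = group_tags(doc)
print(tags["entity"], tags["sentiment"])
print(sensor_codes(doc))
```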
21. Ongoing business cases
• City console for a local administration to analyze in real time the
behaviour and topics of interest of the citizens, with two
components:
  • a private console, internal to the city services, for analytics
  • a public dashboard to engage citizens with their city, displaying
  attractive, summarized, non-confidential information at selected
  public locations (town hall, libraries, museums) or on an LED video wall in
  a busy downtown square
• Social alert detection system
  • For 112 emergency services, providing early detection of security-related
  issues
22. For short/mid term future
• Trending topics geolocation clustering
• Analysis at neighbourhood level
Figure labels: health, traffic jam, air pollution, jellyfish, pollen
23. For short/mid term future
• Analysis of city pace of life
24. For short/mid term future
• Mobility analysis
  • How, when, why people move through the city
  • Route identification (home → work → free time → home)
  • Route changes (due to weather)
25. For short/mid term future
• City reputation and brand personality
• Automated satisfaction surveys
26. This work has been supported by several Spanish R&D projects: Ciudad2020: Hacia un nuevo modelo de ciudad inteligente
sostenible (INNPRONTA IPT-20111006), MA2VICMR: Improving the access, analysis and visibility of the multilingual and
multimedia information in web for the Region of Madrid (S2009/TIC-1542) and MULTIMEDICA: Multilingual Information
Extraction in Health domain and application to scientific and informative documents (TIN2010-20644-C03-01). Authors
would like to thank all partners for their knowledge and support.