In this talk I consider the analysis of social media data in an urban context. In particular, we look at textual data, visual data, and all their metadata to understand social and business phenomena. Analyzing such complex and diverse data poses major challenges for the analyst, as the insight of interest results from an intricate interplay between the different modalities, their metadata, and the analyst's evolving knowledge of the problem. Our multimedia analytics solutions bring together automatic multimedia analysis and information visualization to give the analyst the best opportunity to gain insight into complex datasets and to use it in applications such as recommending venues to tourists, measuring the effect of city marketing campaigns, or seeing how social multimedia redefines urban borders.
Analyzing large multimedia collections in an urban context - Prof. Marcel Worring
1. 12-7-2016
1
Amsterdam Data Science
Marcel Worring
Analyzing large multimedia
collections in an urban context
Marcel Worring
Stevan Rudinac, Jan Zahalka, Dennis Koelma
Joost Boonzajer Flaes, Jorrit van den Berg
Informatics Institute, Amsterdam Data Science
MSc: VU Computer Science
PhD: UvA Informatics Institute
Now: 0.8 fte Informatics Institute
0.2 fte Amsterdam Business School
Associate Director Amsterdam Data Science
Objective and Subjective data
Image data
Numeric data
Geographic data
Structured data
Unstructured data
Temporal data
Textual data
Open data
Geo location
• Amsterdam, Netherlands
Exif
• Camera: Nikon N60
• Focal length: 55 mm
• Exposure time: 1/200
• Flash: off
Author
• josemanuelerre (Flickr)
• José Manuel Ríos Valiente
Tags
• cyclist
• bike
• street
Comments
• “I love Amsterdam! great photo!”
• “Great compostion, beautiful B&W!!”
• “Estupendo B&N, bella imagen.” [Spanish: “Superb B&W, beautiful image.”]
. . .
Data Sources
• “Koningsdag, or ‘King’s Day,’ is one of the principal holidays of the Netherlands. . . ”
• In this case, the image says more than the text
Photo: quantz @ Flickr
Data Sources
Objective and Subjective data
Open data
+ Content Analysis
WHAT DOES IT BRING?
Professional Recommender Systems
Recommender system for tourists
Touristic Routing
City Sentiment
City Marketing Analytics
ALGORITHMS
Ranking of data
A query defines the starting point and the order of the result, from best to worst
An image/video/text collection
For Social Media
• The Ranking can be based on
– The objective content of the comments
– The subjective content of the comments
– The objective visual content
– The subjective visual content
– ………
• Or any combination of the above
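The combination mentioned in the last bullet can be sketched as a weighted sum of per-criterion scores. This is a minimal sketch, assuming every item already carries a score in [0, 1] for each criterion; the criterion names, weights, and item ids are hypothetical and not taken from the talk.

```python
def rank_items(items, weights):
    """Rank items by a weighted sum of their per-criterion scores, best first."""
    def combined(scores):
        # A criterion missing for an item counts as 0.
        return sum(weights[name] * scores.get(name, 0.0) for name in weights)
    return sorted(items, key=lambda item: combined(item["scores"]), reverse=True)

# Hypothetical scores for three social media posts.
items = [
    {"id": "photo_a", "scores": {"text_objective": 0.2, "visual_subjective": 0.9}},
    {"id": "photo_b", "scores": {"text_objective": 0.8, "visual_subjective": 0.1}},
    {"id": "photo_c", "scores": {"text_objective": 0.5, "visual_subjective": 0.5}},
]
weights = {"text_objective": 0.5, "visual_subjective": 0.5}
ranking = [item["id"] for item in rank_items(items, weights)]
print(ranking)
```

Changing the weights shifts the ranking towards one modality or the other, which is one simple way to express "any combination of the above".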
Concept detection
Learn a model from positive and negative visual examples
Unknown images -> score of presence -> ranking
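The concept detection flow (learn from positive and negative examples, score unknown images, rank by score) can be sketched with a small logistic regression. This is only an illustration: the talk does not say which classifier is used, and the two-dimensional feature vectors below are hypothetical stand-ins for real visual features.

```python
import math

def presence_score(x, w, b):
    """Probability-like score that the concept is present in feature vector x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def train_concept_detector(examples, labels, epochs=500, lr=0.5):
    """Fit logistic regression by batch gradient descent; labels are 1 or 0."""
    dim = len(examples[0])
    w, b = [0.0] * dim, 0.0
    n = len(examples)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(examples, labels):
            err = presence_score(x, w, b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Hypothetical annotated examples: two positives, two negatives.
examples = [[0.9, 0.2], [0.8, 0.1], [0.1, 0.7], [0.2, 0.9]]
labels = [1, 1, 0, 0]
w, b = train_concept_detector(examples, labels)

# Unknown images are scored and ranked by score of presence.
unknown = {"img1": [0.85, 0.15], "img2": [0.15, 0.8], "img3": [0.5, 0.5]}
scores = {name: presence_score(x, w, b) for name, x in unknown.items()}
concept_ranking = sorted(scores, key=scores.get, reverse=True)
print(concept_ranking)
```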
Zebu
Requires annotation to learn
Animals
People
Lions
Lemurs
What do we learn?
14,197,122 images, 21841 synsets indexed
1200 trained visual concept detectors for adjective-noun pairs
The new trend: Deep learning
Krizhevsky NIPS 2012
Start with raw pixels, learn all parameters
The learned filters
Zeiler and Fergus
The layered network
Krizhevsky NIPS 2012
Convolution + pooling + fully connected layers + output layers
60,000,000 parameters to learn
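To see where a figure of about 60 million comes from, the parameters can be counted layer by layer. This is a rough sketch: the layer shapes below are the ones commonly reported for the 2012 network (including the halved input channels in the grouped convolution layers) and are an assumption here, since the slides do not list them.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights plus one bias per output channel for a square convolution."""
    return (kernel * kernel * in_channels + 1) * out_channels

def fc_params(n_in, n_out):
    """Weights plus one bias per output unit for a fully connected layer."""
    return (n_in + 1) * n_out

# Assumed layer shapes (not stated in the talk).
layers = {
    "conv1": conv_params(11, 3, 96),
    "conv2": conv_params(5, 48, 256),
    "conv3": conv_params(3, 256, 384),
    "conv4": conv_params(3, 192, 384),
    "conv5": conv_params(3, 192, 256),
    "fc6": fc_params(6 * 6 * 256, 4096),
    "fc7": fc_params(4096, 4096),
    "fc8": fc_params(4096, 1000),
}
total = sum(layers.values())
for name, count in layers.items():
    print(f"{name}: {count:,}")
print(f"total: {total:,}")
```

Under these assumptions the total comes to roughly 61 million, and the three fully connected layers account for most of it.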
But what do all these layers do?
Visualizing deep networks
Zeiler and Fergus
State-of-the-art: GoogLeNet
and growing ……
Makes image search keyword driven
Text Analysis
D. Blei, 2003
Latent Dirichlet Allocation
• Generative model, discovers topics and scores them
• 100 topics are enough to cover the entire Wikipedia
• Input: raw text
• Output: topic scores per document
0.054*mexico + 0.049*forest + 0.024*argentina + 0.022*islands + ... + 0.014*aires
We treat comments or sets of tags as documents
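The input/output behaviour of LDA (raw tokens in, topic scores per document out) can be illustrated with a tiny collapsed Gibbs sampler. This is a teaching sketch, not the implementation used in the talk: the hyperparameters `alpha` and `beta`, the iteration count, and the toy tag sets are all assumptions, and a real system would use an optimized library.

```python
import random

def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic scores."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]             # topic counts per document
    topic_word = [[0] * n_words for _ in range(n_topics)]  # word counts per topic
    topic_total = [0] * n_topics                           # token counts per topic
    assignments = []
    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Remove the token, then resample its topic given all other tokens.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta) / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    return theta, vocab, topic_word

# Hypothetical documents: each one is the tag set of a photo.
docs = [
    ["cyclist", "bike", "street"],
    ["bike", "street", "cyclist", "bike"],
    ["canal", "boat", "water"],
    ["boat", "water", "canal", "boat"],
]
theta, vocab, topic_word = lda_gibbs(docs, n_topics=2)
for doc, scores in zip(docs, theta):
    print(doc, [round(s, 2) for s in scores])
```

Each row of `theta` is the topic score vector of one document and sums to 1.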
VENUE RECOMMENDER
• Venue recommendation: suggesting places of interest (venues) based on user preferences
• The classic approach is collaborative filtering utilizing the user-item matrix
The task
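For reference, the classic collaborative filtering baseline on a user-item matrix can be sketched as follows. This is a minimal user-based variant with cosine similarity, chosen here for illustration; the matrix values are hypothetical, with rows as users and columns as venues.

```python
import math

def cosine(a, b):
    """Cosine similarity between two rows of the user-item matrix."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(user_item, user, top_n=2):
    """Suggest unvisited venues, weighted by how similar the other users are."""
    target = user_item[user]
    scores = [0.0] * len(target)
    for other, row in enumerate(user_item):
        if other == user:
            continue
        sim = cosine(target, row)
        for venue, visited in enumerate(row):
            scores[venue] += sim * visited
    unseen = [v for v, visited in enumerate(target) if not visited]
    return sorted(unseen, key=lambda v: scores[v], reverse=True)[:top_n]

# Hypothetical visits: 1 means the user visited the venue.
user_item = [
    [1, 1, 0, 0],  # user 0
    [1, 1, 1, 0],  # user 1, similar taste to user 0
    [0, 0, 1, 1],  # user 2, no overlap with user 0
]
suggestions = recommend(user_item, user=0)
print(suggestions)
```

This approach needs visit history for every user, which is the limitation a content-based, interactive explorer sidesteps.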
• City Melange: a venue explorer utilizing multimedia analytics techniques
• Content-based: based solely on the content of venue-related social media
• Multimodal: combining content from images and the associated text
• Interactive: user preferences are modelled on the fly as you explore the city
• Cross-platform: integrates data from diverse social platforms
City Melange Characteristics
Venue information
Venue images
Images, metadata
User data
Q(venue name,geo)
Data Gathering
[Diagram, built up over several slides]
Content: images (V), tags and comments (T), venues (VC), users (U)
Features: visual features VF (ConvNet), text features TF (LDA)
Clustering
Processed data: visual venue topics, visual user topics, text venue topics, text user topics, user-venue matrix (UV)
Data Analysis
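The last stage of the data analysis, turning per-image results into venue topics, user topics, and a user-venue matrix, might look like the sketch below. It assumes every image has already been assigned a single topic index by the clustering step; that assumption, the record fields, and the normalization are illustrative choices, not details given in the slides.

```python
from collections import defaultdict

def build_processed_data(images, n_topics):
    """Aggregate per-image topics into venue topics, user topics and UV counts."""
    venue_counts = defaultdict(lambda: [0] * n_topics)
    user_counts = defaultdict(lambda: [0] * n_topics)
    user_venue = defaultdict(lambda: defaultdict(int))
    for img in images:
        venue_counts[img["venue"]][img["topic"]] += 1
        user_counts[img["user"]][img["topic"]] += 1
        user_venue[img["user"]][img["venue"]] += 1

    def normalise(counts):
        # Every venue or user listed here has at least one image, so total > 0.
        total = sum(counts)
        return [c / total for c in counts]

    venue_topics = {v: normalise(c) for v, c in venue_counts.items()}
    user_topics = {u: normalise(c) for u, c in user_counts.items()}
    uv = {u: dict(venues) for u, venues in user_venue.items()}
    return venue_topics, user_topics, uv

# Hypothetical image records.
images = [
    {"venue": "cafe", "user": "ann", "topic": 0},
    {"venue": "cafe", "user": "bob", "topic": 0},
    {"venue": "cafe", "user": "ann", "topic": 1},
    {"venue": "park", "user": "bob", "topic": 1},
]
venue_topics, user_topics, uv = build_processed_data(images, n_topics=2)
print(venue_topics)
print(user_topics)
print(uv)
```

The same aggregation would be run once on visual topics and once on text topics to obtain the four topic matrices named in the diagram.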
• ACM Multimedia Grand Challenge 2014 1st Prize
• newyorkermelange.com
Interactive Recommendation
[Diagram, built up over several slides]
Inputs: venue topics (visual and text), user topics (visual and text), users (U), user-venue matrix (UV)
Grid -> relevant venues -> positives (visual and text topics)
Random sample -> negatives (visual and text topics)
Positives + negatives -> linear SVM -> user ranking -> suggested users (US)
Suggested users -> venue ranking -> suggested venues (VS)
Suggestions (US, VS) -> map -> relevance indication
Recommender system for tourists
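One round of the interactive recommendation loop can be sketched as follows: topics of the venues marked relevant act as positives, a random sample acts as negatives, a linear SVM ranks the users, and the venues of the top users become suggestions. This is a simplified reading of the diagram. The SVM trainer is a plain subgradient-descent stand-in, and all names, data, and the vote-counting step for venue ranking are assumptions.

```python
import random

def train_linear_svm(X, y, epochs=100, lr=0.1, lam=0.01, seed=0):
    """Linear SVM via stochastic subgradient descent on the hinge loss."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj * (1 - lr * lam) for wj in w]  # L2 regularisation shrink
            if margin < 1:                         # inside the margin: correct
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def recommend_round(venue_topics, user_topics, user_venue, relevant,
                    n_neg=2, top_users=1, seed=0):
    """Return (suggested users, suggested venues) for one interaction round."""
    rng = random.Random(seed)
    positives = [venue_topics[v] for v in relevant]
    pool = [v for v in venue_topics if v not in relevant]
    negatives = [venue_topics[v] for v in rng.sample(pool, min(n_neg, len(pool)))]
    X = positives + negatives
    y = [1] * len(positives) + [-1] * len(negatives)
    w, b = train_linear_svm(X, y, seed=seed)

    def score(x):
        return sum(wj * xj for wj, xj in zip(w, x)) + b

    ranked = sorted(user_topics, key=lambda u: score(user_topics[u]), reverse=True)
    suggested_users = ranked[:top_users]
    # Venue ranking: count visits of the suggested users to not-yet-marked venues.
    votes = {}
    for user in suggested_users:
        for venue, count in user_venue.get(user, {}).items():
            if venue not in relevant:
                votes[venue] = votes.get(venue, 0) + count
    suggested_venues = sorted(votes, key=votes.get, reverse=True)
    return suggested_users, suggested_venues

# Hypothetical two-topic data.
venue_topics = {"jazz_bar": [0.9, 0.1], "blues_club": [0.8, 0.2],
                "skate_park": [0.1, 0.9], "bmx_track": [0.2, 0.8]}
user_topics = {"alice": [0.85, 0.15], "bob": [0.1, 0.9]}
user_venue = {"alice": {"jazz_bar": 3, "piano_lounge": 2}, "bob": {"skate_park": 5}}
users, venues = recommend_round(venue_topics, user_topics, user_venue,
                                relevant={"jazz_bar", "blues_club"})
print(users, venues)
```

In an interactive setting this round would be repeated each time the user marks more venues, so the model is re-estimated on the fly.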
1. Can we recommend the right type of venue?
2. Can we recommend mainstream venues to mainstream tourists and specialized venues to aficionados?
Evaluation
• 621 fine-grained venue types (Japanese restaurant, skate park. . . )
• 100 artificial actors, use 75% of the data to seed Melange
• Perform 10 interaction rounds
Evaluation
• City Melange
– Visual modality only
– Text modality only
– Multimedia (vis + txt)
• Recommender baselines
– WRMF: Weighted regularized matrix factorization
– BPRMF: Bayesian personalized ranking matrix factorization
• Popularity ranking (PopRank): most visited venues according to Foursquare
Methods Compared
• New York: 1.07M images and associated text from Foursquare, Flickr, and Picasa
• Amsterdam: 56K images and associated text from Foursquare and Flickr
Data Collection
SceneMash
• Data collection
150,000 geotagged Flickr and Foursquare images from the region of Amsterdam
Metadata associated with the images:
- image title
- description
- tags
- geotags
Demo
CITY SENTIMENT
Data Collection
64K geotagged tweets with images
Various neighborhood statistics (17 variables)
64K geotagged images and comments
Amsterdam Neighborhoods
Methodology
Sentiment analysis of textual and visual content -> sentiment maps
Finding correlations between sentiment and various statistics
Correlation Analysis
Correlations
Flickr, Twitter
Correlations are only found with multimodal sentiment
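The correlation step can be illustrated with a Pearson coefficient between a per-neighborhood sentiment score and one neighborhood statistic. The slides do not say which correlation measure or fusion rule was used, so Pearson, the simple averaging of text and visual sentiment, and all numbers below are assumptions made for the sake of the example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)

# Hypothetical per-neighborhood values.
text_sentiment = [0.2, 0.6, 0.4, 0.9]
visual_sentiment = [0.4, 0.5, 0.8, 0.7]
statistic = [10.0, 14.0, 16.0, 21.0]  # stand-in for one of the 17 variables

# Assumed fusion rule: average the two modalities.
multimodal = [(t + v) / 2 for t, v in zip(text_sentiment, visual_sentiment)]

for name, values in [("text", text_sentiment), ("visual", visual_sentiment),
                     ("multimodal", multimodal)]:
    print(name, round(pearson(values, statistic), 3))
```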
Redefined Neighborhoods
People with similar social media interests
MARKETING ANALYTICS
WHAT WE HAVE
“The purpose of computing is insight, not numbers.” Richard Hamming 1962
So what do we want?
Insight
What is insight?
Complex: Insight is complex, involving all or large amounts of the given data in a synergistic way, not simply individual data values.
Deep: Insight builds up over time, accumulating and building on itself to create depth, often generating further questions and, hence, further insight.
Qualitative: Insight is not exact, can be uncertain and subjective, and can have multiple levels of resolution.
Unexpected: Insight is often unpredictable, serendipitous, and creative.
Relevant: Insight is deeply embedded in the data domain, connecting the data to existing domain knowledge and giving it relevant meaning, going beyond dry data analysis to relevant domain impact.
North CG&A, 2006
“Computers are incredibly fast, accurate, and stupid. Humans are incredibly slow, inaccurate and brilliant. The marriage of the two is beyond imagination” Leo Cherne 1968
Visual Analytics
• Combine the power of computer and human
• Compute power
• Storage capacity
• Flexibility
• Creativity
• Expert knowledge
Definition
Multimedia Analytics = Multimedia Analysis + Visual Analytics
Ref: Chinchor2010
Multimedia Analytics
INSIGHT
Analytics
• What is the best-known analytics tool?
Yes, the spreadsheet
Analytics
Fischer et al., TVCG 2010
MediaTable
Columns denote concept scores and can be used for sorting
Colors denote categories, and buckets are used to collect elements of a (sub-)category
Heatmap-like visualization
Grey values denote values between 0 and 1
Makes correlations visible
Filters and sort order can be specified
Refs: deRooij2010b, deRooij2013
Multimedia Pivot Tables
ROW VARIABLE: Decompose
FILTER VARIABLES: Define active data set
Concepts Tags Nominals
COLUMN AGGREGATION
Integers
COLUMN VARIABLES: Sort and Weight
VALUE
ROW AGGREGATION
Visualizations
Type | Filter | Column | Row | Value | Visualization
Images | Selection to bucket | x | Individual images | | Sorted list of images
Nominal | Label selection | x | Individual labels | | Sorted and weighted text histogram
Buckets | Bucket selection | x | Individual buckets | | Weighted histogram
Geo | Selection to bucket | x | x | | Map with weighted elements
Numeric | Range selection | Weights | 7-point summary | | Sum, max, avg, weighted distribution
Concepts | Range selection | Weights | 7-point summary | | Weighted distribution
Tags | Tag selection | Weights | Individual tags | | Sorted and weighted tag histogram
Statistics-driven decomposition
Column aggregation
Row aggregation
Top-N concepts
Row-specific concepts
Concept-based sorting
Relevance-based sorting
Employing user interaction
User
Pool-Query
Set
Labeled
Resultant
set
Learning
Algorithm
Interactive
Learning
Strategy
Active Learning
Chen in 2005 was the first to explore this for Video Retrieval
Relevance feedback
Ref: Huang2008
Relevance feedback
Try to find the boundary in feature space that best separates positive from negative examples
[Figure: feature space F with axes F1 and F2]
Measure of class membership probability
In the next iteration I will have more samples, hence a better estimate of the boundary
This process is usually known as relevance feedback
Active Learning
In active learning the system decides which elements to show for feedback and which not.
[Figure: feature space F with axes F1 and F2]
Figure annotation: For the system it is relevant to know this label
Figure annotation: The system can safely assume this sample is also negative
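The selection step in active learning can be sketched with uncertainty sampling, one common strategy in which the system asks about the samples closest to the current boundary and silently treats samples far on the negative side as negative. The linear boundary, the `margin` threshold, and the sample data are assumptions for illustration; the talk does not specify the selection rule.

```python
def decision_value(x, w, b):
    """Signed score relative to the boundary: positive on the positive side."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def pick_for_feedback(unlabeled, w, b, n_ask=1, margin=1.0):
    """Choose which samples to show the user and which to assume negative."""
    by_uncertainty = sorted(
        unlabeled, key=lambda name: abs(decision_value(unlabeled[name], w, b))
    )
    ask = by_uncertainty[:n_ask]  # closest to the boundary: label is informative
    assumed_negative = [
        name for name in unlabeled
        if name not in ask and decision_value(unlabeled[name], w, b) <= -margin
    ]
    return ask, assumed_negative

# Hypothetical current boundary and unlabeled samples.
w, b = [1.0, -1.0], 0.0
unlabeled = {
    "a": [0.55, 0.5],  # almost on the boundary
    "b": [0.1, 1.9],   # far on the negative side
    "c": [0.9, 0.1],   # clearly on the positive side
}
ask, assumed_negative = pick_for_feedback(unlabeled, w, b)
print(ask, assumed_negative)
```

After the user labels the requested samples, the boundary is re-estimated and the selection is repeated.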
Automatic AND interactive
SVM based relevance feedback
Interactive categorization
Three interactive strategies
• Fully interactive
– User is interactively performing the sort/select/categorize process
• Manual relevance feedback
– In addition to the above, the user can perform relevance feedback on any of the categories
• Unobtrusive relevance feedback
– In addition to the above, the system automatically indicates new potentially relevant elements
Fully interactive On demand suggestions
After categorizing some
elements
Learn and apply model
for user selected bucket
Uncategorized images
Category suggestions
Unobtrusive assistance
Continuously observe what happens
Learn and apply model
for system selected bucket
Uncategorized images
Category suggestions
Results: elements found
• significant at the p=0.01 level compared to baseline
o significant at the p=0.01 level compared to manual
Task 1: specific, high visual similarity
Task 2: generic visually diverse, concept available
Task 3: generic visually diverse, concept available
Task 4: generic visually diverse, no concept available
SCALABILITY
[Zahálka and Worring, VAST 2014] B.P. Jonsson et al., MMM 2016
WRAP-UP
Objective and Subjective data
Image data
Numeric data
Geographic data
Structured data
Unstructured data
Temporal data
Textual data
Open data
The applications The Algorithms
And its variations