Analyzing large multimedia collections in an urban context - Prof. Marcel Worring

12-7-2016
1
Amsterdam Data Science
Analyzing large multimedia collections in an urban context
Marcel Worring
Stevan Rudinac, Jan Zahalka, Dennis Koelma
Joost Boonzajer Flaes, Jorrit van den Berg
Informatics Institute, Amsterdam Data Science
MSc: VU Computer Science
PhD: UvA Informatics Institute
Now: 0.8 fte Informatics Institute, 0.2 fte Amsterdam Business School
Associate Director Amsterdam Data Science
Amsterdam Data Science
Objective and Subjective data
Image data
Numeric data
Geographic data
Structured data
Unstructured data
Temporal data
Textual data
Open data
Geo location
• Amsterdam, Netherlands
Exif
• Camera: Nikon N60
• Focal length: 55 mm
• Exposure time: 1/200
• Flash: off
Author
• josemanuelerre (Flickr)
• José Manuel Ríos Valiente
Tags
• cyclist
• bike
• street
Comments
• “I love Amsterdam! great photo!”
• “Great compostion, beautiful B&W!!”
• “Estupendo B&N, bella imagen.”
...
Data Sources

• “Koningsdag, or ‘King’s Day,’ is one of the principal holidays of the Netherlands...”
• In this case, the image says more than the text
Photo: quantz @ Flickr
Data Sources
Objective and Subjective data
Open data
+ Content Analysis
WHAT DOES IT BRING?
Professional Recommender Systems
Recommender system for tourists
Touristic Routing

City Sentiment City Marketing Analytics
ALGORITHMS
Ranking of data
Some query defines the starting point and the order
Result: ranked from best to worst
An image/video/text collection
For Social Media
• The Ranking can be based on
– The objective content of the comments
– The subjective content of the comments
– The objective visual content
– The subjective visual content
– ………
• Or any combination of the above
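A combination of rankings can be sketched as a weighted sum of per-source scores. This is only an illustration: the source names, the weights and the scores below are hypothetical, and the talk does not say which fusion rule is used.

```python
from typing import Dict, List


def fuse_rankings(scores: Dict[str, Dict[str, float]],
                  weights: Dict[str, float]) -> List[str]:
    """Rank items by a weighted average of per-source scores, best first."""
    items = set()
    for per_item in scores.values():
        items.update(per_item)
    total_weight = sum(weights.values())
    combined = {
        item: sum(weights[src] * scores[src].get(item, 0.0) for src in weights)
        / total_weight
        for item in items
    }
    # Highest combined score first; ties are broken by item id.
    return sorted(combined, key=lambda item: (-combined[item], item))


# Hypothetical scores in [0, 1] for three photos from two sources.
scores = {
    "objective_text": {"a": 0.9, "b": 0.2, "c": 0.5},
    "subjective_visual": {"a": 0.1, "b": 0.6, "c": 0.6},
}
ranking = fuse_rankings(scores, {"objective_text": 0.5, "subjective_visual": 0.5})
print(ranking)
```

Changing the weights shifts the ranking towards one source, which is one simple way to realise "any combination of the above".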
Concept detection
Visual examples (positive, negative) -> learn model
Unknown images -> score of presence -> ranking
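The learn, score, rank flow can be sketched with a very small linear scorer. The nearest-centroid rule below is a stand-in chosen for brevity, not the detectors used in the talk, and the two-dimensional feature vectors are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def mean(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a list of equally long vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def learn_detector(positives: List[Vector],
                   negatives: List[Vector]) -> Tuple[List[float], float]:
    """Learn a linear scorer whose boundary lies midway between class means."""
    mu_pos, mu_neg = mean(positives), mean(negatives)
    w = [p - q for p, q in zip(mu_pos, mu_neg)]
    midpoint = [(p + q) / 2 for p, q in zip(mu_pos, mu_neg)]
    b = -sum(wi * mi for wi, mi in zip(w, midpoint))
    return w, b


def presence_score(w: List[float], b: float, x: Vector) -> float:
    """Higher means the concept is more likely present."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Hypothetical features of annotated examples and of unknown images.
w, b = learn_detector(positives=[[1.0, 0.9], [0.8, 1.0]],
                      negatives=[[0.1, 0.2], [0.0, 0.1]])
unknown: Dict[str, Vector] = {
    "img1": [0.9, 0.8], "img2": [0.2, 0.1], "img3": [0.5, 0.6],
}
ranked = sorted(unknown, key=lambda name: -presence_score(w, b, unknown[name]))
print(ranked)
```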

Zebu
Requires annotation to learn
Animals, People, Lions, Lemurs
What do we learn?
14,197,122 images, 21,841 synsets indexed
1200 trained visual concept detectors for adjective-noun pairs
The new trend: Deep learning
Krizhevsky, NIPS 2012
Start with raw pixels, learn all parameters
The learned filters
Zeiler and Fergus
The layered network
Krizhevsky, NIPS 2012
Convolution + pooling + fully connected layers + output layers
60,000,000 parameters to learn
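Where the parameters come from can be sketched by counting weights and biases per layer. The layer sizes below are hypothetical and far smaller than the network on the slide; pooling layers are left out because they have no learnable parameters.

```python
def conv_params(kernel_h: int, kernel_w: int,
                in_channels: int, out_channels: int) -> int:
    """Weights plus one bias per output filter of a convolution layer."""
    return (kernel_h * kernel_w * in_channels + 1) * out_channels


def dense_params(in_features: int, out_features: int) -> int:
    """Weights plus one bias per output unit of a fully connected layer."""
    return (in_features + 1) * out_features


# Hypothetical toy network: two 3x3 convolutions and one output layer.
layers = [
    conv_params(3, 3, 3, 16),        # RGB input -> 16 filters
    conv_params(3, 3, 16, 32),       # 16 -> 32 filters
    dense_params(32 * 8 * 8, 10),    # flattened 8x8x32 map -> 10 classes
]
total = sum(layers)
print(layers, total)
```

Even in this toy case most parameters sit in the fully connected layer.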
But what do all these layers do?

Visualizing deep networks
Zeiler and Fergus
State-of-the-art: GoogLeNet
and growing ...
Makes image search keyword-driven
Text Analysis
D. Blei, 2003
Latent Dirichlet Allocation

• Generative model, discovers topics and scores them
• 100 topics are enough to cover the entire Wikipedia sufficiently
• Input: raw text
• Output: topic scores per document
0.054*mexico + 0.049*forest + 0.024*argentina + 0.022*islands + ... + 0.014*aires
We treat comments or sets of tags as documents
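Treating a set of tags as a document and scoring it per topic can be sketched as follows. This is not LDA inference: the topic-word weights are taken as given, the first topic reuses the weights shown above, and the second topic and its name are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Topic -> word weights. The "cycling" topic is a hypothetical example.
topics: Dict[str, Dict[str, float]] = {
    "travel_latin_america": {"mexico": 0.054, "forest": 0.049,
                             "argentina": 0.024, "islands": 0.022,
                             "aires": 0.014},
    "cycling": {"bike": 0.06, "cyclist": 0.05, "street": 0.03},
}


def topic_scores(tokens: List[str],
                 topics: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Sum the topic weight of every token, then normalise over topics."""
    counts = Counter(tokens)
    raw = {
        name: sum(weights.get(word, 0.0) * n for word, n in counts.items())
        for name, weights in topics.items()
    }
    total = sum(raw.values())
    if total == 0:
        return {name: 0.0 for name in raw}
    return {name: value / total for name, value in raw.items()}


# The tags of one photo, treated as a single document.
doc_scores = topic_scores(["cyclist", "bike", "street"], topics)
print(doc_scores)
```

A real LDA implementation learns the topic-word weights from the collection itself instead of taking them as input.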
VENUE RECOMMENDER
• Venue recommendation: suggesting places of interest (venues) based on user preferences
• The classic approach is collaborative filtering, using the user-item matrix
The task
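The classic approach can be sketched as item-based collaborative filtering on a binary user-item matrix. The matrix is hypothetical, and cosine similarity between venue columns is one common choice among several.

```python
import math
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(matrix: List[List[int]], user: int, k: int = 1) -> List[int]:
    """Recommend unvisited venues similar to the ones the user visited."""
    n_items = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_items)]
    visited = [j for j in range(n_items) if matrix[user][j]]
    scores = {
        j: sum(cosine(columns[j], columns[v]) for v in visited)
        for j in range(n_items) if not matrix[user][j]
    }
    return sorted(scores, key=lambda j: (-scores[j], j))[:k]


# Hypothetical visits: rows are users, columns are venues.
matrix = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
]
print(recommend(matrix, user=0, k=2))
```

This approach needs visit data for every venue, which is what a content-based system avoids.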
• City Melange: a venue explorer utilizing multimedia analytics techniques
• Content-based: based solely on the content of venue-related social media
• Multimodal: combining content from images and the associated text
• Interactive: user preferences are modelled on the fly as you explore the city
• Cross Platform: integrates data from diverse social platforms
City Melange Characteristics
Venue information
Venue images
Images, metadata
User data
Q(venue name,geo)
Data Gathering

Data Analysis
Content: images (V), tags, comments, ... (T) of venues and users (U)
Features: visual features (VF) from a ConvNet, text features (TF) from LDA
Clustering
Processed data:
• Visual venue topics
• Visual user topics
• Text venue topics
• Text user topics
• User-venue matrix (UV)
• ACM Multimedia Grand Challenge 2014, 1st Prize
• newyorkermelange.com
Interactive Recommendation
Inputs: venue topics and user topics (visual and text), users U, user-venue matrix UV
• Grid: relevant venues selected from the grid give the positives (their visual and text topics)
• Negatives: a random sample
• User ranking: a linear SVM ranks the users, giving the suggested users US
• Venue ranking: gives the suggested venues VS
• Suggestions (US, VS) are shown on the map with a relevance indication
Recommender system for tourists
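Ranking with a linear SVM trained on selected positives and randomly sampled negatives can be sketched as follows. The topic vectors, the item names and the hyperparameters are hypothetical, and the small hinge-loss trainer stands in for a proper SVM solver.

```python
import random
from typing import Dict, List, Sequence, Tuple


def decision(w: List[float], b: float, x: Sequence[float]) -> float:
    """Signed distance-like score of x with respect to the boundary."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_svm(X: List[Sequence[float]], y: List[int], epochs: int = 100,
                     lr: float = 0.1, reg: float = 0.01) -> Tuple[List[float], float]:
    """Stochastic subgradient descent on the L2-regularised hinge loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            if label * decision(w, b, x) < 1:   # inside the margin: push away
                w = [(1 - lr * reg) * wi + lr * label * xi for wi, xi in zip(w, x)]
                b += lr * label
            else:                               # correct: only shrink weights
                w = [(1 - lr * reg) * wi for wi in w]
    return w, b


def suggest(positives: List[Sequence[float]], pool: Dict[str, Sequence[float]],
            n_negatives: int, seed: int = 0) -> Tuple[List[str], List[str]]:
    """Sample random negatives from the pool, train, rank the remaining items."""
    negatives = random.Random(seed).sample(sorted(pool), n_negatives)
    X = list(positives) + [pool[name] for name in negatives]
    y = [1] * len(positives) + [-1] * n_negatives
    w, b = train_linear_svm(X, y)
    candidates = [name for name in sorted(pool) if name not in negatives]
    return sorted(candidates, key=lambda n: -decision(w, b, pool[n])), negatives


# Hypothetical two-topic vectors; the positives resemble "v5".
pool = {"v3": [0.1, 0.9], "v4": [0.2, 0.8], "v5": [0.7, 0.3],
        "v6": [0.15, 0.85], "v7": [0.05, 0.95]}
ranking, negatives = suggest([[0.9, 0.1], [0.8, 0.2]], pool, n_negatives=2)
print(ranking, negatives)
```

Random negatives are cheap but noisy: an item that resembles the positives can end up in the negative sample, which is why it is redrawn in an interactive setting.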
1. Can we recommend the right type of venue?
2. Can we recommend mainstream venues to mainstream tourists and specialized venues to aficionados?
Evaluation
• 621 fine-grained venue types (Japanese restaurant, skate park, ...)
• 100 artificial actors, use 75% of the data to seed Melange
• Perform 10 interaction rounds
Evaluation
• City Melange
– Visual modality only
– Text modality only
– Multimedia (vis + txt)
• Recommender baselines
– WRMF: weighted regularized matrix factorization
– BPRMF: Bayesian personalized ranking matrix factorization
• Popularity ranking (PopRank): most visited venues according to Foursquare
Methods Compared
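The popularity baseline is the simplest of the compared methods and can be sketched in a few lines. The visit log is hypothetical; in the talk the visit counts come from Foursquare.

```python
from collections import Counter
from typing import List


def pop_rank(visits: List[str], top_n: int) -> List[str]:
    """Return the top_n most visited venues, the same list for every user."""
    counts = Counter(visits)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [venue for venue, _ in ordered[:top_n]]


# Hypothetical visit log: one entry per visit.
visits = ["museum", "park", "museum", "cafe", "park", "museum"]
print(pop_rank(visits, top_n=2))
```

Because it ignores the user entirely, this baseline can only serve mainstream taste.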
• New York: 1.07M images and associated text from Foursquare, Flickr, and Picasa
• Amsterdam: 56K images and associated text from Foursquare and Flickr
Data Collection

[Figure: venue type precision and venue type recall per interaction round (1 to 10), for New York and for Amsterdam; methods: poprank, bprmf, wrmf, melange_vis, melange_txt, melange_mm]
[Figure: Distribution of recommendations: density difference against the true user-venue distribution density, for melange mm, wrmf, bprmf, poprank]
TOURIST ROUTING

SceneMash
• Data collection
– 150,000 geotagged Flickr and Foursquare images from the region of Amsterdam
– Metadata associated with the images: image title, description, tags, geotags
Demo
CITY SENTIMENT
Data Collection
64K geotagged tweets with images
Various neighborhood statistics (17 variables)
64K geotagged images and comments
Amsterdam Neighborhoods

Methodology
Sentiment Maps: sentiment analysis of textual and visual content
Finding correlations: sentiment analysis of textual and visual content, related to various statistics
Correlation Analysis
Correlations
Flickr Twitter
Correlations are only found with multimodal sentiment
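Relating neighborhood sentiment to a neighborhood statistic can be sketched with a correlation coefficient. The choice of Pearson correlation is an assumption, since the talk does not name the measure, and the numbers are hypothetical.

```python
import math
from typing import List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical per-neighborhood values: mean sentiment and one statistic.
sentiment = [0.2, 0.4, 0.6, 0.8]
statistic = [10.0, 20.0, 30.0, 40.0]
r = pearson(sentiment, statistic)
print(round(r, 3))
```

With 17 statistics per neighborhood, the same function would be applied once per statistic and per sentiment source.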
Redefined Neighborhoods
People with similar social media interests

MARKETING ANALYTICS
WHAT WE HAVE
“The purpose of computing is insight, not numbers.” Richard Hamming, 1962
So what do we want?
Insight
What is insight?
Insight
• Complex: Insight is complex, involving all or large amounts of the given data in a synergistic way, not simply individual data values.
• Deep: Insight builds up over time, accumulating and building on itself to create depth, often generating further questions and, hence, further insight.
• Qualitative: Insight is not exact, can be uncertain and subjective, and can have multiple levels of resolution.
• Unexpected: Insight is often unpredictable, serendipitous, and creative.
• Relevant: Insight is deeply embedded in the data domain, connecting the data to existing domain knowledge and giving it relevant meaning, going beyond dry data analysis to relevant domain impact.
North, CG&A 2006
“Computers are incredibly fast, accurate, and stupid. Humans are incredibly slow, inaccurate and brilliant. The marriage of the two is beyond imagination.” Leo Cherne, 1968

Visual Analytics
• Combine the power of computer and human
• Compute power
• Storage capacity
• Flexibility
• Creativity
• Expert knowledge
Definition
Multimedia Analytics = Multimedia Analysis + Visual Analytics
Ref: Chinchor2010
Multimedia Analytics
INSIGHT
Analytics
• What is the best-known analytics tool?
Yes, the spreadsheet
Analytics
Fischer et al., TVCG 2010
MediaTable
Columns denote concept scores and can be used for sorting
Colors denote categories, and buckets are used to collect elements of a (sub-)category
Heatmap-like visualization: grey values denote values between 0 and 1
Allows correlations to be seen
Filters and sort order can be specified
Refs: deRooij2010b, deRooij2013

Multimedia Pivot Tables
ROW VARIABLE: Decompose
FILTER VARIABLES: Define active data set
COLUMN VARIABLES: Sort and Weight
Variable types: Concepts, Tags, Nominals, Integers
COLUMN AGGREGATION
ROW AGGREGATION
VALUE
Visualizations
Type | Filter | Column | Row | Value visualization
Images | Selection to bucket | x | Individual images | Sorted list of images
Nominal | Label selection | x | Individual labels | Sorted and weighted text histogram
Buckets | Bucket selection | x | Individual buckets | Weighted histogram
Geo | Selection to bucket | x | x | Map with weighted elements
Numeric | Range selection | Weights | 7-point summary | Sum, max, avg, weighted distribution
Concepts | Range selection | Weights | 7-point summary | Weighted distribution
Tags | Tag selection | Weights | Individual tags | Sorted and weighted tag histogram
Statistics-driven decomposition
Column aggregation
Row aggregation
Top-N concepts
Row-specific concepts
Concept-based sorting
Relevance-based sorting

BM-25 BASED RANKING
Demo
https://staff.fnwi.uva.nl/m.worring/pivot-tables.html
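BM25-based ranking can be sketched with the standard Okapi BM25 formula applied to tag lists. The parameter values k1 = 1.2 and b = 0.75, the smoothed IDF variant and the example documents are assumptions; the slides do not specify how BM25 is configured in the demo.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of every document for the query terms."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative for frequent terms.
            idf = math.log((n_docs - doc_freq[term] + 0.5)
                           / (doc_freq[term] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


# Hypothetical tag lists of three images.
docs = [["bike", "canal", "amsterdam"],
        ["museum", "art"],
        ["bike", "bike", "street", "cyclist"]]
scores = bm25_scores(["bike"], docs)
print(scores)
```

Term frequency saturates through k1 and long documents are penalised through b, so repeating a tag helps less and less.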
Learning from interaction
Employing user interaction
pos
neg
Selection of pos/neg examples
Some elements in the collection are labeled
Many are not

Employing user interaction
[Diagram: User, Pool-Query Set, Labeled Resultant set, Learning Algorithm, Interactive Learning Strategy]
Active Learning
Chen in 2005 was the first to explore this for Video Retrieval
Relevance feedback
Ref: Huang2008
Relevance feedback
Try to find the boundary in feature space (F1, F2) that best separates positive from negative examples
Measure of class membership probability
In the next iteration I will have more samples, hence a better estimate of the boundary
This process is usually known as relevance feedback
Active Learning
In active learning the system decides which elements to show for feedback and which not.
For some samples it is relevant for the system to know the label
For others the system can safely assume the sample is also negative
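Deciding which elements to show can be sketched with uncertainty sampling: ask for labels of the elements closest to the current boundary and assume the label of elements far on the negative side. Uncertainty sampling is one common strategy, not necessarily the one in the talk; the boundary, the margin and the feature vectors are hypothetical.

```python
from typing import Dict, List, Sequence


def decision(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Signed score: positive side above 0, negative side below 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def select_for_feedback(w: Sequence[float], b: float,
                        unlabeled: Dict[str, Sequence[float]],
                        n: int = 1) -> List[str]:
    """Pick the n unlabeled elements closest to the boundary."""
    return sorted(unlabeled,
                  key=lambda name: (abs(decision(w, b, unlabeled[name])), name))[:n]


def confident_negatives(w: Sequence[float], b: float,
                        unlabeled: Dict[str, Sequence[float]],
                        margin: float = 0.5) -> List[str]:
    """Elements far enough on the negative side to be assumed negative."""
    return [name for name in sorted(unlabeled)
            if decision(w, b, unlabeled[name]) < -margin]


# Hypothetical current boundary and unlabeled elements in (F1, F2).
w, b = [1.0, -1.0], 0.0
unlabeled = {"a": [0.9, 0.1], "b": [0.55, 0.45], "c": [0.1, 0.9]}
to_show = select_for_feedback(w, b, unlabeled, n=1)
assumed_negative = confident_negatives(w, b, unlabeled)
print(to_show, assumed_negative)
```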
Automatic AND interactive
SVM based relevance feedback
Interactive categorization
Three interactive strategies
• Fully interactive
– User is interactively performing the sort/select/categorize process
• Manual relevance feedback
– In addition to the above, the user can perform relevance feedback on any of the categories
• Unobtrusive relevance feedback
– In addition to the above, the system automatically indicates new potentially relevant elements

Fully interactive
On demand suggestions
After categorizing some elements: learn and apply model for user-selected bucket
Uncategorized images -> category suggestions
Unobtrusive assistance
Continuously observe what happens: learn and apply model for system-selected bucket
Uncategorized images -> category suggestions
Results: elements found
• significant at the p=0.01 level compared to baseline
o significant at the p=0.01 level compared to manual
Task 1: specific, high visual similarity
Task 2: generic visually diverse, concept available
Task 3: generic visually diverse, concept available
Task 4: generic visually diverse, no concept available
SCALABILITY

[Zahálka and Worring, VAST 2014] B.P. Jonsson et al., MMM 2016
WRAP-UP
Objective and Subjective data
Image data
Numeric data
Geographic data
Structured data
Unstructured data
Temporal data
Textual data
Open data
The applications
The Algorithms
And its variations

www.amsterdamdatascience.nl
m.worring@uva.nl
