3. The CUbRIK project
A 36-month large-scale integrating project, partially funded by the European Commission’s 7th Framework ICT Programme for Research and Technological Development
www.cubrikproject.eu
5/17/2012 SMILA Themenkonferenz 2
4. Objectives
The technical goal of CUbRIK is to build an open
search platform grounded on four objectives:
Advance the architecture of multimedia search
Place humans in the loop
Open the search box
Start up a search business ecosystem
5. Objective: Advance the architecture of multimedia search
Multimedia search is the coordinated result of three main processes:
Content processing: acquisition, analysis, indexing and knowledge extraction from multimedia content
Query processing: derivation of an information need from a user and production of a sensible response
Feedback processing: quality feedback on the appropriateness of search results
6. Objective: Advance the architecture of multimedia search
Objective:
The content processing, query processing and feedback processing phases will be implemented as independent components
Components are organized into pipelines
Each application defines ad hoc pipelines that provide multimedia search capabilities tailored to its scenario
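The pipeline idea above can be sketched as follows. This is a minimal illustration, assuming each component is a function that takes and returns a record (a dict); the `Pipeline` class and the `acquire`, `analyse` and `index` components are hypothetical and do not reflect the actual SMILA or CUbRIK APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical record representation: a plain dict of metadata.
Record = Dict[str, object]
Component = Callable[[Record], Record]


@dataclass
class Pipeline:
    """An ordered chain of independent components (illustrative only)."""
    name: str
    components: List[Component] = field(default_factory=list)

    def add(self, component: Component) -> "Pipeline":
        self.components.append(component)
        return self  # returning self allows chained add() calls

    def run(self, record: Record) -> Record:
        # Each component receives the output of the previous one.
        for component in self.components:
            record = component(record)
        return record


# Hypothetical content-processing components.
def acquire(record: Record) -> Record:
    return {**record, "frames": ["f1", "f2"]}


def analyse(record: Record) -> Record:
    return {**record, "features": [f.upper() for f in record["frames"]]}


def index(record: Record) -> Record:
    return {**record, "indexed": True}


content_pipeline = Pipeline("content").add(acquire).add(analyse).add(index)
result = content_pipeline.run({"id": "clip-1"})
print(result["features"], result["indexed"])  # ['F1', 'F2'] True
```

Because each component depends only on the record it receives, an application can assemble a different ad hoc pipeline simply by chaining a different set of components.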
8. SMILA is the backbone of CUbRIK
CUbRIK uses the SMILA framework as a start-up service engine to support workflow definition and execution
CUbRIK provides architectural extensions to SMILA for enhanced services:
Extensible search workflows for content, query and feedback processing
Multimodality; orchestration of human and machine computation tasks in all search processes
Time and space awareness
Support for social and human computation
Persistence and caching of content and metadata
Support for federated configurations across a distributed architecture
Different user interface styles for queries and for the presentation of search results
CUbRIK also includes tools and methods for application design
6 March 2012 The CUbRIK Project is .... 7
9. Objective: Humans in the loop
Problem: the uncertainty of analysis algorithms leads to low-confidence results and conflicting opinions about automatically extracted features
Solution: exploit the superior capacity of humans to understand the content of audiovisual material
State of the art: humans replace automatic feature extraction processes (human annotations)
Our contribution: integration of human judgment and algorithms
Goal: improve the performance of multimedia content processing
10. Example of CUbRIK human-enhanced computation: Trademark Logo Detection
Problem statement: identifying occurrences of trademark logos in a video collection through keyword-based queries
A special case of the classic object recognition problem
Use case: a professional user wants to retrieve all occurrences of logos in a large collection of video clips
Applications: rating the effectiveness of advertising, detecting subliminal advertising, automatic annotation, detecting trademark violations
11. Human-powered trademark logo detection demo
Goal: integrate human and automatic computation to increase precision and recall with respect to fully automatic solutions
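Precision and recall, the two measures this goal refers to, can be computed as below. The sketch assumes detections and ground truth are sets of hypothetical frame IDs; it illustrates the standard definitions and is not the evaluation code used in the demo.

```python
from typing import Set, Tuple


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """precision = |retrieved & relevant| / |retrieved|, recall = |retrieved & relevant| / |relevant|."""
    hits = len(retrieved & relevant)
    # Returning 0.0 for an empty set is a convention chosen for this sketch.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical frames where a logo was detected vs. frames where it really appears.
detected = {"f1", "f2", "f3", "f4"}
ground_truth = {"f2", "f3", "f5"}
p, r = precision_recall(detected, ground_truth)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```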
12. Trademark Logo Detection: problems in automatic logo detection
Problems in automatic logo detection:
Object recognition is affected by the quality of the input set of images
Uncertain matches, i.e., those with a low matching score, may not contain the searched logo
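One way to act on uncertain matches is to route them to human validation rather than accept them outright. The sketch below assumes a matching score in [0, 1] and a hypothetical threshold of 0.6; the slides do not state how CUbRIK defines a low score.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Match:
    frame_id: str
    logo: str
    score: float  # matching score from the automatic recognizer, assumed in [0, 1]


# Hypothetical cut-off: the slides do not specify how low a score must be.
UNCERTAINTY_THRESHOLD = 0.6


def split_matches(
    matches: List[Match], threshold: float = UNCERTAINTY_THRESHOLD
) -> Tuple[List[Match], List[Match]]:
    """Return (confident, uncertain); uncertain matches would go to humans for validation."""
    confident = [m for m in matches if m.score >= threshold]
    uncertain = [m for m in matches if m.score < threshold]
    return confident, uncertain


matches = [Match("f1", "Aleve", 0.92), Match("f2", "Aleve", 0.41), Match("f3", "Shout", 0.65)]
confident, uncertain = split_matches(matches)
print([m.frame_id for m in confident], [m.frame_id for m in uncertain])  # ['f1', 'f3'] ['f2']
```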
13. Trademark Logo Detection: contribution of human computation
Contribution of human computation:
Filter the input logos, eliminating the irrelevant ones
Segment the input logos
Validate the matching results
16. CrowdSearch framework in the logo detection application
[Figure: a problem-solving process made of tasks, some of which are crowd tasks]
Types of tasks:
• Automatic tasks
• Crowd tasks: tasks that are executed by an open-ended community of performers
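The distinction between automatic and crowd tasks can be illustrated with a small process model. This is a hypothetical sketch, not the CrowdSearch API: the `Task` type, the majority-vote crowd filter and the simulated performers are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List

Items = List[str]


class TaskType(Enum):
    AUTOMATIC = "automatic"  # executed by a machine component
    CROWD = "crowd"          # executed by an open-ended community of performers


@dataclass
class Task:
    name: str
    kind: TaskType
    execute: Callable[[Items], Items]


def make_crowd_filter(performers: List[Callable[[str], bool]]) -> Callable[[Items], Items]:
    """Hypothetical crowd task: keep an item if a strict majority of performers accept it."""
    def execute(items: Items) -> Items:
        return [i for i in items if sum(p(i) for p in performers) * 2 > len(performers)]
    return execute


def run_process(tasks: List[Task], items: Items) -> Items:
    """A problem-solving process: tasks run in sequence, whatever their type."""
    for task in tasks:
        items = task.execute(items)
    return items


# Simulated performers standing in for real people answering a question.
performers = [lambda i: "logo" in i, lambda i: i.endswith(".png"), lambda i: "logo" in i]
process = [
    Task("fetch", TaskType.AUTOMATIC, lambda _: ["logo1.png", "cat.jpg", "logo2.jpg"]),
    Task("filter", TaskType.CROWD, make_crowd_filter(performers)),
]
print(run_process(process, []))  # ['logo1.png', 'logo2.jpg']
```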
17. Community of Performers
[Figure: content elements connected by content edges (e.g., IS-A, part-of); performers connected by performer edges (e.g., friendship, weak ties); performer-to-content edges (e.g., topical group membership)]
The application is deployed as a Facebook application
Seed community: Information Technology department of Politecnico di Milano
Task propagation: each user in the seed community can propagate tasks through the social networks
18. Design of “Validate Logo Images”
The “LIKE” task variant asks performers to choose the relevant logos from a set of unfiltered images
The “ADD” task variant asks performers to add URLs of new relevant images
[Figure: Human Task Design; the ADD form reads “Please add new relevant logos” and has a URL field and a Send button]
19. People-to-task matching & task assignment
People-to-task matching: task deployment criteria
• Execution criteria: constraints of task execution (time budget for the experiment)
• Content affinity criteria: a query on a representation of the users’ capacities
  • Current state: manual selection of users
  • Future work: geocultural affinity
Task assignment: questions are dispatched to the crowd according to each user’s experience in answering questions
• Expert user: a user who has already answered three questions
• New users answer “LIKE” questions
• Expert users answer “LIKE” + “ADD” questions
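The assignment rule above can be expressed as a small function. The three-answer threshold comes from the slide; treating “three or more” answers as expert status, and the user IDs, are assumptions.

```python
from typing import Dict, List

# From the slide: an expert has already answered three questions.
# Treating "three or more" answers as expert status is an assumption.
EXPERT_THRESHOLD = 3


def allowed_task_variants(answers_given: int) -> List[str]:
    """New users get LIKE questions; expert users get LIKE and ADD questions."""
    if answers_given >= EXPERT_THRESHOLD:
        return ["LIKE", "ADD"]
    return ["LIKE"]


def assign(history: Dict[str, int]) -> Dict[str, List[str]]:
    """Map each (hypothetical) user ID to the task variants they may receive."""
    return {user: allowed_task_variants(n) for user, n in history.items()}


history = {"alice": 0, "bob": 3, "carol": 5}
print(assign(history))
# {'alice': ['LIKE'], 'bob': ['LIKE', 'ADD'], 'carol': ['LIKE', 'ADD']}
```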
20. Task propagation
Propagation over the Facebook graph:
Platform: CrowdSearcher
Automatic task generation starting from a set of design
criteria (e.g., question type, public/private…)
Seed community: Information Technology
department of Politecnico di Milano
Each user in the seed community can propagate tasks
through the social networks
Work in progress:
Twitter/LinkedIn tasks
Task assignment according to expertise, geocultural information and past work history
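Propagation from the seed community over a social graph can be sketched as a breadth-first traversal. The friendship graph, user IDs and hop limit below are hypothetical; the slides do not say how far CrowdSearcher lets a task spread.

```python
from collections import deque
from typing import Dict, List, Set


def propagate(graph: Dict[str, List[str]], seeds: Set[str], max_hops: int) -> Dict[str, int]:
    """Breadth-first spread of a task from the seed community.

    Returns the hop distance at which each user was reached (seeds are at 0).
    The hop limit is a hypothetical parameter, not taken from CrowdSearcher.
    """
    reached = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        user = queue.popleft()
        if reached[user] == max_hops:
            continue  # this user does not forward the task any further
        for friend in graph.get(user, []):
            if friend not in reached:
                reached[friend] = reached[user] + 1
                queue.append(friend)
    return reached


friendships = {"seed1": ["u1", "u2"], "u1": ["u3"], "u3": ["u4"]}
print(propagate(friendships, {"seed1"}, max_hops=2))
# {'seed1': 0, 'u1': 1, 'u2': 1, 'u3': 2}
```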
5/17/2012 CUbRIK Pipelines 1 19
22. Output aggregation
[Figure: task execution produces task outputs, which feed output aggregation]
“LIKE” task variants: the top-5 rated logos are selected as relevant logos
“ADD” task variants: new images are fed back to the LIKE tasks
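The two aggregation rules can be sketched as follows. The top-5 selection follows the slide; counting one vote per performer per image, breaking ties by first appearance, and the image IDs are assumptions for illustration.

```python
from collections import Counter
from typing import Iterable, List

TOP_K = 5  # from the slide: the top-5 rated logos are selected


def aggregate_like(answers: Iterable[List[str]], k: int = TOP_K) -> List[str]:
    """Each LIKE answer lists the images one performer marked as relevant.

    dict.fromkeys() drops repeats within one answer while keeping order, so a
    performer votes at most once per image; ties keep first-appearance order.
    """
    votes = Counter(img for answer in answers for img in dict.fromkeys(answer))
    return [img for img, _ in votes.most_common(k)]


def feed_back_add(candidates: List[str], added: Iterable[str]) -> List[str]:
    """ADD answers extend the candidate pool shown in later LIKE tasks."""
    pool = list(candidates)
    for url in added:
        if url not in pool:  # skip images that are already candidates
            pool.append(url)
    return pool


# Hypothetical LIKE answers over images a..f.
answers = [["a", "b", "c", "d"], ["a", "b", "c", "e"], ["a", "b", "d"], ["a", "e", "f"]]
print(aggregate_like(answers))  # ['a', 'b', 'c', 'd', 'e']
print(feed_back_add(["a", "b"], ["b", "x"]))  # ['a', 'b', 'x']
```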
23. Experimental evaluation
Three experimental settings:
No human intervention
Logo validation performed by two domain experts
Inclusion of the actual crowd knowledge
Crowd involvement
40 people involved
50 task instances generated
70 collected answers
25. Experimental evaluation
[Figure: recall (y-axis) vs. precision (x-axis) for the Aleve, Chunky and Shout logos under the No Crowd, Experts and Crowd settings]
Precision decreases
Reasons for the wrong inclusions:
• Geographical location of the users
• Expertise of the involved users
26. Experimental evaluation
[Figure: recall (y-axis) vs. precision (x-axis) for the Aleve, Chunky and Shout logos under the No Crowd, Experts and Crowd settings]
Precision decreases
• Similarity between two logos in the data set
27. Crowdsourced filtering of logos – Problem concept
[Figure: logos from Google Images go through Filter tasks to produce filtered logos; Add tasks produce added logos]
28. Integration in SMILA
The demo has been integrated into the SMILA
architecture
Two main parts:
Indexing part: made of asynchronous components (in the SMILA sense)
Indexing of videos
Matching phase
Interaction with the crowd
Search part: end users query the system with keyword-based queries
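The split between asynchronous indexing and keyword search can be illustrated with Python's asyncio. This is only an analogy under stated assumptions: SMILA itself is a Java framework, and none of the names below (`indexing_worker`, `search`, the in-memory index) are SMILA APIs.

```python
import asyncio
from contextlib import suppress
from typing import Dict, List, Set, Tuple

# In-memory stand-in for the index: logo name -> IDs of videos where it was matched.
Index = Dict[str, Set[str]]


async def indexing_worker(queue: asyncio.Queue, index: Index) -> None:
    """Asynchronous indexing: consume (video_id, matched_logos) items as they arrive."""
    while True:
        video_id, logos = await queue.get()
        for logo in logos:
            index.setdefault(logo.lower(), set()).add(video_id)
        queue.task_done()


def search(index: Index, keyword: str) -> List[str]:
    """Keyword-based search answered directly from the index."""
    return sorted(index.get(keyword.lower(), set()))


async def main() -> List[str]:
    index: Index = {}
    queue: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(indexing_worker(queue, index))
    videos: List[Tuple[str, List[str]]] = [("clip-1", ["Aleve"]), ("clip-2", ["Shout", "Aleve"])]
    for item in videos:
        await queue.put(item)
    await queue.join()  # wait until every queued video has been indexed
    worker.cancel()
    with suppress(asyncio.CancelledError):
        await worker
    return search(index, "aleve")


print(asyncio.run(main()))  # ['clip-1', 'clip-2']
```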
43. CUbRIK Showcases
CUbRIK will showcase its technology through demonstrators of innovation in two domains:
(Digital Libraries) History of Europe
(Business Processes) CUbRIK search for SMEs
Technical evaluation under real-world conditions, including users, will be based on these demonstrators
44. Thanks for your attention
www.cubrikproject.eu