Similarity-based retrieval of
multimedia content
Dr. Symeon Papadopoulos
Senior Researcher, CERTH-ITI
Monday Jan 28, 2019 @ Media AUTh
Our lab
Multimedia Knowledge and
Social Media Analytics Laboratory
• Part of Information Technologies Institute (ITI) -
Centre for Research and Technology Hellas (CERTH)
• 60+ researchers (20+ post-docs)
• key areas: multimedia, social media, computer vision,
data mining, machine learning
• applications: media, security, culture, environment
• involved in 60+ projects and published 600+ papers
https://mklab.iti.gr/
Related projects
InVID (2016-2018): https://www.invid-project.eu/
WeVerify (2018-2021): https://weverify.eu/
https://www.smartinsights.com/internet-marketing-statistics/happens-online-60-seconds/
500 hours of video per min =
720,000 hours per day >
82 years of video per day!
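The arithmetic behind these figures can be checked in a few lines. Only the rate of 500 hours per minute comes from the slide; the 365-day year is an assumption made for the conversion.

```python
# Upload rate quoted on the slide: 500 hours of video per minute.
HOURS_UPLOADED_PER_MINUTE = 500

MINUTES_PER_DAY = 60 * 24
hours_per_day = HOURS_UPLOADED_PER_MINUTE * MINUTES_PER_DAY

# Convert hours of footage into years of continuous viewing (365-day year).
years_per_day = hours_per_day / 24 / 365

print(hours_per_day)            # 720000
print(round(years_per_day, 1))  # 82.2
```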
Pope Francis
Pope Benedict
2007: iPhone release
2008: Android release
2010: iPad release
http://petapixel.com/2013/03/14/a-starry-sea-of-cameras-at-the-unveiling-of-pope-francis/
Detecting disinformation
Claim:
Hurricane Irma, Sep 2017
Fact:
Hurricane Dolores, May 2016
A shark thriving in hurricanes
https://www.snopes.com/photos/animals/puertorico.asp
Memes
Similarity-based media search
Two main problems
• How to compute similarity between two items (in accordance with my needs)?
• How to search (using the above similarity function) in very large collections in
reasonable time?
visual similarity
an overview of approaches
What is similar?
• Variety of definitions and understandings regarding what
can be considered to be similar
• Near-duplicate videos: definition by Wu et al. (2007)
• photometric variations: gamma, contrast, brightness, etc.
• editing operations: resize, shift, crop, flip
• insertion of patterns: caption, logo, subtitles, sliding captions, etc.
• re-encoding: video format, compression
• video modifications: frame rate, frame insertion, deletion, swap
X. Wu, A. G. Hauptmann, and C. W. Ngo. Practical elimination of near-duplicates from web video search. In
Proceedings of the 15th ACM international conference on Multimedia, pp. 218-227, 2007
Hashing
• Cryptographic or checksum hashing: MD5, SHA1
• Input: bitstream (not just images or videos)
• Output: hash code 128-bit (MD5), 160-bit (SHA1), etc.
• Property: minor changes in input can lead to completely
different hash codes
https://jenssegers.com/61/perceptual-image-hashes
Example
EA6BF04059B4CB0D889296F1788B321B
8435D4A072804237308F9566508C963C
http://onlinemd5.com/
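The sensitivity of cryptographic hashes to small input changes can be reproduced with Python's standard `hashlib` module. This is a minimal sketch: the two byte strings are made-up stand-ins for two almost identical image files, not data from the talk.

```python
import hashlib

# Two inputs that differ only in their final byte (illustrative stand-ins).
original = b"the same picture, byte for byte"
modified = b"the same picture, byte for bytf"

md5_a = hashlib.md5(original).hexdigest()
md5_b = hashlib.md5(modified).hexdigest()
sha1_a = hashlib.sha1(original).hexdigest()


def differing_bits(hex_a, hex_b):
    """Count the bit positions in which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


print(len(md5_a) * 4, len(sha1_a) * 4)  # digest sizes in bits: 128 160
print(md5_a != md5_b)                   # True
print(differing_bits(md5_a, md5_b))     # typically close to half of the 128 bits
```

Because the digests of near-identical inputs are unrelated, such hashes can only detect exact copies, which motivates the perceptual hashes that follow.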
Perceptual hashing
• Generate a fingerprint that can be used to compare
images using the Hamming Distance
• Instance: Average Hashing (aHash)
• Reduce size → 8x8 pixels
• Reduce colour → RGB to grayscale
• Calculate average colour → among 64 grayscale values
• Compute hash → for each pixel, binary value depending
on whether it is higher or lower than average
→ 64-bit signature
aHash: example
11001001011010010011110000011000
00001000000000000000011100111111
https://jenssegers.com/61/perceptual-image-hashes
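The aHash steps listed earlier can be sketched in pure Python. The assumptions here are not from the slides: images are lists of rows of (R, G, B) tuples at least 8x8 in size, downscaling is done by block averaging, and grayscale uses the common BT.601 luma weights. Grayscale conversion is applied before shrinking; both steps are averages, so the order does not change the result.

```python
def to_gray(rgb):
    """RGB pixel -> grayscale value (BT.601 luma weights, an assumption)."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def shrink(gray, size=8):
    """Downscale a 2-D list of gray values to size x size by block averaging."""
    h, w = len(gray), len(gray[0])
    out = []
    for i in range(size):
        rows = range(i * h // size, (i + 1) * h // size)
        out_row = []
        for j in range(size):
            cols = range(j * w // size, (j + 1) * w // size)
            block = [gray[r][c] for r in rows for c in cols]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


def average_hash(rgb_image):
    """64-bit aHash: one bit per pixel of the 8x8 thumbnail (1 = above the mean)."""
    gray = [[to_gray(p) for p in row] for row in rgb_image]
    small = [v for row in shrink(gray) for v in row]
    mean = sum(small) / len(small)
    bits = 0
    for v in small:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


# Toy 16x16 image: dark left half, bright right half.
image = [[(30, 30, 30)] * 8 + [(200, 200, 200)] * 8 for _ in range(16)]
brighter = [[tuple(c + 20 for c in p) for p in row] for row in image]
flipped = [row[::-1] for row in image]

print(hamming(average_hash(image), average_hash(brighter)))  # 0
print(hamming(average_hash(image), average_hash(flipped)))   # 64
```

A uniform brightness change shifts every pixel and the mean together, so the hash is unchanged, whereas a horizontal flip of this particular image inverts every bit.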
dHash and pHash
• dHash: Difference Hash
• same steps as aHash
• hash is generated based on whether left pixel is brighter
than the right one
• fewer false positives compared to aHash
• pHash: Perceptual Hash
• more complicated algorithm
• resize to 32x32
• DCT on luma (brightness) component
• top left 8x8 → hash by comparing to median value
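Both variants can be sketched in pure Python. The assumptions are labelled because the slides leave them open: the inputs are already resized grayscale grids, dHash uses an 8-row by 9-column grid so that the left/right comparisons yield 64 bits, and the DCT is a naive, unnormalised DCT-II that keeps all 64 low-frequency coefficients (including the DC term).

```python
import math
import random
from statistics import median


def dhash(gray_8x9):
    """Difference hash: bit is 1 when a pixel is brighter than its right neighbour."""
    bits = 0
    for row in gray_8x9:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def dct_top_left(gray, k=8):
    """Naive unnormalised 2-D DCT-II; only the k x k lowest frequencies are computed."""
    n = len(gray)  # assumes a square n x n input, e.g. 32 x 32
    cos = [[math.cos(math.pi * (2 * x + 1) * u / (2 * n)) for x in range(n)]
           for u in range(k)]
    return [[sum(gray[y][x] * cos[u][y] * cos[v][x]
                 for y in range(n) for x in range(n))
             for v in range(k)] for u in range(k)]


def phash(gray_32x32):
    """64-bit pHash sketch: compare each low-frequency coefficient to the median."""
    coeffs = [c for row in dct_top_left(gray_32x32) for c in row]
    med = median(coeffs)
    bits = 0
    for c in coeffs:
        bits = (bits << 1) | (1 if c > med else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


# dHash on a left-to-right gradient and on its mirror image.
increasing = [[10 * c for c in range(9)] for _ in range(8)]
decreasing = [row[::-1] for row in increasing]
print(dhash(increasing) == 0, dhash(decreasing) == 2 ** 64 - 1)  # True True

# pHash on a pseudo-random 32x32 image and on a uniformly brightened copy.
rng = random.Random(0)
img = [[rng.randrange(256) for _ in range(32)] for _ in range(32)]
brighter = [[v + 10 for v in row] for row in img]
print(hamming(phash(img), phash(brighter)))  # expected to be 0 or very small
```

A uniform brightness shift only affects the DC coefficient, which is why the pHash of the brightened copy is expected to stay essentially unchanged.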
pHash examples
Hamming distance = 0
Hamming distance = 24
Hamming distance = 29
Hamming distance = 27
https://www.phash.org/demo/ (select DCT hash)
Pixel-based similarity doesn’t
match perception
All three variations of the first image are equidistant
from it in terms of L2 pixel distance!
http://cs231n.github.io/classification/
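The effect can be reproduced on a toy example. The three perturbations below are illustrative choices, not the ones shown on the slide; they are constructed so that very different kinds of change yield exactly the same L2 distance from a flat 8x8 grayscale image.

```python
import math


def l2(a, b):
    """Euclidean (L2) distance between two images given as flat pixel lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


base = [100] * 64                                               # flat 8x8 image
brighter = [v + 4 for v in base]                                # all pixels slightly brighter
patched = [v + 16 if i < 4 else v for i, v in enumerate(base)]  # 4 pixels changed strongly
noisy = [v + (4 if i % 2 else -4) for i, v in enumerate(base)]  # alternating +/-4 noise

for name, img in [("brighter", brighter), ("patched", patched), ("noisy", noisy)]:
    print(name, l2(base, img))  # each line prints 32.0
```

A viewer would judge a faint global brightening, a localised patch and fine-grained noise very differently, yet the pixel distance cannot tell them apart.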
Global descriptors
• A single vector that attempts to capture the main
visual properties of an image, e.g. distribution of
colour, spatial layout of brightness, textures, etc.
• Popular choices include:
• GIST – spatial envelope (Oliva & Torralba, 2001)
• Color: Dominant Color, Scalable Color, Color Structure,
Color Layout Descriptor (MPEG-7, 2001)
• Texture: Texture Browsing, Homogeneous Texture, Edge
Histogram (MPEG-7, 2001)
A. Oliva and A. Torralba. Modeling the shape of the scene: a holistic representation
of the spatial envelope. IJCV, 42(3):145–175, 2001
Text of ISO/IEC 15938-3 Multimedia Content Description Interface - Part 3: Visual.
Final Committee Draft, ISO/IEC/JTC1/SC29/WG11, Doc. N4062, Mar. 2001
GIST-based near-duplicate search
Douze, M., Jégou, H., Sandhawalia, H., Amsaleg, L., & Schmid, C. (2009, July). Evaluation of gist descriptors for web-scale
image search. In Proceedings of the ACM International Conference on Image and Video Retrieval (p. 19). ACM.
Local descriptors
• Basic scheme:
• Detect a set of features (i.e. interest points) in an image
• Extract one descriptor around each feature
• Plenty of options for both parts, e.g.:
• Feature detectors: Canny, Sobel, Harris, FAST, Laplacian
of Gaussian (LoG), Difference of Gaussians (DoG),
Determinant of Hessian (DoH), MSER
• Feature descriptors: SIFT, GLOH, SURF, ORB
• Much higher accuracy at the cost of increased
complexity
Scale-Invariant Feature Transform (SIFT)
Set of descriptors
A single descriptor
(16 histograms of 8 bins → 128 dims)
http://faculty.ucmerced.edu/mhyang/project/iccv13_exemplar/ICCV13_exemplarCut/vlfeat-0.9.14/doc/overview/sift.html
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International journal of computer
vision, 60(2), 91-110.
Example: SIFT matching
https://www.cc.gatech.edu/~hays/compvision/proj2/
Bag of Visual Words (BoVW)
https://towardsdatascience.com/bag-of-visual-words-in-a-nutshell-9ceea97ce0fb
• a set of local features is extracted from each image
• a representative sample of features is selected
• features are clustered
• cluster centroids (or medoids) are considered to be the visual codebook
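The BoVW pipeline can be sketched with a small k-means in pure Python. The 2-D points stand in for real local descriptors (SIFT vectors would have 128 dimensions), and the farthest-point initialisation is an assumption chosen only to keep this toy example deterministic.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, centroids):
    """Index of the centroid (visual word) closest to vec."""
    return min(range(len(centroids)), key=lambda k: sq_dist(vec, centroids[k]))


def init_farthest(points, k):
    """Deterministic init: start at points[0], then repeatedly add the farthest point."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, iters=20):
    """Plain k-means; the returned centroids act as the visual codebook."""
    centroids = init_farthest(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else list(centroids[i])
            for i, g in enumerate(groups)
        ]
    return centroids


def bovw_histogram(descriptors, codebook):
    """Count how many descriptors of one image fall on each visual word."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[nearest(d, codebook)] += 1
    return hist


# Sample of descriptors pooled from many images (toy 2-D data).
sample = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
codebook = kmeans(sample, k=2)
print(codebook)  # [[0.5, 0.5], [10.5, 10.5]]

# Descriptors extracted from one new image, quantised against the codebook.
image_descriptors = [(0.2, 0.1), (0.9, 0.8), (0.4, 0.6), (10.2, 10.9)]
print(bovw_histogram(image_descriptors, codebook))  # [3, 1]
```

The histogram is the image's bag of visual words: it records which codebook entries occur and how often, but not where they occur in the image.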
Indexing and Querying
• tf-idf weighting of visual words
$w_{td} = n_{td} \cdot \log(|D_b| / n_t)$
• Inverted file indexing structure for fast search
• Retrieve candidates with at least one common
visual word
• Rank candidates, e.g. based on cosine similarity
of their tf-idf representations
$sim(q, p) = \frac{\mathbf{w}_q \cdot \mathbf{w}_p}{\|\mathbf{w}_q\| \, \|\mathbf{w}_p\|}$
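The indexing and querying steps can be sketched as follows. The sketch assumes the usual tf-idf reading of the weighting formula (term count times the log of database size over the number of images containing the word) and, for brevity, uses an already indexed image as the query. The image ids and visual-word ids are made up.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """docs: {image_id: [visual word ids]} -> {image_id: {word: tf-idf weight}}."""
    n_docs = len(docs)
    doc_freq = Counter(w for words in docs.values() for w in set(words))
    return {
        d: {w: n * math.log(n_docs / doc_freq[w]) for w, n in Counter(words).items()}
        for d, words in docs.items()
    }


def build_inverted_file(docs):
    """Map each visual word to the set of images that contain it."""
    index = defaultdict(set)
    for d, words in docs.items():
        for w in words:
            index[w].add(d)
    return index


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(w, 0.0) for w, weight in u.items())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


def search(query_id, docs, vectors, index):
    """Candidates share at least one visual word with the query; rank by cosine."""
    candidates = set().union(*(index[w] for w in docs[query_id])) - {query_id}
    ranked = [(cosine(vectors[query_id], vectors[c]), c) for c in candidates]
    return sorted(ranked, reverse=True)


docs = {"a": [1, 1, 2, 3], "b": [1, 2, 2, 3], "c": [4, 5, 5], "d": [3, 6]}
vectors = tfidf_vectors(docs)
index = build_inverted_file(docs)
results = search("a", docs, vectors, index)
print([image_id for _, image_id in results])  # ['b', 'd']
```

Image "c" shares no visual word with the query, so the inverted file never touches it; this pruning is what keeps look-up fast on large collections.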
BoVW Discussion
• BoVW is a sparse representation: each image is
associated with few visual words (compared to the
whole vocabulary)
• Convenient for indexing and look-up
• Completely misses spatial layout → extensions
• Performance depends on:
• size of vocabulary
• dataset where vocabulary was learned
Neural network features
https://www.pnas.org/content/116/4/1074 (artist Lucy Reading-Ikkanda)
Popular CNN architectures
VGGNet (2014)
GoogLeNet (2014)
https://cs.stanford.edu/people/karpathy/cnnembed/
video search
towards building a reverse video
search engine
From Image to Video Similarity
• A video can be considered as a richer
representation compared to images:
• set of images (frames)
• frames and motion
• frames and motion and audio
• For efficiency purposes, we typically simplify or
discard part of the information:
• frames → descriptors → average
• frames → visual words → bag of frame-words
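Both simplifications reduce a variable number of frames to one fixed-size video representation. A minimal sketch, assuming that per-frame descriptors and per-frame visual-word ids have already been computed (the values below are made up):

```python
from collections import Counter


def average_descriptor(frame_descriptors):
    """Video-level vector: element-wise mean of the per-frame descriptors."""
    n = len(frame_descriptors)
    return [sum(col) / n for col in zip(*frame_descriptors)]


def bag_of_frame_words(frame_words):
    """Video-level histogram: how many frames were assigned to each visual word."""
    return Counter(frame_words)


frames = [[0.0, 1.0, 2.0], [2.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
print(average_descriptor(frames))           # [1.0, 1.0, 1.0]
print(bag_of_frame_words([7, 7, 3, 7, 3]))  # Counter({7: 3, 3: 2})
```

Either representation discards frame order and motion, which is the price paid for fast video-level comparison.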
Video search architecture
Video indexing calls
/index (HTTP GET request)
Add the provided video to the video index
• url: the URL of the video that is going to be indexed
• async: flag for asynchronous processing
/youtube (HTTP GET request)
Query YouTube API with either a video ID or a provided text query
and add the retrieved videos to the video index
• video_id: video ID to query YouTube API
• text: provided text to query YouTube API
• max: maximum number of videos to be added to the video index
/delete (HTTP DELETE request)
Delete the provided video from the video index
• url: the URL of the video that is going to be deleted
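A client could prepare the indexing calls as sketched below with the standard library. Several details are assumptions not stated on the slides: the host `http://localhost:8080` is hypothetical, the `async` flag is encoded as `true`/`false`, and the `url` parameter of `/delete` is passed in the query string. The requests are only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE = "http://localhost:8080"  # hypothetical host; the slides do not give one


def index_request(video_url, run_async=False):
    """GET /index: add one video to the index."""
    query = urlencode({"url": video_url, "async": str(run_async).lower()})
    return Request(f"{BASE}/index?{query}", method="GET")


def youtube_request(video_id=None, text=None, max_videos=10):
    """GET /youtube: index videos found via a video ID or a text query."""
    if (video_id is None) == (text is None):
        raise ValueError("pass exactly one of video_id or text")
    params = {"video_id": video_id} if video_id is not None else {"text": text}
    params["max"] = max_videos
    return Request(f"{BASE}/youtube?{urlencode(params)}", method="GET")


def delete_request(video_url):
    """DELETE /delete: remove one video from the index."""
    return Request(f"{BASE}/delete?{urlencode({'url': video_url})}", method="DELETE")


req = index_request("https://example.org/v.mp4", run_async=True)
print(req.get_method(), req.full_url)
# GET http://localhost:8080/index?url=https%3A%2F%2Fexample.org%2Fv.mp4&async=true
```

Passing a prepared request to `urllib.request.urlopen` would send it; the response format is not described on the slides, so it is not handled here.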
Video search calls
/search (HTTP GET request)
Video-level search: retrieve relevant videos by calculating the
similarity between entire videos
• url: URL of the query video
• t_sim: similarity threshold
• t_rank: rank threshold
/partial (HTTP GET request)
Shot-level search: retrieve relevant video segments from the indexed
videos in the database
• url: URL of the query video
• v_sim: video similarity threshold
• s_sim: shot similarity threshold
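The two search calls differ only in their path and threshold parameters, so a single URL builder covers both. As before, this is a sketch under assumptions: the host is hypothetical, the thresholds are treated as optional, and the example values (0.8, 20, 0.7, 0.9) are purely illustrative.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8080"  # hypothetical host; the slides do not give one


def build_url(path, **params):
    """Build a GET URL, leaving out parameters that were not provided."""
    given = {k: v for k, v in params.items() if v is not None}
    return f"{BASE}{path}?{urlencode(given)}"


def search_url(video_url, t_sim=None, t_rank=None):
    """Video-level search: similarity and rank thresholds."""
    return build_url("/search", url=video_url, t_sim=t_sim, t_rank=t_rank)


def partial_url(video_url, v_sim=None, s_sim=None):
    """Shot-level search: video-level and shot-level similarity thresholds."""
    return build_url("/partial", url=video_url, v_sim=v_sim, s_sim=s_sim)


print(search_url("https://example.org/q.mp4", t_sim=0.8, t_rank=20))
# http://localhost:8080/search?url=https%3A%2F%2Fexample.org%2Fq.mp4&t_sim=0.8&t_rank=20
print(partial_url("https://example.org/q.mp4", v_sim=0.7, s_sim=0.9))
# http://localhost:8080/partial?url=https%3A%2F%2Fexample.org%2Fq.mp4&v_sim=0.7&s_sim=0.9
```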
Combining CNNs and BoVW
Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, Y. (2017, January). Near-duplicate video retrieval by
aggregating intermediate CNN layers. In International Conference on Multimedia Modeling (pp. 251-263). Springer
An improved setup
Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, Y. (2017, January). Near-duplicate video retrieval by
aggregating intermediate CNN layers. In International Conference on Multimedia Modeling (pp. 251-263). Springer
Learning similarity
Before training
After training
Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, Y. (2017, October). Near-Duplicate Video Retrieval with
Deep Metric Learning. In 2017 IEEE International Conference on Computer Vision Workshop (ICCVW), (pp. 347-356). IEEE
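The slide does not spell out the training objective. As an assumption, the sketch below uses a triplet loss, one common formulation of metric learning: it pulls a near-duplicate (positive) closer to the query (anchor) than an unrelated video (negative) by at least a margin. The 2-D embeddings are made-up values that mimic a "before" and an "after" situation.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the positive is closer to the anchor than the negative by the margin."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)


anchor = [0.0, 0.0]

# Before training: the near-duplicate lies farther away than the unrelated video.
print(triplet_loss(anchor, positive=[3.0, 0.0], negative=[1.0, 0.0]))  # 9.0

# After training: the near-duplicate is close and the unrelated video is far.
print(triplet_loss(anchor, positive=[0.5, 0.0], negative=[3.0, 0.0]))  # 0.0
```

Minimising such a loss over many triplets reshapes the embedding space so that distance reflects the intended notion of similarity.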
Support for partial duplicate search
FIVR-200K
a dataset for evaluating near-duplicate video retrieval (NDVR)
Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, I. (2018).
FIVR: Fine-grained Incident Video Retrieval. arXiv preprint arXiv:1809.04094
FIVR-200K
• A video dataset to help research on the problem of
Fine-grained Incident Video Retrieval
• Duplicate Scene Videos (DSVs)
• Complementary Scene Videos (CSVs)
• Incident Scene Videos (ISVs)
• 225,960 videos around 4,687 news events from Jan
1st 2013 to Dec 31st 2017
Wikipedia: current events
https://en.wikipedia.org/wiki/Portal:Current_events
Dataset statistics
Figure panels: number of events, number of videos, video category, video duration
Example videos
Boston Marathon bombing
Panels: query, near-duplicate, complementary view, same incident
Las Vegas shootings
Panels: query, near-duplicate, complementary view, same incident
Our Video Search Tool
http://ndd.iti.gr/video_search/
Ideas
• Pick one video around one event between 2013
and 2017 and try to find similar versions of it
• Pick one of the clusters-events in the Browse
section and try to find some important videos that
cover the event
• Given an event of interest, identify in which sources
it is covered (language, country, type of channel)
• Add videos from a newer event and use them to
perform new searches
Source code
https://github.com/MKLab-ITI/intermediate-cnn-features
https://github.com/MKLab-ITI/ndvr-dml
Papers
• Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, Y.
(2017, January). Near-duplicate video retrieval by aggregating
intermediate CNN layers. In International Conference on Multimedia
Modeling (pp. 251-263). Springer
• Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, Y.
(2017, October). Near-Duplicate Video Retrieval with Deep Metric
Learning. In 2017 IEEE International Conference on Computer Vision
Workshop (ICCVW), (pp. 347-356). IEEE
• Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, I.
(2018). FIVR: Fine-grained Incident Video Retrieval. arXiv preprint
arXiv:1809.04094
Acknowledgements
• Giorgos Kordopatis-Zilos / near-duplicate video
retrieval, back-end development, FIVR-200K
collection and annotation
• Lazaros Apostolidis / web front-end development
• Polichronis Charitidis / FIVR-200K annotation
Thank you for your attention!
Akis Papadopoulos papadop@iti.gr
@sympap

Finding Diverse Social Images at MediaEval 2015
Finding Diverse Social Images at MediaEval 2015Finding Diverse Social Images at MediaEval 2015
Finding Diverse Social Images at MediaEval 2015
 
CERTH/CEA LIST at MediaEval Placing Task 2015
CERTH/CEA LIST at MediaEval Placing Task 2015CERTH/CEA LIST at MediaEval Placing Task 2015
CERTH/CEA LIST at MediaEval Placing Task 2015
 
Verifying Multimedia Use at MediaEval 2015
Verifying Multimedia Use at MediaEval 2015Verifying Multimedia Use at MediaEval 2015
Verifying Multimedia Use at MediaEval 2015
 

Recently uploaded

Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 

Recently uploaded (20)

Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 

Similarity-based retrieval of multimedia content

  • 1. Similarity-based retrieval of multimedia content Dr. Symeon Papadopoulos Senior Researcher, CERTH-ITI Monday Jan 28, 2019 @ Media AUTh
  • 2. Our lab Multimedia Knowledge and Social Media Analytics Laboratory • Part of Information Technologies Institute (ITI) - Centre for Research and Technology Hellas (CERTH) • 60+ researchers (20+ post-docs) • key areas: multimedia, social media, computer vision, data mining, machine learning • applications: media, security, culture, environment • involved in 60+ projects and published 600+ papers https://mklab.iti.gr/
  • 5. 500 hours of video per min = 720,000 hours per day > 82 years of video per day!
  • 6. Pope Francis Pope Benedict 2007: iPhone release 2008: Android release 2010: iPad release http://petapixel.com/2013/03/14/a-starry-sea-of-cameras-at-the-unveiling-of-pope-francis/
  • 7.
  • 8.
  • 9. Detecting disinformation Claim: Hurricane Irma, Sep 2017 Fact: Hurricane Dolores, May 2016
  • 10. A shark thriving in hurricanes https://www.snopes.com/photos/animals/puertorico.asp
  • 11.
  • 12. Memes
  • 13. Similarity-based media search Two main problems •How to compute similarity between two items (in accordance with my needs)? •How to search (using above similarity function) in very large collections in reasonable time?
  • 15. What is similar? • Variety of definitions and understandings regarding what can be considered to be similar • Near-duplicate videos: definition by Wu et al. (2007) • photometric variations: gamma, contrast, brightness, etc. • editing operations: resize, shift, crop, flip • insertion of patterns: caption, logo, subtitles, sliding captions, etc. • re-encoding: video format, compression • video modifications: frame rate, frame insertion, deletion, swap X. Wu, A. G. Hauptmann, and C. W. Ngo. Practical elimination of near-duplicates from web video search. In Proceedings of the 15th ACM international conference on Multimedia, pp. 218-227, 2007
  • 16. Hashing • Cryptographic or checksum hashing: MD5, SHA1 • Input: bitstream (not just images or videos) • Output: hash code 128-bit (MD5), 160-bit (SHA1), etc. • Property: minor changes in input can lead to completely different hash codes https://jenssegers.com/61/perceptual-image-hashes
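The sensitivity of cryptographic hashes to minor input changes can be illustrated with a minimal sketch using Python's standard `hashlib`. The helper names (`hex_digest`, `differing_bits`) and the sample byte strings are assumptions made for illustration; they are not part of the slides.

```python
import hashlib


def hex_digest(data: bytes, algorithm: str) -> str:
    """Return the hexadecimal hash of `data` for the named algorithm."""
    return hashlib.new(algorithm, data).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions in which two hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# Two inputs that differ only in the case of the last character.
original = b"similarity-based retrieval"
modified = b"similarity-based retrievaL"

for algorithm in ("md5", "sha1"):
    a = hex_digest(original, algorithm)
    b = hex_digest(modified, algorithm)
    # Each hex character encodes 4 bits of the hash code.
    print(algorithm, len(a) * 4, "bits,", differing_bits(a, b), "bits differ")
```

Because the two digests are unrelated, the Hamming distance between them says nothing about how similar the inputs were, which is why such hashes only support exact-duplicate detection.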
  • 18. Perceptual hashing • Generate a fingerprint that can be used to compare images using the Hamming distance • Instance: Average Hashing (aHash) • Reduce size → 8x8 pixels • Reduce colour → RGB to grayscale • Calculate average colour → among 64 grayscale values • Compute hash → for each pixel, binary value depending on whether it is higher or lower than average → 64-bit signature
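The aHash steps listed on this slide can be sketched in plain Python as follows. This is an illustrative sketch, not the implementation used in the talk: it assumes the image is already a grayscale grid (a list of rows) whose sides are multiples of 8, and it uses simple block averaging for the downscaling step. The function names are hypothetical.

```python
def downscale(gray, size=8):
    """Shrink a 2D grayscale grid to size x size by averaging equal blocks.

    Assumption: height and width are exact multiples of `size`.
    """
    block_h, block_w = len(gray) // size, len(gray[0]) // size
    small = []
    for r in range(size):
        row = []
        for c in range(size):
            block = [gray[y][x]
                     for y in range(r * block_h, (r + 1) * block_h)
                     for x in range(c * block_w, (c + 1) * block_w)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def average_hash(gray, size=8):
    """64-bit aHash: one bit per pixel, set when the pixel exceeds the mean."""
    pixels = [v for row in downscale(gray, size) for v in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for v in pixels:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two integer hash codes."""
    return bin(a ^ b).count("1")


# Synthetic 16x16 image: dark left half, bright right half.
image = [[20] * 8 + [220] * 8 for _ in range(16)]
brighter = [[min(255, v + 10) for v in row] for row in image]  # photometric change
inverted = [[255 - v for v in row] for row in image]           # very different image

print(hamming(average_hash(image), average_hash(brighter)))  # small distance
print(hamming(average_hash(image), average_hash(inverted)))  # large distance
```

Unlike MD5 or SHA1, a uniform brightness shift leaves the signature unchanged here, because every pixel keeps its position relative to the mean.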
  • 20. dHash and pHash • dHash: Difference Hash • same steps as aHash • hash is generated based on whether left pixel is brighter than the right one • fewer false positives compared to aHash • pHash: Perceptual Hash • more complicated algorithm • resize to 32x32 • DCT on luma (brightness) component • top left 8x8 → hash by comparing to median value
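A minimal dHash sketch, under the assumption that the image has already been reduced to a grayscale grid of 8 rows by 9 columns, so that each row yields 8 left/right comparisons and the signature is again 64 bits. The 9x8 size is a common convention and an assumption here; the slide only states that the steps mirror aHash. The function names are hypothetical.

```python
def difference_hash(gray_rows):
    """dHash: one bit per horizontally adjacent pixel pair.

    A bit is 1 when the left pixel is brighter than its right neighbour.
    Assumption: `gray_rows` holds 8 rows of 9 grayscale values each.
    """
    bits = 0
    for row in gray_rows:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two integer hash codes."""
    return bin(a ^ b).count("1")


# Each row fades from bright to dark, so every left pixel is brighter.
fading = [[200 - 20 * x for x in range(9)] for _ in range(8)]
# Adding a constant keeps all left/right relations intact.
shifted = [[v + 30 for v in row] for row in fading]
# Mirroring the image flips every comparison.
mirrored = [list(reversed(row)) for row in fading]

print(hamming(difference_hash(fading), difference_hash(shifted)))
print(hamming(difference_hash(fading), difference_hash(mirrored)))
```

Since only relative brightness between neighbours matters, the hash tracks gradients rather than absolute intensity.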
  • 21. pHash examples Hamming distance = 0 Hamming distance = 24 Hamming distance = 29 Hamming distance = 27 https://www.phash.org/demo/ (select DCT hash)
  • 22. Pixel-based similarity doesn’t match perception All three variations of the first image are equidistant from it in terms of L2 pixel distance! http://cs231n.github.io/classification/
  • 23. Global descriptors • A single vector that attempts to capture the main visual properties of an image, e.g. distribution of colour, spatial layout of brightness, textures, etc. • Popular choices include: • GIST – spatial envelope (Oliva & Torralba, 2001) • Color: Dominant Color, Scalable Color, Color Structure, Color Layout Descriptor (MPEG-7, 2001) • Texture: Texture Browsing, Homogeneous Texture, Edge Histogram (MPEG-7, 2001) A. Oliva and A. Torralba. Modeling the shape of the scene: a holistic representation of the spatial envelope. IJCV, 42(3):145–175, 2001 Text of ISO/IEC 15938-3 Multimedia Content Description Interface—Part 3: Visual. Final Committee Draft, ISO/IEC/JTC1/SC29/WG11, Doc. N4062, Mar. 2001
  • 24. GIST-based near-duplicate search Douze, M., Jégou, H., Sandhawalia, H., Amsaleg, L., & Schmid, C. (2009, July). Evaluation of gist descriptors for web-scale image search. In Proceedings of the ACM International Conference on Image and Video Retrieval (p. 19). ACM.
  • 25. Local descriptors • Basic scheme: • Detect a set of features (i.e. interest points) in an image • Extract one descriptor around each feature • Plenty of options for both parts, e.g.: • Feature detectors: Canny, Sobel, Harris, FAST, Laplacian of Gaussian (LoG), Difference of Gaussians (DoG), Determinant of Hessian (DoH), MSER • Feature descriptors: SIFT, GLOH, SURF, ORB • Much higher accuracy at the cost of increased complexity
  • 26. Scale-Invariant Feature Transform (SIFT) Set of descriptors A single descriptor (16 histograms of 8 bins → 128 dims) http://faculty.ucmerced.edu/mhyang/project/iccv13_exemplar/ICCV13_exemplarCut/vlfeat-0.9.14/doc/overview/sift.html Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International journal of computer vision, 60(2), 91-110.
  • 28. Bag of Visual Words (BoVW) https://towardsdatascience.com/bag-of-visual-words-in-a-nutshell-9ceea97ce0fb
  • 29. Bag of Visual Words (BoVW) https://towardsdatascience.com/bag-of-visual-words-in-a-nutshell-9ceea97ce0fb extract a set of local features from each image
  • 30. Bag of Visual Words (BoVW) • a representative sample of features is selected • features are clustered • cluster centroids (or medoids) are considered to be the visual codebook https://towardsdatascience.com/bag-of-visual-words-in-a-nutshell-9ceea97ce0fb
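The codebook construction described on this slide can be sketched with a small k-means loop. This is a toy illustration under stated assumptions: features are 2-D tuples instead of 128-dimensional SIFT descriptors, the first `k` features serve as initial centroids, and all function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(feature, centroids):
    """Index of the centroid (visual word) closest to `feature`."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(feature, centroids[i]))


def build_codebook(features, k, iterations=10):
    """Cluster sampled features; the centroids form the visual codebook."""
    centroids = [list(f) for f in features[:k]]  # naive init (assumption)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for f in features:
            clusters[nearest(f, centroids)].append(f)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def bovw_histogram(features, centroids):
    """Count how many local features of one image fall on each visual word."""
    histogram = [0] * len(centroids)
    for f in features:
        histogram[nearest(f, centroids)] += 1
    return histogram


sample = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0),
          (1.0, 0.0), (10.0, 9.0), (9.0, 10.0)]
codebook = build_codebook(sample, k=2)

image_features = [(0.5, 0.5), (9.5, 9.5), (10.0, 10.0)]
print(bovw_histogram(image_features, codebook))
```

The resulting histogram is the bag-of-visual-words representation of the image: it records which visual words occur and how often, but not where they occur.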
  • 31. Bag of Visual Words (BoVW) https://towardsdatascience.com/bag-of-visual-words-in-a-nutshell-9ceea97ce0fb
  • 32. Indexing and Querying • tf-idf weighting of visual words: $w_{td} = n_{td} \cdot \log(D_b / n_t)$ • Inverted file indexing structure for fast search • Retrieve candidates with at least one common visual word • Rank candidates, e.g. based on cosine similarity of their tf-idf representations: $\mathrm{sim}(q, p) = \frac{\mathbf{w}_q \cdot \mathbf{w}_p}{\|\mathbf{w}_q\| \, \|\mathbf{w}_p\|}$
  • 33. BoVW Discussion • BoVW is a sparse representation: each image is associated with few visual words (compared to the whole vocabulary) • Convenient for indexing and look-up • Completely misses spatial layout → extensions • Performance depends on: • size of vocabulary • dataset where vocabulary was learned
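The indexing and querying steps on this slide can be sketched as follows. The symbol readings are assumptions, since the slide does not define them: `n_td` is taken as the count of visual word `t` in item `d`, `n_t` as the number of items containing `t`, and `D_b` as the number of items in the collection. Function names and the toy collection are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_index(collection):
    """collection: dict mapping item id -> list of visual word ids."""
    counts = {doc: Counter(words) for doc, words in collection.items()}
    doc_freq = Counter()
    for word_counts in counts.values():
        doc_freq.update(word_counts.keys())
    n_docs = len(collection)
    idf = {t: math.log(n_docs / n_t) for t, n_t in doc_freq.items()}
    # tf-idf weight per (word, item): term count times log(D / n_t).
    weights = {doc: {t: n * idf[t] for t, n in word_counts.items()}
               for doc, word_counts in counts.items()}
    # Inverted file: visual word -> set of items that contain it.
    inverted = defaultdict(set)
    for doc, word_counts in counts.items():
        for t in word_counts:
            inverted[t].add(doc)
    return idf, weights, inverted


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def search(query_words, idf, weights, inverted):
    """Fetch items sharing at least one visual word, then rank by cosine."""
    query = {t: n * idf.get(t, 0.0) for t, n in Counter(query_words).items()}
    candidates = set()
    for t in query:
        candidates |= inverted.get(t, set())
    ranked = [(doc, cosine(query, weights[doc])) for doc in candidates]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


collection = {"a": [1, 1, 2], "b": [2, 3], "c": [4, 4, 5]}
idf, weights, inverted = build_index(collection)
results = search([1, 1, 2], idf, weights, inverted)
print(results)
```

Item "c" is never scored because it shares no visual word with the query; this pruning is what makes the inverted file fast on sparse representations.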
  • 35. Popular CNN architectures VGGNet (2014) GoogleNet (2014)
  • 37. video search towards building a reverse video search engine
  • 38. From Image to Video Similarity • A video can be considered as a richer representation compared to images: • set of images (frames) • frames and motion • frames and motion and audio • For efficiency purposes, we typically simplify or discard part of the information: • frames → descriptors → average • frames → visual words → bag of frame-words
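The "frames → descriptors → average" simplification can be sketched as below. The L2 normalisation of each frame descriptor before averaging, the 3-dimensional toy descriptors, and the function names are assumptions for illustration; the slide does not specify them.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (returned unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def video_descriptor(frame_descriptors):
    """Collapse per-frame descriptors into one video-level vector."""
    normalized = [l2_normalize(f) for f in frame_descriptors]
    mean = [sum(dim) / len(normalized) for dim in zip(*normalized)]
    return l2_normalize(mean)


def cosine_similarity(u, v):
    """Dot product; equals cosine similarity for unit-length inputs."""
    return sum(a * b for a, b in zip(u, v))


# Toy descriptors; real ones would come from a global descriptor or a CNN.
video_a = video_descriptor([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]])
video_b = video_descriptor([[1.0, 0.0, 0.05]])
video_c = video_descriptor([[0.0, 0.0, 1.0]])

print(cosine_similarity(video_a, video_b))
print(cosine_similarity(video_a, video_c))
```

Averaging discards frame order and motion, which is the trade-off the slide refers to: comparison becomes a single vector operation at the cost of temporal information.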
  • 40. Video indexing calls /index (HTTP GET request) Add the provided video to the video index • url: the URL of the video that is going to be indexed • async: flag for asynchronous processing /youtube (HTTP GET request) Query YouTube API with either a video ID or a provided text query and add the retrieved videos to the video index • video_id: video ID to query YouTube API • text: provided text to query YouTube API • max: maximum number of videos to be added to the video index /delete (HTTP DELETE request) Delete the provided video from the video index • url: the URL of the video that is going to be deleted
  • 41. Video search calls /search (HTTP GET request) Video-level search: retrieve relevant videos by calculating the similarity between entire videos • url: URL of the query video • t_sim: similarity threshold • t_rank: rank threshold /partial (HTTP GET request) Shot-level search: retrieve relevant video segments from the indexed videos in the database • url: URL of the query video • v_sim: video similarity threshold • s_sim: shot similarity threshold
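The indexing and search calls above could be assembled as shown in this sketch, which only builds the requests and does not send them. The endpoint paths and parameter names come from the slides; the host `http://localhost:5000`, the example video URL, the `"true"` value for `async`, and the threshold values are all hypothetical placeholders.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:5000"  # hypothetical host, not given in the slides


def build_request(endpoint, params, method="GET"):
    """Return (HTTP method, full URL) for one call of the video service."""
    query = urlencode(params)
    url = f"{BASE_URL}{endpoint}?{query}" if query else f"{BASE_URL}{endpoint}"
    return method, url


video = "https://example.com/v.mp4"  # placeholder video URL

# Index a video asynchronously (the value format of `async` is an assumption).
index_call = build_request("/index", {"url": video, "async": "true"})

# Video-level search with illustrative threshold values.
search_call = build_request("/search", {"url": video, "t_sim": 0.8, "t_rank": 10})

# Shot-level (partial duplicate) search.
partial_call = build_request("/partial", {"url": video, "v_sim": 0.5, "s_sim": 0.7})

# Remove the video from the index again.
delete_call = build_request("/delete", {"url": video}, method="DELETE")

for method, url in (index_call, search_call, partial_call, delete_call):
    print(method, url)
```

Passing the parameters as a dictionary sidesteps the fact that `async` is a reserved word in Python and cannot be used as a keyword argument.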
  • 42. Combining CNNs and BoVW Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, Y. (2017, January). Near-duplicate video retrieval by aggregating intermediate CNN layers. In International Conference on Multimedia Modeling (pp. 251-263). Springer
  • 43. An improved setup Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, Y. (2017, January). Near-duplicate video retrieval by aggregating intermediate CNN layers. In International Conference on Multimedia Modeling (pp. 251-263). Springer
  • 44. Learning similarity Before training After training Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, Y. (2017, October). Near-Duplicate Video Retrieval with Deep Metric Learning. In 2017 IEEE International Conference on Computer Vision Workshop (ICCVW), (pp. 347-356). IEEE
  • 45. Support for partial duplicate search
  • 46. FIVR-200K a dataset for evaluating NDVR Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, I. (2018). FIVR: Fine-grained Incident Video Retrieval. arXiv preprint arXiv:1809.04094
  • 47. FIVR-200K • A video dataset to help research on the problem of Fine-grained Incident Video Retrieval • Duplicate Scene Videos (DSVs) • Complementary Scene Videos (CSVs) • Incident Scene Videos (ISVs) • 225,960 videos around 4,687 news events from Jan 1st 2013 to Dec 31st 2017
  • 49. Dataset statistics Number of events Number of videos
  • 54. Boston Marathon bombing query near-duplicate complementary view same incident
  • 55. Las Vegas shootings query near-duplicate complementary view same incident
  • 56. Our Video Search Tool http://ndd.iti.gr/video_search/
  • 57. Ideas • Pick one video around one event between 2013 and 2017 and try to find similar versions of it • Pick one of the clusters-events in the Browse section and try to find some important videos that cover the event • Given an event of interest, identify in which sources it is covered (language, country, type of channel) • Add videos from a newer event and use them to perform new searches
  • 59. Papers • Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, Y. (2017, January). Near-duplicate video retrieval by aggregating intermediate CNN layers. In International Conference on Multimedia Modeling (pp. 251-263). Springer • Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, Y. (2017, October). Near-Duplicate Video Retrieval with Deep Metric Learning. In 2017 IEEE International Conference on Computer Vision Workshop (ICCVW), (pp. 347-356). IEEE • Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, I. (2018). FIVR: Fine-grained Incident Video Retrieval. arXiv preprint arXiv:1809.04094
  • 60. Acknowledgements • Giorgos Kordopatis-Zilos / near-duplicate video retrieval, back-end development, FIVR-200K collection and annotation • Lazaros Apostolidis / web front-end development • Polichronis Charitidis / FIVR-200K annotation
  • 61. Thank you for your attention! Akis Papadopoulos papadop@iti.gr @sympap