The MIRROR project has received funding from the European Union’s Horizon 2020 research and innovation action program under grant agreement № 832921.
Hard-Negatives or Non-Negatives?
A Hard-Negative Selection Strategy for
Cross-Modal Retrieval Using the Improved
Marginal Ranking Loss
Damianos Galanopoulos, Vasileios Mezaris
2nd International Workshop on Video Retrieval Methods and Their Limits @
ICCV 2021 conference, 16 Oct. 2021
2
Introduction
● Cross-modal learning has gained a lot of interest
● The improved marginal ranking loss is extensively used
● State-of-the-art approaches rely on hard-negative samples during training
● We aim to extract actual hard-negative samples
○ We focus on samples that are semantically close to the anchor and therefore
should not be treated as negatives
● We examine different strategies for efficient combination of multiple trained
models
3
Problem statement
Sample A is the anchor
video-caption sample
Which one of B or C
should be considered as a
hard-negative sample?
Typical approaches will select B
(the nearest-to-anchor sample),
but this is a positive one!
C is clearly a negative sample
and should be used as the
hard-negative
4
Baseline
● We utilized the attention-based dual encoding network of [1]
● The improved marginal ranking loss is used to train the network
[1] D. Galanopoulos, V. Mezaris, "Attention Mechanisms, Signal Encodings and Fusion Strategies for Improved Ad-hoc Video Search
with Dual Encoding Networks", Proc. ACM Int. Conf. on Multimedia Retrieval (ICMR 2020), Dublin, Ireland, October 2020.
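As a reminder of what the improved marginal ranking loss computes, the sketch below follows the commonly used max-of-hinges formulation, in which only the hardest in-batch negative contributes in each retrieval direction. It is a minimal pure-Python illustration and not the authors' implementation: the function name, the margin of 0.2, the averaging over the batch and the toy similarity matrix are all assumptions.

```python
def improved_marginal_ranking_loss(sim, margin=0.2):
    """sim[i][j] is the similarity of video i and caption j; the diagonal holds matching pairs.

    Assumes a square matrix with at least two samples.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        pos = sim[i][i]
        # Hardest negative caption for video i: the best-scoring non-matching caption.
        hard_caption = max(sim[i][j] for j in range(n) if j != i)
        # Hardest negative video for caption i: the best-scoring non-matching video.
        hard_video = max(sim[j][i] for j in range(n) if j != i)
        total += max(0.0, margin + hard_caption - pos)
        total += max(0.0, margin + hard_video - pos)
    return total / n  # averaging over the batch is an assumption of this sketch


# Toy 3x3 similarity matrix (hypothetical values).
toy_sim = [
    [0.9, 0.8, 0.1],
    [0.3, 0.7, 0.2],
    [0.0, 0.4, 0.6],
]
loss = improved_marginal_ranking_loss(toy_sim)
```

Because only the single hardest negative enters each hinge term, a semantically matching sample that is mistaken for a negative has a direct effect on the gradient, which is the issue the mining strategy in this deck addresses.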
5
Baseline
● The combination of multiple models boosts performance
● As in [1], we combine 24 different models, obtained by varying:
○ Attention mechanism in the textual or visual stream
○ Two textual encodings (BERT and W2V+BERT)
○ Two optimizers (Adam and RMSprop)
○ Three learning rates
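The count of 24 is consistent with taking every combination of the four options above, assuming the attention mechanism has two alternative placements (textual or visual stream). A small sketch of such a configuration grid follows; the dictionary keys and the three learning-rate values are hypothetical placeholders, since the slides do not state them.

```python
from itertools import product

# Option names follow the slide; the learning-rate values are hypothetical.
attention_streams = ["textual", "visual"]
text_encodings = ["BERT", "W2V+BERT"]
optimizers = ["Adam", "RMSprop"]
learning_rates = [1e-4, 5e-5, 1e-5]

# One configuration per combination of the four options.
configs = [
    {"attention": a, "encoding": e, "optimizer": o, "lr": lr}
    for a, e, o, lr in product(attention_streams, text_encodings, optimizers, learning_rates)
]

print(len(configs))  # 24
```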
6
Hard-negative mining
● We designed an offline-online strategy to exclude potentially-positive samples
● At the offline stage, we estimate a threshold p for the similarity of samples, so
that only samples with similarity < p will be treated as hard-negative candidates
○ Randomly split the training dataset into batches (as done for training)
○ In each batch, the cosine similarity between all possible caption pairs
is calculated
○ Over the entire set of similarities calculated for all batches, we assume
that the top x% (e.g. 1%) indicate very similar samples (thus, neither sample
of such a pair can be treated as a hard-negative for the other)
○ We identify the similarity threshold p for which x% of the similarities are
higher than p
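The offline stage can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: captions are assumed to be already embedded as plain vectors, and the helper names `cosine` and `estimate_threshold`, the seed and the random toy vectors are hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def estimate_threshold(caption_vecs, batch_size, x_percent, seed=0):
    """Return p such that about x_percent of the in-batch caption similarities exceed p."""
    idx = list(range(len(caption_vecs)))
    random.Random(seed).shuffle(idx)  # random split into batches, as done for training
    sims = []
    for start in range(0, len(idx), batch_size):
        batch = idx[start:start + batch_size]
        for a in range(len(batch)):
            for b in range(a + 1, len(batch)):  # every unordered caption pair once
                sims.append(cosine(caption_vecs[batch[a]], caption_vecs[batch[b]]))
    if not sims:
        raise ValueError("need at least one caption pair")
    sims.sort(reverse=True)
    k = int(len(sims) * x_percent / 100)  # number of pairs assumed to be very similar
    return sims[min(k, len(sims) - 1)]    # at most k similarities lie strictly above p


# Toy data: 40 random 8-dimensional "caption embeddings" (hypothetical).
rng = random.Random(1)
vecs = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(40)]
p = estimate_threshold(vecs, batch_size=10, x_percent=1)
```

Only pairs that fall in the same batch are compared, which keeps the cost far below an all-pairs computation over the whole training set while sampling the same kind of pairs the online stage will encounter.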
7
Hard-negative mining
● At the online stage (during training) we enforce the threshold value p
○ In every batch, for an anchor (v_i, c_i), every sample (v_j, c_j) within the batch
whose similarity to the anchor is > p is not considered a negative at all
○ Every other sample is labeled as negative
○ Out of this subset of samples, the negative with the highest similarity to the
anchor is selected as the hard-negative sample
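A minimal sketch of the online selection step is given below. It assumes two precomputed in-batch similarity matrices: `cap_sim` for the threshold test and `rank_sim` for ranking the remaining negatives. The slides do not show which similarity is used for the final ranking, so treating it as a separate (for example cross-modal) score is an assumption, as are the function name and the toy values.

```python
def select_hard_negative(anchor, cap_sim, rank_sim, p):
    """Return the index of the hard negative for `anchor`, or None if no candidate is left.

    cap_sim[i][j]:  caption-to-caption similarity, compared against the threshold p.
    rank_sim[i][j]: similarity used to rank the remaining negatives (assumed here).
    """
    candidates = [
        j for j in range(len(cap_sim))
        if j != anchor and not cap_sim[anchor][j] > p  # similarity > p: not a negative at all
    ]
    if not candidates:
        return None
    # Among the remaining negatives, the one closest to the anchor is the hard negative.
    return max(candidates, key=lambda j: rank_sim[anchor][j])


# Toy batch mirroring the problem statement: A = 0 (anchor), B = 1, C = 2.
cap_sim = [
    [1.00, 0.95, 0.40],
    [0.95, 1.00, 0.35],
    [0.40, 0.35, 1.00],
]
rank_sim = [
    [0.80, 0.90, 0.60],
    [0.85, 0.80, 0.50],
    [0.55, 0.45, 0.80],
]
filtered = select_hard_negative(0, cap_sim, rank_sim, p=0.8)    # B is excluded, C is chosen
unfiltered = select_hard_negative(0, cap_sim, rank_sim, p=1.0)  # no exclusion, B is chosen
```

With the threshold set to 1.0 nothing is excluded and the selection reduces to the usual nearest-negative choice, which makes the effect of p easy to check on a single batch.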
8
Fusion strategies
● Following the proposed hard-negative mining strategy for different plausible
assumptions about x% (e.g. 1%, 2%), thus different p values, the number of
available models can be quickly increased
● We study the combination of multiple trained models (late fusion)
● Every trained model, and for a given query, results in a ranking list of the most
relevant videos
● Three different strategies for combining them are examined:
○ AVG
○ MAX
○ Hybrid
9
Fusion strategies
AVG
● We assume that every model performs well, and we treat all models equally
● We average the ranks that the models assign to a given video
MAX
● We assume that not all our models have very good recall
● However, we assume that the samples they place at the very top of their
ranking lists are true positives
● Thus, if a video appears very high in the ranking list generated by at least one
model, we trust this video to be a good answer to the query
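The two strategies can be sketched on rank positions as follows, assuming every model ranks the same set of videos and that rank 1 is the best. Interpreting MAX as "keep each video's best rank over all models" is one reading of the slide; the function names and the tie-break by video id are assumptions of this sketch.

```python
def ranks_per_video(ranking_lists):
    """Map each video id to its 1-based rank in every ranking list."""
    ranks = {}
    for ranking in ranking_lists:
        for position, video in enumerate(ranking, start=1):
            ranks.setdefault(video, []).append(position)
    return ranks


def fuse_avg(ranking_lists):
    """AVG: order videos by their mean rank over all models."""
    ranks = ranks_per_video(ranking_lists)
    return sorted(ranks, key=lambda v: (sum(ranks[v]) / len(ranks[v]), v))


def fuse_max(ranking_lists):
    """MAX: order videos by the best (smallest) rank any single model gave them."""
    ranks = ranks_per_video(ranking_lists)
    return sorted(ranks, key=lambda v: (min(ranks[v]), v))


# Three hypothetical models ranking four videos.
lists = [
    ["a", "b", "c", "d"],
    ["a", "b", "d", "c"],
    ["d", "a", "b", "c"],
]
avg_order = fuse_avg(lists)  # ['a', 'b', 'd', 'c']
max_order = fuse_max(lists)  # ['a', 'd', 'b', 'c']
```

Video "d" illustrates the difference: one model places it first, so MAX promotes it above "b", while AVG keeps it lower because the other two models rank it near the bottom.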
10
Fusion strategies
Hybrid
● Neither of the previous two assumptions seems perfectly plausible
● In our Hybrid strategy, for a retrieved video, we select, out of the Q ranking lists
in total, the Q’ lists in which the video is ranked the highest
● The video's ranks in these top-Q’ lists are averaged to calculate its final ranking
● All retrieved videos are re-ordered according to their final ranking
So, if at least Q’ models bring a video high in their ranking lists, we trust this to be a
good answer to the query. Special cases:
● If Q’=Q, the Hybrid approach is the same as the AVG
● If Q’=1, the Hybrid approach is the same as the MAX
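A minimal sketch of the Hybrid strategy on rank positions follows, assuming each of the Q models ranks the same set of videos and that rank 1 is the best. The function name `fuse_hybrid`, the parameter name `q_prime` and the tie-break by video id are assumptions, not taken from the slides.

```python
def fuse_hybrid(ranking_lists, q_prime):
    """Average each video's q_prime best (smallest) ranks, then re-order all videos."""
    if not 1 <= q_prime <= len(ranking_lists):
        raise ValueError("q_prime must be between 1 and the number of ranking lists")
    ranks = {}
    for ranking in ranking_lists:
        for position, video in enumerate(ranking, start=1):
            ranks.setdefault(video, []).append(position)
    # Final ranking score: mean of the q_prime lists where the video is ranked highest.
    final = {v: sum(sorted(r)[:q_prime]) / q_prime for v, r in ranks.items()}
    return sorted(final, key=lambda v: (final[v], v))


# Three hypothetical models (Q = 3) ranking four videos.
lists = [
    ["a", "b", "c", "d"],
    ["a", "b", "d", "c"],
    ["d", "a", "b", "c"],
]
as_max = fuse_hybrid(lists, q_prime=1)  # best rank only:  ['a', 'd', 'b', 'c']
as_avg = fuse_hybrid(lists, q_prime=3)  # all ranks (AVG): ['a', 'b', 'd', 'c']
hybrid = fuse_hybrid(lists, q_prime=2)  # two best ranks:  ['a', 'b', 'd', 'c']
```

The two boundary calls reproduce the special cases listed above: with q_prime equal to 1 only the best rank counts, and with q_prime equal to the number of lists every rank is averaged.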
11
Experimental results
● Training datasets:
○ MSR-VTT, TGIF, ActivityNet Captions and Vatex
● Evaluation datasets:
○ V3C1 evaluated on TRECVID AVS 2019 and 2020 queries
● Evaluation metric:
○ Mean extended inferred average precision (MXinfAP)
● Keyframe representation:
○ ResNet-152 trained on ImageNet 11K
12
Experimental results
● Results in MXinfAP for the combination of multiple models under different setups
● Comparison between the baseline hard-negative mining strategy and the
proposed one with x=1% and x=2%
● The last row shows the results when all models from every mining strategy are
combined
● Results on the AVS19 and AVS20 datasets for the Hybrid fusion strategy and
different values of Q’
13
Experimental results
14
Conclusion
● New strategy for hard-negative mining to improve the performance of a
cross-modal video retrieval network
● We focused on preventing positive samples from being wrongly used as
hard-negatives
● We proposed a hybrid strategy for model combination to take advantage of
the high number of trained models
● The new hard-negative mining strategy gives small improvements
● In combination with the Hybrid fusion strategy, the performance is further
boosted
15
Contact details
Damianos Galanopoulos, Vasileios Mezaris
Information Technologies Institute-CERTH
dgalanop@iti.gr, bmezaris@iti.gr
www.iti.gr/~bmezaris

Hard-Negatives Selection Strategy for Cross-Modal Retrieval

  • 1. The MIRROR project has received funding from the European Union’s Horizon 2020 research and innovation action program under grant agreement № 832921. Hard-Negatives or Non-Negatives? A Hard-Negative Selection Strategy for Cross-Modal Retrieval Using the Improved Marginal Ranking Loss Damianos Galanopoulos, Vasileios Mezaris 2nd International Workshop on Video Retrieval Methods and Their Limits @ ICCV 2021 conference, 16 Oct. 2021
  • 2. Introduction ● Cross-modal learning has gained a lot of interest ● The improved marginal ranking loss is extensively used ● State-of-the-art approaches rely on hard-negative samples during training ● We aim at extracting actual hard-negative samples ○ We focus on samples that are semantically close to the anchor and should not be considered as negatives ● We examine different strategies for efficient combination of multiple trained models
  • 3. Problem statement ● Sample A is the anchor video-caption sample ● Which one of B or C should be considered as a hard-negative sample? ● Typical approaches will select B (the nearest-to-anchor sample), but this is a positive one! ● C is clearly a negative sample and should be used as the hard-negative
  • 4. Baseline ● We utilized the attention-based dual encoding network of [1] ● The improved marginal ranking loss is used to train the network [1] D. Galanopoulos, V. Mezaris, "Attention Mechanisms, Signal Encodings and Fusion Strategies for Improved Ad-hoc Video Search with Dual Encoding Networks", Proc. ACM Int. Conf. on Multimedia Retrieval (ICMR 2020), Dublin, Ireland, October 2020.
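
The slides name the improved marginal ranking loss without writing it out. The sketch below assumes the max-of-hinges form commonly used under this name in the cross-modal retrieval literature: for each matching video-caption pair in a batch, only the hardest in-batch negative caption and the hardest in-batch negative video contribute. The function name, the `sim` layout (matching pairs on the diagonal) and the margin value of 0.2 are illustrative assumptions, not details taken from the slides.

```python
def improved_marginal_ranking_loss(sim, margin=0.2):
    """Max-of-hinges ranking loss over one batch (assumed formulation).

    sim[i][j] is the similarity between video i and caption j;
    sim[i][i] is the similarity of the matching (positive) pair.
    The batch must contain at least two pairs.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        positive = sim[i][i]
        # Hardest negative caption for video i (same row, off-diagonal).
        hardest_caption = max(sim[i][j] for j in range(n) if j != i)
        # Hardest negative video for caption i (same column, off-diagonal).
        hardest_video = max(sim[j][i] for j in range(n) if j != i)
        total += max(0.0, margin + hardest_caption - positive)
        total += max(0.0, margin + hardest_video - positive)
    return total
```

Because only the single hardest negative per direction enters the loss, a semantically matching sample that is mistaken for a negative has a direct effect on training, which is the issue raised in the problem statement.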
  • 5. Baseline ● The combination of multiple models boosts performance ● As in [1], we combine 24 different models obtained by varying the following parameters: ○ Attention mechanism in the textual or visual stream ○ Two textual encodings (BERT and W2V+BERT) ○ Two optimizers (Adam and RMSprop) ○ Three learning rates
  • 6. Hard-negative mining ● We designed an offline-online strategy to exclude potentially-positive samples ● At the offline stage, we estimate a threshold p for the similarity of samples, so that only samples with similarity < p will be treated as hard-negative candidates ○ Randomly split the training dataset into batches (as done for training) ○ In each batch, the cosine similarity between all possible caption pairs is calculated ○ Within the entire set of calculated similarities for all batches, we assume that x% (e.g. 1%) of them indicate very similar samples (thus, one could not be treated as a hard-negative for the other) ○ The similarity threshold p for which x% of the similarities are higher than p is identified
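
The offline stage above can be sketched as follows. This is a minimal illustration, assuming that captions are already available as plain lists of floats; the function names, the fixed random seed and the way the cut-off index is rounded are assumptions made for the example, not details given in the slides.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def estimate_threshold(caption_vecs, batch_size, x_percent, seed=0):
    """Return p such that about x_percent of all in-batch caption-pair
    similarities are higher than p."""
    order = list(range(len(caption_vecs)))
    random.Random(seed).shuffle(order)  # random split into batches
    sims = []
    for start in range(0, len(order), batch_size):
        batch = order[start:start + batch_size]
        # Similarity of every caption pair inside this batch.
        for a in range(len(batch)):
            for b in range(a + 1, len(batch)):
                sims.append(cosine(caption_vecs[batch[a]],
                                   caption_vecs[batch[b]]))
    if not sims:
        raise ValueError("need at least one batch with two captions")
    sims.sort(reverse=True)
    # Number of pairs assumed to be "too similar to be negatives".
    k = min(int(len(sims) * x_percent / 100.0), len(sims) - 1)
    return sims[k]
```

A larger x% moves the cut-off further down the sorted list, so p decreases and more samples are excluded from the hard-negative candidates.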
  • 7. Hard-negative mining ● At the online stage (during training) we enforce the threshold value p ○ In every batch, for an anchor (vi,ci), every sample (vj,cj) (within the batch) with similarity > p is not considered as a negative at all ○ Every other sample is labeled as negative ○ Out of this subset of samples, the negative one with the highest similarity is selected as the hard-negative sample
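
A minimal sketch of the online selection step follows. The similarity symbols did not survive in the slide text, so the example assumes that the threshold p is applied to the caption-to-caption similarity (as in the offline stage) and that the remaining negatives are ranked by their cross-modal similarity to the anchor. The function and argument names are hypothetical.

```python
def select_hard_negative(anchor, cross_sims, caption_sims, p):
    """Return the index of the hard negative for the anchor, or None.

    cross_sims[j]   : similarity used to rank sample j against the anchor
                      (assumed here to be the video-caption similarity)
    caption_sims[j] : similarity between the anchor caption and caption j
                      (assumed here to be the quantity compared with p)
    """
    best_j, best_sim = None, float("-inf")
    for j in range(len(cross_sims)):
        if j == anchor:
            continue  # the anchor is its own positive
        if caption_sims[j] > p:
            continue  # too similar: not considered a negative at all
        if cross_sims[j] > best_sim:
            best_j, best_sim = j, cross_sims[j]
    return best_j
```

With samples A, B, C at indices 0, 1, 2, `select_hard_negative(0, [1.0, 0.9, 0.6], [1.0, 0.95, 0.4], p=0.8)` skips B, whose caption is nearly identical to the anchor's, and selects C, mirroring the situation in the problem statement.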
  • 8. Fusion strategies ● Applying the proposed hard-negative mining strategy with different plausible assumptions about x% (e.g. 1%, 2%), and thus different p values, quickly increases the number of available models ● We study the combination of multiple trained models (late fusion) ● For a given query, every trained model produces a ranking list of the most relevant videos ● Three different strategies for combining them are examined: ○ AVG ○ MAX ○ Hybrid
  • 9. Fusion strategies AVG ● We treat every model as a well-performing one, and we weight them equally ● We average the rankings for a given video MAX ● We assume that not all our models have very good recall ● But we assume that at least the samples they place at the very top of their ranking lists are true positives ● Thus, if a video appears very high in the ranking list generated by at least one model, we trust this video to be a good answer to the query.
  • 10. Fusion strategies Hybrid ● Neither of the previous two assumptions seems perfectly plausible ● In our Hybrid strategy, for a retrieved video, we select the Q’ ranking lists in which the video is ranked the highest, out of the Q ranking lists in total ● These top-Q’ rankings are averaged to calculate the final ranking for this video ● All retrieved videos are re-ordered according to their final ranking ● So, if at least Q’ models bring a video high in their ranking lists, we trust this to be a good answer to the query. Special cases: ● If Q’=Q, the Hybrid approach is the same as the AVG ● If Q’=1, the Hybrid approach is the same as the MAX
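
The three fusion strategies can be sketched with a single function, since AVG and MAX are the two special cases of Hybrid. This is an illustrative sketch under stated assumptions: every model ranks the same set of videos, a "ranking" is the 1-based position of a video in a model's list, and ties are broken by video id. None of these details is specified in the slides, and `hybrid_fusion` is a hypothetical name.

```python
def hybrid_fusion(rankings, q_prime):
    """Fuse Q ranked lists (best video first) into a single ranked list.

    For each video, the q_prime best (smallest) ranks across the Q lists
    are averaged; videos are then re-ordered by this fused rank.
    q_prime == len(rankings) gives AVG, q_prime == 1 gives MAX.
    """
    if not 1 <= q_prime <= len(rankings):
        raise ValueError("q_prime must be between 1 and the number of lists")
    videos = rankings[0]
    fused = {}
    for vid in videos:
        # 1-based rank of this video in every list, best ranks first.
        ranks = sorted(r.index(vid) + 1 for r in rankings)
        fused[vid] = sum(ranks[:q_prime]) / q_prime
    # Lower fused rank is better; the video id breaks ties (assumption).
    return sorted(videos, key=lambda v: (fused[v], v))
```

For example, a video placed first by one model and last by the others rises to the top under MAX (`q_prime=1`), but stays low under AVG; intermediate values of `q_prime` require agreement among several models before a video is promoted.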
  • 11. Experimental results ● Training datasets: ○ MSR-VTT, TGIF, ActivityNet Captions and VATEX ● Evaluation dataset: ○ V3C1, evaluated on the TRECVID AVS 2019 and 2020 queries ● Evaluation metric: ○ Mean extended inferred average precision (MXinfAP) ● Keyframe representation: ○ ResNet-152 trained on ImageNet 11K
  • 12. Experimental results ● Results in MXinfAP of the combination of multiple models and different setups ● Comparison between the baseline hard-negative mining strategy and the proposed one with x=1% and x=2% ● The last row shows the results when all models from every mining strategy are combined
  • 13. Experimental results ● Results on the AVS19 and AVS20 datasets for the Hybrid fusion strategy and different values of Q’
  • 14. Conclusion ● New strategy for hard-negative mining to improve the performance of a cross-modal video retrieval network ● We focused on preventing positive samples from being wrongly used as hard-negatives ● We proposed a hybrid strategy for model combination to take advantage of the high number of trained models ● The new hard-negative mining strategy gives small improvements ● In combination with the Hybrid fusion strategy, the performance is further boosted
  • 15. Contact details ● Damianos Galanopoulos, Vasileios Mezaris ● Information Technologies Institute-CERTH ● dgalanop@iti.gr, bmezaris@iti.gr ● www.iti.gr/~bmezaris